The Wayback Machine - https://web.archive.org/web/20220422180229/https://github.com/NVIDIA/thrust/issues/1663

Improve diagnostic when the (default) CUDA device system is used in non-CUDA builds. #1663

Open
jeffhammond opened this issue Apr 20, 2022 · 9 comments
Labels
backend: CUDA, enhancement, good first issue, P2: nice to have, repro: verified

Comments

@jeffhammond commented Apr 20, 2022

I can't tell where the problem is. Why does the version of Thrust that I get with stdpar=gpu work, whereas the version from GitHub doesn't?

Thanks

MCVE

#include <thrust/universal_vector.h>

thrust::universal_vector<float> m_x;

void AllocateNodePersistent(int numNode)
{
  m_x.resize(numNode);
}

It works with -stdpar=gpu:

$ nvc++ -std=c++17 -stdpar=gpu -c bug.cc && echo OKAY
OKAY

It fails with commit 25547a4:

$ nvc++ -std=c++17 -I/home/jhammond/NVIDIA/thrust -c bug.cc
"/home/jhammond/NVIDIA/thrust/thrust/system/detail/generic/for_each.h", line 65: error: static assertion failed with "unimplemented for this system"
    THRUST_STATIC_ASSERT_MSG(
    ^
          detected during:
            instantiation of "InputIterator thrust::system::detail::generic::for_each_n(thrust::execution_policy<DerivedPolicy> &, InputIterator, Size, UnaryFunction) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, UnaryFunction=thrust::detail::device_generate_functor<thrust::detail::fill_functor<float>>]" at line 67 of "/home/jhammond/NVIDIA/thrust/thrust/detail/for_each.inl"
            instantiation of "InputIterator thrust::for_each_n(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, Size, UnaryFunction) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, UnaryFunction=thrust::detail::device_generate_functor<thrust::detail::fill_functor<float>>]" at line 93 of "/home/jhammond/NVIDIA/thrust/thrust/system/detail/generic/generate.inl"
            instantiation of "OutputIterator thrust::system::detail::generic::generate_n(thrust::execution_policy<ExecutionPolicy> &, OutputIterator, Size, Generator) [with ExecutionPolicy=thrust::cuda_cub::tag, OutputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, Generator=thrust::detail::fill_functor<float>]" at line 56 of "/home/jhammond/NVIDIA/thrust/thrust/detail/generate.inl"
            instantiation of "OutputIterator thrust::generate_n(const thrust::detail::execution_policy_base<DerivedPolicy> &, OutputIterator, Size, Generator) [with DerivedPolicy=thrust::cuda_cub::tag, OutputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, Generator=thrust::detail::fill_functor<float>]" at line 42 of "/home/jhammond/NVIDIA/thrust/thrust/system/detail/generic/fill.h"
            instantiation of "OutputIterator thrust::system::detail::generic::fill_n(thrust::execution_policy<DerivedPolicy> &, OutputIterator, Size, const T &) [with DerivedPolicy=thrust::cuda_cub::tag, OutputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, T=float]" at line 51 of "/home/jhammond/NVIDIA/thrust/thrust/detail/fill.inl"
            [ 4 instantiation contexts not shown ]
            instantiation of "thrust::detail::disable_if<thrust::detail::allocator_traits_detail::needs_default_construct_via_allocator<Allocator, thrust::detail::pointer_element<Pointer>::type>::value, void>::type thrust::detail::allocator_traits_detail::default_construct_range(Allocator &, Pointer, Size) [with Allocator=thrust::cuda_cub::universal_allocator<float>, Pointer=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t]" at line 106 of "/home/jhammond/NVIDIA/thrust/thrust/detail/allocator/default_construct_range.inl"
            instantiation of "void thrust::detail::default_construct_range(Allocator &, Pointer, Size) [with Allocator=thrust::cuda_cub::universal_allocator<float>, Pointer=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t]" at line 254 of "/home/jhammond/NVIDIA/thrust/thrust/detail/contiguous_storage.inl"
            instantiation of "void thrust::detail::contiguous_storage<T, Alloc>::default_construct_n(thrust::detail::contiguous_storage<T, Alloc>::iterator, thrust::detail::contiguous_storage<T, Alloc>::size_type) [with T=float, Alloc=thrust::cuda_cub::universal_allocator<float>]" at line 877 of "/home/jhammond/NVIDIA/thrust/thrust/detail/vector_base.inl"
            instantiation of "void thrust::detail::vector_base<T, Alloc>::append(thrust::detail::vector_base<T, Alloc>::size_type) [with T=float, Alloc=thrust::cuda_cub::universal_allocator<float>]" at line 321 of "/home/jhammond/NVIDIA/thrust/thrust/detail/vector_base.inl"
            instantiation of "void thrust::detail::vector_base<T, Alloc>::resize(thrust::detail::vector_base<T, Alloc>::size_type) [with T=float, Alloc=thrust::cuda_cub::universal_allocator<float>]" at line 7 of "bug.cc"

1 error detected in the compilation of "bug.cc".
@brycelelbach (Collaborator) commented Apr 20, 2022

You're compiling bug.cc. That means C++ mode; CUDA mode isn't enabled. Pass -cuda or name the file bug.cu.
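For reference, the two workarounds suggested here would look roughly like this (untested sketch; assumes the same bug.cc reproducer and an nvc++ install from the NVIDIA HPC SDK):

```shell
# Option A: keep the .cc name but enable CUDA mode explicitly.
nvc++ -std=c++17 -cuda -c bug.cc && echo OKAY

# Option B: rename the file so nvc++ infers CUDA mode from the .cu extension.
cp bug.cc bug.cu
nvc++ -std=c++17 -c bug.cu && echo OKAY
```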

@brycelelbach brycelelbach added the unverified label Apr 20, 2022
@brycelelbach brycelelbach reopened this Apr 20, 2022
@brycelelbach (Collaborator) commented Apr 20, 2022

Although... this raises the question - why would this be broken for the default backend?

@brycelelbach brycelelbach removed the unverified label Apr 20, 2022
@brycelelbach brycelelbach changed the title why does universal_vector resize only work with stdpar=gpu? universal_vector::resize fails with the default non-CUDA backend Apr 20, 2022
@allisonvacanti allisonvacanti added the compiler: nvc++ label Apr 22, 2022
@allisonvacanti (Collaborator) commented Apr 22, 2022

why would this be broken for the default backend?

By default, Thrust selects HOST=CPP and DEVICE=CUDA. This can be changed by explicitly defining the THRUST_HOST_SYSTEM and THRUST_DEVICE_SYSTEM macros.

Perhaps nvc++ should set THRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP when CUDA and multicore are disabled. @dkolsen-pgi does this sound reasonable to you?
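Concretely, overriding the device system is done by defining the macro on the command line (sketch; reuses the Thrust include path from the report above, and the OMP/TBB variants additionally require OpenMP or TBB to be available):

```shell
# Select the serial C++ device system -- no CUDA toolchain required.
nvc++ -std=c++17 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP \
  -I/home/jhammond/NVIDIA/thrust -c bug.cc && echo OKAY

# Alternatives:
#   -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP   (OpenMP)
#   -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_TBB   (Intel TBB)
```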

@dkolsen-pgi (Collaborator) commented Apr 22, 2022

NVC++ shouldn't set THRUST_DEVICE_SYSTEM or related macros when neither CUDA nor stdpar modes are enabled. When NVC++ is in normal C++ mode, it should treat Thrust the same as any other compiler (that is not NVCC).

@brycelelbach (Collaborator) commented Apr 22, 2022

@allisonvacanti (Collaborator) commented Apr 22, 2022

When NVC++ is in normal C++ mode, it should treat Thrust the same as any other compiler (that is not NVCC).

This is exactly how it works with other C++ compilers. Trying to build the above reproducer with g++ fails in the exact same way, unless the device system is explicitly set to CPP. The only way to fix that on our side is to change the default system, which would be very disruptive.
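The g++ behavior described here can be checked with the same reproducer (sketch; assumes the Thrust checkout path from the report and a C++17-capable g++):

```shell
# Fails with the same "unimplemented for this system" static assertion,
# because the default device system is still CUDA:
g++ -std=c++17 -I/home/jhammond/NVIDIA/thrust -c bug.cc

# Compiles once the device system is explicitly set to CPP:
g++ -std=c++17 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP \
  -I/home/jhammond/NVIDIA/thrust -c bug.cc && echo OKAY
```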

universal_vector should work with all backends

universal_vector is tested on all combinations of host/device backends and there haven't been any issues. The problem in this case is that the default CUDA backend is being used without compiler support for CUDA.

@allisonvacanti (Collaborator) commented Apr 22, 2022

I was conflating the nvc++ stdpar usage of Thrust with the actual issue here, which is trying to use Thrust functionality in user code when building with nvc++. I agree that nvc++ shouldn't be modifying any Thrust configs in this case, as it's the user's responsibility to configure this.

@jeffhammond There are two ways to fix this, depending on what you need:

  1. As Bryce mentioned, enabling CUDA mode will fix this if your goal is to use universal_vector's memory on a CUDA device.
  2. If your goal is to not target CUDA but still use universal_vector, you'll need to explicitly set THRUST_DEVICE_SYSTEM to use CPP, OMP, or TBB instead of CUDA.

Let me know if there's anything I missed or if we can close this.

@jeffhammond (Author) commented Apr 22, 2022

Why isn't not setting CUDA mode sufficient to enable a non-CUDA backend, or, at the very least, to trigger a warning that this creates a scenario where Thrust doesn't work?

@allisonvacanti (Collaborator) commented Apr 22, 2022

Why isn't not setting CUDA mode sufficient to enable a non-CUDA backend?

It's up to the user to pick the system that they want to use -- we can't predict whether they want to use serial, OpenMP, or TBB for non-CUDA builds.

or, at the very least, trigger a warning that it creates a scenario where Thrust doesn't work?

It does emit a diagnostic stating that functionality is missing for the currently selected system:

static assertion failed with "unimplemented for this system"

I agree that this could be improved. I'll update this issue to reflect that we need a better diagnostic in this case.

@allisonvacanti allisonvacanti changed the title universal_vector::resize fails with the default non-CUDA backend Improve diagnostic when the (default) CUDA device system is used in non-CUDA builds. Apr 22, 2022
@allisonvacanti allisonvacanti added enhancement P2: nice to have good first issue repro: verified backend: CUDA and removed compiler: nvc++ labels Apr 22, 2022
4 participants