The Wayback Machine - https://web.archive.org/web/20220422180229/https://github.com/NVIDIA/thrust/issues/1663

Improve diagnostic when the (default) CUDA device system is used in non-CUDA builds. #1663

Open
jeffhammond opened this issue Apr 20, 2022 · 9 comments
Labels
backend: CUDA, enhancement, good first issue, P2: nice to have, repro: verified

Comments

@jeffhammond commented Apr 20, 2022

I can't tell where the problem is. Why does the version of Thrust that I get with stdpar=gpu work, whereas the version from GitHub doesn't?

Thanks

MCVE

#include <thrust/universal_vector.h>

thrust::universal_vector<float> m_x;

void AllocateNodePersistent(int numNode)
{
  m_x.resize(numNode);
}

It works with -stdpar=gpu:

$ nvc++ -std=c++17 -stdpar=gpu -c bug.cc && echo OKAY
OKAY

It fails with commit 25547a4:

$ nvc++ -std=c++17 -I/home/jhammond/NVIDIA/thrust -c bug.cc
"/home/jhammond/NVIDIA/thrust/thrust/system/detail/generic/for_each.h", line 65: error: static assertion failed with "unimplemented for this system"
    THRUST_STATIC_ASSERT_MSG(
    ^
          detected during:
            instantiation of "InputIterator thrust::system::detail::generic::for_each_n(thrust::execution_policy<DerivedPolicy> &, InputIterator, Size, UnaryFunction) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, UnaryFunction=thrust::detail::device_generate_functor<thrust::detail::fill_functor<float>>]" at line 67 of "/home/jhammond/NVIDIA/thrust/thrust/detail/for_each.inl"
            instantiation of "InputIterator thrust::for_each_n(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, Size, UnaryFunction) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, UnaryFunction=thrust::detail::device_generate_functor<thrust::detail::fill_functor<float>>]" at line 93 of "/home/jhammond/NVIDIA/thrust/thrust/system/detail/generic/generate.inl"
            instantiation of "OutputIterator thrust::system::detail::generic::generate_n(thrust::execution_policy<ExecutionPolicy> &, OutputIterator, Size, Generator) [with ExecutionPolicy=thrust::cuda_cub::tag, OutputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, Generator=thrust::detail::fill_functor<float>]" at line 56 of "/home/jhammond/NVIDIA/thrust/thrust/detail/generate.inl"
            instantiation of "OutputIterator thrust::generate_n(const thrust::detail::execution_policy_base<DerivedPolicy> &, OutputIterator, Size, Generator) [with DerivedPolicy=thrust::cuda_cub::tag, OutputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, Generator=thrust::detail::fill_functor<float>]" at line 42 of "/home/jhammond/NVIDIA/thrust/thrust/system/detail/generic/fill.h"
            instantiation of "OutputIterator thrust::system::detail::generic::fill_n(thrust::execution_policy<DerivedPolicy> &, OutputIterator, Size, const T &) [with DerivedPolicy=thrust::cuda_cub::tag, OutputIterator=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t, T=float]" at line 51 of "/home/jhammond/NVIDIA/thrust/thrust/detail/fill.inl"
            [ 4 instantiation contexts not shown ]
            instantiation of "thrust::detail::disable_if<thrust::detail::allocator_traits_detail::needs_default_construct_via_allocator<Allocator, thrust::detail::pointer_element<Pointer>::type>::value, void>::type thrust::detail::allocator_traits_detail::default_construct_range(Allocator &, Pointer, Size) [with Allocator=thrust::cuda_cub::universal_allocator<float>, Pointer=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t]" at line 106 of "/home/jhammond/NVIDIA/thrust/thrust/detail/allocator/default_construct_range.inl"
            instantiation of "void thrust::detail::default_construct_range(Allocator &, Pointer, Size) [with Allocator=thrust::cuda_cub::universal_allocator<float>, Pointer=thrust::pointer<float, thrust::cuda_cub::tag, float &, thrust::use_default>, Size=std::size_t]" at line 254 of "/home/jhammond/NVIDIA/thrust/thrust/detail/contiguous_storage.inl"
            instantiation of "void thrust::detail::contiguous_storage<T, Alloc>::default_construct_n(thrust::detail::contiguous_storage<T, Alloc>::iterator, thrust::detail::contiguous_storage<T, Alloc>::size_type) [with T=float, Alloc=thrust::cuda_cub::universal_allocator<float>]" at line 877 of "/home/jhammond/NVIDIA/thrust/thrust/detail/vector_base.inl"
            instantiation of "void thrust::detail::vector_base<T, Alloc>::append(thrust::detail::vector_base<T, Alloc>::size_type) [with T=float, Alloc=thrust::cuda_cub::universal_allocator<float>]" at line 321 of "/home/jhammond/NVIDIA/thrust/thrust/detail/vector_base.inl"
            instantiation of "void thrust::detail::vector_base<T, Alloc>::resize(thrust::detail::vector_base<T, Alloc>::size_type) [with T=float, Alloc=thrust::cuda_cub::universal_allocator<float>]" at line 7 of "bug.cc"

1 error detected in the compilation of "bug.cc".
@brycelelbach (Collaborator) commented Apr 20, 2022

You're compiling bug.cc. That means C++ mode; CUDA mode isn't enabled. Pass -cuda or name the file bug.cu.
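For reference, the two workarounds suggested here would look roughly like this (untested sketch; assumes the same bug.cc reproducer and an nvc++ install from the NVIDIA HPC SDK):

```shell
# Option A: keep the .cc name but enable CUDA mode explicitly.
nvc++ -std=c++17 -cuda -c bug.cc && echo OKAY

# Option B: rename the file so nvc++ infers CUDA mode from the .cu extension.
cp bug.cc bug.cu
nvc++ -std=c++17 -c bug.cu && echo OKAY
```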

@brycelelbach brycelelbach added the unverified label Apr 20, 2022
@brycelelbach brycelelbach reopened this Apr 20, 2022
@brycelelbach (Collaborator) commented Apr 20, 2022

Although... this raises the question - why would this be broken for the default backend?

@brycelelbach brycelelbach removed the unverified label Apr 20, 2022
@brycelelbach brycelelbach changed the title why does universal_vector resize only work with stdpar=gpu? universal_vector::resize fails with the default non-CUDA backend Apr 20, 2022
@allisonvacanti allisonvacanti added the compiler: nvc++ label Apr 22, 2022
@allisonvacanti (Collaborator) commented Apr 22, 2022

why would this be broken for the default backend?

By default, Thrust selects HOST=CPP and DEVICE=CUDA. This can be changed by explicitly defining the THRUST_HOST_SYSTEM and THRUST_DEVICE_SYSTEM macros.

Perhaps nvc++ should set THRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP when CUDA and multicore are disabled. @dkolsen-pgi does this sound reasonable to you?
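Concretely, overriding the device system is done by defining the macro on the command line (sketch; reuses the Thrust include path from the report above, and the OMP/TBB variants additionally require OpenMP or TBB to be available):

```shell
# Select the serial C++ device system -- no CUDA toolchain required.
nvc++ -std=c++17 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP \
  -I/home/jhammond/NVIDIA/thrust -c bug.cc && echo OKAY

# Alternatives:
#   -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP   (OpenMP)
#   -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_TBB   (Intel TBB)
```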

@dkolsen-pgi (Collaborator) commented Apr 22, 2022

NVC++ shouldn't set THRUST_DEVICE_SYSTEM or related macros when neither CUDA nor stdpar modes are enabled. When NVC++ is in normal C++ mode, it should treat Thrust the same as any other compiler (that is not NVCC).

@brycelelbach (Collaborator) commented Apr 22, 2022

@allisonvacanti (Collaborator) commented Apr 22, 2022

When NVC++ is in normal C++ mode, it should treat Thrust the same as any other compiler (that is not NVCC).

This is exactly how it works with other C++ compilers. Trying to build the above reproducer with g++ fails in the exact same way, unless the device system is explicitly set to CPP. The only way to fix that on our side is to change the default system, which would be very disruptive.
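The g++ behavior described here can be checked with the same reproducer (sketch; assumes the Thrust checkout path from the report and a C++17-capable g++):

```shell
# Fails with the same "unimplemented for this system" static assertion,
# because the default device system is still CUDA:
g++ -std=c++17 -I/home/jhammond/NVIDIA/thrust -c bug.cc

# Compiles once the device system is explicitly set to CPP:
g++ -std=c++17 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP \
  -I/home/jhammond/NVIDIA/thrust -c bug.cc && echo OKAY
```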

universal_vector should work with all backends

universal_vector is tested on all combinations of host/device backends and there haven't been any issues. The problem in this case is that the default CUDA backend is being used without compiler support for CUDA.

@allisonvacanti (Collaborator) commented Apr 22, 2022

I was conflating the nvc++ stdpar usage of Thrust with the actual issue here, which is trying to use Thrust functionality in user code when building with nvc++. I agree that nvc++ shouldn't be modifying any Thrust configs in this case, as it's the user's responsibility to configure this.

@jeffhammond There are two ways to fix this, depending on what you need:

  1. As Bryce mentioned, enabling CUDA mode will fix this if your goal is to use universal_vector's memory on a CUDA device.
  2. If your goal is to not target CUDA but still use universal_vector, you'll need to explicitly set THRUST_DEVICE_SYSTEM to use CPP, OMP, or TBB instead of CUDA.

Let me know if there's anything I missed or if we can close this.

@jeffhammond (Author) commented Apr 22, 2022

Why isn't not setting CUDA mode sufficient to enable a non-CUDA backend, or, at the very least, to trigger a warning that this creates a scenario where Thrust doesn't work?

@allisonvacanti (Collaborator) commented Apr 22, 2022

Why isn't not setting CUDA mode sufficient to enable a non-CUDA backend?

It's up to the user to pick the system that they want to use -- we can't predict whether they want to use serial, OpenMP, or TBB for non-CUDA builds.

or, at the very least, trigger a warning that it creates a scenario where Thrust doesn't work?

It does emit a diagnostic stating that functionality is missing for the currently selected system:

static assertion failed with "unimplemented for this system"

I agree that this could be improved. I'll update this issue to reflect that we need a better diagnostic in this case.

@allisonvacanti allisonvacanti changed the title universal_vector::resize fails with the default non-CUDA backend Improve diagnostic when the (default) CUDA device system is used in non-CUDA builds. Apr 22, 2022
@allisonvacanti allisonvacanti added enhancement P2: nice to have good first issue repro: verified backend: CUDA and removed compiler: nvc++ labels Apr 22, 2022
4 participants