gpu-programming

(Found this while reviewing #907)

Currently Expr::operator=(const Expr &o) can have two meanings.

https://github.com/taichi-dev/taichi/blob/f5373b1395a66c78506de0fd3c172dffe0d444d0/taichi/ir/expr.cpp#L54-L69

Let's say we have

Expr a, b;
a = b;

Inside a kernel definition, this creates a FrontendAssignStmt in the AST.
Otherwise, this lets a hold the express

Describe the bug
Test coverage was previously 82% but has now dropped to 75%. Ideally, code coverage should be > 80% for a healthy repo.

To Reproduce
Steps to reproduce the behavior:

Go to https://codecov.io/gh/uber/aresdb to see detailed coverage for each package, file, method, and line.

Expected behavior
Ideally, code coverage should be > 80% for a healthy

There are several internal things that make Emu's performance potentially suboptimal. This issue is a place to discuss them.

wgpu::Device::poll is used here, and right now it blocks in an async context. I'm not sure what the solution is, but there is some discussion in gfx-rs/wgpu-rs#214 (comment).

There are several ValueErrors, NotImplementedErrors, etc, across the codebase.

We should create a mechanism that produces clean and uniform error messages.
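
One possible shape for such a mechanism is sketched below. This is only an illustration, not the project's design: the class names `UniformError`, `InvalidValueError`, and `NotSupportedError`, and the `[component] problem (hint: ...)` message layout, are all hypothetical. The subclasses also inherit from the built-in exception types, so existing `except ValueError` or `except NotImplementedError` handlers would keep working.

```python
class UniformError(Exception):
    """Hypothetical base class that formats every message the same way."""

    def __init__(self, component, problem, hint=None):
        self.component = component
        self.problem = problem
        self.hint = hint
        # Single place where the message layout is decided.
        message = f"[{component}] {problem}"
        if hint:
            message += f" (hint: {hint})"
        super().__init__(message)


class InvalidValueError(UniformError, ValueError):
    """Raised instead of a bare ValueError (hypothetical name)."""


class NotSupportedError(UniformError, NotImplementedError):
    """Raised instead of a bare NotImplementedError (hypothetical name)."""


def check_dimension(dim):
    # Example call site: the caller supplies facts, the base class formats them.
    if dim <= 0:
        raise InvalidValueError(
            "metric", "dimension must be positive", hint=f"got {dim}"
        )
    return dim
```

Keeping the formatting in one base class means a later change to the message layout touches a single constructor rather than every `raise` site.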

Is your feature request related to a problem? Please describe.
This may just be a matter of looking in all the wrong places, but documentation of CuSparse arrays, and of their support in packages like Flux, is sorely needed. It can be as minimal as possible, but if you’re not familiar with Nvidia’s libraries (I am not, so bear with me) it can be hard to even discover Cusparse.jl.

In the ne

It seems that there is a bug with the call to gpufit within Matlab when including the user_info parameter. Using the included linear_1d model (which utilizes the user_info parameter), I created a simple program in Matlab to model the equation y=x from x=0 to x=10 and called gpufit on the data. This should return the parameters 0 and 1, but results in 4.3467 and 0.8711 instead.
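
The expected result can be checked independently of Gpufit. The sketch below does not call Gpufit at all; it assumes the linear model has the form y = p0 + p1*x and solves it with closed-form ordinary least squares in plain Python. The function name `fit_line` is made up for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = p0 + p1 * x; returns (p0, p1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return offset, slope


# Data from the report: y = x sampled at x = 0, 1, ..., 10.
xs = [float(i) for i in range(11)]
ys = list(xs)
offset, slope = fit_line(xs, ys)
print(offset, slope)
```

For this noise-free data the reference fit gives an offset of 0 and a slope of 1, which is the baseline the reported values 4.3467 and 0.8711 deviate from.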

Additionally, if

The following PTX instructions don't have wrapper functions (nor builtins:: templated functions where relevant). Add them!

lop3 - Logical operation on 3 operands using an immediate 3-parameter lookup table.
prefetching instructions?
cvt.pack
fns - find n'th bit set
Sub-32-bit dot product with accumulation: dp4a, dp2a for bytes and halfword, respecti
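
To make the lop3 entry above concrete, here is a host-side reference model of its lookup-table semantics, written in Python rather than as a device wrapper. It is a sketch for checking results, not the wrapper the issue asks for. The function name `lop3_reference` is hypothetical. The constants 0xF0, 0xCC, and 0xAA are the usual way to derive the immediate table for a given boolean expression of three operands.

```python
def lop3_reference(a, b, c, imm_lut):
    """Emulate a 32-bit lop3: each result bit is looked up in an 8-entry table.

    For every bit position, the bits of a, b, and c form a 3-bit index
    (a is the most significant), and that index selects one bit of imm_lut.
    """
    result = 0
    for bit in range(32):
        index = (
            (((a >> bit) & 1) << 2)
            | (((b >> bit) & 1) << 1)
            | ((c >> bit) & 1)
        )
        result |= ((imm_lut >> index) & 1) << bit
    return result


# Deriving the table: apply the desired expression to these three constants.
TA, TB, TC = 0xF0, 0xCC, 0xAA
LUT_AND_OR = (TA & TB) | TC  # table for (a & b) | c
LUT_XOR3 = TA ^ TB ^ TC      # table for a ^ b ^ c

a, b, c = 0xDEADBEEF, 0x12345678, 0x0F0F0F0F
print(hex(lop3_reference(a, b, c, LUT_AND_OR)))
print(hex(lop3_reference(a, b, c, LUT_XOR3)))
```

A model like this could serve as an oracle when testing a future device-side wrapper, since any three-input bitwise function maps to exactly one 8-bit table.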



Here are 124 public repositories matching this topic...

taichi-dev / taichi

cpp-taskflow / cpp-taskflow

uber / aresdb

QianMo / Game-Programmer-Study-Notes

calebwin / emu

QianMo / GPU-Gems-Book-Source-Code

geomstats / geomstats

JuliaGPU / CuArrays.jl

QianMo / GPU-Pro-Books-Source-Code

brucefan1983 / CUDA-Programming

Glavnokoman / vuh

gpufit / Gpufit

fastflow / fastflow

johannesugb / VolumetricLinesUnity

stetre / moonlibs

adamnemecek / awesome-metal

zilliztech / arctern

hollance / metal-gpgpu

Heteroflow / Heteroflow

ysh329 / OpenCL-101

eyalroz / cuda-kat

xmartlabs / cuda-calculator

Glavnokoman / vulkan-compute-example

weissenberger / gpuhd

andi611 / Apriori-and-Eclat-Frequent-Itemset-Mining

WenqiJiang / Convolution-Neural-Network-by-pyCUDA

KunyiLockeLin / AngryEngine

CoderYQ / GPUImageFilters

Dron-elektron / MAI

ParaGroup / WindFlow
