9
votes
My blit function for my own graphics library
it seems to be optimized quite well by the compiler, as it uses a lot of vectorized instructions
Well, kind of. GCC didn't use vectorization at all, and Clang used some of it but in a very strange ...
8
votes
Accepted
Implementation of linear regression in Python
Just reviewing normalizeFeatures.
Instead of a comment explaining what the function does, write a docstring. (Docstrings are available from the interactive ...
8
votes
AVX Vectorized Multi-threaded Mandelbrot Renderer
There are some & 0xff operations that are not necessary:
(aMask & (~iMask & 0xff)), because the bits reset by ...
7
votes
My blit function for my own graphics library
There are a few things that can be lifted out of loops.
The three screen related variables, screen_height, screen_width, and <...
6
votes
Accepted
Compute a numerical derivative
You can use np.roll to compute the centered differences as a vectorised operation rather than in a for loop:
...
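The excerpt above is cut off before its code, so the following is only a minimal sketch of the general idea, not the answer's actual code. It assumes uniformly spaced samples of a periodic signal; the helper name `centered_difference` and the test signal are made up for illustration.

```python
import numpy as np

def centered_difference(y, h):
    """Centered difference (y[i+1] - y[i-1]) / (2h) with wrap-around ends."""
    y = np.asarray(y, dtype=float)
    # np.roll(y, -1)[i] is y[i + 1]; np.roll(y, 1)[i] is y[i - 1].
    # Both shifts wrap around, so the first and last entries use
    # samples from the opposite end of the array.
    return (np.roll(y, -1) - np.roll(y, 1)) / (2.0 * h)

# Hypothetical test signal: sin(x) sampled over exactly one period,
# so the wrap-around at the ends is mathematically correct.
n = 1000
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
h = x[1] - x[0]
dy = centered_difference(np.sin(x), h)
max_error = np.max(np.abs(dy - np.cos(x)))
```

For data that is not periodic, the two end values produced this way are wrong and would need one-sided differences instead.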
5
votes
Generic pixel class to seamlessly alpha-blend and convert between different pixel structure layouts
The code seems to get more and more questionable as we read downward. Starting at the bottom:
...
5
votes
Accepted
Vectorized 16-bit addition in Standard C
You Have an XY-Problem
Let’s take a look at a very naïve loop to do this in C:
...
5
votes
Calculating premium splits for policies
Remove your warnings ignore. The warnings are there for a reason.
Remove all of your unused datetime imports. Any half-decent ...
4
votes
Accepted
Converting Array of Floats to UINT8 (`char`) or UINT16 (`unsigned short`) Using SSE4
why not load 4 packed __m128 and then store 32 Pixels at once
I think that's 16 pixels, but it's a good plan. The pack instructions were used inefficiently (in the linked question that was not really ...
4
votes
Accepted
Determining whether a list of pathways and its genes are all included in another list
If you were to add a cat("hello") at the top of your all_in function, you would find that your function is called 25 times, once ...
4
votes
Accepted
Find the minimum value that data could have had before it was rounded
You can firstly change by difference != 0 and then use na.locf to replace NAs by last ...
4
votes
Accepted
Vectorizing matrix operation instead of for loop in circular matrices
This is easy using numpy.roll, for example:
zx = np.roll(x, 1) * (np.roll(x, 2) + np.roll(x, -1)) - x
4
votes
Accepted
PyTorch Vectorized Implementation for Thresholding and Computing Jaccard Index
I think a different approach is needed to achieve a better performance. The current approach recomputes the Jaccard similarity from scratch for each possible threshold value. However, going from one ...
4
votes
Accepted
Vectorizing equations for fsolve
Yes. You can simplify this code. I made two main changes.
First, I forget that $x_{50}$ is known and write an equation for it just like all the other variables, then I replace that equation by the ...
4
votes
Accepted
Vectorized crosstabulation in Python for two arrays with two categories each
Algorithm
If you look at your code, and follow the if-elif part, you see that there are 4 combinations of i and ...
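The excerpt is truncated, so what follows is one possible way to count the four combinations without an if-elif chain, not necessarily the approach the answer goes on to describe. It assumes both arrays hold the category codes 0 and 1; the function name `crosstab_2x2` and the sample data are hypothetical.

```python
import numpy as np

def crosstab_2x2(a, b):
    """Count each (a, b) pair for two arrays of 0/1 category codes."""
    a = np.asarray(a, dtype=np.intp)
    b = np.asarray(b, dtype=np.intp)
    # Map each pair to a single code: (0,0)->0, (0,1)->1, (1,0)->2, (1,1)->3.
    codes = 2 * a + b
    # minlength=4 keeps the table 2x2 even if a combination never occurs.
    return np.bincount(codes, minlength=4).reshape(2, 2)

# Hypothetical sample data.
a = np.array([0, 0, 1, 1, 1, 0])
b = np.array([0, 1, 1, 1, 0, 0])
table = crosstab_2x2(a, b)  # table[i, j] counts positions where a == i and b == j
```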
4
votes
Snake game from the viewpoint of the snake
UX
It is not obvious what the user should do when the GUI opens up.
You should display some simple instructions in the GUI, such as:
...
3
votes
Converting Array of `Float32` (`float`) to Array of `UINT8` (`unsigned char`) Using AVX2
Harold's comment is correct.
Consider what happens for float inputs like 5000000000 * 1.0. Conversion to int32_t with ...
3
votes
Remove outliers from a point cloud
Please add a docstring.
Simplify_by_avg_weighted might be named points_near_centroid (or ...
3
votes
Accepted
Function that fills a time series row-by-row by using the values in the row before
Use pandas masks:
df_buy_sell[condition]
lets you select all rows in the dataframe that match your condition. You could then apply your entire function block ...
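As a rough illustration of boolean masking, here is a small sketch. Only the name `df_buy_sell` comes from the excerpt; the columns, values, and the condition are invented for the example.

```python
import pandas as pd

# Hypothetical data: the real frame's columns are not shown in the excerpt.
df_buy_sell = pd.DataFrame(
    {
        "signal": ["buy", "sell", "buy", "sell"],
        "qty": [1, 2, 3, 4],
    }
)

# A boolean Series with one True/False entry per row.
condition = df_buy_sell["signal"] == "buy"

# Select only the rows where the condition holds.
buys = df_buy_sell[condition]

# Update just those rows in place, with no explicit Python loop.
df_buy_sell.loc[condition, "qty"] *= 10
```

Using `.loc` with the mask for the assignment updates the original frame directly rather than a temporary selection.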
3
votes
Rolling regressions in R
Here is another solution which uses the rollRegres package
...
3
votes
Accepted
Vectorization, 7-bit encoding
My implementation works just fine, until we go over 2^31 due to compare not doing unsigned comparison.
The "incompleteness" of the set of comparisons is an old problem, and the workarounds are also ...
3
votes
N-Body Optimization
Data layout
You have already experienced first-hand a disadvantage of using "1 physics vector = 1 SIMD vector" (such as __m256d pos), causing some ...
3
votes
Vectorizing a working custom similarity function further using numpy
The comprehension statements are far too long and complicated and need to be broken up (but shouldn't exist at all).
The easy vectorisation pass involves replacing all of the comprehensions with ...
3
votes
Tips to Finetuning to increase the GFLOPS of a SIMD kernel
Fine-grained profiling results normally need to be taken with a grain of salt. A very common situation is that (e.g.) a load takes a while, but the time is attributed to a later instruction instead. I ...
3
votes
Finding specific promotions from two columns
The main issue with numpy's vectorize function is that it's actually not vectorized. It's an unfortunate misnomer:
The vectorize...
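To make the distinction concrete, here is a small sketch comparing `np.vectorize` with an array-level expression. The scalar rule `promo_flag` and the two arrays are hypothetical stand-ins, not the question's actual columns.

```python
import numpy as np

def promo_flag(a, b):
    # Hypothetical scalar rule, written for one pair of values at a time.
    return 1 if a > b else 0

a = np.array([3, 1, 4, 1, 5])
b = np.array([2, 7, 1, 8, 2])

# np.vectorize wraps the Python function so it accepts arrays, but it
# still calls promo_flag once per element pair.
looped = np.vectorize(promo_flag)(a, b)

# The same rule as a single array-level comparison evaluated by NumPy.
vectorised = np.where(a > b, 1, 0)
```

Both give the same result; the difference is that the second form avoids per-element Python function calls, which is usually what matters for speed on large arrays.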
3
votes
Accepted
C - SIMD Code to invert a transformation matrix
Putting the code through LLVM MCA or https://uica.uops.info/ yields one unsurprising (I think) result and one surprise (for me anyway).
The not-surprise
The bottleneck (on Intel Skylake in this ...
2
votes
Vectorized and Multi Threaded Image Convolution
Something to look into: data access patterns. You go across all image lines, processing the first few pixels (where the kernel straddles the boundary), then again across all image lines, processing the ...