9 votes

My blit function for my own graphics library

"It seems to be optimized quite well by the compiler, as it uses a lot of vectorized instructions." Well, kind of. GCC didn't use vectorization at all, and Clang used some of it but in a very strange ...
user555045
  • 12.4k
8 votes
Accepted

Implementation of linear regression in Python

Just reviewing normalizeFeatures. Instead of a comment explaining what the function does, write a docstring. (Docstrings are available from the interactive ...
Gareth Rees
  • 50.1k
8 votes

AVX Vectorized Multi-threaded Mandelbrot Renderer

There are some & 0xff operations that are not necessary: (aMask & (~iMask & 0xff)), because the bits reset by ...
user555045
  • 12.4k
7 votes

My blit function for my own graphics library

There are a few things that can be lifted out of loops. The three screen related variables, screen_height, screen_width, and <...
1201ProgramAlarm
6 votes
Accepted

Compute a numerical derivative

You can use np.roll to compute the centered differences as a vectorised operation rather than in a for loop: ...
301_Moved_Permanently
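
A minimal sketch of the np.roll idea from the excerpt above, assuming the samples are periodic so the wrap-around at the array ends is meaningful. The function name centered_diff_periodic and the sine test signal are illustrative assumptions, not taken from the original answer; for non-periodic data the two end values would need separate one-sided formulas.

```python
import numpy as np

def centered_diff_periodic(y, h):
    """Centered difference (y[i+1] - y[i-1]) / (2h) for periodic samples.

    Hypothetical helper: np.roll(y, -1)[i] is y[i+1] and np.roll(y, 1)[i]
    is y[i-1], with indices wrapping around at both ends of the array.
    """
    return (np.roll(y, -1) - np.roll(y, 1)) / (2.0 * h)

# Sample sin(x) over one full period (endpoint excluded so the wrap is exact).
n = 1000
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
h = x[1] - x[0]

dy = centered_diff_periodic(np.sin(x), h)

# The derivative of sin is cos; the scheme is second-order accurate in h.
max_err = np.max(np.abs(dy - np.cos(x)))
print(max_err)
```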
5 votes

Generic pixel class to seamlessly alpha-blend and convert between different pixel structure layouts

The code seems to get more and more questionable as we read downward. Starting at the bottom: ...
Quuxplusone
  • 19.7k
5 votes
Accepted

Vectorized 16-bit addition in Standard C

You Have an XY-Problem. Let’s take a look at a very naïve loop to do this in C: ...
Davislor
  • 9,115
5 votes

Calculating premium splits for policies

Remove your warnings ignore. The warnings are there for a reason. Remove all of your unused datetime imports. Any half-decent ...
Reinderien
  • 71.1k
4 votes
Accepted

Converting Array of Floats to UINT8 (`char`) or UINT16 (`unsigned short`) Using SSE4

"Why not load 4 packed __m128 and then store 32 pixels at once?" I think that's 16 pixels, but it's a good plan. The pack instructions were used inefficiently (in the linked question that was not really ...
user555045
  • 12.4k
4 votes

Compute a numerical derivative

You can vectorize the calculation ...
Maarten Fabré
4 votes
Accepted

Determining whether a list of pathways and its genes are all included in another list

If you were to add a cat("hello") at the top of your all_in function, you would find that your function is called 25 times, once ...
flodel
  • 3,555
4 votes
Accepted

Find the minimum value that data could have had before it was rounded

You can firstly change by difference != 0 and then use na.locf to replace NAs by last ...
m0nhawk
  • 366
4 votes
Accepted

Vectorizing matrix operation instead of for loop in circular matrices

This is easy using numpy.roll, for example: zx = np.roll(x, 1) * (np.roll(x, 2) + np.roll(x, -1)) - x
Gareth Rees
  • 50.1k
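
To make the numpy.roll one-liner in the excerpt above concrete, here is a hedged sketch that compares it against an explicit loop with circular (modulo) indexing. The loop form is an assumption reconstructed from the semantics of np.roll, not the asker's original code, and the names zx_loop and zx_roll are hypothetical.

```python
import numpy as np

def zx_loop(x):
    """Explicit circular indexing: out[i] = x[i-1] * (x[i-2] + x[i+1]) - x[i]."""
    n = len(x)
    out = np.empty_like(x)
    for i in range(n):
        out[i] = x[(i - 1) % n] * (x[(i - 2) % n] + x[(i + 1) % n]) - x[i]
    return out

def zx_roll(x):
    """Vectorised form from the excerpt: np.roll(x, k)[i] equals x[(i - k) % n]."""
    return np.roll(x, 1) * (np.roll(x, 2) + np.roll(x, -1)) - x

x = np.arange(1.0, 9.0)  # illustrative float data, eight circular samples
same = np.allclose(zx_loop(x), zx_roll(x))
print(same)
```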
4 votes
Accepted

PyTorch Vectorized Implementation for Thresholding and Computing Jaccard Index

I think a different approach is needed to achieve a better performance. The current approach recomputes the Jaccard similarity from scratch for each possible threshold value. However, going from one ...
GZ0
  • 2,361
4 votes
Accepted

Vectorizing equations for fsolve

Yes. You can simplify this code. I made two main changes. First, I forget that $x_{50}$ is known and write an equation for it just like all the other variables, then I replace that equation by the ...
David
  • 241
4 votes
Accepted

Vectorized crosstabulation in Python for two arrays with two categories each

Algorithm: If you look at your code, and follow the if-elif part, you see that there are 4 combinations of i and ...
Maarten Fabré
4 votes

Snake game from the viewpoint of the snake

UX: It is not obvious what the user should do when the GUI opens up. You should display some simple instructions in the GUI, such as: ...
toolic
  • 15.8k
3 votes

Converting Array of `Float32` (`float`) to Array of `UINT8` (`unsigned char`) Using AVX2

Harold's comment is correct. Consider what happens for float inputs like 5000000000 * 1.0. Conversion to int32_t with ...
Peter Cordes
  • 3,761
3 votes

Remove outliers from a point cloud

Please add a docstring. Simplify_by_avg_weighted might be named points_near_centroid (or ...
J_H
  • 42.3k
3 votes
Accepted

Normalise list of N dimensional numpy arrays

The trick is to use the keepdims parameter. ...
Seanny123
  • 1,617
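
A small sketch of why keepdims helps, under the assumption that "normalise" means dividing each vector along the last axis by its Euclidean norm (the excerpt does not say which normalisation the original answer used). The helper name normalise_last_axis is hypothetical. With keepdims=True the reduced axis is kept with length 1, so the result broadcasts against the input for any number of dimensions.

```python
import numpy as np

def normalise_last_axis(a):
    """Divide each vector along the last axis by its Euclidean norm.

    Assumed definition of 'normalise'; keepdims=True keeps the reduced axis
    as size 1 so the division broadcasts without manual reshaping.
    """
    norms = np.linalg.norm(a, axis=-1, keepdims=True)
    return a / norms

# Works unchanged on a list of arrays with different numbers of dimensions.
arrays = [
    np.array([[3.0, 4.0], [6.0, 8.0]]),   # shape (2, 2)
    np.ones((2, 3, 4)),                   # shape (2, 3, 4)
]
normalised = [normalise_last_axis(a) for a in arrays]
print(normalised[0])
```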
3 votes
Accepted

Function that fills a time series row-by-row by using the values in the row before

Use pandas masks: df_buy_sell[condition] lets you select all rows in the dataframe that match your condition. You could then apply your entire function block ...
mochi
  • 1,144
3 votes

Rolling regressions in R

Here is another solution which uses the rollRegres package ...
Benjamin Christoffersen
3 votes
Accepted

Vectorization, 7-bit encoding

"My implementation works just fine, until we go over 2^31 due to compare not doing unsigned comparison." The "incompleteness" of the set of comparisons is an old problem, and the workarounds are also ...
user555045
  • 12.4k
3 votes

N-Body Optimization

Data layout: You have already experienced first-hand a disadvantage of using "1 physics vector = 1 SIMD vector" (such as __m256d pos), causing some ...
user555045
  • 12.4k
3 votes

Vectorizing a working custom similarity function further using numpy

The comprehension statements are far too long and complicated and need to be broken up (but shouldn't exist at all). The easy vectorisation pass involves replacing all of the comprehensions with ...
Reinderien
  • 71.1k
3 votes

Tips to Finetuning to increase the GFLOPS of a SIMD kernel

Fine-grained profiling results normally need to be taken with a grain of salt. A very common situation is that (e.g.) a load takes a while, but the time is attributed to a later instruction instead. I ...
user555045
  • 12.4k
3 votes

Finding specific promotions from two columns

The main issue with numpy's vectorize function is that it's actually not vectorized. It's an unfortunate misnomer: The vectorize...
tdy
  • 2,266
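
To illustrate the point about np.vectorize in the excerpt above, here is a hedged sketch on a made-up example (clamping negatives to zero, unrelated to the promotions question itself). np.vectorize calls the Python function once per element, so it is a convenience wrapper rather than a speed-up, whereas np.where evaluates the whole array at once. The names clamp_scalar and clamp_wrapped are hypothetical.

```python
import numpy as np

def clamp_scalar(v):
    """Scalar-only function: returns v when positive, else 0.0."""
    return v if v > 0 else 0.0

# np.vectorize still runs a Python-level loop, calling clamp_scalar per element.
clamp_wrapped = np.vectorize(clamp_scalar)

x = np.array([-1.5, 0.0, 2.0, 3.5])

looped = clamp_wrapped(x)          # convenient, but not actually vectorised
vectorised = np.where(x > 0, x, 0.0)  # one array-level operation

print(looped, vectorised)
```

Both calls return the same values; the difference is only in how the work is carried out, which is what matters for large inputs.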
3 votes
Accepted

C - SIMD Code to invert a transformation matrix

Putting the code through LLVM MCA or https://uica.uops.info/ yields one unsurprising (I think) result and one surprise (for me anyway). The not-surprise: The bottleneck (on Intel Skylake in this ...
user555045
  • 12.4k
2 votes

Vectorized and Multi Threaded Image Convolution

Something to look into: data access patterns. You go across all image lines, processing the first few pixels (where the kernel straddles the boundary), then again across all image lines, processing the ...
Cris Luengo
  • 7,021

Only top scored, non community-wiki answers of a minimum length are eligible