1D convolution optimization and general codegen tweaks #1477
Codecov Report
@@            Coverage Diff             @@
##           master    #1477      +/-   ##
==========================================
- Coverage   83.55%   83.48%   -0.08%
==========================================
  Files         741      740       -1
  Lines       32462    32559      +97
  Branches     3648     3652       +4
==========================================
+ Hits        27125    27181      +56
- Misses       4625     4665      +40
- Partials      712      713       +1
@@ -90,4 +94,4 @@ public static void Compress(ref Vector4 vector)
[MethodImpl(InliningOptions.ShortMethod)]
public static float Compress(float channel) => channel <= 0.0031308F ? 12.92F * channel : (1.055F * MathF.Pow(channel, 0.416666666666667F)) - 0.055F;
JimBobSquarePants
Dec 16, 2020
Member
If we ever figure out how to do an accurate SIMD-enabled approximation of this, we would be laughing.
benaadams
Dec 16, 2020
pow(channel, 0.416666666666667F) => exp(0.416666666666667F * log(channel))
So...
public static void Compress(ref Vector4 vector)
{
    // Needs System.Runtime.CompilerServices, System.Runtime.Intrinsics and System.Runtime.Intrinsics.X86.
    // Only covers the power segment; the linear segment (channel <= 0.0031308F) still needs handling.
    var channels = Unsafe.As<Vector4, Vector128<float>>(ref vector);
    channels = Log(channels); // Isn't simd intrinsic
    channels = Sse.Multiply(channels, Vector128.Create(0.416666666666667F));
    channels = Exp(channels); // Isn't simd intrinsic
    if (Fma.IsSupported)
    {
        // (1.055 * channels) + (-0.055) in a single fused operation
        channels = Fma.MultiplyAdd(Vector128.Create(1.055F), channels, Vector128.Create(-0.055F));
    }
    else
    {
        channels = Sse.Add(Sse.Multiply(Vector128.Create(1.055F), channels), Vector128.Create(-0.055F));
    }
    Unsafe.As<Vector4, Vector128<float>>(ref vector) = channels;
}
But Log and Exp aren't SIMD intrinsics; however, you could approximate them with the sequences from sse_mathfun or avx_mathfun?
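As a sanity check on the rewrite, here is a minimal scalar sketch that compares the `MathF.Pow` form from the diff with the form based on the identity pow(x, y) = exp(y * log(x)). The class and method names (`SrgbCompressSketch`, `CompressPow`, `CompressExpLog`, `MaxDifference`) are hypothetical and not part of the PR. It only illustrates that the two forms should agree to within float rounding, and says nothing about the accuracy of any SIMD approximation of `Exp` or `Log`.

```csharp
using System;

public static class SrgbCompressSketch
{
    // Reference form, as in the diff above.
    public static float CompressPow(float channel) =>
        channel <= 0.0031308F
            ? 12.92F * channel
            : (1.055F * MathF.Pow(channel, 0.416666666666667F)) - 0.055F;

    // Same curve, with pow(x, y) expanded to exp(y * log(x)).
    // The linear segment is kept, which also avoids evaluating log(0).
    public static float CompressExpLog(float channel) =>
        channel <= 0.0031308F
            ? 12.92F * channel
            : (1.055F * MathF.Exp(0.416666666666667F * MathF.Log(channel))) - 0.055F;

    // Largest absolute difference between the two forms over [0, 1],
    // sampled at steps + 1 evenly spaced points.
    public static float MaxDifference(int steps)
    {
        float max = 0F;
        for (int i = 0; i <= steps; i++)
        {
            float c = i / (float)steps;
            max = MathF.Max(max, MathF.Abs(CompressPow(c) - CompressExpLog(c)));
        }

        return max;
    }
}
```

Calling `SrgbCompressSketch.MaxDifference(1000)` gives a rough upper bound on the scalar disagreement; a vectorized version would need the same kind of check against whatever `Exp` and `Log` approximations are chosen.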
...p/Processing/Processors/Convolution/Convolution2PassProcessor{TPixel}.cs
Very, very nice!
Merged commit f84d525 into master.

Prerequisites
Description
This PR does a few things:
Benchmarks
Here's a preview of the current improvements for the Gaussian blur benchmark:
And here's some more bokeh blur optimizations compared to master, after #1475 got merged: