1D convolution optimization and general codegen tweaks #1477

Sergio0694 · 2020-12-15T21:12:30Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

This PR does a few things:

Speed optimizations to the 2-pass convolution processor (powering Gaussian blur, sharpen, etc.)
Speed optimizations to the bokeh blur
Some general codegen optimizations that should apply to all common pixel conversions, etc.

Benchmarks

Here's a preview of the current improvements for the gaussian blur benchmark:

And here are some more bokeh blur optimizations compared to master, after #1475 got merged:

codecov · 2020-12-15T21:27:27Z

Codecov Report

Merging #1477 (5601559) into master (a8cae3f) will decrease coverage by 0.07%.
The diff coverage is 78.37%.

@@            Coverage Diff             @@
##           master    #1477      +/-   ##
==========================================
- Coverage   83.55%   83.48%   -0.08%     
==========================================
  Files         741      740       -1     
  Lines       32462    32559      +97     
  Branches     3648     3652       +4     
==========================================
+ Hits        27125    27181      +56     
- Misses       4625     4665      +40     
- Partials      712      713       +1

Flag	Coverage Δ
unittests	`83.48% <78.37%> (-0.08%)`	⬇️

Impacted Files	Coverage Δ
...s/Convolution/Convolution2PassProcessor{TPixel}.cs	`60.86% <58.85%> (-39.14%)`	⬇️
...mageSharp/ColorSpaces/Companding/SRgbCompanding.cs	`100.00% <100.00%> (ø)`
src/ImageSharp/Common/Helpers/Numerics.cs	`97.80% <100.00%> (+0.15%)`	⬆️
...rp/PixelFormats/Utils/Vector4Converters.Default.cs	`100.00% <100.00%> (ø)`
...ssing/Processors/Convolution/BokehBlurProcessor.cs	`100.00% <100.00%> (ø)`
...ocessors/Convolution/BokehBlurProcessor{TPixel}.cs	`99.35% <100.00%> (+0.01%)`	⬆️
...Processors/Convolution/BoxBlurProcessor{TPixel}.cs	`100.00% <100.00%> (ø)`
...cessors/Convolution/ConvolutionProcessorHelpers.cs	`100.00% <100.00%> (ø)`
...ssors/Convolution/GaussianBlurProcessor{TPixel}.cs	`100.00% <100.00%> (ø)`
...rs/Convolution/GaussianSharpenProcessor{TPixel}.cs	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f1a0fb6...5601559.

JimBobSquarePants · 2020-12-16T00:03:27Z

src/ImageSharp/ColorSpaces/Companding/SRgbCompanding.cs

@@ -90,4 +94,4 @@ public static void Compress(ref Vector4 vector)
        [MethodImpl(InliningOptions.ShortMethod)]
        public static float Compress(float channel) => channel <= 0.0031308F ? 12.92F * channel : (1.055F * MathF.Pow(channel, 0.416666666666667F)) - 0.055F;


If we ever figure out how to do an accurate SIMD-enabled approximation of this, we would be laughing.

pow(channel, 0.416666666666667F) => exp(0.416666666666667F * log(channel))

So...

    public static void Compress(ref Vector4 vector)
    {
        // Sketch only: the linear segment for channel <= 0.0031308F is not handled here.
        var channels = Unsafe.As<Vector4, Vector128<float>>(ref vector);
        channels = Log(channels); // Isn't a SIMD intrinsic
        channels = Sse.Multiply(channels, Vector128.Create(0.416666666666667F));
        channels = Exp(channels); // Isn't a SIMD intrinsic
        if (Fma.IsSupported)
        {
            channels = Fma.MultiplyAdd(Vector128.Create(1.055F), channels, Vector128.Create(-0.055F));
        }
        else
        {
            channels = Sse.Add(Sse.Multiply(Vector128.Create(1.055F), channels), Vector128.Create(-0.055F));
        }

        Unsafe.As<Vector4, Vector128<float>>(ref vector) = channels;
    }

But neither Log nor Exp is a SIMD intrinsic; however, you can approximate them with sequences like those in sse_mathfun or avx_mathfun?
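For reference, a power can be rewritten through exp and log using x^p = exp(p * ln(x)) for x > 0. Below is a minimal scalar sketch for sanity-checking that rewrite against the existing formula before attempting a vectorized version. The class and method names (`SrgbCompandingSketch`, `CompressViaExpLog`) are hypothetical and not part of ImageSharp; only the constants are taken from the diff above.

```csharp
using System;

public static class SrgbCompandingSketch
{
    // Exponent of the sRGB compression curve (1 / 2.4), as written in the diff.
    private const float Exponent = 0.416666666666667F;

    // Reference scalar curve, using the same constants as Compress(float) in the diff.
    public static float Compress(float channel)
        => channel <= 0.0031308F
            ? 12.92F * channel
            : (1.055F * MathF.Pow(channel, Exponent)) - 0.055F;

    // Same curve with Pow rewritten as Exp(p * Log(x)).
    // The rewrite is only applied on the power branch, where channel > 0.
    public static float CompressViaExpLog(float channel)
        => channel <= 0.0031308F
            ? 12.92F * channel
            : (1.055F * MathF.Exp(Exponent * MathF.Log(channel))) - 0.055F;
}
```

Comparing the two methods over a range of inputs gives a baseline error budget; any SIMD approximation of log and exp would presumably need to be measured against the same reference.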

...p/Processing/Processors/Convolution/Convolution2PassProcessor{TPixel}.cs

JimBobSquarePants

Very, very nice! 🚀

Sergio0694 added 9 commits Dec 15, 2020

Port horizontal convolution processor, remove Y loop

8e67153

Port vertical convolution processor, remove X loop

a618b76

Remove unnecessary inner loop coordinate sampling

f52802d

Switch to shared sampling map for convolution passes

a9c1652

Remove convolution state, more optimizations

e60827f

Remove transposed 1D kernels, switch to float[] type

e574232

Remove leftover ConvolutionRowOperation<TPixel> type

5a38307

Minor code tweaks

e11adc6

More performance improvements to 2 pass convolution

cb5c868
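The commits above describe a two-pass (separable) convolution driven by 1D `float[]` kernels and a shared sampling map. As a rough illustration of that general technique, and not the PR's actual implementation, here is a minimal sketch. It assumes a single-channel `float[,]` image and clamp-to-edge sampling; `TwoPassConvolutionSketch`, `BuildSamplingMap`, and `Convolve` are hypothetical names.

```csharp
using System;

public static class TwoPassConvolutionSketch
{
    // Precomputes, for every position along one axis and every kernel tap,
    // the source index to read, clamped to the valid range (assumed edge mode).
    public static int[] BuildSamplingMap(int length, int kernelLength)
    {
        int radius = kernelLength / 2;
        int[] map = new int[length * kernelLength];
        for (int i = 0; i < length; i++)
        {
            for (int k = 0; k < kernelLength; k++)
            {
                map[(i * kernelLength) + k] = Math.Clamp(i + k - radius, 0, length - 1);
            }
        }

        return map;
    }

    // Applies the same 1D kernel horizontally, then vertically.
    public static float[,] Convolve(float[,] source, float[] kernel)
    {
        int height = source.GetLength(0);
        int width = source.GetLength(1);
        int taps = kernel.Length;
        int[] mapX = BuildSamplingMap(width, taps);
        int[] mapY = BuildSamplingMap(height, taps);
        var temp = new float[height, width];
        var result = new float[height, width];

        // Pass 1: horizontal. Each output reads neighbours along the row.
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                float sum = 0F;
                for (int k = 0; k < taps; k++)
                {
                    sum += kernel[k] * source[y, mapX[(x * taps) + k]];
                }

                temp[y, x] = sum;
            }
        }

        // Pass 2: vertical. Reads the intermediate buffer along the column.
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                float sum = 0F;
                for (int k = 0; k < taps; k++)
                {
                    sum += kernel[k] * temp[mapY[(y * taps) + k], x];
                }

                result[y, x] = sum;
            }
        }

        return result;
    }
}
```

Precomputing the clamped indices keeps bounds handling out of the inner loops, which is one plausible reason to share a sampling map between the two passes.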

Sergio0694 added the area:performance label Dec 15, 2020

Sergio0694 added this to the 1.1.0 milestone Dec 15, 2020

Sergio0694 added this to To Do in ImageSharp via automation Dec 15, 2020

Sergio0694 added 3 commits Dec 15, 2020

More codegen improvements to bokeh blur

979baf7

More codegen improvements to shared methods

1a3e1e7

Codegen improvements to Numerics.Clamp

5601559

Sergio0694 marked this pull request as ready for review Dec 15, 2020

Sergio0694 requested a review from JimBobSquarePants Dec 15, 2020

JimBobSquarePants reviewed Dec 16, 2020

JimBobSquarePants approved these changes Dec 16, 2020

ImageSharp automation moved this from To Do to Done Dec 16, 2020

JimBobSquarePants deleted the sp/2pass-convolution-speedup branch Dec 16, 2020

benaadams Dec 16, 2020
