ParallelBlockCompressedOutputStream #922
Codecov Report
@@              Coverage Diff              @@
##             master     #922     +/-    ##
===============================================
+ Coverage      65.118%  65.151%  +0.033%
- Complexity    7278     7289     +11
===============================================
  Files         528      530      +2
  Lines         31971    32067    +96
  Branches      5468     5478     +10
===============================================
+ Hits          20819    20892    +73
- Misses        8995     9013     +18
- Partials      2157     2162     +5

We would like to present a new multithreaded implementation of BlockCompressedOutputStream.
ParallelBlockCompressedOutputStream compresses GZ-blocks in parallel, which improves performance by using multiple CPU cores.
A base class, AbstractBlockCompressedOutputStream, was extracted. It is extended by the single-threaded BlockCompressedOutputStream and by the ParallelBlockCompressedOutputStream implementation. ParallelBlockCompressedOutputStream implements the deflateBlock method, which is called when the buffer is full and a GZ-block should be compressed and written. The ParallelBCOS deflateBlock implementation submits the task of compressing the GZ-block to a ThreadPoolExecutor, so the block is processed in parallel on another thread. The number of threads in the ThreadPoolExecutor (the number of blocks processed in parallel) is controlled by the -Dsamjdk.zip_threads property. If the property is 0 (the default), the single-threaded implementation is used.
After enough (64 * ZIP_THREADS) deflating tasks have been submitted, a writing task is submitted. The writing task joins all previous deflating tasks and writes their blocks in the original order.
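The submit-then-write-in-order idea described above can be sketched roughly as follows. This is a minimal illustration and not the code from this pull request: the class name OrderedParallelDeflateSketch and all of its methods are hypothetical, the blocks use plain zlib framing via java.util.zip rather than the real GZ-block format, and the batch is drained on the calling thread instead of in a separately submitted writing task.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration; not the class added by this pull request.
class OrderedParallelDeflateSketch {

    // Compresses one block. Every task creates and releases its own Deflater.
    static byte[] deflate(byte[] block) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(block);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Inverse of deflate(); only used to check the round trip.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished() && inflater.needsInput()) {
                    throw new IllegalStateException("truncated input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    // Joins the pending tasks in submission order, so output order matches input order.
    static void drain(List<Future<byte[]>> pending, List<byte[]> written) {
        try {
            for (Future<byte[]> task : pending) {
                written.add(task.get()); // blocks until this block is compressed
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        pending.clear();
    }

    // Submits one task per block and drains after every 64 * threads submissions.
    static List<byte[]> compressInOrder(List<byte[]> blocks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<byte[]>> pending = new ArrayList<>();
        List<byte[]> written = new ArrayList<>();
        try {
            for (byte[] block : blocks) {
                pending.add(pool.submit(() -> deflate(block)));
                if (pending.size() >= 64 * threads) {
                    drain(pending, written);
                }
            }
            drain(pending, written);
        } finally {
            pool.shutdown();
        }
        return written;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < 300; i++) {
            byte[] block = new byte[4096];
            random.nextBytes(block);
            blocks.add(block);
        }
        List<byte[]> written = compressInOrder(blocks, 2);
        System.out.println(written.size() + " blocks compressed and collected in order");
    }
}
```

Collecting the futures in a list and joining them front to back is what preserves the original block order, regardless of which worker finishes first.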
Here are benchmarks comparing the performance of BlockCompressedOutputStream and ParallelBlockCompressedOutputStream:
We simply generate a random block of data and write it to the output stream.
Here are also results for the second part of SortSam, where BlockCompressedOutputStream is used:
NOTE:
The write(), close() and flush() methods can block the current thread, because they have to wait until all compression and writing tasks have completed.
The ThreadPoolExecutor is static in order to limit the number of threads globally (in case several PBCOS instances are created).
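As a rough illustration of the last point, a single static pool sized from the samjdk.zip_threads property could look like the sketch below. Only the property name comes from the description above. The class ZipThreadsSketch, its methods, the lazy initialization, and the daemon worker threads are assumptions made for this example, not the code in this pull request.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration; not the class added by this pull request.
final class ZipThreadsSketch {
    static final String PROPERTY = "samjdk.zip_threads";

    // One pool per JVM, shared by every stream, so the thread count is capped globally.
    private static ExecutorService sharedPool;

    private ZipThreadsSketch() {}

    // Reads -Dsamjdk.zip_threads; a missing or negative value is treated as 0.
    static int zipThreads() {
        return Math.max(0, Integer.getInteger(PROPERTY, 0));
    }

    // 0 (the default) means the single-threaded implementation should be used.
    static boolean useParallel() {
        return zipThreads() > 0;
    }

    // Lazily creates the shared pool; later callers get the same instance.
    static synchronized ExecutorService sharedPool() {
        if (sharedPool == null) {
            int threads = zipThreads();
            if (threads == 0) {
                throw new IllegalStateException(PROPERTY + " is 0; use the single-threaded stream");
            }
            sharedPool = Executors.newFixedThreadPool(threads, runnable -> {
                Thread worker = new Thread(runnable, "zip-worker");
                worker.setDaemon(true); // assumption: idle workers should not keep the JVM alive
                return worker;
            });
        }
        return sharedPool;
    }

    public static void main(String[] args) {
        System.out.println("parallel compression enabled: " + useParallel());
    }
}
```

Because every stream would obtain its executor from sharedPool(), creating several parallel streams would not multiply the number of worker threads.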