Timeline for Fastest and most efficient way to get number of records (lines) in a gzip-compressed file
Current License: CC BY-SA 4.0
12 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Aug 29, 2023 at 22:39 | history edited | mxmlnkn | CC BY-SA 4.0 | Redo benchmarks with rapidgzip 0.9.0 |
| Jun 4, 2023 at 7:54 | history edited | mxmlnkn | CC BY-SA 4.0 | pragzip -> rapidgzip |
| Mar 18, 2023 at 19:11 | comment added | Stéphane Chazelas | | Note that in POSIX sed `[ \t]` is required to match on either space, backslash or t (GNU sed is only POSIX compliant in that regard when there's `POSIXLY_CORRECT` in its environment) |
| Mar 18, 2023 at 19:09 | comment added | Stéphane Chazelas | | If `zgrep -c $` is off by one because there's extra data after the last line, that would be on non-text files. The behaviour of grep is unspecified on such input, some ignoring those extra bytes some treating them as an extra line (and I suppose some could report an error). |
| Mar 18, 2023 at 17:40 | history edited | mxmlnkn | CC BY-SA 4.0 | Update benchmarks with unreleased pragzip 0.6.0, which uses rpmalloc and optimizes `--count-lines` further |
| Jan 25, 2023 at 17:09 | comment added | James Hirschorn | | Nice. It works good. |
| Jan 25, 2023 at 16:35 | comment added | mxmlnkn | | @JamesHirschorn I appended the answer to your question to my StackOverflow answer because it does not fit into a comment. |
| Jan 25, 2023 at 16:34 | history edited | mxmlnkn | CC BY-SA 4.0 | Add usage from Python |
| Jan 25, 2023 at 15:43 | comment added | James Hirschorn | | How can I do the equivalent of `pragzip --count-lines` in a python script? |
| Nov 12, 2022 at 18:12 | history edited | mxmlnkn | CC BY-SA 4.0 | Update benchmarks with pragzip 0.4.0 and new `--count-lines` option |
| Aug 25, 2022 at 9:50 | history edited | mxmlnkn | CC BY-SA 4.0 | Redo benchmarks with pragzip 0.3.0 |
| Aug 9, 2022 at 19:52 | history answered | mxmlnkn | CC BY-SA 4.0 | |