
added 307 characters in body


edited May 8, 2017 at 8:37

Kusalananda ♦


The sed, perl and awk commands that you mention may be correct, but they all read the compressed data and count the newline characters in that. Those newline characters have nothing to do with the newline characters in the uncompressed data.

To count the number of lines in the uncompressed data, there is no way around uncompressing it. Your approach with zcat is the correct one, and since the data is so large, it will take time to uncompress.
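As a small illustration of the difference, the following sketch builds a throwaway 1000-line file, compresses it, and counts newlines both ways. The file names and the temporary directory are made up for the example, and `gzip -dc` is used in place of zcat only because it behaves the same on more systems.

```shell
#!/bin/sh
# Hypothetical sample data: 1000 lines in a temporary directory.
tmpdir=$(mktemp -d)
seq 1 1000 > "$tmpdir/sample.txt"
gzip -c "$tmpdir/sample.txt" > "$tmpdir/sample.txt.gz"

# Newline bytes that happen to occur in the compressed stream.
# This number says nothing about the original line count.
compressed_newlines=$(wc -l < "$tmpdir/sample.txt.gz" | tr -d ' ')

# Actual line count: decompress to standard output and count there.
uncompressed_lines=$(gzip -dc "$tmpdir/sample.txt.gz" | wc -l | tr -d ' ')

echo "newline bytes in compressed file: $compressed_newlines"
echo "lines in uncompressed data:       $uncompressed_lines"

rm -rf "$tmpdir"
```

Only the second figure matches the number of lines that went into the file; the first depends on whatever bytes the compressor happened to produce.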

Most utilities that deal with gzip compression and decompression will most likely use the same shared library routines to do so. The only way to speed it up would be to find an implementation of the zlib routines that is somehow faster than the default one, and rebuild e.g. zcat to use it.



answered May 8, 2017 at 7:28

Kusalananda ♦


The sed, perl and awk commands that you mention may be correct, but they all read the compressed data and count the newline characters in that. Those newline characters have nothing to do with the newline characters in the uncompressed data.

To count the number of lines in the uncompressed data, there is no way around uncompressing it. Your approach with zcat is the correct one, and since the data is so large, it will take time to uncompress.