
As near as I can tell, the zip -T option only determines whether files can be extracted; it doesn't really test the archive's internal integrity. For example, I deliberately corrupted the local (not central directory) CRC for a file, and zip didn't care at all, reporting the archive as OK. Is there some other utility that does this?

There's a lot of internal redundancy in ZIP files, and it would be nice to have a way of checking it all. Of course, normally the central directory is all you need, but when repairing a corrupted archive often all you have is a fragment, with the central directory clobbered or missing. I'd like to know if archives I create are as recoverable as possible.

  • What about unzip -t? Commented Apr 18, 2015 at 20:09
  • Same behavior as zip. Commented Apr 18, 2015 at 20:15
  • unzip -t could be an option, but what if we need to use the Alpine Linux unzip, where the -t option is not available? Commented Oct 7, 2021 at 17:02

3 Answers


unzip -t

Test archive files.

This option extracts each specified file in memory and compares the CRC (cyclic redundancy check, an enhanced checksum) of the expanded file with the original's stored CRC value.

[ source: https://linux.die.net/man/1/unzip ]
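For comparison, Python's standard zipfile module can run the same kind of in-memory test: ZipFile.testzip() reads every member and checks its CRC, but returns only the first bad name. The sketch below (an illustration, not unzip's implementation; the function name verify_members is my own) reads each member to the end so that every failing entry is reported. One caveat, based on my reading of CPython's zipfile source rather than any documented guarantee: the expected CRC it compares against comes from the central directory, so it catches a corrupted central CRC but not a corrupted local one.

```python
import io
import zipfile
import zlib


def verify_members(blob: bytes) -> list:
    """Decompress every member in memory; return (name, error) for each failure.

    zipfile raises BadZipFile ("Bad CRC-32 ...") when a member has been read
    to the end and the computed CRC differs from the expected one.
    """
    failures = []
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for info in zf.infolist():
            try:
                with zf.open(info) as member:
                    # Read in 1 MiB chunks; the CRC is compared once EOF is reached.
                    while member.read(1 << 20):
                        pass
            except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
                failures.append((info.filename, str(exc)))
    return failures
```

Usage: verify_members(open("archive.zip", "rb").read()). This loads the whole archive into memory; for large archives you would pass a file object to ZipFile instead.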

  • There are 2 CRCs per file: local and central. unzip -t only tests the latter. Commented Apr 18, 2015 at 20:33
  • I don't know what you mean by "local" versus "central" (central to what?), but when I run "unzip -t myzip_file.zip" I see a line of output commenting on the integrity of each and every zipped file, like (imagine better formatting): "testing: AARiseTransitSet.cpp OK testing: AARiseTransitSet.h OK testing: AASaturn.cpp OK testing: AASaturn.h OK ..." Commented Apr 18, 2015 at 20:39
  • This is not the place to explain the internal structure of ZIP files; the Wikipedia article is pretty good on this. As I said, the report you are seeing is misleading. Commented Apr 18, 2015 at 20:55
  • If I go into a zip file with a hex editor and change one byte, then for one file I see "testing: AA_sphere.htm bad CRC 7952862e (should be 44c6f7f8)" while the rest are listed as "OK". You'll continue to declare this "misleading", but that's exactly what I expect from a file-by-file CRC check of a zip file. Now... good luck to you, Sir. Commented Apr 18, 2015 at 21:02
  • I think you have changed the central directory CRC, at the end. Try changing the local one, before or after the file. Commented Apr 18, 2015 at 21:21

With Info-ZIP, attempting to fix an archive compares the local and central CRCs, so combining a repair attempt with an archive test checks all of the CRCs. If you run

unzip -t archive.zip

and

zip -F archive.zip --out archivefix.zip

and neither complains, the archive’s contents match both the central and the local CRCs. (You can delete archivefix.zip afterwards.)

To verify this, I created a test archive from two files in the Info-ZIP source code for zip 3.0:

zip -9 test.zip zip.txt zipup.c

I then corrupted the central directory CRC for zip.txt by changing the byte at offset 0xB137. I got the opposite behaviour to what you observed; unzip -v reported the altered CRC from the central directory, but unzip -t and zip -T reported that the file was OK (checking against the local CRC).

But running

zip -F test --out testfix

reported

Fix archive (-F) - assume mostly intact archive
Zip entry offsets do not need adjusting
 copying: zip.txt
        zip warning: Local Entry CRC does not match CD: zip.txt
 copying: zipup.c

The "corrected" file still listed the altered CRC for zip.txt.

Altering the local CRC for zip.txt at offset 0x10 caused both unzip -t and zip -T to report a CRC error, but zip -F didn't spot anything wrong.
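To repeat experiments like these on another archive, you need the byte offsets of both CRC fields for each entry. Here is a minimal Python sketch (crc_offsets is a hypothetical helper, not part of any tool) that walks the central directory from the end-of-central-directory record. It assumes a non-ZIP64 archive and that the EOCD signature does not also appear inside the archive comment. It reports the first byte of each 4-byte field; the 0x10 used above falls inside the first entry's local CRC field (bytes 0x0E to 0x11), because that entry's local header starts at offset 0.

```python
import struct

EOCD = struct.Struct("<4s4H2LH")        # end of central directory record (22 bytes)
CENTRAL = struct.Struct("<4s6H3L5H2L")  # fixed part of a central directory header (46 bytes)


def crc_offsets(blob: bytes) -> dict:
    """Map each member name to (local CRC field offset, central CRC field offset).

    Sketch only: ZIP64 archives are rejected rather than handled.
    """
    eocd = blob.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    (_sig, _disk, _cd_disk, _entries_here, entries,
     _cd_size, cd_offset, _comment_len) = EOCD.unpack_from(blob, eocd)
    if entries == 0xFFFF or cd_offset == 0xFFFFFFFF:
        raise NotImplementedError("ZIP64 end records are not handled in this sketch")
    result = {}
    pos = cd_offset
    for _ in range(entries):
        (sig, _made_by, _needed, flags, _method, _time, _date, _crc, _csize, _usize,
         name_len, extra_len, comment_len, _disk_start, _int_attr, _ext_attr,
         local_offset) = CENTRAL.unpack_from(blob, pos)
        if sig != b"PK\x01\x02":
            raise ValueError(f"bad central directory header at {pos:#x}")
        if local_offset == 0xFFFFFFFF:
            raise NotImplementedError("ZIP64 local header offset")
        encoding = "utf-8" if flags & 0x800 else "cp437"  # bit 11: UTF-8 names
        start = pos + CENTRAL.size
        name = blob[start:start + name_len].decode(encoding)
        # CRC-32 sits 14 bytes into a local header and 16 bytes into a central one.
        result[name] = (local_offset + 14, pos + 16)
        pos += CENTRAL.size + name_len + extra_len + comment_len
    return result
```

For example, printing {name: (hex(lo), hex(co)) for name, (lo, co) in crc_offsets(data).items()} shows where to poke with a hex editor.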

Thus, from my experiments, mismatches between an archive entry's contents and its CRCs can be detected as follows:

  • local only: zip -T and unzip -t; zip -F will also complain about the local-central mismatch
  • local and central: zip -T and unzip -t
  • central only: zip -T and unzip -t will not complain, but zip -F will indicate a local-central mismatch
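The three cases above can also be checked in a single pass without Info-ZIP. The following is a minimal Python sketch of my own (not how zip or unzip do it; check_local_and_central is a hypothetical name). It reads the central directory with the standard zipfile module, then for each entry takes the CRC from the local header (or from the data descriptor when flag bit 3 is set), decompresses the data, and compares all three values. Assumptions: only stored and deflated members, no encryption, and the whole archive fits in memory. ZIP64 offsets and sizes come from zipfile's own central-directory parsing, but I have not tested the data-descriptor branch against real ZIP64 streamed archives.

```python
import io
import struct
import zipfile
import zlib

LOCAL_HEADER = struct.Struct("<4s5H3L2H")  # fixed 30-byte part of a local file header


def _expand(raw: bytes, method: int) -> bytes:
    """Decompress one member's data (only stored and deflated are handled)."""
    if method == zipfile.ZIP_STORED:
        return raw
    if method == zipfile.ZIP_DEFLATED:
        return zlib.decompress(raw, -15)  # negative wbits: raw deflate, no zlib header
    raise NotImplementedError(f"compression method {method}")


def check_local_and_central(blob: bytes) -> list:
    """Return (member name, problem) pairs; an empty list means all CRCs agree."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:  # central directory parsed by zipfile
        for info in zf.infolist():
            (sig, _ver, flags, method, _time, _date, local_crc,
             _csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(
                 blob, info.header_offset)
            if sig != b"PK\x03\x04":
                problems.append((info.filename, "missing local header"))
                continue
            start = info.header_offset + LOCAL_HEADER.size + name_len + extra_len
            end = start + info.compress_size  # compressed size from the central directory
            if flags & 0x08:
                # Bit 3 ("streamed"): the local CRC follows the data in a data
                # descriptor, optionally preceded by the PK\x07\x08 signature.
                pos = end + 4 if blob[end:end + 4] == b"PK\x07\x08" else end
                (local_crc,) = struct.unpack_from("<L", blob, pos)
            try:
                actual = zlib.crc32(_expand(blob[start:end], method))
            except zlib.error as exc:
                problems.append((info.filename, f"cannot decompress: {exc}"))
                continue
            if actual != local_crc:
                problems.append((info.filename, "data does not match local CRC"))
            if actual != info.CRC:
                problems.append((info.filename, "data does not match central CRC"))
            if local_crc != info.CRC:
                problems.append((info.filename, "local and central CRCs differ"))
    return problems
```

Corrupting only the local CRC yields the "local CRC" and "differ" messages for that entry; corrupting only the central CRC yields the "central CRC" and "differ" messages, which mirrors the zip -T / zip -F split described in the list.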

(Note that by default zip -T simply uses unzip -tqq, so zip -T and unzip -t really are equivalent. You can read the unzip source code to check that testing an archive really compares the local CRC, not the central one; look for extract_or_test_files(), extract_or_test_entrylist() and extract_or_test_member(), all in extract.c.)
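If you want to apply the two-command check from this answer to many archives, a small Python wrapper could look like the following. It assumes Info-ZIP zip and unzip are on PATH, and it treats a nonzero exit status, or the text "zip warning" in zip's output (as in the zip -F transcript above), as a complaint; that heuristic is my assumption, not documented Info-ZIP behaviour. The run parameter exists only so the function can be exercised without the real tools.

```python
import os
import subprocess
import tempfile


def both_crcs_ok(archive: str, run=subprocess.run) -> bool:
    """Run `unzip -t` and `zip -F ... --out ...`; True only if neither complains."""
    test = run(["unzip", "-t", archive], capture_output=True, text=True)
    if test.returncode != 0:
        return False
    with tempfile.TemporaryDirectory() as scratch:
        # zip -F needs somewhere to write the repaired copy; it is discarded.
        fixed = os.path.join(scratch, "archivefix.zip")
        fix = run(["zip", "-F", archive, "--out", fixed], capture_output=True, text=True)
    if fix.returncode != 0:
        return False
    # Assumed complaint marker, taken from the zip -F output shown in the answer.
    return "zip warning" not in (fix.stdout or "") + (fix.stderr or "")
```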

  • Complicated, and no doubt very dependent on which versions (GNU, BSD, etc.). And CRC is only one of the numerous integrity checks that can be performed. Commented Apr 18, 2015 at 22:55
  • There aren't many versions of zip and unzip available on Unix-like platforms; Info-ZIP is used pretty much everywhere... Commented Apr 18, 2015 at 23:02
  • As for it being complicated, it takes just two commands; if both unzip -t and zip -F run without error, you're OK and both CRCs have been checked. Commented Apr 18, 2015 at 23:04
  • Thanks! Will check this out. Also, forgot to mention: the ZIP files are ZIP64. Commented Apr 18, 2015 at 23:18
  • Answers like this should be the gold standard on this website: thorough research, including experiments and digging through source code, summed up concisely. Commented Dec 1, 2020 at 23:30

You might want to have a look at zipdetails. From its man page:

Zipdetails displays information about the internal record structure of the zip file. It is not concerned with displaying any details of the compressed data stored in the zip file.

I do not know whether zipdetails detects inconsistencies, but it should help you find and understand them. Here's a small sample of its output:

00000 LOCAL HEADER #1       04034B50
00004 Extract Zip Spec      14 '2.0'
00005 Extract OS            00 'MS-DOS'
00006 General Purpose Flag  0808
      [Bits 1-2]            0 'Normal Compression'
      [Bit  3]              1 'Streamed'
      [Bit 11]              1 'Language Encoding'
00008 Compression Method    0008 'Deflated'
0000A Last Mod Time         5352884C 'Mon Oct 18 17:02:24 2021'
0000E CRC                   00000000
00012 Compressed Length     00000000
00016 Uncompressed Length   00000000
0001A Filename Length       000B
0001C Extra Length          0000
0001E Filename              'graphic.svg'
00029 PAYLOAD

02947 STREAMING DATA HEADER 08074B50
0294B CRC                   C622C669
0294F Compressed Length     0000291E
02953 Uncompressed Length   0002F706
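To see where the fields in this output come from, here is a small Python sketch that decodes the fixed 30-byte local file header (the dictionary keys are my own labels, not zipdetails' internals). Note the "Streamed" flag (bit 3): when it is set, the CRC and lengths in the local header are zero, as above, and the real values follow the compressed data in the 08074B50 record that zipdetails labels STREAMING DATA HEADER. A tool comparing local and central CRCs has to look there.

```python
import struct

LOCAL = struct.Struct("<4s5H3L2H")  # fixed part of a local file header (30 bytes)


def describe_local_header(blob: bytes, offset: int = 0) -> dict:
    """Decode the local file header starting at `offset`, roughly like zipdetails."""
    (sig, needed, flags, method, mod_time, mod_date, crc,
     comp_len, uncomp_len, name_len, extra_len) = LOCAL.unpack_from(blob, offset)
    if sig != b"PK\x03\x04":
        raise ValueError(f"no local header signature at {offset:#x}")
    spec = needed & 0xFF  # low byte: spec version * 10 (0x14 -> "2.0")
    name_start = offset + LOCAL.size
    encoding = "utf-8" if flags & 0x800 else "cp437"  # bit 11: "Language Encoding"
    return {
        "extract_spec": f"{spec // 10}.{spec % 10}",
        "extract_os": needed >> 8,        # high byte: 0 = MS-DOS
        "streamed": bool(flags & 0x08),   # bit 3: CRC/sizes live in the data descriptor
        "method": method,                 # 8 = deflated, 0 = stored
        # MS-DOS date/time fields: (year, month, day, hour, minute, second)
        "modified": (1980 + (mod_date >> 9), (mod_date >> 5) & 0xF, mod_date & 0x1F,
                     mod_time >> 11, (mod_time >> 5) & 0x3F, (mod_time & 0x1F) * 2),
        "crc": crc,
        "compressed_length": comp_len,
        "uncompressed_length": uncomp_len,
        "filename": blob[name_start:name_start + name_len].decode(encoding),
        "payload_offset": name_start + name_len + extra_len,
    }
```

Fed a hand-built header with the values shown above (flags 0x0808, mod time 0x884C, mod date 0x5352, name graphic.svg), it reports streamed=True, modified=(2021, 10, 18, 17, 2, 24) and payload_offset=0x29, matching the zipdetails listing.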

I can also confirm this bit from the man page:

Error handling is still a work in progress. If the program encounters a problem reading a zip file it is likely to terminate with an unhelpful error message.

  • zipdetails is indeed a useful tool; while it doesn’t flag inconsistencies, it does display both the local and central CRCs, and those values can be compared manually. Commented Dec 16, 2021 at 16:32
