I just spent about 90 minutes trying to report this problem through
the normal support channels with no useful result, so, in desperation,
I'm trying here, in the hope that someone can direct this report to some
useful place.
There appears to be a bug in the .zip archive reader used by Windows
Explorer in Windows 7 (and, most likely, in later versions).
An Info-ZIP Zip user recently reported a problem with an archive
created using our Zip program. The archive was valid, but it contained
a file which was larger than 4GiB. The complaint was that Windows
Explorer displayed (and, apparently, believed) an absurdly large size
value for this large-file archive member. We have since reproduced the
problem.
The original .zip archive format includes uncompressed and compressed
sizes for archive members (files), and these sizes were stored in 32-bit
fields. This caused problems for files which are larger than 4GiB (or,
on some system types, where signed size values were used, 2GiB). The
solution to this fundamental limitation was to extend the .zip archive
format to allow storage of 64-bit member sizes, when necessary. (PKWARE
identifies this format extension as "Zip64".)
The .zip archive format includes a mechanism, the "Extra Field", for
storing various kinds of metadata which have no place in the normal
archive file headers. Examples include OS-specific file-attribute data,
such as Finder info and extended attributes for Apple Macintosh; record
format, record size, and record type data for VMS/OpenVMS; universal
file times and/or UID/GID for UNIX(-like) systems; and so on. The Extra
Field is where the 64-bit member sizes are stored, when the fixed 32-bit
size fields are too small.
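As a rough illustration of the limit (a sketch only, not taken from any Zip source code), the 4362076160-byte member in the test archives listed further down cannot be stored in a 32-bit size field. The helper names here are made up for the example. Per the PKWARE APPNOTE, a writer in this situation sets the 32-bit field to 0xFFFFFFFF and puts the real value in the Extra Field.

```python
import struct

UINT32_MAX = 0xFFFFFFFF      # largest value a 32-bit size field can hold
member_size = 4362076160     # uncompressed size of test_4g.txt, in bytes


def fits_in_32_bits(size: int) -> bool:
    """Return True if the size can be stored in a fixed 32-bit header field."""
    return 0 <= size <= UINT32_MAX


def pack_size_32(size: int) -> bytes:
    """Pack a size as a little-endian 32-bit field (struct.error if too big)."""
    return struct.pack("<I", size)


print(fits_in_32_bits(member_size))   # False
print(member_size & UINT32_MAX)       # 67108864: all that truncation would keep
print(struct.pack("<Q", member_size).hex())  # the same size as a 64-bit field
```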
An Extra Field has a structure which allows multiple types of extra
data to be included. It comprises one or more "Extra Blocks", each of
which has the following structure:
   Size (bytes) | Description
   -------------+-------------------------------
   2            | Type code
   2            | Number of data bytes to follow
   (variable)   | Extra block data
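The structure above could be walked with something like the following minimal sketch. It assumes the usual little-endian byte order of .zip header fields. The function name and the sample Extra Field (including its time stamp value) are invented for illustration.

```python
import struct


def iter_extra_blocks(extra: bytes):
    """Yield (type_code, data) for each Extra Block in an Extra Field."""
    offset = 0
    while offset < len(extra):
        if offset + 4 > len(extra):
            raise ValueError("truncated Extra Block header")
        # Block header: 2-byte type code, then 2-byte count of data bytes.
        type_code, length = struct.unpack_from("<HH", extra, offset)
        start = offset + 4
        end = start + length
        if end > len(extra):
            raise ValueError("Extra Block data run past the end of the field")
        yield type_code, extra[start:end]
        offset = end


# Hypothetical Extra Field: a 5-byte "UT" time block (flags + one time stamp)
# followed by a 16-byte block holding two 64-bit sizes.
sample = (
    struct.pack("<HHBI", 0x5455, 5, 0x01, 1398958380)
    + struct.pack("<HHQQ", 0x0001, 16, 4362076160, 14800839)
)

for type_code, data in iter_extra_blocks(sample):
    print(f"type=0x{type_code:04x} length={len(data)}")
# type=0x5455 length=5
# type=0x0001 length=16
```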
The problem with the .zip archive reader used by Windows Explorer is
that it appears to expect the Extra Block which includes the 64-bit
member sizes (type code = 0x0001) to be the first (or only) Extra Block
in the Extra Field. If some other Extra Block appears at the start of
the Extra Field, then its (non-size) data are being incorrectly
interpreted as the 64-bit sizes, while the actual 64-bit size data,
further along in the Extra Field, are ignored.
Perhaps the .zip archive _writer_ used by Windows Explorer always
places the Extra Block with the 64-bit sizes in this special location,
but the .zip specification does not demand any particular order or
placement of Extra Blocks in the Extra Field, and other programs
(Info-ZIP Zip, for example) should not be expected to abide by this
artificial restriction. For details, see section "4.5 Extensible data
fields" in the PKWARE APPNOTE:
http://www.pkware.com/documents/casestudies/APPNOTE.TXT
A .zip archive reader is expected to consider the Extra Block type
codes, and interpret accordingly the data which follow. In particular,
it's not sufficient to trust that any particular Extra Block will be the
first one in the Extra Field. It's generally safe to ignore any Extra
Block whose type code is not recognized, but it's crucial to scan the
Extra Field, identify each Extra Block, and handle it according to its
type.
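A reader along these lines might look like the sketch below. It is not Info-ZIP's or anyone else's actual code, and the function names and sample data are hypothetical. It also assumes that the 32-bit uncompressed-size field held 0xFFFFFFFF, so that the first eight data bytes of the 0x0001 block are the uncompressed size.

```python
import struct

ZIP64_TYPE = 0x0001  # type code of the Extra Block with the 64-bit sizes


def find_extra_block(extra: bytes, wanted_type: int):
    """Scan the whole Extra Field; return the data of the first matching block."""
    offset = 0
    while offset + 4 <= len(extra):
        type_code, length = struct.unpack_from("<HH", extra, offset)
        start = offset + 4
        end = start + length
        if end > len(extra):
            raise ValueError("malformed Extra Field")
        if type_code == wanted_type:
            return extra[start:end]
        offset = end  # unrecognized or unwanted type: skip it
    return None


def zip64_uncompressed_size(extra: bytes):
    """Return the 64-bit uncompressed size, or None if no usable block exists."""
    data = find_extra_block(extra, ZIP64_TYPE)
    if data is None or len(data) < 8:
        return None
    return struct.unpack_from("<Q", data, 0)[0]


# Hypothetical blocks: 64-bit sizes, and a 5-byte "UT" block.
zip64 = struct.pack("<HHQQ", ZIP64_TYPE, 16, 4362076160, 14800839)
other = struct.pack("<HHBI", 0x5455, 5, 0x01, 0)

print(zip64_uncompressed_size(zip64 + other))  # 4362076160
print(zip64_uncompressed_size(other + zip64))  # 4362076160
print(zip64_uncompressed_size(other))          # None
```

The result is the same wherever the 64-bit sizes block sits in the Extra Field, which is the behavior the specification calls for.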
Here are some relatively small (about 14MiB each) test archives which
illustrate the problem:
http://antinode.info/ftp/info-zip/ms_zip64/test_4g.zip
http://antinode.info/ftp/info-zip/ms_zip64/test_4g_V.zip
http://antinode.info/ftp/info-zip/ms_zip64/test_4g_W.zip
Correct info, from UnZip 6.00 ("unzip -lv"):
Archive:  test_4g.zip
  Length    Method    Size    Cmpr    Date     Time   CRC-32    Name
 --------   ------  -------   ---- ---------- -----  --------   ----
4362076160  Defl:X  14800839  100% 05-01-2014 15:33  6d8d2ece   test_4g.txt
[...]
Archive:  test_4g_V.zip
  Length    Method    Size    Cmpr    Date     Time   CRC-32    Name
 --------   ------  -------   ---- ---------- -----  --------   ----
4362076160  Defl:X  14800839  100% 05-01-2014 15:33  6d8d2ece   test_4g.txt
[...]
Archive:  test_4g_W.zip
  Length    Method    Size    Cmpr    Date     Time   CRC-32    Name
 --------   ------  -------   ---- ---------- -----  --------   ----
4362076160  Defl:X  14800839  100% 05-01-2014 15:33  6d8d2ece   test_4g.txt
[...]
(In these reports, "Length" is the uncompressed size; "Size" is the
compressed size.)
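For anyone without UnZip at hand, a similar cross-check could be made with Python's standard zipfile module, which, as far as we know, locates the 64-bit sizes by type code. This is only a sketch: the function name is made up, and it assumes a downloaded copy of test_4g.zip in the current directory (it prints nothing if the file is absent).

```python
import os
import struct
import zipfile


def list_members(source):
    """Return (name, length, size, extra block type codes) for each member."""
    rows = []
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            extra = info.extra  # raw Extra Field from the central directory
            types = []
            offset = 0
            while offset + 4 <= len(extra):
                type_code, length = struct.unpack_from("<HH", extra, offset)
                types.append(type_code)
                offset += 4 + length
            rows.append(
                (info.filename, info.file_size, info.compress_size, types)
            )
    return rows


# Assumed local file name; adjust to wherever the test archive was saved.
if os.path.exists("test_4g.zip"):
    for name, length, size, types in list_members("test_4g.zip"):
        codes = ", ".join(f"0x{t:04x}" for t in types)
        print(f"{name}: length={length} size={size} extra blocks: {codes}")
```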
Incorrect info, from (Windows 7) Windows Explorer:
   Archive         Name          Compressed size   Size
   test_4g.zip     test_4g.txt   14,454 KB         562,951,376,907,238 KB
   test_4g_V.zip   test_4g.txt   14,454 KB         8,796,110,221,518 KB
   test_4g_W.zip   test_4g.txt   14,454 KB         1,464,940,363,777 KB
Faced with these unrealistic sizes, Windows Explorer refuses to
extract the member file, for lack of (petabytes of) free disk space.
The archive test_4g.zip has the following Extra Blocks: universal
time (type = 0x5455) and 64-bit sizes (type = 0x0001). test_4g_V.zip
has: PKWARE VMS (type = 0x000c) and 64-bit sizes (type = 0x0001).
test_4g_W.zip has: NT security descriptor (type = 0x4453), universal
time (type = 0x5455), and 64-bit sizes (type = 0x0001). Obviously,
Info-ZIP UnZip has no trouble correctly finding the 64-bit size info in
these archives, but Windows Explorer is clearly confused. (Note that
"1,464,940,363,777 KB" translates to 0x0005545500000400 (bytes), and
its upper half, "0x00055455", looks exactly like the size, "0x0005",
and the type code, "0x5455", for a "UT" universal time Extra Block,
which was present in that archive. This is consistent with the
hypothesis that the wrong data in the Extra Field are being interpreted
as the 64-bit size data.)
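The arithmetic behind that observation can be checked with a few lines of Python. This sketch assumes that the KB figure shown by Windows Explorer is an exact multiple of 1024 bytes, which is a guess about how the display value is derived.

```python
reported_kb = 1_464_940_363_777        # "Size" shown for test_4g_W.zip
size_bytes = reported_kb * 1024        # assumes 1 KB = 1024 bytes, no rounding

print(f"0x{size_bytes:016x}")          # 0x0005545500000400

upper = size_bytes >> 32               # upper 32 bits of the bogus 64-bit size
print(f"0x{upper >> 16:04x}")          # 0x0005: looks like a block data size
print(f"0x{upper & 0xFFFF:04x}")       # 0x5455: looks like the "UT" type code

# The same value as it would be laid out in the archive (little-endian).
print(size_bytes.to_bytes(8, "little").hex(" "))  # 00 04 00 00 55 54 05 00
```

Read in archive order, the last four bytes (55 54 05 00) are what the header of a "UT" block with five data bytes would look like.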
Without being able to see the source code involved here, it's hard to
know exactly what it's doing wrong, but it does appear that the .zip
reader used by Windows Explorer is using a very (too) simple-minded
method to extract 64-bit size data from the Extra Field, causing it to
get bad data from a properly formed archive.
I suspect that the engineer involved will have little trouble finding
and fixing the code which parses an Extra Field to extract the 64-bit
sizes correctly, but if anyone has any questions, we'd be happy to help.
For the Info-ZIP (http://info-zip.org/) team,
Steven Schweda