Class ZipArchiveInputStream
- All Implemented Interfaces:
Closeable, AutoCloseable, InputStreamStatistics
- Direct Known Subclasses:
JarArchiveInputStream
As of Apache Commons Compress it transparently supports Zip64 extensions and thus individual entries and archives larger than 4 GB or with more than 65536 entries.
The ZipFile class is preferred when reading from files, as ZipArchiveInputStream is limited by not being able to read the central directory header before returning entries. In particular, ZipArchiveInputStream
- may return entries that are not part of the central directory at all and shouldn't be considered part of the archive.
- may return several entries with the same name.
- will not return internal or external attributes.
- may return incomplete extra field data.
- may return unknown sizes and CRC values for entries until the next entry has been reached if the archive uses the data descriptor feature.
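The last limitation can be observed with any streaming ZIP reader. The following is a minimal sketch that uses the JDK's java.util.zip classes as a stand-in, not ZipArchiveInputStream itself; the names StreamingSizeSketch, buildArchive and sizesBeforeAndAfter are hypothetical. It writes one DEFLATED entry, which the JDK writer stores with a data descriptor, and compares the size reported right after the local header is read with the size reported once the entry data has been consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class StreamingSizeSketch {

    /** Builds a one-entry archive in memory; sizes are not known up front, so a data descriptor is written. */
    static byte[] buildArchive(String name, byte[] payload) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(payload);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Returns {size seen right after the header was read, size seen after the entry data was consumed}. */
    static long[] sizesBeforeAndAfter(byte[] archive) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry = zis.getNextEntry();
            long before = entry.getSize(); // -1 means "unknown"
            byte[] buf = new byte[4096];
            while (zis.read(buf) != -1) {
                // Drain the entry so the reader reaches the data descriptor.
            }
            long after = entry.getSize();
            return new long[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the JDK reader the size is unknown when the entry is first returned and is filled in once the entry has been read to its end. Code that needs sizes up front is better served by a reader that consults the central directory, which is the reason ZipFile is recommended for files.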
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
private class
Bounded input stream adapted from commons-io
private static final class
Structure collecting information for the entry that is currently being read.
-
Field Summary
Fields
Modifier and Type, Field, Description
private boolean allowStoredEntriesWithDataDescriptor
Whether the stream will try to read STORED entries that use a data descriptor.
private static final byte[] APK_SIGNING_BLOCK_MAGIC
private final ByteBuffer buf
Buffer used to read from the wrapped stream.
private static final byte[] CFH
private static final int CFH_LEN
private boolean closed
Whether the stream has been closed.
current
The entry that is currently being read.
private static final byte[] DD
(package private) final String encoding
private int entriesRead
private boolean hitCentralDirectory
Whether the stream has reached the central directory - and thus found all entries.
private final InputStream in
Wrapped stream, will always be a PushbackInputStream.
private final Inflater inf
Inflater used for all deflated entries.
private ByteArrayInputStream lastStoredEntry
When reading a stored entry that uses the data descriptor this stream has to read the full entry and caches it.
private static final byte[] LFH
private static final int LFH_LEN
private final byte[] lfhBuf
private static final BigInteger LONG_MAX
private final byte[] shortBuf
private final byte[] skipBuf
private final boolean skipSplitSig
Whether the stream will try to skip the zip split signature (08074B50) at the beginning
private static final long TWO_EXP_32
private final byte[] twoDwordBuf
private long uncompressedCount
Count decompressed bytes for current entry
private static final String USE_ZIPFILE_INSTEAD_OF_STREAM_DISCLAIMER
private final boolean useUnicodeExtraFields
Whether to look for and use Unicode extra fields.
private final byte[] wordBuf
private final ZipEncoding zipEncoding
The zip encoding to use for file names and the file comment.
-
Constructor Summary
Constructors
Constructor, Description
ZipArchiveInputStream(InputStream inputStream)
Create an instance using UTF-8 encoding
ZipArchiveInputStream(InputStream inputStream, String encoding)
Create an instance using the specified encoding
ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields)
Create an instance using the specified encoding
ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields, boolean allowStoredEntriesWithDataDescriptor)
Create an instance using the specified encoding
ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields, boolean allowStoredEntriesWithDataDescriptor, boolean skipSplitSig)
Create an instance using the specified encoding
-
Method Summary
Modifier and Type, Method, Description
private boolean bufferContainsSignature(ByteArrayOutputStream bos, int offset, int lastRead, int expectedDDLen)
Checks whether the current buffer contains the signature of a "data descriptor", "local file header" or "central directory entry".
private int cacheBytesRead(ByteArrayOutputStream bos, int offset, int lastRead, int expecteDDLen)
If the last read bytes could hold a data descriptor and an incomplete signature then save the last bytes to the front of the buffer and cache everything in front of the potential data descriptor into the given ByteArrayOutputStream.
boolean canReadEntryData
Whether this class is able to read the given entry.
private static boolean checksig(byte[] signature, byte[] expected)
void close()
private void closeEntry
Closes the current ZIP archive entry and positions the underlying stream to the beginning of the next entry.
private boolean currentEntryHasOutstandingBytes()
If the compressed size of the current entry is included in the entry header and there are any outstanding bytes in the underlying stream, then this returns true.
private void drainCurrentEntryData
Read all data of the current entry from the underlying stream that hasn't been read yet.
private int fill()
private boolean findEocdRecord
Reads forward until the signature of the "End of central directory" record is found.
private long getBytesInflated()
Get the number of bytes Inflater has actually processed.
long getCompressedCount()
getNextEntry
Returns the next Archive Entry in this Stream.
getNextZipEntry
long getUncompressedCount()
private boolean isApkSigningBlock(byte[] suspectLocalFileHeader)
Checks whether this might be an APK Signing Block.
private boolean isFirstByteOfEocdSig(int b)
static boolean matches(byte[] signature, int length)
Checks if the signature matches what is expected for a zip file.
private void processZip64Extra(ZipLong size, ZipLong cSize)
Records whether a Zip64 extra is present and sets the size information from it if sizes are 0xFFFFFFFF and the entry doesn't use a data descriptor.
private void pushback(byte[] buf, int offset, int length)
int read(byte[] buffer, int offset, int length)
private void readDataDescriptor
private int readDeflated(byte[] buffer, int offset, int length)
Implementation of read for DEFLATED entries.
private void readFirstLocalFileHeader
Fills the given array with the first local file header and deals with splitting/spanning markers that may prefix the first LFH.
private int readFromInflater(byte[] buffer, int offset, int length)
Potentially reads more bytes to fill the inflater's buffer and reads from it.
private void readFully(byte[] b)
private void readFully(byte[] b, int off)
private int readOneByte
Reads bytes by reading from the underlying stream rather than the (potentially inflating) archive stream, which read(byte[], int, int) would do.
private byte[] readRange(int len)
private int readStored(byte[] buffer, int offset, int length)
Implementation of read for STORED entries.
private void readStoredEntry
Caches a stored entry that uses the data descriptor.
private void realSkip(long value)
Skips bytes by reading from the underlying stream rather than the (potentially inflating) archive stream, which skip(long) would do.
long skip(long value)
Skips over and discards value bytes of data from this input stream.
private void skipRemainderOfArchive
Reads the stream until it finds the "End of central directory record" and consumes it as well.
private boolean supportsCompressedSizeFor
Whether the compressed size for the entry is either known or not required by the compression method being used.
private boolean supportsDataDescriptorFor
Whether this entry requires a data descriptor this library can work with.
Methods inherited from class org.apache.commons.compress.archivers.ArchiveInputStream
count, count, getBytesRead, getCount, pushedBackBytes, read
Methods inherited from class java.io.InputStream
available, mark, markSupported, read, readAllBytes, readNBytes, reset, transferTo
-
Field Details
-
zipEncoding
The zip encoding to use for file names and the file comment. -
encoding
-
useUnicodeExtraFields
private final boolean useUnicodeExtraFields
Whether to look for and use Unicode extra fields.
-
in
Wrapped stream, will always be a PushbackInputStream. -
inf
Inflater used for all deflated entries. -
buf
Buffer used to read from the wrapped stream. -
current
The entry that is currently being read. -
closed
private boolean closed
Whether the stream has been closed.
-
hitCentralDirectory
private boolean hitCentralDirectory
Whether the stream has reached the central directory, and thus found all entries.
-
lastStoredEntry
When reading a stored entry that uses the data descriptor, this stream has to read the full entry and cache it. This is the cache.
-
allowStoredEntriesWithDataDescriptor
private boolean allowStoredEntriesWithDataDescriptor
Whether the stream will try to read STORED entries that use a data descriptor. Setting it to true means the stream will not stop reading an entry at its compressed size; instead it will stop reading an entry when a data descriptor is met (by finding the Data Descriptor Signature). This will completely break down in some cases, such as JARs in WARs.
See also: https://issues.apache.org/jira/projects/COMPRESS/issues/COMPRESS-555 https://github.com/apache/commons-compress/pull/137#issuecomment-690835644
-
uncompressedCount
private long uncompressedCount
Count of decompressed bytes for the current entry.
-
skipSplitSig
private final boolean skipSplitSig
Whether the stream will try to skip the zip split signature (08074B50) at the beginning.
-
LFH_LEN
private static final int LFH_LEN
-
CFH_LEN
private static final int CFH_LEN
-
TWO_EXP_32
private static final long TWO_EXP_32
-
lfhBuf
private final byte[] lfhBuf -
skipBuf
private final byte[] skipBuf -
shortBuf
private final byte[] shortBuf -
wordBuf
private final byte[] wordBuf -
twoDwordBuf
private final byte[] twoDwordBuf -
entriesRead
private int entriesRead -
USE_ZIPFILE_INSTEAD_OF_STREAM_DISCLAIMER
- See Also:
-
LFH
private static final byte[] LFH -
CFH
private static final byte[] CFH -
DD
private static final byte[] DD -
APK_SIGNING_BLOCK_MAGIC
private static final byte[] APK_SIGNING_BLOCK_MAGIC -
LONG_MAX
-
-
Constructor Details
-
ZipArchiveInputStream
Create an instance using UTF-8 encoding.
- Parameters:
inputStream
- the stream to wrap
-
ZipArchiveInputStream
Create an instance using the specified encoding.
- Parameters:
inputStream
- the stream to wrap
encoding
- the encoding to use for file names, use null for the platform's default encoding
- Since:
- 1.5
-
ZipArchiveInputStream
public ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields)
Create an instance using the specified encoding.
- Parameters:
inputStream
- the stream to wrap
encoding
- the encoding to use for file names, use null for the platform's default encoding
useUnicodeExtraFields
- whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
-
ZipArchiveInputStream
public ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields, boolean allowStoredEntriesWithDataDescriptor)
Create an instance using the specified encoding.
- Parameters:
inputStream
- the stream to wrap
encoding
- the encoding to use for file names, use null for the platform's default encoding
useUnicodeExtraFields
- whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
allowStoredEntriesWithDataDescriptor
- whether the stream will try to read STORED entries that use a data descriptor
- Since:
- 1.1
-
ZipArchiveInputStream
public ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields, boolean allowStoredEntriesWithDataDescriptor, boolean skipSplitSig)
Create an instance using the specified encoding.
- Parameters:
inputStream
- the stream to wrap
encoding
- the encoding to use for file names, use null for the platform's default encoding
useUnicodeExtraFields
- whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
allowStoredEntriesWithDataDescriptor
- whether the stream will try to read STORED entries that use a data descriptor
skipSplitSig
- whether the stream will try to skip the zip split signature (08074B50) at the beginning. You will need to set this to true if you want to read a split archive.
- Since:
- 1.20
-
-
Method Details
-
getNextZipEntry
- Throws:
IOException
-
readFirstLocalFileHeader
Fills the given array with the first local file header and deals with splitting/spanning markers that may prefix the first LFH.
- Throws:
IOException
-
processZip64Extra
Records whether a Zip64 extra is present and sets the size information from it if sizes are 0xFFFFFFFF and the entry doesn't use a data descriptor.
- Throws:
ZipException
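The rule described for processZip64Extra can be illustrated with a small standalone sketch. It is not the library's code: the class Zip64SizeSketch and the method resolveSizes are hypothetical, and the sketch assumes the data part of the Zip64 extra field in a local file header holds the 8-byte little-endian uncompressed size followed by the 8-byte little-endian compressed size.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Zip64SizeSketch {

    /** Value stored in a 32-bit size field when the real size lives in the Zip64 extra field. */
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;

    /**
     * Returns {size, compressedSize}. The 64-bit values from the Zip64 extra are used only when
     * the entry has no data descriptor and at least one 32-bit field carries the 0xFFFFFFFF marker.
     * Assumption: zip64Data is the extra field's data part (16 bytes, without header id and length).
     */
    static long[] resolveSizes(long size32, long compressedSize32, byte[] zip64Data, boolean usesDataDescriptor) {
        boolean marker = size32 == ZIP64_MAGIC || compressedSize32 == ZIP64_MAGIC;
        if (usesDataDescriptor || !marker || zip64Data == null || zip64Data.length < 16) {
            return new long[] {size32, compressedSize32};
        }
        ByteBuffer bb = ByteBuffer.wrap(zip64Data).order(ByteOrder.LITTLE_ENDIAN);
        long size = bb.getLong();           // uncompressed size comes first
        long compressedSize = bb.getLong(); // compressed size follows
        return new long[] {size, compressedSize};
    }
}
```

The sketch treats the 64-bit values as signed longs, which is enough for illustration but does not cover the full unsigned range.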
-
getNextEntry
Description copied from class: ArchiveInputStream
Returns the next Archive Entry in this Stream.
- Specified by:
getNextEntry in class ArchiveInputStream
- Returns:
- the next entry, or null if there are no more entries
- Throws:
IOException
- if the next entry could not be read
-
canReadEntryData
Whether this class is able to read the given entry.
May return false if it is set up to use encryption or a compression method that hasn't been implemented yet.
- Overrides:
canReadEntryData in class ArchiveInputStream
- Parameters:
ae
- the entry to test
- Returns:
- This implementation always returns true.
- Since:
- 1.1
-
read
- Overrides:
read in class InputStream
- Throws:
IOException
-
getCompressedCount
public long getCompressedCount()
- Specified by:
getCompressedCount in interface InputStreamStatistics
- Returns:
- the amount of raw or compressed bytes read by the stream
- Since:
- 1.17
-
getUncompressedCount
public long getUncompressedCount()
- Specified by:
getUncompressedCount in interface InputStreamStatistics
- Returns:
- the amount of decompressed bytes returned by the stream
- Since:
- 1.17
-
readStored
Implementation of read for STORED entries.
- Throws:
IOException
-
readDeflated
Implementation of read for DEFLATED entries.
- Throws:
IOException
-
readFromInflater
Potentially reads more bytes to fill the inflater's buffer and reads from it.
- Throws:
IOException
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class InputStream
- Throws:
IOException
-
skip
Skips over and discards value bytes of data from this input stream.
This implementation may end up skipping over some smaller number of bytes, possibly 0, if and only if it reaches the end of the underlying stream.
The actual number of bytes skipped is returned.
- Overrides:
skip in class InputStream
- Parameters:
value
- the number of bytes to be skipped.
- Returns:
- the actual number of bytes skipped.
- Throws:
IOException
- if an I/O error occurs.
IllegalArgumentException
- if value is negative.
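The contract above (skip fewer bytes only at end of stream, reject negative values) can be met by reading into a scratch buffer. The following is a minimal sketch of that technique against a generic InputStream, not the class's actual code; SkipByReadingSketch and skipByReading are hypothetical names, and IOException is wrapped in UncheckedIOException only to keep the example short.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class SkipByReadingSketch {

    /** Skips up to value bytes by reading and discarding them; stops early only at end of stream. */
    static long skipByReading(InputStream in, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must not be negative");
        }
        byte[] scratch = new byte[1024];
        long skipped = 0;
        try {
            while (skipped < value) {
                int chunk = (int) Math.min(scratch.length, value - skipped);
                int n = in.read(scratch, 0, chunk);
                if (n == -1) {
                    break; // end of stream: report what was actually skipped
                }
                skipped += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return skipped;
    }
}
```

Reading instead of delegating to the wrapped stream's own skip means the skipped bytes pass through the same decoding path as a normal read, so the count refers to entry data rather than raw archive bytes.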
-
matches
public static boolean matches(byte[] signature, int length)
Checks if the signature matches what is expected for a zip file. Does not currently handle self-extracting zips which may have arbitrary leading content.
- Parameters:
signature
- the bytes to check
length
- the number of bytes to check
- Returns:
- true, if this stream is a zip archive stream, false otherwise
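A signature check of this kind compares the first four bytes against well-known ZIP record signatures. The sketch below is an illustration under assumptions, not the body of matches: the class ZipSignatureSketch and method looksLikeZip are hypothetical, and the set of accepted signatures (local file header, end of central directory for empty archives, and the 08074B50 marker mentioned for split archives) is an assumption about which records may start a stream.

```java
final class ZipSignatureSketch {

    // Signatures as they appear on disk (little-endian byte order).
    private static final byte[] LOCAL_FILE_HEADER = {0x50, 0x4B, 0x03, 0x04};
    private static final byte[] END_OF_CENTRAL_DIR = {0x50, 0x4B, 0x05, 0x06};
    private static final byte[] SPLIT_OR_DATA_DESCRIPTOR = {0x50, 0x4B, 0x07, 0x08};

    /** Returns true if the first four bytes look like the start of a ZIP stream. */
    static boolean looksLikeZip(byte[] signature, int length) {
        if (signature == null || length < 4 || signature.length < 4) {
            return false;
        }
        return startsWith(signature, LOCAL_FILE_HEADER)
                || startsWith(signature, END_OF_CENTRAL_DIR)
                || startsWith(signature, SPLIT_OR_DATA_DESCRIPTOR);
    }

    private static boolean startsWith(byte[] data, byte[] expected) {
        for (int i = 0; i < expected.length; i++) {
            if (data[i] != expected[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because only the leading bytes are inspected, a self-extracting archive that begins with executable code is rejected, which matches the limitation stated above.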
-
checksig
private static boolean checksig(byte[] signature, byte[] expected) -
closeEntry
Closes the current ZIP archive entry and positions the underlying stream to the beginning of the next entry. All per-entry variables and data structures are cleared.
If the compressed size of this entry is included in the entry header, then any outstanding bytes are simply skipped from the underlying stream without uncompressing them. This allows an entry to be safely closed even if the compression method is unsupported.
In case we don't know the compressed size of this entry or have already buffered too much data from the underlying stream to support uncompression, then the uncompression process is completed and the end position of the stream is adjusted based on the result of that process.
- Throws:
IOException
- if an error occurs
-
currentEntryHasOutstandingBytes
private boolean currentEntryHasOutstandingBytes()
If the compressed size of the current entry is included in the entry header and there are any outstanding bytes in the underlying stream, then this returns true.
- Returns:
- true, if current entry is determined to have outstanding bytes, false otherwise
-
drainCurrentEntryData
Read all data of the current entry from the underlying stream that hasn't been read yet.
- Throws:
IOException
-
getBytesInflated
private long getBytesInflated()
Get the number of bytes Inflater has actually processed.
For Java versions before Java 7, the getBytes* methods in Inflater/Deflater seem to return unsigned ints rather than longs, so they start over at 0 once they reach 2^32.
The stream knows how many bytes it has read, but not how many the Inflater actually consumed - it should be between the total number of bytes read for the entry and the total number minus the last read operation. Here we just try to make the value close enough to the bytes we've read by assuming the number of bytes consumed must be smaller than (or equal to) the number of bytes read but not smaller by more than 2^32.
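The correction described above can be sketched as a small pure function. This is an illustration of the stated assumption, not a copy of the method: InflatedCountSketch and correctWrappedCount are hypothetical names, reportedByInflater stands for the possibly wrapped counter, and bytesReadFromStream stands for the number of bytes the stream has read for the entry.

```java
final class InflatedCountSketch {

    static final long TWO_EXP_32 = 1L << 32;

    /**
     * Adds multiples of 2^32 to a counter that may have wrapped around, as long as the result
     * does not exceed the number of bytes known to have been read from the stream.
     */
    static long correctWrappedCount(long reportedByInflater, long bytesReadFromStream) {
        long corrected = reportedByInflater;
        if (bytesReadFromStream >= TWO_EXP_32) {
            while (corrected + TWO_EXP_32 <= bytesReadFromStream) {
                corrected += TWO_EXP_32;
            }
        }
        return corrected;
    }
}
```

For example, a reported count of 5 with 2^32 + 10 bytes read is corrected to 2^32 + 5, the largest candidate that does not exceed the bytes read.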
-
fill
- Throws:
IOException
-
readFully
- Throws:
IOException
-
readFully
- Throws:
IOException
-
readRange
- Throws:
IOException
-
readDataDescriptor
- Throws:
IOException
-
supportsDataDescriptorFor
Whether this entry requires a data descriptor this library can work with.
- Returns:
- true if allowStoredEntriesWithDataDescriptor is true, the entry doesn't require any data descriptor or the method is DEFLATED or ENHANCED_DEFLATED.
-
supportsCompressedSizeFor
Whether the compressed size for the entry is either known or not required by the compression method being used.
-
readStoredEntry
Caches a stored entry that uses the data descriptor.
- Reads a stored entry until the signature of a local file header, central directory header or data descriptor has been found.
- Stores all entry data in lastStoredEntry.
- Rewinds the stream to position at the data descriptor.
- Reads the data descriptor.
After calling this method the entry should know its size, the entry's data is cached and the stream is positioned at the next local file or central directory header.
- Throws:
IOException
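The first step above, scanning stored data for a record signature, can be sketched as follows. This only locates the first candidate signature in a byte array and does not reproduce the buffering, pushback or data descriptor parsing of the real method; StoredEntryScanSketch and firstSignatureOffset are hypothetical names.

```java
final class StoredEntryScanSketch {

    // On-disk byte sequences of the three signatures named above.
    private static final byte[][] SIGNATURES = {
        {0x50, 0x4B, 0x03, 0x04}, // local file header
        {0x50, 0x4B, 0x01, 0x02}, // central directory header
        {0x50, 0x4B, 0x07, 0x08}, // data descriptor
    };

    /** Returns the offset of the first signature found in data, or -1 if none is present. */
    static int firstSignatureOffset(byte[] data) {
        for (int i = 0; i + 4 <= data.length; i++) {
            for (byte[] sig : SIGNATURES) {
                if (data[i] == sig[0] && data[i + 1] == sig[1]
                        && data[i + 2] == sig[2] && data[i + 3] == sig[3]) {
                    return i;
                }
            }
        }
        return -1;
    }
}
```

A scan like this cannot tell a real record boundary from the same bytes occurring inside the entry's own data, which is why a STORED entry that is itself a ZIP archive (a JAR inside a WAR, for instance) can end the scan too early.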
-
bufferContainsSignature
private boolean bufferContainsSignature(ByteArrayOutputStream bos, int offset, int lastRead, int expectedDDLen) throws IOException
Checks whether the current buffer contains the signature of a "data descriptor", "local file header" or "central directory entry".
If it contains such a signature, reads the data descriptor and positions the stream right after the data descriptor.
- Throws:
IOException
-
cacheBytesRead
If the last read bytes could hold a data descriptor and an incomplete signature then save the last bytes to the front of the buffer and cache everything in front of the potential data descriptor into the given ByteArrayOutputStream.
Data descriptor plus incomplete signature (3 bytes in the worst case) can be 20 bytes max.
-
pushback
- Throws:
IOException
-
skipRemainderOfArchive
Reads the stream until it finds the "End of central directory record" and consumes it as well.
- Throws:
IOException
-
findEocdRecord
Reads forward until the signature of the "End of central directory" record is found.
- Throws:
IOException
-
realSkip
Skips bytes by reading from the underlying stream rather than the (potentially inflating) archive stream, which skip(long) would do. Also updates bytes-read counter.
- Throws:
IOException
-
readOneByte
Reads bytes by reading from the underlying stream rather than the (potentially inflating) archive stream, which read(byte[], int, int) would do. Also updates bytes-read counter.
- Throws:
IOException
-
isFirstByteOfEocdSig
private boolean isFirstByteOfEocdSig(int b)
-
isApkSigningBlock
Checks whether this might be an APK Signing Block.
Unfortunately the APK signing block does not start with some kind of signature, it rather ends with one. It starts with a length, so what we do is parse the suspect length, skip ahead far enough, look for the signature and if we've found it, return true.
- Parameters:
suspectLocalFileHeader
- the bytes read from the underlying stream in the expectation that they would hold the local file header of the next entry.
- Returns:
- true if this looks like an APK signing block
- Throws:
IOException
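The "parse the length, skip ahead, look for the trailing signature" idea can be sketched on an in-memory byte array. This is not the method's code, which works on the stream: ApkSigningBlockSketch, looksLikeApkSigningBlock and buildBlock are hypothetical names. The layout is an assumption taken from the APK Signature Scheme documentation: an 8-byte little-endian block size that excludes the size field itself, the payload, the same 8-byte size again, and the 16-byte magic "APK Sig Block 42".

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ApkSigningBlockSketch {

    static final byte[] MAGIC = "APK Sig Block 42".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if data starts with a plausible length and the magic sits where that length says. */
    static boolean looksLikeApkSigningBlock(byte[] data) {
        if (data.length < 8) {
            return false;
        }
        long declared = ByteBuffer.wrap(data, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
        // The block must at least hold the trailing size field and the magic, and must fit in data.
        if (declared < 8 + MAGIC.length || declared > data.length - 8L) {
            return false;
        }
        int magicStart = (int) (8 + declared - MAGIC.length);
        byte[] tail = Arrays.copyOfRange(data, magicStart, magicStart + MAGIC.length);
        return Arrays.equals(tail, MAGIC);
    }

    /** Test helper (hypothetical): builds a block with a zero-filled payload of the given length. */
    static byte[] buildBlock(int payloadLength) {
        long size = payloadLength + 8L + MAGIC.length;
        ByteBuffer bb = ByteBuffer.allocate((int) (8 + size)).order(ByteOrder.LITTLE_ENDIAN);
        bb.putLong(size);
        bb.put(new byte[payloadLength]);
        bb.putLong(size);
        bb.put(MAGIC);
        return bb.array();
    }
}
```

The bounds check on the declared length matters because the bytes being tested were read in the expectation of a local file header, so the "length" may be arbitrary data.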
-