Class ZipArchiveInputStream

java.lang.Object
java.io.InputStream
org.apache.commons.compress.archivers.ArchiveInputStream
org.apache.commons.compress.archivers.zip.ZipArchiveInputStream
All Implemented Interfaces:
Closeable, AutoCloseable, InputStreamStatistics
Direct Known Subclasses:
JarArchiveInputStream

public class ZipArchiveInputStream extends ArchiveInputStream implements InputStreamStatistics
Implements an input stream that can read Zip archives.

As of Apache Commons Compress it transparently supports Zip64 extensions and thus individual entries and archives larger than 4 GB or with more than 65536 entries.

The ZipFile class is preferred when reading from files, as ZipArchiveInputStream is limited by not being able to read the central directory header before returning entries. In particular, ZipArchiveInputStream

  • may return entries that are not part of the central directory at all and shouldn't be considered part of the archive.
  • may return several entries with the same name.
  • will not return internal or external attributes.
  • may return incomplete extra field data.
  • may return unknown sizes and CRC values for entries until the next entry has been reached if the archive uses the data descriptor feature.
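
The last limitation is a property of the ZIP format rather than of this class alone, so it can be illustrated with the JDK's own streaming reader. The following is a minimal sketch that uses java.util.zip as a stand-in (it does not call Commons Compress); the class name DataDescriptorSketch is hypothetical. It writes a DEFLATED entry without preset sizes, which makes the writer emit a data descriptor, and then compares the size reported before and after the entry data has been consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class DataDescriptorSketch {
    /** Returns {size reported before reading the data, size reported after reading it}. */
    static long[] sizesBeforeAndAfterReading(byte[] payload) {
        try {
            ByteArrayOutputStream archive = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(archive)) {
                // Size and CRC are not preset, so they are written in a data descriptor
                // that follows the entry data.
                out.putNextEntry(new ZipEntry("a.txt"));
                out.write(payload);
                out.closeEntry();
            }
            try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive.toByteArray()))) {
                ZipEntry entry = in.getNextEntry();
                long before = entry.getSize(); // unknown at this point
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) {
                    // discard the data; reaching the end makes the reader parse the descriptor
                }
                long after = entry.getSize();
                return new long[] {before, after};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A streaming reader cannot do better than this without the central directory, which is why ZipFile is the better choice when the archive is available as a file.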
  • Field Details

    • zipEncoding

      private final ZipEncoding zipEncoding
      The zip encoding to use for file names and the file comment.
    • encoding

      final String encoding
    • useUnicodeExtraFields

      private final boolean useUnicodeExtraFields
      Whether to look for and use Unicode extra fields.
    • in

      private final InputStream in
      Wrapped stream, will always be a PushbackInputStream.
    • inf

      private final Inflater inf
      Inflater used for all deflated entries.
    • buf

      private final ByteBuffer buf
      Buffer used to read from the wrapped stream.
    • current

      The entry that is currently being read.
    • closed

      private boolean closed
      Whether the stream has been closed.
    • hitCentralDirectory

      private boolean hitCentralDirectory
      Whether the stream has reached the central directory - and thus found all entries.
    • lastStoredEntry

      private ByteArrayInputStream lastStoredEntry
      When reading a stored entry that uses the data descriptor this stream has to read the full entry and caches it. This is the cache.
    • allowStoredEntriesWithDataDescriptor

      private boolean allowStoredEntriesWithDataDescriptor
      Whether the stream will try to read STORED entries that use a data descriptor. Setting it to true means we will not stop reading an entry based on the compressed size; instead we will stop reading an entry when a data descriptor is met (by finding the Data Descriptor Signature). This will completely break down in some cases, like JARs in WARs.

      See also : https://issues.apache.org/jira/projects/COMPRESS/issues/COMPRESS-555 https://github.com/apache/commons-compress/pull/137#issuecomment-690835644

    • uncompressedCount

      private long uncompressedCount
      Count of decompressed bytes for the current entry.
    • skipSplitSig

      private final boolean skipSplitSig
      Whether the stream will try to skip the zip split signature (08074B50) at the beginning.
    • LFH_LEN

      private static final int LFH_LEN
    • CFH_LEN

      private static final int CFH_LEN
    • TWO_EXP_32

      private static final long TWO_EXP_32
    • lfhBuf

      private final byte[] lfhBuf
    • skipBuf

      private final byte[] skipBuf
    • shortBuf

      private final byte[] shortBuf
    • wordBuf

      private final byte[] wordBuf
    • twoDwordBuf

      private final byte[] twoDwordBuf
    • entriesRead

      private int entriesRead
    • USE_ZIPFILE_INSTEAD_OF_STREAM_DISCLAIMER

      private static final String USE_ZIPFILE_INSTEAD_OF_STREAM_DISCLAIMER
    • LFH

      private static final byte[] LFH
    • CFH

      private static final byte[] CFH
    • DD

      private static final byte[] DD
    • APK_SIGNING_BLOCK_MAGIC

      private static final byte[] APK_SIGNING_BLOCK_MAGIC
    • LONG_MAX

      private static final BigInteger LONG_MAX
  • Constructor Details

    • ZipArchiveInputStream

      public ZipArchiveInputStream(InputStream inputStream)
      Create an instance using UTF-8 encoding
      Parameters:
      inputStream - the stream to wrap
    • ZipArchiveInputStream

      public ZipArchiveInputStream(InputStream inputStream, String encoding)
      Create an instance using the specified encoding
      Parameters:
      inputStream - the stream to wrap
      encoding - the encoding to use for file names, use null for the platform's default encoding
      Since:
      1.5
    • ZipArchiveInputStream

      public ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields)
      Create an instance using the specified encoding
      Parameters:
      inputStream - the stream to wrap
      encoding - the encoding to use for file names, use null for the platform's default encoding
      useUnicodeExtraFields - whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
    • ZipArchiveInputStream

      public ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields, boolean allowStoredEntriesWithDataDescriptor)
      Create an instance using the specified encoding
      Parameters:
      inputStream - the stream to wrap
      encoding - the encoding to use for file names, use null for the platform's default encoding
      useUnicodeExtraFields - whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
      allowStoredEntriesWithDataDescriptor - whether the stream will try to read STORED entries that use a data descriptor
      Since:
      1.1
    • ZipArchiveInputStream

      public ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields, boolean allowStoredEntriesWithDataDescriptor, boolean skipSplitSig)
      Create an instance using the specified encoding
      Parameters:
      inputStream - the stream to wrap
      encoding - the encoding to use for file names, use null for the platform's default encoding
      useUnicodeExtraFields - whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
      allowStoredEntriesWithDataDescriptor - whether the stream will try to read STORED entries that use a data descriptor
      skipSplitSig - whether the stream will try to skip the zip split signature (08074B50) at the beginning. You will need to set this to true if you want to read a split archive.
      Since:
      1.20
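
What skipping the split signature amounts to can be sketched with a PushbackInputStream, the same kind of stream this class wraps. This is an illustrative sketch only, not the library's code; the class and method names are hypothetical. It peeks at the first four bytes, consumes them if they are the split signature 08074B50 (stored little-endian), and pushes them back otherwise.

```java
import java.io.IOException;
import java.io.PushbackInputStream;
import java.util.Arrays;

final class SplitMarkerSketch {
    // 0x08074B50 as it appears in the stream (little-endian byte order).
    private static final byte[] SPLIT_SIG = {0x50, 0x4B, 0x07, 0x08};

    /**
     * Consumes a leading split signature if present; otherwise leaves the stream untouched.
     * The stream must have been created with a pushback buffer of at least 4 bytes.
     */
    static boolean skipSplitSignature(PushbackInputStream in) throws IOException {
        byte[] head = new byte[SPLIT_SIG.length];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream shorter than a signature
            }
            n += r;
        }
        if (n == head.length && Arrays.equals(head, SPLIT_SIG)) {
            return true; // marker consumed, stream now at the first local file header
        }
        in.unread(head, 0, n); // not a marker: push back whatever was read
        return false;
    }
}
```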
  • Method Details

    • getNextZipEntry

      public ZipArchiveEntry getNextZipEntry() throws IOException
      Throws:
      IOException
    • readFirstLocalFileHeader

      private void readFirstLocalFileHeader() throws IOException
      Fills the given array with the first local file header and deals with splitting/spanning markers that may prefix the first LFH.
      Throws:
      IOException
    • processZip64Extra

      private void processZip64Extra(ZipLong size, ZipLong cSize) throws ZipException
      Records whether a Zip64 extra is present and sets the size information from it if sizes are 0xFFFFFFFF and the entry doesn't use a data descriptor.
      Throws:
      ZipException
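
The rule described above can be written down as a small decision function. This is a simplified sketch with hypothetical names: it resolves a single size field, and it assumes that for entries with a data descriptor the header value is simply left alone until the descriptor has been read.

```java
import java.util.OptionalLong;

final class Zip64SizeSketch {
    /** Marker value in the 32-bit header field meaning "see the Zip64 extra field". */
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;

    /**
     * @param headerSize         the 32-bit size from the local file header, as an unsigned value
     * @param zip64Size          the 64-bit size from the Zip64 extra field, if such a field is present
     * @param usesDataDescriptor whether the entry defers its sizes to a data descriptor
     */
    static long effectiveSize(long headerSize, OptionalLong zip64Size, boolean usesDataDescriptor) {
        if (!usesDataDescriptor && headerSize == ZIP64_MAGIC && zip64Size.isPresent()) {
            return zip64Size.getAsLong(); // the real size lives in the extra field
        }
        return headerSize; // plain 32-bit size, or left for the data descriptor to resolve
    }
}
```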
    • getNextEntry

      public ArchiveEntry getNextEntry() throws IOException
      Description copied from class: ArchiveInputStream
      Returns the next Archive Entry in this Stream.
      Specified by:
      getNextEntry in class ArchiveInputStream
      Returns:
      the next entry, or null if there are no more entries
      Throws:
      IOException - if the next entry could not be read
    • canReadEntryData

      public boolean canReadEntryData(ArchiveEntry ae)
      Whether this class is able to read the given entry.

      May return false if it is set up to use encryption or a compression method that hasn't been implemented yet.

      Overrides:
      canReadEntryData in class ArchiveInputStream
      Parameters:
      ae - the entry to test
      Returns:
      This implementation always returns true.
      Since:
      1.1
    • read

      public int read(byte[] buffer, int offset, int length) throws IOException
      Overrides:
      read in class InputStream
      Throws:
      IOException
    • getCompressedCount

      public long getCompressedCount()
      Specified by:
      getCompressedCount in interface InputStreamStatistics
      Returns:
      the amount of raw or compressed bytes read by the stream
      Since:
      1.17
    • getUncompressedCount

      public long getUncompressedCount()
      Specified by:
      getUncompressedCount in interface InputStreamStatistics
      Returns:
      the amount of decompressed bytes returned by the stream
      Since:
      1.17
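
To make the difference between the two counters concrete, the sketch below uses the JDK's Deflater and Inflater as a stand-in for an entry being decompressed (it does not call Commons Compress, and the class name is hypothetical). The first value corresponds to the compressed bytes consumed, the second to the decompressed bytes produced; comparing the two is one way a caller might watch the compression ratio of untrusted input.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class InflateCountsSketch {
    /** Returns {compressed bytes consumed, decompressed bytes produced} for a round trip. */
    static long[] counts(byte[] payload) {
        byte[] chunk = new byte[1024];

        // Compress the payload first so there is something to inflate.
        Deflater deflater = new Deflater();
        deflater.setInput(payload);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        while (!deflater.finished()) {
            compressed.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();

        Inflater inflater = new Inflater();
        inflater.setInput(compressed.toByteArray());
        try {
            while (!inflater.finished()) {
                if (inflater.inflate(chunk) == 0 && inflater.needsInput()) {
                    break; // no progress possible; all input was supplied up front
                }
            }
            return new long[] {inflater.getBytesRead(), inflater.getBytesWritten()};
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }
}
```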
    • readStored

      private int readStored(byte[] buffer, int offset, int length) throws IOException
      Implementation of read for STORED entries.
      Throws:
      IOException
    • readDeflated

      private int readDeflated(byte[] buffer, int offset, int length) throws IOException
      Implementation of read for DEFLATED entries.
      Throws:
      IOException
    • readFromInflater

      private int readFromInflater(byte[] buffer, int offset, int length) throws IOException
      Potentially reads more bytes to fill the inflater's buffer and reads from it.
      Throws:
      IOException
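
The fill-then-inflate pattern described here can be sketched with the JDK's Inflater. This is an illustration under assumptions, not the library's implementation: the class and method names are hypothetical, and the input buffer is deliberately tiny so that the refill branch runs several times.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class FillAndInflateSketch {
    /** Inflates a raw deflate stream, refilling the inflater's input whenever it runs dry. */
    static byte[] inflateAll(InputStream rawDeflate) throws IOException {
        Inflater inflater = new Inflater(true); // raw deflate without zlib header, as in ZIP entries
        byte[] in = new byte[16];               // small on purpose to force repeated refills
        byte[] out = new byte[64];
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try {
            while (!inflater.finished()) {
                if (inflater.needsInput()) {
                    int n = rawDeflate.read(in);
                    if (n < 0) {
                        throw new EOFException("truncated deflate stream");
                    }
                    inflater.setInput(in, 0, n);
                }
                int produced = inflater.inflate(out);
                result.write(out, 0, produced);
            }
        } catch (DataFormatException e) {
            throw new IOException(e);
        } finally {
            inflater.end();
        }
        return result.toByteArray();
    }

    /** Helper for trying the sketch: compresses data as a raw deflate stream. */
    static byte[] deflateRaw(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            compressed.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        return compressed.toByteArray();
    }
}
```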
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class InputStream
      Throws:
      IOException
    • skip

      public long skip(long value) throws IOException
      Skips over and discards value bytes of data from this input stream.

      This implementation may end up skipping over some smaller number of bytes, possibly 0, if and only if it reaches the end of the underlying stream.

      The actual number of bytes skipped is returned.

      Overrides:
      skip in class InputStream
      Parameters:
      value - the number of bytes to be skipped.
      Returns:
      the actual number of bytes skipped.
      Throws:
      IOException - if an I/O error occurs.
      IllegalArgumentException - if value is negative.
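
The contract above (skip fewer bytes only at end of stream, reject negative values) can be met by reading into a scratch buffer. The following is a sketch of that approach with hypothetical names, not the library's code.

```java
import java.io.IOException;
import java.io.InputStream;

final class SkipSketch {
    /** Skips up to value bytes by reading them; returns fewer only if the stream ends first. */
    static long skipByReading(InputStream in, long value) throws IOException {
        if (value < 0) {
            throw new IllegalArgumentException("negative skip: " + value);
        }
        byte[] scratch = new byte[1024];
        long skipped = 0;
        while (skipped < value) {
            int want = (int) Math.min(scratch.length, value - skipped);
            int n = in.read(scratch, 0, want);
            if (n < 0) {
                break; // end of stream: report the shorter count
            }
            skipped += n;
        }
        return skipped;
    }
}
```

Reading rather than delegating to the wrapped stream's skip means the skipped bytes pass through decompression, so the count refers to entry data rather than raw archive bytes.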
    • matches

      public static boolean matches(byte[] signature, int length)
      Checks if the signature matches what is expected for a zip file. Does not currently handle self-extracting zips which may have arbitrary leading content.
      Parameters:
      signature - the bytes to check
      length - the number of bytes to check
      Returns:
      true, if this stream is a zip archive stream, false otherwise
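
A signature check of this kind compares the leading bytes against known ZIP magic numbers. This page does not list which signatures the method accepts, so the sketch below assumes two: the local file header (50 4B 03 04) and the end of central directory record (50 4B 05 06, which is how an empty archive starts). The class and method names are hypothetical.

```java
final class ZipSignatureSketch {
    private static final byte[] LFH = {0x50, 0x4B, 0x03, 0x04};  // local file header
    private static final byte[] EOCD = {0x50, 0x4B, 0x05, 0x06}; // end of central directory (empty archive)

    /** Returns true if the first bytes look like the start of a zip archive. */
    static boolean looksLikeZip(byte[] signature, int length) {
        if (length < LFH.length) {
            return false; // too short to hold any signature
        }
        return startsWith(signature, LFH) || startsWith(signature, EOCD);
    }

    private static boolean startsWith(byte[] data, byte[] expected) {
        if (data.length < expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (data[i] != expected[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because only the leading bytes are inspected, a self-extracting archive that starts with an executable stub is rejected, which matches the limitation noted above.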
    • checksig

      private static boolean checksig(byte[] signature, byte[] expected)
    • closeEntry

      private void closeEntry() throws IOException
      Closes the current ZIP archive entry and positions the underlying stream to the beginning of the next entry. All per-entry variables and data structures are cleared.

      If the compressed size of this entry is included in the entry header, then any outstanding bytes are simply skipped from the underlying stream without uncompressing them. This allows an entry to be safely closed even if the compression method is unsupported.

      In case we don't know the compressed size of this entry or have already buffered too much data from the underlying stream to support uncompression, then the uncompression process is completed and the end position of the stream is adjusted based on the result of that process.

      Throws:
      IOException - if an error occurs
    • currentEntryHasOutstandingBytes

      private boolean currentEntryHasOutstandingBytes()
      If the compressed size of the current entry is included in the entry header and there are any outstanding bytes in the underlying stream, then this returns true.
      Returns:
      true, if current entry is determined to have outstanding bytes, false otherwise
    • drainCurrentEntryData

      private void drainCurrentEntryData() throws IOException
      Reads all data of the current entry from the underlying stream that hasn't been read yet.
      Throws:
      IOException
    • getBytesInflated

      private long getBytesInflated()
      Get the number of bytes Inflater has actually processed.

      For Java versions before Java 7, the getBytes* methods in Inflater/Deflater seem to return unsigned ints rather than longs, so the reported value starts over at 0 once it reaches 2^32.

      The stream knows how many bytes it has read, but not how many the Inflater actually consumed - it should be between the total number of bytes read for the entry and the total number minus the last read operation. Here we just try to make the value close enough to the bytes we've read by assuming the number of bytes consumed must be smaller than (or equal to) the number of bytes read but not smaller by more than 2^32.
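
The adjustment described above is plain arithmetic and can be sketched as follows. The names are hypothetical: reported stands for the possibly wrapped counter from the Inflater, and bytesReadFromStream for the number of bytes the stream has read for the entry.

```java
final class InflatedCountSketch {
    private static final long TWO_EXP_32 = 1L << 32;

    /**
     * Adds multiples of 2^32 to a possibly wrapped counter until it is at most
     * bytesReadFromStream and less than 2^32 below it.
     */
    static long adjust(long reported, long bytesReadFromStream) {
        long inflated = reported;
        if (bytesReadFromStream >= TWO_EXP_32) {
            while (inflated + TWO_EXP_32 <= bytesReadFromStream) {
                inflated += TWO_EXP_32;
            }
        }
        return inflated;
    }
}
```

For example, a reported value of 5 with 2^32 + 10 bytes read from the stream is adjusted to 2^32 + 5, while the same reported value with only 100 bytes read is left unchanged.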

    • fill

      private int fill() throws IOException
      Throws:
      IOException
    • readFully

      private void readFully(byte[] b) throws IOException
      Throws:
      IOException
    • readFully

      private void readFully(byte[] b, int off) throws IOException
      Throws:
      IOException
    • readRange

      private byte[] readRange(int len) throws IOException
      Throws:
      IOException
    • readDataDescriptor

      private void readDataDescriptor() throws IOException
      Throws:
      IOException
    • supportsDataDescriptorFor

      private boolean supportsDataDescriptorFor(ZipArchiveEntry entry)
      Whether this entry requires a data descriptor this library can work with.
      Returns:
      true if allowStoredEntriesWithDataDescriptor is true, the entry doesn't require any data descriptor or the method is DEFLATED or ENHANCED_DEFLATED.
    • supportsCompressedSizeFor

      private boolean supportsCompressedSizeFor(ZipArchiveEntry entry)
      Whether the compressed size for the entry is either known or not required by the compression method being used.
    • readStoredEntry

      private void readStoredEntry() throws IOException
      Caches a stored entry that uses the data descriptor.
      • Reads a stored entry until the signature of a local file header, central directory header or data descriptor has been found.
      • Stores all entry data in lastStoredEntry.
      • Rewinds the stream to position at the data descriptor.
      • Reads the data descriptor.

      After calling this method the entry should know its size, the entry's data is cached and the stream is positioned at the next local file or central directory header.

      Throws:
      IOException
    • bufferContainsSignature

      private boolean bufferContainsSignature(ByteArrayOutputStream bos, int offset, int lastRead, int expectedDDLen) throws IOException
      Checks whether the current buffer contains the signature of a "data descriptor", "local file header" or "central directory entry".

      If it contains such a signature, reads the data descriptor and positions the stream right after the data descriptor.

      Throws:
      IOException
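
The scan for a signature inside a buffer can be sketched as a simple search for the three four-byte markers. This is an illustration with hypothetical names; it only locates the signature and leaves out the rewinding and data descriptor parsing that the real method performs.

```java
final class SignatureScanSketch {
    /**
     * Returns the index of the first local file header (50 4B 03 04), central directory
     * header (50 4B 01 02) or data descriptor (50 4B 07 08) signature in buf[0, len),
     * or -1 if none is found. Assumes len is at most buf.length.
     */
    static int indexOfSignature(byte[] buf, int len) {
        for (int i = 0; i + 4 <= len; i++) {
            if (buf[i] != 0x50 || buf[i + 1] != 0x4B) {
                continue; // every signature starts with "PK"
            }
            boolean lfh = buf[i + 2] == 0x03 && buf[i + 3] == 0x04;
            boolean cfh = buf[i + 2] == 0x01 && buf[i + 3] == 0x02;
            boolean dd = buf[i + 2] == 0x07 && buf[i + 3] == 0x08;
            if (lfh || cfh || dd) {
                return i;
            }
        }
        return -1;
    }
}
```

The sketch also shows why this approach is fragile for STORED entries that are themselves archives: the embedded archive's own local file header is found first, so the scan stops inside the entry data.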
    • cacheBytesRead

      private int cacheBytesRead(ByteArrayOutputStream bos, int offset, int lastRead, int expecteDDLen)
      If the last read bytes could hold a data descriptor and an incomplete signature then save the last bytes to the front of the buffer and cache everything in front of the potential data descriptor into the given ByteArrayOutputStream.

      Data descriptor plus incomplete signature (3 bytes in the worst case) can be 20 bytes max.

    • pushback

      private void pushback(byte[] buf, int offset, int length) throws IOException
      Throws:
      IOException
    • skipRemainderOfArchive

      private void skipRemainderOfArchive() throws IOException
      Reads the stream until it finds the "End of central directory record" and consumes it as well.
      Throws:
      IOException
    • findEocdRecord

      private boolean findEocdRecord() throws IOException
      Reads forward until the signature of the "End of central directory" record is found.
      Throws:
      IOException
    • realSkip

      private void realSkip(long value) throws IOException
      Skips bytes by reading from the underlying stream rather than the (potentially inflating) archive stream - which skip(long) would do. Also updates bytes-read counter.
      Throws:
      IOException
    • readOneByte

      private int readOneByte() throws IOException
      Reads bytes by reading from the underlying stream rather than the (potentially inflating) archive stream - which read(byte[], int, int) would do. Also updates bytes-read counter.
      Throws:
      IOException
    • isFirstByteOfEocdSig

      private boolean isFirstByteOfEocdSig(int b)
    • isApkSigningBlock

      private boolean isApkSigningBlock(byte[] suspectLocalFileHeader) throws IOException
      Checks whether this might be an APK Signing Block.

      Unfortunately the APK signing block does not start with some kind of signature; it ends with one instead. It starts with a length, so what we do is parse the suspect length, skip ahead far enough, look for the signature and, if we've found it, return true.

      Parameters:
      suspectLocalFileHeader - the bytes read from the underlying stream in the expectation that they would hold the local file header of the next entry.
      Returns:
      true if this looks like an APK signing block
      Throws:
      IOException
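
The "parse the length, skip ahead, look for the signature" idea can be sketched on an in-memory byte array. The layout used here is an assumption taken from Android's APK Signature Scheme v2 description rather than from this page: an 8-byte little-endian length, followed by that many bytes, of which the last 16 are the ASCII magic "APK Sig Block 42". The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class ApkSigningBlockSketch {
    private static final byte[] MAGIC = "APK Sig Block 42".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if data at offset starts with a length whose block ends in the magic. */
    static boolean looksLikeApkSigningBlock(byte[] data, int offset) {
        if (offset < 0 || data.length - offset < 8) {
            return false; // not even room for the length field
        }
        long size = ByteBuffer.wrap(data, offset, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
        // The block must at least hold the repeated length (8 bytes) and the magic (16 bytes),
        // and it must fit into the data that is available.
        if (size < 8 + MAGIC.length || size > data.length - offset - 8) {
            return false;
        }
        int magicStart = (int) (offset + 8 + size - MAGIC.length);
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[magicStart + i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A streaming implementation cannot index into an array like this; it has to skip forward through the underlying stream, which is why the real method can throw an IOException.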