java.lang.Object
org.apache.commons.compress.archivers.zip.ZipFile
All Implemented Interfaces:
Closeable, AutoCloseable

public class ZipFile extends Object implements Closeable
Replacement for java.util.zip.ZipFile.

This class adds support for file name encodings other than UTF-8 (which is required to work on ZIP files created by native zip tools) and is able to skip a preamble like the one found in self-extracting archives. Furthermore, it returns instances of org.apache.commons.compress.archivers.zip.ZipArchiveEntry instead of java.util.zip.ZipEntry.

It doesn't extend java.util.zip.ZipFile as it would have to reimplement all methods anyway. Like java.util.zip.ZipFile, it uses SeekableByteChannel under the covers and supports compressed and uncompressed entries. As of Apache Commons Compress 1.3 it also transparently supports Zip64 extensions and thus individual entries and archives larger than 4 GB or with more than 65536 entries.

The method signatures mimic the ones of java.util.zip.ZipFile, with a couple of exceptions:

  • There is no getName method.
  • entries has been renamed to getEntries.
  • getEntries and getEntry return org.apache.commons.compress.archivers.zip.ZipArchiveEntry instances.
  • close is allowed to throw IOException.
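
A minimal usage sketch of the differences listed above, assuming Apache Commons Compress is on the classpath. The class ZipListing and its method listEntries are hypothetical names invented for this illustration; only the ZipFile and ZipArchiveEntry calls come from the library.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;

// Hypothetical helper, not part of Commons Compress.
final class ZipListing {

    // Returns "name: byteCount" for every readable, non-directory entry.
    static List<String> listEntries(File archive) throws IOException {
        List<String> result = new ArrayList<>();
        // close() may throw IOException, so try-with-resources is a natural fit.
        try (ZipFile zip = new ZipFile(archive)) {
            // getEntries() takes the place of java.util.zip.ZipFile.entries().
            Enumeration<ZipArchiveEntry> entries = zip.getEntries();
            while (entries.hasMoreElements()) {
                ZipArchiveEntry entry = entries.nextElement();
                if (entry.isDirectory() || !zip.canReadEntryData(entry)) {
                    continue; // skip directories and entries this class cannot read
                }
                long count = 0;
                try (InputStream in = zip.getInputStream(entry)) {
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        count += n;
                    }
                }
                result.add(entry.getName() + ": " + count);
            }
        }
        return result;
    }
}
```

Calling ZipListing.listEntries(new File("some.zip")) would return one line per readable entry, in central directory order.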
  • Field Details

    • HASH_SIZE

      private static final int HASH_SIZE
    • NIBLET_MASK

      static final int NIBLET_MASK
    • BYTE_SHIFT

      static final int BYTE_SHIFT
    • POS_0

      private static final int POS_0
    • POS_1

      private static final int POS_1
    • POS_2

      private static final int POS_2
    • POS_3

      private static final int POS_3
    • ONE_ZERO_BYTE

      private static final byte[] ONE_ZERO_BYTE
    • entries

      private final List<ZipArchiveEntry> entries
      List of entries in the order they appear inside the central directory.
    • nameMap

      private final Map<String,LinkedList<ZipArchiveEntry>> nameMap
      Maps each entry name to the list of ZipArchiveEntry instances that carry it (name -> actual entries).
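
      The shape of such a name index can be sketched with JDK collections alone. NameIndexSketch and its methods are hypothetical, and plain strings of the form "name@offset" stand in for ZipArchiveEntry instances; this is not the class's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the name index described above.
final class NameIndexSketch {
    private final Map<String, LinkedList<String>> nameMap = new HashMap<>();

    // Called once per entry, in central directory order.
    void add(String name, long offset) {
        nameMap.computeIfAbsent(name, k -> new LinkedList<>()).addLast(name + "@" + offset);
    }

    // First entry by that name, or null if there is none.
    String first(String name) {
        LinkedList<String> matches = nameMap.get(name);
        return matches != null ? matches.getFirst() : null;
    }

    // All entries by that name in insertion order; empty if there are none.
    List<String> all(String name) {
        LinkedList<String> matches = nameMap.get(name);
        if (matches == null) {
            return Collections.<String>emptyList();
        }
        return new ArrayList<>(matches);
    }
}
```

      Keeping a list per name is what lets duplicate names coexist: a single-entry lookup returns the head of the list, while a by-name iteration returns the whole list.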
    • encoding

      private final String encoding
      The encoding to use for file names and the file comment.

      For a list of possible values see http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html. Defaults to UTF-8.

    • zipEncoding

      private final ZipEncoding zipEncoding
      The zip encoding to use for file names and the file comment.
    • archiveName

      private final String archiveName
      File name of actual source.
    • archive

      private final SeekableByteChannel archive
      The actual data source.
    • useUnicodeExtraFields

      private final boolean useUnicodeExtraFields
      Whether to look for and use Unicode extra fields.
    • closed

      private volatile boolean closed
      Whether the file is closed.
    • isSplitZipArchive

      private final boolean isSplitZipArchive
      Whether the zip archive is a split zip archive.
    • dwordBuf

      private final byte[] dwordBuf
    • wordBuf

      private final byte[] wordBuf
    • cfhBuf

      private final byte[] cfhBuf
    • shortBuf

      private final byte[] shortBuf
    • dwordBbuf

      private final ByteBuffer dwordBbuf
    • wordBbuf

      private final ByteBuffer wordBbuf
    • cfhBbuf

      private final ByteBuffer cfhBbuf
    • shortBbuf

      private final ByteBuffer shortBbuf
    • centralDirectoryStartDiskNumber

      private long centralDirectoryStartDiskNumber
    • centralDirectoryStartRelativeOffset

      private long centralDirectoryStartRelativeOffset
    • centralDirectoryStartOffset

      private long centralDirectoryStartOffset
    • CFH_LEN

      private static final int CFH_LEN
      Length of a "central directory" entry structure without file name, extra fields or comment.
    • CFH_SIG

      private static final long CFH_SIG
    • MIN_EOCD_SIZE

      static final int MIN_EOCD_SIZE
      Length of the "End of central directory record" - which is supposed to be the last structure of the archive - without file comment.
    • MAX_EOCD_SIZE

      private static final int MAX_EOCD_SIZE
      Maximum length of the "End of central directory record" with a file comment.
    • CFD_LOCATOR_OFFSET

      private static final int CFD_LOCATOR_OFFSET
      Offset of the field that holds the location of the first central directory entry inside the "End of central directory record" relative to the start of the "End of central directory record".
    • CFD_DISK_OFFSET

      private static final int CFD_DISK_OFFSET
      Offset of the field that holds the disk number of the first central directory entry inside the "End of central directory record" relative to the start of the "End of central directory record".
    • CFD_LOCATOR_RELATIVE_OFFSET

      private static final int CFD_LOCATOR_RELATIVE_OFFSET
      Offset of the field that holds the location of the first central directory entry inside the "End of central directory record" relative to the "number of the disk with the start of the central directory".
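
      As an illustration of these offsets, the following sketch reads the two fields from a raw "End of central directory record". The field positions follow the ZIP format specification (signature 4 bytes, two disk numbers of 2 bytes each, two entry counts of 2 bytes each, directory size 4 bytes, directory offset 4 bytes, comment length 2 bytes); the class and constant names are hypothetical and are not the private constants documented here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only; names are invented for this sketch.
final class EocdSketch {
    static final int EOCD_SIGNATURE = 0x06054b50; // bytes 'P', 'K', 5, 6 read little-endian
    static final int MIN_EOCD_LENGTH = 22;        // record without a file comment
    static final int CD_DISK_FIELD = 6;           // 2 bytes: disk holding the first central directory entry
    static final int CD_OFFSET_FIELD = 16;        // 4 bytes: offset of that entry on its disk

    // Returns {diskNumber, centralDirectoryOffset}, both as unsigned values.
    static long[] parse(byte[] eocd) {
        if (eocd.length < MIN_EOCD_LENGTH) {
            throw new IllegalArgumentException("record shorter than " + MIN_EOCD_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(eocd).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != EOCD_SIGNATURE) {
            throw new IllegalArgumentException("missing end of central directory signature");
        }
        long disk = buf.getShort(CD_DISK_FIELD) & 0xFFFFL;
        long offset = buf.getInt(CD_OFFSET_FIELD) & 0xFFFFFFFFL;
        return new long[] {disk, offset};
    }
}
```

      Because the comment length is a 2-byte field, the whole record can be at most 22 + 65535 bytes long, which bounds how far back from the end of the archive a reader has to search for it.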
    • ZIP64_EOCDL_LENGTH

      private static final int ZIP64_EOCDL_LENGTH
      Length of the "Zip64 end of central directory locator" - which should be right in front of the "end of central directory record" if one is present at all.
    • ZIP64_EOCDL_LOCATOR_OFFSET

      private static final int ZIP64_EOCDL_LOCATOR_OFFSET
      Offset of the field that holds the location of the "Zip64 end of central directory record" inside the "Zip64 end of central directory locator" relative to the start of the "Zip64 end of central directory locator".
    • ZIP64_EOCD_CFD_LOCATOR_OFFSET

      private static final int ZIP64_EOCD_CFD_LOCATOR_OFFSET
      Offset of the field that holds the location of the first central directory entry inside the "Zip64 end of central directory record" relative to the start of the "Zip64 end of central directory record".
    • ZIP64_EOCD_CFD_DISK_OFFSET

      private static final int ZIP64_EOCD_CFD_DISK_OFFSET
      Offset of the field that holds the disk number of the first central directory entry inside the "Zip64 end of central directory record" relative to the start of the "Zip64 end of central directory record".
    • ZIP64_EOCD_CFD_LOCATOR_RELATIVE_OFFSET

      private static final int ZIP64_EOCD_CFD_LOCATOR_RELATIVE_OFFSET
      Offset of the field that holds the location of the first central directory entry inside the "Zip64 end of central directory record" relative to the "number of the disk with the start of the central directory".
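
      The Zip64 structures can be read the same way. This sketch assumes the layouts given in the ZIP format specification: a 20-byte locator (signature 4, disk number 4, record offset 8, total disks 4) and a record whose fixed part is signature 4, record size 8, two versions of 2 bytes each, two disk numbers of 4 bytes each, two entry counts of 8 bytes each, directory size 8 and directory offset 8. All names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only; names are invented for this sketch.
final class Zip64EocdSketch {
    static final int LOCATOR_LENGTH = 20;             // signature 4 + disk 4 + offset 8 + total disks 4
    static final int LOCATOR_RECORD_OFFSET_FIELD = 8; // 8 bytes: where the Zip64 EOCD record starts
    static final int RECORD_CD_DISK_FIELD = 20;       // 4 bytes: disk holding the first central directory entry
    static final int RECORD_CD_OFFSET_FIELD = 48;     // 8 bytes: offset of that entry on its disk

    private static ByteBuffer littleEndian(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Location of the Zip64 EOCD record, read from the locator.
    static long recordOffset(byte[] locator) {
        return littleEndian(locator).getLong(LOCATOR_RECORD_OFFSET_FIELD);
    }

    // Disk number of the first central directory entry, read from the record.
    static long centralDirectoryDisk(byte[] record) {
        return littleEndian(record).getInt(RECORD_CD_DISK_FIELD) & 0xFFFFFFFFL;
    }

    // Offset of the first central directory entry, read from the record.
    static long centralDirectoryOffset(byte[] record) {
        return littleEndian(record).getLong(RECORD_CD_OFFSET_FIELD);
    }
}
```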
    • LFH_OFFSET_FOR_FILENAME_LENGTH

      private static final long LFH_OFFSET_FOR_FILENAME_LENGTH
      Number of bytes in local file header up to the "length of file name" entry.
    • offsetComparator

      private final Comparator<ZipArchiveEntry> offsetComparator
      Compares two ZipArchiveEntries based on their offset within the archive.

      Won't return any meaningful results if one of the entries isn't part of the archive at all.

      Since:
      1.1
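
      A comparator of this kind could look roughly like the sketch below. The nested Entry class is a hypothetical stand-in for ZipArchiveEntry, and ordering by local header offset alone is a simplifying assumption; a split archive would presumably also need the disk number taken into account.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only; not the class's actual comparator.
final class PhysicalOrderSketch {

    // Hypothetical stand-in for ZipArchiveEntry.
    static final class Entry {
        final String name;
        final long localHeaderOffset;

        Entry(String name, long localHeaderOffset) {
            this.name = name;
            this.localHeaderOffset = localHeaderOffset;
        }
    }

    // Orders entries by where their local file header starts in the archive.
    static final Comparator<Entry> BY_OFFSET =
            Comparator.comparingLong((Entry e) -> e.localHeaderOffset);

    // Sorts a copy, so the central directory order of the input is preserved.
    static List<String> namesInPhysicalOrder(List<Entry> centralDirectoryOrder) {
        List<Entry> copy = new ArrayList<>(centralDirectoryOrder);
        copy.sort(BY_OFFSET);
        List<String> names = new ArrayList<>();
        for (Entry e : copy) {
            names.add(e.name);
        }
        return names;
    }
}
```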
  • Constructor Details

    • ZipFile

      public ZipFile(File f) throws IOException
      Opens the given file for reading, assuming "UTF8" for file names.
      Parameters:
      f - the archive.
      Throws:
      IOException - if an error occurs while reading the file.
    • ZipFile

      public ZipFile(String name) throws IOException
      Opens the given file for reading, assuming "UTF8" for file names.
      Parameters:
      name - name of the archive.
      Throws:
      IOException - if an error occurs while reading the file.
    • ZipFile

      public ZipFile(String name, String encoding) throws IOException
      Opens the given file for reading, assuming the specified encoding for file names and scanning for Unicode extra fields.
      Parameters:
      name - name of the archive.
      encoding - the encoding to use for file names, use null for the platform's default encoding
      Throws:
      IOException - if an error occurs while reading the file.
    • ZipFile

      public ZipFile(File f, String encoding) throws IOException
      Opens the given file for reading, assuming the specified encoding for file names and scanning for Unicode extra fields.
      Parameters:
      f - the archive.
      encoding - the encoding to use for file names, use null for the platform's default encoding
      Throws:
      IOException - if an error occurs while reading the file.
    • ZipFile

      public ZipFile(File f, String encoding, boolean useUnicodeExtraFields) throws IOException
      Opens the given file for reading, assuming the specified encoding for file names.
      Parameters:
      f - the archive.
      encoding - the encoding to use for file names, use null for the platform's default encoding
      useUnicodeExtraFields - whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
      Throws:
      IOException - if an error occurs while reading the file.
    • ZipFile

      public ZipFile(File f, String encoding, boolean useUnicodeExtraFields, boolean ignoreLocalFileHeader) throws IOException
      Opens the given file for reading, assuming the specified encoding for file names.

      By default the central directory record and all local file headers of the archive are read immediately, which may take a considerable amount of time when the archive is big. Setting the ignoreLocalFileHeader parameter to true restricts parsing to the central directory. However, the local file header may contain information that is not present in the central directory, and that information is not available when the argument is true. This includes the content of the Unicode extra field, so setting ignoreLocalFileHeader to true effectively means useUnicodeExtraFields is ignored. In addition, getRawInputStream(org.apache.commons.compress.archivers.zip.ZipArchiveEntry) always returns null if ignoreLocalFileHeader is true.

      Parameters:
      f - the archive.
      encoding - the encoding to use for file names, use null for the platform's default encoding
      useUnicodeExtraFields - whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
      ignoreLocalFileHeader - whether to ignore information stored inside the local file header (see the notes in this method's javadoc)
      Throws:
      IOException - if an error occurs while reading the file.
      Since:
      1.19
    • ZipFile

      public ZipFile(SeekableByteChannel channel) throws IOException
      Opens the given channel for reading, assuming "UTF8" for file names.

      SeekableInMemoryByteChannel allows you to read from an in-memory archive.

      Parameters:
      channel - the archive.
      Throws:
      IOException - if an error occurs while reading the file.
      Since:
      1.13
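
      A sketch of reading an in-memory archive through this constructor, assuming Apache Commons Compress is on the classpath. InMemoryZipSketch and its methods are hypothetical; the archive bytes are produced with the JDK's ZipOutputStream purely to have something to read.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;
import org.apache.commons.compress.utils.SeekableInMemoryByteChannel;

// Hypothetical helper, not part of Commons Compress.
final class InMemoryZipSketch {

    // Builds a one-entry archive in memory.
    static byte[] buildArchive(String name, String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(name));
            out.write(text.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads one entry back as UTF-8 text, or returns null if it is absent.
    static String readEntry(byte[] archive, String name) throws IOException {
        try (ZipFile zip = new ZipFile(new SeekableInMemoryByteChannel(archive))) {
            ZipArchiveEntry entry = zip.getEntry(name);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }
}
```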
    • ZipFile

      public ZipFile(SeekableByteChannel channel, String encoding) throws IOException
      Opens the given channel for reading, assuming the specified encoding for file names.

      SeekableInMemoryByteChannel allows you to read from an in-memory archive.

      Parameters:
      channel - the archive.
      encoding - the encoding to use for file names, use null for the platform's default encoding
      Throws:
      IOException - if an error occurs while reading the file.
      Since:
      1.13
    • ZipFile

      public ZipFile(SeekableByteChannel channel, String archiveName, String encoding, boolean useUnicodeExtraFields) throws IOException
      Opens the given channel for reading, assuming the specified encoding for file names.

      SeekableInMemoryByteChannel allows you to read from an in-memory archive.

      Parameters:
      channel - the archive.
      archiveName - name of the archive, used for error messages only.
      encoding - the encoding to use for file names, use null for the platform's default encoding
      useUnicodeExtraFields - whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
      Throws:
      IOException - if an error occurs while reading the file.
      Since:
      1.13
    • ZipFile

      public ZipFile(SeekableByteChannel channel, String archiveName, String encoding, boolean useUnicodeExtraFields, boolean ignoreLocalFileHeader) throws IOException
      Opens the given channel for reading, assuming the specified encoding for file names.

      SeekableInMemoryByteChannel allows you to read from an in-memory archive.

      By default the central directory record and all local file headers of the archive are read immediately, which may take a considerable amount of time when the archive is big. Setting the ignoreLocalFileHeader parameter to true restricts parsing to the central directory. However, the local file header may contain information that is not present in the central directory, and that information is not available when the argument is true. This includes the content of the Unicode extra field, so setting ignoreLocalFileHeader to true effectively means useUnicodeExtraFields is ignored. In addition, getRawInputStream(org.apache.commons.compress.archivers.zip.ZipArchiveEntry) always returns null if ignoreLocalFileHeader is true.

      Parameters:
      channel - the archive.
      archiveName - name of the archive, used for error messages only.
      encoding - the encoding to use for file names, use null for the platform's default encoding
      useUnicodeExtraFields - whether to use InfoZIP Unicode Extra Fields (if present) to set the file names.
      ignoreLocalFileHeader - whether to ignore information stored inside the local file header (see the notes in this method's javadoc)
      Throws:
      IOException - if an error occurs while reading the file.
      Since:
      1.19
    • ZipFile

      private ZipFile(SeekableByteChannel channel, String archiveName, String encoding, boolean useUnicodeExtraFields, boolean closeOnError, boolean ignoreLocalFileHeader) throws IOException
      Throws:
      IOException
  • Method Details

    • getEncoding

      public String getEncoding()
      The encoding to use for file names and the file comment.
      Returns:
      null if using the platform's default character encoding.
    • close

      public void close() throws IOException
      Closes the archive.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException - if an error occurs closing the archive.
    • closeQuietly

      public static void closeQuietly(ZipFile zipfile)
      Closes a ZipFile quietly: no IOException is thrown, and a null parameter is ignored.
      Parameters:
      zipfile - file to close, can be null
    • getEntries

      public Enumeration<ZipArchiveEntry> getEntries()
      Returns all entries.

      Entries will be returned in the same order they appear within the archive's central directory.

      Returns:
      all entries as ZipArchiveEntry instances
    • getEntriesInPhysicalOrder

      public Enumeration<ZipArchiveEntry> getEntriesInPhysicalOrder()
      Returns all entries in physical order.

      Entries will be returned in the same order their contents appear within the archive.

      Returns:
      all entries as ZipArchiveEntry instances
      Since:
      1.1
    • getEntry

      public ZipArchiveEntry getEntry(String name)
      Returns a named entry - or null if no entry by that name exists.

      If multiple entries with the same name exist the first entry in the archive's central directory by that name is returned.

      Parameters:
      name - name of the entry.
      Returns:
      the ZipArchiveEntry corresponding to the given name - or null if not present.
    • getEntries

      public Iterable<ZipArchiveEntry> getEntries(String name)
      Returns all named entries in the same order they appear within the archive's central directory.
      Parameters:
      name - name of the entry.
      Returns:
      the Iterable<ZipArchiveEntry> corresponding to the given name
      Since:
      1.6
    • getEntriesInPhysicalOrder

      public Iterable<ZipArchiveEntry> getEntriesInPhysicalOrder(String name)
      Returns all named entries in the same order their contents appear within the archive.
      Parameters:
      name - name of the entry.
      Returns:
      the Iterable<ZipArchiveEntry> corresponding to the given name
      Since:
      1.6
    • canReadEntryData

      public boolean canReadEntryData(ZipArchiveEntry ze)
      Whether this class is able to read the given entry.

      May return false if the entry is set up to use encryption or a compression method that hasn't been implemented yet.

      Parameters:
      ze - the entry
      Returns:
      whether this class is able to read the given entry.
      Since:
      1.1
    • getRawInputStream

      public InputStream getRawInputStream(ZipArchiveEntry ze)
      Expose the raw stream of the archive entry (compressed form).

      This method makes no attempt to interpret the payload of the stream; it is intended only for passing the data on to somewhere else.

      Parameters:
      ze - The entry to get the stream for
      Returns:
      The raw input stream containing (possibly) compressed data.
      Since:
      1.11
    • copyRawEntries

      public void copyRawEntries(ZipArchiveOutputStream target, ZipArchiveEntryPredicate predicate) throws IOException
      Transfers selected entries from this ZipFile to a given ZipArchiveOutputStream. Compression and all other attributes will be as in this file.

      This method transfers entries based on the central directory of the zip file.

      Parameters:
      target - The zipArchiveOutputStream to write the entries to
      predicate - A predicate that selects which entries to write
      Throws:
      IOException - on error
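
      A sketch of a raw copy that keeps only entries with a given name suffix, assuming Apache Commons Compress is on the classpath. RawCopySketch and copyBySuffix are hypothetical names; the predicate is written as a lambda on the assumption that ZipArchiveEntryPredicate declares a single test method.

```java
import java.io.File;
import java.io.IOException;

import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;
import org.apache.commons.compress.archivers.zip.ZipFile;

// Hypothetical helper, not part of Commons Compress.
final class RawCopySketch {

    // Copies every entry whose name ends with the suffix, without recompressing it.
    static void copyBySuffix(File source, File target, String suffix) throws IOException {
        try (ZipFile in = new ZipFile(source);
             ZipArchiveOutputStream out = new ZipArchiveOutputStream(target)) {
            in.copyRawEntries(out, entry -> entry.getName().endsWith(suffix));
        }
    }
}
```

      Since the entries are moved in their stored form, this approach avoids a decompress and recompress round trip when filtering or merging archives.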
    • getInputStream

      public InputStream getInputStream(ZipArchiveEntry ze) throws IOException
      Returns an InputStream for reading the contents of the given entry.
      Parameters:
      ze - the entry to get the stream for.
      Returns:
      a stream to read the entry from. The returned stream implements InputStreamStatistics.
      Throws:
      IOException - if unable to create an input stream from the zipentry
    • getUnixSymlink

      public String getUnixSymlink(ZipArchiveEntry entry) throws IOException

      Convenience method to return the entry's content as a String if isUnixSymlink() returns true for it, otherwise returns null.

      This method assumes the symbolic link's file name uses the same encoding that has been specified for this ZipFile.

      Parameters:
      entry - ZipArchiveEntry object that represents the symbolic link
      Returns:
      entry's content as a String
      Throws:
      IOException - problem with content's input stream
      Since:
      1.5
    • finalize

      protected void finalize() throws Throwable
      Ensures that the close method of this zipfile is called when there are no more references to it.
      Overrides:
      finalize in class Object
      Throws:
      Throwable
    • populateFromCentralDirectory

      private Map<ZipArchiveEntry,ZipFile.NameAndComment> populateFromCentralDirectory() throws IOException
      Reads the central directory of the given archive and populates the internal tables with ZipArchiveEntry instances.

      The ZipArchiveEntrys will know all data that can be obtained from the central directory alone, but not the data that requires the local file header or additional data to be read.

      Returns:
      a map of zipentries that didn't have the language encoding flag set when read.
      Throws:
      IOException
    • readCentralDirectoryEntry

      private void readCentralDirectoryEntry(Map<ZipArchiveEntry,ZipFile.NameAndComment> noUTF8Flag) throws IOException
      Reads an individual entry of the central directory, creates a ZipArchiveEntry from it and adds it to the global maps.
      Parameters:
      noUTF8Flag - map used to collect entries that don't have their UTF-8 flag set and whose name will be set by data read from the local file header later. The current entry may be added to this map.
      Throws:
      IOException
    • sanityCheckLFHOffset

      private void sanityCheckLFHOffset(ZipArchiveEntry ze) throws IOException
      Throws:
      IOException
    • setSizesAndOffsetFromZip64Extra

      private void setSizesAndOffsetFromZip64Extra(ZipArchiveEntry ze) throws IOException
      If the entry holds a Zip64 extended information extra field, reads the sizes from there if the entry's sizes are set to 0xFFFFFFFF, and does the same for the offset of the local file header.

      Ensures the Zip64 extra either knows both compressed and uncompressed size or neither, as the internal logic in ExtraFieldUtils forces the field to create local header data even if they are never used - and here a field with only one size would be invalid.

      Throws:
      IOException
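
      The sentinel rule described above can be sketched in isolation. Zip64SizeSketch and resolve are hypothetical names, and rejecting a missing Zip64 value with an IllegalArgumentException is a choice made for this sketch rather than the class's documented behaviour.

```java
// Illustrative only; not the class's actual code.
final class Zip64SizeSketch {
    // A 32-bit header field holding this value defers to the Zip64 extra field.
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;

    // headerValue: unsigned 32-bit value from the central directory entry.
    // zip64Value: matching 64-bit value from the Zip64 extra field, or null if absent.
    static long resolve(long headerValue, Long zip64Value) {
        if (headerValue != ZIP64_MAGIC) {
            return headerValue; // the 32-bit field is authoritative
        }
        if (zip64Value == null) {
            throw new IllegalArgumentException("header defers to Zip64 but the extra field has no value");
        }
        return zip64Value;
    }
}
```

      The same rule would apply to the compressed size, the uncompressed size and the local file header offset, each of which has its own 32-bit field in the central directory entry.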
    • positionAtCentralDirectory

      private void positionAtCentralDirectory() throws IOException
      Searches for either the "Zip64 end of central directory locator" or the "End of central dir record", parses it and positions the stream at the first central directory record.
      Throws:
      IOException
    • positionAtCentralDirectory64

      private void positionAtCentralDirectory64() throws IOException
      Parses the "Zip64 end of central directory locator", finds the "Zip64 end of central directory record" using the parsed information, parses that and positions the stream at the first central directory record. Expects stream to be positioned right behind the "Zip64 end of central directory locator"'s signature.
      Throws:
      IOException
    • positionAtCentralDirectory32

      private void positionAtCentralDirectory32() throws IOException
      Parses the "End of central dir record" and positions the stream at the first central directory record. Expects stream to be positioned at the beginning of the "End of central dir record".
      Throws:
      IOException
    • positionAtEndOfCentralDirectoryRecord

      private void positionAtEndOfCentralDirectoryRecord() throws IOException
      Searches for the "End of central dir record" and positions the stream at its start.
      Throws:
      IOException
    • tryToLocateSignature

      private boolean tryToLocateSignature(long minDistanceFromEnd, long maxDistanceFromEnd, byte[] sig) throws IOException
      Searches the archive backwards from minDistanceFromEnd to maxDistanceFromEnd for the given signature and positions the archive right at the signature if it has been found.
      Throws:
      IOException
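
      The backward search can be sketched over a byte array. This is a simplification: the method documented here works on the archive's channel and repositions it, whereas the hypothetical locate below only returns the index it found.

```java
import java.util.Arrays;

// Illustrative only; names are invented for this sketch.
final class SignatureSearchSketch {

    // Scans backwards, starting minDistanceFromEnd bytes before the end and
    // giving up maxDistanceFromEnd bytes before the end (or at index 0).
    // Returns the index where sig starts, or -1 if it was not found.
    static long locate(byte[] archive, long minDistanceFromEnd, long maxDistanceFromEnd, byte[] sig) {
        long stopAt = Math.max(0L, archive.length - maxDistanceFromEnd);
        for (long pos = archive.length - minDistanceFromEnd; pos >= stopAt; pos--) {
            int from = (int) pos;
            if (from + sig.length <= archive.length
                    && Arrays.equals(Arrays.copyOfRange(archive, from, from + sig.length), sig)) {
                return pos;
            }
        }
        return -1L;
    }
}
```

      Starting at the minimum distance skips positions where the record could not fit, and scanning from the end means the match closest to the end of the archive is found first.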
    • skipBytes

      private void skipBytes(int count) throws IOException
      Skips the given number of bytes or throws an EOFException if skipping failed.
      Throws:
      IOException
    • resolveLocalFileHeaderData

      private void resolveLocalFileHeaderData(Map<ZipArchiveEntry,ZipFile.NameAndComment> entriesWithoutUTF8Flag) throws IOException
      Walks through all recorded entries and adds the data available from the local file header.

      Also records the offsets for the data to read from the entries.

      Throws:
      IOException
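
      How a data offset follows from a local file header can be sketched as below, assuming the layout in the ZIP format specification: 26 bytes precede the 2-byte file name length, the 2-byte extra field length follows it, and the entry's data starts after the 30 fixed bytes plus the name and the extra field. DataOffsetSketch and its constants are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only; not the class's actual code.
final class DataOffsetSketch {
    static final int LFH_SIGNATURE = 0x04034b50; // bytes 'P', 'K', 3, 4 read little-endian
    static final int FILE_NAME_LENGTH_FIELD = 26; // bytes in the header before the name length
    static final int LFH_FIXED_LENGTH = 30;       // header length without name and extra field

    // Returns the absolute offset of the entry's data within the archive bytes.
    static long dataOffset(byte[] archive, int localHeaderOffset) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(localHeaderOffset) != LFH_SIGNATURE) {
            throw new IllegalArgumentException("no local file header at offset " + localHeaderOffset);
        }
        int nameLength = buf.getShort(localHeaderOffset + FILE_NAME_LENGTH_FIELD) & 0xFFFF;
        int extraLength = buf.getShort(localHeaderOffset + FILE_NAME_LENGTH_FIELD + 2) & 0xFFFF;
        return (long) localHeaderOffset + LFH_FIXED_LENGTH + nameLength + extraLength;
    }
}
```

      The two lengths have to be read from the local header itself, because the extra field stored there may differ in length from the one in the central directory.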
    • fillNameMap

      private void fillNameMap()
    • setDataOffset

      private int[] setDataOffset(ZipArchiveEntry ze) throws IOException
      Throws:
      IOException
    • getDataOffset

      private long getDataOffset(ZipArchiveEntry ze) throws IOException
      Throws:
      IOException
    • startsWithLocalFileHeader

      private boolean startsWithLocalFileHeader() throws IOException
      Checks whether the archive starts with a local file header (LFH). If it doesn't, it may be an empty archive.
      Throws:
      IOException
    • createBoundedInputStream

      private BoundedArchiveInputStream createBoundedInputStream(long start, long remaining)
      Creates a new BoundedArchiveInputStream according to the implementation of the underlying archive channel.