java.lang.Object
org.apache.commons.compress.archivers.tar.TarFile
All Implemented Interfaces:
Closeable, AutoCloseable

public class TarFile extends Object implements Closeable
The TarFile provides random access to UNIX tar archives.
Since:
1.21
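In a tar archive every entry is a 512-byte header record followed by its data, rounded up to whole records. Random access therefore comes down to scanning the headers once and remembering where each entry's data starts. The sketch below illustrates that idea on an in-memory byte array using only the JDK. It is not TarFile's implementation. The class and method names are hypothetical, and the sketch assumes a plain ustar header (name in bytes 0-99, octal size in bytes 124-135) while ignoring checksums, type flags, and PAX/GNU extension headers.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not TarFile's implementation: assumes a plain ustar layout and
// ignores checksums, type flags, name prefixes, and PAX/GNU extension headers.
final class TarIndexSketch {
    static final int RECORD = 512;

    /** Location of one entry's data inside the archive. */
    static final class Slot {
        final int offset;
        final int size;

        Slot(int offset, int size) {
            this.offset = offset;
            this.size = size;
        }
    }

    /** Scans the headers once and maps each entry name to where its data lives. */
    static Map<String, Slot> index(byte[] tar) {
        Map<String, Slot> slots = new LinkedHashMap<>();
        int pos = 0;
        // Stop at the first all-zero record, which marks the end of the archive.
        while (pos + RECORD <= tar.length && !isZeroRecord(tar, pos)) {
            String name = cString(tar, pos, 100);            // name: header bytes 0..99
            int size = (int) parseOctal(tar, pos + 124, 12); // size: header bytes 124..135
            int dataStart = pos + RECORD;
            slots.put(name, new Slot(dataStart, size));
            // Data is stored in whole 512-byte records, so round the size up.
            pos = dataStart + ((size + RECORD - 1) / RECORD) * RECORD;
        }
        return slots;
    }

    /** Random access: copies one entry's data directly from its recorded offset. */
    static byte[] read(byte[] tar, Slot slot) {
        byte[] out = new byte[slot.size];
        System.arraycopy(tar, slot.offset, out, 0, slot.size);
        return out;
    }

    private static boolean isZeroRecord(byte[] b, int off) {
        for (int i = off; i < off + RECORD; i++) {
            if (b[i] != 0) {
                return false;
            }
        }
        return true;
    }

    /** Decodes a NUL-terminated ASCII field. */
    private static String cString(byte[] b, int off, int len) {
        int end = off;
        while (end < off + len && b[end] != 0) {
            end++;
        }
        return new String(b, off, end - off, StandardCharsets.US_ASCII);
    }

    /** Parses an octal field, skipping leading spaces and stopping at the first non-octal byte. */
    private static long parseOctal(byte[] b, int off, int len) {
        int i = off;
        int end = off + len;
        while (i < end && b[i] == ' ') {
            i++;
        }
        long value = 0;
        while (i < end && b[i] >= '0' && b[i] <= '7') {
            value = (value << 3) + (b[i++] - '0');
        }
        return value;
    }
}
```

Once the index is built, reading an entry is a single copy from its recorded offset, with no need to re-read the entries before it. TarFile works on a SeekableByteChannel rather than a byte array.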
  • Field Details

    • SMALL_BUFFER_SIZE

      private static final int SMALL_BUFFER_SIZE
    • smallBuf

      private final byte[] smallBuf
    • archive

      private final SeekableByteChannel archive
    • zipEncoding

      private final ZipEncoding zipEncoding
      The encoding of the tar file
    • entries

      private final LinkedList<TarArchiveEntry> entries
    • blockSize

      private final int blockSize
    • lenient

      private final boolean lenient
    • recordSize

      private final int recordSize
    • recordBuffer

      private final ByteBuffer recordBuffer
    • globalSparseHeaders

      private final List<TarArchiveStructSparse> globalSparseHeaders
    • hasHitEOF

      private boolean hasHitEOF
    • currEntry

      private TarArchiveEntry currEntry
      The meta-data about the current entry
    • globalPaxHeaders

      private Map<String,String> globalPaxHeaders
    • sparseInputStreams

      private final Map<String,List<InputStream>> sparseInputStreams
  • Constructor Details

    • TarFile

      public TarFile(byte[] content) throws IOException
      Constructor for TarFile.
      Parameters:
      content - the content to use
      Throws:
      IOException - when reading the tar archive fails
    • TarFile

      public TarFile(byte[] content, String encoding) throws IOException
      Constructor for TarFile.
      Parameters:
      content - the content to use
      encoding - the encoding to use
      Throws:
      IOException - when reading the tar archive fails
    • TarFile

      public TarFile(byte[] content, boolean lenient) throws IOException
      Constructor for TarFile.
      Parameters:
      content - the content to use
      lenient - when set to true, illegal values for group/user ID, mode, device numbers, and timestamp are ignored and the corresponding fields are set to TarArchiveEntry.UNKNOWN. When set to false, such illegal values cause an exception instead.
      Throws:
      IOException - when reading the tar archive fails
    • TarFile

      public TarFile(File archive) throws IOException
      Constructor for TarFile.
      Parameters:
      archive - the file of the archive to use
      Throws:
      IOException - when reading the tar archive fails
    • TarFile

      public TarFile(File archive, String encoding) throws IOException
      Constructor for TarFile.
      Parameters:
      archive - the file of the archive to use
      encoding - the encoding to use
      Throws:
      IOException - when reading the tar archive fails
    • TarFile

      public TarFile(File archive, boolean lenient) throws IOException
      Constructor for TarFile.
      Parameters:
      archive - the file of the archive to use
      lenient - when set to true, illegal values for group/user ID, mode, device numbers, and timestamp are ignored and the corresponding fields are set to TarArchiveEntry.UNKNOWN. When set to false, such illegal values cause an exception instead.
      Throws:
      IOException - when reading the tar archive fails
    • TarFile

      public TarFile(Path archivePath) throws IOException
      Constructor for TarFile.
      Parameters:
      archivePath - the path of the archive to use
      Throws:
      IOException - when reading the tar archive fails
    • TarFile

      public TarFile(Path archivePath, String encoding) throws IOException
      Constructor for TarFile.
      Parameters:
      archivePath - the path of the archive to use
      encoding - the encoding to use
      Throws:
      IOException - when reading the tar archive fails
    • TarFile

      public TarFile(Path archivePath, boolean lenient) throws IOException
      Constructor for TarFile.
      Parameters:
      archivePath - the path of the archive to use
      lenient - when set to true, illegal values for group/user ID, mode, device numbers, and timestamp are ignored and the corresponding fields are set to TarArchiveEntry.UNKNOWN. When set to false, such illegal values cause an exception instead.
      Throws:
      IOException - when reading the tar archive fails
    • TarFile

      public TarFile(SeekableByteChannel content) throws IOException
      Constructor for TarFile.
      Parameters:
      content - the content to use
      Throws:
      IOException - when reading the tar archive fails
    • TarFile

      public TarFile(SeekableByteChannel archive, int blockSize, int recordSize, String encoding, boolean lenient) throws IOException
      Constructor for TarFile.
      Parameters:
      archive - the seekable byte channel to use
      blockSize - the block size to use
      recordSize - the record size to use
      encoding - the encoding to use
      lenient - when set to true, illegal values for group/user ID, mode, device numbers, and timestamp are ignored and the corresponding fields are set to TarArchiveEntry.UNKNOWN. When set to false, such illegal values cause an exception instead.
      Throws:
      IOException - when reading the tar archive fails
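The lenient flag accepted by several of the constructors above decides what happens when a numeric header field cannot be parsed. A minimal sketch of that documented behavior for an octal field such as mode or uid follows. The class and method are hypothetical, -1 is assumed here as the value of TarArchiveEntry.UNKNOWN, and treating a blank field as zero is a simplification of this sketch.

```java
// Hypothetical sketch of the lenient flag's documented effect on a numeric header field
// (for example mode or uid). -1 is assumed to stand in for TarArchiveEntry.UNKNOWN.
final class LenientFieldSketch {
    static final long UNKNOWN = -1;

    static long parseOctalField(String raw, boolean lenient) {
        // Header fields may be padded with NULs or spaces.
        String trimmed = raw.replace('\0', ' ').trim();
        if (trimmed.isEmpty()) {
            return 0; // this sketch treats a blank field as zero
        }
        if (trimmed.chars().allMatch(c -> c >= '0' && c <= '7')) {
            return Long.parseLong(trimmed, 8);
        }
        if (lenient) {
            return UNKNOWN; // lenient: ignore the illegal value
        }
        // strict: an illegal value is an error
        throw new IllegalArgumentException("Illegal octal value: '" + trimmed + "'");
    }
}
```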
  • Method Details

    • getNextTarEntry

      private TarArchiveEntry getNextTarEntry() throws IOException
      Gets the next entry in this tar archive. This skips to the end of the current entry, if there is one, positions the channel at the header of the next entry, reads that header, and returns a new TarArchiveEntry built from the header bytes. If there are no more entries in the archive, null is returned to indicate that the end of the archive has been reached.
      Returns:
      The next TarArchiveEntry in the archive, or null if there is no next entry.
      Throws:
      IOException - when reading the next TarArchiveEntry fails
    • readOldGNUSparse

      private void readOldGNUSparse() throws IOException
      Collects the sparse chunks of the current entry, including those stored in any additional sparse entries that immediately follow the current entry.
      Throws:
      IOException - when reading the sparse entry fails
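In the old GNU sparse format, sparse chunks that do not fit in an entry's header continue in extension records that immediately follow it. GNU tar defines each extension record (struct sparse_header) as 21 entries, each a 12-byte octal offset plus a 12-byte octal byte count, followed by an isextended flag at byte 504. The hypothetical sketch below parses one such record. It illustrates the layout and is not the code readOldGNUSparse runs.

```java
import java.util.List;

// Hypothetical sketch of one GNU "old format" sparse extension record, following GNU tar's
// struct sparse_header: 21 entries of offset[12] + numbytes[12] (octal), then an isextended byte.
final class GnuSparseExtensionSketch {
    static final int ENTRIES = 21;
    static final int ENTRY_LEN = 24;
    static final int IS_EXTENDED_OFFSET = ENTRIES * ENTRY_LEN; // byte 504

    /**
     * Appends {offset, numbytes} pairs from the record to chunks and returns true
     * if another extension record follows this one.
     */
    static boolean parse(byte[] record, List<long[]> chunks) {
        for (int i = 0; i < ENTRIES; i++) {
            int base = i * ENTRY_LEN;
            if (record[base] == 0) {
                break; // unused slots are NUL-filled
            }
            chunks.add(new long[] {octal(record, base, 12), octal(record, base + 12, 12)});
        }
        return record[IS_EXTENDED_OFFSET] != 0;
    }

    private static long octal(byte[] b, int off, int len) {
        long value = 0;
        for (int i = off; i < off + len && b[i] >= '0' && b[i] <= '7'; i++) {
            value = (value << 3) + (b[i] - '0');
        }
        return value;
    }
}
```

A reader would keep calling parse on the next record for as long as it returns true.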
    • buildSparseInputStreams

      private void buildSparseInputStreams() throws IOException
      Builds the input streams for the current sparse entry, consisting of all-zero input streams (for the holes) and non-zero input streams. Reading from a non-zero input stream actually reads the data from the underlying archive. The size of each input stream is determined by the sparse headers.
      Throws:
      IOException
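The idea behind buildSparseInputStreams can be sketched with JDK streams: holes in the sparse map become zero-filled streams, chunks become views of the stored data, and a SequenceInputStream joins them in order. This is a hypothetical illustration, not the library's code. It assumes the map is sorted by offset with no overlapping chunks, and it allocates arrays for the holes, where a production implementation would produce zeros without buffering them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: expands packed sparse data into the full file contents.
final class SparseStreamSketch {
    /**
     * sparseMap holds {offset, numbytes} pairs sorted by offset; stored holds the chunks'
     * bytes back to back; realSize is the size of the expanded file.
     */
    static InputStream expand(byte[] stored, List<long[]> sparseMap, long realSize) {
        List<InputStream> parts = new ArrayList<>();
        long pos = 0;      // position in the expanded file
        int storedPos = 0; // position in the stored (packed) data
        for (long[] chunk : sparseMap) {
            long offset = chunk[0];
            long numBytes = chunk[1];
            if (offset > pos) {
                parts.add(zeros(offset - pos)); // hole before this chunk
            }
            parts.add(new ByteArrayInputStream(stored, storedPos, (int) numBytes));
            storedPos += (int) numBytes;
            pos = offset + numBytes;
        }
        if (realSize > pos) {
            parts.add(zeros(realSize - pos)); // trailing hole
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    private static InputStream zeros(long n) {
        return new ByteArrayInputStream(new byte[Math.toIntExact(n)]);
    }

    static byte[] readAll(InputStream in) {
        try {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```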
    • applyPaxHeadersToCurrentEntry

      private void applyPaxHeadersToCurrentEntry(Map<String,String> headers, List<TarArchiveStructSparse> sparseHeaders) throws IOException
      Updates the current entry with the PAX headers that were read.
      Parameters:
      headers - Headers read from the pax header
      sparseHeaders - Sparse headers read from pax header
      Throws:
      IOException
    • paxHeaders

      private void paxHeaders() throws IOException

      For PAX Format 0.0, the sparse headers (GNU.sparse.offset and GNU.sparse.numbytes) may appear multiple times, and they look like:

       GNU.sparse.size=size
       GNU.sparse.numblocks=numblocks
       repeat numblocks times
         GNU.sparse.offset=offset
         GNU.sparse.numbytes=numbytes
       end repeat
       

      For PAX Format 0.1, the sparse headers are stored in a single variable, GNU.sparse.map:

       GNU.sparse.map
          Map of non-null data chunks. It is a string consisting of comma-separated values "offset,size[,offset-1,size-1...]"
       

      For PAX Format 1.X:
      The sparse map itself is stored in the file data block, preceding the actual file data. It consists of a series of decimal numbers delimited by newlines. The map is padded with nulls to the nearest block boundary. The first number gives the number of entries in the map. Following are map entries, each one consisting of two numbers giving the offset and size of the data block it describes.

      Throws:
      IOException
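To make the PAX 0.1 and 1.X layouts described above concrete, the hypothetical helpers below turn each into a list of {offset, size} pairs. They are stdlib-only sketches with minimal validation, not TarFile's parser. PAX 0.0 (repeated GNU.sparse.offset/GNU.sparse.numbytes headers) is not covered, and parseMap1x assumes the whole map fits in the bytes it is given.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers, not TarFile's parser: turn PAX 0.1 and 1.X sparse maps
// into {offset, size} pairs with only minimal validation.
final class PaxSparseMapSketch {
    /** PAX 0.1: GNU.sparse.map holds "offset,size[,offset-1,size-1...]". */
    static List<long[]> parseMap01(String value) {
        String[] parts = value.split(",");
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException("GNU.sparse.map needs offset/size pairs: " + value);
        }
        List<long[]> chunks = new ArrayList<>();
        for (int i = 0; i < parts.length; i += 2) {
            chunks.add(new long[] {Long.parseLong(parts[i].trim()), Long.parseLong(parts[i + 1].trim())});
        }
        return chunks;
    }

    /**
     * PAX 1.X: the data block starts with newline-delimited decimals (the entry count,
     * then an offset and a size per entry), NUL-padded to a block boundary.
     */
    static List<long[]> parseMap1x(byte[] dataBlock) {
        String[] lines = new String(dataBlock, StandardCharsets.US_ASCII).split("\n");
        int count = Integer.parseInt(lines[0].trim());
        if (lines.length < 1 + 2 * count) {
            throw new IllegalArgumentException("truncated sparse map");
        }
        List<long[]> chunks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // trim() also strips any NUL padding that follows the last number
            long offset = Long.parseLong(lines[1 + 2 * i].trim());
            long size = Long.parseLong(lines[2 + 2 * i].trim());
            chunks.add(new long[] {offset, size});
        }
        return chunks;
    }
}
```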
    • readGlobalPaxHeaders

      private void readGlobalPaxHeaders() throws IOException
      Throws:
      IOException
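Global PAX headers use the same record format as per-entry PAX headers, defined by POSIX pax: each record reads "LEN KEY=VALUE\n", where LEN is the decimal byte length of the entire record, including LEN itself and the trailing newline. The sketch below parses a header block in that format. The class is hypothetical, it does no error recovery, and it is not what readGlobalPaxHeaders actually runs.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the POSIX pax record format ("LEN KEY=VALUE\n", where LEN is the
// byte length of the whole record including LEN itself); not TarFile's parser.
final class PaxRecordSketch {
    static Map<String, String> parse(byte[] data) {
        Map<String, String> headers = new LinkedHashMap<>();
        int pos = 0;
        while (pos < data.length && data[pos] != 0) { // stop at NUL padding
            int space = pos;
            while (space < data.length && data[space] != ' ') {
                space++;
            }
            int len = Integer.parseInt(new String(data, pos, space - pos, StandardCharsets.US_ASCII));
            // The body is everything between the space and the record's trailing '\n'.
            String body = new String(data, space + 1, pos + len - space - 2, StandardCharsets.UTF_8);
            int eq = body.indexOf('=');
            String key = body.substring(0, eq);
            String value = body.substring(eq + 1);
            if (value.isEmpty()) {
                headers.remove(key); // POSIX: an empty value deletes the keyword
            } else {
                headers.put(key, value);
            }
            pos += len;
        }
        return headers;
    }
}
```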
    • getLongNameData

      private byte[] getLongNameData() throws IOException
      Get the next entry in this tar archive as longname data.
      Returns:
      The next entry in the archive as longname data, or null.
      Throws:
      IOException - on error
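GNU tar stores a name longer than the 100-byte header field in a preceding pseudo-entry of type 'L', whose data is the NUL-terminated name. A hypothetical helper that turns such data into a String might look like this. It assumes UTF-8, whereas TarFile decodes names with its configured encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes GNU long-name data, assuming UTF-8.
final class LongNameSketch {
    /** Drops trailing NUL terminators, then decodes the remaining bytes. */
    static String decodeLongName(byte[] data) {
        int len = data.length;
        while (len > 0 && data[len - 1] == 0) {
            len--;
        }
        return new String(data, 0, len, StandardCharsets.UTF_8);
    }
}
```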
    • skipRecordPadding

      private void skipRecordPadding() throws IOException
      The last record block should be written at full size, so this skips any padding used to fill a record after an entry.
      Throws:
      IOException - when skipping the padding of the record fails
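The padding is plain arithmetic: entry data occupies size bytes but is stored in whole records, so the bytes to skip after the data are the distance to the next record boundary. A hypothetical helper:

```java
// Hypothetical helper: distance from size to the next multiple of unit.
final class PaddingSketch {
    /** Bytes needed to round size up to a multiple of unit; 0 if already aligned. */
    static long padding(long size, long unit) {
        long rem = size % unit;
        return rem == 0 ? 0 : unit - rem;
    }
}
```

With unit set to the block size instead of the record size, the same formula gives the bytes remaining in the last block, which is the quantity consumeRemainderOfLastBlock deals with.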
    • repositionForwardTo

      private void repositionForwardTo(long newPosition) throws IOException
      Throws:
      IOException
    • repositionForwardBy

      private void repositionForwardBy(long offset) throws IOException
      Throws:
      IOException
    • throwExceptionIfPositionIsNotInArchive

      private void throwExceptionIfPositionIsNotInArchive() throws IOException
      Checks if the current position of the SeekableByteChannel is in the archive.
      Throws:
      IOException - If the position is not in the archive
    • getRecord

      private ByteBuffer getRecord() throws IOException
      Gets the next record in this tar archive. This skips over any remaining data in the current entry, if there is one, and positions the channel at the header of the next entry.

      If there are no more entries in the archive, null will be returned to indicate that the end of the archive has been reached. At the same time the hasHitEOF marker will be set to true.

      Returns:
      The next record in the archive, or null if the end of the archive has been reached.
      Throws:
      IOException - when reading the next record fails
    • tryToConsumeSecondEOFRecord

      private void tryToConsumeSecondEOFRecord() throws IOException
      Tries to read the next record, resetting the position in the archive if it is not an EOF record.

      This is meant to protect against cases where a tar implementation has written only one EOF record when two are expected. In practice this is unlikely to help: a non-conforming implementation probably won't pad the archive to full blocks of (by default) twenty records either, so reading has most likely already gone beyond the end of the archive anyway.

      Throws:
      IOException - if reading the record or resetting the position in the archive fails
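Per POSIX, a tar archive ends with two 512-byte records of zeros. The sketch below illustrates the read-then-reset logic of tryToConsumeSecondEOFRecord on an in-memory ByteBuffer standing in for the archive channel. The class and method names are hypothetical, and TarFile itself works on a SeekableByteChannel.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a ByteBuffer stands in for the archive channel.
final class EofRecordSketch {
    static final int RECORD = 512;

    /** An end-of-archive record is 512 bytes of zeros. */
    static boolean isEofRecord(ByteBuffer buf, int offset) {
        for (int i = offset; i < offset + RECORD; i++) {
            if (buf.get(i) != 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Called just after a first EOF record: consumes a second EOF record if one follows,
     * otherwise restores the previous position.
     */
    static void tryToConsumeSecondEofRecord(ByteBuffer buf) {
        if (buf.remaining() < RECORD) {
            return; // archive ends here: nothing to consume
        }
        int start = buf.position();
        buf.position(start + RECORD); // "read" the next record
        if (!isEofRecord(buf, start)) {
            buf.position(start); // not an EOF record: reset the position
        }
    }
}
```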
    • consumeRemainderOfLastBlock

      private void consumeRemainderOfLastBlock() throws IOException
      This method is invoked once the end of the archive is hit. It tries to consume the remaining bytes of the last block, assuming that the tool that created the archive padded the last block.
      Throws:
      IOException
    • readRecord

      private ByteBuffer readRecord() throws IOException
      Reads a record from the archive and returns its data.
      Returns:
      The record data or null if EOF has been hit.
      Throws:
      IOException - if reading from the archive fails
    • getEntries

      public List<TarArchiveEntry> getEntries()
      Gets all tar archive entries from this TarFile.
      Returns:
      All entries from the tar file
    • isEOFRecord

      private boolean isEOFRecord(ByteBuffer headerBuf)
    • isAtEOF

      protected final boolean isAtEOF()
    • setAtEOF

      protected final void setAtEOF(boolean b)
    • isDirectory

      private boolean isDirectory()
    • getInputStream

      public InputStream getInputStream(TarArchiveEntry entry) throws IOException
      Gets the input stream for the provided Tar Archive Entry.
      Parameters:
      entry - Entry to get the input stream from
      Returns:
      Input stream of the provided entry
      Throws:
      IOException - if the TAR archive is corrupted and the entry cannot be read
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException