java.lang.Object
org.apache.commons.compress.compressors.bzip2.BlockSort

class BlockSort extends Object
Encapsulates the Burrows-Wheeler sorting algorithm needed by BZip2CompressorOutputStream.

This class is based on a Java port of Julian Seward's blocksort.c in his libbzip2

The Burrows-Wheeler transform is a reversible transform of the original data that is supposed to group similar bytes close to each other. The idea is to sort all permutations of the input and only keep the last byte of each permutation. E.g. for "Commons Compress" you'd get:

  CompressCommons
 Commons Compress
 CompressCommons
 essCommons Compr
 mmons CompressCo
 mons CompressCom
 mpressCommons Co
 ns CompressCommo
 ommons CompressC
 ompressCommons C
 ons CompressComm
 pressCommons Com
 ressCommons Comp
 s CompressCommon
 sCommons Compres
 ssCommons Compre
 

Which results in a new text "ss romooCCmmpnse", in adition the index of the first line that contained the original text is kept - in this case it is 1. The idea is that in a long English text all permutations that start with "he" are likely suffixes of a "the" and thus they end in "t" leading to a larger block of "t"s that can better be compressed by the subsequent Move-to-Front, run-length und Huffman encoding steps.

For more information see for example:

  • Field Details

    • QSORT_STACK_SIZE

      private static final int QSORT_STACK_SIZE
      See Also:
    • FALLBACK_QSORT_STACK_SIZE

      private static final int FALLBACK_QSORT_STACK_SIZE
      See Also:
    • STACK_SIZE

      private static final int STACK_SIZE
      See Also:
    • workDone

      private int workDone
    • workLimit

      private int workLimit
    • firstAttempt

      private boolean firstAttempt
    • stack_ll

      private final int[] stack_ll
    • stack_hh

      private final int[] stack_hh
    • stack_dd

      private final int[] stack_dd
    • mainSort_runningOrder

      private final int[] mainSort_runningOrder
    • mainSort_copy

      private final int[] mainSort_copy
    • mainSort_bigDone

      private final boolean[] mainSort_bigDone
    • ftab

      private final int[] ftab
    • quadrant

      private final char[] quadrant
      Array instance identical to Data's sfmap, both are used only temporarily and indepently, so we do not need to allocate additional memory.
    • FALLBACK_QSORT_SMALL_THRESH

      private static final int FALLBACK_QSORT_SMALL_THRESH
      See Also:
    • eclass

      private int[] eclass
    • INCS

      private static final int[] INCS
    • SMALL_THRESH

      private static final int SMALL_THRESH
      See Also:
    • DEPTH_THRESH

      private static final int DEPTH_THRESH
      See Also:
    • WORK_FACTOR

      private static final int WORK_FACTOR
      See Also:
    • SETMASK

      private static final int SETMASK
      See Also:
    • CLEARMASK

      private static final int CLEARMASK
      See Also:
  • Constructor Details

  • Method Details

    • blockSort

      void blockSort(BZip2CompressorOutputStream.Data data, int last)
    • fallbackSort

      final void fallbackSort(BZip2CompressorOutputStream.Data data, int last)
      Adapt fallbackSort to the expected interface of the rest of the code, in particular deal with the fact that block starts at offset 1 (in libbzip2 1.0.6 it starts at 0).
    • fallbackSimpleSort

      private void fallbackSimpleSort(int[] fmap, int[] eclass, int lo, int hi)
      Parameters:
      fmap - points to the index of the starting point of a permutation inside the block of data in the current partially sorted order
      eclass - points from the index of a character inside the block to the first index in fmap that contains the bucket of its suffix that is sorted in this step.
      lo - lower boundary of the fmap-interval to be sorted
      hi - upper boundary of the fmap-interval to be sorted
    • fswap

      private void fswap(int[] fmap, int zz1, int zz2)
      swaps two values in fmap
    • fvswap

      private void fvswap(int[] fmap, int yyp1, int yyp2, int yyn)
      swaps two intervals starting at yyp1 and yyp2 of length yyn inside fmap.
    • fmin

      private int fmin(int a, int b)
    • fpush

      private void fpush(int sp, int lz, int hz)
    • fpop

      private int[] fpop(int sp)
    • fallbackQSort3

      private void fallbackQSort3(int[] fmap, int[] eclass, int loSt, int hiSt)
      Parameters:
      fmap - points to the index of the starting point of a permutation inside the block of data in the current partially sorted order
      eclass - points from the index of a character inside the block to the first index in fmap that contains the bucket of its suffix that is sorted in this step.
      loSt - lower boundary of the fmap-interval to be sorted
      hiSt - upper boundary of the fmap-interval to be sorted
    • getEclass

      private int[] getEclass()
    • fallbackSort

      final void fallbackSort(int[] fmap, byte[] block, int nblock)
      Parameters:
      fmap - points to the index of the starting point of a permutation inside the block of data in the current partially sorted order
      block - the original data
      nblock - size of the block
    • mainSimpleSort

      private boolean mainSimpleSort(BZip2CompressorOutputStream.Data dataShadow, int lo, int hi, int d, int lastShadow)
      This is the most hammered method of this class.

      This is the version using unrolled loops. Normally I never use such ones in Java code. The unrolling has shown a noticable performance improvement on JRE 1.4.2 (Linux i586 / HotSpot Client). Of course it depends on the JIT compiler of the vm.

    • vswap

      private static void vswap(int[] fmap, int p1, int p2, int n)
    • med3

      private static byte med3(byte a, byte b, byte c)
    • mainQSort3

      private void mainQSort3(BZip2CompressorOutputStream.Data dataShadow, int loSt, int hiSt, int dSt, int last)
      Method "mainQSort3", file "blocksort.c", BZip2 1.0.2
    • mainSort

      final void mainSort(BZip2CompressorOutputStream.Data dataShadow, int lastShadow)