Class MSBRadixSorter

java.lang.Object
org.apache.lucene.util.Sorter
org.apache.lucene.util.MSBRadixSorter
Direct Known Subclasses:
StableMSBRadixSorter, StringSorter.MSBStringRadixSorter

public abstract class MSBRadixSorter extends Sorter
Radix sorter for variable-length strings. This class sorts based on the most significant byte first and falls back to IntroSorter when the size of the buckets to sort becomes small.

This algorithm is NOT stable. Worst-case memory usage is about 2.3 KB.

  • Field Details

    • LEVEL_THRESHOLD

      protected static final int LEVEL_THRESHOLD
      See Also:
    • HISTOGRAM_SIZE

      protected static final int HISTOGRAM_SIZE
      See Also:
    • LENGTH_THRESHOLD

      protected static final int LENGTH_THRESHOLD
      See Also:
    • histograms

      private final int[][] histograms
    • endOffsets

      private final int[] endOffsets
    • commonPrefix

      private final int[] commonPrefix
    • maxLength

      protected final int maxLength
  • Constructor Details

    • MSBRadixSorter

      protected MSBRadixSorter(int maxLength)
      Sole constructor.
      Parameters:
      maxLength - the maximum length of keys, pass Integer.MAX_VALUE if unknown.
  • Method Details

    • byteAt

      protected abstract int byteAt(int i, int k)
      Return the k-th byte of the entry at index i, or -1 if its length is less than or equal to k. This may only be called with a value of i between 0 included and maxLength excluded.
    • getFallbackSorter

      protected Sorter getFallbackSorter(int k)
      Get a fall-back sorter which may assume that the first k bytes of all compared strings are equal.
    • compare

      protected final int compare(int i, int j)
      Description copied from class: Sorter
      Compare entries found in slots i and j. The contract for the returned value is the same as Comparator.compare(Object, Object).
      Specified by:
      compare in class Sorter
    • sort

      public void sort(int from, int to)
      Description copied from class: Sorter
      Sort the slice which starts at from (inclusive) and ends at to (exclusive).
      Specified by:
      sort in class Sorter
    • sort

      protected void sort(int from, int to, int k, int l)
    • shouldFallback

      protected boolean shouldFallback(int from, int to, int l)
    • radixSort

      private void radixSort(int from, int to, int k, int l)
      Parameters:
      k - the character number to compare
      l - the level of recursion
    • assertHistogram

      private boolean assertHistogram(int commonPrefixLength, int[] histogram)
    • getBucket

      protected int getBucket(int i, int k)
      Return a number for the k-th character between 0 and HISTOGRAM_SIZE.
    • computeCommonPrefixLengthAndBuildHistogram

      private int computeCommonPrefixLengthAndBuildHistogram(int from, int to, int k, int[] histogram)
      Build a histogram of the number of values per bucket and return a common prefix length for all visited values.
      See Also:
    • computeInitialCommonPrefixLength

      private int computeInitialCommonPrefixLength(int from, int k)
    • computeCommonPrefixLengthAndBuildHistogramPart1

      private int computeCommonPrefixLengthAndBuildHistogramPart1(int from, int to, int k, int[] histogram, int commonPrefixLength)
    • computeCommonPrefixLengthAndBuildHistogramPart2

      private int computeCommonPrefixLengthAndBuildHistogramPart2(int from, int to, int k, int[] histogram, int commonPrefixLength, int i)
    • buildHistogram

      protected void buildHistogram(int prefixCommonBucket, int prefixCommonLen, int from, int to, int k, int[] histogram)
      Build an histogram of the k-th characters of values occurring between offsets from and to, using getBucket(int, int).
    • sumHistogram

      private static void sumHistogram(int[] histogram, int[] endOffsets)
      Accumulate values of the histogram so that it does not store counts but start offsets. endOffsets will store the end offsets.
    • reorder

      protected void reorder(int from, int to, int[] startOffsets, int[] endOffsets, int k)
      Reorder based on start/end offsets for each bucket. When this method returns, startOffsets and endOffsets are equal.
      Parameters:
      startOffsets - start offsets per bucket
      endOffsets - end offsets per bucket