- All Implemented Interfaces:
Accountable
NOTE: this is a costly operation, as it must merge sort all terms, and may require non-trivial RAM once done. It's better to operate in segment-private ordinal space instead when possible.
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
private static class
private static class
-
Field Summary
Fields
Modifier and Type, Field, Description
private static final long
(package private) final LongValues
(package private) final LongValues
final IndexReader.CacheKey
Cache key of whoever asked for this awful thing
(package private) final long
(package private) final OrdinalMap.SegmentMap
(package private) final LongValues[]
(package private) final long
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
Constructors
Constructor, Description
OrdinalMap
(IndexReader.CacheKey owner, TermsEnum[] subs, OrdinalMap.SegmentMap segmentMap, float acceptableOverheadRatio) Here is how the OrdinalMap encodes the mapping from global ords to local segment ords.
-
Method Summary
Modifier and Type, Method, Description
static OrdinalMap
build
(IndexReader.CacheKey owner, SortedDocValues[] values, float acceptableOverheadRatio) Create an ordinal map that uses the number of unique values of each SortedDocValues instance as a weight.
static OrdinalMap
build
(IndexReader.CacheKey owner, SortedSetDocValues[] values, float acceptableOverheadRatio) Create an ordinal map that uses the number of unique values of each SortedSetDocValues instance as a weight.
static OrdinalMap
build
(IndexReader.CacheKey owner, TermsEnum[] subs, long[] weights, float acceptableOverheadRatio) Creates an ordinal map that allows mapping ords to/from a merged space from subs.
getChildResources
() Returns nested resources of this class.
int
getFirstSegmentNumber
(long globalOrd) Given a global ordinal, returns the index of the first segment that contains this term.
long
getFirstSegmentOrd
(long globalOrd) Given a global ordinal, returns the ordinal of this term in the first segment that contains it (the segment returned by getFirstSegmentNumber(long)).
getGlobalOrds
(int segmentIndex) Given a segment number, return a LongValues instance that maps segment ordinals to global ordinals.
long
getValueCount
() Returns the total number of unique terms in global ord space.
long
ramBytesUsed
() Return the memory usage of this object in bytes.
-
Field Details
-
BASE_RAM_BYTES_USED
private static final long BASE_RAM_BYTES_USED -
owner
Cache key of whoever asked for this awful thing -
valueCount
final long valueCount -
globalOrdDeltas
-
firstSegments
-
segmentToGlobalOrds
-
segmentMap
-
ramBytesUsed
final long ramBytesUsed
-
-
Constructor Details
-
OrdinalMap
OrdinalMap(IndexReader.CacheKey owner, TermsEnum[] subs, OrdinalMap.SegmentMap segmentMap, float acceptableOverheadRatio) throws IOException Here is how the OrdinalMap encodes the mapping from global ords to local segment ords. Assume we have the following global mapping for a doc values field:
bar -> 0, cat -> 1, dog -> 2, foo -> 3
And our index is split into 2 segments with the following local mappings for that same doc values field:
Segment 0: bar -> 0, foo -> 1
Segment 1: cat -> 0, dog -> 1
We will then encode delta between the local and global mapping in a packed 2d array keyed by (segmentIndex, segmentOrd). So the following 2d array will be created by OrdinalMap:
[[0, 2], [1, 1]]
The general algorithm for creating an OrdinalMap (skipping over some implementation details and optimizations) is as follows:
[1] Create and populate a PQ with (TermsEnum, index) tuples, where index is the position of the TermsEnum in an array of TermsEnums sorted by descending size. The PQ itself is ordered by TermsEnum.term().
[2] We then iterate through every term in the index, starting with the term at the top of the PQ. We keep track of a global ord, and record the difference between the global ord and TermsEnum.ord() in ordDeltas, which maps:
(segmentIndex, TermsEnum.ord()) -> globalTermOrdinal - TermsEnum.ord()
We then call BytesRefIterator.next() and update the PQ (remember that the PQ maintains an order based on TermsEnum.term(), which changes on each next() call). If the current term exists in some other segment, the top of the queue will contain that segment. If not, the top of the queue will contain a segment with the next term in the index, and the global ord is incremented.
[3] We use some information gathered in the previous step to optimize memory usage and build time in the following steps; for more detail, see the code.
[4] We will then populate segmentToGlobalOrds, which maps (segmentIndex, segmentOrd) -> globalOrd. Using the information we tracked in ordDeltas, we can construct this information relatively easily.
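The steps above can be sketched in plain Java without any Lucene dependency. This is a simplified illustration, not the actual implementation: the `Cursor` class is a hypothetical stand-in for `TermsEnum`, terms are compared as Java strings rather than as bytes, segments are not reordered by weight, and no packed structures are used. For the two-segment example above it reproduces the delta array `[[0, 2], [1, 1]]`.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class OrdinalMapSketch {

    /** Hypothetical stand-in for TermsEnum: walks one segment's sorted, unique terms. */
    static final class Cursor {
        final int segment;
        final String[] terms;
        int ord = 0; // current segment-local ordinal

        Cursor(int segment, String[] terms) {
            this.segment = segment;
            this.terms = terms;
        }

        String term() {
            return terms[ord];
        }
    }

    /** Steps [1] and [2]: returns deltas[segment][segmentOrd] = globalOrd - segmentOrd. */
    static long[][] buildDeltas(String[][] segments) {
        long[][] deltas = new long[segments.length][];
        // The queue is ordered by current term, with ties broken by segment index.
        PriorityQueue<Cursor> pq = new PriorityQueue<>((a, b) -> {
            int c = a.term().compareTo(b.term());
            return c != 0 ? c : Integer.compare(a.segment, b.segment);
        });
        for (int i = 0; i < segments.length; i++) {
            deltas[i] = new long[segments[i].length];
            if (segments[i].length > 0) {
                pq.add(new Cursor(i, segments[i]));
            }
        }
        long globalOrd = 0;
        while (!pq.isEmpty()) {
            String current = pq.peek().term();
            // Every segment positioned on the current term shares one global ord.
            while (!pq.isEmpty() && pq.peek().term().equals(current)) {
                Cursor top = pq.poll();
                deltas[top.segment][top.ord] = globalOrd - top.ord;
                top.ord++; // the analogue of calling next()
                if (top.ord < top.terms.length) {
                    pq.add(top); // re-insert under its new current term
                }
            }
            globalOrd++;
        }
        return deltas;
    }

    /** Step [4]: global[segment][segmentOrd] = segmentOrd + delta. */
    static long[][] toGlobalOrds(long[][] deltas) {
        long[][] global = new long[deltas.length][];
        for (int s = 0; s < deltas.length; s++) {
            global[s] = new long[deltas[s].length];
            for (int ord = 0; ord < deltas[s].length; ord++) {
                global[s][ord] = ord + deltas[s][ord];
            }
        }
        return global;
    }

    public static void main(String[] args) {
        String[][] segments = {{"bar", "foo"}, {"cat", "dog"}};
        long[][] deltas = buildDeltas(segments);
        System.out.println(Arrays.deepToString(deltas));               // [[0, 2], [1, 1]]
        System.out.println(Arrays.deepToString(toGlobalOrds(deltas))); // [[0, 3], [1, 2]]
    }
}
```

Storing deltas rather than absolute global ords keeps the stored values small when segments overlap heavily, which is presumably what makes the packed encoding worthwhile.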
- Parameters:
owner - For caching purposes
subs - A TermsEnum[], where each index corresponds to a segment
segmentMap - Provides two maps: newToOld, which lists segments in descending 'weight' order (see OrdinalMap.SegmentMap for more details), and oldToNew, which maps each original segment index to its position in newToOld
acceptableOverheadRatio - Acceptable overhead memory usage for some packed data structures
- Throws:
IOException
- throws IOException
-
-
Method Details
-
build
public static OrdinalMap build(IndexReader.CacheKey owner, SortedDocValues[] values, float acceptableOverheadRatio) throws IOException Create an ordinal map that uses the number of unique values of each SortedDocValues instance as a weight.
- Throws:
IOException
- See Also:
-
build
public static OrdinalMap build(IndexReader.CacheKey owner, SortedSetDocValues[] values, float acceptableOverheadRatio) throws IOException Create an ordinal map that uses the number of unique values of each SortedSetDocValues instance as a weight.
- Throws:
IOException
- See Also:
-
build
public static OrdinalMap build(IndexReader.CacheKey owner, TermsEnum[] subs, long[] weights, float acceptableOverheadRatio) throws IOException Creates an ordinal map that allows mapping ords to/from a merged space from subs.
- Parameters:
owner - a cache key
subs - TermsEnums that support TermsEnum.ord(). They need not be dense (e.g. can be FilteredTermsEnums).
weights - a weight for each sub. This is ideally correlated with the number of unique terms that each sub introduces compared to the other subs
- Throws:
IOException
- if an I/O error occurred.
-
getGlobalOrds
Given a segment number, return a LongValues instance that maps segment ordinals to global ordinals. -
getFirstSegmentOrd
public long getFirstSegmentOrd(long globalOrd) Given a global ordinal, returns the ordinal of this term in the first segment that contains it (the segment returned by getFirstSegmentNumber(long)). -
getFirstSegmentNumber
public int getFirstSegmentNumber(long globalOrd) Given a global ordinal, returns the index of the first segment that contains this term. -
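To make the relationship between getFirstSegmentNumber(long) and getFirstSegmentOrd(long) concrete, here is a toy model in plain Java built on the bar/cat/dog/foo example from the constructor description. It is only a sketch under stated assumptions: the class and method names are hypothetical, "first" is taken to mean the lowest segment index, and lookups scan the per-segment global ords instead of using precomputed structures as a real implementation presumably would.

```java
import java.util.Arrays;

class FirstSegmentSketch {
    // Assumed per-segment global ords from the example:
    // segment 0 holds bar (0) and foo (3); segment 1 holds cat (1) and dog (2).
    static final long[][] SEGMENT_TO_GLOBAL = {{0, 3}, {1, 2}};

    /** Index of the first segment whose terms include the given global ord. */
    static int firstSegmentNumber(long globalOrd) {
        for (int seg = 0; seg < SEGMENT_TO_GLOBAL.length; seg++) {
            // Global ords within one segment are ascending, so binary search applies.
            if (Arrays.binarySearch(SEGMENT_TO_GLOBAL[seg], globalOrd) >= 0) {
                return seg;
            }
        }
        throw new IllegalArgumentException("unknown global ord: " + globalOrd);
    }

    /** Segment-local ordinal of the term inside the segment found above. */
    static long firstSegmentOrd(long globalOrd) {
        int seg = firstSegmentNumber(globalOrd);
        return Arrays.binarySearch(SEGMENT_TO_GLOBAL[seg], globalOrd);
    }

    /** Number of distinct global ords across all segments. */
    static long valueCount() {
        return Arrays.stream(SEGMENT_TO_GLOBAL)
            .flatMapToLong(row -> Arrays.stream(row))
            .distinct()
            .count();
    }

    public static void main(String[] args) {
        long foo = 3; // global ord of "foo" in the example
        System.out.println(firstSegmentNumber(foo)); // 0
        System.out.println(firstSegmentOrd(foo));    // 1
        System.out.println(valueCount());            // 4
    }
}
```

With the pair (segment number, segment ord) in hand, a caller can resolve the term bytes from that one segment without consulting the others.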
getValueCount
public long getValueCount() Returns the total number of unique terms in global ord space. -
ramBytesUsed
public long ramBytesUsed() Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
- Specified by:
ramBytesUsed in interface Accountable
-
getChildResources
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
- Specified by:
getChildResources in interface Accountable
- See Also:
-