Class MergeSortWrapper
All Implemented Interfaces: SortImpl.SortResults
Since all batches are in memory, we don't want to use the usual merge algorithm as that makes a copy of the original batches (which were read from a spill file) to produce an output batch. Instead, we want to use the in-memory batches as-is. To do this, we use a selection vector 4 (SV4) as a global index into the collection of batches. The SV4 uses the upper two bytes as the batch index, and the lower two as an offset of a record within the batch.
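The SV4 layout described above (upper two bytes for the batch index, lower two for the record offset) can be sketched as a pair of packing helpers. This is a minimal illustration only: the class `Sv4Entry` and its method names are hypothetical and are not part of Drill's `SelectionVector4` API.

```java
// Hypothetical helper, for illustration only; not Drill's SelectionVector4.
final class Sv4Entry {
    private Sv4Entry() {}

    // Pack a batch index (upper 16 bits) and a record offset (lower 16 bits).
    static int pack(int batchIndex, int offset) {
        if (batchIndex < 0 || batchIndex > 0xFFFF || offset < 0 || offset > 0xFFFF) {
            throw new IllegalArgumentException("batch index and offset must each fit in 16 bits");
        }
        return (batchIndex << 16) | offset;
    }

    // Unsigned shift so that batch indexes above 0x7FFF decode correctly.
    static int batchIndex(int entry) {
        return entry >>> 16;
    }

    static int offset(int entry) {
        return entry & 0xFFFF;
    }

    public static void main(String[] args) {
        int entry = pack(3, 42);
        System.out.println(batchIndex(entry) + ":" + offset(entry)); // prints "3:42"
    }
}
```

With two bytes for each half, an entry in this sketch can address at most 65,536 batches of at most 65,536 records each.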
The merger ("M Sorter") populates the SV4 by scanning the set of in-memory batches, searching for the one with the lowest value of the sort key. The batch number and offset are placed into the SV4. The process continues until all records from all batches have an entry in the SV4.
The actual implementation uses an iterative merge to perform the above efficiently.
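The conceptual scan described above can be sketched as follows. This is not the iterative merge the class actually uses; it is a deliberately naive version that assumes each batch is a plain `int[]` of sort keys already sorted in ascending order. The class name `NaiveSv4Merge` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, naive illustration of populating an SV4; not Drill's merge code.
final class NaiveSv4Merge {
    private NaiveSv4Merge() {}

    // Assumes every batch is already sorted ascending and fits the 16-bit limits.
    static int[] merge(List<int[]> batches) {
        int total = 0;
        for (int[] batch : batches) {
            total += batch.length;
        }
        int[] sv4 = new int[total];
        int[] cursor = new int[batches.size()]; // next unread offset per batch

        for (int out = 0; out < total; out++) {
            int best = -1;
            // Scan all batches for the one whose next record has the lowest key.
            for (int b = 0; b < batches.size(); b++) {
                if (cursor[b] == batches.get(b).length) {
                    continue; // this batch is exhausted
                }
                if (best < 0 || batches.get(b)[cursor[b]] < batches.get(best)[cursor[best]]) {
                    best = b;
                }
            }
            // Record (batch number, offset) in the SV4; no row data is copied.
            sv4[out] = (best << 16) | cursor[best];
            cursor[best]++;
        }
        return sv4;
    }

    public static void main(String[] args) {
        List<int[]> batches = Arrays.asList(new int[] {1, 4, 9}, new int[] {2, 3, 10});
        StringBuilder keys = new StringBuilder();
        for (int entry : merge(batches)) {
            keys.append(batches.get(entry >>> 16)[entry & 0xFFFF]).append(' ');
        }
        System.out.println(keys.toString().trim()); // prints "1 2 3 4 9 10"
    }
}
```

Each output entry costs a scan over all batches here, which is the inefficiency that a smarter merge strategy avoids.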
A sort can only do a single merge. So, we do not attempt to share the generated class; we just generate it internally and discard it at completion of the merge.
The merge sorter only makes sense when we have at least one row. The caller must handle the special case of no rows.
-
Nested Class Summary
Nested Classes
-
Field Summary
Fields inherited from class org.apache.drill.exec.physical.impl.xsort.BaseSortWrapper
LEFT_MAPPING, MAIN_MAPPING, RIGHT_MAPPING
Fields inherited from class org.apache.drill.exec.physical.impl.xsort.BaseWrapper
context
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type, Method: Description
void close()
int getBatchCount()
getContainer(): Container into which results are delivered.
int getRecordCount()
getSv2()
getSv4()
void merge(List<InputBatch> batchGroups, int outputBatchSize): Merge the set of in-memory batches to produce a single logical output in the given destination container, indexed by an SV4.
boolean next(): The SV4 provides a built-in iterator that returns a virtual set of record batches so that the downstream operator need not consume the entire set of accumulated batches in a single step.
void updateOutputContainer(VectorContainer container, SelectionVector4 sv4, RecordBatch.IterOutcome outcome, BatchSchema schema)
Methods inherited from class org.apache.drill.exec.physical.impl.xsort.BaseSortWrapper
generateComparisons
Methods inherited from class org.apache.drill.exec.physical.impl.xsort.BaseWrapper
getInstance
-
Constructor Details
-
MergeSortWrapper
-
-
Method Details
-
merge
Merge the set of in-memory batches to produce a single logical output in the given destination container, indexed by an SV4.
Parameters:
batchGroups - the complete set of in-memory batches
outputBatchSize - output batch size for in-memory merge
-
next
public boolean next()
The SV4 provides a built-in iterator that returns a virtual set of record batches so that the downstream operator need not consume the entire set of accumulated batches in a single step.
Specified by:
next in interface SortImpl.SortResults
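The "virtual set of record batches" idea can be sketched as a window that slides over the SV4 entries, exposing at most one output batch worth of entries per call. This is an assumption-laden illustration: the class `Sv4Window`, its accessors, and the exact windowing rule are hypothetical and may differ from how Drill's `SelectionVector4` iterates.

```java
// Hypothetical windowed view over SV4 entries; not Drill's SelectionVector4.
final class Sv4Window {
    private final int[] sv4;
    private final int batchSize; // assumed to play the role of outputBatchSize
    private int start = 0;
    private int length = 0;

    Sv4Window(int[] sv4, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.sv4 = sv4;
        this.batchSize = batchSize;
    }

    // Advance to the next virtual batch; returns false once all entries are consumed.
    boolean next() {
        start += length; // skip past the window handed out previously
        if (start >= sv4.length) {
            length = 0;
            return false;
        }
        length = Math.min(batchSize, sv4.length - start);
        return true;
    }

    int start() {
        return start;
    }

    int length() {
        return length;
    }

    public static void main(String[] args) {
        Sv4Window window = new Sv4Window(new int[] {0, 65536, 65537, 1, 2}, 2);
        while (window.next()) {
            System.out.println("start=" + window.start() + " length=" + window.length());
        }
    }
}
```

A downstream consumer in this sketch reads only the entries inside the current window, so it never has to process the whole accumulated result in one step.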
-
close
public void close()
Specified by:
close in interface SortImpl.SortResults
-
getBatchCount
public int getBatchCount()
Specified by:
getBatchCount in interface SortImpl.SortResults
-
getRecordCount
public int getRecordCount()
Specified by:
getRecordCount in interface SortImpl.SortResults
-
getSv4
Specified by:
getSv4 in interface SortImpl.SortResults
-
updateOutputContainer
public void updateOutputContainer(VectorContainer container, SelectionVector4 sv4, RecordBatch.IterOutcome outcome, BatchSchema schema)
Specified by:
updateOutputContainer in interface SortImpl.SortResults
-
getSv2
Specified by:
getSv2 in interface SortImpl.SortResults
-
getContainer
Description copied from interface: SortImpl.SortResults
Container into which results are delivered. May be the original operator container, or may be a different one. This is the container that should be sent downstream. This is a fixed value for all returned results.
Specified by:
getContainer in interface SortImpl.SortResults
Returns:
-