public class MergeSortWrapper extends BaseSortWrapper implements SortImpl.SortResults
Since all batches are in memory, we don't want to use the usual merge algorithm as that makes a copy of the original batches (which were read from a spill file) to produce an output batch. Instead, we want to use the in-memory batches as-is. To do this, we use a selection vector 4 (SV4) as a global index into the collection of batches. The SV4 uses the upper two bytes as the batch index, and the lower two as an offset of a record within the batch.
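The two-byte split described above can be illustrated with a little bit arithmetic. This is a minimal sketch, not Drill's `SelectionVector4` code: the class `Sv4IndexSketch` and its methods are hypothetical names introduced here, and it assumes each SV4 entry is a single 32-bit `int`.

```java
// Hypothetical helper for illustration only; not part of the Drill API.
final class Sv4IndexSketch {
    private Sv4IndexSketch() {}

    /** Packs a batch index (upper 16 bits) and a record offset (lower 16 bits) into one int. */
    static int encode(int batchIndex, int recordOffset) {
        if (batchIndex < 0 || batchIndex > 0xFFFF) {
            throw new IllegalArgumentException("batch index out of range: " + batchIndex);
        }
        if (recordOffset < 0 || recordOffset > 0xFFFF) {
            throw new IllegalArgumentException("record offset out of range: " + recordOffset);
        }
        return (batchIndex << 16) | recordOffset;
    }

    /** Extracts the batch index; the unsigned shift keeps large indexes non-negative. */
    static int batchIndex(int entry) {
        return entry >>> 16;
    }

    /** Extracts the offset of the record within its batch. */
    static int recordOffset(int entry) {
        return entry & 0xFFFF;
    }
}
```

With two bytes for each half, an entry in this sketch can address at most 65,536 batches of at most 65,536 records each.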
The merger ("M Sorter") populates the SV4 by scanning the set of in-memory batches, searching for the one with the lowest value of the sort key. The batch number and offset are placed into the SV4. The process continues until all records from all batches have an entry in the SV4.
The actual implementation uses an iterative merge to perform the above efficiently.
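The scan described above can be sketched naively as follows. This is an illustration of the idea only, not the iterative merge the actual implementation uses: the class `NaiveSv4MergeSketch` is a hypothetical name, each batch is modeled as an already-sorted `int[]` of sort keys, and the SV4 is modeled as an `int[]` with the batch index in the upper 16 bits of each entry.

```java
// Hypothetical illustration; Drill generates its merge code and works on value vectors.
final class NaiveSv4MergeSketch {
    private NaiveSv4MergeSketch() {}

    /**
     * Builds an SV4-style index over sorted batches without copying any keys.
     * Each entry holds (batch index << 16) | offset within that batch.
     */
    static int[] merge(int[][] batches) {
        int total = 0;
        for (int[] batch : batches) {
            total += batch.length;
        }
        int[] sv4 = new int[total];
        int[] cursor = new int[batches.length]; // next unread offset in each batch

        for (int out = 0; out < total; out++) {
            int best = -1;
            // Scan the head of every batch for the lowest remaining key.
            for (int b = 0; b < batches.length; b++) {
                if (cursor[b] == batches[b].length) {
                    continue; // this batch is exhausted
                }
                if (best < 0 || batches[b][cursor[b]] < batches[best][cursor[best]]) {
                    best = b;
                }
            }
            sv4[out] = (best << 16) | cursor[best];
            cursor[best]++;
        }
        return sv4;
    }

    /** Reads keys back through the index, leaving the batches untouched. */
    static int[] read(int[][] batches, int[] sv4) {
        int[] keys = new int[sv4.length];
        for (int i = 0; i < sv4.length; i++) {
            keys[i] = batches[sv4[i] >>> 16][sv4[i] & 0xFFFF];
        }
        return keys;
    }
}
```

The scan costs one comparison per batch for every output record, which is presumably why a more efficient merge is preferred in practice.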
A sort can only do a single merge. So, we do not attempt to share the generated class; we just generate it internally and discard it at completion of the merge.
The merge sorter only makes sense when we have at least one row. The caller must handle the special case of no rows.
| Modifier and Type | Class and Description |
|---|---|
| static class | MergeSortWrapper.State |

Inherited fields: LEFT_MAPPING, MAIN_MAPPING, RIGHT_MAPPING, context

| Constructor and Description |
|---|
| MergeSortWrapper(OperatorContext opContext, VectorContainer destContainer) |

| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| int | getBatchCount() |
| VectorContainer | getContainer() Container into which results are delivered. |
| int | getRecordCount() |
| SelectionVector2 | getSv2() |
| SelectionVector4 | getSv4() |
| void | merge(List<InputBatch> batchGroups, int outputBatchSize) Merge the set of in-memory batches to produce a single logical output in the given destination container, indexed by an SV4. |
| boolean | next() The SV4 provides a built-in iterator that returns a virtual set of record batches so that the downstream operator need not consume the entire set of accumulated batches in a single step. |
| void | updateOutputContainer(VectorContainer container, SelectionVector4 sv4, RecordBatch.IterOutcome outcome, BatchSchema schema) |

Inherited methods: generateComparisons, getInstance

public MergeSortWrapper(OperatorContext opContext, VectorContainer destContainer)

public void merge(List<InputBatch> batchGroups, int outputBatchSize)
batchGroups - the complete set of in-memory batches
outputBatchSize - output batch size for in-memory merge

public boolean next()
next in interface SortImpl.SortResults

public void close()
close in interface SortImpl.SortResults

public int getBatchCount()
getBatchCount in interface SortImpl.SortResults

public int getRecordCount()
getRecordCount in interface SortImpl.SortResults

public SelectionVector4 getSv4()
getSv4 in interface SortImpl.SortResults

public void updateOutputContainer(VectorContainer container, SelectionVector4 sv4, RecordBatch.IterOutcome outcome, BatchSchema schema)
updateOutputContainer in interface SortImpl.SortResults

public SelectionVector2 getSv2()
getSv2 in interface SortImpl.SortResults

public VectorContainer getContainer()
getContainer in interface SortImpl.SortResults

Copyright © 2021 The Apache Software Foundation. All rights reserved.