public static class PriorityQueueCopierWrapper.BatchMerger extends Object implements SortImpl.SortResults, AutoCloseable
Input. Here the top line is a selection vector of indexes. The second line is a set of batch groups (separated by underscores) with letters indicating individual records:
[3 7 4 8 0 6 1] [5 3 6 8 2 0]
[eh_ad_ibf]     [r_qm_kn_p]
Output, assuming blocks of 5 records. Each pair of brackets represents one batch; the line as a whole represents the set of batches copied to the spill file.
[abcde] [fhikm] [npqr]
The copying operation also performs a merge: it copies values from the sources in sorted order. Consider a different example, in which we want to merge two input batches to produce a single output batch:
Input: [aceg] [bdfh] Output: [abcdefgh]
In the above, the input consists of two sorted batches. (In reality, the input batches have an associated selection vector, but that is omitted here and just the sorted values shown.) The output is a single batch with the merged records (indicated by letters) from the two input batches.
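The merge described above can be illustrated with a minimal, self-contained sketch. It is not Drill's copier: the class `MergeSketch`, the helper `Cursor`, and the use of a `String` to stand for a batch of single-letter records are all assumptions made for illustration. The sketch keeps one cursor per sorted input batch in a `java.util.PriorityQueue` and repeatedly takes the smallest current record.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSketch {

  /** Hypothetical cursor over one sorted input batch (not a Drill class). */
  private static final class Cursor {
    final String batch;
    int pos;

    Cursor(String batch) {
      this.batch = batch;
    }

    int current() {
      return batch.charAt(pos);
    }
  }

  /** Merges already-sorted batches of letters into one sorted batch. */
  public static String merge(List<String> sortedBatches) {
    // Order cursors by the record each one currently points at.
    Comparator<Cursor> byCurrentRecord = Comparator.comparingInt(Cursor::current);
    PriorityQueue<Cursor> queue = new PriorityQueue<>(byCurrentRecord);
    for (String batch : sortedBatches) {
      if (!batch.isEmpty()) {
        queue.add(new Cursor(batch));
      }
    }

    StringBuilder output = new StringBuilder();
    while (!queue.isEmpty()) {
      // Take the cursor with the smallest current record and copy that record.
      Cursor smallest = queue.poll();
      output.append((char) smallest.current());
      // Re-queue the cursor only if its batch has records left.
      smallest.pos++;
      if (smallest.pos < smallest.batch.length()) {
        queue.add(smallest);
      }
    }
    return output.toString();
  }

  public static void main(String[] args) {
    // Mirrors the example above: [aceg] [bdfh] -> [abcdefgh]
    System.out.println(merge(Arrays.asList("aceg", "bdfh")));
  }
}
```

Using a priority queue keeps each step proportional to the logarithm of the number of input batches, which matters when many spilled batches are merged at once.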
Here we bind the copier to the batchGroupList of sorted, buffered batches to be merged. We bind the copier output to outputContainer: the copier will write its merged "batches" of records to that container.
Calls to the next() method sequentially return merged batches of the desired row count.
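The calling pattern, in which repeated calls to next() each deliver one merged batch of a target size, can be sketched with a stand-in class. Everything here is hypothetical: `BatchedMergeSketch`, its constructor, and `getBatch()` are not part of Drill, and a `StringBuilder` merely plays the role of the output container. Only the method names next(), getRecordCount(), getBatchCount(), and close() are borrowed from the summary on this page, and their behavior here (for instance, next() returning false once the inputs are exhausted) is an assumption.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BatchedMergeSketch implements AutoCloseable {
  private final List<String> inputs;            // sorted input batches of letters
  private final PriorityQueue<int[]> queue;     // entries: {batch index, position}
  private final int targetRecordCount;          // desired rows per output batch
  private final StringBuilder output = new StringBuilder(); // stand-in container
  private int batchCount;

  public BatchedMergeSketch(List<String> sortedInputs, int targetRecordCount) {
    this.inputs = sortedInputs;
    this.targetRecordCount = targetRecordCount;
    // Order cursors by the record they currently point at.
    Comparator<int[]> byCurrentRecord =
        Comparator.comparingInt(c -> sortedInputs.get(c[0]).charAt(c[1]));
    this.queue = new PriorityQueue<>(byCurrentRecord);
    for (int i = 0; i < sortedInputs.size(); i++) {
      if (!sortedInputs.get(i).isEmpty()) {
        queue.add(new int[] {i, 0});
      }
    }
  }

  /** Fills the output with the next merged batch; false when nothing is left. */
  public boolean next() {
    output.setLength(0);
    while (output.length() < targetRecordCount && !queue.isEmpty()) {
      int[] cursor = queue.poll();
      String batch = inputs.get(cursor[0]);
      output.append(batch.charAt(cursor[1]));
      cursor[1]++;
      if (cursor[1] < batch.length()) {
        queue.add(cursor);
      }
    }
    if (output.length() == 0) {
      return false;
    }
    batchCount++;
    return true;
  }

  public int getRecordCount() { return output.length(); }

  public int getBatchCount() { return batchCount; }

  public String getBatch() { return output.toString(); }

  @Override
  public void close() {
    // A real merger would release buffers here; the sketch only drops its state.
    queue.clear();
    output.setLength(0);
  }

  public static void main(String[] args) {
    try (BatchedMergeSketch merger =
        new BatchedMergeSketch(Arrays.asList("aceg", "bdfh"), 5)) {
      while (merger.next()) {
        System.out.println(merger.getBatch() + " (" + merger.getRecordCount() + " records)");
      }
    }
  }
}
```

With the two input batches from the earlier example and a target of 5 rows, the loop yields a full batch followed by a shorter final batch holding the remaining records.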
| Modifier and Type | Method and Description |
|---|---|
| void | close() |
| int | getBatchCount() |
| VectorContainer | getContainer(): Container into which results are delivered. |
| long | getEstBatchSize(): Gets the estimated batch size, in bytes. |
| int | getRecordCount() |
| SelectionVector2 | getSv2() |
| SelectionVector4 | getSv4() |
| boolean | next(): Read the next merged batch. |
| void | updateOutputContainer(VectorContainer container, SelectionVector4 sv4, RecordBatch.IterOutcome outcome, BatchSchema schema) |
public boolean next()
next in interface SortImpl.SortResults

public void close()
close in interface AutoCloseable
close in interface SortImpl.SortResults

public int getRecordCount()
getRecordCount in interface SortImpl.SortResults

public int getBatchCount()
getBatchCount in interface SortImpl.SortResults

public long getEstBatchSize()

public SelectionVector4 getSv4()
getSv4 in interface SortImpl.SortResults

public void updateOutputContainer(VectorContainer container, SelectionVector4 sv4, RecordBatch.IterOutcome outcome, BatchSchema schema)
updateOutputContainer in interface SortImpl.SortResults

public SelectionVector2 getSv2()
getSv2 in interface SortImpl.SortResults

public VectorContainer getContainer()
Container into which results are delivered.
getContainer in interface SortImpl.SortResults

Copyright © 2021 The Apache Software Foundation. All rights reserved.