Interface ResultSetCopier
- All Known Implementing Classes:
ResultSetCopierImpl
Works to create full output batches to minimize per-batch overhead and to eliminate unnecessary empty batches if no rows are copied.
The output batches are assumed to have the same schema as input batches. (No projection occurs.) The output schema will change each time the input schema changes. (For an SV4, the upstream operator must have ensured that all batches covered by the SV4 have the same schema.)
This implementation works with a single stream of batches which, following Drill's rules, must consist of the same set of vectors on each non-schema-change batch.
Protocol
Overall lifecycle:
- Create an instance of the ResultSetCopierImpl class, passing the input row set reader to the constructor.
- Loop to process each output batch as shown below. That is, continually process calls to the BatchIterator.next() method.
- Call close().
To build each output batch:
public IterOutcome next() {
  copier.startOutputBatch();
  // Keep pulling input batches until the output batch is full.
  while (!copier.isOutputFull()) {
    IterOutcome innerResult = inner.next();
    if (innerResult == DONE) { break; }
    copier.nextInputBatch();
    copier.copyAllRows();
  }
  // Send a batch downstream only if it actually holds rows.
  if (copier.hasOutputRows()) {
    outputContainer = copier.harvest();
    return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA : OK;
  } else { return DONE; }
}
The above assumes that the upstream operator can be polled multiple times in the DONE state. The extra polling is needed to handle any in-flight copies when the input exhausts its batches.
The above also shows that the copier handles and reports schema changes by setting the schema change flag in the output container. Real code must handle multiple calls to next() in the DONE state, and work around lack of such support in its input (perhaps by tracking a state.)
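One way to work around an input that cannot be polled again after DONE is to latch the DONE state in a small wrapper. The following is a minimal sketch only: `Outcome`, `Upstream`, `DoneLatchingUpstream` and `StrictUpstream` are hypothetical stand-ins invented for illustration, not Drill classes.

```java
// Hypothetical stand-ins for illustration; these are not Drill types.
final class DonePollingSketch {
  enum Outcome { OK, DONE }

  interface Upstream {
    Outcome next();
  }

  // Remembers that the wrapped upstream reported DONE so it is never polled again.
  static final class DoneLatchingUpstream implements Upstream {
    private final Upstream inner;
    private boolean done;

    DoneLatchingUpstream(Upstream inner) {
      this.inner = inner;
    }

    @Override
    public Outcome next() {
      if (done) {
        return Outcome.DONE;          // answer from tracked state, no upstream call
      }
      Outcome outcome = inner.next();
      if (outcome == Outcome.DONE) {
        done = true;                  // latch the terminal state
      }
      return outcome;
    }
  }

  // An upstream that fails if polled after it has reported DONE.
  static final class StrictUpstream implements Upstream {
    private int remaining;
    private boolean finished;

    StrictUpstream(int batches) {
      this.remaining = batches;
    }

    @Override
    public Outcome next() {
      if (finished) {
        throw new IllegalStateException("polled after DONE");
      }
      if (remaining == 0) {
        finished = true;
        return Outcome.DONE;
      }
      remaining--;
      return Outcome.OK;
    }
  }
}
```

Wrapping a `StrictUpstream` in a `DoneLatchingUpstream` lets the batch-building loop call `next()` repeatedly after the end of input without the strict upstream ever seeing the extra calls.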
An input batch is processed by copying the rows. Copying can be done row by row, via a row range, or by copying the entire input batch as shown in the example. Copying the entire batch makes sense when the input batch carries a selection vector that identifies which rows to copy, and in which order.
Because we wish to fill the output batch, we may be able to copy part of a batch, the whole batch, or multiple batches to the output.
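The effect of filling output batches can be illustrated with plain lists. This is a simplified sketch, not the copier's implementation: `RebatchSketch.rebatch` is a hypothetical helper, rows are modeled as integers, and "full" is assumed to mean a fixed row count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; rows are modeled as plain integers.
final class RebatchSketch {
  // Packs rows from variable-size input batches into output batches
  // holding at most `capacity` rows each. Empty inputs add nothing.
  static List<List<Integer>> rebatch(List<List<Integer>> inputs, int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    List<List<Integer>> outputs = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    for (List<Integer> input : inputs) {
      for (Integer row : input) {
        current.add(row);
        if (current.size() == capacity) {
          outputs.add(current);            // output is full: send it downstream
          current = new ArrayList<>();     // and start the next output batch
        }
      }
    }
    if (!current.isEmpty()) {
      outputs.add(current);                // final, partially filled batch
    }
    return outputs;
  }
}
```

With a capacity of 4, inputs of 3, 0, 1 and 5 rows produce outputs of 4, 4 and 1 rows: the first output spans several inputs, the last input is split across two outputs, and the empty input produces nothing.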
-
Method Summary
Modifier and Type   Method                       Description
void                close()                      Release resources, including any pending input batch and any non-harvested output batch.
void                copyAllRows()                Copy all (remaining) input rows to the output.
boolean             copyNextRow()                If copying rows one by one, copy the next row from the input.
void                copyRow(int inputRowIndex)   Copy a row at the given position.
VectorContainer     harvest()                    Obtain the output batch.
boolean             hasOutputRows()              Reports if the output batch has rows.
boolean             isCopyPending()              Helper method to determine if a copy is pending: more rows remain to be copied.
boolean             isOutputFull()               Reports if the output batch is full and must be sent downstream.
boolean             nextInputBatch()             Start the next input batch.
void                startOutputBatch()           Start the next output batch.
-
Method Details
-
startOutputBatch
void startOutputBatch()
Start the next output batch.
-
nextInputBatch
boolean nextInputBatch()
Start the next input batch. The input batch must be held by the ResultSetReader passed into the constructor.
-
copyNextRow
boolean copyNextRow()
If copying rows one by one, copy the next row from the input.
Returns: true if more rows remain on the input, false if all rows are exhausted
-
copyRow
void copyRow(int inputRowIndex)
Copy a row at the given position. Intended for cases in which random copying is needed but a selection vector is not available. Note that this version is slow because of the need to reset indexes for every row. It is better to use a selection vector, then copy sequentially.
Parameters: inputRowIndex - the input row position. If a selection vector is attached, then this is the selection vector position
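The meaning of a row position with and without a selection vector can be sketched as follows. This is an illustrative model only: `SelectionVectorSketch` is a hypothetical class, the batch is modeled as an `int[]`, and the selection vector as an `int[]` of physical row indexes (or `null` when absent).

```java
// Hypothetical model for illustration; not Drill's vector or selection vector classes.
final class SelectionVectorSketch {
  // Resolves a row position to a value. With a selection vector, the position
  // indexes the selection vector, which in turn names the physical row.
  static int rowAt(int[] data, int[] selectionVector, int position) {
    int physicalRow = (selectionVector == null) ? position : selectionVector[position];
    return data[physicalRow];
  }

  // Sequential copy: walks positions in order, so the selection vector alone
  // decides which rows are copied and in which order.
  static int[] copyAll(int[] data, int[] selectionVector) {
    int rowCount = (selectionVector == null) ? data.length : selectionVector.length;
    int[] output = new int[rowCount];
    for (int position = 0; position < rowCount; position++) {
      output[position] = rowAt(data, selectionVector, position);
    }
    return output;
  }
}
```

For data `{10, 20, 30, 40}` and selection vector `{3, 1}`, position 0 resolves to physical row 3, and a sequential copy yields the two selected rows in selection-vector order.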
-
copyAllRows
void copyAllRows()
Copy all (remaining) input rows to the output. If insufficient space exists in the output, does a partial copy, and isCopyPending() will return true.
-
hasOutputRows
boolean hasOutputRows()
Reports if the output batch has rows. Useful after the end of input to determine if a partial output batch exists to send downstream.
Returns: true if the output batch has one or more rows
-
isOutputFull
boolean isOutputFull()
Reports if the output batch is full and must be sent downstream. The output batch can be full in the middle of a copy, in which case isCopyPending() will also return true.
This function also returns true if a schema change occurred on the latest input row, in which case the partially-completed batch of the old schema must be flushed downstream.
Returns: true if the output is full and must be harvested and sent downstream
-
isCopyPending
boolean isCopyPending()
Helper method to determine if a copy is pending: more rows remain to be copied. If so, start a new output batch, which will finish the copy. Do that before starting a new input batch.
Returns: true if a copy is pending, that is, input rows remain to be copied
-
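The ordering described for a pending copy (finish it in a new output batch before starting a new input batch) can be sketched with a toy stand-in. `ToyCopier` below borrows the method names documented on this page but is not Drill's implementation: rows are integers, "full" is assumed to be a fixed row count, `nextInputBatch()` is assumed to return false when no input remains, and `nextOutput` is a hypothetical driver.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy stand-in that mimics the documented method names; not Drill's implementation.
final class ToyCopier {
  private final Iterator<List<Integer>> inputs;
  private final int capacity;
  private List<Integer> input = new ArrayList<>();
  private int inputPos;
  private List<Integer> output = new ArrayList<>();

  ToyCopier(List<List<Integer>> inputs, int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.inputs = inputs.iterator();
    this.capacity = capacity;
  }

  void startOutputBatch() { output = new ArrayList<>(); }

  // Assumption: returns false when there is no further input batch.
  boolean nextInputBatch() {
    if (!inputs.hasNext()) { return false; }
    input = inputs.next();
    inputPos = 0;
    return true;
  }

  // Copies as many remaining input rows as fit; may stop early (partial copy).
  void copyAllRows() {
    while (inputPos < input.size() && output.size() < capacity) {
      output.add(input.get(inputPos++));
    }
  }

  boolean isCopyPending() { return inputPos < input.size(); }
  boolean isOutputFull() { return output.size() >= capacity; }
  boolean hasOutputRows() { return !output.isEmpty(); }
  List<Integer> harvest() { return output; }

  // Hypothetical driver: builds one output batch, or returns null at end of data.
  static List<Integer> nextOutput(ToyCopier copier) {
    copier.startOutputBatch();
    if (copier.isCopyPending()) {
      copier.copyAllRows();          // finish the leftover rows first
    }
    while (!copier.isOutputFull() && copier.nextInputBatch()) {
      copier.copyAllRows();          // only then move on to new input
    }
    return copier.hasOutputRows() ? copier.harvest() : null;
  }
}
```

With inputs of 5 rows and 1 row and a capacity of 3, the first output holds rows 1 to 3 and leaves a copy pending; the second output first drains rows 4 and 5, then takes row 6 from the next input.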
harvest
VectorContainer harvest()
Obtain the output batch. Returned as a vector container since the output will not have a selection vector.
Returns: a vector container holding the output batch
-
close
void close()
Release resources, including any pending input batch and any non-harvested output batch.
-