public class ScanOperatorExec extends Object implements OperatorExec
This operator is the successor to ScanBatch and should be used
by all new scan implementations.
The scan operator itself is simply a framework for handling a set of readers. It knows nothing beyond the interfaces of the components it works with, and delegates all knowledge of schemas, projection, reading and the like to implementations of those interfaces. Because that work is complex, a set of frameworks exists to handle the most common use cases, but a specialized reader can create its own framework or reader from scratch.
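To make this division of labor concrete, here is a minimal sketch that assumes a hypothetical SimpleReader interface (not Drill's real RowBatchReader API) with open, next and close methods. The hypothetical MiniScan loop knows only that interface: it opens each reader, collects batches until the reader reports EOF, and closes it, with no knowledge of file formats or schemas.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified reader contract (not Drill's RowBatchReader API).
interface SimpleReader {
    boolean open();                   // false: nothing to read, skip this reader
    boolean next(List<String> batch); // adds rows to one batch; false once EOF is reached
    void close();
}

// The scan knows only the SimpleReader interface: no formats, no schemas.
final class MiniScan {
    private final Iterator<? extends SimpleReader> readers;

    MiniScan(Iterator<? extends SimpleReader> readers) { this.readers = readers; }

    /** Drains every reader in turn and returns the non-empty batches in order. */
    List<List<String>> run() {
        List<List<String>> out = new ArrayList<>();
        while (readers.hasNext()) {
            SimpleReader reader = readers.next();
            try {
                if (!reader.open()) {
                    continue; // skip this reader; the finally block still closes it
                }
                boolean more;
                do {
                    List<String> batch = new ArrayList<>();
                    more = reader.next(batch);
                    if (!batch.isEmpty()) {
                        out.add(batch);
                    }
                } while (more);
            } finally {
                reader.close();
            }
        }
        return out;
    }
}

// Toy reader serving rows from an in-memory list, at most two rows per batch.
final class ListReader implements SimpleReader {
    private final List<String> rows;
    private int pos = 0;

    ListReader(List<String> rows) { this.rows = rows; }

    public boolean open() { return !rows.isEmpty(); }

    public boolean next(List<String> batch) {
        while (pos < rows.size() && batch.size() < 2) {
            batch.add(rows.get(pos++));
        }
        return pos < rows.size();
    }

    public void close() { }
}
```

With readers over [a, b, c], an empty list, and [d], this loop yields three batches, [a, b], [c] and [d]; the empty reader is skipped but still closed.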
Error handling in this class is minimal: the enclosing record batch iterator is responsible for handling exceptions. Error handling relies on the fact that the iterator will call close() regardless of which exceptions are thrown.
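The reliance on close() can be illustrated with a try/finally pattern. This is a sketch under stated assumptions: MiniOperator, MiniIterator and FailingOperator are hypothetical names, not Drill classes. The operator contains no recovery logic; the enclosing iterator guarantees that close() runs whether next() returns normally or throws.

```java
// Hypothetical stand-in for an operator: next() may throw, and the operator
// keeps no error-recovery logic of its own.
interface MiniOperator {
    boolean next(); // may throw; false at EOF
    void close();   // must release everything, however we got here
}

// Sketch of the enclosing iterator's contract: close() runs whether
// next() returned normally or threw.
final class MiniIterator {
    static int drain(MiniOperator op) {
        int batches = 0;
        try {
            while (op.next()) {
                batches++;
            }
        } finally {
            op.close(); // runs on EOF and on any exception
        }
        return batches;
    }
}

// Toy operator that fails on its third call and records whether close() ran.
final class FailingOperator implements MiniOperator {
    int calls = 0;
    boolean closed = false;

    public boolean next() {
        if (++calls == 3) {
            throw new IllegalStateException("reader failed");
        }
        return true;
    }

    public void close() { closed = true; }
}
```

The exception still propagates to the caller; the point of the pattern is only that cleanup does not depend on the operator handling it.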
The ScanOperatorEvents implementation provides the set of readers to
use. The implementation can simply maintain a list of readers, or can create
each reader on demand.
More subtly, the factory also handles projection issues and manages vectors across subsequent readers. A number of factories are available for the most common cases. Extend these to implement a version specific to a data source.
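The two ways of supplying readers described above (a fixed list, or creation on demand) might look like this. The sketch assumes a hypothetical ReaderSource interface whose nextReader() returns null when no readers remain; it is not the real ScanOperatorEvents signature.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified version of the "provide the readers" role.
// Returns null when there are no more readers.
interface ReaderSource<R> {
    R nextReader();
}

// Option 1: the readers are built up front and handed out in order.
final class ListReaderSource<R> implements ReaderSource<R> {
    private final Iterator<R> it;

    ListReaderSource(List<R> readers) { this.it = readers.iterator(); }

    public R nextReader() { return it.hasNext() ? it.next() : null; }
}

// Option 2: only the inputs (for example, file paths) are known up front;
// each reader is constructed only when the scan asks for it.
final class LazyReaderSource<I, R> implements ReaderSource<R> {
    private final Iterator<I> inputs;
    private final Function<I, R> factory;
    int created = 0; // exposed for illustration only

    LazyReaderSource(List<I> inputs, Function<I, R> factory) {
        this.inputs = inputs.iterator();
        this.factory = factory;
    }

    public R nextReader() {
        if (!inputs.hasNext()) {
            return null;
        }
        created++;
        return factory.apply(inputs.next());
    }
}
```

The lazy variant avoids holding many open readers at once when a scan covers a large number of inputs.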
The RowBatchReader is a surprisingly minimal interface that
nonetheless captures the essence of reading a result set as a set of batches.
The factory implementations mentioned above implement this interface to provide
commonly-used services, the most important of which is access to a
ResultSetLoader to write values into value vectors.
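The idea of "reading a result set as a set of batches" can be sketched as follows. Everything here is hypothetical: MinimalBatchReader only mimics the open/next/close shape of a batch reader, and BatchWriter is a toy stand-in for the loader a framework would supply (Drill's ResultSetLoader is far richer). The reader writes rows until the writer reports the batch is full, and next() returns false at EOF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the loader a framework hands to a reader.
// It only tracks rows and a batch-size limit.
final class BatchWriter {
    private final int maxRows;
    private final List<List<Object>> rows = new ArrayList<>();

    BatchWriter(int maxRows) { this.maxRows = maxRows; }

    boolean isFull() { return rows.size() >= maxRows; }

    void writeRow(Object... values) { rows.add(Arrays.asList(values)); }

    /** Hands the finished batch downstream and starts a new, empty one. */
    List<List<Object>> harvest() {
        List<List<Object>> batch = new ArrayList<>(rows);
        rows.clear();
        return batch;
    }
}

// Hypothetical minimal reader contract: open once, fill batches via next(), close once.
interface MinimalBatchReader {
    boolean open(BatchWriter writer); // false: no data, skip this reader
    boolean next();                   // fill one batch; false at EOF
    void close();
}

// Example reader producing the integers 0..count-1, one per row.
final class CounterReader implements MinimalBatchReader {
    private final int count;
    private int nextValue = 0;
    private BatchWriter writer;

    CounterReader(int count) { this.count = count; }

    public boolean open(BatchWriter writer) {
        this.writer = writer;
        return count > 0;
    }

    public boolean next() {
        while (!writer.isFull() && nextValue < count) {
            writer.writeRow(nextValue++);
        }
        return nextValue < count;
    }

    public void close() { writer = null; }
}
```

With a batch limit of 4 rows, a CounterReader of 10 values fills batches of 4, 4 and 2 rows; the reader never decides batch size itself, which is the kind of service the frameworks provide.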
Readers can discover columns as they read data, as with any JSON-based format. In this case, the row set mutator also provides a schema version, but a fine-grained one that changes each time a column is added.
The two schema versions serve different purposes and are not interchangeable. For example, if a scan reads two files, each reader builds up its own schema, increasing its internal version number as work proceeds. At the end of each batch, however, the two schemas may (and, in fact, should) be identical; it is this batch-level schema version that downstream operators care about.
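The difference between the two versions can be shown with a small sketch. ReaderSchema and OutputSchemaTracker are hypothetical classes, not Drill's: the reader-side version increments on every added column, while the output-side version increments only when a batch's schema differs from the previous batch's schema.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-reader schema tracker: the version bumps on every added column.
final class ReaderSchema {
    final List<String> columns = new ArrayList<>();
    int version = 0;

    void addColumn(String name) {
        if (!columns.contains(name)) {
            columns.add(name);
            version++;
        }
    }
}

// Hypothetical output-side tracker: the version bumps only when a batch's
// schema differs from the previous batch's. This is the version
// downstream operators would see.
final class OutputSchemaTracker {
    private List<String> last = null;
    int version = 0;

    int onBatch(List<String> schema) {
        if (!schema.equals(last)) {
            last = List.copyOf(schema);
            version++;
        }
        return version;
    }
}
```

Two files that each discover columns a and b end with reader-side version 2, yet their batches share output version 1; only a genuinely new column (say c) bumps the output version.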
SELECT * FROM VALUES()
| Modifier and Type | Field and Description |
|---|---|
| protected VectorContainerAccessor | containerAccessor |
| protected OperatorContext | context |
| Constructor and Description |
|---|
| ScanOperatorExec(ScanOperatorEvents factory, boolean allowEmptyResult) |
| Modifier and Type | Method and Description |
|---|---|
| BatchAccessor | batchAccessor(): Provides a generic access mechanism to the batch's output data. |
| void | bind(OperatorContext context): Bind this operator to the context. |
| boolean | buildSchema(): Retrieves the schema of the batch before the first actual batch of data. |
| void | cancel(): Alerts the operator that the query was cancelled. |
| void | close(): Close the operator by releasing all resources that the operator held. |
| OperatorContext | context() |
| boolean | next(): Retrieves the next batch of data. |
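As the method descriptions suggest, these calls form a lifecycle: bind first, then buildSchema(), then next() until it returns false, and finally close(). The sketch below assumes a hypothetical, simplified SketchOperatorExec interface that reuses the method names but replaces Drill's types with plain Java ones (batch() stands in for batchAccessor(), and cancel() is omitted for brevity). The hypothetical LifecycleDriver calls close() in a finally block so that it runs even if an earlier call throws.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical mirror of the OperatorExec call sequence.
// Method names follow the table above; the types are placeholders.
interface SketchOperatorExec {
    void bind(String context);
    boolean buildSchema();  // false: EOF before any schema was found
    boolean next();         // false: no more data
    List<Integer> batch();  // stands in for batchAccessor()
    void close();
}

final class LifecycleDriver {
    /** Drives one operator through bind, buildSchema, next..., close. */
    static List<List<Integer>> run(SketchOperatorExec op) {
        List<List<Integer>> out = new ArrayList<>();
        try {
            op.bind("fragment-context"); // hypothetical context value
            if (!op.buildSchema()) {
                return out;              // nothing to read; close() still runs
            }
            while (op.next()) {
                out.add(new ArrayList<>(op.batch()));
            }
        } finally {
            op.close();
        }
        return out;
    }
}

// Toy operator emitting n single-row batches.
final class CountingExec implements SketchOperatorExec {
    private final int n;
    private int emitted = 0;
    private List<Integer> current = List.of();
    boolean bound, closed;

    CountingExec(int n) { this.n = n; }

    public void bind(String context) { bound = true; }

    public boolean buildSchema() { return n > 0; }

    public boolean next() {
        if (emitted == n) {
            return false;
        }
        current = List.of(emitted++);
        return true;
    }

    public List<Integer> batch() { return current; }

    public void close() { closed = true; }
}
```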
protected final VectorContainerAccessor containerAccessor
protected OperatorContext context
public ScanOperatorExec(ScanOperatorEvents factory, boolean allowEmptyResult)
public void bind(OperatorContext context)
Description copied from interface: OperatorExec
Bind this operator to the context.
Specified by: bind in interface OperatorExec
Parameters: context - operator context
public BatchAccessor batchAccessor()
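Description copied from interface: OperatorExec
Provides a generic access mechanism to the batch's output data, as produced by OperatorExec.buildSchema() and OperatorExec.next(). The batch itself can be held in a standard VectorContainer, or in some other structure more convenient for this operator.
Specified by: batchAccessor in interface OperatorExec
public OperatorContext context()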
public boolean buildSchema()
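Description copied from interface: OperatorExec
Retrieves the schema of the batch before the first actual batch of data. The schema is available via OperatorExec.batchAccessor().
Specified by: buildSchema in interface OperatorExec
public boolean next()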
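Description copied from interface: OperatorExec
Retrieves the next batch of data. The data is available via the OperatorExec.batchAccessor() method.
Specified by: next in interface OperatorExec
public void cancel()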
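Description copied from interface: OperatorExec
Alerts the operator that the query was cancelled.
Specified by: cancel in interface OperatorExec
public void close()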
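Description copied from interface: OperatorExec
Close the operator by releasing all resources that the operator held. Called after OperatorExec.cancel(), and after OperatorExec.buildSchema() or OperatorExec.next() returns false.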
Note that there may be a significant delay between the last call to next() and the call to close() during which downstream operators do their work. A tidy operator will release resources immediately after EOF to avoid holding onto memory or other resources that could be used by downstream operators.
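The "tidy operator" advice can be sketched as follows, assuming a hypothetical TidyOperator and a placeholder ResourceHandle standing in for value vectors or file handles (neither is a Drill class). next() frees resources the moment it reports EOF, and close() is written to be idempotent so that the later call is harmless.

```java
// Hypothetical resource holder standing in for vectors or a file handle.
final class ResourceHandle {
    boolean released;

    void release() { released = true; }
}

// Sketch of a "tidy" operator: it frees resources as soon as next()
// reports EOF, instead of waiting for close(), which may come much later.
final class TidyOperator {
    private final int batches;
    private int produced = 0;
    private ResourceHandle handle = new ResourceHandle();
    final ResourceHandle observed = handle; // kept only so the example can inspect it

    TidyOperator(int batches) { this.batches = batches; }

    boolean next() {
        if (produced < batches) {
            produced++;
            return true;
        }
        releaseResources(); // EOF: give memory back to downstream operators now
        return false;
    }

    void close() { releaseResources(); } // idempotent: safe after early release

    private void releaseResources() {
        if (handle != null) {
            handle.release();
            handle = null;
        }
    }
}
```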
Specified by: close in interface OperatorExec
Copyright © 2021 The Apache Software Foundation. All rights reserved.