public class ReaderLifecycle extends Object implements RowBatchReader
Implements the RowBatchReader protocol, which is based on three methods, and
converts it to the two-method protocol of the managed reader. The open() call
of the RowBatchReader is combined with the constructor of the
ManagedReader, enforcing the rule that the managed reader is created
just-in-time when it is to be used, which avoids accidentally holding
resources for the life of the scan. This also allows most of the reader's
fields to be final.
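The conversion described above can be sketched as a small adapter. This is a minimal illustration, not Drill's implementation: ThreeStepReader, SimpleManagedReader, LifecycleAdapter, and CountingReader are hypothetical names invented here, and the real RowBatchReader and ManagedReader interfaces carry more methods and arguments.

```java
import java.util.function.Supplier;

// Hypothetical two-method reader: constructing it plays the role of open().
interface SimpleManagedReader {
    boolean next();   // load one batch; false when no more data is available
    void close();     // release resources
}

// Hypothetical three-method protocol, loosely modeled on the text above.
interface ThreeStepReader {
    boolean open();
    boolean next();
    void close();
}

// Adapter: the reader is created only inside open(), so nothing is held
// before the reader is actually used.
final class LifecycleAdapter implements ThreeStepReader {
    private final Supplier<SimpleManagedReader> factory;
    private SimpleManagedReader reader; // null until open() is called

    LifecycleAdapter(Supplier<SimpleManagedReader> factory) {
        this.factory = factory;
    }

    @Override
    public boolean open() {
        reader = factory.get(); // the constructor call stands in for open()
        return true;
    }

    @Override
    public boolean next() {
        if (reader == null) {
            throw new IllegalStateException("open() was not called");
        }
        return reader.next();
    }

    @Override
    public void close() {
        if (reader != null) {
            reader.close();
            reader = null;
        }
    }

    boolean isReaderCreated() {
        return reader != null;
    }
}

// Toy reader that serves a fixed number of batches.
final class CountingReader implements SimpleManagedReader {
    private int remaining;

    CountingReader(int batches) {
        this.remaining = batches;
    }

    @Override
    public boolean next() {
        return remaining-- > 0;
    }

    @Override
    public void close() { }
}
```

Because the toy reader receives everything it needs through its constructor, its configuration could be held in final fields, which is the benefit the description mentions.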
Coordinates the components that wrap a reader to create the final output batch:
This class coordinates the reader-visible aspects of the scan:
The SchemaNegotiator (or subclass), which provides schema-related
input to the reader and which creates the reader's ResultSetLoader,
among other tasks. The schema negotiator is specific to each kind of scan and
is thus created via the ScanLifecycleBuilder.
The reader is schema-driven. See ScanSchemaTracker for an overview.
The framework handles the projection task so the reader does not have to worry about it. Reading an unwanted column is low cost: the result set loader will have provided a "dummy" column writer that simply discards the value. This is just as fast as having the reader use if-statements or a table to determine which columns to save.
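The "dummy" writer idea can be illustrated with a toy loader. This is a sketch under stated assumptions: ValueWriter, StoringWriter, DummyWriter, and ToyLoader are hypothetical types made up for this example and are not part of the ResultSetLoader API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical column writer with a single setter.
interface ValueWriter {
    void setInt(int value);
}

// Writer for a projected column: keeps every value.
final class StoringWriter implements ValueWriter {
    final List<Integer> values = new ArrayList<>();

    @Override
    public void setInt(int value) {
        values.add(value);
    }
}

// Writer for an unprojected column: silently discards every value.
final class DummyWriter implements ValueWriter {
    @Override
    public void setInt(int value) {
        // intentionally empty
    }
}

// Toy loader that decides once, per column, which kind of writer to hand out.
final class ToyLoader {
    private final Set<String> projected;
    private final Map<String, ValueWriter> writers = new LinkedHashMap<>();

    ToyLoader(Set<String> projected) {
        this.projected = projected;
    }

    ValueWriter writer(String column) {
        return writers.computeIfAbsent(column,
            c -> projected.contains(c) ? new StoringWriter() : new DummyWriter());
    }

    // Values actually kept for a column; empty for unprojected columns.
    List<Integer> stored(String column) {
        ValueWriter w = writers.get(column);
        return (w instanceof StoringWriter)
            ? ((StoringWriter) w).values
            : Collections.<Integer>emptyList();
    }
}
```

In this sketch the reader code writes every column unconditionally; the projection decision is made once when the writer is created rather than on every value.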
| Modifier and Type | Field and Description |
|---|---|
| protected TupleMetadata | readerInputSchema |
| protected ResultSetLoader | tableLoader |
| Constructor and Description |
|---|
| ReaderLifecycle(ScanLifecycle scanLifecycle) |
| Modifier and Type | Method and Description |
|---|---|
| ResultSetLoader | buildLoader() |
| void | close() Release resources. |
| boolean | defineSchema() Called for the first reader within a scan. |
| CustomErrorContext | errorContext() |
| MissingColumnHandlerBuilder | missingColumnsBuilder(TupleMetadata readerSchema) |
| String | name() Name used when reporting errors. |
| boolean | next() Read the next batch. |
| boolean | open() Set up the record reader. |
| VectorContainer | output() Return the container with the reader's output. |
| TupleMetadata | readerInputSchema() |
| TupleMetadata | readerOutputSchema() |
| ScanLifecycle | scanLifecycle() |
| ScanLifecycleBuilder | scanOptions() |
| ScanSchemaTracker | schemaTracker() |
| int | schemaVersion() Return the version of the schema returned by RowBatchReader.output(). |
| ResultSetLoader | tableLoader() |
protected final TupleMetadata readerInputSchema
protected ResultSetLoader tableLoader
public ReaderLifecycle(ScanLifecycle scanLifecycle)
public ScanLifecycle scanLifecycle()
public TupleMetadata readerInputSchema()
public CustomErrorContext errorContext()
public ScanSchemaTracker schemaTracker()
public ScanLifecycleBuilder scanOptions()
public String name()
Name used when reporting errors.
Specified by: name in interface RowBatchReader
public ResultSetLoader tableLoader()
public boolean open()
Set up the record reader.
Specified by: open in interface RowBatchReader
public ResultSetLoader buildLoader()
public boolean defineSchema()
Called for the first reader within a scan. This step is optional and is purely for performance.
Specified by: defineSchema in interface RowBatchReader
public boolean next()
Read the next batch. This somewhat complex protocol avoids the need to allocate a final batch just to find out that no more data is available; it allows EOF to be returned along with the final batch.
Specified by: next in interface RowBatchReader
public MissingColumnHandlerBuilder missingColumnsBuilder(TupleMetadata readerSchema)
public TupleMetadata readerOutputSchema()
public VectorContainer output()
Return the container with the reader's output. Called after
RowBatchReader.open(): if the data source can provide a schema at open
time, then the reader should provide an empty batch with the schema set.
The scanner will return this schema downstream to inform other operators
of the schema. Also called after RowBatchReader.next() to retrieve
the batch produced by that call. (No call is made if next()
returns false.)
Specified by: output in interface RowBatchReader
public int schemaVersion()
Return the version of the schema returned by RowBatchReader.output(). The schema
is assumed to start at -1 (no schema). The reader is free to use any
numbering system it likes as long as:
If the reader can return a schema on open (so-called "early schema"), then this method must return a non-negative version number, even if the schema happens to be empty (such as when reading an empty file).
However, if the reader cannot return a schema on open (so-called "late schema"), then this method must return -1 (and output() must return null) to indicate that no schema is available when called before the first call to next().
No calls will be made to this method before open(), after close(), or after next() returns false. The implementation is thus not required to handle these cases.
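The early-schema and late-schema rules above can be illustrated with two toy readers. This is a sketch only: VersionedReader, EarlySchemaReader, LateSchemaReader, and SchemaWatcher are hypothetical names, output() returns a String in place of a VectorContainer, and the idea that a consumer detects a new schema by comparing version numbers is an assumption made for the example.

```java
// Hypothetical, simplified stand-in for the schema-related reader calls.
interface VersionedReader {
    void open();
    boolean next();
    int schemaVersion();
    String output(); // toy "batch": a description, or null when there is none
}

// Early schema: the schema is known at open time, so the version is
// non-negative and output() is non-null before the first next().
final class EarlySchemaReader implements VersionedReader {
    private int version = -1;
    private String batch;
    private boolean done;

    @Override
    public void open() {
        version = 0;
        batch = "(a INT) rows=0"; // empty batch that carries only the schema
    }

    @Override
    public boolean next() {
        if (done) {
            return false;
        }
        done = true;
        batch = "(a INT) rows=3"; // same schema, so the version stays at 0
        return true;
    }

    @Override
    public int schemaVersion() { return version; }

    @Override
    public String output() { return batch; }
}

// Late schema: nothing is known until the first batch has been read, so the
// version stays at -1 and output() stays null until next() is called.
final class LateSchemaReader implements VersionedReader {
    private int version = -1;
    private String batch;
    private boolean done;

    @Override
    public void open() { }

    @Override
    public boolean next() {
        if (done) {
            return false;
        }
        done = true;
        version = 1; // any non-negative number will do in this sketch
        batch = "(a INT, b VARCHAR) rows=3";
        return true;
    }

    @Override
    public int schemaVersion() { return version; }

    @Override
    public String output() { return batch; }
}

// Toy consumer: treats a change in version number as a new schema (assumption).
final class SchemaWatcher {
    private int lastVersion = -1;

    boolean sawNewSchema(VersionedReader reader) {
        int current = reader.schemaVersion();
        boolean changed = current != lastVersion;
        lastVersion = current;
        return changed;
    }
}
```

In this sketch the watcher sees a new schema immediately after open() for the early-schema reader, and only after the first next() for the late-schema reader.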
Specified by: schemaVersion in interface RowBatchReader
public void close()
Release resources.
Specified by: close in interface RowBatchReader
Copyright © 2021 The Apache Software Foundation. All rights reserved.