Class ScanLifecycleBuilder
- Direct Known Subclasses:
FileScanLifecycleBuilder
Gathers options for the ScanLifecycle, then builds a scan lifecycle instance.
This framework is a bridge between operator logic and the scan internals. It gathers scan-specific options in a builder abstraction, then passes them on to the scan lifecycle at the right time. By abstracting out this plumbing, a scan batch creator simply chooses the proper framework builder, passes config options, and implements the matching "managed reader" and factory. All details of setup, projection, and so on are handled by the framework and the components that the framework builds upon.
Inputs
At this basic level, a scan framework requires just a few simple inputs:
- The options defined by the scan projection framework, such as the projection list.
- A reader factory to create a reader for each of the files or blocks to be scanned. (Readers are expected to be created one-by-one as files are read.)
- The operator context which provides access to a memory allocator and other plumbing items.
In practice, there are other options to fine-tune behavior (provided schema, custom error context, various limits, etc.).
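The sketch below shows, in outline, how a scan batch creator might supply these inputs. It is illustrative only: the reader factory, projection list, and limit value are placeholders, the import paths are assumed to be those of the v3 ("EVF") scan packages, and build(OperatorContext) is shown returning a ScanLifecycle as described above.

    import java.util.List;

    import org.apache.drill.common.expression.SchemaPath;
    import org.apache.drill.exec.ops.OperatorContext;
    import org.apache.drill.exec.physical.impl.scan.v3.ReaderFactory;
    import org.apache.drill.exec.physical.impl.scan.v3.ScanLifecycleBuilder;
    import org.apache.drill.exec.physical.impl.scan.v3.lifecycle.ScanLifecycle;
    import org.apache.drill.exec.record.metadata.TupleMetadata;

    public class ScanSetupSketch {

      // Gather the scan inputs in the builder, then build the scan lifecycle.
      // All arguments are supplied by the caller; the record limit is an
      // arbitrary example of the optional fine-tuning described above.
      public static ScanLifecycle buildScan(
          List<SchemaPath> projection,      // the projection list
          ReaderFactory<?> readerFactory,   // creates one reader per file or block
          TupleMetadata providedSchema,     // optional provided schema (may be null)
          OperatorContext context) {        // memory allocator and other plumbing
        ScanLifecycleBuilder builder = new ScanLifecycleBuilder();
        builder.projection(projection);
        builder.readerFactory(readerFactory);
        builder.providedSchema(providedSchema);
        builder.batchRecordLimit(4096);
        return builder.build(context);
      }
    }

Operator creators that need a complete executable operator can instead call buildScanOperator(FragmentContext, PhysicalOperator).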
-
Nested Class Summary
Modifier and Type | Class | Description
static class | |
static interface | ScanLifecycleBuilder.SchemaValidator |
-
Field Summary
Modifier and Type | Field | Description
protected boolean | allowRequiredNullColumns |
protected boolean | allowSchemaChange | Option to disable schema changes.
static final int | DEFAULT_BATCH_BYTE_COUNT |
static final int | DEFAULT_BATCH_ROW_COUNT |
protected TupleMetadata | definedSchema |
protected boolean | disableEmptyResults | Option to disable empty results.
protected boolean | enableSchemaBatch | Option that controls whether the scan operator starts with an empty schema-only batch (the so-called "fast schema") or with a non-empty data batch; see the OperatorDriver Javadoc.
protected CustomErrorContext | errorContext | Context for error messages.
static final int | MAX_BATCH_BYTE_SIZE |
static final int | MAX_BATCH_ROW_COUNT |
static final int | MIN_BATCH_BYTE_SIZE |
protected TypeProtos.MajorType | nullType |
protected TupleMetadata | providedSchema |
protected ScanLifecycleBuilder.SchemaValidator | schemaValidator | Optional schema validator to perform per-scan checks of the projection or resolved schema.
protected String | userName |
-
Constructor Summary
Constructor | Description
ScanLifecycleBuilder() |
-
Method Summary
Modifier and Type | Method | Description
boolean | allowRequiredNullColumns() |
void | allowRequiredNullColumns(boolean flag) |
boolean | allowSchemaChange() |
void | allowSchemaChange(boolean flag) |
void | batchByteLimit(int byteLimit) |
void | batchRecordLimit(int batchRecordLimit) | Specify a custom batch record count.
| build(OperatorContext context) |
| buildScanOperator(FragmentContext fragContext, PhysicalOperator pop) |
void | definedSchema(TupleMetadata definedSchema) |
void | disableEmptyResults(boolean option) |
void | enableSchemaBatch(boolean option) |
void | errorContext(CustomErrorContext context) |
long | limit() |
void | limit(long limit) |
| nullType() |
void | nullType(TypeProtos.MajorType nullType) | Specify the type to use for null columns in place of the standard nullable int.
| options() |
void | projection(List<SchemaPath> projection) |
void | providedSchema(TupleMetadata providedSchema) |
void | readerFactory(ReaderFactory<?> readerFactory) |
int | scanBatchByteLimit() |
int | scanBatchRecordLimit() |
void | schemaValidator(ScanLifecycleBuilder.SchemaValidator schemaValidator) |
| userName() |
-
Field Details
-
MIN_BATCH_BYTE_SIZE
public static final int MIN_BATCH_BYTE_SIZE
-
MAX_BATCH_BYTE_SIZE
public static final int MAX_BATCH_BYTE_SIZE
-
DEFAULT_BATCH_ROW_COUNT
public static final int DEFAULT_BATCH_ROW_COUNT
-
DEFAULT_BATCH_BYTE_COUNT
public static final int DEFAULT_BATCH_BYTE_COUNT
-
MAX_BATCH_ROW_COUNT
public static final int MAX_BATCH_ROW_COUNT
-
userName
protected String userName
-
nullType
protected TypeProtos.MajorType nullType
-
allowRequiredNullColumns
protected boolean allowRequiredNullColumns
-
definedSchema
protected TupleMetadata definedSchema
-
providedSchema
protected TupleMetadata providedSchema
-
enableSchemaBatch
protected boolean enableSchemaBatch
Option that controls whether the scan operator starts with an empty schema-only batch (the so-called "fast schema" that Drill once tried to provide) or starts with a non-empty data batch (which appears to be the standard since the "Empty Batches" project some time back). See the OperatorDriver Javadoc for more details.
Defaults to false, meaning the empty schema batch is not provided. DRILL-7305 explains that many operators fail when presented with an empty batch, so do not enable this feature until those issues are fixed. Of course, do enable the feature if you want to track down the DRILL-7305 bugs.
-
disableEmptyResults
protected boolean disableEmptyResults
Option to disable empty results. An empty result occurs if no reader has any data, but at least one reader can provide a schema. In this case, the scan can return a single, empty batch with an associated schema. This is the correct SQL result for an empty query. However, if this result triggers empty-batch bugs in other operators, we can instead disable this feature and return a null result set: no schema, no batch, just a "fast NONE", an immediate return of NONE from the Volcano iterator.
Disabling the empty-results feature is not desirable: it means that the user gets no schema for queries that should be able to return one. So, disable it only if we cannot find or fix empty-batch bugs.
-
allowSchemaChange
protected boolean allowSchemaChange
Option to disable schema changes. If false, then the first batch commits the scan to a single, unchanged schema. If true (the legacy default), then each batch or reader can change the schema, even though downstream operators generally cannot handle a schema change. The goal is to evolve all readers so that they do not generate schema changes. (This flag and the two options above are shown together in the sketch at the end of this section.)
-
schemaValidator
protected ScanLifecycleBuilder.SchemaValidator schemaValidator
Optional schema validator to perform per-scan checks of the projection or resolved schema.
-
errorContext
protected CustomErrorContext errorContext
Context for error messages.
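The three behavior options above (enableSchemaBatch, disableEmptyResults, allowSchemaChange) are plain builder flags. The sketch below, assuming the same ScanLifecycleBuilder import as the earlier example, simply restates the documented defaults and caveats as code; the values are illustrative, not recommendations beyond what the field descriptions already say.

    // Sketch only: values restate the defaults and caveats documented above.
    static void applyBehaviorFlags(ScanLifecycleBuilder builder) {

      // No empty "fast schema" batch until the DRILL-7305 empty-batch
      // issues are fixed (false is the default).
      builder.enableSchemaBatch(false);

      // Leave the empty-results feature enabled so that a query with no
      // rows still returns a single schema-only batch.
      builder.disableEmptyResults(false);

      // Commit the scan to a single, unchanged schema after the first batch;
      // true (the legacy default) lets each reader change the schema.
      builder.allowSchemaChange(false);
    }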
-
-
Constructor Details
-
ScanLifecycleBuilder
public ScanLifecycleBuilder()
-
-
Method Details
-
options
-
options
-
readerFactory
-
userName
-
userName
-
batchRecordLimit
public void batchRecordLimit(int batchRecordLimit)
Specify a custom batch record count. This is the maximum number of records per batch for this scan. Readers can adjust this, but the adjustment is capped at the value specified here.
Parameters:
batchRecordLimit - maximum records per batch
-
batchByteLimit
public void batchByteLimit(int byteLimit)
-
nullType
public void nullType(TypeProtos.MajorType nullType)
Specify the type to use for null columns in place of the standard nullable int. This type is used for all missing columns. (Readers that need per-column control need a different mechanism.)
Parameters:
nullType - the type to use for null columns
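As a concrete illustration of the two settings above, the sketch below caps the batch by both record count and byte size and substitutes nullable VARCHAR for missing columns. The limit values are arbitrary, and the imports of TypeProtos and the Types helper (for constructing a nullable MajorType) are assumed.

    // Sketch only: the limits and the VARCHAR null type are illustrative.
    static void applyLimitsAndNullType(ScanLifecycleBuilder builder) {

      // Cap each batch at 1,000 records; a reader may adjust the count,
      // but the adjustment is capped at this value.
      builder.batchRecordLimit(1000);

      // Also cap each batch at roughly 8 MB.
      builder.batchByteLimit(8 * 1024 * 1024);

      // Use nullable VARCHAR, rather than the standard nullable INT, for
      // missing columns.
      builder.nullType(Types.optional(TypeProtos.MinorType.VARCHAR));
    }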
-
allowRequiredNullColumns
public void allowRequiredNullColumns(boolean flag)
-
allowRequiredNullColumns
public boolean allowRequiredNullColumns()
-
allowSchemaChange
public void allowSchemaChange(boolean flag)
-
allowSchemaChange
public boolean allowSchemaChange()
-
projection
-
enableSchemaBatch
public void enableSchemaBatch(boolean option)
-
disableEmptyResults
public void disableEmptyResults(boolean option)
-
definedSchema
-
definedSchema
-
providedSchema
-
providedSchema
-
errorContext
-
errorContext
-
projection
-
scanBatchRecordLimit
public int scanBatchRecordLimit()
-
scanBatchByteLimit
public int scanBatchByteLimit()
-
nullType
-
readerFactory
-
schemaValidator
-
schemaValidator
-
limit
public void limit(long limit)
-
limit
public long limit()
-
build
-
buildScan
-
buildScanOperator
-