Class ScanLifecycleBuilder
- Direct Known Subclasses:
FileScanLifecycleBuilder
ScanLifecycleBuilder gathers the options for a scan, then builds a scan lifecycle instance.
This framework is a bridge between operator logic and the scan internals. It gathers scan-specific options in a builder abstraction, then passes them on to the scan lifecycle at the right time. By abstracting out this plumbing, a scan batch creator simply chooses the proper framework builder, passes config options, and implements the matching "managed reader" and factory. All details of setup, projection, and so on are handled by the framework and the components that the framework builds upon.
Inputs
At this basic level, a scan framework requires just a few simple inputs:
- The options defined by the scan projection framework, such as the projection list.
- A reader factory to create a reader for each of the files or blocks to be scanned. (Readers are expected to be created one-by-one as files are read.)
- The operator context which provides access to a memory allocator and other plumbing items.
In practice, there are other options to fine tune behavior (provided schema, custom error context, various limits, etc.)
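The configuration flow above can be sketched with a simplified, self-contained stand-in for the builder. All class and field names here are illustrative placeholders, not the actual Drill API: the real builder takes Drill types such as List<SchemaPath>, ReaderFactory, and OperatorContext, which are replaced with plain Java types so the sketch compiles on its own.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical stand-in for the scan framework builder pattern
// described above; not the actual Drill ScanLifecycleBuilder.
class SketchScanBuilder {
    List<String> projection;        // stands in for List<SchemaPath>
    String userName;                // plumbing option
    int batchRecordLimit = 4096;    // illustrative default row-count cap

    void projection(List<String> projection) { this.projection = projection; }
    void userName(String userName) { this.userName = userName; }
    void batchRecordLimit(int limit) { this.batchRecordLimit = limit; }

    // The real build() wires up the scan lifecycle; this sketch just
    // reports the configured state.
    String build() {
        return "scan(projection=" + projection
             + ", user=" + userName
             + ", recordLimit=" + batchRecordLimit + ")";
    }
}

public class ScanBuilderSketch {
    public static void main(String[] args) {
        SketchScanBuilder builder = new SketchScanBuilder();
        builder.projection(Arrays.asList("a", "b"));  // projection-list input
        builder.userName("drill-user");               // plumbing option
        builder.batchRecordLimit(1000);               // fine-tuning option
        System.out.println(builder.build());
    }
}
```

The point of the pattern is that the scan batch creator only sets options; the builder, not the caller, decides when and how those options reach the scan lifecycle.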
-
Nested Class Summary
Nested Classes:
- static class
- static interface
Field Summary
Fields:
- protected boolean allowRequiredNullColumns
- protected boolean allowSchemaChange: Option to disable schema changes.
- static final int DEFAULT_BATCH_BYTE_COUNT
- static final int DEFAULT_BATCH_ROW_COUNT
- protected TupleMetadata definedSchema
- protected boolean disableEmptyResults: Option to disable empty results.
- protected boolean enableSchemaBatch: Option that controls whether the scan operator starts with an empty schema-only batch (the so-called "fast schema" that Drill once tried to provide) or starts with a non-empty data batch (which appears to be the standard since the "Empty Batches" project some time back). See more details in the OperatorDriver Javadoc.
- protected CustomErrorContext errorContext: Context for error messages.
- static final int MAX_BATCH_BYTE_SIZE
- static final int MAX_BATCH_ROW_COUNT
- static final int MIN_BATCH_BYTE_SIZE
- protected TypeProtos.MajorType nullType
- protected TupleMetadata providedSchema
- protected ScanLifecycleBuilder.SchemaValidator schemaValidator: Optional schema validator to perform per-scan checks of the projection or resolved schema.
- protected String userName
Constructor Summary
Constructors -
Method Summary
Methods:
- boolean allowRequiredNullColumns()
- void allowRequiredNullColumns(boolean flag)
- boolean allowSchemaChange()
- void allowSchemaChange(boolean flag)
- void batchByteLimit(int byteLimit)
- void batchRecordLimit(int batchRecordLimit): Specify a custom batch record count.
- build(OperatorContext context)
- buildScanOperator(FragmentContext fragContext, PhysicalOperator pop)
- void definedSchema(TupleMetadata definedSchema)
- void disableEmptyResults(boolean option)
- void enableSchemaBatch(boolean option)
- void errorContext(CustomErrorContext context)
- long limit()
- void limit(long limit)
- nullType()
- void nullType(TypeProtos.MajorType nullType): Specify the type to use for null columns in place of the standard nullable int.
- options()
- void options(…)
- void projection(List<SchemaPath> projection)
- void providedSchema(TupleMetadata providedSchema)
- void readerFactory(ReaderFactory<?> readerFactory)
- int scanBatchByteLimit()
- int scanBatchRecordLimit()
- void schemaValidator(ScanLifecycleBuilder.SchemaValidator schemaValidator)
- userName()
- void userName(…)
-
Field Details
-
MIN_BATCH_BYTE_SIZE
public static final int MIN_BATCH_BYTE_SIZE
-
MAX_BATCH_BYTE_SIZE
public static final int MAX_BATCH_BYTE_SIZE
-
DEFAULT_BATCH_ROW_COUNT
public static final int DEFAULT_BATCH_ROW_COUNT
-
DEFAULT_BATCH_BYTE_COUNT
public static final int DEFAULT_BATCH_BYTE_COUNT -
MAX_BATCH_ROW_COUNT
public static final int MAX_BATCH_ROW_COUNT
-
userName
-
nullType
-
allowRequiredNullColumns
protected boolean allowRequiredNullColumns -
definedSchema
-
providedSchema
-
enableSchemaBatch
protected boolean enableSchemaBatch
Option that controls whether the scan operator starts with an empty schema-only batch (the so-called "fast schema" that Drill once tried to provide) or starts with a non-empty data batch (which appears to be the standard since the "Empty Batches" project some time back). See more details in the OperatorDriver Javadoc. Defaults to false, meaning that the empty schema batch is not provided. DRILL-7305 explains that many operators fail when presented with an empty batch, so do not enable this feature until those issues are fixed. Of course, do enable the feature if you want to track down the DRILL-7305 bugs.
-
disableEmptyResults
protected boolean disableEmptyResults
Option to disable empty results. An empty result occurs if no reader has any data, but at least one reader can provide a schema. In this case, the scan can return a single, empty batch, with an associated schema. This is the correct SQL result for an empty query. However, if this result triggers empty-batch bugs in other operators, we can, instead, disable this feature and return a null result set: no schema, no batch, just a "fast NONE", an immediate return of NONE from the Volcano iterator. Disabling this option is not desirable: it means that the user gets no schema for queries that should be able to return one. So, disable this option only if we cannot find or fix empty-batch bugs.
-
allowSchemaChange
protected boolean allowSchemaChange
Option to disable schema changes. If false, then the first batch commits the scan to a single, unchanged schema. If true (the legacy default), then each batch or reader can change the schema, even though downstream operators generally cannot handle a schema change. The goal is to evolve all readers so that they do not generate schema changes. -
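The flag's semantics can be illustrated with a minimal, self-contained sketch (hypothetical helper, not Drill code): the first batch commits the schema, and later batches may differ only when schema changes are allowed.

```java
// Illustrative sketch of the allowSchemaChange semantics described above;
// not Drill code. Schemas are modeled as plain strings for simplicity.
public class SchemaChangeSketch {
    static String committedSchema = null;

    // Returns true if a batch with the given schema is accepted
    // under the given policy.
    static boolean acceptBatch(String batchSchema, boolean allowSchemaChange) {
        if (committedSchema == null) {
            committedSchema = batchSchema;  // first batch commits the schema
            return true;
        }
        // Later batches may differ only if schema changes are allowed.
        return allowSchemaChange || committedSchema.equals(batchSchema);
    }

    public static void main(String[] args) {
        System.out.println(acceptBatch("(a INT, b VARCHAR)", false)); // true
        System.out.println(acceptBatch("(a INT, b VARCHAR)", false)); // true: same schema
        System.out.println(acceptBatch("(a INT)", false));            // false: change rejected
    }
}
```

With the flag set to true, the third batch would be accepted instead, which is why downstream operators that cannot handle schema changes break under the legacy default.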
schemaValidator
Optional schema validator to perform per-scan checks of the projection or resolved schema. -
errorContext
Context for error messages.
-
-
Constructor Details
-
ScanLifecycleBuilder
public ScanLifecycleBuilder()
-
-
Method Details
-
options
-
options
-
readerFactory
-
userName
-
userName
-
batchRecordLimit
public void batchRecordLimit(int batchRecordLimit)
Specify a custom batch record count. This is the maximum number of records per batch for this scan. Readers can adjust this, but the adjustment is capped at the value specified here.
- Parameters:
batchRecordLimit - maximum records per batch
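The capping behavior described above can be sketched as follows. The clamping logic and the bound used are assumptions based on the description, not the exact Drill implementation; the real constant MAX_BATCH_ROW_COUNT has an unspecified value here.

```java
// Sketch of the capping described under batchRecordLimit: a reader may
// adjust the batch record limit, but the effective value never exceeds
// the configured cap. The bound below is illustrative, not Drill's constant.
public class BatchLimitSketch {
    static final int MAX_BATCH_ROW_COUNT = 65536;  // illustrative bound

    static int effectiveRecordLimit(int configuredLimit, int readerRequest) {
        int cap = Math.min(configuredLimit, MAX_BATCH_ROW_COUNT);
        return Math.min(readerRequest, cap);  // reader can lower, never raise
    }

    public static void main(String[] args) {
        System.out.println(effectiveRecordLimit(1000, 500));   // 500: reader lowers the limit
        System.out.println(effectiveRecordLimit(1000, 5000));  // 1000: capped at configured value
    }
}
```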
-
batchByteLimit
public void batchByteLimit(int byteLimit) -
nullType
Specify the type to use for null columns in place of the standard nullable int. This type is used for all missing columns. (Readers that need per-column control need a different mechanism.)
- Parameters:
nullType - the type to use for null columns
-
allowRequiredNullColumns
public void allowRequiredNullColumns(boolean flag) -
allowRequiredNullColumns
public boolean allowRequiredNullColumns() -
allowSchemaChange
public void allowSchemaChange(boolean flag) -
allowSchemaChange
public boolean allowSchemaChange() -
projection
-
enableSchemaBatch
public void enableSchemaBatch(boolean option) -
disableEmptyResults
public void disableEmptyResults(boolean option) -
definedSchema
-
definedSchema
-
providedSchema
-
providedSchema
-
errorContext
-
errorContext
-
projection
-
scanBatchRecordLimit
public int scanBatchRecordLimit() -
scanBatchByteLimit
public int scanBatchByteLimit() -
nullType
-
readerFactory
-
schemaValidator
-
schemaValidator
-
limit
public void limit(long limit) -
limit
public long limit() -
build
-
buildScan
-
buildScanOperator
-