Class ScanLifecycleBuilder

java.lang.Object
org.apache.drill.exec.physical.impl.scan.v3.ScanLifecycleBuilder
Direct Known Subclasses:
FileScanLifecycleBuilder

public class ScanLifecycleBuilder extends Object
Gathers options for the ScanLifecycle then builds a scan lifecycle instance.

This framework is a bridge between operator logic and the scan internals. It gathers scan-specific options in a builder abstraction, then passes them on to the scan lifecycle at the right time. By abstracting out this plumbing, a scan batch creator simply chooses the proper framework builder, passes config options, and implements the matching "managed reader" and factory. All details of setup, projection, and so on are handled by the framework and the components that the framework builds upon.

Inputs

At this basic level, a scan framework requires just a few simple inputs:
  • The options defined by the scan projection framework, such as the projection list.
  • A reader factory to create a reader for each of the files or blocks to be scanned. (Readers are expected to be created one by one as files are read.)
  • The operator context, which provides access to a memory allocator and other plumbing items.

In practice, there are other options to fine-tune behavior (provided schema, custom error context, various limits, and so on).
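
The gather-then-build pattern described above can be sketched with a toy builder. This is only an illustration under stated assumptions: `ToyScanBuilder` and `ToyScan` are hypothetical stand-ins, not Drill classes; plain strings stand in for the reader, the projected columns, and the operator context; and the check for a missing reader factory is an assumption of the sketch, not documented behavior.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Toy stand-in for a scan builder. Hypothetical; NOT the Drill class. */
final class ToyScanBuilder {
  private List<String> projection = Collections.emptyList(); // projected column names
  private Supplier<String> readerFactory;                    // creates one "reader" per call

  // Setters return void and getters share the setter's name,
  // mirroring the accessor style listed in Method Details.
  void projection(List<String> projection) { this.projection = projection; }
  List<String> projection() { return projection; }
  void readerFactory(Supplier<String> readerFactory) { this.readerFactory = readerFactory; }
  Supplier<String> readerFactory() { return readerFactory; }

  /** Hands all gathered options to the toy scan in a single step. */
  ToyScan build(String operatorContext) {
    if (readerFactory == null) {
      // Assumption of this sketch: a scan cannot run without a reader factory.
      throw new IllegalStateException("a reader factory is required");
    }
    return new ToyScan(operatorContext, projection, readerFactory);
  }
}

/** Toy scan that holds the options the builder gathered. Hypothetical. */
final class ToyScan {
  final String operatorContext;
  final List<String> projection;
  private final Supplier<String> readerFactory;

  ToyScan(String operatorContext, List<String> projection, Supplier<String> readerFactory) {
    this.operatorContext = operatorContext;
    this.projection = projection;
    this.readerFactory = readerFactory;
  }

  /** Readers are created lazily, one at a time, as the scan advances. */
  String nextReader() { return readerFactory.get(); }
}
```

The caller only sets options and supplies a factory; everything else happens behind `build`, which is the division of labor the class description outlines.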

  • Field Details

    • MIN_BATCH_BYTE_SIZE

      public static final int MIN_BATCH_BYTE_SIZE
    • MAX_BATCH_BYTE_SIZE

      public static final int MAX_BATCH_BYTE_SIZE
    • DEFAULT_BATCH_ROW_COUNT

      public static final int DEFAULT_BATCH_ROW_COUNT
    • DEFAULT_BATCH_BYTE_COUNT

      public static final int DEFAULT_BATCH_BYTE_COUNT
    • MAX_BATCH_ROW_COUNT

      public static final int MAX_BATCH_ROW_COUNT
    • userName

      protected String userName
    • nullType

      protected TypeProtos.MajorType nullType
    • allowRequiredNullColumns

      protected boolean allowRequiredNullColumns
    • definedSchema

      protected TupleMetadata definedSchema
    • providedSchema

      protected TupleMetadata providedSchema
    • enableSchemaBatch

      protected boolean enableSchemaBatch
      Option that controls whether the scan operator starts with an empty schema-only batch (the so-called "fast schema" that Drill once tried to provide) or with a non-empty data batch (which appears to have been the standard since the "Empty Batches" project). See the OperatorDriver Javadoc for details.

      Defaults to false, meaning the empty schema batch is not provided. DRILL-7305 explains that many operators fail when presented with an empty batch, so do not enable this feature until those issues are fixed. Do enable it, however, if you want to track down the DRILL-7305 bugs.
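
The effect of this flag on the batch sequence can be sketched as follows. This is a toy model, not Drill code: `SchemaBatchSketch` and `emittedBatches` are hypothetical names, and each batch is represented only by its row count.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the batch sequence a scan emits. Hypothetical; not a Drill class. */
final class SchemaBatchSketch {
  /**
   * Returns the row counts of the batches sent downstream.
   * With enableSchemaBatch set, a zero-row, schema-only batch comes first.
   */
  static List<Integer> emittedBatches(boolean enableSchemaBatch, List<Integer> dataBatchRowCounts) {
    List<Integer> out = new ArrayList<>();
    if (enableSchemaBatch) {
      out.add(0); // the "fast schema" batch carries a schema but no rows
    }
    out.addAll(dataBatchRowCounts); // data batches follow unchanged
    return out;
  }
}
```

With the default of false, downstream operators see data in their first batch; with the flag enabled, they must cope with a zero-row batch first, which is the case DRILL-7305 is concerned with.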

    • disableEmptyResults

      protected boolean disableEmptyResults
      Option to disable empty results. An empty result occurs if no reader has any data, but at least one reader can provide a schema. In this case, the scan can return a single, empty batch with an associated schema. This is the correct SQL result for an empty query. However, if this result triggers empty-batch bugs in other operators, we can instead disable this feature and return a null result set: no schema, no batch, just a "fast NONE", an immediate return of NONE from the Volcano iterator.

      Disabling empty results is not desirable: it means that the user gets no schema for queries that should be able to return one. So, set this option only if we cannot find or fix the empty-batch bugs.
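
The decision described above can be written as a small function. This is a sketch under assumptions: `EmptyResultSketch`, `Outcome`, and `decide` are hypothetical names, and the case where no reader has either data or a schema is assumed here to end in a "fast NONE" as well, which the text above does not state.

```java
/** Toy model of the empty-result decision. Hypothetical; not a Drill class. */
final class EmptyResultSketch {
  enum Outcome { DATA_BATCHES, EMPTY_BATCH_WITH_SCHEMA, FAST_NONE }

  static Outcome decide(boolean anyReaderHadData,
                        boolean anyReaderHadSchema,
                        boolean disableEmptyResults) {
    if (anyReaderHadData) {
      return Outcome.DATA_BATCHES; // the normal case; the option is irrelevant
    }
    if (anyReaderHadSchema && !disableEmptyResults) {
      return Outcome.EMPTY_BATCH_WITH_SCHEMA; // the correct SQL result for an empty query
    }
    // Either the option suppresses the empty batch, or (assumption of this
    // sketch) there is no schema to send at all.
    return Outcome.FAST_NONE;
  }
}
```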

    • allowSchemaChange

      protected boolean allowSchemaChange
      Option to disable schema changes. If false, then the first batch commits the scan to a single, unchanging schema. If true (the legacy default), then each batch or reader can change the schema, even though downstream operators generally cannot handle a schema change. The goal is to evolve all readers so that they do not generate schema changes.
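
A minimal sketch of the "first batch commits the schema" rule follows. The names `SchemaCommitSketch` and `accept` are hypothetical, and a schema is reduced to a list of column names. How the real framework reacts to a conflicting batch is not stated here, so the sketch only reports whether a batch's schema would be acceptable.

```java
import java.util.List;

/** Toy tracker for the scan's output schema. Hypothetical; not a Drill class. */
final class SchemaCommitSketch {
  private final boolean allowSchemaChange;
  private List<String> committed; // null until the first batch arrives

  SchemaCommitSketch(boolean allowSchemaChange) {
    this.allowSchemaChange = allowSchemaChange;
  }

  /** Returns true if a batch with this schema is acceptable under the current setting. */
  boolean accept(List<String> batchSchema) {
    if (committed == null || allowSchemaChange) {
      committed = batchSchema; // first batch, or schema changes are allowed
      return true;
    }
    return committed.equals(batchSchema); // later batches must match the committed schema
  }
}
```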
    • schemaValidator

      protected ScanLifecycleBuilder.SchemaValidator schemaValidator
      Optional schema validator to perform per-scan checks of the projection or resolved schema.
    • errorContext

      protected CustomErrorContext errorContext
      Context for error messages.
  • Constructor Details

    • ScanLifecycleBuilder

      public ScanLifecycleBuilder()
  • Method Details

    • options

      public void options(OptionSet options)
    • options

      public OptionSet options()
    • readerFactory

      public void readerFactory(ReaderFactory<?> readerFactory)
    • userName

      public void userName(String userName)
    • userName

      public String userName()
    • batchRecordLimit

      public void batchRecordLimit(int batchRecordLimit)
      Specify a custom batch record count. This is the maximum number of records per batch for this scan. Readers can adjust this, but the adjustment is capped at the value specified here.
      Parameters:
      batchRecordLimit - maximum records per batch
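
The capping rule can be illustrated with one line of arithmetic. `BatchLimitSketch` and `effectiveLimit` are hypothetical names, and the sketch assumes both values are positive; it does not reproduce how Drill applies the limit internally.

```java
/** Toy illustration of capping a reader's requested batch size. Hypothetical; not a Drill class. */
final class BatchLimitSketch {
  /**
   * A reader may ask for a different batch size, but never gets more
   * than the scan-wide limit. Assumes both arguments are positive.
   */
  static int effectiveLimit(int scanBatchRecordLimit, int readerRequested) {
    return Math.min(readerRequested, scanBatchRecordLimit);
  }
}
```

For example, with a scan-wide limit of 4096 a reader that asks for 10000 records per batch is held to 4096, while a reader that asks for 1000 gets 1000.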
    • batchByteLimit

      public void batchByteLimit(int byteLimit)
    • nullType

      public void nullType(TypeProtos.MajorType nullType)
      Specify the type to use for null columns in place of the standard nullable int. This type is used for all missing columns. (Readers that need per-column control need a different mechanism.)
      Parameters:
      nullType - the type to use for null columns
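
How a single null type applies to every missing column can be sketched as follows. This is a toy model: `NullColumnSketch` and `resolve` are hypothetical names, and type names are plain strings rather than `TypeProtos.MajorType` values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy resolution of projected columns against what a reader provides. Hypothetical; not a Drill class. */
final class NullColumnSketch {
  /**
   * Returns projected column name -> type. Columns the reader does not
   * provide all receive the same null type; there is no per-column control.
   */
  static Map<String, String> resolve(List<String> projection,
                                     Map<String, String> readerColumns,
                                     String nullType) {
    Map<String, String> out = new LinkedHashMap<>(); // keeps projection order
    for (String col : projection) {
      String type = readerColumns.get(col);
      out.put(col, type != null ? type : nullType);
    }
    return out;
  }
}
```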
    • allowRequiredNullColumns

      public void allowRequiredNullColumns(boolean flag)
    • allowRequiredNullColumns

      public boolean allowRequiredNullColumns()
    • allowSchemaChange

      public void allowSchemaChange(boolean flag)
    • allowSchemaChange

      public boolean allowSchemaChange()
    • projection

      public void projection(List<SchemaPath> projection)
    • enableSchemaBatch

      public void enableSchemaBatch(boolean option)
    • disableEmptyResults

      public void disableEmptyResults(boolean option)
    • definedSchema

      public void definedSchema(TupleMetadata definedSchema)
    • definedSchema

      public TupleMetadata definedSchema()
    • providedSchema

      public void providedSchema(TupleMetadata providedSchema)
    • providedSchema

      public TupleMetadata providedSchema()
    • errorContext

      public void errorContext(CustomErrorContext context)
    • errorContext

      public CustomErrorContext errorContext()
    • projection

      public List<SchemaPath> projection()
    • scanBatchRecordLimit

      public int scanBatchRecordLimit()
    • scanBatchByteLimit

      public int scanBatchByteLimit()
    • nullType

      public TypeProtos.MajorType nullType()
    • readerFactory

      public ReaderFactory<?> readerFactory()
    • schemaValidator

      public void schemaValidator(ScanLifecycleBuilder.SchemaValidator schemaValidator)
    • schemaValidator

      public ScanLifecycleBuilder.SchemaValidator schemaValidator()
    • limit

      public void limit(long limit)
    • limit

      public long limit()
    • build

      public ScanLifecycle build(OperatorContext context)
    • buildScan

      public ScanOperatorExec buildScan()
    • buildScanOperator

      public OperatorRecordBatch buildScanOperator(FragmentContext fragContext, PhysicalOperator pop)