java.lang.Object

org.apache.drill.exec.physical.impl.scan.v3.schema.AbstractSchemaTracker

All Implemented Interfaces: ScanSchemaTracker

Direct Known Subclasses: ProjectionSchemaTracker, SchemaBasedTracker

public abstract class AbstractSchemaTracker extends Object implements ScanSchemaTracker

Base class for the projection-based and defined-schema-based scan schema trackers.

Nested Class Summary

Nested classes/interfaces inherited from interface org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaTracker
ScanSchemaTracker.ProjectionType
Field Summary

Fields

Modifier and Type

Field

Description

protected final CustomErrorContext

errorContext

protected boolean

isResolved

protected final MutableTupleSchema

schema
Constructor Summary

Constructors

Constructor

Description

AbstractSchemaTracker(CustomErrorContext errorContext)
Method Summary
Modifier and Type

Method

Description

TupleMetadata

applyImplicitCols()

Indicate that implicit column parsing is complete.

protected void

checkResolved()

Determine if the schema is resolved.

CustomErrorContext

errorContext()

The scan-level error context used for errors which may occur before the first reader starts.

MutableTupleSchema

internalSchema()

Returns the internal scan schema.

boolean

isResolved()

Is the scan schema resolved? The schema is resolved depending on the complex lifecycle explained in the class comment.

TupleMetadata

missingColumns(TupleMetadata readerOutputSchema)

Identifies the missing columns given a reader output schema.

TupleMetadata

outputSchema()

Returns the scan output schema which is a somewhat complicated computation that depends on the projection type.

ScanSchemaTracker.ProjectionType

projectionType()

TupleMetadata

readerInputSchema()

The schema which the reader should produce.

void

resolveMissingCols(TupleMetadata missingCols)

The missing column handler obtains the list of missing columns from missingColumns(TupleMetadata).

int

schemaVersion()

Gives the output schema version which will start at some arbitrary positive number.

protected static void

validateProjection(TupleMetadata projection, TupleMetadata schema)

Validate a projection list against a defined-schema tuple.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaTracker
applyEarlyReaderSchema, applyReaderSchema, columnProjection, expandImplicitCol, projectionFilter

Field Details
- errorContext
  
  protected final CustomErrorContext errorContext
- schema
  
  protected final MutableTupleSchema schema
- isResolved
  
  protected boolean isResolved
Constructor Details
- AbstractSchemaTracker
  
  public AbstractSchemaTracker(CustomErrorContext errorContext)
Method Details
- validateProjection
  
  protected static void validateProjection(TupleMetadata projection, TupleMetadata schema)
  
  Validate a projection list against a defined-schema tuple. Recursively walks the tree of maps to validate all nested tuples.
  
  Parameters:
  
  projection - the parsed projection list
  
  schema - the defined schema to validate against
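
The recursive walk described above can be illustrated with a minimal sketch. It does not use Drill's `TupleMetadata` or `ColumnMetadata`; `SketchColumn`, `ProjectionValidationSketch`, and the exact validation rule (every projected column must exist in the schema, and projected maps must match schema maps) are assumptions made for illustration only, and the real method may apply different or additional checks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a column; this is not Drill's ColumnMetadata.
final class SketchColumn {
  final String name;
  // Non-null only when the column is a map, that is, a nested tuple.
  final Map<String, SketchColumn> members;

  private SketchColumn(String name, Map<String, SketchColumn> members) {
    this.name = name;
    this.members = members;
  }

  static SketchColumn leaf(String name) {
    return new SketchColumn(name, null);
  }

  static SketchColumn map(String name, SketchColumn... members) {
    return new SketchColumn(name, ProjectionValidationSketch.tuple(members));
  }

  boolean isMap() {
    return members != null;
  }
}

final class ProjectionValidationSketch {

  // Builds a tuple (an ordered name-to-column map) from a list of columns.
  static Map<String, SketchColumn> tuple(SketchColumn... cols) {
    Map<String, SketchColumn> tuple = new LinkedHashMap<>();
    for (SketchColumn col : cols) {
      tuple.put(col.name, col);
    }
    return tuple;
  }

  // Assumed rule: every projected column must exist in the schema, and a
  // projected map must line up with a schema map, checked recursively.
  static void validate(Map<String, SketchColumn> projection,
                       Map<String, SketchColumn> schema) {
    for (SketchColumn projCol : projection.values()) {
      SketchColumn schemaCol = schema.get(projCol.name);
      if (schemaCol == null) {
        throw new IllegalArgumentException(
            "Projected column is not in the schema: " + projCol.name);
      }
      if (projCol.isMap()) {
        if (!schemaCol.isMap()) {
          throw new IllegalArgumentException(
              "Projected as a map, but not a map in the schema: " + projCol.name);
        }
        // Descend into the nested tuple.
        validate(projCol.members, schemaCol.members);
      }
    }
  }
}
```

In this sketch, projecting `m.x` against a schema containing map `m` with members `x` and `y` passes, while projecting `m.z` throws an `IllegalArgumentException` from the nested call.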
- projectionType
  
  public ScanSchemaTracker.ProjectionType projectionType()
  
  Specified by:
  
  projectionType in interface ScanSchemaTracker
- errorContext
  
  public CustomErrorContext errorContext()
  
  Description copied from interface: ScanSchemaTracker
  
  The scan-level error context used for errors which may occur before the first reader starts. The reader will provide a more detailed error context that describes what is being read.
  
  Specified by:
  
  errorContext in interface ScanSchemaTracker
  
  Returns:
  
  the scan-level error context
- internalSchema
  
  public MutableTupleSchema internalSchema()
  
  Description copied from interface: ScanSchemaTracker
  
  Returns the internal scan schema. Primarily for testing.
  
  Specified by:
  
  internalSchema in interface ScanSchemaTracker
  
  Returns:
  
  the internal mutable scan schema
- isResolved
  
  public boolean isResolved()
  
  Description copied from interface: ScanSchemaTracker
  
  Is the scan schema resolved? The schema is resolved depending on the complex lifecycle explained in the class comment. Resolution occurs when the wildcard (if any) is expanded, and all explicit projection columns obtain a definite type. If schema change is disabled, the schema will not change once it is resolved. If schema change is allowed, then batches or readers may extend the schema, triggering a schema change, and so the scan schema may move from one resolved state to another.
  The schema will be fully resolved after the first batch of data arrives from a reader, since the reader lifecycle will then fill in any missing columns. The schema may be resolved sooner, for example if a strict provided schema or an early reader schema is available and there are no missing columns.
  
  Specified by:
  
  isResolved in interface ScanSchemaTracker
  
  Returns:
  
  true if the schema is resolved, and hence the ScanSchemaTracker.outputSchema() is available; false if the schema contains one or more dynamic columns which are not yet resolved.
- schemaVersion
  
  public int schemaVersion()
  
  Description copied from interface: ScanSchemaTracker
  
  Gives the output schema version which will start at some arbitrary positive number.
  If schema change is allowed, the schema version allows detecting schema changes as the scan schema moves from one resolved state to the next. Each schema will have a unique, increasing version number. A schema change has occurred if the version is newer than the previous output schema version.
  
  Specified by:
  
  schemaVersion in interface ScanSchemaTracker
  
  Returns:
  
  the schema version. The absolute number is not important; rather, an increase indicates that one or more columns were added at the top level or within a map at some nesting level
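
As a rough illustration of how a caller might use the version to detect schema changes, the sketch below remembers the last version it saw and reports a change whenever a newer one appears. `SchemaChangeDetectorSketch` is a hypothetical helper, not part of Drill; the `-1` sentinel relies only on the statement above that versions are positive.

```java
// Hypothetical helper; not part of Drill.
final class SchemaChangeDetectorSketch {
  // Versions are described as positive, so -1 means "nothing seen yet".
  private int lastSeenVersion = -1;

  // Returns true when the version has advanced since the previous call.
  boolean schemaChanged(int currentVersion) {
    if (currentVersion > lastSeenVersion) {
      lastSeenVersion = currentVersion;
      return true;
    }
    return false;
  }
}
```

A caller would pass the value of `schemaVersion()` once per batch. Under this sketch the first call always reports a change, which is a design choice of the example: the first resolved schema is treated as new.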
- checkResolved
  
  protected void checkResolved()
  
  Determine if the schema is resolved. It is resolved if the schema itself is resolved. Since an empty schema is resolved, for the SELECT * case, we require at least one column, which means that something (provided schema, early reader schema) has provided us with a schema. Once resolved, a schema can never become unresolved: readers are not allowed to add dynamic columns.
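
The rule above might be sketched as follows. This is an assumption-laden illustration, not the actual implementation: columns are modeled as a name-to-type map in which a `null` type marks a dynamic (unresolved) column, and `ResolutionCheckSketch` and its parameters are hypothetical names.

```java
import java.util.Map;

final class ResolutionCheckSketch {
  // columns maps a column name to its type; a null type marks a dynamic column.
  static boolean isResolved(Map<String, String> columns, boolean isWildcard) {
    for (String type : columns.values()) {
      if (type == null) {
        return false; // at least one column still lacks a definite type
      }
    }
    // An empty schema counts as resolved, except for SELECT *, where some
    // source must first have supplied at least one column.
    return !(isWildcard && columns.isEmpty());
  }
}
```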
- applyImplicitCols
  
  public TupleMetadata applyImplicitCols()
  
  Description copied from interface: ScanSchemaTracker
  
  Indicate that implicit column parsing is complete. Returns the implicit columns as identified by the implicit column handler, in the order of the projection list. Implicit columns do not appear in a reader input schema, and it is an error for the reader to produce such columns.
  
  Specified by:
  
  applyImplicitCols in interface ScanSchemaTracker
  
  Returns:
  
  a sub-schema of only implicit columns, in the order in which they appear in the output schema
- readerInputSchema
  
  public TupleMetadata readerInputSchema()
  
  Description copied from interface: ScanSchemaTracker
  The schema which the reader should produce. Depending on the type of the scan (specifically, if isProjectAll() is true), the reader may produce additional columns beyond those in the reader input schema. However, for any batch, the reader, plus the missing columns handler, must produce all columns in the reader input schema.
  Formally:
  reader input schema = output schema - implicit col schema
  Specified by:
  
  readerInputSchema in interface ScanSchemaTracker
  
  Returns:
  
  the sub-schema which includes those columns which the reader should provide, excluding implicit columns
- missingColumns
  
  public TupleMetadata missingColumns(TupleMetadata readerOutputSchema)
  
  Description copied from interface: ScanSchemaTracker
  Identifies the missing columns given a reader output schema. The reader output schema consists of those columns which the reader actually produced.
  Formally:
  missing cols = reader input schema - reader output schema
  
  The reader output schema can contain extra, newly discovered columns. Those are ignored when computing missing columns. Thus, the subtraction is set subtraction: remove columns common to the two sets.
  Specified by:
  
  missingColumns in interface ScanSchemaTracker
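
The set subtraction above can be shown with a small sketch that models each schema as a list of column names. This is a simplification: the real method works on `TupleMetadata`, and `MissingColumnsSketch` is a hypothetical name. Names are compared exactly here, which may differ from how Drill matches column names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MissingColumnsSketch {
  // missing cols = reader input schema - reader output schema
  static List<String> missingColumns(List<String> readerInput,
                                     List<String> readerOutput) {
    Set<String> produced = new HashSet<>(readerOutput);
    List<String> missing = new ArrayList<>();
    for (String col : readerInput) {
      // Keep only the requested columns the reader did not produce.
      if (!produced.contains(col)) {
        missing.add(col);
      }
    }
    return missing;
  }
}
```

With a reader input schema of `a, b, c` and a reader output schema of `a, c, d`, the sketch returns only `b`: the extra column `d` is ignored, and the result keeps the order of the reader input schema.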
- resolveMissingCols
  
  public void resolveMissingCols(TupleMetadata missingCols)
  
  Description copied from interface: ScanSchemaTracker
  The missing column handler obtains the list of missing columns from missingColumns(TupleMetadata). Depending on the scan lifecycle, some of the columns may have a type, while others may be dynamic. The missing column handler chooses a type for any dynamic columns, then calls this method to tell the scan schema tracker the now-resolved column types.
  Note: a goal of the provided/defined schema system is to avoid the need to guess types for missing columns since doing so quite often leads to problems further downstream in the query. Ideally, the type of missing columns will be known (via the provided or defined schema) to avoid such conflicts.
  Specified by:
  
  resolveMissingCols in interface ScanSchemaTracker
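
The handler's side of this exchange might look like the sketch below. It is illustrative only: missing columns are modeled as a name-to-type map where `null` marks a dynamic column, `MissingColumnResolverSketch` is a hypothetical name, and the fallback type is a placeholder assumption rather than a statement of what any real handler chooses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MissingColumnResolverSketch {
  // Placeholder fallback type, assumed for illustration only.
  static final String ASSUMED_DEFAULT_TYPE = "NULLABLE INT";

  // Keeps known types and fills in the fallback for dynamic (null-typed) columns.
  static Map<String, String> chooseTypes(Map<String, String> missingCols) {
    Map<String, String> resolved = new LinkedHashMap<>();
    for (Map.Entry<String, String> col : missingCols.entrySet()) {
      String type = col.getValue() != null ? col.getValue() : ASSUMED_DEFAULT_TYPE;
      resolved.put(col.getKey(), type);
    }
    return resolved;
  }
}
```

The resulting map, with no `null` types left, corresponds to what the handler would pass back through `resolveMissingCols`. Columns whose type was already known from a provided or defined schema pass through unchanged, which is the case the note above describes as ideal.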
- outputSchema
  
  public TupleMetadata outputSchema()
  
  Description copied from interface: ScanSchemaTracker
  Returns the scan output schema which is a somewhat complicated computation that depends on the projection type.
  For a wildcard schema:
  output schema = implicit cols U reader output schema
  
  For an explicit projection:
  output schema = projection list
  Where the projection list is augmented by types from the provided schema, implicit columns or readers.
  A defined schema is the output schema, so:
  output schema = defined schema
  Specified by:
  
  outputSchema in interface ScanSchemaTracker
  
  Returns:
  
  the complete output schema provided by the scan to downstream operators. Includes both reader and implicit columns, in the order of the projection list or, for a wildcard, in the order of the first reader
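
The three cases above can be summarized in a short sketch that models schemas as lists of column names. `OutputSchemaSketch` and its `Kind` enum are hypothetical and are not `ScanSchemaTracker.ProjectionType`. The placement of implicit columns after reader columns in the wildcard case is an assumption of the sketch, and the augmentation of the projection list with types is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class OutputSchemaSketch {
  // Hypothetical labels for the three cases described above.
  enum Kind { WILDCARD, EXPLICIT, DEFINED }

  static List<String> outputSchema(Kind kind,
                                   List<String> implicitCols,
                                   List<String> readerOutput,
                                   List<String> projectionList,
                                   List<String> definedSchema) {
    switch (kind) {
      case WILDCARD: {
        // output schema = implicit cols U reader output schema
        // (reader columns first: an ordering assumed by this sketch)
        Set<String> union = new LinkedHashSet<>(readerOutput);
        union.addAll(implicitCols);
        return new ArrayList<>(union);
      }
      case EXPLICIT:
        // output schema = projection list
        return new ArrayList<>(projectionList);
      default:
        // output schema = defined schema
        return new ArrayList<>(definedSchema);
    }
  }
}
```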
