org.apache.drill.exec.physical.impl.scan.project.ScanLevelProjection

public class ScanLevelProjection extends Object

Parses and analyzes the projection list passed to the scanner. The scanner accepts a projection list and a plugin-specific set of items to read. The scan operator produces a series of output batches, which (in the best case) all have the same schema. Since Drill is "schema on read", the batch schema may evolve in practice. The framework tries to "smooth" such changes where possible. An output schema adds another level of stability by specifying the set of columns to project (for wildcard queries) and the types of those columns (for all queries).

The projection list is per scan, independent of any tables that the scanner might scan. The projection list is then used as input to the per-table projection planning.

Overview

In most query engines, this kind of projection analysis is done at plan time. But since Drill is schema-on-read, we don't know the available columns, or their types, until we start scanning a table. The table may provide the schema up-front, or may discover it as the read proceeds. Hence, the job here is to make sense of the projection list based on static a-priori information, then to create a list that can be further resolved against a table schema when it appears. This gives us two steps:

Scan-level projection: this class, which handles the schema for the entire scan operator.
Table-level projection: defined elsewhere; it merges the table and scan-level projections.
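The two steps can be illustrated with a minimal, self-contained sketch. It does not use the Drill classes: `TwoStepProjectionSketch`, `scanLevel`, `tableLevel`, and the `" (null)"` marker are hypothetical names invented for illustration, and columns are modeled as plain strings. The sketch assumes, as this page states elsewhere, that names match case-insensitively, that the wildcard marks where table columns are inserted, and that a requested column missing from the table is filled in with a null column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Drill implementation.
final class TwoStepProjectionSketch {

  static final String WILDCARD = "*";

  /**
   * Step 1, scan level: runs once per scan, before any table schema is known.
   * It only records what was requested; the wildcard stays a placeholder.
   */
  static List<String> scanLevel(List<String> selectList) {
    return new ArrayList<>(selectList);
  }

  /**
   * Step 2, table level: runs each time a table schema becomes available and
   * resolves the scan-level plan against that table's columns.
   */
  static List<String> tableLevel(List<String> scanPlan, List<String> tableColumns) {
    List<String> output = new ArrayList<>();
    for (String requested : scanPlan) {
      if (requested.equals(WILDCARD)) {
        // Insert the table columns at the wildcard's position, in table order.
        output.addAll(tableColumns);
      } else if (containsIgnoreCase(tableColumns, requested)) {
        // The output keeps the name as written in the SELECT list.
        output.add(requested);
      } else {
        // Column not offered by this table: stand-in for a null column.
        output.add(requested + " (null)");
      }
    }
    return output;
  }

  private static boolean containsIgnoreCase(List<String> names, String target) {
    for (String name : names) {
      if (name.equalsIgnoreCase(target)) {
        return true;
      }
    }
    return false;
  }
}
```

In this sketch the same scan-level plan is resolved against two tables with different schemas, which is the point of splitting the work: the scan-level step never needs to be repeated when a new table appears.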

Accepts the inputs needed to plan a projection, builds the mappings, and constructs the projection mapping object.

Builds the per-scan projection plan given a set of projected columns. Determines the output schema, which columns to project from the data source, which are metadata, and so on.

An annoying aspect of SQL is that the projection list (the list of columns to appear in the output) is specified after the SELECT keyword. In relational theory, projection is about columns, selection is about rows...

Projection Mappings

Mappings can be based on three primary use cases:

SELECT *: Project all data source columns, whatever they happen to be. Create columns using names from the data source. The data source also determines the order of columns within the row.
SELECT columns: Similar to SELECT * in that it projects all columns from the data source, in data source order. But rather than creating an individual output column for each data source column, it creates a single column, an array of Varchars, that holds the text form of each column as an array element.
SELECT a, b, c, ...: Project a specific set of columns, identified by case-insensitive name. The output row uses the names from the SELECT list, but types from the data source. Columns appear in the row in the order specified by the SELECT.

A further, degenerate case is SELECT ... with an empty projection list, which usually indicates a SELECT COUNT(*) query.
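The use cases above can be sketched as a classification over the SELECT list. This is a hedged illustration rather than Drill's code: the `UseCase` enum and the `classify` method are hypothetical and are not the same thing as `ScanLevelProjection.ScanProjectionType`. The sketch assumes that a wildcard anywhere in the list makes the query a wildcard query, that `columns` is matched case-insensitively, and that an empty list is the count-only case described under `isEmptyProjection()`.

```java
import java.util.List;

// Hypothetical illustration only; names are not part of the Drill API.
final class ProjectionUseCaseSketch {

  enum UseCase { WILDCARD, COLUMNS_ARRAY, EXPLICIT, EMPTY }

  static UseCase classify(List<String> selectList) {
    if (selectList.isEmpty()) {
      // No columns requested at all, as in SELECT COUNT(*).
      return UseCase.EMPTY;
    }
    for (String name : selectList) {
      if (name.equals("*")) {
        // The data source decides the column names and their order.
        return UseCase.WILDCARD;
      }
    }
    for (String name : selectList) {
      if (name.equalsIgnoreCase("columns")) {
        // All source columns packed into one array-of-Varchar column.
        return UseCase.COLUMNS_ARRAY;
      }
    }
    // Named columns: SELECT-list names and order, data source types.
    return UseCase.EXPLICIT;
  }
}
```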

Names in the SELECT list can reference any of five distinct types of output columns:

Wildcard ("*") column: indicates the place in the projection list to insert the table columns once found in the table projection plan.
Data source columns: columns from the underlying table. The table projection planner will determine if the column exists, or must be filled in with a null column.
The generic data source columns array: columns, or optionally specific members of the columns array such as columns[1].
Implicit columns: fqn, filename, filepath and suffix. These reference parts of the name of the file being scanned.
Partition columns: dir0, dir1, ...: These reference parts of the path name of the file.
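As a rough illustration of the five column types listed above, the following sketch classifies a single name from the SELECT list. `ColumnKindSketch`, `ColumnKind`, and the two regular expressions are hypothetical and written for this example; the real class delegates this work to add-on `ScanProjectionParser` instances. The sketch assumes case-insensitive matching and uses only the implicit column names given on this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only; not how Drill's parsers are implemented.
final class ColumnKindSketch {

  enum ColumnKind { WILDCARD, TABLE, COLUMNS_ARRAY, IMPLICIT, PARTITION }

  // Implicit column names as listed in the text above.
  private static final Set<String> IMPLICIT_NAMES =
      new HashSet<>(Arrays.asList("fqn", "filename", "filepath", "suffix"));

  // Matches "columns" or an indexed member such as "columns[1]".
  private static final Pattern COLUMNS_PATTERN =
      Pattern.compile("columns(\\[\\d+\\])?", Pattern.CASE_INSENSITIVE);

  // Matches dir0, dir1, ...
  private static final Pattern PARTITION_PATTERN =
      Pattern.compile("dir\\d+", Pattern.CASE_INSENSITIVE);

  static ColumnKind classify(String name) {
    if (name.equals("*")) {
      return ColumnKind.WILDCARD;
    }
    if (COLUMNS_PATTERN.matcher(name).matches()) {
      return ColumnKind.COLUMNS_ARRAY;
    }
    if (IMPLICIT_NAMES.contains(name.toLowerCase(Locale.ROOT))) {
      return ColumnKind.IMPLICIT;
    }
    if (PARTITION_PATTERN.matcher(name).matches()) {
      return ColumnKind.PARTITION;
    }
    // Anything else is assumed to name a column in the underlying table;
    // whether it exists is only known once the table schema appears.
    return ColumnKind.TABLE;
  }
}
```

The special names are checked before falling back to a table column, so a name such as `director` is treated as a table column even though it starts with `dir`.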

Projection with a Schema

The client can provide an output schema that defines the types (and defaults) for the tuple produced by the scan. When a schema is provided, the above use cases are extended as follows:

SELECT * with strict schema: All columns in the output schema are projected, and only those columns. If a reader offers additional columns, those columns are ignored. If the reader omits output columns, the default value (if any) for the column is used.
SELECT * with a non-strict schema: the output tuple contains all columns from the output schema as explained above. In addition, if the reader provides any columns not in the output schema, those columns are appended to the end of the tuple. (That is, the output schema acts as if it were from an imaginary "0th" reader.)
Explicit projection: only the requested columns appear, whether from the output schema, the reader, or as nulls.
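The difference between the strict and non-strict wildcard cases can be sketched as follows. This is an assumption-laden illustration, not Drill code: `SchemaWildcardSketch` and `wildcardOutput` are hypothetical, columns are reduced to their names, default values are not modeled, and name matching is assumed to be case-insensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; types and defaults are omitted.
final class SchemaWildcardSketch {

  /**
   * Computes the output column names for SELECT * when an output schema
   * is provided.
   */
  static List<String> wildcardOutput(
      List<String> outputSchema, List<String> readerColumns, boolean strict) {
    // All output schema columns are always projected, in schema order.
    List<String> output = new ArrayList<>(outputSchema);
    if (strict) {
      // Strict: extra reader columns are ignored.
      return output;
    }
    // Non-strict: reader columns not already in the schema are appended,
    // as if the schema had come from an imaginary "0th" reader.
    Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    seen.addAll(outputSchema);
    for (String column : readerColumns) {
      if (seen.add(column)) {
        output.add(column);
      }
    }
    return output;
  }
}
```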

Nested Class Summary

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | ScanLevelProjection.Builder | |
| static interface | ScanLevelProjection.ScanProjectionParser | Interface for add-on parsers, avoids the need to create a single, tightly-coupled parser for all types of columns. |
| static enum | ScanLevelProjection.ScanProjectionType | Identifies the kind of projection done for this scan. |

Field Summary

| Modifier and Type | Field | Description |
| --- | --- | --- |
| protected final CustomErrorContext | errorContext | Context used with error messages. |
| protected boolean | includesWildcard | |
| protected List<ColumnProjection> | outputCols | |
| protected RequestedTuple | outputProjection | Projection definition for the scan as a whole. |
| protected List<ScanLevelProjection.ScanProjectionParser> | parsers | |
| protected final List<SchemaPath> | projectionList | |
| protected ScanLevelProjection.ScanProjectionType | projectionType | |
| protected ProjectionFilter | readerProjection | Projection definition passed to each reader. |
| protected final TupleMetadata | readerSchema | |
| protected boolean | sawWildcard | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | addMetadataColumn(ColumnProjection outCol) | |
| void | addTableColumn(ColumnProjection outCol) | |
| static ScanLevelProjection | build(List<SchemaPath> projectionList, List<ScanLevelProjection.ScanProjectionParser> parsers) | Builder shortcut, primarily for tests. |
| static ScanLevelProjection | build(List<SchemaPath> projectionList, List<ScanLevelProjection.ScanProjectionParser> parsers, TupleMetadata outputSchema) | Builder shortcut, primarily for tests. |
| static ScanLevelProjection.Builder | builder() | |
| List<ColumnProjection> | columns() | The entire set of output columns, in output order. |
| CustomErrorContext | context() | |
| boolean | hasReaderSchema() | |
| boolean | isEmptyProjection() | Returns true if the projection list is empty. |
| boolean | projectAll() | Return whether this is a SELECT * query |
| ScanLevelProjection.ScanProjectionType | projectionType() | |
| ProjectionFilter | readerProjection() | |
| TupleMetadata | readerSchema() | |
| List<SchemaPath> | requestedCols() | Return the set of columns from the SELECT list |
| RequestedTuple | rootProjection() | |
| String | toString() | |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- errorContext
  
  protected final CustomErrorContext errorContext
  
  Context used with error messages.
- projectionList
  
  protected final List<SchemaPath> projectionList
- readerSchema
  
  protected final TupleMetadata readerSchema
- parsers
  
  protected List<ScanLevelProjection.ScanProjectionParser> parsers
- includesWildcard
  
  protected boolean includesWildcard
- sawWildcard
  
  protected boolean sawWildcard
- outputCols
  
  protected List<ColumnProjection> outputCols
- outputProjection
  
  protected RequestedTuple outputProjection
  
  Projection definition for the scan as a whole. Parsed form of the input projection list.
- readerProjection
  
  protected ProjectionFilter readerProjection
  
  Projection definition passed to each reader. This is the set of columns that the reader is asked to provide.
- projectionType
  
  protected ScanLevelProjection.ScanProjectionType projectionType

Method Details
- builder
  
  public static ScanLevelProjection.Builder builder()
- build
  
  public static ScanLevelProjection build(List<SchemaPath> projectionList, List<ScanLevelProjection.ScanProjectionParser> parsers)
  
  Builder shortcut, primarily for tests.
- build
  
  public static ScanLevelProjection build(List<SchemaPath> projectionList, List<ScanLevelProjection.ScanProjectionParser> parsers, TupleMetadata outputSchema)
  
  Builder shortcut, primarily for tests.
- addTableColumn
  
  public void addTableColumn(ColumnProjection outCol)
- addMetadataColumn
  
  public void addMetadataColumn(ColumnProjection outCol)
- context
  
  public CustomErrorContext context()
- requestedCols
  
  public List<SchemaPath> requestedCols()
  
  Returns the set of columns from the SELECT list.
  
  Returns:
  
  the SELECT list columns, in SELECT list order
- columns
  
  public List<ColumnProjection> columns()
  
  The entire set of output columns, in output order. Output order is that specified in the SELECT (for an explicit list of columns) or table order (for SELECT * queries).
  
  Returns:
  
  the set of output columns in output order
- projectionType
  
  public ScanLevelProjection.ScanProjectionType projectionType()
- projectAll
  
  public boolean projectAll()
  
  Returns whether this is a SELECT * query.
  
  Returns:
  
  true if this is a SELECT * query
- isEmptyProjection
  
  public boolean isEmptyProjection()
  
  Returns true if the projection list is empty. This usually indicates a SELECT COUNT(*) query (though the scan operator does not have the context to know that an empty list does, in fact, imply a count-only query...)
  
  Returns:
  
  true if no table columns are projected, false if at least one column is projected (or the query contained the wildcard)
- rootProjection
  
  public RequestedTuple rootProjection()
- readerProjection
  
  public ProjectionFilter readerProjection()
- hasReaderSchema
  
  public boolean hasReaderSchema()
- readerSchema
  
  public TupleMetadata readerSchema()
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
