Interface GroupScan
- All Superinterfaces:
FragmentLeaf, GraphValue<PhysicalOperator>, HasAffinity, Iterable<PhysicalOperator>, Leaf, PhysicalOperator, Scan
- All Known Subinterfaces:
DbGroupScan, FileGroupScan, IndexGroupScan
- All Known Implementing Classes:
AbstractDbGroupScan, AbstractFileGroupScan, AbstractGroupScan, AbstractGroupScanWithMetadata, AbstractParquetGroupScan, DeltaGroupScan, DirectGroupScan, DrillGroupScan, DruidGroupScan, EasyGroupScan, EnumerableGroupScan, GoogleSheetsGroupScan, HBaseGroupScan, HiveDrillNativeParquetScan, HiveScan, HttpGroupScan, IcebergGroupScan, InfoSchemaGroupScan, JdbcGroupScan, KafkaGroupScan, KuduGroupScan, MetadataDirectGroupScan, MockGroupScanPOP, MongoGroupScan, OpenTSDBGroupScan, ParquetGroupScan, PhoenixGroupScan, SchemalessScan, SplunkGroupScan, SystemTableScan
A GroupScan operator represents all data which will be scanned by a given physical
plan. It is the superset of all SubScans for the plan.
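To make the split into SubScans concrete, here is a minimal sketch (not Drill's actual SimpleParallelizer) of how a planner might drive a GroupScan through endpoint assignment and parallelization. The ScanPlanningSketch class and its planSubScans helper are hypothetical; every method they call is part of this interface.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.drill.common.exceptions.ExecutionSetupException;
    import org.apache.drill.exec.physical.PhysicalOperatorSetupException;
    import org.apache.drill.exec.physical.base.GroupScan;
    import org.apache.drill.exec.physical.base.SubScan;
    import org.apache.drill.exec.proto.CoordinationProtos;

    // Hypothetical helper, not part of Drill.
    public final class ScanPlanningSketch {
      static List<SubScan> planSubScans(GroupScan scan,
          List<CoordinationProtos.DrillbitEndpoint> endpoints)
          throws PhysicalOperatorSetupException, ExecutionSetupException {
        // Tell the scan which Drillbits will execute its fragments.
        scan.applyAssignments(endpoints);

        // Clamp the fragment count to the scan's declared limits
        // (simplified; the real parallelizer also weighs cost and affinity).
        int width = Math.min(endpoints.size(), scan.getMaxParallelizationWidth());
        width = Math.max(width, scan.getMinParallelizationWidth());

        // One SubScan per minor fragment; together they cover all the data.
        List<SubScan> subScans = new ArrayList<>();
        for (int minorFragmentId = 0; minorFragmentId < width; minorFragmentId++) {
          subScans.add(scan.getSpecificScan(minorFragmentId));
        }
        return subScans;
      }
    }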
Field Summary
Fields
static final List<SchemaPath> ALL_COLUMNS
    columns list in GroupScan: 1) an empty columns list indicates a skipAll query; 2) NULL is interpreted as ALL_COLUMNS.
Method Summary
void applyAssignments(List<CoordinationProtos.DrillbitEndpoint> endpoints)
GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
GroupScan applyLimit(int maxRecords)
    Apply rowcount-based pruning for a "LIMIT n" query.
boolean canPushdownProjects(List<SchemaPath> columns)
    The GroupScan should check the list of columns and see whether it can support all the columns in the list.
GroupScan clone(List<SchemaPath> columns)
    Returns a clone of the GroupScan instance, except that the new GroupScan will use the provided list of columns.
boolean enforceWidth()
    Deprecated.
AnalyzeInfoProvider getAnalyzeInfoProvider()
    Returns the AnalyzeInfoProvider instance which will be used when running the ANALYZE statement.
List<SchemaPath> getColumns()
    Returns a list of columns scanned by this group scan.
long getColumnValueCount(SchemaPath column)
    Return the number of non-null values in the specified column.
String getDigest()
    Returns a signature of the GroupScan, which should usually be composed of all the attributes that describe it uniquely.
Collection<org.apache.hadoop.fs.Path> getFiles()
    Returns a collection of file paths associated with this GroupScan.
LogicalExpression getFilter()
int getMaxParallelizationWidth()
TableMetadataProvider getMetadataProvider()
    Returns the TableMetadataProvider instance which is used for providing metadata for the current GroupScan.
int getMinParallelizationWidth()
    At minimum, the GroupScan requires this many fragments to run.
List<SchemaPath> getPartitionColumns()
    Returns a list of columns that can be used for partition pruning.
ScanStats getScanStats(org.apache.calcite.rel.metadata.RelMetadataQuery mq)
ScanStats getScanStats(PlannerSettings settings)
org.apache.hadoop.fs.Path getSelectionRoot()
    Returns the path to the selection root.
SubScan getSpecificScan(int minorFragmentId)
TableMetadata getTableMetadata()
boolean hasFiles()
    Return true if this GroupScan can return its selection as a list of file names (retrieved by getFiles()).
boolean isDistributed()
boolean supportsFilterPushDown()
    Checks whether this group scan supports filter push down.
boolean supportsLimitPushdown()
    Whether or not this GroupScan supports limit pushdown.
boolean supportsPartitionFilterPushdown()
    Whether or not this GroupScan supports pushdown of partition filters (directories for filesystems).
boolean usedMetastore()
    Returns true if the current group scan uses metadata obtained from the Metastore.

Methods inherited from interface org.apache.drill.common.graph.GraphValue
accept

Methods inherited from interface org.apache.drill.exec.physical.base.HasAffinity
getDistributionAffinity, getOperatorAffinity

Methods inherited from interface java.lang.Iterable
forEach, iterator, spliterator

Methods inherited from interface org.apache.drill.exec.physical.base.PhysicalOperator
accept, getCost, getInitialAllocation, getMaxAllocation, getNewWithChildren, getOperatorId, getOperatorType, getSVMode, getUserName, isBufferedOperator, isExecutable, setCost, setMaxAllocation, setOperatorId
Field Details
ALL_COLUMNS
static final List<SchemaPath> ALL_COLUMNS
columns list in GroupScan: 1) an empty columns list indicates a skipAll query; 2) NULL is interpreted as ALL_COLUMNS. How to handle a skipAll query is up to each storage plugin, with different policies in the corresponding RecordReader.
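A hedged sketch of how a plugin's reader-setup path might apply this contract; the effectiveColumns helper is hypothetical, and the skipAll policy shown is only one possibility:

    // Hypothetical helper: normalize the projection list per the contract above.
    static List<SchemaPath> effectiveColumns(GroupScan scan) {
      List<SchemaPath> columns = scan.getColumns();
      if (columns == null) {
        return GroupScan.ALL_COLUMNS;  // NULL is interpreted as ALL_COLUMNS
      }
      if (columns.isEmpty()) {
        // skipAll query (e.g. SELECT COUNT(*)): the RecordReader may avoid
        // materializing any column data; the exact policy is plugin-specific.
        return columns;
      }
      return columns;  // explicit projection list
    }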
Method Details
applyAssignments
void applyAssignments(List<CoordinationProtos.DrillbitEndpoint> endpoints) throws PhysicalOperatorSetupException
- Throws:
- PhysicalOperatorSetupException
getSpecificScan
SubScan getSpecificScan(int minorFragmentId) throws ExecutionSetupException
- Throws:
- ExecutionSetupException
getMaxParallelizationWidth
int getMaxParallelizationWidth()
isDistributed
boolean isDistributed()
getMinParallelizationWidth
int getMinParallelizationWidth()
At minimum, the GroupScan requires this many fragments to run. Currently, this is used in SimpleParallelizer.
- Returns:
- the minimum number of fragments that should run
enforceWidth
boolean enforceWidth()
Deprecated. Use getMinParallelizationWidth() to determine whether this GroupScan spans more than one fragment.
Check if the GroupScan enforces width to be the maximum parallelization width. Currently, this is used in ExcessiveExchangeIdentifier.
- Returns:
- whether maximum width should be enforced
getDigest
String getDigest()
Returns a signature of the GroupScan, which should usually be composed of all the attributes that describe it uniquely.
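As an illustration, an implementation in a hypothetical file-based plugin might compose its digest as follows; the files, columns, and selectionRoot fields are assumptions of the sketch, not part of this interface:

    @Override
    public String getDigest() {
      // Fold in every attribute that identifies this scan uniquely, so the
      // planner can tell two scans apart by their digests alone.
      return "MyGroupScan [files=" + files           // hypothetical field
          + ", columns=" + columns                   // hypothetical field
          + ", selectionRoot=" + selectionRoot + "]"; // hypothetical field
    }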
getScanStats
ScanStats getScanStats(org.apache.calcite.rel.metadata.RelMetadataQuery mq)

getScanStats
ScanStats getScanStats(PlannerSettings settings)
clone
GroupScan clone(List<SchemaPath> columns)
Returns a clone of the GroupScan instance, except that the new GroupScan will use the provided list of columns.
canPushdownProjects
boolean canPushdownProjects(List<SchemaPath> columns)
The GroupScan should check the list of columns and see whether it can support all the columns in the list.
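Together with clone(List<SchemaPath>) above, this enables the usual projection-pushdown handshake. A minimal sketch, where the pushProjection helper is hypothetical:

    // Hypothetical planner-side helper: narrow the scan when the plugin allows it.
    static GroupScan pushProjection(GroupScan scan, List<SchemaPath> projected) {
      if (scan.canPushdownProjects(projected)) {
        // The cloned scan reads only the requested columns.
        return scan.clone(projected);
      }
      return scan;  // scan cannot narrow itself; project later in the plan
    }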
getColumnValueCount
long getColumnValueCount(SchemaPath column)
Return the number of non-null values in the specified column. Raises an exception if the group scan does not have an exact column row count.
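For example, a planner could answer a COUNT(col) query from metadata alone when exact counts are available. A hedged usage sketch, assuming the caller guards against the implementation-specific exception:

    // scan comes from the planning context; the column name is illustrative.
    SchemaPath column = SchemaPath.getSimplePath("col");
    long nonNullRows = scan.getColumnValueCount(column);  // exact count, or throws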
supportsPartitionFilterPushdown
boolean supportsPartitionFilterPushdown()
Whether or not this GroupScan supports pushdown of partition filters (directories for filesystems).
getColumns
List<SchemaPath> getColumns()
Returns a list of columns scanned by this group scan.
getPartitionColumns
List<SchemaPath> getPartitionColumns()
Returns a list of columns that can be used for partition pruning.
supportsLimitPushdown
boolean supportsLimitPushdown()
Whether or not this GroupScan supports limit pushdown.
applyLimit
GroupScan applyLimit(int maxRecords)
Apply rowcount-based pruning for a "LIMIT n" query.
- Parameters:
- maxRecords: the number of rows requested from the group scan
- Returns:
- a new instance of the group scan if the prune is successful; null when either row-based pruning is not supported or the prune is not successful
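A minimal sketch of a limit-pushdown step honoring this contract; the pushLimit helper is hypothetical:

    // Hypothetical planner-side helper built only on documented methods.
    static GroupScan pushLimit(GroupScan scan, int maxRecords) {
      if (!scan.supportsLimitPushdown()) {
        return scan;
      }
      GroupScan pruned = scan.applyLimit(maxRecords);
      // Per the contract above, null means the prune did not succeed.
      return pruned != null ? pruned : scan;
    }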
hasFiles
boolean hasFiles()
Return true if this GroupScan can return its selection as a list of file names (retrieved by getFiles()).
getSelectionRoot
org.apache.hadoop.fs.Path getSelectionRoot()
Returns the path to the selection root. If this GroupScan cannot provide a selection root, it returns null.
- Returns:
- path to the selection root
getFiles
Collection<org.apache.hadoop.fs.Path> getFiles()
Returns a collection of file paths associated with this GroupScan. This should be called after checking hasFiles(). If this GroupScan cannot provide file names, it returns null.
- Returns:
- collection of file paths
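Callers are expected to probe hasFiles() first; a hedged usage sketch using only the methods documented here:

    if (scan.hasFiles()) {
      org.apache.hadoop.fs.Path root = scan.getSelectionRoot();  // may be null
      for (org.apache.hadoop.fs.Path file : scan.getFiles()) {
        // e.g. derive partition directories for `file` relative to `root`
      }
    }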
getFilter
LogicalExpression getFilter()
applyFilter
GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
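A hedged sketch of a filter-pushdown step combining this method with supportsFilterPushDown() (documented below); the pushFilter wrapper is hypothetical, and the null-on-failure behavior is an assumption rather than part of the documented contract:

    // Hypothetical planner-side helper; arguments come from the planning context.
    static GroupScan pushFilter(GroupScan scan, LogicalExpression filterExpr,
        UdfUtilities udfUtilities, FunctionImplementationRegistry registry,
        OptionManager options) {
      if (!scan.supportsFilterPushDown()) {
        return scan;  // plugin opted out of filter pushdown
      }
      GroupScan filtered =
          scan.applyFilter(filterExpr, udfUtilities, registry, options);
      // Assumption: implementations may return null when nothing was pushed.
      return filtered != null ? filtered : scan;
    }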
getMetadataProvider
TableMetadataProvider getMetadataProvider()
Returns the TableMetadataProvider instance which is used for providing metadata for the current GroupScan.
- Returns:
- the TableMetadataProvider instance, the source of metadata
getTableMetadata
TableMetadata getTableMetadata()
usedMetastore
boolean usedMetastore()
Returns true if the current group scan uses metadata obtained from the Metastore.
- Returns:
- true if the current group scan uses metadata obtained from the Metastore, false otherwise
getAnalyzeInfoProvider
AnalyzeInfoProvider getAnalyzeInfoProvider()
Returns the AnalyzeInfoProvider instance which will be used when running the ANALYZE statement.
- Returns:
- AnalyzeInfoProvider instance
supportsFilterPushDown
boolean supportsFilterPushDown()
Checks whether this group scan supports filter push down.
- Returns:
- true if this group scan supports filter push down, false otherwise