Interface GroupScan
- All Superinterfaces:
FragmentLeaf, GraphValue<PhysicalOperator>, HasAffinity, Iterable<PhysicalOperator>, Leaf, PhysicalOperator, Scan
- All Known Subinterfaces:
DbGroupScan, FileGroupScan, IndexGroupScan
- All Known Implementing Classes:
AbstractDbGroupScan, AbstractFileGroupScan, AbstractGroupScan, AbstractGroupScanWithMetadata, AbstractParquetGroupScan, DeltaGroupScan, DirectGroupScan, DrillGroupScan, DruidGroupScan, EasyGroupScan, EnumerableGroupScan, GoogleSheetsGroupScan, HBaseGroupScan, HiveDrillNativeParquetScan, HiveScan, HttpGroupScan, IcebergGroupScan, InfoSchemaGroupScan, JdbcGroupScan, KafkaGroupScan, KuduGroupScan, MetadataDirectGroupScan, MockGroupScanPOP, MongoGroupScan, OpenTSDBGroupScan, ParquetGroupScan, PhoenixGroupScan, SchemalessScan, SplunkGroupScan, SystemTableScan
A GroupScan operator represents all data which will be scanned by a given physical
plan. It is the superset of all SubScans for the plan.
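To make the split into SubScans concrete, here is a minimal sketch (not Drill's actual SimpleParallelizer) of how a planner might drive a GroupScan through endpoint assignment and parallelization. The ScanPlanningSketch class and its planSubScans helper are hypothetical; every method they call is part of this interface.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.drill.common.exceptions.ExecutionSetupException;
    import org.apache.drill.exec.physical.PhysicalOperatorSetupException;
    import org.apache.drill.exec.physical.base.GroupScan;
    import org.apache.drill.exec.physical.base.SubScan;
    import org.apache.drill.exec.proto.CoordinationProtos;

    // Hypothetical helper, not part of Drill.
    public final class ScanPlanningSketch {
      static List<SubScan> planSubScans(GroupScan scan,
          List<CoordinationProtos.DrillbitEndpoint> endpoints)
          throws PhysicalOperatorSetupException, ExecutionSetupException {
        // Tell the scan which Drillbits will execute its fragments.
        scan.applyAssignments(endpoints);

        // Clamp the fragment count to the scan's declared limits
        // (simplified; the real parallelizer also weighs cost and affinity).
        int width = Math.min(endpoints.size(), scan.getMaxParallelizationWidth());
        width = Math.max(width, scan.getMinParallelizationWidth());

        // One SubScan per minor fragment; together they cover all the data.
        List<SubScan> subScans = new ArrayList<>();
        for (int minorFragmentId = 0; minorFragmentId < width; minorFragmentId++) {
          subScans.add(scan.getSpecificScan(minorFragmentId));
        }
        return subScans;
      }
    }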
Field Summary
Fields
static final List<SchemaPath> ALL_COLUMNS
    columns list in GroupScan: 1) an empty columns list indicates a skipAll query; 2) NULL is interpreted as ALL_COLUMNS.
Method Summary
void applyAssignments(List<CoordinationProtos.DrillbitEndpoint> endpoints)
GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
GroupScan applyLimit(int maxRecords)
    Apply rowcount-based pruning for a "LIMIT n" query.
boolean canPushdownProjects(List<SchemaPath> columns)
    The GroupScan should check the list of columns and see whether it can support all the columns in the list.
GroupScan clone(List<SchemaPath> columns)
    Returns a clone of the GroupScan instance, except that the new GroupScan will use the provided list of columns.
boolean enforceWidth()
    Deprecated.
AnalyzeInfoProvider getAnalyzeInfoProvider()
    Returns the AnalyzeInfoProvider instance which will be used when running the ANALYZE statement.
List<SchemaPath> getColumns()
    Returns a list of columns scanned by this group scan.
long getColumnValueCount(SchemaPath column)
    Return the number of non-null values in the specified column.
String getDigest()
    Returns a signature of the GroupScan, which should usually be composed of all the attributes that describe it uniquely.
Collection<org.apache.hadoop.fs.Path> getFiles()
    Returns a collection of file paths associated with this GroupScan.
LogicalExpression getFilter()
int getMaxParallelizationWidth()
TableMetadataProvider getMetadataProvider()
    Returns the TableMetadataProvider instance which is used for providing metadata for the current GroupScan.
int getMinParallelizationWidth()
    At minimum, the GroupScan requires this many fragments to run.
List<SchemaPath> getPartitionColumns()
    Returns a list of columns that can be used for partition pruning.
ScanStats getScanStats(org.apache.calcite.rel.metadata.RelMetadataQuery mq)
ScanStats getScanStats(PlannerSettings settings)
org.apache.hadoop.fs.Path getSelectionRoot()
    Returns the path to the selection root.
SubScan getSpecificScan(int minorFragmentId)
TableMetadata getTableMetadata()
boolean hasFiles()
    Return true if this GroupScan can return its selection as a list of file names (retrieved by getFiles()).
boolean isDistributed()
boolean supportsFilterPushDown()
    Checks whether this group scan supports filter push down.
boolean supportsLimitPushdown()
    Whether or not this GroupScan supports limit pushdown.
boolean supportsPartitionFilterPushdown()
    Whether or not this GroupScan supports pushdown of partition filters (directories for filesystems).
boolean usedMetastore()
    Returns true if the current group scan uses metadata obtained from the Metastore.

Methods inherited from interface org.apache.drill.common.graph.GraphValue
accept

Methods inherited from interface org.apache.drill.exec.physical.base.HasAffinity
getDistributionAffinity, getOperatorAffinity

Methods inherited from interface java.lang.Iterable
forEach, iterator, spliterator

Methods inherited from interface org.apache.drill.exec.physical.base.PhysicalOperator
accept, getCost, getInitialAllocation, getMaxAllocation, getNewWithChildren, getOperatorId, getOperatorType, getSVMode, getUserName, isBufferedOperator, isExecutable, setCost, setMaxAllocation, setOperatorId
Field Details
ALL_COLUMNS
static final List<SchemaPath> ALL_COLUMNS
columns list in GroupScan: 1) an empty columns list indicates a skipAll query; 2) NULL is interpreted as ALL_COLUMNS. How to handle a skipAll query is up to each storage plugin, with different policies in the corresponding RecordReader.
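A hedged sketch of how a plugin's reader-setup path might apply this contract; the effectiveColumns helper is hypothetical, and the skipAll policy shown is only one possibility:

    // Hypothetical helper: normalize the projection list per the contract above.
    static List<SchemaPath> effectiveColumns(GroupScan scan) {
      List<SchemaPath> columns = scan.getColumns();
      if (columns == null) {
        return GroupScan.ALL_COLUMNS;  // NULL is interpreted as ALL_COLUMNS
      }
      if (columns.isEmpty()) {
        // skipAll query (e.g. SELECT COUNT(*)): the RecordReader may avoid
        // materializing any column data; the exact policy is plugin-specific.
        return columns;
      }
      return columns;  // explicit projection list
    }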
Method Details
applyAssignments
void applyAssignments(List<CoordinationProtos.DrillbitEndpoint> endpoints) throws PhysicalOperatorSetupException
- Throws:
- PhysicalOperatorSetupException
getSpecificScan
SubScan getSpecificScan(int minorFragmentId) throws ExecutionSetupException
- Throws:
- ExecutionSetupException
getMaxParallelizationWidth
int getMaxParallelizationWidth()
isDistributed
boolean isDistributed()
getMinParallelizationWidth
int getMinParallelizationWidth()
At minimum, the GroupScan requires this many fragments to run. Currently, this is used in SimpleParallelizer.
- Returns:
- the minimum number of fragments that should run
enforceWidth
boolean enforceWidth()
Deprecated. Use getMinParallelizationWidth() to determine whether this GroupScan spans more than one fragment.
Check if the GroupScan enforces width to be the maximum parallelization width. Currently, this is used in ExcessiveExchangeIdentifier.
- Returns:
- whether maximum width should be enforced
getDigest
String getDigest()
Returns a signature of the GroupScan, which should usually be composed of all the attributes that describe it uniquely.
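As an illustration, an implementation in a hypothetical file-based plugin might compose its digest as follows; the files, columns, and selectionRoot fields are assumptions of the sketch, not part of this interface:

    @Override
    public String getDigest() {
      // Fold in every attribute that identifies this scan uniquely, so the
      // planner can tell two scans apart by their digests alone.
      return "MyGroupScan [files=" + files           // hypothetical field
          + ", columns=" + columns                   // hypothetical field
          + ", selectionRoot=" + selectionRoot + "]"; // hypothetical field
    }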
getScanStats
ScanStats getScanStats(org.apache.calcite.rel.metadata.RelMetadataQuery mq)

getScanStats
ScanStats getScanStats(PlannerSettings settings)
clone
GroupScan clone(List<SchemaPath> columns)
Returns a clone of the GroupScan instance, except that the new GroupScan will use the provided list of columns.
canPushdownProjects
boolean canPushdownProjects(List<SchemaPath> columns)
The GroupScan should check the list of columns and see whether it can support all the columns in the list.
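Together with clone(List<SchemaPath>) above, this enables the usual projection-pushdown handshake. A minimal sketch, where the pushProjection helper is hypothetical:

    // Hypothetical planner-side helper: narrow the scan when the plugin allows it.
    static GroupScan pushProjection(GroupScan scan, List<SchemaPath> projected) {
      if (scan.canPushdownProjects(projected)) {
        // The cloned scan reads only the requested columns.
        return scan.clone(projected);
      }
      return scan;  // scan cannot narrow itself; project later in the plan
    }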
getColumnValueCount
long getColumnValueCount(SchemaPath column)
Return the number of non-null values in the specified column. Raises an exception if the group scan does not have an exact column row count.
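For example, a planner could answer a COUNT(col) query from metadata alone when exact counts are available. A hedged usage sketch, assuming the caller guards against the implementation-specific exception:

    // scan comes from the planning context; the column name is illustrative.
    SchemaPath column = SchemaPath.getSimplePath("col");
    long nonNullRows = scan.getColumnValueCount(column);  // exact count, or throws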
supportsPartitionFilterPushdown
boolean supportsPartitionFilterPushdown()
Whether or not this GroupScan supports pushdown of partition filters (directories for filesystems).
getColumns
List<SchemaPath> getColumns()
Returns a list of columns scanned by this group scan.
getPartitionColumns
List<SchemaPath> getPartitionColumns()
Returns a list of columns that can be used for partition pruning.
supportsLimitPushdown
boolean supportsLimitPushdown()
Whether or not this GroupScan supports limit pushdown.
applyLimit
GroupScan applyLimit(int maxRecords)
Apply rowcount-based pruning for a "LIMIT n" query.
- Parameters:
- maxRecords: the number of rows requested from the group scan
- Returns:
- a new instance of the group scan if the prune is successful; null when either row-based pruning is not supported or the prune is not successful
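A minimal sketch of a limit-pushdown step honoring this contract; the pushLimit helper is hypothetical:

    // Hypothetical planner-side helper built only on documented methods.
    static GroupScan pushLimit(GroupScan scan, int maxRecords) {
      if (!scan.supportsLimitPushdown()) {
        return scan;
      }
      GroupScan pruned = scan.applyLimit(maxRecords);
      // Per the contract above, null means the prune did not succeed.
      return pruned != null ? pruned : scan;
    }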
hasFiles
boolean hasFiles()
Return true if this GroupScan can return its selection as a list of file names (retrieved by getFiles()).
getSelectionRoot
org.apache.hadoop.fs.Path getSelectionRoot()
Returns the path to the selection root. If this GroupScan cannot provide a selection root, it returns null.
- Returns:
- path to the selection root
getFiles
Collection<org.apache.hadoop.fs.Path> getFiles()
Returns a collection of file paths associated with this GroupScan. This should be called after checking hasFiles(). If this GroupScan cannot provide file names, it returns null.
- Returns:
- collection of file paths
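Callers are expected to probe hasFiles() first; a hedged usage sketch using only the methods documented here:

    if (scan.hasFiles()) {
      org.apache.hadoop.fs.Path root = scan.getSelectionRoot();  // may be null
      for (org.apache.hadoop.fs.Path file : scan.getFiles()) {
        // e.g. derive partition directories for `file` relative to `root`
      }
    }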
getFilter
LogicalExpression getFilter()
applyFilter
GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
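A hedged sketch of a filter-pushdown step combining this method with supportsFilterPushDown() (documented below); the pushFilter wrapper is hypothetical, and the null-on-failure behavior is an assumption rather than part of the documented contract:

    // Hypothetical planner-side helper; arguments come from the planning context.
    static GroupScan pushFilter(GroupScan scan, LogicalExpression filterExpr,
        UdfUtilities udfUtilities, FunctionImplementationRegistry registry,
        OptionManager options) {
      if (!scan.supportsFilterPushDown()) {
        return scan;  // plugin opted out of filter pushdown
      }
      GroupScan filtered =
          scan.applyFilter(filterExpr, udfUtilities, registry, options);
      // Assumption: implementations may return null when nothing was pushed.
      return filtered != null ? filtered : scan;
    }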
getMetadataProvider
TableMetadataProvider getMetadataProvider()
Returns the TableMetadataProvider instance which is used for providing metadata for the current GroupScan.
- Returns:
- the TableMetadataProvider instance, the source of metadata
getTableMetadata
TableMetadata getTableMetadata()
usedMetastore
boolean usedMetastore()
Returns true if the current group scan uses metadata obtained from the Metastore.
- Returns:
- true if the current group scan uses metadata obtained from the Metastore, false otherwise
getAnalyzeInfoProvider
AnalyzeInfoProvider getAnalyzeInfoProvider()
Returns the AnalyzeInfoProvider instance which will be used when running the ANALYZE statement.
- Returns:
- AnalyzeInfoProvider instance
supportsFilterPushDown
boolean supportsFilterPushDown()
Checks whether this group scan supports filter push down.
- Returns:
- true if this group scan supports filter push down, false otherwise