public abstract class AbstractGroupScanWithMetadata<P extends TableMetadataProvider> extends AbstractFileGroupScan

| Modifier and Type | Class and Description |
|---|---|
| `protected static class` | `AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<B extends AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<B>>` This class is responsible for filtering different metadata levels. |

| Modifier and Type | Field and Description |
|---|---|
| `protected List<SchemaPath>` | `columns` |
| `protected Map<org.apache.hadoop.fs.Path,FileMetadata>` | `files` |
| `protected Set<org.apache.hadoop.fs.Path>` | `fileSet` |
| `protected LogicalExpression` | `filter` |
| `protected boolean` | `matchAllMetadata` |
| `protected int` | `maxRecords` |
| `protected P` | `metadataProvider` |
| `protected NonInterestingColumnsMetadata` | `nonInterestingColumnsMetadata` |
| `protected List<SchemaPath>` | `partitionColumns` |
| `protected List<PartitionMetadata>` | `partitions` |
| `protected Map<org.apache.hadoop.fs.Path,SegmentMetadata>` | `segments` |
| `protected TableMetadata` | `tableMetadata` |
| `protected boolean` | `usedMetastore` |

Inherited fields: `INIT_ALLOCATION`, `initialAllocation`, `MAX_ALLOCATION`, `maxAllocation`, `ALL_COLUMNS`

| Modifier | Constructor and Description |
|---|---|
| `protected` | `AbstractGroupScanWithMetadata(AbstractGroupScanWithMetadata<P> that)` |
| `protected` | `AbstractGroupScanWithMetadata(String userName, List<SchemaPath> columns, LogicalExpression filter)` |

| Modifier and Type | Method and Description |
|---|---|
| `AbstractGroupScanWithMetadata<?>` | `applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)` Applies the specified filter `filterExpr` to the current group scan and filters metadata at the following levels. Table level: if the filter matches all the data or prunes all the data, sets the value returned by `isMatchAllMetadata()` accordingly and returns `null`. Segment level: same as the table level; in addition, if segment metadata was pruned, prunes the underlying metadata. Partition level: same as the table level; in addition, if partition metadata was pruned, prunes the underlying metadata. File level: same as the table level. |
| `GroupScan` | `applyLimit(int maxRecords)` By default, returns `null` to indicate that row-count-based pruning is not supported. |
| `protected void` | `checkMetadataConsistency(FileSelection selection, org.apache.hadoop.conf.Configuration fsConf)` Compares the last modified time of files obtained from the specified selection with the Metastore last modified time to determine whether Metastore metadata is up to date. |
| `protected abstract TableMetadataProviderBuilder` | `defaultTableMetadataProviderBuilder(MetadataProviderManager source)` Returns a `TableMetadataProviderBuilder` instance which may provide metadata without using the Drill Metastore. |
| `List<SchemaPath>` | `getColumns()` Returns a list of columns scanned by this group scan. |
| `long` | `getColumnValueCount(SchemaPath column)` Returns the column value count for the specified column. |
| `String` | `getDigest()` Returns a signature of the `GroupScan`, which should usually be composed of all its attributes that could describe it uniquely. |
| `Collection<org.apache.hadoop.fs.Path>` | `getFiles()` Returns a collection of file names associated with this `GroupScan`. |
| `Set<org.apache.hadoop.fs.Path>` | `getFileSet()` |
| `Map<org.apache.hadoop.fs.Path,FileMetadata>` | `getFilesMetadata()` |
| `LogicalExpression` | `getFilter()` |
| `protected abstract AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?>` | `getFilterer()` Returns a holder for metadata values which provides an API to filter metadata and build a new group scan instance using the filtered metadata. |
| `FilterPredicate<?>` | `getFilterPredicate(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionLookupContext functionImplementationRegistry, OptionManager optionManager, boolean omitUnsupportedExprs)` |
| `static FilterPredicate<?>` | `getFilterPredicate(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionLookupContext functionImplementationRegistry, OptionManager optionManager, boolean omitUnsupportedExprs, boolean supportsFileImplicitColumns, TupleMetadata schema)` Returns a Parquet filter predicate built from the specified `filterExpr`. |
| `protected String` | `getFilterString()` |
| `int` | `getMaxRecords()` |
| `P` | `getMetadataProvider()` Returns the `TableMetadataProvider` instance used to provide metadata for the current `GroupScan`. |
| `protected <T> List<T>` | `getNextOrEmpty(Collection<T> inputList)` Returns a list with the first element of the input list, or an empty list if the input list is empty. |
| `NonInterestingColumnsMetadata` | `getNonInterestingColumnsMetadata()` |
| `List<SchemaPath>` | `getPartitionColumns()` Returns a list of columns that can be used for partition pruning. |
| `List<PartitionMetadata>` | `getPartitionsMetadata()` |
| `<T> T` | `getPartitionValue(org.apache.hadoop.fs.Path path, SchemaPath column, Class<T> clazz)` |
| `protected abstract List<String>` | `getPartitionValues(LocationProvider locationProvider)` |
| `ScanStats` | `getScanStats()` |
| `TupleMetadata` | `getSchema()` |
| `Map<org.apache.hadoop.fs.Path,SegmentMetadata>` | `getSegmentsMetadata()` |
| `TableMetadata` | `getTableMetadata()` |
| `TypeProtos.MajorType` | `getTypeForColumn(SchemaPath schemaPath)` |
| `boolean` | `hasFiles()` Returns `true` if this `GroupScan` can return its selection as a list of file names (retrieved by `getFiles()`). |
| `protected void` | `init()` |
| `protected boolean` | `isAllDataPruned(AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> filteredMetadata)` |
| `protected boolean` | `isGroupScanFullyMatchesFilter(AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> filteredMetadata)` |
| `static boolean` | `isImplicitOrPartCol(SchemaPath schemaPath, OptionManager optionManager)` |
| `boolean` | `isMatchAllMetadata()` |
| `protected <T extends BaseMetadata> List<T>` | `limitMetadata(Collection<T> metadataList, int maxRecords)` Prunes the specified metadata list, keeping the minimum number of metadata instances whose total row count is not less than the specified `maxRecords`. |
| `void` | `modifyFileSelection(FileSelection selection)` |
| `protected static <T extends BaseMetadata & LocationProvider> Map<org.apache.hadoop.fs.Path,T>` | `pruneForPartitions(Map<org.apache.hadoop.fs.Path,T> metadataToPrune, List<PartitionMetadata> filteredPartitionMetadata)` Removes metadata which does not belong to any of the partitions in the metadata list. |
| `void` | `setFilter(LogicalExpression filter)` |
| `void` | `setFilterForRuntime(LogicalExpression filterExpr, OptimizerRulesContext optimizerContext)` Sets the filter, thus enabling runtime row group pruning. Runtime pruning can be disabled with an option. |
| `protected abstract boolean` | `supportsFileImplicitColumns()` |
| `boolean` | `supportsLimitPushdown()` Default is not to support limit pushdown. |
| `protected abstract TableMetadataProviderBuilder` | `tableMetadataProviderBuilder(MetadataProviderManager source)` Returns a `TableMetadataProviderBuilder` instance based on the specified `MetadataProviderManager` source. |
| `boolean` | `usedMetastore()` Returns `true` if the current group scan uses metadata obtained from the Metastore. |
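
As a rough illustration of the staleness check summarized for checkMetadataConsistency, the sketch below compares each selected file's last-modified time against a single Metastore timestamp. It is not Drill's implementation. Plain String paths and epoch-millisecond longs stand in for org.apache.hadoop.fs.Path and file statuses, IllegalStateException stands in for MetadataException, and the rule "any file newer than the Metastore timestamp means the metadata is outdated" is an assumption. The class name MetadataConsistencySketch is hypothetical.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the check summarized for checkMetadataConsistency.
 * String paths and epoch-millisecond timestamps stand in for Hadoop types, and
 * IllegalStateException stands in for Drill's MetadataException.
 */
final class MetadataConsistencySketch {

  private MetadataConsistencySketch() {}

  /**
   * Throws if any selected file was modified after the Metastore last modified time.
   *
   * @param fileModificationTimes last modified time of each selected file, keyed by path
   * @param metastoreLastModified last modified time recorded in the Metastore for the table
   */
  static void checkConsistency(Map<String, Long> fileModificationTimes, long metastoreLastModified) {
    for (Map.Entry<String, Long> entry : fileModificationTimes.entrySet()) {
      // Assumption: a file newer than the Metastore snapshot means the metadata is outdated.
      if (entry.getValue() > metastoreLastModified) {
        throw new IllegalStateException(
            "Metastore metadata is outdated: " + entry.getKey() + " changed after the last refresh");
      }
    }
  }
}
```

If every file's timestamp is at or before the Metastore timestamp the call returns normally. A single newer file makes it throw.
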

Inherited methods (grouped by the superclass or interface that declares them):
- clone, supportsPartitionFilterPushdown
- accept, canPushdownProjects, clone, enforceWidth, getAnalyzeInfoProvider, getDistributionAffinity, getInitialAllocation, getMaxAllocation, getMinParallelizationWidth, getOperatorAffinity, getOperatorType, getScanStats, getSelectionRoot, isDistributed, isExecutable, iterator, supportsFilterPushDown
- accept, getCost, getOperatorId, getSVMode, getUserName, isBufferedOperator, setCost, setMaxAllocation, setOperatorId
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
- applyAssignments, canPushdownProjects, clone, enforceWidth, getAnalyzeInfoProvider, getMaxParallelizationWidth, getMinParallelizationWidth, getScanStats, getSelectionRoot, getSpecificScan, isDistributed, supportsFilterPushDown
- accept, getCost, getInitialAllocation, getMaxAllocation, getNewWithChildren, getOperatorId, getOperatorType, getSVMode, getUserName, isBufferedOperator, isExecutable, setCost, setMaxAllocation, setOperatorId
- accept
- forEach, iterator, spliterator
- getDistributionAffinity, getOperatorAffinity

protected P extends TableMetadataProvider metadataProvider
protected TableMetadata tableMetadata
protected List<PartitionMetadata> partitions
protected Map<org.apache.hadoop.fs.Path,SegmentMetadata> segments
protected NonInterestingColumnsMetadata nonInterestingColumnsMetadata
protected List<SchemaPath> partitionColumns
protected LogicalExpression filter
protected List<SchemaPath> columns
protected Map<org.apache.hadoop.fs.Path,FileMetadata> files
protected Set<org.apache.hadoop.fs.Path> fileSet
protected boolean matchAllMetadata
protected boolean usedMetastore
protected int maxRecords
protected AbstractGroupScanWithMetadata(String userName, List<SchemaPath> columns, LogicalExpression filter)
protected AbstractGroupScanWithMetadata(AbstractGroupScanWithMetadata<P> that)
public List<SchemaPath> getColumns()
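Description copied from interface: GroupScan
Returns a list of columns scanned by this group scan.
Specified by: getColumns in interface GroupScan
Overrides: getColumns in class AbstractGroupScan
public Collection<org.apache.hadoop.fs.Path> getFiles()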
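Description copied from interface: GroupScan
Returns a collection of file names associated with this GroupScan.
Specified by: getFiles in interface GroupScan
Overrides: getFiles in class AbstractGroupScan
public boolean hasFiles()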
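Description copied from interface: GroupScan
Returns true if this GroupScan can return its selection as a list of file names (retrieved by getFiles()).
Specified by: hasFiles in interface GroupScan
Overrides: hasFiles in class AbstractGroupScan
public int getMaxRecords()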
public boolean isMatchAllMetadata()
public long getColumnValueCount(SchemaPath column)
Returns the column value count for the specified column.
Specified by: getColumnValueCount in interface GroupScan
Overrides: getColumnValueCount in class AbstractGroupScan
Parameters: column - column schema path
public String getDigest()
Description copied from interface: GroupScan
Returns a signature of the GroupScan, which should usually be composed of all its attributes that could describe it uniquely.
public ScanStats getScanStats()
Overrides: getScanStats in class AbstractGroupScan
public LogicalExpression getFilter()
Specified by: getFilter in interface GroupScan
Overrides: getFilter in class AbstractGroupScan
public P getMetadataProvider()
Description copied from interface: GroupScan
Returns the TableMetadataProvider instance used to provide metadata for the current GroupScan.
Specified by: getMetadataProvider in interface GroupScan
Overrides: getMetadataProvider in class AbstractGroupScan
Returns: TableMetadataProvider instance, the source of metadata
public void setFilter(LogicalExpression filter)
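
To make the getDigest() contract concrete (a signature composed of all attributes that describe the scan uniquely), here is a minimal sketch of a scan-like class whose digest concatenates every attribute that distinguishes one instance from another. The class DigestSketch and its fields are hypothetical, and the output format is illustrative only. It is not the format Drill produces.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a group scan that keeps only its identifying attributes. */
final class DigestSketch {
  private final List<String> columns;
  private final String filter;        // null when no filter was pushed down
  private final Set<String> files;
  private final int maxRecords;

  DigestSketch(List<String> columns, String filter, Set<String> files, int maxRecords) {
    this.columns = columns;
    this.filter = filter;
    this.files = files;
    this.maxRecords = maxRecords;
  }

  /** Composes the digest from every attribute that distinguishes one scan from another. */
  String getDigest() {
    // Sort the files so that two scans over the same set produce the same string.
    return "DigestSketch [columns=" + columns
        + ", filter=" + filter
        + ", files=" + new TreeSet<>(files)
        + ", maxRecords=" + maxRecords + "]";
  }
}
```

Copying the file set into a TreeSet is a design choice of this sketch. It makes the digest independent of Set iteration order, so two scans over the same files produce equal digests, while a different filter yields a different one.
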
public void setFilterForRuntime(LogicalExpression filterExpr, OptimizerRulesContext optimizerContext)
Sets the filter, thus enabling runtime row group pruning. Runtime pruning can be disabled with an option.
Parameters: filterExpr - the filter to be used at runtime to match with row groups' footers; optimizerContext - the context for the options
public AbstractGroupScanWithMetadata<?> applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
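Applies the specified filter filterExpr to the current group scan and filters metadata at the following levels:
- table level: if the filter matches all the data or prunes all the data, sets the value returned by isMatchAllMetadata() accordingly and returns null;
- segment level: if the filter matches all the data or prunes all the data, sets the value returned by isMatchAllMetadata() accordingly and returns null; if segment metadata was pruned, prunes the underlying metadata;
- partition level: if the filter matches all the data or prunes all the data, sets the value returned by isMatchAllMetadata() accordingly and returns null; if partition metadata was pruned, prunes the underlying metadata;
- file level: if the filter matches all the data or prunes all the data, sets the value returned by isMatchAllMetadata() accordingly and returns null.

Specified by: applyFilter in interface GroupScan
Overrides: applyFilter in class AbstractGroupScan
Parameters: filterExpr - filter expression to build; udfUtilities - UDF utilities; functionImplementationRegistry - context to find Drill function holder; optionManager - option manager
protected boolean isAllDataPruned(AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> filteredMetadata)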
protected boolean isGroupScanFullyMatchesFilter(AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> filteredMetadata)
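
The level-by-level behaviour documented for applyFilter can be modeled with a simplified sketch. Everything in it is hypothetical. Units carry a precomputed RowsMatch verdict instead of real statistics, and parents are referenced by name. The MATCHES_ALL and PRUNES_ALL decisions stand for the two cases in which, per the description, applyFilter records the outcome in isMatchAllMetadata() and returns null. PARTIAL stands for the remaining case, where the surviving lowest-level units would be used to build a new group scan.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of the level-by-level filtering documented for applyFilter.
 * Levels are ordered table, segment, partition, file; every unit names its parent.
 */
final class CascadeFilterSketch {

  /** Verdict of evaluating the filter against one unit's statistics (precomputed here). */
  enum RowsMatch { ALL, NONE, SOME }

  /** MATCHES_ALL and PRUNES_ALL correspond to the "returns null" cases in the description. */
  enum Decision { MATCHES_ALL, PRUNES_ALL, PARTIAL }

  static final class Unit {
    final String name;
    final String parent;      // null for the table-level unit
    final RowsMatch match;

    Unit(String name, String parent, RowsMatch match) {
      this.name = name;
      this.parent = parent;
      this.match = match;
    }
  }

  static final class Outcome {
    final Decision decision;
    final List<Unit> survivors;  // lowest-level units left after pruning; empty unless PARTIAL

    Outcome(Decision decision, List<Unit> survivors) {
      this.decision = decision;
      this.survivors = survivors;
    }
  }

  static Outcome apply(List<List<Unit>> levels) {
    Set<String> keptParents = null;  // no parent restriction at the table level
    List<Unit> survivors = new ArrayList<>();
    for (List<Unit> level : levels) {
      List<Unit> kept = new ArrayList<>();
      boolean allRowsMatch = true;
      for (Unit unit : level) {
        // Prune units whose parent was already pruned at a higher level.
        if (keptParents != null && !keptParents.contains(unit.parent)) {
          continue;
        }
        // Prune units whose statistics show that no row can satisfy the filter.
        if (unit.match == RowsMatch.NONE) {
          continue;
        }
        kept.add(unit);
        allRowsMatch &= unit.match == RowsMatch.ALL;
      }
      if (kept.isEmpty()) {
        return new Outcome(Decision.PRUNES_ALL, List.of());
      }
      // "Matches all" only if nothing at this level was removed (for either reason above)
      // and every remaining unit fully matches.
      if (allRowsMatch && kept.size() == level.size()) {
        return new Outcome(Decision.MATCHES_ALL, List.of());
      }
      keptParents = new HashSet<>();
      for (Unit unit : kept) {
        keptParents.add(unit.name);
      }
      survivors = kept;
    }
    return new Outcome(Decision.PARTIAL, survivors);
  }
}
```

For example, take a table whose segment s1 is ruled out (NONE) and whose segment s2 matches only partly. File f1 in s1 is pruned with its parent, so only file f2 of s2 survives. Because a unit was removed at the file level, the result is PARTIAL rather than MATCHES_ALL, even though f2 fully matches.
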
protected <T> List<T> getNextOrEmpty(Collection<T> inputList)
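Returns a list with the first element of the input list, or an empty list if the input list is empty.
Type Parameters: T - type of values in the list
Parameters: inputList - the source of the first element
protected abstract AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> getFilterer()
Returns a holder for metadata values which provides an API to filter metadata and build a new group scan instance using the filtered metadata.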
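
The getNextOrEmpty contract is small enough to sketch directly. This version is an assumption, not the Drill source. It treats a null input like an empty one. "First element" here means the first element returned by the collection's iterator, which is only well defined for ordered collections such as List. The class name NextOrEmptySketch is hypothetical.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

final class NextOrEmptySketch {

  private NextOrEmptySketch() {}

  /** Returns a single-element list holding the first element of inputList, or an empty list. */
  static <T> List<T> getNextOrEmpty(Collection<T> inputList) {
    // Assumption: a null input is handled like an empty one.
    if (inputList == null || inputList.isEmpty()) {
      return Collections.emptyList();
    }
    return Collections.singletonList(inputList.iterator().next());
  }
}
```
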
public FilterPredicate<?> getFilterPredicate(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionLookupContext functionImplementationRegistry, OptionManager optionManager, boolean omitUnsupportedExprs)
public static FilterPredicate<?> getFilterPredicate(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionLookupContext functionImplementationRegistry, OptionManager optionManager, boolean omitUnsupportedExprs, boolean supportsFileImplicitColumns, TupleMetadata schema)
Returns a Parquet filter predicate built from the specified filterExpr.
Parameters: filterExpr - filter expression to build; udfUtilities - UDF utilities; functionImplementationRegistry - context to find Drill function holder; optionManager - option manager; omitUnsupportedExprs - whether expressions which cannot be converted may be omitted from the resulting expression; supportsFileImplicitColumns - whether implicit columns are supported; schema - schema
public TupleMetadata getSchema()
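
The omitUnsupportedExprs parameter of getFilterPredicate controls whether parts of the filter that cannot be converted may be left out. The sketch below uses its own tiny Leaf/And/Or model, which is hypothetical and not Drill's LogicalExpression API. It shows why omission is only safe under a conjunction. Removing an AND operand weakens the predicate, so pruning stays conservative. Removing an OR operand strengthens it and could prune rows that actually match. Whether Drill applies exactly this rule is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical expression model used to illustrate the omitUnsupportedExprs flag. */
final class OmitUnsupportedSketch {

  interface Expr {}

  /** A comparison that the predicate builder either can or cannot translate. */
  static final class Leaf implements Expr {
    final String text;
    final boolean convertible;

    Leaf(String text, boolean convertible) {
      this.text = text;
      this.convertible = convertible;
    }
  }

  static final class And implements Expr {
    final List<Expr> children;
    And(List<Expr> children) { this.children = children; }
  }

  static final class Or implements Expr {
    final List<Expr> children;
    Or(List<Expr> children) { this.children = children; }
  }

  /** Returns the converted predicate as text, or empty when the expression cannot be used. */
  static Optional<String> convert(Expr expr, boolean omitUnsupportedExprs) {
    if (expr instanceof Leaf) {
      Leaf leaf = (Leaf) expr;
      return leaf.convertible ? Optional.of(leaf.text) : Optional.empty();
    }
    if (expr instanceof And) {
      List<String> parts = new ArrayList<>();
      for (Expr child : ((And) expr).children) {
        Optional<String> converted = convert(child, omitUnsupportedExprs);
        if (converted.isPresent()) {
          parts.add(converted.get());
        } else if (!omitUnsupportedExprs) {
          return Optional.empty();
        }
        // Otherwise the conjunct is dropped: this only weakens the predicate.
      }
      return parts.isEmpty() ? Optional.empty() : Optional.of("(" + String.join(" AND ", parts) + ")");
    }
    if (expr instanceof Or) {
      // Every disjunct must be convertible; dropping one would make the predicate stricter.
      List<String> parts = new ArrayList<>();
      for (Expr child : ((Or) expr).children) {
        Optional<String> converted = convert(child, omitUnsupportedExprs);
        if (!converted.isPresent()) {
          return Optional.empty();
        }
        parts.add(converted.get());
      }
      return Optional.of("(" + String.join(" OR ", parts) + ")");
    }
    throw new IllegalArgumentException("Unknown expression type: " + expr);
  }
}
```

With omission enabled, `a > 1 AND udf(b)` becomes `(a > 1)`. With omission disabled, the whole expression is rejected. `a > 1 OR udf(b)` is rejected either way.
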
public boolean supportsLimitPushdown()
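Description copied from class: AbstractGroupScan
Default is not to support limit pushdown.
Specified by: supportsLimitPushdown in interface GroupScan
Overrides: supportsLimitPushdown in class AbstractGroupScan
public GroupScan applyLimit(int maxRecords)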
Description copied from class: AbstractGroupScan
By default, returns null to indicate that row-count-based pruning is not supported.
Specified by: applyLimit in interface GroupScan
Overrides: applyLimit in class AbstractGroupScan
Parameters: maxRecords - the number of rows requested from the group scan
protected static <T extends BaseMetadata & LocationProvider> Map<org.apache.hadoop.fs.Path,T> pruneForPartitions(Map<org.apache.hadoop.fs.Path,T> metadataToPrune, List<PartitionMetadata> filteredPartitionMetadata)
Removes metadata which does not belong to any of the partitions in the metadata list.
Type Parameters: T - type of metadata to filter
Parameters: metadataToPrune - list of metadata which should be pruned; filteredPartitionMetadata - list of partition metadata which was pruned
protected <T extends BaseMetadata> List<T> limitMetadata(Collection<T> metadataList, int maxRecords)
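Prunes the specified metadata list, keeping the minimum number of metadata instances whose total row count is not less than the specified maxRecords.
Type Parameters: T - type of metadata to prune
Parameters: metadataList - list of metadata to prune; maxRecords - number of rows to leave
public List<SchemaPath> getPartitionColumns()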
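Description copied from interface: GroupScan
Returns a list of columns that can be used for partition pruning.
Specified by: getPartitionColumns in interface GroupScan
Overrides: getPartitionColumns in class AbstractGroupScan
public TypeProtos.MajorType getTypeForColumn(SchemaPath schemaPath)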
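
pruneForPartitions and limitMetadata, documented above, can both be sketched over a hypothetical Meta type holding a path and a row count. The sketch makes two assumptions. A file belongs to a partition when its path appears verbatim in that partition's set of locations. "Minimum number of instances" is read as the shortest prefix in iteration order; a globally minimal selection would require sorting by row count first. The class MetadataPruningSketch and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MetadataPruningSketch {

  /** Hypothetical metadata unit: a location plus a known row count. */
  static final class Meta {
    final String path;
    final long rowCount;

    Meta(String path, long rowCount) {
      this.path = path;
      this.rowCount = rowCount;
    }
  }

  /** Keeps only the entries whose path belongs to at least one surviving partition. */
  static Map<String, Meta> pruneForPartitions(Map<String, Meta> metadataToPrune,
                                              List<Set<String>> partitionLocations) {
    Set<String> allowed = new HashSet<>();
    for (Set<String> locations : partitionLocations) {
      allowed.addAll(locations);
    }
    Map<String, Meta> result = new LinkedHashMap<>();  // preserves the input order
    for (Map.Entry<String, Meta> entry : metadataToPrune.entrySet()) {
      if (allowed.contains(entry.getKey())) {
        result.put(entry.getKey(), entry.getValue());
      }
    }
    return result;
  }

  /** Returns the shortest prefix (in iteration order) whose row counts add up to at least maxRecords. */
  static List<Meta> limitMetadata(Collection<Meta> metadataList, int maxRecords) {
    List<Meta> kept = new ArrayList<>();
    long rows = 0;
    for (Meta meta : metadataList) {
      if (rows >= maxRecords) {
        break;  // enough rows already collected
      }
      kept.add(meta);
      rows += meta.rowCount;
    }
    return kept;
  }
}
```

With three files of 40 rows each and maxRecords = 50, limitMetadata keeps the first two (80 rows), because one file alone (40 rows) would fall short.
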
public <T> T getPartitionValue(org.apache.hadoop.fs.Path path,
SchemaPath column,
Class<T> clazz)
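
getPartitionValue takes a Class<T> token, presumably so that callers receive a typed value without an unchecked cast. A minimal sketch of that pattern follows. The nested-map storage, the String paths and column names, and the null result for a missing value are assumptions, not documented behaviour. PartitionValueSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

final class PartitionValueSketch {
  // Hypothetical storage: file path -> (column name -> partition value).
  private final Map<String, Map<String, Object>> values = new HashMap<>();

  void put(String path, String column, Object value) {
    values.computeIfAbsent(path, p -> new HashMap<>()).put(column, value);
  }

  /** Returns the value of column for path, cast to clazz, or null when none is recorded. */
  <T> T getPartitionValue(String path, String column, Class<T> clazz) {
    Map<String, Object> columns = values.get(path);
    Object value = columns == null ? null : columns.get(column);
    // Class.cast returns null for null and throws ClassCastException on a type mismatch.
    return clazz.cast(value);
  }
}
```

Requesting the value with the wrong class fails immediately with a ClassCastException instead of surfacing later as a heap-pollution error.
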
public Set<org.apache.hadoop.fs.Path> getFileSet()
public void modifyFileSelection(FileSelection selection)
Specified by: modifyFileSelection in interface FileGroupScan
Overrides: modifyFileSelection in class AbstractFileGroupScan
protected void init()
throws IOException
Throws: IOException
protected String getFilterString()
protected abstract boolean supportsFileImplicitColumns()
protected abstract List<String> getPartitionValues(LocationProvider locationProvider)
public static boolean isImplicitOrPartCol(SchemaPath schemaPath, OptionManager optionManager)
public Map<org.apache.hadoop.fs.Path,FileMetadata> getFilesMetadata()
public TableMetadata getTableMetadata()
Specified by: getTableMetadata in interface GroupScan
Overrides: getTableMetadata in class AbstractGroupScan
public List<PartitionMetadata> getPartitionsMetadata()
public Map<org.apache.hadoop.fs.Path,SegmentMetadata> getSegmentsMetadata()
public boolean usedMetastore()
Description copied from interface: GroupScan
Returns true if the current group scan uses metadata obtained from the Metastore.
Specified by: usedMetastore in interface GroupScan
Overrides: usedMetastore in class AbstractGroupScan
Returns: true if the current group scan uses metadata obtained from the Metastore, false otherwise
public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata()
protected abstract TableMetadataProviderBuilder tableMetadataProviderBuilder(MetadataProviderManager source)
Returns a TableMetadataProviderBuilder instance based on the specified MetadataProviderManager source.
Parameters: source - metadata provider manager
Returns: TableMetadataProviderBuilder instance
protected abstract TableMetadataProviderBuilder defaultTableMetadataProviderBuilder(MetadataProviderManager source)
Returns a TableMetadataProviderBuilder instance which may provide metadata without using the Drill Metastore.
Parameters: source - metadata provider manager
Returns: TableMetadataProviderBuilder instance
protected void checkMetadataConsistency(FileSelection selection, org.apache.hadoop.conf.Configuration fsConf) throws IOException
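Compares the last modified time of files obtained from the specified selection with the Metastore last modified time to determine whether Metastore metadata is up to date. If metadata is outdated, MetadataException will be thrown.
Parameters: selection - the source of files to check
Throws: MetadataException - if metadata is outdated; IOException

Copyright © 2021 The Apache Software Foundation. All rights reserved.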