public class ParquetTableMetadataUtils extends Object
| Modifier and Type | Method and Description |
|---|---|
| `static Map<SchemaPath,ColumnStatistics<?>>` | `addImplicitColumnsStatistics(Map<SchemaPath,ColumnStatistics<?>> columnsStatistics, List<SchemaPath> columns, List<String> partitionValues, OptionManager optionManager, org.apache.hadoop.fs.Path location, boolean supportsFileImplicitColumns)` Creates a new map based on the specified `columnsStatistics`, with added statistics for implicit and partition (dir) columns. |
| `static Map<SchemaPath,ColumnStatistics<?>>` | `getColumnStatistics(TupleMetadata schema, DrillStatsTable statistics)` Returns a map of schema paths to `ColumnStatistics` obtained from the specified `DrillStatsTable` for all columns of the specified `schema`. |
| `static Map<SchemaPath,TypeProtos.MajorType>` | `getFileFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ParquetFileMetadata file)` Returns a map of column names to their Drill types for the specified file. |
| `static FileMetadata` | `getFileMetadata(Collection<RowGroupMetadata> rowGroups)` Returns a `FileMetadata` instance obtained by merging the specified `RowGroupMetadata` collection. |
| `static Map<SchemaPath,TypeProtos.MajorType>` | `getIntermediateFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)` Returns a map of column names to their Drill types for every `NameSegment` in each `SchemaPath` of the specified row group. |
| `static NonInterestingColumnsMetadata` | `getNonInterestingColumnsMeta(MetadataBase.ParquetTableMetadataBase parquetTableMetadata)` Returns the metadata of non-interesting columns. |
| `static org.apache.parquet.schema.OriginalType` | `getOriginalType(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)` Returns the `OriginalType` of the specified column. |
| `static PartitionMetadata` | `getPartitionMetadata(SchemaPath partitionColumn, List<FileMetadata> files)` Returns a `PartitionMetadata` instance obtained by merging the specified `FileMetadata` list. |
| `static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName` | `getPrimitiveTypeName(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)` Returns the `PrimitiveType.PrimitiveTypeName` of the specified column. |
| `static Map<SchemaPath,ColumnStatistics<?>>` | `getRowGroupColumnStatistics(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata)` Converts the specified `MetadataBase.RowGroupMetadata` into a map of `ColumnStatistics` instances keyed by column name. |
| `static Map<SchemaPath,TypeProtos.MajorType>` | `getRowGroupFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)` Returns a map of column names to their Drill types for the specified row group. |
| `static RowGroupMetadata` | `getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, org.apache.hadoop.fs.Path location)` Returns a `RowGroupMetadata` instance converted from the specified Parquet `rowGroupMetadata`. |
| `static org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata>` | `getRowGroupsMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata)` Returns a multimap from `Path` to `RowGroupMetadata`, obtained by converting the Parquet row group metadata taken from the specified `tableMetadata`. |
| `static Object` | `getValue(Object value, org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName primitiveType, org.apache.parquet.schema.OriginalType originalType)` Handles the passed value according to its type, using the specified `primitiveType` and `originalType`. |
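The summary row for `addImplicitColumnsStatistics` says it returns a new map with statistics added for partition (dir) columns. A minimal sketch of that idea, using hypothetical simplified types (`SimpleStats`, plain string column names) in place of Drill's `ColumnStatistics` and `SchemaPath`. It assumes every row under one location shares the same directory value, so min and max are both that value. It hard-codes the `dir` prefix (the real method's `OptionManager` parameter suggests the label is configurable) and omits file implicit columns entirely.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified statistics holder; not Drill's ColumnStatistics.
record SimpleStats(Object min, Object max, long nullCount) {}

final class ImplicitStatsSketch {
  private ImplicitStatsSketch() {}

  // Returns a NEW map; the caller's columnsStatistics map is not modified.
  static Map<String, SimpleStats> addPartitionStats(
      Map<String, SimpleStats> columnsStatistics, List<String> partitionValues) {
    Map<String, SimpleStats> result = new HashMap<>(columnsStatistics);
    for (int i = 0; i < partitionValues.size(); i++) {
      String value = partitionValues.get(i);
      // Assumption: partition columns are named dir0, dir1, ... by directory depth
      // and values are non-null. All rows under one location share the same
      // directory value, so min and max are both that value.
      result.put("dir" + i, new SimpleStats(value, value, 0));
    }
    return result;
  }
}
```

Copying the input map keeps the method free of side effects on the caller's statistics.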
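The method detail for `getRowGroupsMetadata` says each row group gets an index from its position in the file metadata, while empty / fake row groups get `-1`. The sketch below shows only that indexing rule. `RowGroupInfo` and `IndexedRowGroup` are hypothetical types, and a `Map<String, List<...>>` stands in for the Guava `Multimap<Path, RowGroupMetadata>`. Treating a zero-row group as "empty / fake" is an assumption; Drill's actual criterion may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical input and output types; not Drill's MetadataBase classes.
record RowGroupInfo(long rowCount) {}
record IndexedRowGroup(String file, int index, long rowCount) {}

final class RowGroupIndexSketch {
  private RowGroupIndexSketch() {}

  static Map<String, List<IndexedRowGroup>> index(Map<String, List<RowGroupInfo>> files) {
    Map<String, List<IndexedRowGroup>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<RowGroupInfo>> file : files.entrySet()) {
      List<RowGroupInfo> rowGroups = file.getValue();
      List<IndexedRowGroup> converted = new ArrayList<>();
      for (int i = 0; i < rowGroups.size(); i++) {
        RowGroupInfo rg = rowGroups.get(i);
        // Assumption: a zero-row group counts as "empty / fake" and gets -1;
        // otherwise the index is the row group's position within its file.
        int index = rg.rowCount() == 0 ? -1 : i;
        converted.add(new IndexedRowGroup(file.getKey(), index, rg.rowCount()));
      }
      result.put(file.getKey(), converted);
    }
    return result;
  }
}
```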
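`getFileMetadata` and `getPartitionMetadata` both build coarser metadata by merging finer-grained entries: row groups into a file, and files into a partition. The sketch below shows one plausible merge rule for a single numeric column, using a hypothetical `ColumnSummary` record. Row and null counts are summed, and the merged min/max must cover every part's range. It is illustrative only and assumes each part has at least one non-null value. It does not reproduce how Drill handles types or missing statistics.

```java
import java.util.Collection;

// Hypothetical per-column summary; not Drill's ColumnStatistics.
record ColumnSummary(long rowCount, long nullCount, long min, long max) {}

final class MergeSketch {
  private MergeSketch() {}

  static ColumnSummary merge(Collection<ColumnSummary> parts) {
    if (parts.isEmpty()) {
      throw new IllegalArgumentException("nothing to merge");
    }
    long rows = 0;
    long nulls = 0;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (ColumnSummary p : parts) {
      // Counts are additive across row groups (or files).
      rows += p.rowCount();
      nulls += p.nullCount();
      // The merged range must cover every part's range.
      min = Math.min(min, p.min());
      max = Math.max(max, p.max());
    }
    return new ColumnSummary(rows, nulls, min, max);
  }
}
```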
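For `getIntermediateFields`, "every `NameSegment` in `SchemaPath`" means that a nested column such as `a.b.c` yields an entry for each prefix: `a`, `a.b` and `a.b.c`. According to the method description, this hierarchy exists so that filters on `DICT` values (get by key) are not pruned out prematurely. A minimal sketch of the prefix enumeration, using dotted strings as a stand-in for `SchemaPath`. This is an assumption: real paths may also contain array segments, and the real method pairs each entry with a possibly `null` Drill type.

```java
import java.util.ArrayList;
import java.util.List;

final class IntermediatePathsSketch {
  private IntermediatePathsSketch() {}

  // Splits "a.b.c" into name segments and returns every prefix path, outermost first.
  static List<String> prefixes(String dottedPath) {
    List<String> result = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (String segment : dottedPath.split("\\.")) {
      if (current.length() > 0) {
        current.append('.');
      }
      current.append(segment);
      result.add(current.toString());
    }
    return result;
  }
}
```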
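`getValue` adjusts a raw statistics value according to the column's Parquet primitive and original types. The conversions Drill actually applies are not listed here, so the sketch below illustrates one well-known case: Parquet stores UTF8-annotated strings as `BINARY`, so the raw bytes are decoded to a `String`. `Primitive` and `Original` are hypothetical stand-ins for Parquet's `PrimitiveTypeName` and `OriginalType` enums. Passing every other combination through unchanged is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for org.apache.parquet.schema enums.
enum Primitive { INT32, INT64, BINARY, FIXED_LEN_BYTE_ARRAY }
enum Original { NONE, UTF8, DECIMAL }

final class ValueSketch {
  private ValueSketch() {}

  static Object handle(Object value, Primitive primitiveType, Original originalType) {
    if (value == null) {
      return null;
    }
    // BINARY annotated as UTF8 is a string column in Parquet: decode the raw bytes.
    if (primitiveType == Primitive.BINARY && originalType == Original.UTF8
        && value instanceof byte[]) {
      return new String((byte[]) value, StandardCharsets.UTF_8);
    }
    // Assumption: all other combinations are returned unchanged in this sketch.
    return value;
  }
}
```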

`public static Map<SchemaPath,ColumnStatistics<?>> addImplicitColumnsStatistics(Map<SchemaPath,ColumnStatistics<?>> columnsStatistics, List<SchemaPath> columns, List<String> partitionValues, OptionManager optionManager, org.apache.hadoop.fs.Path location, boolean supportsFileImplicitColumns)`

Creates a new map based on the specified `columnsStatistics`, with added statistics for implicit and partition (dir) columns.

Parameters:
- `columnsStatistics` - map of column statistics to expand
- `columns` - list of all columns, including implicit or partition ones
- `partitionValues` - list of partition values
- `optionManager` - option manager
- `location` - location of the metadata part
- `supportsFileImplicitColumns` - whether implicit columns are supported

`public static org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> getRowGroupsMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata)`
Returns a multimap from `Path` to `RowGroupMetadata`, obtained by converting the Parquet row group metadata taken from the specified `tableMetadata`. Assigns an index to each row group based on its position in the file metadata; empty or fake row groups are assigned the index `-1`.

Parameters:
- `tableMetadata` - the source of row groups to be converted

Returns: `RowGroupMetadata` instances, grouped by `Path`

`public static RowGroupMetadata getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, org.apache.hadoop.fs.Path location)`
Returns a `RowGroupMetadata` instance converted from the specified Parquet `rowGroupMetadata`.

Parameters:
- `tableMetadata` - table metadata that contains the row group metadata to convert
- `rowGroupMetadata` - row group metadata to convert
- `rgIndexInFile` - index of the current row group within the file
- `location` - location of the file with the current row group

Returns: `RowGroupMetadata` instance converted from the specified Parquet `rowGroupMetadata`

`public static FileMetadata getFileMetadata(Collection<RowGroupMetadata> rowGroups)`
Returns a `FileMetadata` instance obtained by merging the specified `RowGroupMetadata` collection.

Parameters:
- `rowGroups` - collection of `RowGroupMetadata` to be merged

Returns: `FileMetadata` instance

`public static PartitionMetadata getPartitionMetadata(SchemaPath partitionColumn, List<FileMetadata> files)`
Returns a `PartitionMetadata` instance obtained by merging the specified `FileMetadata` list.

Parameters:
- `partitionColumn` - partition column
- `files` - list of files to be merged

Returns: `PartitionMetadata` instance

`public static Map<SchemaPath,ColumnStatistics<?>> getRowGroupColumnStatistics(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata)`
Converts the specified `MetadataBase.RowGroupMetadata` into a map of `ColumnStatistics` instances keyed by column name.

Parameters:
- `tableMetadata` - the source of column types
- `rowGroupMetadata` - metadata to convert

`public static NonInterestingColumnsMetadata getNonInterestingColumnsMeta(MetadataBase.ParquetTableMetadataBase parquetTableMetadata)`
Returns the metadata of non-interesting columns.

Parameters:
- `parquetTableMetadata` - the source of column metadata for the statistics of non-interesting columns

`public static Object getValue(Object value, org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName primitiveType, org.apache.parquet.schema.OriginalType originalType)`
Handles the passed value according to its type, using the specified `primitiveType` and `originalType`.

Parameters:
- `value` - value to handle
- `primitiveType` - primitive type of the column whose value should be handled
- `originalType` - original type of the column whose value should be handled

`public static Map<SchemaPath,TypeProtos.MajorType> getFileFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ParquetFileMetadata file)`
Returns a map of column names to their Drill types for the specified file.

Parameters:
- `parquetTableMetadata` - the source of primitive and original column types
- `file` - file whose columns should be discovered

`public static Map<SchemaPath,TypeProtos.MajorType> getRowGroupFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)`
Returns a map of column names to their Drill types for the specified row group.

Parameters:
- `parquetTableMetadata` - the source of primitive and original column types
- `rowGroup` - row group whose columns should be discovered

`public static Map<SchemaPath,TypeProtos.MajorType> getIntermediateFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)`
Returns a map of column names to their Drill types for every `NameSegment` in each `SchemaPath` of the specified row group. The type for a `SchemaPath` can be `null` when it cannot be determined. At present, this hierarchy matters only because `TypeProtos.MinorType.DICT` must be accounted for, so that filters on a `DICT`'s values (get by key) are not pruned out before the actual filtering happens.

Parameters:
- `parquetTableMetadata` - the source of column types
- `rowGroup` - row group whose columns should be discovered

`public static org.apache.parquet.schema.OriginalType getOriginalType(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)`
Returns the `OriginalType` of the specified column.

Parameters:
- `parquetTableMetadata` - the source of the column type
- `column` - column whose `OriginalType` should be returned

Returns: `OriginalType` of the specified column

`public static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName getPrimitiveTypeName(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)`
Returns the `PrimitiveType.PrimitiveTypeName` of the specified column.

Parameters:
- `parquetTableMetadata` - the source of the column type
- `column` - column whose `PrimitiveType.PrimitiveTypeName` should be returned

Returns: `PrimitiveType.PrimitiveTypeName` of the specified column

`public static Map<SchemaPath,ColumnStatistics<?>> getColumnStatistics(TupleMetadata schema, DrillStatsTable statistics)`
Returns a map of schema paths to `ColumnStatistics` obtained from the specified `DrillStatsTable` for all columns of the specified `schema`.

Parameters:
- `schema` - source of column names
- `statistics` - source of column statistics

Returns: map of schema paths to `ColumnStatistics`

Copyright © 2021 The Apache Software Foundation. All rights reserved.