Class ParquetTableMetadataUtils
java.lang.Object
org.apache.drill.exec.store.parquet.ParquetTableMetadataUtils
Utility class for converting parquet metadata classes to Metastore metadata classes.
-
Method Summary
Modifier and Type: static Map<SchemaPath, ColumnStatistics<?>>
Method: addImplicitColumnsStatistics(Map<SchemaPath, ColumnStatistics<?>> columnsStatistics, List<SchemaPath> columns, List<String> partitionValues, OptionManager optionManager, org.apache.hadoop.fs.Path location, boolean supportsFileImplicitColumns)
Description: Creates new map based on specified columnsStatistics with added statistics for implicit and partition (dir) columns.

Modifier and Type: static Map<SchemaPath, ColumnStatistics<?>>
Method: getColumnStatistics(TupleMetadata schema, DrillStatsTable statistics)
Description: Returns map with schema path and ColumnStatistics obtained from specified DrillStatsTable for all columns from specified schema.

Modifier and Type: static Map<SchemaPath, TypeProtos.MajorType>
Method: getFileFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ParquetFileMetadata file)
Description: Returns map of column names with their Drill types for specified file.

Modifier and Type: static FileMetadata
Method: getFileMetadata(Collection<RowGroupMetadata> rowGroups)
Description: Returns FileMetadata instance received by merging specified RowGroupMetadata list.

Modifier and Type: static Map<SchemaPath, TypeProtos.MajorType>
Method: getIntermediateFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
Description: Returns map of column names with their Drill types for every NameSegment in SchemaPath in specified rowGroup.

Modifier and Type: static NonInterestingColumnsMetadata
Method: getNonInterestingColumnsMeta(MetadataBase.ParquetTableMetadataBase parquetTableMetadata)
Description: Returns metadata for the non-interesting columns.

Modifier and Type: static org.apache.parquet.schema.OriginalType
Method: getOriginalType(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
Description: Returns OriginalType type for the specified column.

Modifier and Type: static PartitionMetadata
Method: getPartitionMetadata(SchemaPath partitionColumn, List<FileMetadata> files)
Description: Returns PartitionMetadata instance received by merging specified FileMetadata list.

Modifier and Type: static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName
Method: getPrimitiveTypeName(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
Description: Returns PrimitiveType.PrimitiveTypeName type for the specified column.

Modifier and Type: static Map<SchemaPath, ColumnStatistics<?>>
Method: getRowGroupColumnStatistics(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata)
Description: Converts specified MetadataBase.RowGroupMetadata into the map of ColumnStatistics instances with column names as keys.

Modifier and Type: static Map<SchemaPath, TypeProtos.MajorType>
Method: getRowGroupFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
Description: Returns map of column names with their Drill types for specified rowGroup.

Modifier and Type: static RowGroupMetadata
Method: getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, org.apache.hadoop.fs.Path location)
Description: Returns RowGroupMetadata instance converted from specified parquet rowGroupMetadata.

Modifier and Type: static com.google.common.collect.Multimap<org.apache.hadoop.fs.Path, RowGroupMetadata>
Method: getRowGroupsMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata)
Description: Returns a multimap of RowGroupMetadata, keyed by path, received by converting parquet row groups metadata taken from the specified tableMetadata.

Modifier and Type: static Object
Method: getValue(Object value, org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName primitiveType, org.apache.parquet.schema.OriginalType originalType)
Description: Handles passed value considering its type and specified primitiveType with originalType.
-
Method Details
-
addImplicitColumnsStatistics
public static Map<SchemaPath, ColumnStatistics<?>> addImplicitColumnsStatistics(Map<SchemaPath, ColumnStatistics<?>> columnsStatistics, List<SchemaPath> columns, List<String> partitionValues, OptionManager optionManager, org.apache.hadoop.fs.Path location, boolean supportsFileImplicitColumns)
Creates new map based on specified columnsStatistics with added statistics for implicit and partition (dir) columns.
Parameters:
columnsStatistics - map of column statistics to expand
columns - list of all columns, including implicit and partition ones
partitionValues - list of partition values
optionManager - option manager
location - location of metadata part
supportsFileImplicitColumns - whether implicit columns are supported
Returns:
map with added statistics for implicit and partition (dir) columns
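As a rough illustration of what "statistics for partition (dir) columns" can mean, the sketch below maps a list of partition values to columns named by a label plus the directory level. It does not use Drill classes: the class `PartitionStatsSketch`, the `{min, max}` string array standing in for `ColumnStatistics`, and the choice to set both bounds to the partition value are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartitionStatsSketch {
  // Hypothetical stand-in for per-column statistics: a {min, max} pair.
  // Assumption: a partition column holds one constant value per location,
  // so both bounds are set to that value.
  static Map<String, String[]> partitionStats(String label, List<String> partitionValues) {
    Map<String, String[]> stats = new LinkedHashMap<>();
    for (int level = 0; level < partitionValues.size(); level++) {
      String value = partitionValues.get(level);
      stats.put(label + level, new String[] {value, value});
    }
    return stats;
  }

  public static void main(String[] args) {
    Map<String, String[]> stats = partitionStats("dir", Arrays.asList("2023", "01"));
    for (Map.Entry<String, String[]> e : stats.entrySet()) {
      System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
    }
  }
}
```

The label is passed in explicitly here; in the real method it would presumably be resolved through the optionManager argument.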
-
getRowGroupsMetadata
public static com.google.common.collect.Multimap<org.apache.hadoop.fs.Path, RowGroupMetadata> getRowGroupsMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata)
Returns a multimap of RowGroupMetadata, keyed by path, received by converting parquet row groups metadata taken from the specified tableMetadata. Assigns index to row groups based on their position in files metadata. For empty / fake row groups assigns '-1' index.
Parameters:
tableMetadata - the source of row groups to be converted
Returns:
multimap of RowGroupMetadata keyed by path
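The index assignment described for getRowGroupsMetadata can be sketched without Drill classes. In the sketch below, each file is represented by a list of boolean flags marking empty / fake row groups, and a plain map of lists stands in for the Multimap. The class name `RowGroupIndexSketch` and the rule that the counter advances only for non-empty row groups are assumptions, not taken from the Drill source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowGroupIndexSketch {
  // Input: file path -> one flag per row group, true when the row group is empty / fake.
  // Output: file path -> index assigned to each row group, in the same order.
  static Map<String, List<Integer>> assignIndexes(Map<String, List<Boolean>> emptyFlagsByFile) {
    Map<String, List<Integer>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<Boolean>> file : emptyFlagsByFile.entrySet()) {
      List<Integer> indexes = new ArrayList<>();
      int next = 0; // position counter restarts for every file
      for (boolean empty : file.getValue()) {
        // Assumption: only non-empty row groups consume an index.
        indexes.add(empty ? -1 : next++);
      }
      result.put(file.getKey(), indexes);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<Boolean>> input = new LinkedHashMap<>();
    input.put("/data/a.parquet", Arrays.asList(false, false, false));
    input.put("/data/b.parquet", Arrays.asList(true));
    System.out.println(assignIndexes(input));
  }
}
```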
-
getRowGroupMetadata
public static RowGroupMetadata getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, org.apache.hadoop.fs.Path location)
Returns RowGroupMetadata instance converted from specified parquet rowGroupMetadata.
Parameters:
tableMetadata - table metadata which contains row group metadata to convert
rowGroupMetadata - row group metadata to convert
rgIndexInFile - index of current row group within the file
location - location of file with current row group
Returns:
RowGroupMetadata instance converted from specified parquet rowGroupMetadata
-
getFileMetadata
public static FileMetadata getFileMetadata(Collection<RowGroupMetadata> rowGroups)
Returns FileMetadata instance received by merging specified RowGroupMetadata list.
Parameters:
rowGroups - collection of RowGroupMetadata to be merged
Returns:
FileMetadata instance
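To make "merging" concrete, here is a minimal sketch of how per-row-group statistics for one column could be combined into a file-level summary. It is not the Drill implementation: `ColumnStats` is a hypothetical stand-in for ColumnStatistics, and the merge rules (smallest minimum, largest maximum, summed null and row counts) are assumptions about what such a merge typically does.

```java
import java.util.Arrays;
import java.util.List;

class MergeStatsSketch {
  // Hypothetical stand-in for the statistics of one column in one row group.
  static final class ColumnStats {
    final long min;
    final long max;
    final long nullsCount;
    final long rowCount;

    ColumnStats(long min, long max, long nullsCount, long rowCount) {
      this.min = min;
      this.max = max;
      this.nullsCount = nullsCount;
      this.rowCount = rowCount;
    }
  }

  // Assumed merge rules: min of mins, max of maxes, counts are summed.
  static ColumnStats merge(List<ColumnStats> rowGroups) {
    if (rowGroups.isEmpty()) {
      throw new IllegalArgumentException("nothing to merge");
    }
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    long nulls = 0;
    long rows = 0;
    for (ColumnStats stats : rowGroups) {
      min = Math.min(min, stats.min);
      max = Math.max(max, stats.max);
      nulls += stats.nullsCount;
      rows += stats.rowCount;
    }
    return new ColumnStats(min, max, nulls, rows);
  }

  public static void main(String[] args) {
    ColumnStats file = merge(Arrays.asList(
        new ColumnStats(5, 20, 1, 100),
        new ColumnStats(-3, 12, 0, 50)));
    System.out.println(file.min + " " + file.max + " " + file.nullsCount + " " + file.rowCount);
  }
}
```

The same shape of merge would apply one level up, where getPartitionMetadata combines FileMetadata instances.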
-
getPartitionMetadata
public static PartitionMetadata getPartitionMetadata(SchemaPath partitionColumn, List<FileMetadata> files)
Returns PartitionMetadata instance received by merging specified FileMetadata list.
Parameters:
partitionColumn - partition column
files - list of files to be merged
Returns:
PartitionMetadata instance
-
getRowGroupColumnStatistics
public static Map<SchemaPath, ColumnStatistics<?>> getRowGroupColumnStatistics(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata)
Converts specified MetadataBase.RowGroupMetadata into the map of ColumnStatistics instances with column names as keys.
Parameters:
tableMetadata - the source of column types
rowGroupMetadata - metadata to convert
Returns:
map with converted row group metadata
-
getNonInterestingColumnsMeta
public static NonInterestingColumnsMetadata getNonInterestingColumnsMeta(MetadataBase.ParquetTableMetadataBase parquetTableMetadata)
Returns metadata for the non-interesting columns.
Parameters:
parquetTableMetadata - the source of column metadata for the statistics of non-interesting columns
Returns:
non-interesting columns metadata
-
getValue
public static Object getValue(Object value, org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName primitiveType, org.apache.parquet.schema.OriginalType originalType)
Handles passed value considering its type and specified primitiveType with originalType.
Parameters:
value - value to handle
primitiveType - primitive type of the column whose value should be handled
originalType - original type of the column whose value should be handled
Returns:
handled value
-
getFileFields
public static Map<SchemaPath, TypeProtos.MajorType> getFileFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ParquetFileMetadata file)
Returns map of column names with their Drill types for specified file.
Parameters:
parquetTableMetadata - the source of primitive and original column types
file - file whose columns should be discovered
Returns:
map of column names with their Drill types
-
getRowGroupFields
public static Map<SchemaPath, TypeProtos.MajorType> getRowGroupFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
Returns map of column names with their Drill types for specified rowGroup.
Parameters:
parquetTableMetadata - the source of primitive and original column types
rowGroup - row group whose columns should be discovered
Returns:
map of column names with their Drill types
-
getIntermediateFields
public static Map<SchemaPath, TypeProtos.MajorType> getIntermediateFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
Returns map of column names with their Drill types for every NameSegment in SchemaPath in specified rowGroup. The type for a SchemaPath can be null when it is not possible to determine it. Currently this hierarchy is of interest solely because TypeProtos.MinorType.DICT must be accounted for, so that filters on DICT values (get by key) are not pruned out before actual filtering happens.
Parameters:
parquetTableMetadata - the source of column types
rowGroup - row group whose columns should be discovered
Returns:
map of column names with their Drill types
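The phrase "for every NameSegment in SchemaPath" means that a nested column contributes one entry per level of its path, not only the leaf. The sketch below shows that expansion on a plain dotted string. The real SchemaPath is a chain of segments, not a string, so `IntermediatePathsSketch` and its dotted-string input are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.List;

class IntermediatePathsSketch {
  // Expands "a.b.c" into the path of every intermediate level: a, a.b, a.b.c.
  static List<String> prefixes(String dottedPath) {
    List<String> result = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (String segment : dottedPath.split("\\.")) {
      if (current.length() > 0) {
        current.append('.');
      }
      current.append(segment);
      result.add(current.toString());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(prefixes("order.items.price"));
  }
}
```

In the documented method each such path would be paired with a Drill type, or with null where the type cannot be determined.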
-
getOriginalType
public static org.apache.parquet.schema.OriginalType getOriginalType(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
Returns OriginalType type for the specified column.
Parameters:
parquetTableMetadata - the source of column type
column - column whose OriginalType should be returned
Returns:
OriginalType type for the specified column
-
getPrimitiveTypeName
public static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName getPrimitiveTypeName(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
Returns PrimitiveType.PrimitiveTypeName type for the specified column.
Parameters:
parquetTableMetadata - the source of column type
column - column whose PrimitiveType.PrimitiveTypeName should be returned
Returns:
PrimitiveType.PrimitiveTypeName type for the specified column
-
getColumnStatistics
public static Map<SchemaPath, ColumnStatistics<?>> getColumnStatistics(TupleMetadata schema, DrillStatsTable statistics)
Returns map with schema path and ColumnStatistics obtained from specified DrillStatsTable for all columns from specified schema.
Parameters:
schema - source of column names
statistics - source of column statistics
Returns:
map with schema path and ColumnStatistics
-