Class SchemaPathUtils
-
Method Summary
Modifier and Type | Method | Description
static void | addColumnMetadata(TupleMetadata schema, SchemaPath schemaPath, TypeProtos.MajorType type, Map<SchemaPath, TypeProtos.MajorType> types) | Adds a column with the specified schema path and type into the specified TupleMetadata schema.
static ColumnMetadata | getColumnMetadata(SchemaPath schemaPath, TupleMetadata schema) | Returns the ColumnMetadata instance from the specified TupleMetadata schema that corresponds to the specified column schema path.
static boolean | isFieldNestedInDictOrRepeatedMap(SchemaPath schemaPath, TupleMetadata schema) | Checks whether the field identified by the schema path is a child of either a DICT or a REPEATED MAP.
-
Method Details
-
getColumnMetadata
Returns the ColumnMetadata instance from the specified TupleMetadata schema that corresponds to the specified column schema path.
- Parameters:
schemaPath - schema path of the column to obtain
schema - tuple schema in which to search for the column
- Returns:
ColumnMetadata instance which corresponds to the specified column schema path
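A lookup by schema path amounts to walking the nested schema one path segment at a time. The sketch below illustrates that idea. It does not use Drill's API: `Column` is a hypothetical stand-in for ColumnMetadata, a `Map<String, Column>` stands in for TupleMetadata, and a path is a plain list of names rather than a SchemaPath. Returning `null` for a missing segment is a choice made for this sketch only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for Drill's TupleMetadata / ColumnMetadata (not Drill's API).
public class ColumnLookupSketch {

    // A column has a name, a type label, and (for map-like columns) child columns.
    record Column(String name, String type, Map<String, Column> children) {
        Column(String name, String type) {
            this(name, type, new LinkedHashMap<>());
        }
    }

    // Walks the path one segment at a time; returns null when any segment is missing
    // (an assumption of this sketch) or when the path is empty.
    static Column getColumnMetadata(List<String> path, Map<String, Column> schema) {
        Map<String, Column> level = schema;
        Column current = null;
        for (String segment : path) {
            if (level == null) {
                return null;
            }
            current = level.get(segment);
            if (current == null) {
                return null;
            }
            level = current.children();
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Column> schema = new LinkedHashMap<>();
        Column address = new Column("address", "MAP");
        address.children().put("city", new Column("city", "VARCHAR"));
        schema.put("address", address);

        System.out.println(getColumnMetadata(List.of("address", "city"), schema).type()); // VARCHAR
        System.out.println(getColumnMetadata(List.of("address", "zip"), schema));         // null
    }
}
```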
-
isFieldNestedInDictOrRepeatedMap
Checks whether the field identified by the schema path is a child of either a DICT or a REPEATED MAP. For such fields, filters can't be removed based on Parquet statistics.

The check is needed because statistics data is not obtained for such fields: their representation differs from the 'canonical' one. For example, field `a` in Parquet's STRUCT ARRAY is represented as `struct_array`.`bag`.`array_element`.`a`. Once it is used in a filter (... WHERE struct_array[0].a = 1), it has a different representation with indexes stripped, `struct_array`.`a`, which is not present in statistics. The same happens with a DICT's value: for SELECT ... WHERE dict_col['a'] = 0, statistics exist for `dict_col`.`key_value`.`value`, but the field in the filter is translated to `dict_col`.`a` and is therefore considered not present in statistics. If such fields (like the ones in these examples) are OPTIONAL INT, the field is considered absent from the table and is treated as NULL. This method is used to avoid that situation.
- Parameters:
schemaPath - schema path used in the filter
schema - schema containing all the fields in the file
- Returns:
true if the field is nested inside a DICT (is its `key` or `value`) or inside a REPEATED MAP field, false otherwise.
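The sketch below illustrates both the statistics name mismatch and the ancestor check, using the examples from the description. It does not reproduce Drill's implementation:

- `Column` is a hypothetical stand-in for ColumnMetadata.
- Type labels such as `"DICT"` and `"REPEATED_MAP"` are plain strings invented here.
- The dotted statistics keys only mirror the paths quoted above.

The check inspects each parent before resolving its child. This matters because a DICT key like `'a'` in `dict_col['a']` is not a declared child column.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model (not Drill's API) of the "nested in DICT or REPEATED MAP" check.
public class NestedFieldCheckSketch {

    record Column(String name, String type, Map<String, Column> children) {
        Column(String name, String type) {
            this(name, type, new LinkedHashMap<>());
        }

        boolean isDictOrRepeatedMap() {
            return type.equals("DICT") || type.equals("REPEATED_MAP");
        }
    }

    // Returns true if any ancestor of the last path segment is a DICT or REPEATED MAP.
    static boolean isFieldNestedInDictOrRepeatedMap(List<String> path, Map<String, Column> schema) {
        Map<String, Column> level = schema;
        for (int i = 0; i < path.size() - 1; i++) {
            Column parent = level.get(path.get(i));
            if (parent == null) {
                return false; // unknown column: nothing to inspect
            }
            if (parent.isDictOrRepeatedMap()) {
                return true;
            }
            if (!parent.type().equals("MAP")) {
                return false; // a scalar cannot have nested fields
            }
            level = parent.children(); // plain map: keep descending
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Column> schema = new LinkedHashMap<>();
        Column structArray = new Column("struct_array", "REPEATED_MAP");
        structArray.children().put("a", new Column("a", "INT"));
        schema.put("struct_array", structArray);
        schema.put("dict_col", new Column("dict_col", "DICT"));
        Column plain = new Column("plain", "MAP");
        plain.children().put("b", new Column("b", "INT"));
        schema.put("plain", plain);

        // Statistics are keyed by the physical paths, while the filter uses the
        // index-stripped path, so a direct lookup misses.
        Set<String> statsKeys = Set.of("struct_array.bag.array_element.a", "dict_col.key_value.value");
        System.out.println(statsKeys.contains("struct_array.a")); // false

        System.out.println(isFieldNestedInDictOrRepeatedMap(List.of("struct_array", "a"), schema)); // true
        System.out.println(isFieldNestedInDictOrRepeatedMap(List.of("dict_col", "a"), schema));     // true
        System.out.println(isFieldNestedInDictOrRepeatedMap(List.of("plain", "b"), schema));        // false
    }
}
```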
-
addColumnMetadata
public static void addColumnMetadata(TupleMetadata schema, SchemaPath schemaPath, TypeProtos.MajorType type, Map<SchemaPath, TypeProtos.MajorType> types)
Adds a column with the specified schema path and type into the specified TupleMetadata schema. When the specified SchemaPath has children, the corresponding maps are created in the TupleMetadata schema and the last child of the map gets the specified type.
- Parameters:
schema - tuple schema where the column should be added
schemaPath - schema path of the column to add
type - type of the column to add
types - map of the column's parent types, keyed by schema path
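Adding a column by path means creating any missing parent maps along the way and giving only the last segment the requested type. The sketch below models that with hypothetical names rather than Drill's classes:

- `Column` and `parentTypes` are invented for illustration.
- Paths are lists of names, and parent types are looked up by dotted path.
- A parent missing from `parentTypes` defaults to `"MAP"`. This is an assumption of the sketch, not documented Drill behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model (not Drill's API) of adding a nested column by path.
public class AddColumnSketch {

    record Column(String name, String type, Map<String, Column> children) {
        Column(String name, String type) {
            this(name, type, new LinkedHashMap<>());
        }
    }

    // Creates missing parents (typed from parentTypes, defaulting to "MAP" by assumption),
    // then puts the leaf with leafType; an existing leaf of the same name is replaced here.
    static void addColumnMetadata(Map<String, Column> schema, List<String> path,
                                  String leafType, Map<String, String> parentTypes) {
        Map<String, Column> level = schema;
        for (int i = 0; i < path.size() - 1; i++) {
            String name = path.get(i);
            String parentKey = String.join(".", path.subList(0, i + 1));
            String parentType = parentTypes.getOrDefault(parentKey, "MAP");
            // Reuse an existing parent if present; otherwise create it with parentType.
            Column parent = level.computeIfAbsent(name, n -> new Column(n, parentType));
            level = parent.children();
        }
        String leafName = path.get(path.size() - 1);
        level.put(leafName, new Column(leafName, leafType));
    }

    public static void main(String[] args) {
        Map<String, Column> schema = new LinkedHashMap<>();
        addColumnMetadata(schema, List.of("a", "b", "c"), "INT", Map.of("a.b", "REPEATED_MAP"));
        Column a = schema.get("a");
        Column b = a.children().get("b");
        System.out.println(a.type() + " " + b.type() + " " + b.children().get("c").type());
        // MAP REPEATED_MAP INT
    }
}
```

Because `computeIfAbsent` reuses parents that already exist, adding a second column under the same parent path extends the existing map instead of replacing it.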
-