Class ParquetSchema
java.lang.Object
org.apache.drill.exec.store.parquet.columnreaders.ParquetSchema
Mapping from the schema of the Parquet file to that of the record reader, which is the schema that Drill and the Parquet reader use.
Constructor Summary

ParquetSchema(OptionManager options, int rowGroupIndex, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, Collection<SchemaPath> selectedCols)
    Build the Parquet schema.
Method Summary

boolean allFieldsFixedLength()
void buildSchema()
    Build the schema for this read as a combination of the schema specified in the Parquet footer and the list of columns selected in the query.
void createNonExistentColumns(OutputMutator output, List<NullableIntVector> nullFilledVectors)
    Create "dummy" fields for columns which are selected in the SELECT clause, but not present in the Parquet schema.
org.apache.parquet.hadoop.metadata.ParquetMetadata footer()
int getBitWidthAllFixedFields()
long getGroupRecordCount()
    Return the Parquet file row count.
org.apache.parquet.hadoop.metadata.BlockMetaData getRowGroupMetadata()
boolean isStarQuery()
Constructor Details
ParquetSchema
public ParquetSchema(OptionManager options, int rowGroupIndex, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, Collection<SchemaPath> selectedCols)

Build the Parquet schema. The schema can be based on a "SELECT *", meaning we want all columns defined in the Parquet file. In this case, the list of selected columns is null. Or, the query can be based on an explicit list of selected columns. In this case, the columns need not exist in the Parquet file. If a column does not exist, the reader returns null for that column. If no selected column exists in the file, then we return "mock" records: records with only null values, but repeated for the number of rows in the Parquet file. (A construction sketch follows the parameter list.)

Parameters:
    options - session options
    rowGroupIndex - row group to read
    selectedCols - columns specified in the SELECT clause, or null if this is a SELECT * query
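The following is a minimal construction sketch, not taken from the Drill sources. It assumes an OptionManager obtained from the surrounding fragment or operator context; the file path, the column names, and the ParquetSchemaExample/schemaFor names are hypothetical. The footer is read with Parquet's ParquetFileReader.

import java.util.Arrays;
import java.util.Collection;

import org.apache.drill.common.expression.SchemaPath;
import org.apache.drill.exec.server.options.OptionManager;
import org.apache.drill.exec.store.parquet.columnreaders.ParquetSchema;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;

public class ParquetSchemaExample {
  // optionManager is assumed to come from the fragment/operator context.
  public static ParquetSchema schemaFor(OptionManager optionManager,
                                        String filePath) throws Exception {
    // Read the Parquet footer for the file (hypothetical path).
    Configuration conf = new Configuration();
    ParquetMetadata footer =
        ParquetFileReader.readFooter(conf, new Path(filePath));

    // Explicit column list: the columns need not exist in the file;
    // missing ones are returned as nulls by the reader.
    Collection<SchemaPath> selectedCols = Arrays.asList(
        SchemaPath.getSimplePath("order_id"),      // hypothetical columns
        SchemaPath.getSimplePath("order_total"));

    // Pass null for selectedCols instead to express a "SELECT *" query.
    return new ParquetSchema(optionManager, 0 /* first row group */,
        footer, selectedCols);
  }
}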
Method Details
buildSchema
public void buildSchema()

Build the schema for this read as a combination of the schema specified in the Parquet footer and the list of columns selected in the query.
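A short follow-on sketch, continuing the hypothetical ParquetSchemaExample above: the schema is built and then inspected through the accessors documented on this page. The ParquetSchemaInspection and buildAndInspect names and the file path are illustrative only.

import org.apache.drill.exec.server.options.OptionManager;
import org.apache.drill.exec.store.parquet.columnreaders.ParquetSchema;

class ParquetSchemaInspection {
  // Exception handling is omitted for brevity.
  static void buildAndInspect(OptionManager optionManager) throws Exception {
    ParquetSchema schema = ParquetSchemaExample.schemaFor(
        optionManager, "/data/orders.parquet");   // hypothetical file

    // Combine the footer schema with the columns selected in the query.
    schema.buildSchema();

    // Inspect the result.
    boolean star = schema.isStarQuery();
    long rowCount = schema.getGroupRecordCount();
    if (schema.allFieldsFixedLength()) {
      // Total bit width of all fixed-length fields, e.g. for sizing batches.
      int bitWidth = schema.getBitWidthAllFixedFields();
    }
  }
}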
isStarQuery
public boolean isStarQuery()
getBitWidthAllFixedFields
public int getBitWidthAllFixedFields()
allFieldsFixedLength
public boolean allFieldsFixedLength()
getColumnMetadata
getGroupRecordCount
public long getGroupRecordCount()

Return the Parquet file row count.

Returns:
    number of records in the Parquet row group
getRowGroupMetadata
public org.apache.parquet.hadoop.metadata.BlockMetaData getRowGroupMetadata()
createNonExistentColumns
public void createNonExistentColumns(OutputMutator output, List<NullableIntVector> nullFilledVectors) throws SchemaChangeException

Create "dummy" fields for columns which are selected in the SELECT clause, but not present in the Parquet schema. (A usage sketch follows.)

Parameters:
    output - the output container

Throws:
    SchemaChangeException - should not occur
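A minimal usage sketch, assuming the OutputMutator that Drill supplies when a record reader is set up and a ParquetSchema whose schema has already been built; the NullColumnSetup and addMissingColumns names are hypothetical. The list passed in collects the vectors created for the missing columns, presumably so the reader can later fill them with nulls.

import java.util.ArrayList;
import java.util.List;

import org.apache.drill.exec.exception.SchemaChangeException;
import org.apache.drill.exec.physical.impl.OutputMutator;
import org.apache.drill.exec.store.parquet.columnreaders.ParquetSchema;
import org.apache.drill.exec.vector.NullableIntVector;

class NullColumnSetup {
  // Returns the vectors created for columns that were selected in the query
  // but are absent from the Parquet file.
  static List<NullableIntVector> addMissingColumns(ParquetSchema schema,
                                                   OutputMutator output)
      throws SchemaChangeException {
    List<NullableIntVector> nullFilledVectors = new ArrayList<>();
    schema.createNonExistentColumns(output, nullFilledVectors);
    return nullFilledVectors;
  }
}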