Class ParquetSchema
java.lang.Object
org.apache.drill.exec.store.parquet.columnreaders.ParquetSchema
Mapping from the schema of the Parquet file to that of the record reader, which is the schema that Drill and the Parquet reader use.
Constructor Summary

ParquetSchema(OptionManager options, int rowGroupIndex, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, Collection<SchemaPath> selectedCols)
    Build the Parquet schema.
Method Summary

boolean allFieldsFixedLength()
void buildSchema()
    Build the schema for this read as a combination of the schema specified in the Parquet footer and the list of columns selected in the query.
void createNonExistentColumns(OutputMutator output, List<NullableIntVector> nullFilledVectors)
    Create "dummy" fields for columns which are selected in the SELECT clause, but not present in the Parquet schema.
org.apache.parquet.hadoop.metadata.ParquetMetadata footer()
int getBitWidthAllFixedFields()
long getGroupRecordCount()
    Return the Parquet file row count.
org.apache.parquet.hadoop.metadata.BlockMetaData getRowGroupMetadata()
boolean isStarQuery()
Constructor Details
ParquetSchema
public ParquetSchema(OptionManager options, int rowGroupIndex, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, Collection<SchemaPath> selectedCols)

Build the Parquet schema. The schema can be based on a "SELECT *", meaning we want all columns defined in the Parquet file. In this case, the list of selected columns is null. Or, the query can be based on an explicit list of selected columns. In this case, the columns need not exist in the Parquet file. If a column does not exist, the reader returns null for that column. If no selected column exists in the file, then we return "mock" records: records with only null values, but repeated for the number of rows in the Parquet file. (A construction sketch follows the parameter list.)

Parameters:
    options - session options
    rowGroupIndex - row group to read
    selectedCols - columns specified in the SELECT clause, or null if this is a SELECT * query
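The following is a minimal construction sketch, not taken from the Drill sources. It assumes an OptionManager obtained from the surrounding fragment or operator context; the file path, the column names, and the ParquetSchemaExample/schemaFor names are hypothetical. The footer is read with Parquet's ParquetFileReader.

import java.util.Arrays;
import java.util.Collection;

import org.apache.drill.common.expression.SchemaPath;
import org.apache.drill.exec.server.options.OptionManager;
import org.apache.drill.exec.store.parquet.columnreaders.ParquetSchema;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;

public class ParquetSchemaExample {
  // optionManager is assumed to come from the fragment/operator context.
  public static ParquetSchema schemaFor(OptionManager optionManager,
                                        String filePath) throws Exception {
    // Read the Parquet footer for the file (hypothetical path).
    Configuration conf = new Configuration();
    ParquetMetadata footer =
        ParquetFileReader.readFooter(conf, new Path(filePath));

    // Explicit column list: the columns need not exist in the file;
    // missing ones are returned as nulls by the reader.
    Collection<SchemaPath> selectedCols = Arrays.asList(
        SchemaPath.getSimplePath("order_id"),      // hypothetical columns
        SchemaPath.getSimplePath("order_total"));

    // Pass null for selectedCols instead to express a "SELECT *" query.
    return new ParquetSchema(optionManager, 0 /* first row group */,
        footer, selectedCols);
  }
}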
Method Details
buildSchema
public void buildSchema()

Build the schema for this read as a combination of the schema specified in the Parquet footer and the list of columns selected in the query.
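A short follow-on sketch, continuing the hypothetical ParquetSchemaExample above: the schema is built and then inspected through the accessors documented on this page. The ParquetSchemaInspection and buildAndInspect names and the file path are illustrative only.

import org.apache.drill.exec.server.options.OptionManager;
import org.apache.drill.exec.store.parquet.columnreaders.ParquetSchema;

class ParquetSchemaInspection {
  // Exception handling is omitted for brevity.
  static void buildAndInspect(OptionManager optionManager) throws Exception {
    ParquetSchema schema = ParquetSchemaExample.schemaFor(
        optionManager, "/data/orders.parquet");   // hypothetical file

    // Combine the footer schema with the columns selected in the query.
    schema.buildSchema();

    // Inspect the result.
    boolean star = schema.isStarQuery();
    long rowCount = schema.getGroupRecordCount();
    if (schema.allFieldsFixedLength()) {
      // Total bit width of all fixed-length fields, e.g. for sizing batches.
      int bitWidth = schema.getBitWidthAllFixedFields();
    }
  }
}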
isStarQuery
public boolean isStarQuery()
getBitWidthAllFixedFields
public int getBitWidthAllFixedFields()
allFieldsFixedLength
public boolean allFieldsFixedLength()
getColumnMetadata
getGroupRecordCount
public long getGroupRecordCount()

Return the Parquet file row count.

Returns:
    number of records in the Parquet row group
getRowGroupMetadata
public org.apache.parquet.hadoop.metadata.BlockMetaData getRowGroupMetadata()
createNonExistentColumns
public void createNonExistentColumns(OutputMutator output, List<NullableIntVector> nullFilledVectors) throws SchemaChangeException

Create "dummy" fields for columns which are selected in the SELECT clause, but not present in the Parquet schema. (A usage sketch follows.)

Parameters:
    output - the output container

Throws:
    SchemaChangeException - should not occur
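A minimal usage sketch, assuming the OutputMutator that Drill supplies when a record reader is set up and a ParquetSchema whose schema has already been built; the NullColumnSetup and addMissingColumns names are hypothetical. The list passed in collects the vectors created for the missing columns, presumably so the reader can later fill them with nulls.

import java.util.ArrayList;
import java.util.List;

import org.apache.drill.exec.exception.SchemaChangeException;
import org.apache.drill.exec.physical.impl.OutputMutator;
import org.apache.drill.exec.store.parquet.columnreaders.ParquetSchema;
import org.apache.drill.exec.vector.NullableIntVector;

class NullColumnSetup {
  // Returns the vectors created for columns that were selected in the query
  // but are absent from the Parquet file.
  static List<NullableIntVector> addMissingColumns(ParquetSchema schema,
                                                   OutputMutator output)
      throws SchemaChangeException {
    List<NullableIntVector> nullFilledVectors = new ArrayList<>();
    schema.createNonExistentColumns(output, nullFilledVectors);
    return nullFilledVectors;
  }
}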