java.lang.Object
org.apache.drill.exec.store.parquet.columnreaders.ParquetSchema

public final class ParquetSchema extends Object
Maps the schema of the Parquet file to the schema that Drill and the Parquet record reader use.
  • Constructor Details

    • ParquetSchema

      public ParquetSchema(OptionManager options, int rowGroupIndex, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, Collection<SchemaPath> selectedCols)
      Build the Parquet schema. For a "SELECT *" query, all columns defined in the Parquet file are wanted, and the list of selected columns is null. Otherwise, the query names an explicit list of selected columns, which need not exist in the Parquet file. If a selected column does not exist, the reader returns null for that column. If none of the selected columns exists in the file, the reader returns "mock" records: records containing only null values, one for each row in the Parquet file.
      Parameters:
      options - session options
      rowGroupIndex - row group to read
      footer - Parquet file footer metadata
      selectedCols - columns specified in the SELECT clause, or null if this is a SELECT * query
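      The three projection cases described for this constructor can be illustrated with a minimal sketch. It is not Drill's implementation: the class ProjectionSketch, its Mode enum, and the resolve method are hypothetical names invented for illustration, columns are modeled as plain strings rather than SchemaPath, and matching is by exact string equality, which is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the projection rules described above; not Drill code.
final class ProjectionSketch {

  // STAR: selected list is null; EXPLICIT: at least one selected column exists
  // in the file; MOCK: no selected column exists in the file.
  enum Mode { STAR, EXPLICIT, MOCK }

  final Mode mode;
  final List<String> fromFile;    // columns that would be read from the file
  final List<String> nullFilled;  // selected columns absent from the file

  private ProjectionSketch(Mode mode, List<String> fromFile, List<String> nullFilled) {
    this.mode = mode;
    this.fromFile = fromFile;
    this.nullFilled = nullFilled;
  }

  static ProjectionSketch resolve(List<String> fileCols, Collection<String> selectedCols) {
    if (selectedCols == null) {
      // SELECT *: every column defined in the file is wanted.
      return new ProjectionSketch(Mode.STAR, new ArrayList<>(fileCols), new ArrayList<>());
    }
    Set<String> inFile = new LinkedHashSet<>(fileCols);
    List<String> found = new ArrayList<>();
    List<String> missing = new ArrayList<>();
    for (String col : selectedCols) {
      if (inFile.contains(col)) {
        found.add(col);
      } else {
        missing.add(col);  // would be returned as null for every row
      }
    }
    Mode mode = found.isEmpty() ? Mode.MOCK : Mode.EXPLICIT;
    return new ProjectionSketch(mode, found, missing);
  }

  public static void main(String[] args) {
    List<String> fileCols = Arrays.asList("a", "b");
    ProjectionSketch p = resolve(fileCols, Arrays.asList("b", "x"));
    System.out.println(p.mode + " fromFile=" + p.fromFile + " nullFilled=" + p.nullFilled);
  }
}
```

      In this sketch an empty (non-null) selection also falls into the MOCK case; whether ParquetSchema treats an empty collection that way is not stated in the documentation above.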
  • Method Details

    • buildSchema

      public void buildSchema()
      Build the schema for this read as a combination of the schema specified in the Parquet footer and the list of columns selected in the query.
    • isStarQuery

      public boolean isStarQuery()
    • footer

      public org.apache.parquet.hadoop.metadata.ParquetMetadata footer()
    • getBitWidthAllFixedFields

      public int getBitWidthAllFixedFields()
    • allFieldsFixedLength

      public boolean allFieldsFixedLength()
    • getColumnMetadata

      public List<ParquetColumnMetadata> getColumnMetadata()
    • getGroupRecordCount

      public long getGroupRecordCount()
      Return the row count of the Parquet row group.
      Returns:
      number of records in the Parquet row group
    • getRowGroupMetadata

      public org.apache.parquet.hadoop.metadata.BlockMetaData getRowGroupMetadata()
    • createNonExistentColumns

      public void createNonExistentColumns(OutputMutator output, List<NullableIntVector> nullFilledVectors) throws SchemaChangeException
      Create "dummy" fields for columns which are selected in the SELECT clause, but not present in the Parquet schema.
      Parameters:
      output - the output container
      Throws:
      SchemaChangeException - should not occur
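      The idea of "dummy" fields can be sketched as follows. This is an illustrative model only, not Drill code: NullFillSketch and createMissing are hypothetical names, and each placeholder column is modeled as a List<Integer> of nulls, loosely mirroring the NullableIntVector type in the method signature. Sizing each placeholder to the row count is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of null-filled placeholder columns; not Drill code.
final class NullFillSketch {

  // Returns one all-null column per selected column that is absent from the file.
  static Map<String, List<Integer>> createMissing(
      List<String> selectedCols, List<String> fileCols, int rowCount) {
    Map<String, List<Integer>> placeholders = new LinkedHashMap<>();
    for (String col : selectedCols) {
      if (!fileCols.contains(col)) {
        // Assumption: the placeholder holds one null per row.
        List<Integer> nulls = new ArrayList<>(Collections.nCopies(rowCount, (Integer) null));
        placeholders.put(col, nulls);
      }
    }
    return placeholders;
  }

  public static void main(String[] args) {
    Map<String, List<Integer>> out =
        createMissing(Arrays.asList("a", "x"), Arrays.asList("a", "b"), 3);
    System.out.println(out);
  }
}
```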