Class ParquetRecordReader

java.lang.Object
  org.apache.drill.exec.store.AbstractRecordReader
    org.apache.drill.exec.store.CommonParquetRecordReader
      org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader

All Implemented Interfaces:
  AutoCloseable, RecordReader
Nested Class Summary

Nested classes/interfaces inherited from class org.apache.drill.exec.store.CommonParquetRecordReader:
  CommonParquetRecordReader.Metric

Field Summary

Fields inherited from class org.apache.drill.exec.store.CommonParquetRecordReader:
  footer, fragmentContext, NUM_RECORDS_TO_READ_NOT_SPECIFIED, operatorContext, parquetReaderStats

Fields inherited from class org.apache.drill.exec.store.AbstractRecordReader:
  DEFAULT_TEXT_COLS_TO_READ

Fields inherited from interface org.apache.drill.exec.store.RecordReader:
  ALLOCATOR_INITIAL_RESERVATION, ALLOCATOR_MAX_RESERVATION
Constructor Summary

ParquetRecordReader(FragmentContext fragmentContext, long numRecordsToRead, org.apache.hadoop.fs.Path path, int rowGroupIndex, org.apache.hadoop.fs.FileSystem fs, org.apache.parquet.compression.CompressionCodecFactory codecFactory, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, List<SchemaPath> columns, ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus)

ParquetRecordReader(FragmentContext fragmentContext, org.apache.hadoop.fs.Path path, int rowGroupIndex, long numRecordsToRead, org.apache.hadoop.fs.FileSystem fs, org.apache.parquet.compression.CompressionCodecFactory codecFactory, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, List<SchemaPath> columns, ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus)

ParquetRecordReader(FragmentContext fragmentContext, org.apache.hadoop.fs.Path path, int rowGroupIndex, org.apache.hadoop.fs.FileSystem fs, org.apache.parquet.compression.CompressionCodecFactory codecFactory, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, List<SchemaPath> columns, ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus)
Method Summary

void allocate(Map<String, ValueVector> vectorMap)
void close()
org.apache.parquet.compression.CompressionCodecFactory getCodecFactory()
getDateCorruptionStatus()
    Flag indicating if the old non-standard data format appears in this file; see DRILL-4203.
protected List<SchemaPath> getDefaultColumnsToRead()
org.apache.hadoop.fs.FileSystem getFileSystem()
org.apache.hadoop.fs.Path getHadoopPath()
int getRowGroupIndex()
int next()
    Read the next record batch from the file using the reader and read state created previously.
void setup(OperatorContext operatorContext, OutputMutator output)
    Prepare the Parquet reader.
String toString()
boolean useBulkReader()

Methods inherited from class org.apache.drill.exec.store.CommonParquetRecordReader:
  closeStats, handleAndRaise, initNumRecordsToRead, updateRowGroupsStats

Methods inherited from class org.apache.drill.exec.store.AbstractRecordReader:
  getColumns, hasNext, isSkipQuery, isStarQuery, setColumns, transformColumns
Constructor Details

ParquetRecordReader

public ParquetRecordReader(FragmentContext fragmentContext,
                           org.apache.hadoop.fs.Path path,
                           int rowGroupIndex,
                           long numRecordsToRead,
                           org.apache.hadoop.fs.FileSystem fs,
                           org.apache.parquet.compression.CompressionCodecFactory codecFactory,
                           org.apache.parquet.hadoop.metadata.ParquetMetadata footer,
                           List<SchemaPath> columns,
                           ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus)

ParquetRecordReader

public ParquetRecordReader(FragmentContext fragmentContext,
                           org.apache.hadoop.fs.Path path,
                           int rowGroupIndex,
                           org.apache.hadoop.fs.FileSystem fs,
                           org.apache.parquet.compression.CompressionCodecFactory codecFactory,
                           org.apache.parquet.hadoop.metadata.ParquetMetadata footer,
                           List<SchemaPath> columns,
                           ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus)

ParquetRecordReader

public ParquetRecordReader(FragmentContext fragmentContext,
                           long numRecordsToRead,
                           org.apache.hadoop.fs.Path path,
                           int rowGroupIndex,
                           org.apache.hadoop.fs.FileSystem fs,
                           org.apache.parquet.compression.CompressionCodecFactory codecFactory,
                           org.apache.parquet.hadoop.metadata.ParquetMetadata footer,
                           List<SchemaPath> columns,
                           ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus)
Method Details

getDateCorruptionStatus

Flag indicating if the old non-standard data format appears in this file; see DRILL-4203.

Returns:
  true if the dates are corrupted and need to be corrected
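For context, DRILL-4203 reported that affected writers stored each date shifted by twice the Julian day number of the Unix epoch. A minimal sketch of undoing that shift, assuming that offset (the class, constant names, and helper below are illustrative, not Drill's actual API):

```java
// Illustrative sketch of the DRILL-4203 date corruption fix.
// Assumption: corrupt writers added 2 * 2440588 (twice the Julian day
// number of 1970-01-01) to each date stored as days since the Unix epoch.
public class CorruptDateSketch {
    static final int JULIAN_DAY_OF_EPOCH = 2440588;   // Julian day of 1970-01-01
    static final int CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_OF_EPOCH; // 4881176

    // Undo the corruption for one date value (days since the Unix epoch).
    static int correctCorruptDate(int corruptDays) {
        return corruptDays - CORRUPT_DATE_SHIFT;
    }

    public static void main(String[] args) {
        // A corrupt file stores 1970-01-01 as 4881176 days.
        System.out.println(correctCorruptDate(4881176)); // prints 0
    }
}
```

A reader that sees the corruption status flagged would apply such a shift to every value in a date column before handing the batch downstream.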
getCodecFactory

public org.apache.parquet.compression.CompressionCodecFactory getCodecFactory()

getHadoopPath

public org.apache.hadoop.fs.Path getHadoopPath()

getFileSystem

public org.apache.hadoop.fs.FileSystem getFileSystem()

getRowGroupIndex

public int getRowGroupIndex()

getBatchSizesMgr

getOperatorContext

getFragmentContext

useBulkReader

public boolean useBulkReader()

Returns:
  true if Parquet reader bulk processing is enabled; false otherwise

getReadState
setup

public void setup(OperatorContext operatorContext, OutputMutator output)
            throws ExecutionSetupException

Prepare the Parquet reader. First, determine the set of columns to read (the schema for this read). Then create a state object to track the read across calls to the reader's next() method. Finally, create one of three readers to read batches, depending on whether this scan covers only fixed-width fields, contains at least one variable-width field, or is a "mock" scan consisting only of null fields (fields in the SELECT clause but not in the Parquet file).

Parameters:
  operatorContext - operator context for the reader
  output - the place where output for a particular scan should be written. The record reader is responsible for mutating the set of schema values for that particular record.

Throws:
  ExecutionSetupException
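The three-way reader selection described for setup() can be sketched as follows; the class, method, and return labels here are illustrative stand-ins, not Drill's internal reader classes:

```java
// Hypothetical sketch of setup()'s three-way reader choice.
public class ReaderChoiceSketch {
    // allColumnsMissing: every SELECT column is absent from the Parquet file.
    // anyVariableWidth: at least one projected column is variable-width.
    static String chooseReader(boolean allColumnsMissing, boolean anyVariableWidth) {
        if (allColumnsMissing) {
            return "mock";     // produce null-filled batches only
        }
        if (anyVariableWidth) {
            return "variable"; // at least one variable-width field
        }
        return "fixed";        // fixed-width fields only
    }

    public static void main(String[] args) {
        System.out.println(chooseReader(false, true)); // prints "variable"
    }
}
```

The fixed-width-only case allows the reader to compute batch sizes up front, which is presumably why it is split out from the variable-width path.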
allocate

public void allocate(Map<String, ValueVector> vectorMap)
            throws OutOfMemoryException

Specified by:
  allocate in interface RecordReader
Overrides:
  allocate in class AbstractRecordReader
Throws:
  OutOfMemoryException
next

public int next()

Read the next record batch from the file using the reader and read state created previously.

Returns:
  the number of additional records added to the output
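Since next() returns the number of records added per batch, a caller drains the reader by looping until it returns 0. A self-contained sketch of that loop, using an illustrative stand-in interface rather than Drill's RecordReader:

```java
// Minimal stand-in for the batch-read contract: next() returns the number of
// records added to the output, and 0 signals end of data. BatchReader is an
// illustrative interface, not Drill's.
interface BatchReader {
    int next();
}

public class ReadLoopSketch {
    // Drain a reader batch by batch, returning the total record count.
    static int readAll(BatchReader reader) {
        int total = 0;
        int batch;
        while ((batch = reader.next()) > 0) {
            total += batch; // a real consumer would process the output vectors here
        }
        return total;
    }

    public static void main(String[] args) {
        // Fake reader producing batches of 3 and 2 records, then end-of-data.
        java.util.ArrayDeque<Integer> batches =
            new java.util.ArrayDeque<>(java.util.List.of(3, 2, 0));
        System.out.println(readAll(batches::poll)); // prints 5
    }
}
```

In Drill itself the loop is driven by the scan operator, which calls setup() once, next() until exhaustion, and then close().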
close

public void close()

getDefaultColumnsToRead

Overrides:
  getDefaultColumnsToRead in class AbstractRecordReader

toString

Overrides:
  toString in class AbstractRecordReader