Package org.apache.parquet.hadoop
Class ParquetFileWriter
java.lang.Object
org.apache.parquet.hadoop.ParquetFileWriter
Internal implementation of the Parquet file writer as a block container
Note: this is a temporary Drill-Parquet class needed to write empty Parquet files. Details in PARQUET-2026 and DRILL-7907
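A minimal usage sketch for the empty-file case this class exists for (the class name, output path, schema string and the literal size/truncation values below are illustrative assumptions, not defaults of this API):

// Sketch: write an empty Parquet file (footer only, no row groups).
import java.io.IOException;
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.hadoop.util.HadoopOutputFile;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class EmptyParquetFileExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Illustrative schema; any MessageType works for an empty file.
    MessageType schema = MessageTypeParser.parseMessageType(
        "message example { required int32 id; }");

    ParquetFileWriter writer = new ParquetFileWriter(
        HadoopOutputFile.fromPath(new Path("/tmp/empty.parquet"), conf),
        schema,
        ParquetFileWriter.Mode.CREATE,
        128L * 1024 * 1024,  // rowGroupSize, illustrative
        8 * 1024 * 1024,     // maxPaddingSize, illustrative
        64,                  // columnIndexTruncateLength, illustrative
        Integer.MAX_VALUE,   // statisticsTruncateLength, illustrative
        true);               // pageWriteChecksumEnabled

    writer.start();                      // writes the leading magic bytes
    writer.end(Collections.emptyMap());  // writes the footer; zero row groups
  }
}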
-
Nested Class Summary
-
Field Summary
-
Constructor Summary
- ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, org.apache.parquet.schema.MessageType schema, org.apache.hadoop.fs.Path file)
  Deprecated. Will be removed in 2.0.0.
- ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, org.apache.parquet.schema.MessageType schema, org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode)
  Deprecated. Will be removed in 2.0.0.
- ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, org.apache.parquet.schema.MessageType schema, org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize)
  Deprecated. Will be removed in 2.0.0.
- ParquetFileWriter(org.apache.parquet.io.OutputFile file, org.apache.parquet.schema.MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize)
  Deprecated. Will be removed in 2.0.0.
- ParquetFileWriter(org.apache.parquet.io.OutputFile file, org.apache.parquet.schema.MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize, int columnIndexTruncateLength, int statisticsTruncateLength, boolean pageWriteChecksumEnabled)
- ParquetFileWriter(org.apache.parquet.io.OutputFile file, org.apache.parquet.schema.MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize, int columnIndexTruncateLength, int statisticsTruncateLength, boolean pageWriteChecksumEnabled, org.apache.parquet.crypto.FileEncryptionProperties encryptionProperties)
-
Method Summary
- void appendColumnChunk(org.apache.parquet.column.ColumnDescriptor descriptor, org.apache.parquet.io.SeekableInputStream from, org.apache.parquet.hadoop.metadata.ColumnChunkMetaData chunk, org.apache.parquet.column.values.bloomfilter.BloomFilter bloomFilter, org.apache.parquet.internal.column.columnindex.ColumnIndex columnIndex, org.apache.parquet.internal.column.columnindex.OffsetIndex offsetIndex)
- void appendFile(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path file)
  Deprecated. Will be removed in 2.0.0; use appendFile(InputFile) instead.
- void appendFile(org.apache.parquet.io.InputFile file)
- void appendRowGroup(org.apache.hadoop.fs.FSDataInputStream from, org.apache.parquet.hadoop.metadata.BlockMetaData rowGroup, boolean dropColumns)
  Deprecated. Will be removed in 2.0.0; use appendRowGroup(SeekableInputStream, BlockMetaData, boolean) instead.
- void appendRowGroup(org.apache.parquet.io.SeekableInputStream from, org.apache.parquet.hadoop.metadata.BlockMetaData rowGroup, boolean dropColumns)
- void appendRowGroups(org.apache.hadoop.fs.FSDataInputStream file, List<org.apache.parquet.hadoop.metadata.BlockMetaData> rowGroups, boolean dropColumns)
  Deprecated. Will be removed in 2.0.0; use appendRowGroups(SeekableInputStream, List, boolean) instead.
- void appendRowGroups(org.apache.parquet.io.SeekableInputStream file, List<org.apache.parquet.hadoop.metadata.BlockMetaData> rowGroups, boolean dropColumns)
- void end(Map<String,String> extraMetaData)
  Ends a file once all blocks have been written.
- void endBlock()
  Ends a block once all column chunks have been written.
- void endColumn()
  Ends a column (once all rep, def and data have been written).
- long getNextRowGroupSize()
- long getPos()
- static org.apache.parquet.hadoop.metadata.ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.conf.Configuration conf)
  Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
- static org.apache.parquet.hadoop.metadata.ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.conf.Configuration conf, org.apache.parquet.hadoop.metadata.KeyValueMetadataMergeStrategy keyValueMetadataMergeStrategy)
  Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
- void start()
  Starts the file.
- void startBlock(long recordCount)
  Starts a block.
- void startColumn(org.apache.parquet.column.ColumnDescriptor descriptor, long valueCount, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName)
  Starts a column inside a block.
- void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, org.apache.parquet.column.Encoding rlEncoding, org.apache.parquet.column.Encoding dlEncoding, org.apache.parquet.column.Encoding valuesEncoding)
  Deprecated.
- void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, org.apache.parquet.column.statistics.Statistics statistics, long rowCount, org.apache.parquet.column.Encoding rlEncoding, org.apache.parquet.column.Encoding dlEncoding, org.apache.parquet.column.Encoding valuesEncoding)
  Writes a single page.
- void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, org.apache.parquet.column.statistics.Statistics statistics, org.apache.parquet.column.Encoding rlEncoding, org.apache.parquet.column.Encoding dlEncoding, org.apache.parquet.column.Encoding valuesEncoding)
  Deprecated. This method does not support writing column indexes; use writeDataPage(int, int, BytesInput, Statistics, long, Encoding, Encoding, Encoding) instead.
- void writeDataPageV2(int rowCount, int nullCount, int valueCount, org.apache.parquet.bytes.BytesInput repetitionLevels, org.apache.parquet.bytes.BytesInput definitionLevels, org.apache.parquet.column.Encoding dataEncoding, org.apache.parquet.bytes.BytesInput compressedData, int uncompressedDataSize, org.apache.parquet.column.statistics.Statistics<?> statistics)
  Writes a single v2 data page.
- void writeDictionaryPage(org.apache.parquet.column.page.DictionaryPage dictionaryPage)
  Writes a dictionary page.
- void writeDictionaryPage(org.apache.parquet.column.page.DictionaryPage dictionaryPage, org.apache.parquet.format.BlockCipher.Encryptor headerBlockEncryptor, byte[] AAD)
- static void writeMergedMetadataFile(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.conf.Configuration conf)
  Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
- static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path outputPath, List<org.apache.parquet.hadoop.Footer> footers)
  Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
- static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path outputPath, List<org.apache.parquet.hadoop.Footer> footers, org.apache.parquet.hadoop.ParquetOutputFormat.JobSummaryLevel level)
  Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
-
Field Details
-
PARQUET_METADATA_FILE
public static final String PARQUET_METADATA_FILE
-
MAGIC_STR
public static final String MAGIC_STR
-
MAGIC
public static final byte[] MAGIC
-
EF_MAGIC_STR
public static final String EF_MAGIC_STR
-
EFMAGIC
public static final byte[] EFMAGIC
-
PARQUET_COMMON_METADATA_FILE
public static final String PARQUET_COMMON_METADATA_FILE
-
CURRENT_VERSION
public static final int CURRENT_VERSION
-
out
protected final org.apache.parquet.io.PositionOutputStream out
-
-
Constructor Details
-
ParquetFileWriter
@Deprecated
public ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, org.apache.parquet.schema.MessageType schema, org.apache.hadoop.fs.Path file) throws IOException
Deprecated. Will be removed in 2.0.0.
- Parameters:
configuration - Hadoop configuration
schema - the schema of the data
file - the file to write to
- Throws:
IOException - if the file can not be created
-
ParquetFileWriter
@Deprecated
public ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, org.apache.parquet.schema.MessageType schema, org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode) throws IOException
Deprecated. Will be removed in 2.0.0.
- Parameters:
configuration - Hadoop configuration
schema - the schema of the data
file - the file to write to
mode - file creation mode
- Throws:
IOException - if the file can not be created
-
ParquetFileWriter
@Deprecated
public ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, org.apache.parquet.schema.MessageType schema, org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize) throws IOException
Deprecated. Will be removed in 2.0.0.
- Parameters:
configuration - Hadoop configuration
schema - the schema of the data
file - the file to write to
mode - file creation mode
rowGroupSize - the row group size
maxPaddingSize - the maximum padding
- Throws:
IOException - if the file can not be created
-
ParquetFileWriter
@Deprecated
public ParquetFileWriter(org.apache.parquet.io.OutputFile file, org.apache.parquet.schema.MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize) throws IOException
Deprecated. Will be removed in 2.0.0.
- Parameters:
file - OutputFile to create or overwrite
schema - the schema of the data
mode - file creation mode
rowGroupSize - the row group size
maxPaddingSize - the maximum padding
- Throws:
IOException - if the file can not be created
-
ParquetFileWriter
public ParquetFileWriter(org.apache.parquet.io.OutputFile file, org.apache.parquet.schema.MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize, int columnIndexTruncateLength, int statisticsTruncateLength, boolean pageWriteChecksumEnabled) throws IOException
- Parameters:
file - OutputFile to create or overwrite
schema - the schema of the data
mode - file creation mode
rowGroupSize - the row group size
maxPaddingSize - the maximum padding
columnIndexTruncateLength - the length to which min/max values in column indexes are truncated
statisticsTruncateLength - the length to which min/max values in row group statistics are truncated
pageWriteChecksumEnabled - whether to write out page level checksums
- Throws:
IOException - if the file can not be created
-
ParquetFileWriter
public ParquetFileWriter(org.apache.parquet.io.OutputFile file, org.apache.parquet.schema.MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize, int columnIndexTruncateLength, int statisticsTruncateLength, boolean pageWriteChecksumEnabled, org.apache.parquet.crypto.FileEncryptionProperties encryptionProperties) throws IOException
- Throws:
IOException
-
-
Method Details
-
start
public void start() throws IOException
start the file
- Throws:
IOException - if there is an error while writing
-
startBlock
public void startBlock(long recordCount) throws IOException
start a block
- Parameters:
recordCount - the record count in this block
- Throws:
IOException - if there is an error while writing
-
startColumn
public void startColumn(org.apache.parquet.column.ColumnDescriptor descriptor, long valueCount, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName) throws IOException
start a column inside a block
- Parameters:
descriptor - the column descriptor
valueCount - the value count in this column
compressionCodecName - a compression codec name
- Throws:
IOException - if there is an error while writing
-
writeDictionaryPage
public void writeDictionaryPage(org.apache.parquet.column.page.DictionaryPage dictionaryPage) throws IOException
writes a dictionary page
- Parameters:
dictionaryPage - the dictionary page
- Throws:
IOException - if there is an error while writing
-
writeDictionaryPage
public void writeDictionaryPage(org.apache.parquet.column.page.DictionaryPage dictionaryPage, org.apache.parquet.format.BlockCipher.Encryptor headerBlockEncryptor, byte[] AAD) throws IOException
- Throws:
IOException
-
writeDataPage
@Deprecated
public void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, org.apache.parquet.column.Encoding rlEncoding, org.apache.parquet.column.Encoding dlEncoding, org.apache.parquet.column.Encoding valuesEncoding) throws IOException
Deprecated.
writes a single page
- Parameters:
valueCount - count of values
uncompressedPageSize - the size of the data once uncompressed
bytes - the compressed data for the page without header
rlEncoding - encoding of the repetition level
dlEncoding - encoding of the definition level
valuesEncoding - encoding of values
- Throws:
IOException - if there is an error while writing
-
writeDataPage
@Deprecated
public void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, org.apache.parquet.column.statistics.Statistics statistics, org.apache.parquet.column.Encoding rlEncoding, org.apache.parquet.column.Encoding dlEncoding, org.apache.parquet.column.Encoding valuesEncoding) throws IOException
Deprecated. This method does not support writing column indexes; use writeDataPage(int, int, BytesInput, Statistics, long, Encoding, Encoding, Encoding) instead.
writes a single page
- Parameters:
valueCount - count of values
uncompressedPageSize - the size of the data once uncompressed
bytes - the compressed data for the page without header
statistics - statistics for the page
rlEncoding - encoding of the repetition level
dlEncoding - encoding of the definition level
valuesEncoding - encoding of values
- Throws:
IOException - if there is an error while writing
-
writeDataPage
public void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, org.apache.parquet.column.statistics.Statistics statistics, long rowCount, org.apache.parquet.column.Encoding rlEncoding, org.apache.parquet.column.Encoding dlEncoding, org.apache.parquet.column.Encoding valuesEncoding) throws IOException
Writes a single page
- Parameters:
valueCount - count of values
uncompressedPageSize - the size of the data once uncompressed
bytes - the compressed data for the page without header
statistics - the statistics of the page
rowCount - the number of rows in the page
rlEncoding - encoding of the repetition level
dlEncoding - encoding of the definition level
valuesEncoding - encoding of values
- Throws:
IOException - if any I/O error occurs during writing the file
-
writeDataPageV2
public void writeDataPageV2(int rowCount, int nullCount, int valueCount, org.apache.parquet.bytes.BytesInput repetitionLevels, org.apache.parquet.bytes.BytesInput definitionLevels, org.apache.parquet.column.Encoding dataEncoding, org.apache.parquet.bytes.BytesInput compressedData, int uncompressedDataSize, org.apache.parquet.column.statistics.Statistics<?> statistics) throws IOException
Writes a single v2 data page
- Parameters:
rowCount - count of rows
nullCount - count of nulls
valueCount - count of values
repetitionLevels - repetition level bytes
definitionLevels - definition level bytes
dataEncoding - encoding for data
compressedData - compressed data bytes
uncompressedDataSize - the size of uncompressed data
statistics - the statistics of the page
- Throws:
IOException - if any I/O error occurs during writing the file
-
endColumn
public void endColumn() throws IOException
end a column (once all rep, def and data have been written)
- Throws:
IOException - if there is an error while writing
-
endBlock
public void endBlock() throws IOException
ends a block once all column chunks have been written
- Throws:
IOException - if there is an error while writing
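Taken together, the block-level methods above are called in a fixed order: startBlock, then for each column startColumn / write*Page / endColumn, then endBlock. The sketch below is illustrative only: it writes one row group with a single required int32 column, PLAIN-encoded and uncompressed, hand-rolling the little-endian page payload that would normally come from the column write path. The class and method names are examples, not part of this API.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import org.apache.parquet.bytes.BytesInput;
import org.apache.parquet.column.ColumnDescriptor;
import org.apache.parquet.column.Encoding;
import org.apache.parquet.column.statistics.Statistics;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import org.apache.parquet.schema.MessageType;

public class RowGroupSketch {
  // `writer` must already be started and built for "message example { required int32 id; }".
  static void writeSingleRowGroup(ParquetFileWriter writer, MessageType schema, int[] values)
      throws IOException {
    ColumnDescriptor column = schema.getColumns().get(0);

    // PLAIN-encoded int32 values are 4-byte little-endian integers; a required
    // top-level column carries no repetition/definition level bytes.
    ByteBuffer buf = ByteBuffer.allocate(4 * values.length).order(ByteOrder.LITTLE_ENDIAN);
    Statistics<?> stats = Statistics.createStats(column.getPrimitiveType());
    for (int v : values) {
      buf.putInt(v);
      stats.updateStats(v);
    }

    writer.startBlock(values.length);   // one row per value in this flat schema
    writer.startColumn(column, values.length, CompressionCodecName.UNCOMPRESSED);
    writer.writeDataPage(
        values.length,                  // valueCount
        buf.position(),                 // uncompressedPageSize (no codec, so same as written size)
        BytesInput.from(buf.array()),   // page body: the PLAIN values only
        stats,
        values.length,                  // rowCount
        Encoding.BIT_PACKED,            // rlEncoding (zero level bytes for a required column)
        Encoding.BIT_PACKED,            // dlEncoding
        Encoding.PLAIN);                // valuesEncoding
    writer.endColumn();
    writer.endBlock();
  }
}

In normal operation the page bytes are produced by the column write path (for example via ParquetWriter); ParquetFileWriter itself only frames pages, column chunks and row groups and tracks their offsets for the footer.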
-
appendFile
@Deprecated
public void appendFile(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path file) throws IOException
Deprecated. Will be removed in 2.0.0; use appendFile(InputFile) instead.
- Parameters:
conf - a configuration
file - a file path to append the contents of to this file
- Throws:
IOException - if there is an error while reading or writing
-
appendFile
public void appendFile(org.apache.parquet.io.InputFile file) throws IOException
- Throws:
IOException
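As a usage sketch (assumed, not taken from this Javadoc), appendFile copies every row group of an existing Parquet file into the file being written; the source file's columns must match this writer's schema:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class AppendSketch {
  // `writer` must already be started and share the source file's schema.
  static void appendExisting(ParquetFileWriter writer, Path source, Configuration conf)
      throws IOException {
    // Copies the source row groups as raw bytes, without decoding pages.
    writer.appendFile(HadoopInputFile.fromPath(source, conf));
  }
}

appendRowGroup and appendRowGroups offer the same copy at row-group granularity.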
-
appendRowGroups
@Deprecated
public void appendRowGroups(org.apache.hadoop.fs.FSDataInputStream file, List<org.apache.parquet.hadoop.metadata.BlockMetaData> rowGroups, boolean dropColumns) throws IOException
Deprecated. Will be removed in 2.0.0; use appendRowGroups(SeekableInputStream, List, boolean) instead.
- Parameters:
file - a file stream to read from
rowGroups - row groups to copy
dropColumns - whether to drop columns from the file that are not in this file's schema
- Throws:
IOException - if there is an error while reading or writing
-
appendRowGroups
public void appendRowGroups(org.apache.parquet.io.SeekableInputStream file, List<org.apache.parquet.hadoop.metadata.BlockMetaData> rowGroups, boolean dropColumns) throws IOException
- Throws:
IOException
-
appendRowGroup
@Deprecated
public void appendRowGroup(org.apache.hadoop.fs.FSDataInputStream from, org.apache.parquet.hadoop.metadata.BlockMetaData rowGroup, boolean dropColumns) throws IOException
Deprecated. Will be removed in 2.0.0; use appendRowGroup(SeekableInputStream, BlockMetaData, boolean) instead.
- Parameters:
from - a file stream to read from
rowGroup - row group to copy
dropColumns - whether to drop columns from the file that are not in this file's schema
- Throws:
IOException - if there is an error while reading or writing
-
appendRowGroup
public void appendRowGroup(org.apache.parquet.io.SeekableInputStream from, org.apache.parquet.hadoop.metadata.BlockMetaData rowGroup, boolean dropColumns) throws IOException
- Throws:
IOException
-
appendColumnChunk
public void appendColumnChunk(org.apache.parquet.column.ColumnDescriptor descriptor, org.apache.parquet.io.SeekableInputStream from, org.apache.parquet.hadoop.metadata.ColumnChunkMetaData chunk, org.apache.parquet.column.values.bloomfilter.BloomFilter bloomFilter, org.apache.parquet.internal.column.columnindex.ColumnIndex columnIndex, org.apache.parquet.internal.column.columnindex.OffsetIndex offsetIndex) throws IOException
- Parameters:
descriptor - the descriptor for the target column
from - a file stream to read from
chunk - the column chunk to be copied
bloomFilter - the bloomFilter for this chunk
columnIndex - the column index for this chunk
offsetIndex - the offset index for this chunk
- Throws:
IOException
-
end
public void end(Map<String,String> extraMetaData) throws IOException
ends a file once all blocks have been written; closes the file
- Parameters:
extraMetaData - the extra meta data to write in the footer
- Throws:
IOException - if there is an error while writing
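For example (an illustrative fragment; `writer` is an open ParquetFileWriter with all blocks already ended, and the key/value pair is made up), application-specific metadata can be attached to the footer when closing:

// Close the file and record a custom key/value entry in the footer.
java.util.Map<String, String> extraMetaData =
    java.util.Collections.singletonMap("example.note", "written by ParquetFileWriter sketch");
writer.end(extraMetaData);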
-
mergeMetadataFiles
@Deprecated
public static org.apache.parquet.hadoop.metadata.ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.conf.Configuration conf) throws IOException
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
Given a list of metadata files, merge them into a single ParquetMetadata. Requires that the schemas be compatible, and the extraMetadata be exactly equal.
- Parameters:
files - a list of files to merge metadata from
conf - a configuration
- Returns:
- merged parquet metadata for the files
- Throws:
IOException - if there is an error while writing
-
mergeMetadataFiles
@Deprecated
public static org.apache.parquet.hadoop.metadata.ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.conf.Configuration conf, org.apache.parquet.hadoop.metadata.KeyValueMetadataMergeStrategy keyValueMetadataMergeStrategy) throws IOException
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
Given a list of metadata files, merge them into a single ParquetMetadata. Requires that the schemas be compatible, and the extraMetadata be exactly equal.
- Parameters:
files - a list of files to merge metadata from
conf - a configuration
keyValueMetadataMergeStrategy - strategy to merge values for same key, if there are multiple
- Returns:
- merged parquet metadata for the files
- Throws:
IOException - if there is an error while writing
-
writeMergedMetadataFile
@Deprecated
public static void writeMergedMetadataFile(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.conf.Configuration conf) throws IOException
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
Given a list of metadata files, merge them into a single metadata file. Requires that the schemas be compatible, and the extraMetaData be exactly equal. This is useful when merging 2 directories of parquet files into a single directory, as long as both directories were written with compatible schemas and equal extraMetaData.
- Parameters:
files - a list of files to merge metadata from
outputPath - path to write merged metadata to
conf - a configuration
- Throws:
IOException - if there is an error while reading or writing
-
writeMetadataFile
@Deprecated
public static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path outputPath, List<org.apache.parquet.hadoop.Footer> footers) throws IOException
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
writes a _metadata and _common_metadata file
- Parameters:
configuration - the configuration to use to get the FileSystem
outputPath - the directory to write the _metadata file to
footers - the list of footers to merge
- Throws:
IOException - if there is an error while writing
-
writeMetadataFile
@Deprecated
public static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path outputPath, List<org.apache.parquet.hadoop.Footer> footers, org.apache.parquet.hadoop.ParquetOutputFormat.JobSummaryLevel level) throws IOException
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
writes a _common_metadata file, and optionally a _metadata file, depending on the ParquetOutputFormat.JobSummaryLevel provided
- Parameters:
configuration - the configuration to use to get the FileSystem
outputPath - the directory to write the _metadata file to
footers - the list of footers to merge
level - level of summary to write
- Throws:
IOException - if there is an error while writing
-
getPos
public long getPos() throws IOException
- Returns:
- the current position in the underlying file
- Throws:
IOException - if there is an error while getting the current stream's position
-
getNextRowGroupSize
public long getNextRowGroupSize() throws IOException
- Throws:
IOException
-