Class EasyFormatPlugin<T extends FormatPluginConfig>
java.lang.Object
org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin<T>
- Type Parameters:
T - the format plugin config for this reader
- All Implemented Interfaces:
FormatPlugin
- Direct Known Subclasses:
AvroFormatPlugin, BasePcapFormatPlugin, ExcelFormatPlugin, HDF5FormatPlugin, HttpdLogFormatPlugin, ImageFormatPlugin, JSONFormatPlugin, LogFormatPlugin, LTSVFormatPlugin, MSAccessFormatPlugin, PdfFormatPlugin, SasFormatPlugin, SequenceFileFormatPlugin, ShpFormatPlugin, SpssFormatPlugin, SyslogFormatPlugin, TextFormatPlugin, XMLFormatPlugin
public abstract class EasyFormatPlugin<T extends FormatPluginConfig>
extends Object
implements FormatPlugin
Base class for file readers.
Provides a bridge between the legacy RecordReader-style readers and the newer ManagedReader style. Over time, this class should be split, or a cleaner way provided to handle the differences between the two styles.
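For orientation, the following is a minimal sketch of a read-only subclass using the legacy constructor documented below. The names ExampleFormatPlugin and ExampleFormatConfig are hypothetical placeholders, not part of Drill; real plugins (for example, LogFormatPlugin) follow the same overall shape.
```java
import java.util.Collections;
import org.apache.drill.common.logical.StoragePluginConfig;
import org.apache.drill.exec.server.DrillbitContext;
import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
import org.apache.hadoop.conf.Configuration;

// Hypothetical plugin: readable, not writable, not block-splittable,
// compressible, matching files with the ".example" extension.
public class ExampleFormatPlugin extends EasyFormatPlugin<ExampleFormatConfig> {

  public ExampleFormatPlugin(String name, DrillbitContext context,
      Configuration fsConf, StoragePluginConfig storageConfig,
      ExampleFormatConfig formatConfig) {
    super(name, context, fsConf, storageConfig, formatConfig,
        true,   // readable
        false,  // writable
        false,  // blockSplittable
        true,   // compressible
        Collections.singletonList("example"),  // extensions
        "example");                            // defaultName
  }

  // Reader wiring goes in configureScan() (EVF v2), frameworkBuilder()
  // (EVF v1), or getRecordReader() (classic ScanBatch); see the method
  // details below.
}
```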
-
Nested Class Summary
static class EasyFormatPlugin.EasyFormatConfig - Defines the static, programmer-defined options for this plugin.
static class EasyFormatPlugin.EasyFormatConfigBuilder
static enum EasyFormatPlugin.ScanFrameworkVersion
-
Field Summary
-
Constructor Summary
protected EasyFormatPlugin(String name, DrillbitContext context, org.apache.hadoop.conf.Configuration fsConf, StoragePluginConfig storageConfig, T formatConfig, boolean readable, boolean writable, boolean blockSplittable, boolean compressible, List<String> extensions, String defaultName) - Legacy constructor.
protected EasyFormatPlugin(String name, EasyFormatPlugin.EasyFormatConfig config, DrillbitContext context, StoragePluginConfig storageConfig, T formatConfig) - Revised constructor in which settings are gathered into a configuration object.
-
Method Summary
protected void configureScan(FileScanLifecycleBuilder builder, EasySubScan scan) - Configure an EVF (v2) scan, which must at least include the factory to create readers.
protected FileScanFramework.FileScanBuilder frameworkBuilder(EasySubScan scan, OptionSet options) - Create the plugin-specific framework that manages the scan.
org.apache.hadoop.conf.Configuration getFsConf()
AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns)
AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns, MetadataProviderManager metadataProviderManager)
String getName()
protected CloseableRecordBatch getReaderBatch(FragmentContext context, EasySubScan scan)
RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork, List<SchemaPath> columns, String userName) - Return a record reader for the specific file format, when using the original ScanBatch scanner.
RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer)
protected ScanStats getScanStats(PlannerSettings settings, EasyGroupScan scan)
StatisticsRecordWriter getStatisticsRecordWriter(FragmentContext context, EasyWriter writer)
AbstractWriter getWriter(PhysicalOperator child, String location, List<String> partitionColumns)
CloseableRecordBatch getWriterBatch(FragmentContext context, RecordBatch incoming, EasyWriter writer)
protected void initScanBuilder(FileScanFramework.FileScanBuilder builder, EasySubScan scan) - Initialize the scan framework builder with standard options.
boolean isBlockSplittable() - Whether the format can be split into blocks within file boundaries.
boolean isCompressible() - Indicates whether this format could also be in a compression container (for example: csv.gz versus csv).
boolean isStatisticsRecordWriter(FragmentContext context, EasyWriter writer)
ManagedReader<? extends FileScanFramework.FileSchemaNegotiator> newBatchReader(EasySubScan scan, OptionSet options) - For EVF V1, to be removed.
DrillStatsTable.TableStatistics readStatistics(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath)
protected EasyFormatPlugin.ScanFrameworkVersion scanVersion(OptionSet options) - Choose whether to use the enhanced scan based on the row set and scan framework, or the "traditional" ad-hoc structure based on ScanBatch.
boolean supportsAutoPartitioning() - Indicates whether this FormatPlugin supports auto-partitioning for CTAS statements.
boolean supportsFileImplicitColumns() - Whether this format plugin supports implicit file columns.
boolean supportsLimitPushdown() - Does this plugin support pushing the limit down to the batch reader?
boolean supportsPushDown() - Does this plugin support projection push down?
boolean supportsRead()
boolean supportsStatistics()
boolean supportsWrite()
void writeStatistics(DrillStatsTable.TableStatistics statistics, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.drill.exec.store.dfs.FormatPlugin
getGroupScan, getGroupScan, getOptimizerRules
-
Field Details
-
formatConfig
-
-
Constructor Details
-
EasyFormatPlugin
protected EasyFormatPlugin(String name, DrillbitContext context, org.apache.hadoop.conf.Configuration fsConf, StoragePluginConfig storageConfig, T formatConfig, boolean readable, boolean writable, boolean blockSplittable, boolean compressible, List<String> extensions, String defaultName)
Legacy constructor.
-
EasyFormatPlugin
protected EasyFormatPlugin(String name, EasyFormatPlugin.EasyFormatConfig config, DrillbitContext context, StoragePluginConfig storageConfig, T formatConfig)
Revised constructor in which settings are gathered into a configuration object.
- Parameters:
name - name of the plugin
config - configuration options for this plugin which determine developer-defined runtime behavior
context - the global server-wide Drillbit context
storageConfig - the configuration for the storage plugin that owns this format plugin
formatConfig - the Jackson-serialized format configuration as created by the user in the Drill web console; holds user-defined options
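A sketch of a subclass using this constructor. The builder calls follow the pattern of Drill's bundled plugins, but the plugin name, config class, and chosen settings here are hypothetical:
```java
public ExampleFormatPlugin(String name, DrillbitContext context,
    Configuration fsConf, StoragePluginConfig storageConfig,
    ExampleFormatConfig formatConfig) {
  super(name, easyConfig(fsConf), context, storageConfig, formatConfig);
}

// Gather the static, programmer-defined settings into one object.
private static EasyFormatConfig easyConfig(Configuration fsConf) {
  return EasyFormatConfig.builder()
      .readable(true)
      .writable(false)
      .blockSplittable(false)
      .compressible(true)
      .supportsProjectPushdown(true)
      .extensions(Collections.singletonList("example"))
      .fsConf(fsConf)
      .defaultName("example")
      .scanVersion(ScanFrameworkVersion.EVF_V2)
      .build();
}
```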
-
-
Method Details
-
getFsConf
public org.apache.hadoop.conf.Configuration getFsConf()
- Specified by:
getFsConf in interface FormatPlugin
-
getContext
- Specified by:
getContext in interface FormatPlugin
-
easyConfig
-
getName
- Specified by:
getName in interface FormatPlugin
-
supportsLimitPushdown
public boolean supportsLimitPushdown()
Does this plugin support pushing the limit down to the batch reader? If so, the reader itself should have logic to stop reading the file as soon as the limit has been reached. This makes the most sense for file formats with a consistent schema that can be identified from the first row, CSV for example. If the user only wants 100 rows, it does not make sense to read the entire file.
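For instance, a plugin whose batch readers stop once the limit is satisfied can advertise support with a trivial override (a sketch; the builder-based alternative is noted in a comment):
```java
@Override
public boolean supportsLimitPushdown() {
  // The batch readers for this format stop reading as soon as the
  // pushed-down limit is reached, so it is safe to return true here.
  return true;
}
// Equivalent, when using the revised constructor:
//   EasyFormatConfig.builder()...supportsLimitPushdown(true)...build()
```
-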
supportsPushDown
public boolean supportsPushDown()
Does this plugin support projection push down? That is, can the reader itself handle the tasks of projecting table columns, creating null columns for missing table columns, and so on?
- Returns:
true if the plugin supports projection push-down, false if Drill should do the task by adding a project operator
-
supportsFileImplicitColumns
public boolean supportsFileImplicitColumns()
Whether this format plugin supports implicit file columns.
- Returns:
true if the plugin supports implicit file columns, false otherwise
-
isBlockSplittable
public boolean isBlockSplittable()
Whether the format can be split into blocks within file boundaries for parallel reads. If not, the simple format engine will only split on file boundaries.
- Returns:
true if splittable.
-
isCompressible
public boolean isCompressible()
Indicates whether this format could also appear in a compression container (for example: csv.gz versus csv). If the format uses its own internal compression scheme, as Parquet does, this should return false.
- Returns:
true if the format is compressible
-
getRecordReader
public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork, List<SchemaPath> columns, String userName) throws ExecutionSetupException
Return a record reader for the specific file format, when using the original ScanBatch scanner.
- Parameters:
context - fragment context
dfs - Drill file system
fileWork - metadata about the file to be scanned
columns - list of projected columns (or may just contain the wildcard)
userName - the name of the user running the query
- Returns:
a record reader for this format
- Throws:
ExecutionSetupException - for many reasons
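A sketch of an override for a plugin still on the classic ScanBatch path; ExampleRecordReader stands in for a hypothetical plugin-specific RecordReader implementation:
```java
@Override
public RecordReader getRecordReader(FragmentContext context,
    DrillFileSystem dfs, FileWork fileWork, List<SchemaPath> columns,
    String userName) throws ExecutionSetupException {
  // One reader is created for each file (or block) in the scan.
  return new ExampleRecordReader(context, dfs, fileWork, columns, userName);
}
```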
-
getReaderBatch
protected CloseableRecordBatch getReaderBatch(FragmentContext context, EasySubScan scan) throws ExecutionSetupException
- Throws:
ExecutionSetupException
-
scanVersion
protected EasyFormatPlugin.ScanFrameworkVersion scanVersion(OptionSet options)
Choose whether to use the enhanced scan based on the row set and scan framework, or the "traditional" ad-hoc structure based on ScanBatch. Normally set as a config option. Override this method if you want to make the choice based on a system/session option.
- Returns:
the scan framework version to use: the enhanced row-set-based framework or the traditional ScanBatch-based framework
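A sketch of such an override, keyed off a session/system option; the option name is hypothetical, but bundled plugins such as the JSON plugin gate their EVF v2 reader in a similar way:
```java
@Override
protected ScanFrameworkVersion scanVersion(OptionSet options) {
  // "exec.example.enable_v2_reader" is a hypothetical option name.
  return options.getBoolean("exec.example.enable_v2_reader")
      ? ScanFrameworkVersion.EVF_V2
      : ScanFrameworkVersion.EVF_V1;
}
```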
-
initScanBuilder
Initialize the scan framework builder with standard options. Call this from the plugin-specific frameworkBuilder(EasySubScan, OptionSet) method. The plugin can then customize/revise options as needed. For EVF V1, to be removed.
- Parameters:
builder - the scan framework builder you create in the frameworkBuilder(EasySubScan, OptionSet) method
scan - the physical scan operator definition passed to the frameworkBuilder(EasySubScan, OptionSet) method
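A sketch of a typical EVF V1 frameworkBuilder(EasySubScan, OptionSet) implementation that calls this method; ExampleReaderFactory is a hypothetical reader factory:
```java
@Override
protected FileScanFramework.FileScanBuilder frameworkBuilder(
    EasySubScan scan, OptionSet options) throws ExecutionSetupException {
  FileScanFramework.FileScanBuilder builder =
      new FileScanFramework.FileScanBuilder();
  builder.setReaderFactory(new ExampleReaderFactory());  // hypothetical
  // Apply the standard options (projection, user name, scan limits, ...).
  initScanBuilder(builder, scan);
  // Customize afterwards, e.g. the type used for missing columns.
  builder.nullType(Types.optional(TypeProtos.MinorType.VARCHAR));
  return builder;
}
```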
-
newBatchReader
public ManagedReader<? extends FileScanFramework.FileSchemaNegotiator> newBatchReader(EasySubScan scan, OptionSet options) throws ExecutionSetupException
For EVF V1, to be removed.
- Throws:
ExecutionSetupException
-
frameworkBuilder
protected FileScanFramework.FileScanBuilder frameworkBuilder(EasySubScan scan, OptionSet options) throws ExecutionSetupException
Create the plugin-specific framework that manages the scan. The framework creates batch readers one by one for each file or block. It defines semantic rules for projection. It handles "early" or "late" schema readers. A typical framework builds on standardized frameworks for files in general or text files in particular. For EVF V1, to be removed.
- Parameters:
scan - the physical operation definition for the scan operation. Contains one or more files to read. (The Easy format plugin works only for files.)
- Returns:
- the scan framework which orchestrates the scan operation across potentially many files
- Throws:
ExecutionSetupException
- for all setup failures
-
configureScan
protected void configureScan(FileScanLifecycleBuilder builder, EasySubScan scan)
Configure an EVF (v2) scan, which must at least include the factory to create readers.
- Parameters:
builder - the builder with default options already set, which allows the plugin implementation to set others
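A sketch of a typical override; ExampleReaderFactory and ExampleBatchReader are hypothetical, but the shape mirrors Drill's bundled EVF v2 plugins:
```java
@Override
protected void configureScan(FileScanLifecycleBuilder builder, EasySubScan scan) {
  builder.nullType(Types.optional(TypeProtos.MinorType.VARCHAR));
  builder.readerFactory(new ExampleReaderFactory());
}

// The factory creates one ManagedReader per file in the scan.
private static class ExampleReaderFactory extends FileReaderFactory {
  @Override
  public ManagedReader newReader(FileSchemaNegotiator negotiator) {
    return new ExampleBatchReader(negotiator);  // hypothetical reader
  }
}
```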
-
isStatisticsRecordWriter
public boolean isStatisticsRecordWriter(FragmentContext context, EasyWriter writer)
-
getRecordWriter
public RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer) throws IOException
- Throws:
IOException
-
getStatisticsRecordWriter
public StatisticsRecordWriter getStatisticsRecordWriter(FragmentContext context, EasyWriter writer) throws IOException
- Throws:
IOException
-
getWriterBatch
public CloseableRecordBatch getWriterBatch(FragmentContext context, RecordBatch incoming, EasyWriter writer) throws ExecutionSetupException
- Throws:
ExecutionSetupException
-
getScanStats
protected ScanStats getScanStats(PlannerSettings settings, EasyGroupScan scan)
-
getWriter
public AbstractWriter getWriter(PhysicalOperator child, String location, List<String> partitionColumns)
- Specified by:
getWriter in interface FormatPlugin
-
getGroupScan
public AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns) throws IOException
- Specified by:
getGroupScan in interface FormatPlugin
- Throws:
IOException
-
getGroupScan
public AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns, MetadataProviderManager metadataProviderManager) throws IOException
- Specified by:
getGroupScan in interface FormatPlugin
- Throws:
IOException
-
getConfig
- Specified by:
getConfig in interface FormatPlugin
-
getStorageConfig
- Specified by:
getStorageConfig in interface FormatPlugin
-
supportsRead
public boolean supportsRead()
- Specified by:
supportsRead in interface FormatPlugin
-
supportsWrite
public boolean supportsWrite()
- Specified by:
supportsWrite in interface FormatPlugin
-
supportsAutoPartitioning
public boolean supportsAutoPartitioning()
Description copied from interface: FormatPlugin
Indicates whether this FormatPlugin supports auto-partitioning for CTAS statements.
- Specified by:
supportsAutoPartitioning in interface FormatPlugin
- Returns:
- true if auto-partitioning is supported
-
getMatcher
- Specified by:
getMatcher in interface FormatPlugin
-
getOptimizerRules
- Specified by:
getOptimizerRules in interface FormatPlugin
-
getReaderOperatorType
-
getWriterOperatorType
-
supportsStatistics
public boolean supportsStatistics()
- Specified by:
supportsStatistics in interface FormatPlugin
-
readStatistics
public DrillStatsTable.TableStatistics readStatistics(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath) throws IOException
- Specified by:
readStatistics in interface FormatPlugin
- Throws:
IOException
-
writeStatistics
public void writeStatistics(DrillStatsTable.TableStatistics statistics, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath) throws IOException
- Specified by:
writeStatistics in interface FormatPlugin
- Throws:
IOException
-