Class EasyFormatPlugin<T extends FormatPluginConfig>
java.lang.Object
org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin<T>
- Type Parameters:
T - the format plugin config for this reader
- All Implemented Interfaces:
FormatPlugin
- Direct Known Subclasses:
AvroFormatPlugin, BasePcapFormatPlugin, ExcelFormatPlugin, HDF5FormatPlugin, HttpdLogFormatPlugin, ImageFormatPlugin, JSONFormatPlugin, LogFormatPlugin, LTSVFormatPlugin, MSAccessFormatPlugin, PdfFormatPlugin, SasFormatPlugin, SequenceFileFormatPlugin, ShpFormatPlugin, SpssFormatPlugin, SyslogFormatPlugin, TextFormatPlugin, XMLFormatPlugin
public abstract class EasyFormatPlugin<T extends FormatPluginConfig>
extends Object
implements FormatPlugin
Base class for file readers.
Provides a bridge between the legacy RecordReader-style readers and the newer ManagedReader style. Over time, this class should be split, or a cleaner way provided to handle the differences between the two styles.
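For orientation, the following is a minimal sketch of a read-only subclass using the legacy constructor documented below. The names ExampleFormatPlugin and ExampleFormatConfig are hypothetical placeholders, not part of Drill; real plugins (for example, LogFormatPlugin) follow the same overall shape.
```java
import java.util.Collections;
import org.apache.drill.common.logical.StoragePluginConfig;
import org.apache.drill.exec.server.DrillbitContext;
import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
import org.apache.hadoop.conf.Configuration;

// Hypothetical plugin: readable, not writable, not block-splittable,
// compressible, matching files with the ".example" extension.
public class ExampleFormatPlugin extends EasyFormatPlugin<ExampleFormatConfig> {

  public ExampleFormatPlugin(String name, DrillbitContext context,
      Configuration fsConf, StoragePluginConfig storageConfig,
      ExampleFormatConfig formatConfig) {
    super(name, context, fsConf, storageConfig, formatConfig,
        true,   // readable
        false,  // writable
        false,  // blockSplittable
        true,   // compressible
        Collections.singletonList("example"),  // extensions
        "example");                            // defaultName
  }

  // Reader wiring goes in configureScan() (EVF v2), frameworkBuilder()
  // (EVF v1), or getRecordReader() (classic ScanBatch); see the method
  // details below.
}
```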
-
Nested Class Summary
static class EasyFormatPlugin.EasyFormatConfig - Defines the static, programmer-defined options for this plugin.
static class EasyFormatPlugin.EasyFormatConfigBuilder
static enum EasyFormatPlugin.ScanFrameworkVersion
-
Field Summary
-
Constructor Summary
protected EasyFormatPlugin(String name, DrillbitContext context, org.apache.hadoop.conf.Configuration fsConf, StoragePluginConfig storageConfig, T formatConfig, boolean readable, boolean writable, boolean blockSplittable, boolean compressible, List<String> extensions, String defaultName) - Legacy constructor.
protected EasyFormatPlugin(String name, EasyFormatPlugin.EasyFormatConfig config, DrillbitContext context, StoragePluginConfig storageConfig, T formatConfig) - Revised constructor in which settings are gathered into a configuration object.
-
Method Summary
protected void configureScan(FileScanLifecycleBuilder builder, EasySubScan scan) - Configure an EVF (v2) scan, which must at least include the factory to create readers.
protected FileScanFramework.FileScanBuilder frameworkBuilder(EasySubScan scan, OptionSet options) - Create the plugin-specific framework that manages the scan.
org.apache.hadoop.conf.Configuration getFsConf()
AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns)
AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns, MetadataProviderManager metadataProviderManager)
String getName()
protected CloseableRecordBatch getReaderBatch(FragmentContext context, EasySubScan scan)
RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork, List<SchemaPath> columns, String userName) - Return a record reader for the specific file format, when using the original ScanBatch scanner.
RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer)
protected ScanStats getScanStats(PlannerSettings settings, EasyGroupScan scan)
StatisticsRecordWriter getStatisticsRecordWriter(FragmentContext context, EasyWriter writer)
AbstractWriter getWriter(PhysicalOperator child, String location, List<String> partitionColumns)
CloseableRecordBatch getWriterBatch(FragmentContext context, RecordBatch incoming, EasyWriter writer)
protected void initScanBuilder(FileScanFramework.FileScanBuilder builder, EasySubScan scan) - Initialize the scan framework builder with standard options.
boolean isBlockSplittable() - Whether the format can be split into blocks within file boundaries.
boolean isCompressible() - Indicates whether this format could also be in a compression container (for example: csv.gz versus csv).
boolean isStatisticsRecordWriter(FragmentContext context, EasyWriter writer)
ManagedReader<? extends FileScanFramework.FileSchemaNegotiator> newBatchReader(EasySubScan scan, OptionSet options) - For EVF V1, to be removed.
DrillStatsTable.TableStatistics readStatistics(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath)
protected EasyFormatPlugin.ScanFrameworkVersion scanVersion(OptionSet options) - Choose whether to use the enhanced scan based on the row set and scan framework, or the "traditional" ad-hoc structure based on ScanBatch.
boolean supportsAutoPartitioning() - Indicates whether this FormatPlugin supports auto-partitioning for CTAS statements.
boolean supportsFileImplicitColumns() - Whether this format plugin supports implicit file columns.
boolean supportsLimitPushdown() - Does this plugin support pushing the limit down to the batch reader?
boolean supportsPushDown() - Does this plugin support projection push down?
boolean supportsRead()
boolean supportsStatistics()
boolean supportsWrite()
void writeStatistics(DrillStatsTable.TableStatistics statistics, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.drill.exec.store.dfs.FormatPlugin
getGroupScan, getGroupScan, getOptimizerRules
-
Field Details
-
formatConfig
-
-
Constructor Details
-
EasyFormatPlugin
protected EasyFormatPlugin(String name, DrillbitContext context, org.apache.hadoop.conf.Configuration fsConf, StoragePluginConfig storageConfig, T formatConfig, boolean readable, boolean writable, boolean blockSplittable, boolean compressible, List<String> extensions, String defaultName)
Legacy constructor.
-
EasyFormatPlugin
protected EasyFormatPlugin(String name, EasyFormatPlugin.EasyFormatConfig config, DrillbitContext context, StoragePluginConfig storageConfig, T formatConfig)
Revised constructor in which settings are gathered into a configuration object.
- Parameters:
name - name of the plugin
config - configuration options for this plugin which determine developer-defined runtime behavior
context - the global server-wide Drillbit context
storageConfig - the configuration for the storage plugin that owns this format plugin
formatConfig - the Jackson-serialized format configuration as created by the user in the Drill web console; holds user-defined options
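A sketch of a subclass using this constructor. The builder calls follow the pattern of Drill's bundled plugins, but the plugin name, config class, and chosen settings here are hypothetical:
```java
public ExampleFormatPlugin(String name, DrillbitContext context,
    Configuration fsConf, StoragePluginConfig storageConfig,
    ExampleFormatConfig formatConfig) {
  super(name, easyConfig(fsConf), context, storageConfig, formatConfig);
}

// Gather the static, programmer-defined settings into one object.
private static EasyFormatConfig easyConfig(Configuration fsConf) {
  return EasyFormatConfig.builder()
      .readable(true)
      .writable(false)
      .blockSplittable(false)
      .compressible(true)
      .supportsProjectPushdown(true)
      .extensions(Collections.singletonList("example"))
      .fsConf(fsConf)
      .defaultName("example")
      .scanVersion(ScanFrameworkVersion.EVF_V2)
      .build();
}
```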
-
-
Method Details
-
getFsConf
public org.apache.hadoop.conf.Configuration getFsConf()
- Specified by:
getFsConf in interface FormatPlugin
-
getContext
- Specified by:
getContext in interface FormatPlugin
-
easyConfig
-
getName
- Specified by:
getName in interface FormatPlugin
-
supportsLimitPushdown
public boolean supportsLimitPushdown()
Does this plugin support pushing the limit down to the batch reader? If so, the reader itself should have logic to stop reading the file as soon as the limit has been reached. This makes the most sense for file formats with a consistent schema that can be identified from the first row, CSV for example. If the user only wants 100 rows, it does not make sense to read the entire file.
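For instance, a plugin whose batch readers stop once the limit is satisfied can advertise support with a trivial override (a sketch; the builder-based alternative is noted in a comment):
```java
@Override
public boolean supportsLimitPushdown() {
  // The batch readers for this format stop reading as soon as the
  // pushed-down limit is reached, so it is safe to return true here.
  return true;
}
// Equivalent, when using the revised constructor:
//   EasyFormatConfig.builder()...supportsLimitPushdown(true)...build()
```
-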
supportsPushDown
public boolean supportsPushDown()
Does this plugin support projection push down? That is, can the reader itself handle the tasks of projecting table columns, creating null columns for missing table columns, and so on?
- Returns:
true if the plugin supports projection push-down, false if Drill should do the task by adding a project operator
-
supportsFileImplicitColumns
public boolean supportsFileImplicitColumns()
Whether this format plugin supports implicit file columns.
- Returns:
true if the plugin supports implicit file columns, false otherwise
-
isBlockSplittable
public boolean isBlockSplittable()
Whether the format can be split into blocks within file boundaries for parallel reads. If not, the simple format engine will only split on file boundaries.
- Returns:
true if splittable.
-
isCompressible
public boolean isCompressible()
Indicates whether this format could also appear in a compression container (for example: csv.gz versus csv). If the format uses its own internal compression scheme, as Parquet does, this should return false.
- Returns:
true if the format is compressible
-
getRecordReader
public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork, List<SchemaPath> columns, String userName) throws ExecutionSetupException
Return a record reader for the specific file format, when using the original ScanBatch scanner.
- Parameters:
context - fragment context
dfs - Drill file system
fileWork - metadata about the file to be scanned
columns - list of projected columns (or may just contain the wildcard)
userName - the name of the user running the query
- Returns:
a record reader for this format
- Throws:
ExecutionSetupException - for many reasons
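A sketch of an override for a plugin still on the classic ScanBatch path; ExampleRecordReader stands in for a hypothetical plugin-specific RecordReader implementation:
```java
@Override
public RecordReader getRecordReader(FragmentContext context,
    DrillFileSystem dfs, FileWork fileWork, List<SchemaPath> columns,
    String userName) throws ExecutionSetupException {
  // One reader is created for each file (or block) in the scan.
  return new ExampleRecordReader(context, dfs, fileWork, columns, userName);
}
```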
-
getReaderBatch
protected CloseableRecordBatch getReaderBatch(FragmentContext context, EasySubScan scan) throws ExecutionSetupException
- Throws:
ExecutionSetupException
-
scanVersion
protected EasyFormatPlugin.ScanFrameworkVersion scanVersion(OptionSet options)
Choose whether to use the enhanced scan based on the row set and scan framework, or the "traditional" ad-hoc structure based on ScanBatch. Normally set as a config option. Override this method if you want to make the choice based on a system/session option.
- Returns:
the scan framework version to use: the enhanced row-set-based framework or the traditional ScanBatch-based framework
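A sketch of such an override, keyed off a session/system option; the option name is hypothetical, but bundled plugins such as the JSON plugin gate their EVF v2 reader in a similar way:
```java
@Override
protected ScanFrameworkVersion scanVersion(OptionSet options) {
  // "exec.example.enable_v2_reader" is a hypothetical option name.
  return options.getBoolean("exec.example.enable_v2_reader")
      ? ScanFrameworkVersion.EVF_V2
      : ScanFrameworkVersion.EVF_V1;
}
```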
-
initScanBuilder
Initialize the scan framework builder with standard options. Call this from the plugin-specific frameworkBuilder(EasySubScan, OptionSet) method. The plugin can then customize/revise options as needed. For EVF V1, to be removed.
- Parameters:
builder - the scan framework builder you create in the frameworkBuilder(EasySubScan, OptionSet) method
scan - the physical scan operator definition passed to the frameworkBuilder(EasySubScan, OptionSet) method
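A sketch of a typical EVF V1 frameworkBuilder(EasySubScan, OptionSet) implementation that calls this method; ExampleReaderFactory is a hypothetical reader factory:
```java
@Override
protected FileScanFramework.FileScanBuilder frameworkBuilder(
    EasySubScan scan, OptionSet options) throws ExecutionSetupException {
  FileScanFramework.FileScanBuilder builder =
      new FileScanFramework.FileScanBuilder();
  builder.setReaderFactory(new ExampleReaderFactory());  // hypothetical
  // Apply the standard options (projection, user name, scan limits, ...).
  initScanBuilder(builder, scan);
  // Customize afterwards, e.g. the type used for missing columns.
  builder.nullType(Types.optional(TypeProtos.MinorType.VARCHAR));
  return builder;
}
```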
-
newBatchReader
public ManagedReader<? extends FileScanFramework.FileSchemaNegotiator> newBatchReader(EasySubScan scan, OptionSet options) throws ExecutionSetupException
For EVF V1, to be removed.
- Throws:
ExecutionSetupException
-
frameworkBuilder
protected FileScanFramework.FileScanBuilder frameworkBuilder(EasySubScan scan, OptionSet options) throws ExecutionSetupException
Create the plugin-specific framework that manages the scan. The framework creates batch readers one by one for each file or block. It defines semantic rules for projection. It handles "early" or "late" schema readers. A typical framework builds on standardized frameworks for files in general or text files in particular. For EVF V1, to be removed.
- Parameters:
scan - the physical operation definition for the scan operation. Contains one or more files to read. (The Easy format plugin works only for files.)
- Returns:
- the scan framework which orchestrates the scan operation across potentially many files
- Throws:
ExecutionSetupException
- for all setup failures
-
configureScan
protected void configureScan(FileScanLifecycleBuilder builder, EasySubScan scan)
Configure an EVF (v2) scan, which must at least include the factory to create readers.
- Parameters:
builder - the builder with default options already set, which allows the plugin implementation to set others
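A sketch of a typical override; ExampleReaderFactory and ExampleBatchReader are hypothetical, but the shape mirrors Drill's bundled EVF v2 plugins:
```java
@Override
protected void configureScan(FileScanLifecycleBuilder builder, EasySubScan scan) {
  builder.nullType(Types.optional(TypeProtos.MinorType.VARCHAR));
  builder.readerFactory(new ExampleReaderFactory());
}

// The factory creates one ManagedReader per file in the scan.
private static class ExampleReaderFactory extends FileReaderFactory {
  @Override
  public ManagedReader newReader(FileSchemaNegotiator negotiator) {
    return new ExampleBatchReader(negotiator);  // hypothetical reader
  }
}
```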
-
isStatisticsRecordWriter
public boolean isStatisticsRecordWriter(FragmentContext context, EasyWriter writer)
-
getRecordWriter
public RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer) throws IOException
- Throws:
IOException
-
getStatisticsRecordWriter
public StatisticsRecordWriter getStatisticsRecordWriter(FragmentContext context, EasyWriter writer) throws IOException
- Throws:
IOException
-
getWriterBatch
public CloseableRecordBatch getWriterBatch(FragmentContext context, RecordBatch incoming, EasyWriter writer) throws ExecutionSetupException
- Throws:
ExecutionSetupException
-
getScanStats
protected ScanStats getScanStats(PlannerSettings settings, EasyGroupScan scan)
-
getWriter
public AbstractWriter getWriter(PhysicalOperator child, String location, List<String> partitionColumns)
- Specified by:
getWriter in interface FormatPlugin
-
getGroupScan
public AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns) throws IOException
- Specified by:
getGroupScan in interface FormatPlugin
- Throws:
IOException
-
getGroupScan
public AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns, MetadataProviderManager metadataProviderManager) throws IOException
- Specified by:
getGroupScan in interface FormatPlugin
- Throws:
IOException
-
getConfig
- Specified by:
getConfig in interface FormatPlugin
-
getStorageConfig
- Specified by:
getStorageConfig in interface FormatPlugin
-
supportsRead
public boolean supportsRead()
- Specified by:
supportsRead in interface FormatPlugin
-
supportsWrite
public boolean supportsWrite()
- Specified by:
supportsWrite in interface FormatPlugin
-
supportsAutoPartitioning
public boolean supportsAutoPartitioning()
Description copied from interface: FormatPlugin
Indicates whether this FormatPlugin supports auto-partitioning for CTAS statements.
- Specified by:
supportsAutoPartitioning in interface FormatPlugin
- Returns:
- true if auto-partitioning is supported
-
getMatcher
- Specified by:
getMatcher in interface FormatPlugin
-
getOptimizerRules
- Specified by:
getOptimizerRules in interface FormatPlugin
-
getReaderOperatorType
-
getWriterOperatorType
-
supportsStatistics
public boolean supportsStatistics()
- Specified by:
supportsStatistics in interface FormatPlugin
-
readStatistics
public DrillStatsTable.TableStatistics readStatistics(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath) throws IOException
- Specified by:
readStatistics in interface FormatPlugin
- Throws:
IOException
-
writeStatistics
public void writeStatistics(DrillStatsTable.TableStatistics statistics, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath) throws IOException
- Specified by:
writeStatistics in interface FormatPlugin
- Throws:
IOException
-