Package org.apache.drill.exec.store
Class ColumnExplorer
java.lang.Object
org.apache.drill.exec.store.ColumnExplorer
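Nested Class Summary
| Modifier and Type | Class | Description |
| static interface | ColumnExplorer.ImplicitFileColumn | |
| static enum | ColumnExplorer.ImplicitFileColumns | Columns that give information about where file data comes from. |
| static enum | ColumnExplorer.ImplicitInternalFileColumns | Columns that give internal information about a file or its parts. |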
Constructor Summary
| Constructor | Description |
| ColumnExplorer(OptionManager optionManager) | Constructor for using the column explorer to probe existing columns in the ProjectRecordBatch. |
| ColumnExplorer(OptionManager optionManager, List<SchemaPath> columns) | Helper class that encapsulates logic for sorting out columns between actual table columns, partition columns and implicit file columns. |
Method Summary
| Modifier and Type | Method | Description |
| boolean | containsImplicitColumns() | Checks if the current column selection contains implicit columns. |
| boolean | containsPartitionColumns() | Checks if the current column selection contains partition columns. |
| | getImplicitColumnsNames(SchemaConfig schemaConfig) | Returns a list of implicit column names taken from the specified SchemaConfig. |
| static String | getImplicitColumnValue(ColumnExplorer.ImplicitFileColumn column, org.apache.hadoop.fs.Path filePath, org.apache.hadoop.fs.FileSystem fs) | Returns the implicit column value for the specified implicit file column. |
| static String | getImplicitColumnValue(ColumnExplorer.ImplicitFileColumn column, org.apache.hadoop.fs.Path filePath, org.apache.hadoop.fs.FileSystem fs, Integer index, Long start, Long length) | Returns the implicit column value for the specified implicit file column. |
| | getImplicitFileColumns() | Returns a list of implicit file columns that includes all elements of ColumnExplorer.ImplicitFileColumns, plus the ColumnExplorer.ImplicitInternalFileColumns.LAST_MODIFIED_TIME and ColumnExplorer.ImplicitInternalFileColumns.USE_METADATA columns. |
| static List<String> | getPartitionColumnNames(FileSelection selection, ColumnNamesOptions columnNamesOptions) | Returns a list of partition column names. |
| static int | getPartitionDepth(FileSelection selection) | |
| | getTableColumns() | |
| static Map<String,ColumnExplorer.ImplicitFileColumns> | initImplicitFileColumns(OptionManager optionManager) | Creates a case-insensitive map with implicit file column names as keys and the corresponding ImplicitFileColumns enum constants as values. |
| static Map<String,ColumnExplorer.ImplicitInternalFileColumns> | initImplicitInternalFileColumns(OptionManager optionManager) | Creates a case-insensitive map with implicit internal file column names as keys and the corresponding ImplicitInternalFileColumns enum constants as values. |
| boolean | isImplicitOrInternalFileColumn(String name) | Checks whether the given column is implicit or internal. |
| static boolean | isPartitionColumn(String partitionDesignator, String path) | Checks whether the given column is a partition column. |
| static boolean | isPartitionColumn(OptionManager optionManager, SchemaPath column) | Checks whether the given column is a partition column. |
| boolean | isStarQuery() | |
| static List<String> | listPartitionValues(org.apache.hadoop.fs.Path file, org.apache.hadoop.fs.Path root, boolean hasDirsOnly) | Compares the root and file paths to determine the directories that are present in the file path but absent from the root. |
| static String[] | parsePartitions(org.apache.hadoop.fs.Path file, org.apache.hadoop.fs.Path root, boolean hasDirsOnly) | Low-level parse of partitions, returned as a string array. |
| Map<String,String> | populateColumns(org.apache.hadoop.fs.Path filePath, List<String> partitionValues, boolean includeFileImplicitColumns, org.apache.hadoop.fs.FileSystem fs) | Creates a map of implicit and internal columns, where the key is the column name and the value is the column's actual value. |
| Map<String,String> | populateColumns(org.apache.hadoop.fs.Path filePath, List<String> partitionValues, boolean includeFileImplicitColumns, org.apache.hadoop.fs.FileSystem fs, int index, long start, long length) | Creates a map of implicit and internal columns, where the key is the column name and the value is the column's actual value. |
| Map<String,String> | populateImplicitColumns(org.apache.hadoop.fs.Path filePath, List<String> partitionValues, boolean includeFileImplicitColumns) | Creates a map of implicit columns, where the key is the column name and the value is the column's actual value. |

Constructor Details

ColumnExplorer
public ColumnExplorer(OptionManager optionManager, List<SchemaPath> columns)
Helper class that encapsulates logic for sorting out columns between actual table columns, partition columns and implicit file columns. It also populates a map with implicit column names as keys and their values as values.

ColumnExplorer
public ColumnExplorer(OptionManager optionManager)
Constructor for using the column explorer to probe existing columns in the ProjectRecordBatch.
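The sorting that the two-argument constructor describes can be pictured with the following sketch. It is not Drill's implementation. Column names are plain strings rather than SchemaPath objects. The partition pattern (the designator followed by digits) and the set of lower-case implicit names are assumptions. ColumnSorterSketch is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch, not Drill's ColumnExplorer: columns are plain strings,
// and implicitNames is assumed to hold lower-case implicit column labels.
class ColumnSorterSketch {
  final List<String> tableColumns = new ArrayList<>();
  final List<String> partitionColumns = new ArrayList<>();
  final List<String> implicitColumns = new ArrayList<>();

  ColumnSorterSketch(String partitionDesignator, Set<String> implicitNames, List<String> columns) {
    // Assumed partition pattern: the designator followed by digits, e.g. dir0, dir1.
    Pattern partition = Pattern.compile(Pattern.quote(partitionDesignator) + "[0-9]+");
    for (String column : columns) {
      if (partition.matcher(column).matches()) {
        partitionColumns.add(column);
      } else if (implicitNames.contains(column.toLowerCase(Locale.ROOT))) {
        // Implicit column names are matched case-insensitively.
        implicitColumns.add(column);
      } else {
        // Everything else is treated as an actual table column.
        tableColumns.add(column);
      }
    }
  }

  boolean containsPartitionColumns() {
    return !partitionColumns.isEmpty();
  }

  boolean containsImplicitColumns() {
    return !implicitColumns.isEmpty();
  }
}
```

Take the designator dir and a selection of id, dir0, FILENAME and name. The sketch splits it into table columns [id, name], partition columns [dir0] and implicit columns [FILENAME]. In this sketch, the two contains checks reduce to emptiness tests on those lists.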

Method Details
initImplicitFileColumns
public static Map<String,ColumnExplorer.ImplicitFileColumns> initImplicitFileColumns(OptionManager optionManager)
Creates a case-insensitive map with implicit file column names as keys and the corresponding ImplicitFileColumns enum constants as values.

initImplicitInternalFileColumns
public static Map<String,ColumnExplorer.ImplicitInternalFileColumns> initImplicitInternalFileColumns(OptionManager optionManager)
Creates a case-insensitive map with implicit internal file column names as keys and the corresponding ImplicitInternalFileColumns enum constants as values.
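Below is a minimal sketch of the case-insensitive lookup these two methods describe. It assumes a TreeMap ordered by String.CASE_INSENSITIVE_ORDER; the documentation does not say which map type Drill actually uses. FileColumn, labelOf and isImplicitColumn are hypothetical stand-ins for ColumnExplorer.ImplicitFileColumns, the OptionManager lookup of each column's label, and the membership test behind isImplicitOrInternalFileColumn.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class ImplicitColumnMapSketch {
  // Hypothetical stand-in for ColumnExplorer.ImplicitFileColumns.
  enum FileColumn { FQN, FILEPATH, FILENAME, SUFFIX }

  // labelOf stands in for reading each column's label from the session options.
  static Map<String, FileColumn> initImplicitFileColumns(Function<FileColumn, String> labelOf) {
    // Ordering keys with CASE_INSENSITIVE_ORDER makes get("FileName") find "filename".
    Map<String, FileColumn> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    for (FileColumn column : FileColumn.values()) {
      map.put(labelOf.apply(column), column);
    }
    return map;
  }

  // Membership test in the spirit of isImplicitOrInternalFileColumn(String name).
  static boolean isImplicitColumn(Map<String, FileColumn> columns, String name) {
    return columns.get(name) != null;
  }
}
```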

getImplicitColumnsNames
Returns a list of implicit column names taken from the specified SchemaConfig.
Parameters:
  schemaConfig - the source of session option values
Returns:
  list of implicit column names
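
isPartitionColumn
public static boolean isPartitionColumn(OptionManager optionManager, SchemaPath column)
Checks whether the given column is a partition column.
Parameters:
  optionManager - session options, the source of the partition column label
  column - column to check
Returns:
  true if the given column is a partition column, false otherwise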
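
isPartitionColumn
public static boolean isPartitionColumn(String partitionDesignator, String path)
Checks whether the given column is a partition column.
Parameters:
  partitionDesignator - partition designator
  path - column path
Returns:
  true if the given column is a partition column, false otherwise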
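Here is one plausible reading of the string-based check. It is sketched under the assumption that a partition column is the designator followed by one or more digits, for example dir0 when the designator is dir. PartitionColumnCheckSketch is a hypothetical class, and the regular expression is an assumption rather than Drill's documented rule.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a partition-column check; the pattern is an assumption.
class PartitionColumnCheckSketch {
  static boolean isPartitionColumn(String partitionDesignator, String path) {
    // Quote the designator so characters such as '.' are matched literally,
    // then require one or more digits: dir0 and dir12 match, dir and dirx do not.
    return Pattern.matches(Pattern.quote(partitionDesignator) + "[0-9]+", path);
  }
}
```

Pattern.matches requires the whole string to match, so a name such as subdir0 is rejected even though it contains dir0.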

isImplicitOrInternalFileColumn
public boolean isImplicitOrInternalFileColumn(String name)
Checks whether the given column is implicit or internal.
Parameters:
  name - name of the column to check
Returns:
  true if the given column is implicit or internal, false otherwise

getPartitionColumnNames
public static List<String> getPartitionColumnNames(FileSelection selection, ColumnNamesOptions columnNamesOptions)
Returns a list of partition column names. If the table's files are nested at different depths, the maximum depth is used.
Parameters:
  selection - the source of file paths
  columnNamesOptions - the source of the session option value for the partition column label
Returns:
  list of partition column names
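The maximum-depth rule can be illustrated with the following sketch. It uses java.nio.file.Path in place of FileSelection and org.apache.hadoop.fs.Path, and it assumes every file lies under root. It also assumes one column per directory level, named designator plus index. PartitionColumnNamesSketch and partitionColumnNames are hypothetical names.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: java.nio.file.Path stands in for FileSelection and
// Hadoop paths, and every file is assumed to lie under root.
class PartitionColumnNamesSketch {
  static List<String> partitionColumnNames(Path root, List<Path> files, String designator) {
    int maxDepth = 0;
    for (Path file : files) {
      // Directory levels between root and the file; the file name itself is not a level.
      int depth = file.getNameCount() - root.getNameCount() - 1;
      maxDepth = Math.max(maxDepth, depth);
    }
    // One partition column per level of the deepest file: dir0, dir1, ...
    List<String> names = new ArrayList<>();
    for (int i = 0; i < maxDepth; i++) {
      names.add(designator + i);
    }
    return names;
  }
}
```

Take root /t with files /t/a.parquet, /t/2020/c.parquet and /t/2019/01/b.parquet. The deepest file sits two directories below the root, so the sketch returns [dir0, dir1] for the designator dir.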

getPartitionDepth
public static int getPartitionDepth(FileSelection selection)

populateImplicitColumns
public Map<String,String> populateImplicitColumns(org.apache.hadoop.fs.Path filePath, List<String> partitionValues, boolean includeFileImplicitColumns)
Creates a map of implicit columns, where the key is the column name and the value is the column's actual value. The map contains partition columns and, if requested, implicit file columns. Partition column names are formed from the partition designator and the value index.
Parameters:
  filePath - file path, used to populate file implicit columns
  partitionValues - list of partition values
  includeFileImplicitColumns - whether file implicit columns should be included in the result
Returns:
  implicit columns map
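The following sketch shows how the map could be assembled. It assumes partition columns are named designator plus index. It also assumes the implicit file column values have already been computed. fileColumnValues is a hypothetical parameter that is not part of the documented signature, and PopulateImplicitColumnsSketch is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: fileColumnValues stands in for the implicit file column
// values that the real method derives from filePath.
class PopulateImplicitColumnsSketch {
  static Map<String, String> populateImplicitColumns(List<String> partitionValues, String designator,
      boolean includeFileImplicitColumns, Map<String, String> fileColumnValues) {
    Map<String, String> result = new LinkedHashMap<>();
    // Partition column names are the designator plus the value's index: dir0, dir1, ...
    for (int i = 0; i < partitionValues.size(); i++) {
      result.put(designator + i, partitionValues.get(i));
    }
    // Implicit file columns are added only when requested.
    if (includeFileImplicitColumns) {
      result.putAll(fileColumnValues);
    }
    return result;
  }
}
```

With partition values [2019, 01] and the designator dir, the map holds dir0=2019 and dir1=01, plus the file columns when includeFileImplicitColumns is true.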

populateColumns
public Map<String,String> populateColumns(org.apache.hadoop.fs.Path filePath, List<String> partitionValues, boolean includeFileImplicitColumns, org.apache.hadoop.fs.FileSystem fs)
Creates a map of implicit and internal columns, where the key is the column name and the value is the column's actual value. The map contains partition columns and, if requested, implicit and internal file columns. Partition column names are formed from the partition designator and the value index.
Parameters:
  filePath - file path, used to populate file implicit columns
  partitionValues - list of partition values
  includeFileImplicitColumns - whether file implicit columns should be included in the result
  fs - file system
Returns:
  implicit columns map

populateColumns
public Map<String,String> populateColumns(org.apache.hadoop.fs.Path filePath, List<String> partitionValues, boolean includeFileImplicitColumns, org.apache.hadoop.fs.FileSystem fs, int index, long start, long length)
Creates a map of implicit and internal columns, where the key is the column name and the value is the column's actual value. The map contains partition columns and, if requested, implicit and internal file columns. Partition column names are formed from the partition designator and the value index.
Parameters:
  filePath - file path, used to populate file implicit columns
  partitionValues - list of partition values
  includeFileImplicitColumns - whether file implicit columns should be included in the result
  fs - file system
  index - index of the row group to populate
  start - start of the row group to populate
  length - length of the row group to populate
Returns:
  implicit columns map

getImplicitColumnValue
public static String getImplicitColumnValue(ColumnExplorer.ImplicitFileColumn column, org.apache.hadoop.fs.Path filePath, org.apache.hadoop.fs.FileSystem fs, Integer index, Long start, Long length)
Returns the implicit column value for the specified implicit file column.
Parameters:
  column - implicit file column
  filePath - file path, used to populate file implicit columns
  fs - file system
  index - row group index
  start - row group start
  length - row group length
Returns:
  implicit column value for the specified implicit file column

getImplicitColumnValue
public static String getImplicitColumnValue(ColumnExplorer.ImplicitFileColumn column, org.apache.hadoop.fs.Path filePath, org.apache.hadoop.fs.FileSystem fs)
Returns the implicit column value for the specified implicit file column.
Parameters:
  column - implicit file column
  filePath - file path
  fs - file system
Returns:
  implicit column value for the specified implicit file column
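Here is a sketch of per-column values. It assumes four implicit file columns: the fully qualified name, the parent directory, the file name and the file extension. The enum FileColumn and these meanings are assumptions for illustration, and ImplicitColumnValueSketch is a hypothetical class. java.nio.file.Path replaces org.apache.hadoop.fs.Path. The FileSystem argument is left out because the sketch derives every value from the path alone.

```java
import java.nio.file.Path;

// Hypothetical sketch; the constants and their meanings are assumptions.
class ImplicitColumnValueSketch {
  enum FileColumn { FQN, FILEPATH, FILENAME, SUFFIX }

  static String getImplicitColumnValue(FileColumn column, Path filePath) {
    String fileName = filePath.getFileName().toString();
    switch (column) {
      case FQN:
        return filePath.toString();            // full path of the file
      case FILEPATH: {
        Path parent = filePath.getParent();    // directory containing the file
        return parent == null ? "" : parent.toString();
      }
      case FILENAME:
        return fileName;                       // last path element
      case SUFFIX: {
        int dot = fileName.lastIndexOf('.');   // extension without the dot, "" if none
        return dot >= 0 ? fileName.substring(dot + 1) : "";
      }
      default:
        throw new IllegalArgumentException("Unsupported column: " + column);
    }
  }
}
```

On a POSIX file system, the path /data/logs/2019/a.csv yields /data/logs/2019/a.csv, /data/logs/2019, a.csv and csv for the four constants.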

getImplicitFileColumns
Returns a list of implicit file columns that includes all elements of ColumnExplorer.ImplicitFileColumns, plus the ColumnExplorer.ImplicitInternalFileColumns.LAST_MODIFIED_TIME and ColumnExplorer.ImplicitInternalFileColumns.USE_METADATA columns.
Returns:
  list of implicit file columns

listPartitionValues
public static List<String> listPartitionValues(org.apache.hadoop.fs.Path file, org.apache.hadoop.fs.Path root, boolean hasDirsOnly)
Compares the root and file paths to determine the directories that are present in the file path but absent from the root. The directory names are returned in successive order, outermost first. Example: root a/b/c and file a/b/c/d/e/0_0_0.parquet give the result d/e, that is, the list [d, e].
Parameters:
  file - file path
  root - root directory
  hasDirsOnly - whether file denotes a directory rather than a file
Returns:
  list of directory names

parsePartitions
public static String[] parsePartitions(org.apache.hadoop.fs.Path file, org.apache.hadoop.fs.Path root, boolean hasDirsOnly)
Low-level parse of partitions, returned as a string array. Returns null for invalid arguments.
Parameters:
  file - file path
  root - root directory
  hasDirsOnly - whether file denotes a directory rather than a file
Returns:
  array of directory names, or null if the arguments are invalid
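The worked example under listPartitionValues can be reproduced with this sketch of both methods. It uses java.nio.file.Path instead of org.apache.hadoop.fs.Path. It interprets a false hasDirsOnly as "drop the file name first" and treats a null argument or a path outside root as invalid. Those choices are assumptions, not a description of Drill's code, and PartitionParseSketch is a hypothetical class.

```java
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch using java.nio.file.Path instead of org.apache.hadoop.fs.Path.
class PartitionParseSketch {
  static String[] parsePartitions(Path file, Path root, boolean hasDirsOnly) {
    if (file == null || root == null) {
      return null;
    }
    // For a file, only the directories above it can carry partition values.
    Path dir = hasDirsOnly ? file : file.getParent();
    // Assumed notion of "invalid": the directory is not under root.
    if (dir == null || !dir.startsWith(root)) {
      return null;
    }
    int rootDepth = root.getNameCount();
    String[] names = new String[dir.getNameCount() - rootDepth];
    // Copy the path elements below root, outermost first.
    for (int i = 0; i < names.length; i++) {
      names[i] = dir.getName(rootDepth + i).toString();
    }
    return names;
  }

  static List<String> listPartitionValues(Path file, Path root, boolean hasDirsOnly) {
    String[] dirs = parsePartitions(file, root, hasDirsOnly);
    // In this sketch, invalid input yields an empty list instead of null.
    if (dirs == null) {
      return Collections.emptyList();
    }
    return Arrays.asList(dirs);
  }
}
```

For the documented example, the sketch yields [d, e]. Passing the directory a/b/c/d/e with hasDirsOnly set to true gives the same result, because nothing is stripped before comparing against the root.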

isStarQuery
public boolean isStarQuery()

getTableColumns

containsPartitionColumns
public boolean containsPartitionColumns()
Checks if the current column selection contains partition columns.
Returns:
  true if partition columns are present, false otherwise

containsImplicitColumns
public boolean containsImplicitColumns()
Checks if the current column selection contains implicit columns.
Returns:
  true if implicit columns are present, false otherwise