Class ColumnExplorer

java.lang.Object
org.apache.drill.exec.store.ColumnExplorer

public class ColumnExplorer extends Object
  • Constructor Details

    • ColumnExplorer

      public ColumnExplorer(OptionManager optionManager, List<SchemaPath> columns)
      Helper class that encapsulates the logic for sorting columns into actual table columns, partition columns and implicit file columns. Also populates a map with implicit column names as keys and their values.
    • ColumnExplorer

      public ColumnExplorer(OptionManager optionManager)
      Constructor for using the column explorer to probe existing columns in the ProjectRecordBatch.
  • Method Details

    • initImplicitFileColumns

      public static Map<String,ColumnExplorer.ImplicitFileColumns> initImplicitFileColumns(OptionManager optionManager)
      Creates a case-insensitive map with implicit file column names as keys and the corresponding ImplicitFileColumns enum constants as values.
    • initImplicitInternalFileColumns

      public static Map<String,ColumnExplorer.ImplicitInternalFileColumns> initImplicitInternalFileColumns(OptionManager optionManager)
      Creates a case-insensitive map with implicit internal file column names as keys and the corresponding ImplicitInternalFileColumns enum constants as values.
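      The case-insensitive lookup described for the two map factories above can be illustrated with plain JDK classes. This is a minimal sketch, not Drill's implementation: the enum SketchFileColumn, its constants, and the column labels "filename" and "suffix" are assumptions made for the example, and a TreeMap ordered by String.CASE_INSENSITIVE_ORDER stands in for whatever map type Drill uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an implicit file column enum; the constant
// names are illustrative and are not taken from the Drill source.
enum SketchFileColumn { FILENAME, SUFFIX }

class CaseInsensitiveMapSketch {

  // Builds a map whose String keys are compared without regard to case.
  static Map<String, SketchFileColumn> build() {
    Map<String, SketchFileColumn> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    map.put("filename", SketchFileColumn.FILENAME); // assumed column label
    map.put("suffix", SketchFileColumn.SUFFIX);     // assumed column label
    return map;
  }

  public static void main(String[] args) {
    Map<String, SketchFileColumn> map = build();
    // "FileName" resolves to the entry stored under "filename".
    System.out.println(map.get("FileName"));
    System.out.println(map.containsKey("SUFFIX"));
  }
}
```

      With such a map, a column referenced in a query as FILENAME, FileName or filename resolves to the same enum constant.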
    • getImplicitColumnsNames

      public static List<String> getImplicitColumnsNames(SchemaConfig schemaConfig)
      Returns list with implicit column names taken from specified SchemaConfig.
      Parameters:
      schemaConfig - the source of session options values.
      Returns:
      list with implicit column names.
    • isPartitionColumn

      public static boolean isPartitionColumn(OptionManager optionManager, SchemaPath column)
      Checks whether the given column is a partition column.
      Parameters:
      optionManager - options
      column - column
      Returns:
      true if the given column is a partition column, false otherwise
    • isPartitionColumn

      public static boolean isPartitionColumn(String partitionDesignator, String path)
      Checks whether the given column is a partition column.
      Parameters:
      partitionDesignator - partition designator
      path - column path
      Returns:
      true if the given column is a partition column, false otherwise
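      One plausible way to perform this check is to test whether the column path consists of the partition designator followed by an index. The sketch below assumes exactly that rule (designator plus one or more digits) and uses "dir" as an example designator; the class and method names are hypothetical and the actual matching rule in Drill may differ.

```java
import java.util.regex.Pattern;

class PartitionNameSketch {

  // Returns true when path is the designator followed by one or more digits.
  // This matching rule is an assumption made for illustration.
  static boolean looksLikePartitionColumn(String partitionDesignator, String path) {
    Pattern pattern = Pattern.compile(Pattern.quote(partitionDesignator) + "[0-9]+");
    return pattern.matcher(path).matches();
  }

  public static void main(String[] args) {
    System.out.println(looksLikePartitionColumn("dir", "dir0"));      // designator + index
    System.out.println(looksLikePartitionColumn("dir", "dir"));       // no index
    System.out.println(looksLikePartitionColumn("dir", "directory")); // letters, not digits
  }
}
```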
    • isImplicitOrInternalFileColumn

      public boolean isImplicitOrInternalFileColumn(String name)
      Checks whether given column is implicit or internal.
      Parameters:
      name - name of the column to check
      Returns:
      true if given column is implicit or internal, false otherwise
    • getPartitionColumnNames

      public static List<String> getPartitionColumnNames(FileSelection selection, ColumnNamesOptions columnNamesOptions)
      Returns a list of partition column names. When the table has several levels of directory nesting, the maximum level is used.
      Parameters:
      selection - the source of file paths
      columnNamesOptions - the source of session option value for partition column label
      Returns:
      list with partition column names.
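      The "maximum level" rule can be sketched as follows: compute how many directory levels each file sits below the table root, take the largest value, and generate one name per level from the partition column label. The sketch uses java.nio.file.Path instead of Drill's FileSelection, and the class name, method name and the label "dir" are assumptions made for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionColumnNamesSketch {

  // Generates label0..label(N-1), where N is the deepest directory nesting
  // found among the files, measured relative to root.
  // Assumes every file has a parent directory at or below root.
  static List<String> partitionColumnNames(Path root, List<Path> files, String label) {
    int maxDepth = 0;
    for (Path file : files) {
      int depth = file.getParent().getNameCount() - root.getNameCount();
      maxDepth = Math.max(maxDepth, depth);
    }
    List<String> names = new ArrayList<>();
    for (int i = 0; i < maxDepth; i++) {
      names.add(label + i);
    }
    return names;
  }

  public static void main(String[] args) {
    Path root = Paths.get("a/b/c");
    List<Path> files = Arrays.asList(
        Paths.get("a/b/c/d/0_0_0.parquet"),     // one level below root
        Paths.get("a/b/c/d/e/0_0_1.parquet"));  // two levels below root
    System.out.println(partitionColumnNames(root, files, "dir"));
  }
}
```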
    • getPartitionDepth

      public static int getPartitionDepth(FileSelection selection)
    • populateImplicitColumns

      public Map<String,String> populateImplicitColumns(org.apache.hadoop.fs.Path filePath, List<String> partitionValues, boolean includeFileImplicitColumns)
      Creates a map of implicit columns where the key is the column name and the value is the column's actual value. The map contains partition columns and, if requested, implicit file columns. Partition column names are formed from the partition designator and the value index.
      Parameters:
      filePath - file path, used to populate file implicit columns
      partitionValues - list of partition values
      includeFileImplicitColumns - if file implicit columns should be included into the result
      Returns:
      implicit columns map
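      The naming scheme for partition columns (designator plus value index) can be sketched as below. This is an illustration under stated assumptions rather than the method's actual body: the class and method names are hypothetical, the designator "dir" and the file column label "filename" are assumed values, and only one file column is populated to keep the example short.

```java
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ImplicitColumnsSketch {

  // Maps designator + index to each partition value and, when requested,
  // adds the file name under an assumed "filename" label.
  static Map<String, String> populate(String designator, String filePath,
      List<String> partitionValues, boolean includeFileColumns) {
    Map<String, String> columns = new LinkedHashMap<>();
    for (int i = 0; i < partitionValues.size(); i++) {
      columns.put(designator + i, partitionValues.get(i));
    }
    if (includeFileColumns) {
      columns.put("filename", Paths.get(filePath).getFileName().toString());
    }
    return columns;
  }

  public static void main(String[] args) {
    Map<String, String> columns = populate(
        "dir", "a/b/c/d/e/0_0_0.parquet", Arrays.asList("d", "e"), true);
    System.out.println(columns); // {dir0=d, dir1=e, filename=0_0_0.parquet}
  }
}
```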
    • populateColumns

      public Map<String,String> populateColumns(org.apache.hadoop.fs.Path filePath, List<String> partitionValues, boolean includeFileImplicitColumns, org.apache.hadoop.fs.FileSystem fs)
      Creates a map of implicit and internal columns where the key is the column name and the value is the column's actual value. The map contains partition columns and, if requested, implicit and internal file columns. Partition column names are formed from the partition designator and the value index.
      Parameters:
      filePath - file path, used to populate file implicit columns
      partitionValues - list of partition values
      includeFileImplicitColumns - if file implicit columns should be included into the result
      fs - file system
      Returns:
      implicit columns map
    • populateColumns

      public Map<String,String> populateColumns(org.apache.hadoop.fs.Path filePath, List<String> partitionValues, boolean includeFileImplicitColumns, org.apache.hadoop.fs.FileSystem fs, int index, long start, long length)
      Creates a map of implicit and internal columns where the key is the column name and the value is the column's actual value. The map contains partition columns and, if requested, implicit and internal file columns. Partition column names are formed from the partition designator and the value index.
      Parameters:
      filePath - file path, used to populate file implicit columns
      partitionValues - list of partition values
      includeFileImplicitColumns - if file implicit columns should be included into the result
      fs - file system
      index - index of row group to populate
      start - start of row group to populate
      length - length of row group to populate
      Returns:
      implicit columns map
    • getImplicitColumnValue

      public static String getImplicitColumnValue(ColumnExplorer.ImplicitFileColumn column, org.apache.hadoop.fs.Path filePath, org.apache.hadoop.fs.FileSystem fs, Integer index, Long start, Long length)
      Returns implicit column value for specified implicit file column.
      Parameters:
      column - implicit file column
      filePath - file path, used to populate file implicit columns
      fs - file system
      index - row group index
      start - row group start
      length - row group length
      Returns:
      implicit column value for specified implicit file column
    • getImplicitColumnValue

      public static String getImplicitColumnValue(ColumnExplorer.ImplicitFileColumn column, org.apache.hadoop.fs.Path filePath, org.apache.hadoop.fs.FileSystem fs)
      Returns implicit column value for specified implicit file column.
      Parameters:
      column - implicit file column
      filePath - file path
      fs - file system
      Returns:
      implicit column value for specified implicit file column
    • getImplicitFileColumns

      public static List<ColumnExplorer.ImplicitFileColumn> getImplicitFileColumns()
      Returns:
      list of implicit file columns
    • listPartitionValues

      public static List<String> listPartitionValues(org.apache.hadoop.fs.Path file, org.apache.hadoop.fs.Path root, boolean hasDirsOnly)
      Compares the root and file paths to determine the directories that are present in the file path but absent from the root. Example: for root a/b/c and file a/b/c/d/e/0_0_0.parquet, the result is [d, e]. The directory names are stored in the list in the order in which they appear in the path.
      Parameters:
      file - file path
      root - root directory
      hasDirsOnly - true if the given path is a directory, false if it is a file
      Returns:
      list of directory names
    • parsePartitions

      public static String[] parsePartitions(org.apache.hadoop.fs.Path file, org.apache.hadoop.fs.Path root, boolean hasDirsOnly)
      Low-level parse of partitions, returned as a string array. Returns null if the arguments are invalid.
      Parameters:
      file - file path
      root - root directory
      hasDirsOnly - true if the given path is a directory, false if it is a file
      Returns:
      array of directory names, or null if the arguments are invalid
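      The comparison performed by listPartitionValues and parsePartitions can be sketched with java.nio.file.Path in place of org.apache.hadoop.fs.Path. The sketch assumes that hasDirsOnly = false means the last path element is a file name to be dropped, and that "invalid" means a null argument or a file path shallower than the root. Only path depths are compared, so the sketch does not verify that the file actually lies under the root. The class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class ParsePartitionsSketch {

  // Returns the directory names found in file below the depth of root,
  // outermost first, or null when the arguments are considered invalid.
  static String[] parse(Path file, Path root, boolean hasDirsOnly) {
    if (file == null || root == null) {
      return null;
    }
    // Assumption: when hasDirsOnly is false, the last element is a file name.
    Path dir = hasDirsOnly ? file : file.getParent();
    if (dir == null) {
      return null;
    }
    int rootDepth = root.getNameCount();
    int diff = dir.getNameCount() - rootDepth;
    if (diff < 0) {
      return null; // file is shallower than root
    }
    String[] names = new String[diff];
    for (int i = 0; i < diff; i++) {
      names[i] = dir.getName(rootDepth + i).toString();
    }
    return names;
  }

  public static void main(String[] args) {
    Path root = Paths.get("a/b/c");
    Path file = Paths.get("a/b/c/d/e/0_0_0.parquet");
    System.out.println(Arrays.toString(parse(file, root, false))); // [d, e]
  }
}
```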
    • isStarQuery

      public boolean isStarQuery()
    • getTableColumns

      public List<SchemaPath> getTableColumns()
    • containsPartitionColumns

      public boolean containsPartitionColumns()
      Checks if current column selection contains partition columns.
      Returns:
      true if partition columns are present, false otherwise
    • containsImplicitColumns

      public boolean containsImplicitColumns()
      Checks if current column selection contains implicit columns.
      Returns:
      true if implicit columns are present, false otherwise