Class BatchSizingMemoryUtil
java.lang.Object
org.apache.drill.exec.store.parquet.columnreaders.batchsizing.BatchSizingMemoryUtil
Helper class that assists the flat Parquet reader in building batches that adhere to memory sizing constraints.
-
Nested Class Summary
Modifier and Type | Class | Description
static final class | BatchSizingMemoryUtil.ColumnMemoryUsageInfo | A container class to hold a column batch's memory usage information.
static final class | BatchSizingMemoryUtil.VectorMemoryUsageInfo | Container class which holds memory usage information about a variable length ValueVector; all values are in bytes.
-
Field Summary
Modifier and Type | Field | Description
static final int | BYTE_VALUE_WIDTH | BYTE in-memory width
static final int | DEFAULT_VL_COLUMN_AVG_PRECISION | Default variable length column average precision; computed in such a way that 64k values will fit within one MB to minimize internal fragmentation
static final int | INT_VALUE_WIDTH | INT in-memory width
-
Method Summary
Modifier and Type | Method | Description
static boolean | canAddNewData(BatchSizingMemoryUtil.ColumnMemoryUsageInfo columnMemoryUsage, long newBitsMemory, long newOffsetsMemory, long newDataMemory) | This method will also load detailed information about this column's current memory usage (with regard to the value vectors).
static long | computeFixedLengthVectorMemory(ParquetColumnMetadata column, int valueCount) |
static long | computeVariableLengthVectorMemory(ParquetColumnMetadata column, long averagePrecision, int valueCount) |
static int | getAvgVariableLengthColumnTypePrecision(ParquetColumnMetadata column) | This method will return a default value for variable columns; it aims at minimizing internal fragmentation.
static int | getFixedColumnTypePrecision(ParquetColumnMetadata column) |
static void | getMemoryUsage(ValueVector sourceVector, int currValueCount, BatchSizingMemoryUtil.VectorMemoryUsageInfo vectorMemoryUsage) | Load memory usage information for a variable length value vector
-
Field Details
-
BYTE_VALUE_WIDTH
public static final int BYTE_VALUE_WIDTH
BYTE in-memory width
-
INT_VALUE_WIDTH
public static final int INT_VALUE_WIDTH
INT in-memory width
-
DEFAULT_VL_COLUMN_AVG_PRECISION
public static final int DEFAULT_VL_COLUMN_AVG_PRECISION
Default variable length column average precision; computed in such a way that 64k values will fit within one MB to minimize internal fragmentation
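The arithmetic behind "64k values will fit within one MB" can be sketched as follows. This is only an illustration, assuming binary units (64k = 64 * 1024 values, one MB = 1024 * 1024 bytes); the class and constant names below are hypothetical and are not part of Drill.

```java
final class DefaultPrecisionSketch {
    // Assumption: "64k values" means 64 * 1024 values.
    static final int VALUES_PER_CHUNK = 64 * 1024;
    // Assumption: "one MB" means 1024 * 1024 bytes.
    static final int CHUNK_BYTES = 1024 * 1024;

    /** Average bytes per value such that VALUES_PER_CHUNK values fill CHUNK_BYTES exactly. */
    static int averagePrecision() {
        return CHUNK_BYTES / VALUES_PER_CHUNK;
    }

    public static void main(String[] args) {
        System.out.println(averagePrecision()); // prints 16
    }
}
```

Under these assumptions the division has no remainder, so a data buffer sized for 64k values at the default precision occupies exactly one MB with no leftover space.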
-
-
Method Details
-
canAddNewData
public static boolean canAddNewData(BatchSizingMemoryUtil.ColumnMemoryUsageInfo columnMemoryUsage, long newBitsMemory, long newOffsetsMemory, long newDataMemory)
Determines whether new data can be added to this column's value vector without exceeding the allowed limit. This method will also load detailed information about this column's current memory usage (with regard to the value vectors).
Parameters:
columnMemoryUsage - container that holds the column's memory usage information (usage information will be automatically updated by this method)
newBitsMemory - new nullable data which might be inserted when processing a new input chunk
newOffsetsMemory - new offsets data which might be inserted when processing a new input chunk
newDataMemory - new data which might be inserted when processing a new input chunk
Returns:
true if adding the new data will not cause this column's value vector to exceed the allowed limit; false otherwise
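The kind of check described above can be sketched in isolation. The block below is a simplified illustration and not Drill's implementation: ColumnUsage is a hypothetical stand-in for BatchSizingMemoryUtil.ColumnMemoryUsageInfo, its field names are invented for this example, and the limit is assumed to apply to the sum of the bits, offsets, and data memory.

```java
final class CanAddNewDataSketch {
    /** Hypothetical stand-in for the column usage container; all values are in bytes. */
    static final class ColumnUsage {
        final long bitsBytesUsed;
        final long offsetsBytesUsed;
        final long dataBytesUsed;
        final long maxMemoryBytes; // assumed per-column limit

        ColumnUsage(long bitsBytesUsed, long offsetsBytesUsed, long dataBytesUsed, long maxMemoryBytes) {
            this.bitsBytesUsed = bitsBytesUsed;
            this.offsetsBytesUsed = offsetsBytesUsed;
            this.dataBytesUsed = dataBytesUsed;
            this.maxMemoryBytes = maxMemoryBytes;
        }
    }

    /** Returns true if current usage plus the new chunk stays within the assumed limit. */
    static boolean canAddNewData(ColumnUsage usage, long newBitsMemory,
                                 long newOffsetsMemory, long newDataMemory) {
        long current = usage.bitsBytesUsed + usage.offsetsBytesUsed + usage.dataBytesUsed;
        long incoming = newBitsMemory + newOffsetsMemory + newDataMemory;
        return current + incoming <= usage.maxMemoryBytes;
    }

    public static void main(String[] args) {
        ColumnUsage usage = new ColumnUsage(100, 400, 1000, 2048);
        System.out.println(canAddNewData(usage, 10, 40, 100)); // prints true
        System.out.println(canAddNewData(usage, 10, 40, 600)); // prints false
    }
}
```

A caller would typically run such a check before copying each new input chunk, and close the current batch when the check fails.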
-
getMemoryUsage
public static void getMemoryUsage(ValueVector sourceVector, int currValueCount, BatchSizingMemoryUtil.VectorMemoryUsageInfo vectorMemoryUsage)
Load memory usage information for a variable length value vector
Parameters:
sourceVector - source value vector
currValueCount - current value count
vectorMemoryUsage - result object which contains source vector memory usage information
-
getFixedColumnTypePrecision
public static int getFixedColumnTypePrecision(ParquetColumnMetadata column)
Parameters:
column - fixed column's metadata
Returns:
column byte precision
-
getAvgVariableLengthColumnTypePrecision
public static int getAvgVariableLengthColumnTypePrecision(ParquetColumnMetadata column)
This method will return a default value for variable columns; it aims at minimizing internal fragmentation. Note that the TypeHelper uses a large default value which might not always be appropriate.
Parameters:
column - variable length column's metadata
Returns:
column byte precision
-
computeFixedLengthVectorMemory
public static long computeFixedLengthVectorMemory(ParquetColumnMetadata column, int valueCount)
Parameters:
column - column's metadata
valueCount - number of column values
Returns:
memory size required to store valueCount values within a value vector
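One way to estimate such a size is sketched below. This is not Drill's implementation: the real method takes a ParquetColumnMetadata, whereas this sketch takes a hypothetical valueWidth and nullable flag directly, and it assumes a nullable column needs one extra byte per value.

```java
final class FixedLengthMemorySketch {
    // Assumed value for this sketch: one byte per nullability marker.
    static final int BYTE_VALUE_WIDTH = 1;

    /**
     * Estimates the bytes needed to hold valueCount fixed-width values.
     * valueWidth and nullable are hypothetical stand-ins for column metadata.
     */
    static long computeFixedLengthVectorMemory(int valueWidth, boolean nullable, int valueCount) {
        long dataMemory = (long) valueWidth * valueCount;
        long bitsMemory = nullable ? (long) BYTE_VALUE_WIDTH * valueCount : 0L;
        return dataMemory + bitsMemory;
    }

    public static void main(String[] args) {
        // 1000 eight-byte values, first required, then nullable.
        System.out.println(computeFixedLengthVectorMemory(8, false, 1000)); // prints 8000
        System.out.println(computeFixedLengthVectorMemory(8, true, 1000));  // prints 9000
    }
}
```

The multiplication is done in long arithmetic so that large value counts do not overflow an int.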
-
computeVariableLengthVectorMemory
public static long computeVariableLengthVectorMemory(ParquetColumnMetadata column, long averagePrecision, int valueCount)
Parameters:
column - variable length column's metadata
averagePrecision - VL column average precision
valueCount - number of column values
Returns:
memory size required to store valueCount values within a value vector
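A comparable estimate for variable length columns is sketched below. It is an illustration and not Drill's implementation: the nullable flag is a hypothetical stand-in for the column metadata, and the sketch assumes one INT-wide offset entry per value plus one byte per value for nullable columns.

```java
final class VariableLengthMemorySketch {
    // Assumed widths for this sketch, in bytes.
    static final int BYTE_VALUE_WIDTH = 1;
    static final int INT_VALUE_WIDTH = 4;

    /**
     * Estimates the bytes needed to hold valueCount variable length values
     * whose payload averages averagePrecision bytes each.
     */
    static long computeVariableLengthVectorMemory(boolean nullable, long averagePrecision, int valueCount) {
        long dataMemory = averagePrecision * valueCount;          // value payload
        long offsetsMemory = (long) INT_VALUE_WIDTH * valueCount; // assumed: one offset per value
        long bitsMemory = nullable ? (long) BYTE_VALUE_WIDTH * valueCount : 0L;
        return dataMemory + offsetsMemory + bitsMemory;
    }

    public static void main(String[] args) {
        // 64k values at an average of 16 bytes each.
        System.out.println(computeVariableLengthVectorMemory(false, 16, 64 * 1024)); // prints 1310720
        System.out.println(computeVariableLengthVectorMemory(true, 16, 64 * 1024));  // prints 1376256
    }
}
```

With 64k values at 16 bytes each, the data portion alone is exactly one MB; the offsets and nullability markers account for the rest of the estimate.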
-