Class BatchValidator
Drill does not clearly define how to handle a batch of zero records. Offset vectors normally have one more entry than the record count. If a batch has 1 record, the offset vector has 2 entries. The entry at index 0 is always 0; the entry at index 1 marks the end of record 0.
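The layout described above can be illustrated with a minimal sketch. It uses a plain int array rather than Drill's vector classes, and the names OffsetLayoutSketch and buildOffsets are hypothetical, chosen only for illustration:

```java
import java.util.Arrays;

class OffsetLayoutSketch {
  // Hypothetical helper, not part of Drill: builds an offset array from
  // per-record lengths. The result has one more entry than the record count.
  static int[] buildOffsets(int[] lengths) {
    int[] offsets = new int[lengths.length + 1];
    // offsets[0] is left at 0; entry i + 1 marks the end of record i.
    for (int i = 0; i < lengths.length; i++) {
      offsets[i + 1] = offsets[i] + lengths[i];
    }
    return offsets;
  }

  public static void main(String[] args) {
    // One record of length 5 yields two offset entries.
    System.out.println(Arrays.toString(buildOffsets(new int[] {5}))); // prints [0, 5]
  }
}
```

Note that this sketch always produces n+1 entries, so a batch of zero records yields a single entry of 0. As the rest of this description explains, Drill's own vectors do not behave this consistently.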
But this gets a bit murky. If a batch has one record that contains a repeated map, and the map has no entries, then the nested offset vector usually has 0 entries, not 1.
Generalizing, when a batch has zero records, the "top-level" offset vectors sometimes have one entry and sometimes zero entries.
The simplest solution would be to enforce here that all offset vectors must have n+1 entries, where n is the row count (top-level vectors) or item count (nested vectors).
But, after fighting with the code, this seems an unattainable goal. For one thing, deserialization seems to rely on nested offset vectors having zero entries when the value count is zero.
Instead, this code assumes that any offset vector, top-level or nested, will have zero entries if the value count is zero. That is, an offset vector has either zero entries or n+1 entries.
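The entry-count rule can be sketched as a small check. This is one reading of the rule, not the validator's actual code: it accepts zero entries only when the value count is zero, and accepts n+1 entries for any value count. The names OffsetCountRuleSketch and hasValidOffsetCount are hypothetical:

```java
class OffsetCountRuleSketch {
  // Hypothetical helper, not Drill's implementation. Assumed reading of the
  // rule: an offset vector is acceptable if it has valueCount + 1 entries,
  // or if it has zero entries and the value count is zero.
  static boolean hasValidOffsetCount(int offsetEntryCount, int valueCount) {
    if (offsetEntryCount == valueCount + 1) {
      return true;
    }
    return valueCount == 0 && offsetEntryCount == 0;
  }

  public static void main(String[] args) {
    System.out.println(hasValidOffsetCount(2, 1)); // one record, two entries
    System.out.println(hasValidOffsetCount(0, 0)); // empty vector, no entries
    System.out.println(hasValidOffsetCount(0, 1)); // one record, no entries
  }
}
```

Under this reading the three calls report true, true, and false. A stricter check would reject a single entry when the value count is zero; the description leaves open which of the two the validator enforces.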
-
Nested Class Summary
Modifier and Type    Class    Description
static class
static interface
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type    Method    Description
static void
static boolean       validate(RecordBatch batch)
static boolean       validate(VectorAccessible batch)
void                 validateBatch(VectorAccessible batch, int rowCount)
-
Field Details
-
LOG_TO_STDOUT
public static final boolean LOG_TO_STDOUT
-
MAX_ERRORS
public static final int MAX_ERRORS
-
-
Constructor Details
-
BatchValidator
-
-
Method Details
-
validate
-
validate
-
validate
-
validateBatch
-