Class OutputBatchBuilder
Handles maps, which can overlap at the map level (two inputs can hold a map column named `m`, say), but the map members must be disjoint. Applies the same rule recursively to nested maps.
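The overlap rule can be sketched as follows. This is a minimal illustration, not Drill's code: nested `java.util.Map` instances stand in for map columns, and `MapMergeSketch` and `merge` are hypothetical names. A name that appears in both inputs is accepted only when both values are maps, in which case the same check is applied to their members.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: nested maps stand in for Drill map columns. */
final class MapMergeSketch {

  /**
   * Merges two tuples. A name present in both inputs is legal only when
   * both values are maps; the rule is then applied recursively.
   */
  @SuppressWarnings("unchecked")
  static Map<String, Object> merge(Map<String, Object> a, Map<String, Object> b) {
    // Shallow copy keeps the member order of the first input.
    Map<String, Object> out = new LinkedHashMap<>(a);
    for (Map.Entry<String, Object> e : b.entrySet()) {
      String name = e.getKey();
      if (!out.containsKey(name)) {
        out.put(name, e.getValue());
      } else if (out.get(name) instanceof Map && e.getValue() instanceof Map) {
        // Maps may overlap at the map level: merge their members.
        out.put(name, merge((Map<String, Object>) out.get(name),
            (Map<String, Object>) e.getValue()));
      } else {
        // Members themselves must be disjoint.
        throw new IllegalStateException("Duplicate member: " + name);
      }
    }
    return out;
  }
}
```

Merging `{m={a=1}}` with `{m={b=2}}` succeeds in this sketch, while two inputs that both supply `m.a` are rejected.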
Maps must be built with members in the same order as the corresponding schema. Though maps are usually thought of as unordered name/value pairs, they are actually tuples, with both a name and a defined ordering.
This code uses a name lookup in maps because the semantics of maps do not guarantee a uniform numbering of members from 0 to n-1, where n is the number of map members. Map members are ordered, but the ordinal used by the map vector is not necessarily sequential.
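A rough sketch of the name lookup described above, under simplifying assumptions: strings stand in for child vectors, and `MemberLookupSketch` and `inSchemaOrder` are hypothetical names, not part of Drill. The schema supplies the order; each member is then found by name, so the ordinals used by the map vector never matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: strings stand in for the child vectors of a map. */
final class MemberLookupSketch {

  /** Returns the children in schema order, located by name rather than by ordinal. */
  static List<String> inSchemaOrder(List<String> schemaNames,
      Map<String, String> childrenByName) {
    List<String> ordered = new ArrayList<>();
    for (String name : schemaNames) {
      String child = childrenByName.get(name);
      if (child == null) {
        throw new IllegalStateException("Missing map member: " + name);
      }
      ordered.add(child);
    }
    return ordered;
  }
}
```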
Once the output container is built, the same value vectors reside in the input and output containers. This works because Drill requires vector persistence: the same vectors must be presented downstream in every batch until a schema change occurs.
Projection
To visualize projection, assume we have numbered table columns and lettered static columns (implicit, null, or partition columns):
[ 1 | 2 | 3 | 4 ] Table columns in table order
[ A | B | C ] Static columns
Now, we wish to project them into select order.
Let's say that the SELECT clause looked like this, with "t" indicating table columns:
SELECT t2, t3, C, B, t1, A, t2 ...
Then the projection looks like this:
[ 2 | 3 | C | B | 1 | A | 2 ]
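The projection above can be reproduced with a small sketch. It assumes each select-list entry has already been resolved to a (source, offset) pair; `ProjectionSketch`, `TABLE`, and `STATIC` are hypothetical names, and strings stand in for vectors.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: strings stand in for vectors. */
final class ProjectionSketch {
  static final int TABLE = 0;   // index of the table-column source
  static final int STATIC = 1;  // index of the static-column source

  /** Builds the output row by following each {source, offset} pair in select order. */
  static List<String> project(List<List<String>> sources, int[][] selectOrder) {
    List<String> output = new ArrayList<>();
    for (int[] ref : selectOrder) {
      output.add(sources.get(ref[0]).get(ref[1]));
    }
    return output;
  }
}
```

For `SELECT t2, t3, C, B, t1, A, t2`, the pairs would be `{TABLE,1}, {TABLE,2}, {STATIC,2}, {STATIC,1}, {TABLE,0}, {STATIC,0}, {TABLE,1}` with zero-based offsets. The same column may be referenced more than once.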
Often, not all table columns are projected. In this case, the
result set loader presents the full table schema to the reader,
but actually writes only the projected columns. Suppose we
have:
SELECT t3, C, B, t1, A ...
Then the abbreviated table schema looks like this:
[ 1 | 3 ]
Note that table columns retain their table ordering.
The projection looks like this, with the numbers now read as positions in the abbreviated schema (t3 is 2, t1 is 1):
[ 2 | C | B | 1 | A ]
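The abbreviated case can be sketched as below. This assumes the numbers in the projection row are 1-based positions within the abbreviated schema, which is an interpretation of the example rather than something the text states; `AbbreviatedSchemaSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of an abbreviated table schema and its projection. */
final class AbbreviatedSchemaSketch {

  /** Keeps only the projected table columns, preserving table order. */
  static List<String> abbreviate(List<String> tableSchema, List<String> selectList) {
    Set<String> projected = new HashSet<>(selectList);
    List<String> result = new ArrayList<>();
    for (String col : tableSchema) {
      if (projected.contains(col)) {
        result.add(col);
      }
    }
    return result;
  }

  /**
   * Labels each select item: table columns become their 1-based position in
   * the abbreviated schema (assumption); static columns keep their letter.
   */
  static List<String> project(List<String> abbreviated, List<String> selectList) {
    List<String> row = new ArrayList<>();
    for (String item : selectList) {
      int index = abbreviated.indexOf(item);
      row.add(index >= 0 ? String.valueOf(index + 1) : item);
    }
    return row;
  }
}
```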
The projector is created once per schema, then can be reused for any number of batches.
Merging is done in one of two ways, depending on the input source:
- For the table loader, the merger discards any data in the output, then exchanges the buffers from the input columns to the output, leaving the projected input columns empty. Unprojected columns must be cleared by the caller.
- For implicit and null columns, the output vector is identical to the input vector.
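The two merge styles can be contrasted with a toy model. `MergeSketch` and `Vec` are hypothetical stand-ins (a `Vec` simply owns an `int[]` buffer), so this only illustrates the ownership transfer, not Drill's vector classes.

```java
/** Hypothetical sketch: Vec stands in for a value vector that owns a buffer. */
final class MergeSketch {

  static final class Vec {
    int[] buffer;

    Vec(int[] buffer) {
      this.buffer = buffer;
    }

    /** Drops any data held by this vector. */
    void clear() {
      buffer = new int[0];
    }

    /** Swaps buffers with the other vector. */
    void exchange(Vec other) {
      int[] tmp = buffer;
      buffer = other.buffer;
      other.buffer = tmp;
    }
  }

  /** Table-loader style: discard output data, then move the input buffer across. */
  static void mergeByExchange(Vec input, Vec output) {
    output.clear();
    output.exchange(input);
    // The input now holds the (empty) buffer that the output had after clear().
  }

  /** Implicit/null-column style: the output simply is the input vector. */
  static Vec mergeByIdentity(Vec input) {
    return input;
  }
}
```

In the exchange case the two vectors remain distinct objects and only the data moves; in the identity case no data moves at all.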
Nested Class Summary
- static class: Describes an input batch with a schema and a vector container.
- static class: Source map as a map schema and map vector.
Constructor Summary
- OutputBatchBuilder(TupleMetadata outputSchema, List<OutputBatchBuilder.BatchSource> sources, BufferAllocator allocator)
Method Summary
- void close(): Release per-reader resources.
- protected void defineSourceBatchMapping(TupleMetadata schema, int source): Define the mapping for one of the sources.
- ValueVector getVector(org.apache.drill.exec.physical.impl.scan.v3.lifecycle.OutputBatchBuilder.VectorSource source)
- void load(int rowCount)
Constructor Details
OutputBatchBuilder
public OutputBatchBuilder(TupleMetadata outputSchema, List<OutputBatchBuilder.BatchSource> sources, BufferAllocator allocator)
Method Details
defineSourceBatchMapping
protected void defineSourceBatchMapping(TupleMetadata schema, int source)
Define the mapping for one of the sources. Mappings are stored in output order as a set of (source, offset) pairs.
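One way to picture the (source, offset) pairs is the sketch below. It is not the actual implementation: column names as strings replace `TupleMetadata`, and the class `SourceMappingSketch` with its fields is hypothetical. Each source column is looked up by name in the output schema, and its pair is stored in the matching output slot.

```java
import java.util.List;

/** Hypothetical sketch: column names stand in for TupleMetadata. */
final class SourceMappingSketch {
  final List<String> outputSchema;
  /** One slot per output column, in output order: {source, offset}, or null if unmapped. */
  final int[][] mapping;

  SourceMappingSketch(List<String> outputSchema) {
    this.outputSchema = outputSchema;
    this.mapping = new int[outputSchema.size()][];
  }

  /** Records where each column of one source lands in the output. */
  void defineSourceBatchMapping(List<String> sourceSchema, int source) {
    for (int offset = 0; offset < sourceSchema.size(); offset++) {
      int outputIndex = outputSchema.indexOf(sourceSchema.get(offset));
      if (outputIndex == -1) {
        continue; // column is not part of the output
      }
      if (mapping[outputIndex] != null) {
        throw new IllegalStateException(
            "Column supplied by two sources: " + sourceSchema.get(offset));
      }
      mapping[outputIndex] = new int[] {source, offset};
    }
  }
}
```

With an output schema of `[a, x, b]`, a source 0 of `[b, a]`, and a source 1 of `[x]`, the stored pairs in output order are `(0, 1)`, `(1, 0)`, `(0, 0)`.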
getVector
public ValueVector getVector(org.apache.drill.exec.physical.impl.scan.v3.lifecycle.OutputBatchBuilder.VectorSource source)
load
public void load(int rowCount)
outputContainer
close
public void close()
Release per-reader resources. Does not release the actual value vectors as those reside in a cache.