Class ListVector
- All Implemented Interfaces:
- Closeable, AutoCloseable, Iterable<ValueVector>, ContainerVectorLike, RepeatedValueVector, ValueVector
Why this odd behavior? The LIST type apparently attempts to model certain JSON types. In JSON, we can have lists like this:
 {a: [null, null]}
 {a: [10, "foo"]}
 {a: [{name: "fred", balance: 10}, null]}
 {a: null}
- A list can be null. (In Drill, an array can be empty, but not null.)
- A list element can be null. (In Drill, a repeated type is an array of non-nullable elements, so list elements can't be null.)
- A list can contain heterogeneous types. (In Drill, repeated types are arrays of a single type.)
- Allows the list value for a row to be null. (To handle the {list: null} case.)
- Allows list elements to be null. (To handle the {list: [10, null, 30]} case.)
- Allows the list to be a single type. (To handle the list of nullable ints above.)
- Allows the list to be of multiple types, by creating a list of UNIONs. (To handle the {list: ["fred", 10]} case.)
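The four cases above can be told apart by looking at the list value and the types of its non-null elements. The following is a minimal sketch in plain Java, assuming a row's list value is modeled as a `java.util.List`; the `ListShapes` class, its `Shape` enum, and `classify` are hypothetical names for illustration and are not part of the Drill API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model (not Drill API): a row's list value is a java.util.List
// that may itself be null and may hold null or mixed-type elements.
class ListShapes {

  enum Shape { NULL_LIST, NO_TYPED_ELEMENTS, SINGLE_TYPE, MULTIPLE_TYPES }

  // Classify a list value by the distinct runtime types of its non-null elements.
  static Shape classify(List<?> list) {
    if (list == null) {
      return Shape.NULL_LIST;                 // the {list: null} case
    }
    Set<Class<?>> types = new HashSet<>();
    for (Object element : list) {
      if (element != null) {                  // null elements carry no type
        types.add(element.getClass());
      }
    }
    if (types.isEmpty()) {
      return Shape.NO_TYPED_ELEMENTS;         // empty list, or only nulls
    }
    return types.size() == 1 ? Shape.SINGLE_TYPE : Shape.MULTIPLE_TYPES;
  }

  public static void main(String[] args) {
    System.out.println(classify(null));                        // {list: null}
    System.out.println(classify(Arrays.asList(10, null, 30))); // {list: [10, null, 30]}
    System.out.println(classify(Arrays.asList("fred", 10)));   // {list: ["fred", 10]}
    System.out.println(classify(Arrays.asList(null, null)));   // {a: [null, null]}
  }
}
```

In this model, only the `MULTIPLE_TYPES` shape would call for a list of UNIONs; the other shapes fit a single nullable element type.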
Background
The above is the theory. The problem is that the goals are very difficult to achieve, and the code here does not quite achieve them. The code here is difficult to maintain and understand. The first thing to understand is that union vectors are broken in most operators, and so major bugs remain in union and list vectors that have not had to be fixed. Recent revisions attempt to fix or work around some of the bugs, but many remain.
Unions have a null bit for the union itself. That is, a union can be an Int, say, or a Varchar, or null. Oddly, the Int and Varchar can also be null (we use nullable vectors so we can mark the unused values as null). So, we have a two-level null bit. The most logical way to interpret it is that a union value can be:
- Untyped null, if the type is not set and the null bit (really, the isSet bit) is unset.
- Typed null, if the type is set and EITHER the union's isSet bit is unset OR the union's isSet bit is set, but the data vector's isSet bit is not set.
It is not clear in the code which convention is assumed, or if different code does it differently.
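The two-level null interpretation can be written down as a small decision function. This is a sketch of the interpretation described here, not of what the Drill code does: `UnionNullSketch`, `State`, and `interpret` are hypothetical names, and the type is modeled as a `String` that is `null` when no type is set.

```java
// Hypothetical model of a single union slot. None of these names are Drill API.
class UnionNullSketch {

  enum State { UNTYPED_NULL, TYPED_NULL, VALUE }

  // type == null stands for "type not set"; the two flags stand for the
  // union's own isSet bit and the isSet bit of the nullable member vector.
  static State interpret(String type, boolean unionIsSet, boolean dataIsSet) {
    if (type == null) {
      // The text defines only the case where the union's bit is also unset;
      // treating every untyped slot as an untyped null is an assumption.
      return State.UNTYPED_NULL;
    }
    if (!unionIsSet || !dataIsSet) {
      // Either null bit being unset yields a typed null.
      return State.TYPED_NULL;
    }
    return State.VALUE;
  }

  public static void main(String[] args) {
    System.out.println(interpret(null, false, false));  // no type, union bit unset
    System.out.println(interpret("INT", false, true));  // union bit unset
    System.out.println(interpret("INT", true, false));  // member vector bit unset
    System.out.println(interpret("INT", true, true));   // both bits set
  }
}
```

Both typed-null conventions map to the same state in this sketch, which is exactly the ambiguity: code that checks only one of the two bits would disagree with code that checks the other.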
Now, add all that to a list. A list can be a list of something (ints, say, or maps). When the list is a list of maps, the entire value for a row can be null, but individual maps can't be null. In a list of ints, however, individual ints can be null (because we use a nullable int vector).
 Another issue is that the metadata for a list should reflect the structure
 of the list. The MaterializedField contains a child field, which
 points to the element of the list. If that child is a UNION, then the UNION's
 MaterializedField contains subtypes for each type in the
 union. Now, note that the LIST's metadata contains the child, so we need
 to update the LIST's MaterializedField each time we add a
 type to the UNION. And, since the LIST is part of a row or map, then we
 have to update the metadata in those objects to propagate the change.
 
 The problem is that the original design assumed that
 MaterializedField is immutable. The above shows that it
 clearly is not. So, we have a tension between the original immutable
 design and the necessity of mutating the MaterializedField
 to keep everything in sync.
 
 Of course, there is another solution: don't include subtypes and children
 in the MaterializedField. Then we don't have the propagation
 problem.
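The propagation problem can be illustrated with a toy immutable metadata type. This is a sketch under stated assumptions: `FieldMeta` is a hypothetical stand-in and is not Drill's MaterializedField, and the field names and type strings are made up for the example. With immutable metadata, adding a subtype to the union produces a new union object, so the list and the row that point at the old one go stale until they are rebuilt as well.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MetadataPropagationSketch {

  // Hypothetical immutable stand-in for field metadata; not Drill's MaterializedField.
  static final class FieldMeta {
    final String name;
    final String type;
    final List<FieldMeta> children; // child fields, or subtypes when type is UNION

    FieldMeta(String name, String type, List<FieldMeta> children) {
      this.name = name;
      this.type = type;
      this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    // Returns a copy with one more child; the receiver is left unchanged.
    FieldMeta withChild(FieldMeta child) {
      List<FieldMeta> copy = new ArrayList<>(children);
      copy.add(child);
      return new FieldMeta(name, type, copy);
    }

    // Returns a copy whose child at the given index is replaced.
    FieldMeta withChildAt(int index, FieldMeta replacement) {
      List<FieldMeta> copy = new ArrayList<>(children);
      copy.set(index, replacement);
      return new FieldMeta(name, type, copy);
    }
  }

  static FieldMeta leaf(String name, String type) {
    return new FieldMeta(name, type, Collections.<FieldMeta>emptyList());
  }

  // Number of union subtypes visible when walking row -> list -> union.
  static int subtypesSeenFromRow(FieldMeta row) {
    return row.children.get(0).children.get(0).children.size();
  }

  public static void main(String[] args) {
    FieldMeta union = new FieldMeta("element", "UNION",
        Collections.singletonList(leaf("int", "INT")));
    FieldMeta list = new FieldMeta("a", "LIST", Collections.singletonList(union));
    FieldMeta row = new FieldMeta("row", "MAP", Collections.singletonList(list));

    // Adding a subtype creates a new union; the old list and row are stale.
    FieldMeta widened = union.withChild(leaf("varchar", "VARCHAR"));
    System.out.println(subtypesSeenFromRow(row));

    // Every enclosing level has to be rebuilt to see the change.
    FieldMeta freshRow = row.withChildAt(0, list.withChildAt(0, widened));
    System.out.println(subtypesSeenFromRow(freshRow));
  }
}
```

The alternative mentioned above, leaving subtypes and children out of the metadata, would remove the need for the rebuild step entirely.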
 
The code for this class kind of punts on the issue: the metadata is not maintained and can get out of sync. This makes the metadata useless: one must recover actual structure by traversing vectors. There was an attempt to fix this, but doing so changes the metadata structure, which broke clients. So, we have to live with broken metadata and work around the issues. The metadata sync issue exists in many places, but is most obvious in the LIST vector because of the sheer complexity in this class.
This is why the code notes say that this is a mess.
It is hard to simply fix the bugs because this is a design problem. If the list and union vectors don't need to work (they barely work today), then any design is fine. See the list of JIRA tickets below for more information.
Fundamental issue: should Drill support unions and lists? Is the current approach compatible with SQL? Is there a better approach? If such changes are made, they are breaking changes, and so must be done as part of a major version, such as the much-discussed "Drill 2.0". Or, perhaps as part of a conversion to use Apache Arrow, which also would be a major breaking change.
- 
Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.drill.exec.vector.complex.BaseRepeatedValueVector:
- BaseRepeatedValueVector.BaseRepeatedAccessor, BaseRepeatedValueVector.BaseRepeatedMutator, BaseRepeatedValueVector.BaseRepeatedValueVectorTransferPair<T extends BaseRepeatedValueVector>
- Nested classes/interfaces inherited from class org.apache.drill.exec.vector.BaseValueVector:
- BaseValueVector.BaseAccessor, BaseValueVector.BaseMutator
- Nested classes/interfaces inherited from interface org.apache.drill.exec.vector.complex.RepeatedValueVector:
- RepeatedValueVector.RepeatedAccessor, RepeatedValueVector.RepeatedMutator
- 
Field Summary
- Fields inherited from class org.apache.drill.exec.vector.complex.BaseRepeatedValueVector:
- DATA_VECTOR_NAME, DEFAULT_DATA_VECTOR, offsets, OFFSETS_FIELD, OFFSETS_VECTOR_NAME, vector
- Fields inherited from class org.apache.drill.exec.vector.BaseValueVector:
- allocator, field, INITIAL_VALUE_ALLOCATION, MAX_ALLOCATION_SIZE
- Fields inherited from interface org.apache.drill.exec.vector.complex.RepeatedValueVector:
- DEFAULT_REPEAT_PER_RECORD
- Fields inherited from interface org.apache.drill.exec.vector.ValueVector:
- BITS_VECTOR_NAME, MAX_BUFFER_SIZE, MAX_ROW_COUNT, MIN_ROW_COUNT, VALUES_VECTOR_NAME
- 
Constructor Summary
- ListVector(MaterializedField field, BufferAllocator allocator, CallBack callBack)
- 
Method Summary
- <T extends ValueVector> AddOrGetResult<T> addOrGetVector(VectorDescriptor descriptor): Creates and adds a child vector if none with the same name exists, else returns the vector instance.
- void allocateNew(): Allocate new buffers.
- boolean allocateNewSafe(): Allocates new buffers.
- void clear(): Release the underlying DrillBuf and reset the ValueVector to empty.
- void collectLedgers(Set<AllocationManager.BufferLedger> ledgers): Add the ledgers underlying the buffers underlying the components of the vector to the set provided.
- convertToUnion(int allocValueCount, int valueCount): Promote to a union, preserving the existing data vector as a member of the new union.
- void copyEntry(int toIndex, ValueVector from, int fromIndex)
- void copyFrom(int inIndex, int outIndex, ListVector from)
- void copyFromSafe(int inIndex, int outIndex, ListVector from)
- fullPromoteToUnion(): Revised form of promote to union that correctly fixes up the list field metadata to match the new union type.
- getAccessor(): Returns an accessor that is used to read from this vector instance.
- DrillBuf[] getBuffers(boolean clear): Return the underlying buffers associated with this vector.
- int getBufferSize(): Returns the number of bytes that is used by this vector instance.
- protected UserBitShared.SerializedField.Builder getMetadataBuilder()
- getMutator(): Returns a mutator that is used to write to this vector instance.
- int getPayloadByteCount(int valueCount): Return the number of value bytes consumed by actual data.
- getReader(): Returns a field reader that supports reading values from this vector.
- getTransferPair(String ref, BufferAllocator allocator)
- boolean isEmptyType()
- void load(UserBitShared.SerializedField metadata, DrillBuf buffer): Load the data provided in the buffer.
- makeTransferPair(ValueVector target): Returns a new transfer pair that is used to transfer underlying buffers into the target vector.
- promoteToUnion(): Promote the list to a union.
- void setChildVector(ValueVector childVector)
- void transferTo(ListVector target)
- Methods inherited from class org.apache.drill.exec.vector.complex.BaseRepeatedValueVector:
- exchange, getAllocatedSize, getBufferSizeFor, getOffsetVector, getValueCapacity, iterator, replaceDataVector, setInitialCapacity, size
- Methods inherited from class org.apache.drill.exec.vector.BaseValueVector:
- checkBufRefs, close, fillBitsVector, getAllocator, getField, getField, getMetadata, getTransferPair, toNullable, toString
- Methods inherited from class java.lang.Object:
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
- Methods inherited from interface java.lang.Iterable:
- forEach, spliterator
- Methods inherited from interface org.apache.drill.exec.vector.ValueVector:
- close, getAllocator, getField, getMetadata, getTransferPair, toNullable
- 
Field Details
UNION_VECTOR_NAME
 
 
- 
- 
Constructor Details
ListVector
 
- 
- 
Method Details
getWriter
- 
allocateNew
Description copied from interface: ValueVector
Allocate new buffers. ValueVector implements logic to determine how much to allocate.
- Throws:
- OutOfMemoryException - Thrown if no memory can be allocated.
 
- 
transferTo
- 
copyFromSafe
- 
copyFrom
- 
copyEntry
- 
getDataVector
- Specified by:
- getDataVector in interface RepeatedValueVector
- Overrides:
- getDataVector in class BaseRepeatedValueVector
- Returns:
- the underlying data vector or null if none exists.
 
- 
getBitsVector
- 
getTransferPair
- 
makeTransferPair
Description copied from interface: ValueVector
Returns a new transfer pair that is used to transfer underlying buffers into the target vector.
- 
getAccessor
Description copied from interface: ValueVector
Returns an accessor that is used to read from this vector instance.
- 
getMutator
Description copied from interface: ValueVector
Returns a mutator that is used to write to this vector instance.
- 
getReader
Description copied from interface: ValueVector
Returns a field reader that supports reading values from this vector.
- 
allocateNewSafe
public boolean allocateNewSafe()
Description copied from interface: ValueVector
Allocates new buffers. ValueVector implements logic to determine how much to allocate.
- Specified by:
- allocateNewSafe in interface ValueVector
- Overrides:
- allocateNewSafe in class BaseRepeatedValueVector
- Returns:
- true if allocation was successful.
 
- 
getMetadataBuilder
- Overrides:
- getMetadataBuilder in class BaseRepeatedValueVector
 
- 
addOrGetVector
Description copied from interface: ContainerVectorLike
Creates and adds a child vector if none with the same name exists, else returns the vector instance.
- Specified by:
- addOrGetVector in interface ContainerVectorLike
- Overrides:
- addOrGetVector in class BaseRepeatedValueVector
- Parameters:
- descriptor - vector descriptor
- Returns:
- result of operation wrapping vector corresponding to the given descriptor and whether it's newly created
 
- 
getBufferSize
public int getBufferSize()
Description copied from interface: ValueVector
Returns the number of bytes that is used by this vector instance. This is a bit of a misnomer. Returns the number of bytes used by data in this instance.
- Specified by:
- getBufferSize in interface ValueVector
- Overrides:
- getBufferSize in class BaseRepeatedValueVector
 
- 
clear
public void clear()
Description copied from interface: ValueVector
Release the underlying DrillBuf and reset the ValueVector to empty.
- Specified by:
- clear in interface ValueVector
- Overrides:
- clear in class BaseRepeatedValueVector
 
- 
getBuffers
Description copied from interface: ValueVector
Return the underlying buffers associated with this vector. Note that this doesn't impact the reference counts for these buffers, so it should only be used for in-context access. Also note that these buffers change regularly, thus external classes shouldn't hold a reference to them (unless they change them).
- Specified by:
- getBuffers in interface ValueVector
- Overrides:
- getBuffers in class BaseRepeatedValueVector
- Parameters:
- clear - Whether to clear the vector before returning; the buffers will still be refcounted, but the returned array will be the only reference to them
- Returns:
- The underlying buffers that are used by this vector instance.
 
- 
isEmptyType
public boolean isEmptyType()
- 
setChildVector
- Overrides:
- setChildVector in class BaseRepeatedValueVector
 
- 
promoteToUnion
Promote the list to a union. Called from old-style writers. This implementation relies on the caller to set the types vector for any existing values. This method simply clears the existing vector.
- Returns:
- the new union vector
 
- 
fullPromoteToUnion
Revised form of promote to union that correctly fixes up the list field metadata to match the new union type. Since this form handles both the vector and metadata revisions, it is a "full" promotion.
- Returns:
- the new union vector
 
- 
convertToUnion
Promote to a union, preserving the existing data vector as a member of the new union. Back-fill the types vector with the proper type value for existing rows.
- Returns:
- the new union vector
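The difference between a promotion that leaves the types vector to the caller and a conversion that back-fills it can be sketched with plain arrays. This is an illustration only and does not show the actual ListVector implementation: the `byte[]` standing in for the types vector, the type codes, and both method names are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical arrays stand in for vectors; none of these names are Drill API.
class UnionPromotionSketch {

  static final byte TYPE_UNSET = 0; // assumed code for "no type recorded"
  static final byte TYPE_INT = 1;   // assumed code for the pre-existing member type

  // Clearing style: the new types array records nothing, so the caller
  // must set a type for every existing value afterwards.
  static byte[] typesAfterClearingPromotion(int allocValueCount) {
    return new byte[allocValueCount]; // Java zero-fills, i.e. all TYPE_UNSET
  }

  // Preserving style: the first valueCount entries are back-filled with
  // the type code of the data that already existed.
  static byte[] typesAfterPreservingConversion(
      int allocValueCount, int valueCount, byte existingType) {
    if (valueCount < 0 || valueCount > allocValueCount) {
      throw new IllegalArgumentException("valueCount out of range");
    }
    byte[] types = new byte[allocValueCount];
    Arrays.fill(types, 0, valueCount, existingType);
    return types;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(typesAfterClearingPromotion(5)));
    System.out.println(Arrays.toString(typesAfterPreservingConversion(5, 3, TYPE_INT)));
  }
}
```

In the preserving style, rows written before the conversion remain readable as their original type without any further work by the caller.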
 
- 
collectLedgers
Description copied from interface: ValueVector
Add the ledgers underlying the buffers underlying the components of the vector to the set provided. Used to determine actual memory allocation.
- Specified by:
- collectLedgers in interface ValueVector
- Overrides:
- collectLedgers in class BaseRepeatedValueVector
- Parameters:
- ledgers - set of ledgers to which to add ledgers for this vector
 
- 
getPayloadByteCount
public int getPayloadByteCount(int valueCount)
Description copied from interface: ValueVector
Return the number of value bytes consumed by actual data.
- Specified by:
- getPayloadByteCount in interface ValueVector
- Overrides:
- getPayloadByteCount in class BaseRepeatedValueVector
 
 
-