Class OffsetVectorWriterImpl
- All Implemented Interfaces:
ColumnWriter, ScalarWriter, ValueWriter, OffsetVectorWriter, WriterEvents, WriterPosition
Note that the lastWriteIndex tracked here corresponds to the data values; it is one less than the actual offset vector last write index due to the nature of offset vector layouts. Choosing this basis for the last write index makes roll-over processing easier, since only this writer needs to know about the +1 translation required for writing.
The states illustrated in the base class apply here as well, remembering that the end offset for a row (or array position) is written one ahead of the vector index.
The vector index does create an interesting dynamic for the child writers. From the child writer's perspective, the states described in the super class are the only states of interest. Here we want to take the perspective of the parent.
The offset vector is an implementation of a repeat level. A repeat level can occur for a single array, or for a collection of columns within a repeated map. (A repeat level also occurs for variable-width fields, but this is a bit harder to see, so let's ignore that for now.)
The key point to realize is that each repeat level introduces an isolation level in terms of indexing. That is, empty values in the outer level have no effect on indexing in the inner level. In fact, the nature of a repeated outer level means that there are no empties in the inner level.
To illustrate:
    Offset Vector          Data Vector   Indexes
 lw, v > | 10 |   - - - - - >   | X |        10
         | 12 |   - - +         | X | < lw'  11
         |    |       + - - >   |   | < v'   12
In the above, the client has just written an array of two elements
at the current write position. The data starts at offset 10 in
the data vector, and the next write will be at 12. The end offset
is written one ahead of the vector index.
From the data vector's perspective, its last-write (lw') reflects the last element written. If this is an array of scalars, then the write index is automatically incremented, as illustrated by v'. (For map arrays, the index must be incremented by calling save() on the map array writer.)
Suppose the client now skips some arrays:
    Offset Vector          Data Vector
    lw > | 10 |   - - - - - >   | X |        10
         | 12 |   - - +         | X | < lw'  11
         |    |       + - - >   |   | < v'   12
         |    |                 |   |        13
     v > |    |                 |   |        14
The last write position does not move and there are gaps in the
offset vector. The vector index points to the current row. Note
that the data vector's last write and vector indexes do not change;
this reflects the fact that the data vector's vector index
(v') matches the tail offset.
The client now writes a three-element vector:
    Offset Vector          Data Vector
         | 10 |   - - - - - >   | X |        10
         | 12 |   - - +         | X |        11
         | 12 |   - - + - - >   | Y |        12
         | 12 |   - - +         | Y |        13
 lw, v > | 12 |   - - +         | Y | < lw'  14
         | 15 |   - - - - - >   |   | < v'   15
Quite a bit just happened. The empty offset slots were back-filled
with the last write offset in the data vector. The client wrote
three values, which advanced the last write and vector indexes
in the data vector. And, the last write index in the offset
vector also moved to reflect the update of the offset vector.
Note that as a result, multiple positions in the offset vector
point to the same location in the data vector. This is fine; we
compute the number of entries as the difference between two successive
offset vector positions, so the empty positions have become 0-length
arrays.
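The three states above can be replayed with a small stand-alone model. This is only a sketch: the class OffsetVectorSketch and its methods are hypothetical names chosen for illustration, not part of the Drill API, and a plain int array stands in for the real offset vector.

```java
import java.util.Arrays;

/** Toy model of an offset vector; all names here are hypothetical. */
final class OffsetVectorSketch {
    private final int[] offsets;      // offsets[row + 1] holds the end offset of that row
    private int lastWriteIndex = -1;  // last row whose end offset was written

    OffsetVectorSketch(int rowCapacity, int startOffset) {
        offsets = new int[rowCapacity + 1];
        offsets[0] = startOffset;
    }

    /** Record the end offset for rowIndex, back-filling any skipped rows. */
    void setEndOffset(int rowIndex, int endOffset) {
        int lastOffset = offsets[lastWriteIndex + 1];
        // Skipped rows repeat the last offset, so they become 0-length arrays.
        for (int row = lastWriteIndex + 1; row < rowIndex; row++) {
            offsets[row + 1] = lastOffset;
        }
        offsets[rowIndex + 1] = endOffset;  // written one ahead of the row index
        lastWriteIndex = rowIndex;
    }

    /** Entry count of a row: difference of two successive offsets. */
    int length(int rowIndex) {
        return offsets[rowIndex + 1] - offsets[rowIndex];
    }

    int[] snapshot(int rowCount) {
        return Arrays.copyOf(offsets, rowCount + 1);
    }

    public static void main(String[] args) {
        OffsetVectorSketch vector = new OffsetVectorSketch(5, 10);
        vector.setEndOffset(0, 12);  // two-element array at data offsets 10 and 11
        vector.setEndOffset(4, 15);  // three rows skipped, then three elements
        System.out.println(Arrays.toString(vector.snapshot(5)));
        // prints [10, 12, 12, 12, 12, 15]
    }
}
```

The final snapshot matches the third diagram: positions 1 through 4 all hold 12, and length() reports 0 for the three skipped rows.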
Note that, for an array of scalars, when overflow occurs, we need only worry about two states in the data vector. Either data has been written for the row (as in the third example above), and so must be moved to the roll-over vector, or no data has been written and no move is needed. We never have to worry about missing values because they cannot occur in the data vector.
See ObjectArrayWriter for information about arrays of
maps (arrays of multiple columns.)
Empty Slots
The offset vector writer handles empty slots in two distinct ways. First, the writer handles its own empties. Suppose that this is the offset vector for a VarChar column. Suppose we write "Foo" in the first slot. Now we have an offset vector with the values [ 0 3 ]. Suppose the client skips several rows and next writes at slot 5. We must copy the latest offset (3) into all the skipped slots: [ 0 3 3 3 3 3 ]. The result is a set of four empty VarChars in positions 1, 2, 3 and 4. (Here, remember that the offset vector always has one more value than the number of rows.)
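To see why the repeated offsets read back as empty VarChars, the following sketch decodes a byte buffer using an offset array. The class VarCharDecodeSketch and its decode method are hypothetical helpers for illustration only; they assume UTF-8 data and are not how Drill's readers are implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative decoder; not part of Drill. */
final class VarCharDecodeSketch {
    /** Row i spans data[offsets[i]] up to, but not including, data[offsets[i + 1]]. */
    static List<String> decode(byte[] data, int[] offsets) {
        List<String> values = new ArrayList<>();
        // The offset array has one more entry than there are rows.
        for (int row = 0; row + 1 < offsets.length; row++) {
            int start = offsets[row];
            int length = offsets[row + 1] - start;
            values.add(new String(data, start, length, StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] data = "Foo".getBytes(StandardCharsets.UTF_8);
        int[] offsets = {0, 3, 3, 3, 3, 3};
        System.out.println(decode(data, offsets).size());  // prints 5
    }
}
```

With the offsets [ 0 3 3 3 3 3 ], row 0 decodes to "Foo" and rows 1 through 4 decode to empty strings, because each has a length of 3 - 3 = 0.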
The second way to fill empties is in the data vector. The data vector may choose
to fill the four "empty" slots with a value, say "X". In this case, it is up to
the data vector to fill in the values, calling into this vector to set each
offset. Note that when doing this, the calls are a bit different than for writing
a regular value because we want to write at the "last write position", not the
current row position. See BaseVarWidthWriter for an example.
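The second approach can be sketched as follows, assuming a default value of "X". The class DefaultFillSketch is a hypothetical stand-in that merges the data vector and offset vector into one object for brevity; it is not the logic of BaseVarWidthWriter, only an illustration of filling at the last write position instead of the current row position.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: fills skipped rows with a default value. */
final class DefaultFillSketch {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final List<Integer> offsets = new ArrayList<>();
    private final byte[] emptyValue;

    DefaultFillSketch(String defaultValue) {
        emptyValue = defaultValue.getBytes(StandardCharsets.UTF_8);
        offsets.add(0);  // the offset vector starts with a single 0
    }

    void write(int rowIndex, String value) {
        // Fill each skipped row, always appending just past the last written row.
        while (offsets.size() - 1 < rowIndex) {
            append(emptyValue);
        }
        append(value.getBytes(StandardCharsets.UTF_8));
    }

    private void append(byte[] bytes) {
        data.write(bytes, 0, bytes.length);
        offsets.add(data.size());  // end offset of the value just appended
    }

    String data() {
        return new String(data.toByteArray(), StandardCharsets.UTF_8);
    }

    List<Integer> offsets() {
        return offsets;
    }
}
```

Writing "Foo" at row 0 and then "Bar" at row 5 leaves the data as FooXXXXBar, with a distinct offset for each of the four filled rows.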
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractFixedWidthWriter
AbstractFixedWidthWriter.BaseFixedWidthWriter, AbstractFixedWidthWriter.BaseIntWriter
Nested classes/interfaces inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
AbstractScalarWriterImpl.ScalarObjectWriter
Nested classes/interfaces inherited from interface org.apache.drill.exec.vector.accessor.writer.WriterEvents
WriterEvents.ColumnWriterListener, WriterEvents.State
-
Field Summary
Fields
Modifier and Type, Field: Description
protected int nextOffset: Cached value of the end offset for the current value.
Fields inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractFixedWidthWriter
lastWriteIndex
Fields inherited from class org.apache.drill.exec.vector.accessor.writer.BaseScalarWriter
capacity, drillBuf, emptyValue, listener, MIN_BUFFER_SIZE
Fields inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
schema, vectorIndex
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method: Description
void copy(ColumnReader from): Copy a single value from the given reader, which must be of the same type as this writer.
void dump(HierarchicalFormatter format)
protected final void fillEmpties(int fillCount)
final void fillOffset(int newOffset)
int nextOffset()
void postRollover(): The vectors backing this writer rolled over.
final int prepareFill()
protected final int prepareWrite(): Return the write offset, which is one greater than the index reported by the vector index.
void preRollover(): The vectors backing this writer are about to roll over.
protected void realloc(int size)
void restartRow(): During a write to a row, rewind the current index position to restart the row.
final void reviseOffset(int newOffset)
int rowStartOffset()
void setDefaultValue(Object value): Set the default value to be used to fill empties for this writer.
final void setNextOffset(int newOffset)
final void setValue(Object value): Write value to a vector as a Java object of the "native" type for the column.
void setValueCount(int valueCount)
void skipNulls()
void startRow(): Start a new row.
void startWrite(): Start a write (batch) operation.
valueType(): Describe the type of the value.
vector()
int width()
Methods inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractFixedWidthWriter
endWrite, lastWriteIndex, mandatoryResize, resize, setBuffer, setLastWriteIndex
Methods inherited from class org.apache.drill.exec.vector.accessor.writer.BaseScalarWriter
appendBytes, bindListener, bindSchema, canExpand, nullable, overflowed, setBoolean, setBytes, setDate, setDecimal, setDouble, setFloat, setInt, setLong, setNull, setPeriod, setString, setTime, setTimestamp
Methods inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
bindIndex, endArrayValue, isProjected, rowStartIndex, saveRow, schema, type, writeIndex
Methods inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriter
conversionError, extendedType, setObject, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.drill.exec.vector.accessor.ColumnWriter
isProjected, nullable, schema, setNull, setObject, type
Methods inherited from interface org.apache.drill.exec.vector.accessor.ScalarWriter
extendedType
Methods inherited from interface org.apache.drill.exec.vector.accessor.ValueWriter
appendBytes, setBoolean, setBytes, setDate, setDecimal, setDouble, setFloat, setInt, setLong, setNull, setPeriod, setString, setTime, setTimestamp
Methods inherited from interface org.apache.drill.exec.vector.accessor.writer.WriterEvents
bindIndex, bindListener, endArrayValue, endWrite, saveRow
Methods inherited from interface org.apache.drill.exec.vector.accessor.WriterPosition
lastWriteIndex, rowStartIndex, writeIndex
-
Field Details
-
nextOffset
protected int nextOffset
Cached value of the end offset for the current value. Used primarily for variable-width columns to allow the column to be rewritten multiple times within the same row. The start offset value is updated with the end offset only when the value is committed in endValue().
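A minimal sketch of the caching idea, under the assumption that a rewrite always restarts from the last committed offset. The class CachedOffsetSketch and its setValueLength method are hypothetical; only the nextOffset and endValue names echo this page.

```java
/** Illustrative only: a cached end offset that is committed on endValue(). */
final class CachedOffsetSketch {
    private int startOffset = 0;  // committed end offset of the previous value
    private int nextOffset = 0;   // cached end offset of the current value

    /** Write, or rewrite, the current value; it always begins at startOffset. */
    void setValueLength(int length) {
        nextOffset = startOffset + length;
    }

    /** Commit the current value: the next value starts where this one ended. */
    void endValue() {
        startOffset = nextOffset;
    }

    int startOffset() {
        return startOffset;
    }

    int nextOffset() {
        return nextOffset;
    }
}
```

Calling setValueLength(5) and then setValueLength(3) before endValue() leaves nextOffset at 3, since the second call overwrites the first instead of appending to it.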
-
-
Constructor Details
-
OffsetVectorWriterImpl
-
-
Method Details
-
vector
- Specified by:
vector in class AbstractScalarWriterImpl
-
width
public int width()
- Specified by:
width in class AbstractFixedWidthWriter
-
realloc
protected void realloc(int size)
- Overrides:
realloc in class BaseScalarWriter
-
valueType
Description copied from interface: ScalarWriter
Describe the type of the value. This is a compression of the value vector type: it describes which method will return the vector value.
- Specified by:
valueType in interface ScalarWriter
- Returns:
the value type, which indicates which get method is valid for the column
-
startWrite
public void startWrite()
Description copied from interface: WriterEvents
Start a write (batch) operation. Performs any vector initialization required at the start of a batch (especially for offset vectors).
- Specified by:
startWrite in interface WriterEvents
- Overrides:
startWrite in class AbstractFixedWidthWriter
-
nextOffset
public int nextOffset()
- Specified by:
nextOffset in interface OffsetVectorWriter
-
rowStartOffset
public int rowStartOffset()
- Specified by:
rowStartOffset in interface OffsetVectorWriter
-
startRow
public void startRow()
Description copied from interface: WriterEvents
Start a new row. To be called only when a row is not active. To restart a row, call WriterEvents.restartRow() instead.
- Specified by:
startRow in interface WriterEvents
- Overrides:
startRow in class AbstractScalarWriterImpl
-
prepareWrite
protected final int prepareWrite()
Return the write offset, which is one greater than the index reported by the vector index.
- Returns:
the position in the offset vector at which to write the end offset of the current data value
-
prepareFill
public final int prepareFill()
-
fillEmpties
protected final void fillEmpties(int fillCount)
- Specified by:
fillEmpties in class AbstractFixedWidthWriter
-
setNextOffset
public final void setNextOffset(int newOffset)
- Specified by:
setNextOffset in interface OffsetVectorWriter
-
reviseOffset
public final void reviseOffset(int newOffset)
-
fillOffset
public final void fillOffset(int newOffset)
-
setValue
Description copied from interface: ValueWriter
Write value to a vector as a Java object of the "native" type for the column. This form is available only on scalar writers. The object must be of the form for the primary write method above.
Primarily to be used when the code already knows the object type.
- Specified by:
setValue in interface ValueWriter
- Parameters:
value - a value that matches the primary setter above, or null to set the column to null
-
skipNulls
public void skipNulls()
- Overrides:
skipNulls in class AbstractFixedWidthWriter
-
restartRow
public void restartRow()
Description copied from interface: WriterEvents
During a write to a row, rewind the current index position to restart the row. Done when abandoning the current row, such as when filtering out a row at read time.
- Specified by:
restartRow in interface WriterEvents
- Overrides:
restartRow in class AbstractFixedWidthWriter
-
preRollover
public void preRollover()
Description copied from interface: WriterEvents
The vectors backing this writer are about to roll over. Finish the current batch up to, but not including, the current row.
- Specified by:
preRollover in interface WriterEvents
- Overrides:
preRollover in class AbstractFixedWidthWriter
-
postRollover
public void postRollover()
Description copied from interface: WriterEvents
The vectors backing this writer rolled over. This means that data for the current row has been rolled over into a new vector. Offsets and indexes should be shifted based on the understanding that data for the current row now resides at the start of a new vector instead of its previous location elsewhere in an old vector.
- Specified by:
postRollover in interface WriterEvents
- Overrides:
postRollover in class AbstractFixedWidthWriter
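The offset shift described for roll-over can be illustrated with plain arrays. This is a sketch under simplifying assumptions, not the code of postRollover(): the class RolloverShiftSketch and its shiftOffsets helper are hypothetical, and the sketch assumes every offset from the current row's start position onward moves to the new vector.

```java
import java.util.Arrays;

/** Illustrative only: rebases the in-flight row's offsets for a new vector. */
final class RolloverShiftSketch {
    /**
     * Copies the offsets from rowStartIndex onward and subtracts the row's
     * start offset, so the row's data begins at offset 0 in the new vector.
     */
    static int[] shiftOffsets(int[] oldOffsets, int rowStartIndex) {
        int base = oldOffsets[rowStartIndex];
        int[] shifted = new int[oldOffsets.length - rowStartIndex];
        for (int i = 0; i < shifted.length; i++) {
            shifted[i] = oldOffsets[rowStartIndex + i] - base;
        }
        return shifted;
    }

    public static void main(String[] args) {
        // The current row starts at position 2 (data offset 9) and ends at 12.
        int[] shifted = shiftOffsets(new int[] {0, 4, 9, 12}, 2);
        System.out.println(Arrays.toString(shifted));  // prints [0, 3]
    }
}
```

The three data values of the in-flight row keep their length, but are now addressed as offsets 0 through 3 in the new vector.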
-
setValueCount
public void setValueCount(int valueCount)
- Overrides:
setValueCount in class AbstractFixedWidthWriter
-
dump
- Specified by:
dump in interface OffsetVectorWriter
- Specified by:
dump in interface WriterEvents
- Overrides:
dump in class AbstractFixedWidthWriter
-
setDefaultValue
Description copied from interface: ScalarWriter
Set the default value to be used to fill empties for this writer. Only valid for required writers: nullable writers set the is-set bit to 0 and set the data value to 0.
- Specified by:
setDefaultValue in interface ScalarWriter
- Parameters:
value - the value to set. Cannot be null. The type of the value must match that legal for ValueWriter.setValue(Object)
-
copy
Description copied from interface: ColumnWriter
Copy a single value from the given reader, which must be of the same type as this writer.
- Specified by:
copy in interface ColumnWriter
- Parameters:
from - reader to provide the data
-