Class BaseScalarWriter
- All Implemented Interfaces: ColumnWriter, ScalarWriter, ValueWriter, WriterEvents, WriterPosition
- Direct Known Subclasses: AbstractFixedWidthWriter, BaseVarWidthWriter
The only tricky part to this class is understanding the state of the write indexes as the write proceeds. There are two pointers to consider:
- lastWriteIndex: The position in the vector at which the client last asked us to write data. This index is maintained in this class because it depends only on the actions of this class.
- vectorIndex: The position in the vector at which we will write if the client chooses to write a value at this time. The vector index is shared by all columns at the same repeat level. It is incremented as the client steps through the write and is observed in this class each time a write occurs.
A repeat level is any of the following:
- The set of top-level scalar columns, or those within a top-level, non-repeated map, or nested to any depth within non-repeated maps rooted at the top level.
- The values for a single scalar array.
- The set of scalar columns within a repeated map, or nested within non-repeated maps within a repeated map.
Let's illustrate the states by focusing on one column. Four states can occur during a write (the fourth, Restarted, turns out to be a variant of Unwritten):
- Behind: the last write index is more than one position behind the vector index. Zero-filling will be needed to catch up to the vector index.
- Written: the last write index is the same as the vector index because the client wrote data at this position (and previous values were back-filled with nulls, empties or zeros).
- Unwritten: the last write index is one behind the vector index. This occurs when the column was written, then the client moved to the next row or array position.
- Restarted: the current row is abandoned (perhaps filtered out) and is to be rewritten. The last write position moves back one position. The Restarted state is indistinguishable from the Unwritten state: the only real difference is that the current slot (pointed to by the vector index) contains the previously written value, which must be overwritten or back-filled. This is fine, because we assume that unwritten values are garbage anyway.
     Behind      Written    Unwritten    Restarted
      |X|          |X|         |X|          |X|
  lw >|X|          |X|         |X|          |X|
      | |          |0|         |0|     lw > |0|
   v >| |  lw, v > |X|    lw > |X|      v > |X|
                           v > | |
The illustrated state transitions are:
- Suppose the state starts in Behind:
  - If the client writes a value, then the empty slot is back-filled and the state moves to Written.
  - If the client does not write a value, the state stays at Behind, and the gap of unfilled values grows.
- When in the Written state:
  - If the client saves the current row or array position, the vector index increments and we move to the Unwritten state.
  - If the client abandons the row, the last write position moves back one to recreate the Unwritten state. We've shown this state separately above just to illustrate the two transitions from Written.
- When in the Unwritten (or Restarted) states:
  - If the client writes a value, then the writer moves back to the Written state.
  - If the client skips the value, then the vector index increments again, leaving a gap, and the writer moves to the Behind state.
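The states listed above are determined entirely by the distance between the two indexes. The following is a minimal sketch of that classification, not code from Drill: the class `WriteStateSketch`, the enum `State`, and the method `classify` are hypothetical names introduced only for illustration. Restarted has no enum value because, as described above, its indexes look exactly like Unwritten.

```java
/** Illustrative only: all names here are hypothetical and not part of Drill. */
final class WriteStateSketch {

    enum State { BEHIND, WRITTEN, UNWRITTEN }

    /**
     * Classifies a column by the distance between the shared vector index
     * and the column's own last write index.
     */
    static State classify(int lastWriteIndex, int vectorIndex) {
        int gap = vectorIndex - lastWriteIndex;
        if (gap < 0) {
            // The writer never writes past the position the client is on.
            throw new IllegalStateException("last write index is ahead of the vector index");
        }
        if (gap == 0) {
            return State.WRITTEN;      // the client wrote at the current position
        }
        // One behind: written, then the client moved on. More: values were skipped.
        return gap == 1 ? State.UNWRITTEN : State.BEHIND;
    }
}
```

For example, `classify(3, 3)` yields `WRITTEN`, `classify(2, 3)` yields `UNWRITTEN`, and `classify(1, 4)` yields `BEHIND`.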
We've already noted that the Restarted state is identical to the Unwritten state (it was discussed separately just to make the flow a bit clearer). The astute reader will have noticed that the Behind state is also the same as the Unwritten state if we define the combined state as the one in which the last write position is behind the vector index.
Further, if one simply treats the gap between the last write index and the vector index as the amount (which may be zero) to back-fill, then there is just one state. This is, in fact, how the code works: it always writes to the vector index (and can do so multiple times for a single row), back-filling as necessary.
The states, then, are more for our use in understanding the algorithm. They are also very useful when working through the logic of performing a roll-over when a vector overflows.
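The single-state view can be sketched with a toy column backed by an `int[]`. This is an illustration under stated assumptions, not Drill's implementation: the class `BackFillSketch`, its methods, the zero fill value, and the doubling growth policy are all hypothetical. It shows the write always landing at the vector index after any gap has been filled, and why a restarted row is harmless.

```java
import java.util.Arrays;

/** Illustrative only: a toy int column, not Drill's vector or writer. */
final class BackFillSketch {
    private int[] values = new int[4];
    private int lastWriteIndex = -1;   // nothing written yet

    /** Writes at vectorIndex, first zero-filling any skipped positions. */
    void write(int vectorIndex, int value) {
        if (vectorIndex >= values.length) {
            // Toy growth policy (an assumption): double until the index fits.
            values = Arrays.copyOf(values, Math.max(values.length * 2, vectorIndex + 1));
        }
        // Back-fill the gap, which may be empty. Slots are filled explicitly
        // because a restarted row can leave a stale value behind.
        for (int i = lastWriteIndex + 1; i < vectorIndex; i++) {
            values[i] = 0;
        }
        values[vectorIndex] = value;   // a repeated write to the same slot overwrites
        lastWriteIndex = vectorIndex;
    }

    /** Abandons the row at vectorIndex: the last write position moves back one. */
    void restartRow(int vectorIndex) {
        lastWriteIndex = Math.min(lastWriteIndex, vectorIndex - 1);
    }

    /** Returns a copy of the values written so far, including back-filled ones. */
    int[] snapshot() {
        return Arrays.copyOf(values, lastWriteIndex + 1);
    }
}
```

Writing 7 at index 0 and 9 at index 1, restarting row 1, and then writing 5 at index 3 leaves the values 7, 0, 0, 5: the stale 9 in slot 1 is replaced by the back-fill, which is why unwritten slots can safely be treated as garbage.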
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
AbstractScalarWriterImpl.ScalarObjectWriter
Nested classes/interfaces inherited from interface org.apache.drill.exec.vector.accessor.writer.WriterEvents
WriterEvents.ColumnWriterListener, WriterEvents.State
-
Field Summary
Modifier and Type | Field | Description
protected int | capacity | Capacity, in values, of the currently allocated buffer that backs the vector.
protected DrillBuf | drillBuf |
protected byte[] | emptyValue | Value to use to fill empties.
protected WriterEvents.ColumnWriterListener | listener | Listener invoked if the vector overflows.
static final int | MIN_BUFFER_SIZE |
Fields inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
schema, vectorIndex
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
void | appendBytes(byte[] value, int len) |
void | bindListener | Bind a listener to the underlying vector writer.
void | bindSchema(ColumnMetadata schema) |
protected boolean | canExpand(int delta) | The vector is about to grow.
void | dump(HierarchicalFormatter format) |
boolean | nullable() | Whether this writer allows nulls.
protected void | overflowed() | Handle vector overflow.
protected void | realloc(int size) |
void | setBoolean(boolean value) |
protected abstract void | setBuffer() | Every change of buffer goes through this method so that the buffer address and capacity can be captured.
void | setBytes(byte[] value, int len) |
void | setDate |
void | setDecimal(BigDecimal value) |
void | setDouble(double value) |
void | setFloat(float value) |
void | setInt(int value) |
void | setLong(long value) |
void | setNull() | Set the current value to null.
void | setPeriod(org.joda.time.Period value) |
void | setString |
void | setTime |
void | setTimestamp(Instant value) |
abstract void | skipNulls() |
Methods inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
bindIndex, endArrayValue, isProjected, rowStartIndex, saveRow, schema, startRow, startWrite, type, vector, writeIndex
Methods inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriter
conversionError, extendedType, setObject, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.drill.exec.vector.accessor.ColumnWriter
copy
Methods inherited from interface org.apache.drill.exec.vector.accessor.ScalarWriter
setDefaultValue, valueType
Methods inherited from interface org.apache.drill.exec.vector.accessor.ValueWriter
setValue
Methods inherited from interface org.apache.drill.exec.vector.accessor.writer.WriterEvents
endWrite, postRollover, preRollover, restartRow
Methods inherited from interface org.apache.drill.exec.vector.accessor.WriterPosition
lastWriteIndex
-
Field Details
-
MIN_BUFFER_SIZE
public static final int MIN_BUFFER_SIZE
-
listener
Listener invoked if the vector overflows. If not provided, then the writer does not support vector overflow.
-
emptyValue
protected byte[] emptyValue
Value to use to fill empties. Must be at least as wide as each value.
-
drillBuf
-
capacity
protected int capacity
Capacity, in values, of the currently allocated buffer that backs the vector. Updated each time the buffer changes. The capacity is in values (rather than bytes) to streamline the per-write logic.
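To see why a capacity in values streamlines each write, consider the sketch below. It is a hypothetical helper, not Drill code: `CapacitySketch`, `onBufferChanged`, and `fits` are names invented for illustration, and a fixed value width is assumed. The division by the width happens once per buffer change, so the per-write check is a single integer comparison.

```java
/** Illustrative only: hypothetical helper, not part of Drill. */
final class CapacitySketch {
    private final int valueWidth;  // bytes per value, e.g. 4 for an int column
    private int capacity;          // measured in values, not bytes

    CapacitySketch(int valueWidth, int bufferBytes) {
        this.valueWidth = valueWidth;
        onBufferChanged(bufferBytes);
    }

    /** Recomputes the capacity once, whenever the backing buffer changes. */
    void onBufferChanged(int bufferBytes) {
        capacity = bufferBytes / valueWidth;
    }

    /** Per-write check: one comparison, with no multiplication by the width. */
    boolean fits(int writeIndex) {
        return writeIndex < capacity;
    }

    int capacity() {
        return capacity;
    }
}
```

With a 4-byte width and a 256-byte buffer the capacity is 64 values, so index 63 fits and index 64 does not until the buffer changes.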
-
-
Constructor Details
-
BaseScalarWriter
public BaseScalarWriter()
-
-
Method Details
-
bindListener
Description copied from interface: WriterEvents
Bind a listener to the underlying vector writer. This listener reports on vector events (overflow, growth), and so is called only when the writer is backed by a vector. The listener is ignored (and never called) for dummy (non-projected) columns. If the column is compound (such as for a nullable or repeated column, or for a map), then the writer is bound to the individual components.
- Specified by: bindListener in interface WriterEvents
- Overrides: bindListener in class AbstractScalarWriter
- Parameters: listener - the vector event listener to bind
-
bindSchema
- Overrides: bindSchema in class AbstractScalarWriterImpl
-
setBuffer
protected abstract void setBuffer()
Every change of buffer goes through this method so that the buffer address and capacity can be captured. There are only two ways to set the buffer: by binding to a vector in bindVector(), or by resizing the vector in prepareWrite().
-
realloc
protected void realloc(int size)
-
canExpand
protected boolean canExpand(int delta)
The vector is about to grow. Give the listener a chance to veto the growth and opt for overflow instead.
- Parameters: delta - the new amount of memory to allocate
- Returns: true if the vector can be grown, false if an overflow should be triggered
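The veto contract can be illustrated with stand-in types. Everything below is hypothetical: `GrowthListener` is not Drill's `WriterEvents.ColumnWriterListener`, and the fixed memory budget is an invented policy. Treating a missing listener as "growth always allowed" is an assumption drawn from the note that a writer without a listener does not support overflow.

```java
/** Illustrative only: hypothetical stand-ins, not Drill's listener interface. */
final class GrowthVetoSketch {

    /** Hypothetical listener that may veto a requested growth of delta bytes. */
    interface GrowthListener {
        boolean canExpand(int delta);
    }

    /** Example policy (an assumption): veto growth beyond a fixed byte budget. */
    static final class BudgetListener implements GrowthListener {
        private final int budget;
        private int allocated;

        BudgetListener(int budget) {
            this.budget = budget;
        }

        @Override
        public boolean canExpand(int delta) {
            if (allocated + delta > budget) {
                return false;          // veto: the caller should overflow instead
            }
            allocated += delta;
            return true;
        }
    }

    /** With no listener there is nobody to veto, so growth is allowed (assumption). */
    static boolean canExpand(GrowthListener listener, int delta) {
        return listener == null || listener.canExpand(delta);
    }
}
```

A `BudgetListener` with a budget of 100 accepts a request for 60 bytes, vetoes a second request for 60, and still accepts a later request for 40.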
-
overflowed
protected void overflowed()
Handle vector overflow. If this is an array, then there is a slim chance we may need to grow the vector immediately after overflow. Since a double overflow is not allowed, this recursive call won't continue forever.
-
skipNulls
public abstract void skipNulls()
-
nullable
public boolean nullable()
Description copied from interface: ColumnWriter
Whether this writer allows nulls. This is not as simple as checking for the TypeProtos.DataMode.OPTIONAL type in the schema. List entries are nullable if they are primitive, but not if they are maps or lists. Unions are nullable, regardless of cardinality.
- Returns: true if a call to ColumnWriter.setNull() is supported, false if not
-
setNull
public void setNull()
Description copied from interface: ColumnWriter
Set the current value to null. Support depends on the underlying implementation: only nullable types support this operation. Throws IllegalStateException if called on a non-nullable value.
-
setBoolean
public void setBoolean(boolean value)
-
setInt
public void setInt(int value)
-
setLong
public void setLong(long value)
-
setFloat
public void setFloat(float value)
-
setDouble
public void setDouble(double value)
-
setString
-
setBytes
public void setBytes(byte[] value, int len)
-
appendBytes
public void appendBytes(byte[] value, int len)
-
setDecimal
-
setPeriod
public void setPeriod(org.joda.time.Period value)
-
setDate
-
setTime
-
setTimestamp
-
dump
- Specified by: dump in interface WriterEvents
- Overrides: dump in class AbstractScalarWriterImpl
-