Package org.apache.drill.exec.record
Class BatchSchema
java.lang.Object
org.apache.drill.exec.record.BatchSchema
- All Implemented Interfaces:
Iterable<MaterializedField>
Historically, BatchSchema has been used to represent the schema of a batch. However, it does not handle complex types well. If you have a choice, use TupleMetadata instead.
-
Nested Class Summary
-
Constructor Summary
Constructor - Description
BatchSchema(BatchSchema.SelectionVectorMode selectionVector, List<MaterializedField> fields)
-
Method Summary
Modifier and Type, Method - Description
clone()
boolean equals(Object obj) - DRILL-5525: the semantics of this method are badly broken.
format() - Format the schema into a multi-line format.
getColumn(int index)
int getFieldCount()
getSelectionVectorMode()
int hashCode()
boolean isEquivalent(BatchSchema other) - Compare that two schemas are identical according to the rules defined in MaterializedField.isEquivalent(MaterializedField).
iterator()
merge(BatchSchema otherSchema) - Merge two schemas to produce a new, merged schema.
static SchemaBuilder newBuilder()
toString()
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Details
-
BatchSchema
-
-
Method Details
-
newBuilder
-
getFieldCount
public int getFieldCount() -
getColumn
-
iterator
- Specified by:
iterator in interface Iterable<MaterializedField>
-
getSelectionVectorMode
-
clone
-
toString
-
hashCode
public int hashCode() -
equals
DRILL-5525: the semantics of this method are badly broken. Caveat emptor.
This check, which is used to detect an actual schema change inside an operator's record batch, does not work for AbstractContainerVectors (such as MapVector). Each record batch stores a reference to the incoming batch schema (say S:{a: int}), and equals is then called on that stored reference and the current incoming batch schema. Internally, the schema object holds references to the MaterializedFields of the vectors in the container.
If the incoming batch schema changes, the upstream operator creates a new ValueVector in its output container with the newly detected type, and that vector has a new MaterializedField instance. A new BatchSchema object is then created for this new incoming batch (say S":{a":varchar}). The operator calling equals holds a reference to the old schema object (S), so the first check is not satisfied, and equals is then called on each MaterializedField (a.equals(a")). Because a new MaterializedField was created for the newly created vector, the equals check on the field returns false, and the schema change is detected.
Now consider a MapVector instead of an int vector, with the initial schema S:{a:{b:int, c:int}}, where the schema of map field c later changes. The map vector is still found in the container, but the child vector for field c is replaced. The new schema object is created as S":{a:{b:int, c":varchar}}. When S.equals(S") is called, it eventually calls a.equals(a), which returns true even though the schema of the child value vector c has changed. This happens because no new vector is created for field a, so its MaterializedField object reference has not changed, and the same reference appears in both the old and the new schema instances.
Hence, use the isEquivalent(BatchSchema) method instead, since the MaterializedField.isEquivalent(MaterializedField) method has been updated to remove the reference check.
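
The two scenarios described for equals can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: Field, schemaOf, and the type strings are hypothetical stand-ins written for illustration, not Drill's MaterializedField, BatchSchema, or their actual implementations. The only behavior it borrows from the description is an equals method that returns true as soon as both sides are the same object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Simplified, hypothetical model; none of these types are Drill classes.
final class ReferenceEqualsPitfall {

  /** Stand-in for a field: a name, a type, and replaceable child fields. */
  static final class Field {
    final String name;
    final String type;
    final List<Field> children = new ArrayList<>();

    Field(String name, String type) {
      this.name = name;
      this.type = type;
    }

    @Override
    public boolean equals(Object obj) {
      if (this == obj) {
        return true; // reference shortcut: the children are never inspected
      }
      if (!(obj instanceof Field)) {
        return false;
      }
      Field other = (Field) obj;
      return name.equals(other.name) && type.equals(other.type)
          && children.equals(other.children);
    }

    @Override
    public int hashCode() {
      return Objects.hash(name, type);
    }
  }

  /** A "schema" here is just a snapshot of the container's top-level field references. */
  static List<Field> schemaOf(List<Field> container) {
    return new ArrayList<>(container);
  }

  /** Top-level case: the int field a is replaced by a new varchar field object. */
  static boolean topLevelChangeLooksEqual() {
    List<Field> container = new ArrayList<>();
    container.add(new Field("a", "int"));
    List<Field> oldSchema = schemaOf(container);
    container.set(0, new Field("a", "varchar"));
    return oldSchema.equals(schemaOf(container));
  }

  /** Nested case: map field a stays the same object; only its child c is replaced. */
  static boolean nestedChangeLooksEqual() {
    Field a = new Field("a", "map");
    a.children.add(new Field("b", "int"));
    a.children.add(new Field("c", "int"));
    List<Field> container = new ArrayList<>();
    container.add(a);
    List<Field> oldSchema = schemaOf(container);
    a.children.set(1, new Field("c", "varchar"));
    return oldSchema.equals(schemaOf(container));
  }

  public static void main(String[] args) {
    System.out.println("top-level change looks equal: " + topLevelChangeLooksEqual());
    System.out.println("nested change looks equal: " + nestedChangeLooksEqual());
  }
}
```

In this model the top-level change is detected because a different field object is compared, while the nested change goes unnoticed because both snapshots hold the same parent object.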
isEquivalent
Checks whether two schemas are identical according to the rules defined in MaterializedField.isEquivalent(MaterializedField). In particular, this method requires that the fields have a 1:1 ordered correspondence in the two schemas.
- Parameters:
other - another non-null batch schema
- Returns:
true if the two schemas are equivalent according to the MaterializedField.isEquivalent(MaterializedField) rules, false otherwise
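
As an illustration of the 1:1 ordered correspondence requirement, the following sketch compares two field lists position by position. It is a minimal model, not Drill's code: Field, fields, and the name-and-type rule used for field equivalence are assumptions made for this example, and the real MaterializedField.isEquivalent(MaterializedField) rules may consider more than is shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model; none of these types are Drill classes.
final class OrderedEquivalenceSketch {

  /** Stand-in for a field, identified here only by name and type. */
  static final class Field {
    final String name;
    final String type;

    Field(String name, String type) {
      this.name = name;
      this.type = type;
    }

    /** Assumed field-level rule for this sketch: same name and same type. */
    boolean isEquivalent(Field other) {
      return name.equals(other.name) && type.equals(other.type);
    }
  }

  /** Builds a field list from alternating name and type arguments. */
  static List<Field> fields(String... namesAndTypes) {
    List<Field> result = new ArrayList<>();
    for (int i = 0; i + 1 < namesAndTypes.length; i += 2) {
      result.add(new Field(namesAndTypes[i], namesAndTypes[i + 1]));
    }
    return result;
  }

  /** Equivalent only if both lists have the same size and match at every position. */
  static boolean isEquivalent(List<Field> left, List<Field> right) {
    if (left.size() != right.size()) {
      return false;
    }
    for (int i = 0; i < left.size(); i++) {
      if (!left.get(i).isEquivalent(right.get(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Field> ab = fields("a", "int", "b", "varchar");
    List<Field> ba = fields("b", "varchar", "a", "int");
    System.out.println("same order: " + isEquivalent(ab, fields("a", "int", "b", "varchar")));
    System.out.println("swapped order: " + isEquivalent(ab, ba));
  }
}
```

Two lists with the same fields in a different order are not equivalent in this model, which is the consequence of requiring an ordered correspondence.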
-
merge
Merge two schemas to produce a new, merged schema. The caller is responsible for ensuring that column names are unique. The order of the fields in the new schema is the same as that of this schema, with the other schema's fields appended in the order defined in the other schema.
Merging data with selection vectors is unlikely to be useful, or to work well. With a selection vector, the two record batches would have to be correlated both in their selection vectors AND in the underlying vectors. Such a use case is hard to imagine. So, for now, this method forbids merging schemas if either of them carries a selection vector. If we discover a meaningful use case, we can revisit the issue.
- Parameters:
otherSchema - the schema to merge with this one
- Returns:
the new, merged schema
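
The merge rules above can be sketched with a small stand-alone model. The names Schema and SvMode are hypothetical, fields are reduced to plain names, and the choice of IllegalArgumentException is an assumption made for this example; the description does not say how the real method reports a forbidden merge.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical model; none of these types are Drill classes.
final class MergeSketch {

  /** Stand-in for "carries a selection vector or not". */
  enum SvMode { NONE, PRESENT }

  static final class Schema {
    final SvMode mode;
    final List<String> fieldNames;

    Schema(SvMode mode, List<String> fieldNames) {
      this.mode = mode;
      this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
    }

    /**
     * Returns a new schema with this schema's fields first, followed by the
     * other schema's fields in their own order. Name uniqueness is left to the
     * caller, as in the description. The exception type is an assumption.
     */
    Schema merge(Schema other) {
      if (mode != SvMode.NONE || other.mode != SvMode.NONE) {
        throw new IllegalArgumentException(
            "Cannot merge schemas that carry a selection vector");
      }
      List<String> merged = new ArrayList<>(fieldNames);
      merged.addAll(other.fieldNames);
      return new Schema(SvMode.NONE, merged);
    }
  }

  public static void main(String[] args) {
    List<String> left = new ArrayList<>();
    left.add("a");
    left.add("b");
    List<String> right = new ArrayList<>();
    right.add("c");
    right.add("d");
    Schema merged = new Schema(SvMode.NONE, left).merge(new Schema(SvMode.NONE, right));
    System.out.println(merged.fieldNames);
  }
}
```

Neither input is modified in this model; the merged schema is a separate object, matching the description of producing a new schema.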
-
format
Format the schema into a multi-line format. Useful when debugging a query with a very wide schema as the usual single-line format is far too hard to read.
-