public class TupleParser extends ObjectParser
The structure parser maintains a map of known fields. Each time a field is parsed, the parser looks up the field in the map. If the field is not found, the parser looks ahead to find a value token, if any, and calls this class to add a new column. This class creates a column writer based either on the type given in a provided schema, or on the type inferred from the JSON token.
As it turns out, most of the semantic action occurs at the tuple level: that is where fields are defined, types are inferred, and projection is computed.
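The lookup-then-create flow described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Drill's implementation: `FieldMapSketch`, `ColumnType`, and `typeFor` are hypothetical names, the provided schema is modeled as a plain map, and a JSON value token is modeled as a Java object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: define a column once per field, schema first, token second. */
final class FieldMapSketch {
  // Illustrative type labels only; not Drill's type system.
  enum ColumnType { BIGINT, DOUBLE, BOOLEAN, VARCHAR }

  // Stand-in for a provided schema: field name -> declared type.
  private final Map<String, ColumnType> providedSchema;
  // Map of known fields, filled in as fields are first seen.
  private final Map<String, ColumnType> knownFields = new LinkedHashMap<>();

  FieldMapSketch(Map<String, ColumnType> providedSchema) {
    this.providedSchema = providedSchema;
  }

  /** Returns the column type for a field, defining the column on first sight. */
  ColumnType typeFor(String key, Object firstValue) {
    ColumnType known = knownFields.get(key);
    if (known != null) {
      return known; // Field already defined: later values do not change its type.
    }
    ColumnType type = providedSchema.get(key);
    if (type == null) {
      type = infer(firstValue); // No schema entry: infer from the value token.
    }
    knownFields.put(key, type);
    return type;
  }

  // Null is deliberately rejected here: a null token carries no type information.
  private static ColumnType infer(Object value) {
    if (value instanceof Long || value instanceof Integer) return ColumnType.BIGINT;
    if (value instanceof Double || value instanceof Float) return ColumnType.DOUBLE;
    if (value instanceof Boolean) return ColumnType.BOOLEAN;
    if (value instanceof String) return ColumnType.VARCHAR;
    throw new IllegalArgumentException("Cannot infer a type from: " + value);
  }

  public static void main(String[] args) {
    Map<String, ColumnType> schema = new LinkedHashMap<>();
    schema.put("a", ColumnType.DOUBLE);
    FieldMapSketch sketch = new FieldMapSketch(schema);
    System.out.println(sketch.typeFor("a", 1L));  // type comes from the schema
    System.out.println(sketch.typeFor("b", "x")); // type inferred from the token
  }
}
```

In this sketch the first sighting of a field fixes its type, which mirrors the idea that a column is added once and then reused for every later row.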
true,
false and null.
But what happens if the first value for a field is null? We
don't know what kind of parser to create because we don't have a schema.
Instead, we have to create a temporary placeholder parser that will consume
nulls, waiting for a real type to show itself. Once that type appears, the
null parser can replace itself with the correct form. Each vector's
"fill empties" logic will back-fill the newly created vector with nulls
for prior rows.
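A minimal sketch of the placeholder idea, assuming a column can be modeled as a simple list: the null parser consumes nulls, and on the first non-null value it swaps a typed parser into the field map, which back-fills earlier rows with nulls. All names here (`NullPlaceholderSketch`, `FieldParser`, `NullParser`, `TypedParser`) are hypothetical and are not the Drill classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a null placeholder parser that replaces itself. */
final class NullPlaceholderSketch {
  interface FieldParser { void parse(Object value); }

  final Map<String, FieldParser> parsers = new HashMap<>();
  final Map<String, List<Object>> columns = new HashMap<>();
  int rowIndex = 0; // Index of the row currently being loaded.

  /** Writes values, back-filling skipped rows with nulls ("fill empties"). */
  final class TypedParser implements FieldParser {
    private final List<Object> column;
    TypedParser(List<Object> column) { this.column = column; }
    @Override public void parse(Object value) {
      while (column.size() < rowIndex) {
        column.add(null); // Prior rows had no value for this column.
      }
      column.add(value);
    }
  }

  /** Consumes nulls until a real value shows up, then replaces itself. */
  final class NullParser implements FieldParser {
    private final String key;
    NullParser(String key) { this.key = key; }
    @Override public void parse(Object value) {
      if (value == null) {
        return; // Still no type: nothing to write yet.
      }
      FieldParser real = new TypedParser(newColumn(key));
      parsers.put(key, real); // The placeholder is replaced in the field map.
      real.parse(value);
    }
  }

  private List<Object> newColumn(String key) {
    List<Object> column = new ArrayList<>();
    columns.put(key, column);
    return column;
  }

  /** Handles one field of the current row. */
  void field(String key, Object value) {
    FieldParser parser = parsers.get(key);
    if (parser == null) {
      parser = (value == null) ? new NullParser(key) : new TypedParser(newColumn(key));
      parsers.put(key, parser);
    }
    parser.parse(value);
  }

  void endRow() { rowIndex++; }

  public static void main(String[] args) {
    NullPlaceholderSketch sketch = new NullPlaceholderSketch();
    sketch.field("a", null);  sketch.endRow();
    sketch.field("a", null);  sketch.endRow();
    sketch.field("a", "foo"); sketch.endRow();
    System.out.println(sketch.columns.get("a"));
  }
}
```

The column does not exist at all while only nulls have been seen, so nothing has to be undone when the real type finally appears.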
Two null parsers are needed: one when we see an empty list, and one for
when we only see null. The one for {@code null} must morph into
the one for empty lists if we see:
{@code {a: null} {a: [ ] }}
If we get all the way through the batch, but have still not seen a type, then we have to guess. A prototype type system can tell us; otherwise we guess {@code VARCHAR}. ({@code VARCHAR} is the right choice for all-text mode, and it is as good a guess as any for other cases.)
For scalars the pattern is: {a: null} {a: "foo"}. Type
selection happens on the value {@code "foo"}.
For arrays, the pattern is: {a: []} {a: ["foo"]}. Type
selection happens on the first array element. Note that type selection
must happen on the first element, even if that element is null (which,
as we just said, is ambiguous).
If we are forced to pick a type (because we hit the end of a batch, or we see {@code [null]}), then we pick {@code VARCHAR}, as we allow any scalar to be converted to {@code VARCHAR}. This helps for a single-file query, but not if multiple fragments each make their own (inconsistent) decisions. Only a schema provides a consistent answer.
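The resolution rules above (null-only, empty array, first scalar, first array element, and the forced {@code VARCHAR} guess) can be summarized as a small state machine. The sketch below is an assumption-laden illustration: `TypeResolutionSketch`, its `State` enum, and the string type labels are hypothetical, and JSON arrays are modeled as Java lists.

```java
import java.util.List;

/** Hypothetical sketch of deferred type resolution for one field. */
final class TypeResolutionSketch {
  enum State { NULL_ONLY, EMPTY_ARRAY, SCALAR, ARRAY }

  State state = State.NULL_ONLY;
  String type; // Stays null until a type has been selected.

  /** Feeds one value for the field: null, a scalar, or a List for an array. */
  void accept(Object value) {
    if (type != null || value == null) {
      return; // Already resolved, or a null that tells us nothing new.
    }
    if (value instanceof List) {
      List<?> list = (List<?>) value;
      if (list.isEmpty()) {
        state = State.EMPTY_ARRAY; // The null-only form morphs into the empty-array form.
        return;
      }
      state = State.ARRAY;
      Object first = list.get(0);
      // Selection must happen on the first element; [null] forces the VARCHAR guess.
      type = (first == null) ? "VARCHAR" : typeOf(first);
      return;
    }
    if (state == State.EMPTY_ARRAY) {
      throw new IllegalStateException("Scalar value for a field first seen as an array");
    }
    state = State.SCALAR;
    type = typeOf(value);
  }

  /** End of batch with no type seen: guess VARCHAR. */
  void endOfBatch() {
    if (type == null) {
      type = "VARCHAR";
      state = (state == State.EMPTY_ARRAY) ? State.ARRAY : State.SCALAR;
    }
  }

  // Illustrative labels only; not Drill's type names.
  private static String typeOf(Object value) {
    if (value instanceof Long || value instanceof Integer) return "BIGINT";
    if (value instanceof Double || value instanceof Float) return "DOUBLE";
    if (value instanceof Boolean) return "BOOLEAN";
    return "VARCHAR";
  }

  public static void main(String[] args) {
    TypeResolutionSketch field = new TypeResolutionSketch();
    field.accept(null);     // {a: null}
    field.endOfBatch();     // no type seen in this batch
    System.out.println(field.state + " " + field.type);
  }
}
```

The guess made in `endOfBatch()` is local to one reader, which is why the text notes that only a provided schema gives a consistent answer across fragments.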
logger

| Constructor and Description |
|---|
| TupleParser(JsonLoaderImpl loader, TupleWriter tupleWriter, TupleMetadata providedSchema) |
| TupleParser(JsonStructureParser structParser, JsonLoaderImpl loader, TupleWriter tupleWriter, TupleMetadata providedSchema) |

| Modifier and Type | Method and Description |
|---|---|
| protected FieldFactory | fieldFactory() |
| void | forceEmptyArrayResolution(String key) |
| void | forceNullResolution(String key) |
| JsonLoaderImpl | loader() |
| ElementParser | onField(String key, TokenIterator tokenizer) The structure parser has just encountered a new field for this object. |
| protected TupleMetadata | providedSchema() |
| ElementParser | resolveArray(String key, TokenIterator tokenizer) |
| ElementParser | resolveField(String key, TokenIterator tokenizer) |
| TupleWriter | writer() |

fieldParser, onEnd, onStart, parse, replaceFieldParser

errorFactory, structParser

public TupleParser(JsonStructureParser structParser, JsonLoaderImpl loader, TupleWriter tupleWriter, TupleMetadata providedSchema)
public TupleParser(JsonLoaderImpl loader, TupleWriter tupleWriter, TupleMetadata providedSchema)
public JsonLoaderImpl loader()
public TupleWriter writer()
protected TupleMetadata providedSchema()
protected FieldFactory fieldFactory()
public ElementParser onField(String key, TokenIterator tokenizer)
ObjectParser

FieldParserFactory class.
However, special cases (such as Mongo extended types) can create a
custom parser.
If the field is not projected, the method should return a dummy parser
from FieldParserFactory.ignoredFieldParser().
The dummy parser will "free-wheel" over whatever values the
field contains. (This is one way to avoid structure errors in a JSON file:
just ignore them.) Otherwise, the parser will look ahead to guess the
field type and will call one of the "add" methods, each of which should
return a value listener for the field itself.
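One way to picture how a dummy parser can "free-wheel" over an unprojected value is to track nesting depth and stop once the value is complete. The sketch below only illustrates that idea: `SkipValueSketch` and `skipValue` are hypothetical names, and the token stream is modeled as a list of strings in which the structural tokens are exactly `{`, `}`, `[` and `]`.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: skip one complete JSON value without interpreting it. */
final class SkipValueSketch {

  /** Skips the value that starts at tokens[pos]; returns the index just after it. */
  static int skipValue(List<String> tokens, int pos) {
    int depth = 0;
    do {
      String token = tokens.get(pos++);
      if (token.equals("{") || token.equals("[")) {
        depth++; // Entering a nested object or array.
      } else if (token.equals("}") || token.equals("]")) {
        depth--; // Leaving a nested object or array.
      }
      // Scalars and field names are consumed without inspection.
    } while (depth > 0);
    return pos;
  }

  public static void main(String[] args) {
    // Models the value [1, {x: 2}] followed by an unrelated token.
    List<String> tokens = Arrays.asList("[", "1", "{", "x", "2", "}", "]", "next");
    System.out.println(skipValue(tokens, 0));
  }
}
```

Because the values are never typed or written, an ignored field cannot trigger a type conflict, whatever mix of values it holds.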
A normal field will respond to the structure of the JSON file as it appears. The associated value listener receives events for the field value. The value listener may be asked to create additional structure, such as arrays or nested objects.
Parse position: { ... field : ^ ? for a newly-seen field.
Constructs a value parser and its listeners by looking ahead
some number of tokens to "sniff" the type of the value. For
example:

- foo: <value> - Field value
- foo: [ <value> ] - 1D array value
- foo: [ [<value> ] ] - 2D array value

There are two cases in which no type estimation is possible:

- foo: null
- foo: []

onField in class ObjectParser

public ElementParser resolveField(String key, TokenIterator tokenizer)
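The look-ahead "sniffing" described for onField can be approximated by counting leading array brackets and then checking whether the first value token carries a type. This is a rough sketch under assumptions, not the Drill code: `SniffSketch`, `Sniffed`, and `sniff` are hypothetical names, and tokens are modeled as strings.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of sniffing a field's array depth and whether its type is known. */
final class SniffSketch {

  static final class Sniffed {
    final int arrayDims;     // 0 for a plain value, 1 for [ ... ], 2 for [ [ ... ] ]
    final boolean typeKnown; // false when only null or an empty array is visible
    Sniffed(int arrayDims, boolean typeKnown) {
      this.arrayDims = arrayDims;
      this.typeKnown = typeKnown;
    }
  }

  /** Looks ahead from tokens[pos], the first token after "field :". */
  static Sniffed sniff(List<String> tokens, int pos) {
    int dims = 0;
    while (pos < tokens.size() && tokens.get(pos).equals("[")) {
      dims++; // Each leading bracket adds one array dimension.
      pos++;
    }
    // A null or a closing bracket right after the brackets gives no type information.
    // Note: this sketch also reports [null] as unknown; the text above describes
    // forcing VARCHAR in that case.
    boolean typeKnown = pos < tokens.size()
        && !tokens.get(pos).equals("null")
        && !tokens.get(pos).equals("]");
    return new Sniffed(dims, typeKnown);
  }

  public static void main(String[] args) {
    Sniffed result = sniff(Arrays.asList("[", "[", "1", "]", "]"), 0);
    System.out.println(result.arrayDims + " " + result.typeKnown);
  }
}
```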
public ElementParser resolveArray(String key, TokenIterator tokenizer)
public void forceNullResolution(String key)
public void forceEmptyArrayResolution(String key)
Copyright © 2021 The Apache Software Foundation. All rights reserved.