public class TestSchemaSmoothing extends SubOperatorTest
Focuses on the SmoothingProjection class itself.
Note that, at present, schema smoothing does not work for entire maps. For example, if file 1 contains {a: {b: 10, c: "foo"}} and file 2 contains {a: null}, schema smoothing does not currently know how to recreate the map. The same is true of lists and unions. Such cases are complex and are probably better handled by a system that lets the user state their intent by providing a schema to apply to both files.
Note that schema smoothing itself is an experimental work-around to a fundamental limitation in Drill:
Inherited nested classes: DrillTest.MemWatcher
Inherited fields: dirTestWatcher, fixture, logOutcome, objectMapper, REPEAT_RULE, thrownException, TIMEOUT

| Constructor and Description |
|---|
| TestSchemaSmoothing() |


| Modifier and Type | Method and Description |
|---|---|
| void | testDifferentCase(): The prior and table schemas are identical, but the cases of names differ. |
| void | testDifferentTypes(): Column names match, but types differ. |
| void | testDiscrete(): Sanity test for the simple, discrete case. |
| void | testDisjoint(): Case in which the table schema and prior are disjoint sets. |
| void | testLongerPartitionLength(): If using the legacy wildcard expansion, we are able to use the same schema even if the new partition path is longer than the previous. |
| void | testMissingNullableColumns(): Preserve the prior schema if table is a subset and missing columns are nullable or repeated. |
| void | testReordering(): Preserve the prior schema if table is a subset. |
| void | testRequired(): Can't preserve the prior schema if it had required columns where the new schema has no columns. |
| void | testSamePartitionLength(): If using the legacy wildcard expansion, reuse schema if partition paths are the same length. |
| void | testSameSchemas(): The prior and table schemas are identical. |
| void | testShorterPartitionLength(): If using the legacy wildcard expansion, reuse schema if the new partition path is shorter than the previous. |
| void | testSmaller(): Case in which the table schema is a superset of the prior schema. |
| void | testSmoothableSchemaBatches(): Integrated test across multiple schemas at the batch level. |
| void | testSmoothingProjection(): Low-level test of the smoothing projection, including the exceptions it throws when things are not going its way. |
| void | testWildcardSmoothing(): A SELECT * query uses the schema of the table as the output schema. |

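The reuse rules summarized in the method descriptions above can be illustrated with a small standalone sketch. This is not Drill's SmoothingProjection; the class SchemaSmoothingSketch, the Col and Mode types, and the canReusePrior method are hypothetical names invented for illustration, and the exact matching rule (same type and mode, names compared case-insensitively) is an assumption inferred from the test descriptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified model of the smoothing decision.
// None of these names come from Drill's own API.
final class SchemaSmoothingSketch {

  enum Mode { REQUIRED, NULLABLE, REPEATED }

  // A column is reduced to a name, a type name, and a cardinality mode.
  static final class Col {
    final String name;
    final String type;
    final Mode mode;

    Col(String name, String type, Mode mode) {
      this.name = name;
      this.type = type;
      this.mode = mode;
    }
  }

  // Assumption: column names are matched case-insensitively.
  private static String key(String name) {
    return name.toLowerCase(Locale.ROOT);
  }

  // Returns true if the prior schema can be kept for the new table schema.
  static boolean canReusePrior(List<Col> prior, List<Col> table) {
    Map<String, Col> unmatched = new LinkedHashMap<>();
    for (Col p : prior) {
      unmatched.put(key(p.name), p);
    }
    // Every table column must match a prior column in type and mode.
    // A column unknown to the prior (superset or disjoint case) fails here.
    for (Col t : table) {
      Col p = unmatched.remove(key(t.name));
      if (p == null || !p.type.equals(t.type) || p.mode != t.mode) {
        return false;
      }
    }
    // Prior columns missing from the table can only be filled in
    // (with nulls or empty arrays) if they are not required.
    for (Col p : unmatched.values()) {
      if (p.mode == Mode.REQUIRED) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Col> prior = Arrays.asList(
        new Col("a", "INT", Mode.NULLABLE),
        new Col("b", "VARCHAR", Mode.NULLABLE));
    // Subset of the prior, with a different name case.
    List<Col> subset = Arrays.asList(new Col("B", "VARCHAR", Mode.NULLABLE));
    // Same name as a prior column, but a different type.
    List<Col> retyped = Arrays.asList(new Col("a", "VARCHAR", Mode.NULLABLE));
    System.out.println(canReusePrior(prior, subset));   // true
    System.out.println(canReusePrior(prior, retyped));  // false
  }
}
```

Under these assumptions, column order plays no part in the decision, which is in line with the reordering case being treated as a reusable subset.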
Inherited methods: classSetup, classTeardown, escapeJsonString, finishDrillTest, initDrillTest
public void testDiscrete()
public void testSmoothingProjection()
public void testSmaller()
public void testDisjoint()
public void testDifferentTypes()
public void testSameSchemas()
public void testDifferentCase()
public void testRequired()
public void testMissingNullableColumns()
public void testReordering()
public void testSamePartitionLength()
public void testShorterPartitionLength()
public void testLongerPartitionLength()
public void testSmoothableSchemaBatches()
public void testWildcardSmoothing()
It is an open question whether previous columns should be preserved on a hard reset. For now, the code implements, and this test verifies, that a hard reset clears the "memory" of prior schemas.
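The "memory" behavior described above can be sketched as a small stateful tracker. This is only an illustrative model, not Drill's implementation: the class WildcardSmoothingSketch and its methods are hypothetical, columns are reduced to plain names, and the sketch assumes that a hard reset is triggered whenever a table contains a column the remembered schema lacks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of wildcard smoothing across a sequence of tables.
// Columns are plain names; types, modes, and name case are ignored here.
final class WildcardSmoothingSketch {
  private List<String> prior = Collections.emptyList();
  private int schemaVersion = 0;

  // Returns the output schema to use for the next table.
  List<String> nextTable(List<String> tableCols) {
    // Same schema or subset of the remembered one: keep the prior schema.
    if (!prior.isEmpty() && prior.containsAll(tableCols)) {
      return prior;
    }
    // Assumed hard reset: the table schema replaces the remembered schema
    // entirely, so columns seen in earlier tables are forgotten.
    prior = Collections.unmodifiableList(new ArrayList<>(tableCols));
    schemaVersion++;
    return prior;
  }

  int schemaVersion() {
    return schemaVersion;
  }

  public static void main(String[] args) {
    WildcardSmoothingSketch tracker = new WildcardSmoothingSketch();
    System.out.println(tracker.nextTable(Arrays.asList("a", "b"))); // [a, b]
    System.out.println(tracker.nextTable(Arrays.asList("a")));      // [a, b]
    System.out.println(tracker.nextTable(Arrays.asList("a", "c"))); // [a, c]
    // "b" was forgotten by the reset, so this table forces another reset.
    System.out.println(tracker.nextTable(Arrays.asList("b")));      // [b]
    System.out.println(tracker.schemaVersion());                    // 3
  }
}
```

The alternative raised by the open question would be to merge the new columns into the remembered schema instead of replacing it; this sketch follows the behavior the test is said to verify.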
Copyright © 2021 The Apache Software Foundation. All rights reserved.