public class TestCsvTableProperties extends BaseCsvTest
The tests also verify that, without headers, if a schema is provided, the text format plugin will create columns using that schema rather than using the "columns" array column.
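The distinction between the two column layouts can be pictured with a small, self-contained model. This is only a sketch and does not use Drill's classes: `HeaderlessRowDemo` and `toRow` are hypothetical names, and the handling of a field count that differs from the schema (missing fields become null, surplus fields are dropped) is an assumption made for illustration, not behavior stated on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of how a headerless text row might be exposed.
final class HeaderlessRowDemo {

  /**
   * Maps the fields of one headerless line to output columns.
   * With no schema, all fields go into a single array column named "columns".
   * With a schema, each field is assigned to the schema column at its position.
   */
  static Map<String, Object> toRow(List<String> fields, List<String> schemaColumns) {
    Map<String, Object> row = new LinkedHashMap<>();
    if (schemaColumns == null) {
      // No schema: a single "columns" array holds every field.
      row.put("columns", new ArrayList<>(fields));
      return row;
    }
    for (int i = 0; i < schemaColumns.size(); i++) {
      // Assumption: a schema column with no matching field is null,
      // and fields beyond the schema are ignored.
      row.put(schemaColumns.get(i), i < fields.size() ? fields.get(i) : null);
    }
    return row;
  }

  public static void main(String[] args) {
    List<String> fields = List.of("10", "foo");
    System.out.println(toRow(fields, null));
    System.out.println(toRow(fields, List.of("id", "name")));
  }
}
```

The first call models a query against the "columns" array; the second models the same line read with a two-column schema.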
Inherited nested classes: DrillTest.MemWatcher

Inherited fields:
BIG_COL_SIZE, EMPTY_FILE, FILE_N_NAME, NESTED_DIR, NESTED_FILE, PART_DIR, ROOT_FILE, SCHEMA_BATCH_ENABLED, secondFile, testDir, validHeaders
client, cluster, dirTestWatcher
logOutcome, objectMapper, REPEAT_RULE, thrownException, TIMEOUT

| Constructor and Description |
|---|
TestCsvTableProperties() |
| Modifier and Type | Method and Description |
|---|---|
| static void | setup() |
| void | testComment() |
| void | testDelimiter() |
| void | testDoubleQuoteChars(): Test that the quote escape can be the quote character itself. |
| void | testHeadersWithoutSchema() |
| void | testHeadersWithSchema() |
| void | testKeepWitespace(): Trim leading and trailing whitespace. |
| void | testMessyQuotes(): The legacy "V2" text reader had special handling for quotes that appear inside fields. |
| void | testNewlineProp(): Verify that a custom newline character works, and that the symbol '\n' can be used in SQL and is stored properly in the schema file. |
| void | testNoComment(): Users have complained about the comment character. |
| void | testNoHeadersWithoutSchema() |
| void | testNoHeadersWithSchema() |
| void | testNoHeadersWithSchemaExtraCols() |
| void | testQuoteChars(): Test quote and quote escape. |
| void | testQuotesAndCustomNewLine() |
| void | testSkipHeadersWithoutSchema() |
| void | testSkipHeadersWithSchema() |
| void | testSpecialChars(): End-to-end test of special characters for delimiter (a control character, ASCII 0x01) and quote (same as the SQL quote). |
| void | testTrimWitespace(): Trim leading and trailing whitespace. |
Inherited methods:
buildBigColFile, buildFile, buildFile, buildNestedTable, buildTable, enableMultiScan, enableSchema, enableSchemaSupport, resetMultiScan, resetSchema, resetSchemaSupport, setup, setup
getFile, queryBuilder, run, runAndLog, runAndPrint, runAndPrint, shutdown, startCluster, testBuilder
escapeJsonString, finishDrillTest, initDrillTest

public void testNoHeadersWithoutSchema() throws Exception

public void testNoHeadersWithSchemaExtraCols() throws Exception

public void testSkipHeadersWithSchema() throws Exception

public void testSkipHeadersWithoutSchema() throws Exception

public void testNoComment() throws Exception

public void testQuoteChars() throws Exception

public void testDoubleQuoteChars() throws Exception

public void testQuotesAndCustomNewLine() throws Exception

public void testSpecialChars() throws Exception

public void testNewlineProp() throws Exception

public void testMessyQuotes() throws Exception
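The legacy "V2" text reader had special handling for quotes that appear inside fields, for example:
first"field"here,another "field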
Since behavior in this case is ill-defined, the reader apparently treated quotes as normal characters unless the field started with a quote. There is an option in the UniVocity code to set this behavior, but it is not exposed in Drill. So, this test verifies the non-customizable messy quote handling logic.
If a field starts with a quote, quoting rules kick in, including
the quote escape, which is, by default, itself a quote. So
"foo""bar"
is read as
foo"bar
But, for fields not starting with a quote, the quote escape
is ignored, so:
foo""bar
is read as
foo""bar
This seems more like a bug than a feature, but it does appear to be
how the "new" text reader always worked, so the behavior is preserved.
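The rule described above can be modelled in a few lines of Java. This is a minimal sketch, not Drill's or UniVocity's code: `MessyQuoteDemo` and `readField` are hypothetical names, the quote and the quote escape are both assumed to be `"`, and ignoring any text after the closing quote is a simplification chosen here.

```java
// Hypothetical model of the "messy quote" rule for a single raw field.
final class MessyQuoteDemo {

  /** Decodes one raw field; quote and quote escape are both assumed to be '"'. */
  static String readField(String raw) {
    if (raw.isEmpty() || raw.charAt(0) != '"') {
      // Field does not start with a quote: quotes are ordinary characters
      // and the quote escape is not interpreted.
      return raw;
    }
    StringBuilder out = new StringBuilder();
    int i = 1; // skip the opening quote
    while (i < raw.length()) {
      char c = raw.charAt(i);
      if (c == '"') {
        if (i + 1 < raw.length() && raw.charAt(i + 1) == '"') {
          out.append('"'); // doubled quote is an escaped quote
          i += 2;
          continue;
        }
        // Closing quote. Simplification: anything after it is ignored.
        break;
      }
      out.append(c);
      i++;
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(readField("\"foo\"\"bar\"")); // foo"bar
    System.out.println(readField("foo\"\"bar"));     // foo""bar
  }
}
```

The two calls in `main` reproduce the two cases given above: the quoted field has its escape collapsed, while the unquoted field is returned unchanged.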
Also, it seems that the text reader supported embedded newlines, even though such behavior does not work if the embedded newline occurs near a split. In that case, the reader scans forward to find a record delimiter (a newline by default), finds the embedded newline instead, and reads a partial first record. Again, this appears to be legacy behavior, and so is preserved, even if broken.
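The split problem can be illustrated with a simplified model of the forward scan. This sketch assumes a reader that, for any split not starting at offset 0, skips to just past the next newline; `SplitScanDemo` and `firstRecordStart` are hypothetical names, and the actual reader's logic is not shown on this page.

```java
// Hypothetical model of locating the first record in a file split.
final class SplitScanDemo {

  /**
   * Returns the offset at which a reader assigned the split beginning at
   * splitStart would start its first record. The scan looks only for the
   * record delimiter and knows nothing about quotes.
   */
  static int firstRecordStart(String data, int splitStart) {
    if (splitStart == 0) {
      return 0; // the first split starts at the first record
    }
    int newline = data.indexOf('\n', splitStart);
    return newline < 0 ? data.length() : newline + 1;
  }

  public static void main(String[] args) {
    // The first record contains a quoted field with an embedded newline.
    String data = "a,\"line one\nline two\"\nb,c\n";
    int splitStart = 5; // falls inside the quoted field
    int start = firstRecordStart(data, splitStart);
    int end = data.indexOf('\n', start);
    // The scan stopped at the embedded newline, so the "record" read
    // here is only the tail of the quoted field.
    System.out.println(data.substring(start, end));
  }
}
```

Because the scan cannot tell an embedded newline from a record delimiter, the second reader begins in the middle of the first record.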
The key thing is that if the CSV is well-formed (no messy quotes, properly quoted fields with proper escapes, no embedded newlines) then things will work OK.
public void testKeepWitespace() throws Exception

Copyright © 2021 The Apache Software Foundation. All rights reserved.