public class TestCorruptParquetDateCorrection extends PlanTestBase
The date corruption issue (DRILL-4203) affects files written by Drill releases where
ParquetRecordWriter.WRITER_VERSION_PROPERTY < ParquetReaderUtility.DRILL_WRITER_VERSION_STD_DATE_FORMAT.
Drill itself has read these values correctly, but external tools such as Spark will see
corrupted values for every date written by Drill.
This change fixes the Drill parquet writer so that it stores dates
in the format given in the parquet specification.
To maintain compatibility with old files, the parquet reader code has
been updated to check for the old format and automatically shift the
corrupted values into corrected ones.
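
The reader-side correction described above can be sketched as follows. This is a minimal illustration, not Drill's actual implementation: the class name, the shift constant (twice the Julian day number of the Unix epoch), and the detection threshold (any date after the year 5000) are all assumptions made for the example, and the sketch detects corruption from the value alone, whereas a real reader may also consult writer-version metadata in the file footer.

```java
import java.time.LocalDate;

public class DateShiftSketch {
    // Assumed for illustration: Julian day number of 1970-01-01.
    static final int JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588;
    // Assumed for illustration: the constant offset added to corrupted dates.
    static final int ASSUMED_SHIFT = 2 * JULIAN_DAY_OF_UNIX_EPOCH;
    // Assumed for illustration: values past this day count are treated as corrupt.
    static final int ASSUMED_THRESHOLD = (int) LocalDate.of(5000, 1, 1).toEpochDay();

    // Returns the stored day count unchanged when it looks valid,
    // otherwise subtracts the assumed shift to recover the intended date.
    static int correct(int storedDays) {
        return storedDays > ASSUMED_THRESHOLD ? storedDays - ASSUMED_SHIFT : storedDays;
    }

    public static void main(String[] args) {
        int good = (int) LocalDate.of(2015, 1, 1).toEpochDay();
        int corrupt = good + ASSUMED_SHIFT;
        // A corrupted value lands far in the future.
        System.out.println(LocalDate.ofEpochDay(corrupt));
        // Both the corrupted and the valid value resolve to 2015-01-01.
        System.out.println(LocalDate.ofEpochDay(correct(corrupt)));
        System.out.println(LocalDate.ofEpochDay(correct(good)));
    }
}
```

Because the assumed shift is constant, valid and corrupted values occupy disjoint ranges for any realistic date, which is what makes a simple threshold check workable in this sketch.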
The test cases included here should ensure that all files produced by
historical versions of Drill will continue to return the same values they
had in previous releases. For compatibility with external tools, any old
files with corrupted dates can be re-written using the CREATE TABLE AS
command (the writer now produces only specification-compliant values,
even when the data is read from older corrupt files).
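
As a rough illustration of the re-write step, the sketch below builds a CREATE TABLE AS statement that copies every column from an old directory into a new table. The class name, helper name, workspace, and paths are hypothetical placeholders; the resulting string would be submitted through whatever Drill client is in use.

```java
public class RewriteSketch {
    // Builds a CTAS statement that copies all columns from the source into the target.
    // Both arguments are hypothetical names supplied by the caller.
    static String ctas(String targetTable, String sourcePath) {
        return String.format("CREATE TABLE %s AS SELECT * FROM %s", targetTable, sourcePath);
    }

    public static void main(String[] args) {
        // Hypothetical workspace and paths, for illustration only.
        System.out.println(ctas("dfs.tmp.`dates_fixed`", "dfs.`/data/old_dates`"));
    }
}
```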
Although the old behavior consistently shifted dates into a range unlikely
to be used in a modern database (over 10,000 years in the future), the shifted
values are still valid dates. In case such values were written into
files intentionally, an option is included to turn off the auto-correction.
Use of this option is assumed to be extremely unlikely, but it is included
for completeness.

| Constructor and Description |
|---|
| TestCorruptParquetDateCorrection() |

| Modifier and Type | Method and Description |
|---|---|
| static void | initFs() |
| void | testCorrectDatesAndExceptionWhileParsingCreatedBy() |
| void | testCorrectDateValuesGeneratedByOldVersionOfDrill() |
| void | testCorruptValueDetectionDuringPruning() |
| void | testDatePartitionedReadWithCorruption() |
| void | testQueryWithCorruptedDates() |
| void | testReadCorruptDatesWithNullFilledColumns(): Fixing some of the corrupted dates addressed by DRILL-4203 requires actually looking at the values stored in the file. |
| void | testReadMixedOldAndNewBothReaders(): Test reading a directory full of parquet files with dates, some of which have corrupted values due to DRILL-4203. |
| void | testReadNewMetadataCacheFileOverOldAndNewFiles() |
| void | testReadOldMetadataCacheFile() |
| void | testReadOldMetadataCacheFileWithPruning() |
| void | testReadPartitionedOnCorrectDates(): Test reading a directory full of partitioned parquet files with dates. These files have a drill version number of "1.9.0-SNAPSHOT" and a parquet-writer version number of "2" in their footers, so we can be certain they do not have corruption. |
| void | testUserOverrideDateCorrection() |
| void | testVarcharPartitionedReadWithCorruption() |

public void testReadPartitionedOnCorrectDates() throws Exception

public void testVarcharPartitionedReadWithCorruption() throws Exception

public void testDatePartitionedReadWithCorruption() throws Exception

public void testCorrectDatesAndExceptionWhileParsingCreatedBy() throws Exception

public void testQueryWithCorruptedDates() throws Exception

public void testCorruptValueDetectionDuringPruning() throws Exception

public void testReadCorruptDatesWithNullFilledColumns() throws Exception

public void testUserOverrideDateCorrection() throws Exception

public void testReadMixedOldAndNewBothReaders() throws Exception

public void testReadOldMetadataCacheFile() throws Exception

public void testReadOldMetadataCacheFileWithPruning() throws Exception

public void testReadNewMetadataCacheFileOverOldAndNewFiles() throws Exception

Copyright © 2021 The Apache Software Foundation. All rights reserved.