| Interface | Description |
|---|---|
| RowBatchReader | Extended version of a record reader used by the revised scan batch operator. |
| ScanOperatorEvents | Interface to the set of readers, and reader schema, that the scan operator manages. |

| Class | Description |
|---|---|
| BaseScanOperatorExecTest | Test of the scan operator framework. |
| BaseScanOperatorExecTest.BaseMockBatchReader | Base class for the "mock" readers used in this test. |
| BaseScanOperatorExecTest.BaseScanFixtureBuilder | |
| BaseScanOperatorExecTest.MockEarlySchemaReader | Mock reader that pretends to have a schema at open time like an HBase or JDBC reader. |
| ScanOperatorExec | Implementation of the revised scan operator that uses a mutator aware of batch sizes. |
| ScanTestUtils | |
| ScanTestUtils.MockScanBuilder | |
| ScanTestUtils.ScanFixture | |
| ScanTestUtils.ScanFixtureBuilder | |
| TestColumnsArray | Test the "columns" array mechanism integrated with the scan schema orchestrator including simulating reading data. |
| TestColumnsArrayFramework | Test the columns-array specific behavior in the columns scan framework. |
| TestColumnsArrayFramework.ColumnsScanFixtureBuilder | |
| TestColumnsArrayFramework.DummyColumnsReader | |
| TestColumnsArrayFramework.MockFileReaderFactory | |
| TestColumnsArrayParser | |
| TestFileScanFramework | Tests the file metadata extensions to the file operator framework. |
| TestFileScanFramework.DummyFileWork | For schema-based testing, we only need the file path from the file work. |
| TestFileScanFramework.FileScanFixtureBuilder | |
| TestFileScanFramework.MockFileReaderFactory | Mock file reader that returns readers already created for specific test cases. |
| TestImplicitColumnParser | |
| TestImplicitColumnProjection | |
| TestScanBatchWriters | Test of the "legacy" scan batch writers to ensure that the revised set follows the same semantics as the original set. |
| TestScanOperExecBasics | Tests the basics of the scan operator protocol: error conditions, etc. |
| TestScanOperExecEarlySchema | Test "early schema" readers: those that can declare a schema at open time. |
| TestScanOperExecLateSchema | Test "late schema" readers: those like JSON that discover their schema as they read data. |
| TestScanOperExecOuputSchema | Test the addition of an output schema to a reader. |
| TestScanOperExecOverflow | Test vector overflow in the context of the scan operator. |
| TestScanOperExecSmoothing | Test the ability of the scan operator to "absorb" schema changes by "smoothing" out data types and modes across readers. |
| TestScanOrchestratorEarlySchema | Test the early-schema support of the scan orchestrator. |
| TestScanOrchestratorImplicitColumns | Tests the scan orchestrator's ability to merge table schemas with implicit file columns provided by the file metadata manager. |
| TestScanOrchestratorLateSchema | Test the late-schema support in the scan orchestrator. |

Two versions of the scan operator exist:

- ScanBatch: the original version that uses readers based on the RecordReader interface. ScanBatch cannot, however, handle limited-length vectors.
- ScanOperatorExec: the revised version that uses a more modular design and that offers a mutator that is a bit easier to use, and can limit vector sizes.

Further, the new version is designed to allow intensive unit testing without the need for the Drill server. New readers should exploit this feature to include intensive tests to keep Drill quality high.

See ScanOperatorExec for details of the scan operator protocol and components.

    +------------+          +-----------+
    | Scan Batch |    +---> | ScanBatch |
    | Creator    |    |     +-----------+
    +------------+    |           |
           |          |           |
           v          |           |
    +------------+    |           v
    | Format     | ---+   +---------------+
    | Plugin     | -----> | Record Reader |
    +------------+        +---------------+

The scan batch creator is unique to each storage plugin and is created
based on the physical operator configuration ("pop config"). The
scan batch creator delegates to the format plugin to create both the
scan batch (the scan operator) and the set of readers which the scan
batch will manage.
The scan batch
provides a Mutator that creates the vectors used by the
record readers. Schema continuity comes from reusing the Mutator from one
file/block to the next.
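
The idea of schema continuity through a shared mutator can be sketched as follows. This is a minimal illustration only: `SketchVector` and `SketchMutator` are hypothetical names invented here, not Drill's Mutator API. The sketch assumes that a mutator hands back the same vector whenever a later reader asks for a column an earlier reader already declared.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a value vector (not a Drill class). */
class SketchVector {
  final String name;
  final List<Object> values = new ArrayList<>();

  SketchVector(String name) {
    this.name = name;
  }
}

/** Hypothetical mutator: one vector per column name, remembered across readers. */
class SketchMutator {
  private final Map<String, SketchVector> vectors = new LinkedHashMap<>();
  private int schemaChanges = 0;

  /** Returns the existing vector for a known column, or creates a new one. */
  SketchVector addField(String name) {
    SketchVector vector = vectors.get(name);
    if (vector == null) {
      vector = new SketchVector(name);
      vectors.put(name, vector);
      schemaChanges++;   // only a genuinely new column changes the schema
    }
    return vector;
  }

  int schemaChanges() {
    return schemaChanges;
  }
}
```

In this sketch, a second reader that declares the same columns as the first receives the same vector instances, so the downstream schema does not change between files or blocks.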
One characteristic of this system is that all the record readers are created up front. If we must read 1000 blocks, we'll create 1000 record readers. Developers must be very careful to allocate resources only when the reader is opened, and to release them when the reader is closed. Otherwise, resource bloat becomes a large problem.
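
The open/close discipline described above might look like the following sketch. The names `SketchReader`, `LazyBlockReader`, and `UpFrontScanSketch` are hypothetical and do not correspond to Drill's RecordReader interface; the 64 KB buffer simply stands in for any expensive resource. The constructor keeps only cheap metadata, so creating many readers up front costs little.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader contract (not Drill's RecordReader). */
interface SketchReader extends AutoCloseable {
  void open();

  @Override
  void close();
}

/** Holds only cheap metadata until open() is called. */
class LazyBlockReader implements SketchReader {
  private final String blockName;  // cheap: kept for the reader's whole life
  private byte[] buffer;           // expensive: exists only between open() and close()

  LazyBlockReader(String blockName) {
    this.blockName = blockName;    // no allocation in the constructor
  }

  @Override
  public void open() {
    buffer = new byte[64 * 1024];  // allocate only when the reader is opened
  }

  @Override
  public void close() {
    buffer = null;                 // release so the buffer can be reclaimed
  }

  String blockName() {
    return blockName;
  }

  boolean holdsResources() {
    return buffer != null;
  }
}

class UpFrontScanSketch {
  /** Creates every reader up front, runs them in turn, and returns the peak number holding a buffer. */
  static int runScan(int blockCount) {
    List<LazyBlockReader> readers = new ArrayList<>();
    for (int i = 0; i < blockCount; i++) {
      readers.add(new LazyBlockReader("block-" + i));
    }
    int peak = 0;
    for (LazyBlockReader reader : readers) {
      reader.open();
      int holding = 0;
      for (LazyBlockReader other : readers) {
        if (other.holdsResources()) {
          holding++;
        }
      }
      peak = Math.max(peak, holding);
      reader.close();
    }
    return peak;
  }
}
```

With this discipline, `UpFrontScanSketch.runScan(1000)` creates 1000 reader objects, yet only one of them holds a buffer at any moment.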

    +------------+          +---------------+
    | Scan Batch | -------> | Format Plugin |
    | Creator    |          +---------------+
    +------------+          /       |       \
                           /        |        \
    +---------------------+         |         \ +---------------+
    | OperatorRecordBatch |         |     +---->| ScanFramework |
    +---------------------+         |     |     +---------------+
                                    v     |             |
                           +------------------+         |
                           | ScanOperatorExec |         |
                           +------------------+         v
                                    |            +--------------+
                                    +----------> | Batch Reader |
                                                 +--------------+


Here, the scan batch creator again delegates to the format plugin. The format plugin creates three objects:

- OperatorRecordBatch, which encapsulates the Volcano iterator protocol. It also holds onto the output batch. This allows the operator implementation to just focus on its specific job.
- ScanOperatorExec, the operator implementation for the new result-set-loader based scan.
- ScanFramework, the framework that creates the batch readers.

A key part of the scan strategy is the batch reader. ("Batch" because it reads an entire batch at a time, using the result set loader.) The framework creates batch readers one by one as needed. Resource bloat is less of an issue because only one batch reader instance exists at any time for each scan operator instance.
Each of the above is further broken down into additional classes to handle projection and so on.
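
The "one batch reader at a time" behavior can be illustrated with a small sketch. All names here (`SketchBatchReader`, `CountingReader`, `LazyScanSketch`) are hypothetical and are not Drill's RowBatchReader or ScanOperatorExec APIs. The sketch assumes the framework holds a sequence of reader factories and asks for the next reader only after the previous one is closed.

```java
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical batch reader contract (not Drill's RowBatchReader). */
interface SketchBatchReader {
  boolean next();  // true while another batch was read
  void close();
}

/** Test double that tracks how many readers exist at once. */
class CountingReader implements SketchBatchReader {
  static int live = 0;      // readers created but not yet closed
  static int peakLive = 0;  // highest value "live" has reached

  private int remaining;

  CountingReader(int batches) {
    remaining = batches;
    live++;
    peakLive = Math.max(peakLive, live);
  }

  @Override
  public boolean next() {
    return remaining-- > 0;
  }

  @Override
  public void close() {
    live--;
  }
}

class LazyScanSketch {
  /** Creates each reader on demand, drains it, closes it, and returns the total batch count. */
  static int scan(Iterator<Supplier<SketchBatchReader>> factories) {
    int batches = 0;
    while (factories.hasNext()) {
      SketchBatchReader reader = factories.next().get();  // created only now
      try {
        while (reader.next()) {
          batches++;
        }
      } finally {
        reader.close();  // closed before the next reader is created
      }
    }
    return batches;
  }
}
```

Because each reader is created only when the previous one has been closed, `CountingReader.peakLive` never exceeds one in this sketch, regardless of how many factories the scan is given.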
Copyright © 2021 The Apache Software Foundation. All rights reserved.