Deepgreen DB

Reading Data With Deepgreen External Table

You can access the data on Xdrive through the external table mechanism provided in Deepgreen DB. The syntax is essentially the same as Greenplum external table.

CREATE EXTERNAL TABLE table_name (columnspec)
LOCATION (’xdrive://host:port/mountpoint/reader-path’)
FORMAT ’CSV’ [csv options]
     | ’SPQ’;

Xdrive currently supports two formats: CSV and SPQ. SPQ stands for Simple Parquet Format; it is Vitesse Data’s proprietary high performance, column store format.

The reader-path specifies a UNIX file path glob pattern that can match many files. If we had partition our lineitem spq table and store the table fragments under the path:

/data/warehouse/lineitem/YYYY/MM/lineitem_part.spq

where YYYY and MM are substituted with the year and month values respectively, we would specify our LOCATION clause as:

LOCATION ('xdrive://host:port/dw/lineitem/*/*/lineitem*.spq')

Writing Data With Deepgreen External Table

With some effort, you can write to Xdrive using a more elaborate external table construct:

CREATE WRITABLE EXTERNAL TABLE write_table_name (columnspec)
LOCATION (’xdrive://host:port/mountpoint/writer-path’)
FORMAT ’CSV’ [csv options]
     | ’SPQ’;

You may also limit the number of xdrive workers for write by setting the location URL as follow.

xdrive://host:port/mountpoint?nwriter=5/...
e.g. xdrive://127.0.0.1:50051/kafka?nwriter=5/customer

In contrast to the reader-path, the writer-path is a very different animal. While a reader-path may resolve to multiple files on Xdrive, a writer-path should map to only one target file. On any INSERT to the external table, the target file will be created (or truncated if it already exists) and appended to. To facilitate creation of unique file names for the writes, the writer-path may be annotated with a #UUID# substitution pattern. With the #UUID# substitution, every INSERT statement will create a distinct new file.

Continuing with our example above, we would use the following external table to insert lineitems into Jan 2016 partition:

CREATE WRITABLE EXTERNAL TABLE write_lineitem_2016_01 (columnspec)
LOCATION (’xdrive://host:port/dw/lineitem/2016/01/lineitem_#UUID#.spq')
FORMAT ’SPQ’;

Each INSERT will create a new SPQ file. If you recall, our reader-path was xdrive://host:port/dw/lineitem/*/*/lineitem*.spq; the wildcards in the path allow our scan to encompass the new files created by the INSERT.