Deepgreen DB

Reading Data With Deepgreen External Table

You can access the data on Xdrive through the external table mechanism provided in Deepgreen DB. The syntax is essentially the same as Greenplum external table.

CREATE EXTERNAL TABLE table_name (columnspec)
LOCATION (’xdrive://host:port/mountpoint/reader-path’)
FORMAT ’CSV’ [csv options]
     | ’SPQ’;

Xdrive currently supports two formats: CSV and SPQ. SPQ stands for Simple Parquet Format; it is Vitesse Data’s proprietary high performance, column store format.

The reader-path specifies a UNIX file path glob pattern that can match many files. If we had partition our lineitem spq table and store the table fragments under the path:

/data/warehouse/lineitem/YYYY/MM/lineitem_part.spq

where YYYY and MM are substituted with the year and month values respectively, we would specify our LOCATION clause as:

LOCATION ('xdrive://host:port/dw/lineitem/*/*/lineitem*.spq')

Writing Data With Deepgreen External Table

With some effort, you can write to Xdrive using a more elaborate external table construct:

CREATE WRITABLE EXTERNAL TABLE write_table_name (columnspec)
LOCATION (’xdrive://host:port/mountpoint/writer-path’)
FORMAT ’CSV’ [csv options]
     | ’SPQ’;

In contrast to the reader-path, the writer-path is a very different animal. While a reader-path may resolve to multiple files on Xdrive, a writer-path should map to only one target file. On any INSERT to the external table, the target file will be created (or truncated if it already exists) and appended to. To facilitate creation of unique file names for the writes, the writer-path may be annotated with a #UUID# substitution pattern. With the #UUID# substitution, every INSERT statement will create a distinct new file.

Continuing with our example above, we would use the following external table to insert lineitems into Jan 2016 partition:

CREATE WRITABLE EXTERNAL TABLE write_lineitem_2016_01 (columnspec)
LOCATION (’xdrive://host:port/dw/lineitem/2016/01/lineitem_#UUID#.spq')
FORMAT ’SPQ’;

Each INSERT will create a new SPQ file. If you recall, our reader-path was xdrive://host:port/dw/lineitem/*/*/lineitem*.spq; the wildcards in the path allow our scan to encompass the new files created by the INSERT.


Quick Start

We will demonstrate an example to read/write the CSV file from local filesystem via xdrive built-in csv plugin from scratch.

Before you run xdrive, you have to setup the configuration files which is in toml file format.

1. Create the xdrive.toml file with the content below. Setup the standalone local server with port 50051 and also setup the CSV mountpoint with name “mynfs_csv” and file base directory /tmp/xdrive/tests/data.

[xdrive]
dir = "/tmp/xdrive"                     # xdrive base directory
host = [ "localhost" ]            # list of xdrive server hosts
port = 50051


[[xdrive.mount]]
  name = "mynfs_csv"                                                  # name of the plugin
  argv = ["xdr_fs/xdr_fs", "csv", "/tmp/xdrive/tests/data"]      # list of arguments to invoke the plugin

#arg0 is the plugin executable, arg1, arg2, arg3, argN are the command arguments

2. Run xdrctl deploy command to deploy the xdrive server to the list of server hosts specified in the xdrive.toml

% xdrctl deploy xdrive.toml
mkdir destdir ...
scp xdrive to remote ...
scp xdrplugin to remote ...
scp conffile to remote ...

After the command, /tmp/xdrive directory will be created and your xdrive.toml will be copied to /tmp/xdrive/xdrive.toml and /tmp/xdrive/plugin is the plugin base directory

ls /tmp/xdrive/
bin  log  plugin  xdrive.toml

3. Now, you can start the xdrive servers by xdrctl start.

% cd /tmp/xdrive/
% xdrctl start xdrive.toml
% ps -ef | grep xdrive
ericlam   2349     1  0 08:24 ?        00:00:00 /tmp/xdrive/bin/xdrive -p 50051 -D /tmp/xdrive

4. Start the Deepgreen database.

% gpstart

5. Setup the DDL to create the read-only and writable external tables for the CSV files

DROP EXTERNAL TABLE IF EXISTS myxx1; 
CREATE EXTERNAL TABLE myxx1
    (
        i int,
        t text
    )
LOCATION ('xdrive://127.0.0.1:50051/mynfs_csv/x1.csv') 
FORMAT 'CSV';

DROP EXTERNAL TABLE IF EXISTS myxx2; 
CREATE EXTERNAL TABLE myxx2
    (
        i int,
        t text
    )
LOCATION ('xdrive://127.0.0.1:50051/mynfs_csv/x?.csv') 
FORMAT 'CSV';

DROP EXTERNAL TABLE IF EXISTS myxx3; 
CREATE EXTERNAL TABLE myxx3
    (
        i int,
        t text
    )
LOCATION ('xdrive://127.0.0.1:50051/mynfs_csv/x*.csv') 
FORMAT 'CSV';

DROP EXTERNAL TABLE IF EXISTS myxxw; 
CREATE WRITABLE EXTERNAL TABLE myxxw
    (
        i int,
        t text
    )
LOCATION ('xdrive://127.0.0.1:50051/mynfs_csv/x#UUID#.csv') 
FORMAT 'CSV';

6. Create the data file /tmp/xdrive/tests/data/x1.csv

% cat /tmp/xdrive/tests/data/x1.csv
1,1
2,2
3,3
4,4
1,1
2,2
3,3
4,4
1,1
2,2
3,3
4,4

% wc -l x1.csv
12 x1.csv

7. You can now run the psql to read the x1.csv from the local filesystem

% psql
psql wetestdata
psql (8.2.15)
Type "help" for help.

wetestdata=# select * from myxx1;
 i | t 
---+---
 1 | 1
 2 | 2
 3 | 3
 4 | 4
 1 | 1
 2 | 2
 3 | 3
 4 | 4
 1 | 1
 2 | 2
 3 | 3
 4 | 4
(12 rows)

wetestdata=# select * from myxx2;
 i | t 
---+---
 1 | 1
 2 | 2
 3 | 3
 4 | 4
 1 | 1
 2 | 2
 3 | 3
 4 | 4
 1 | 1
 2 | 2
 3 | 3
 4 | 4
(12 rows)

wetestdata=# select * from myxx3;
 i | t 
---+---
 1 | 1
 2 | 2
 3 | 3
 4 | 4
 1 | 1
 2 | 2
 3 | 3
 4 | 4
 1 | 1
 2 | 2
 3 | 3
 4 | 4
(12 rows)

8. You can also write the data to the local filesystem by writing the data to the writable external table myxxw

wetestdata=# insert into myxxw select * from myxx1;
INSERT 0 12

Check the datafiles in /tmp/xdrive/tests/data

% ls -1 x*.csv
x1.csv  
x7907cde3-b048-4440-94bf-d9d31df37d72.csv 
xd7a467fd-614b-4641-876f-ede4116ffd42.csv

% cat x7907cde3-b048-4440-94bf-d9d31df37d72.csv 
2,2
4,4
2,2
4,4
2,2
4,4
% cat xd7a467fd-614b-4641-876f-ede4116ffd42.csv 
1,1
3,3
1,1
3,3
1,1
3,3

9. Fetch the myxx3 table to see how many rows you can get. The number of rows should be 24 now.

wetestdata=# select count(*) from myxx3;
 count 
-------
    24
(1 row)

10. To stop the xdrive, simply run xdrctl stop.

% xdrctl stop /tmp/xdrive/xdrive.toml

11. You can run xdrive and csv plugin successfully. For more plugin settings, please refer to the documentation in the Deepgreen distribution.