License Statement
Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Revision History
Version | Date |
---|---|
2.2.0 | TBD |
2.1.0 | May 1, 2017 |
2.0.1 | July 7, 2016 |
2.0.0 | June 6, 2016 |
1.3.0 | January, 2016 |
1. About This Document
This guide describes how to use odb, a multi-threaded, ODBC-based command-line tool, to perform various operations on a Trafodion database.
In the current release of Trafodion, only the loading, extracting, and copying data operations are production ready, meaning that they have
been fully tested and are ready to be used in a production environment. Other features are designated as Technology Preview, meaning that they have not been fully tested and are not ready for production use.
1.1. Intended Audience
This guide is intended for database administrators and other users who want to run scripts that operate on a Trafodion database, primarily for parallel data loading.
1.2. New and Changed Information
This guide is new.
1.3. Notation Conventions
This list summarizes the notation conventions for syntax presentation in this manual.
-
UPPERCASE LETTERS
Uppercase letters indicate keywords and reserved words. Type these items exactly as shown. Items not enclosed in brackets are required.
SELECT
-
lowercase letters
Lowercase letters, regardless of font, indicate variable items that you supply. Items not enclosed in brackets are required.
file-name
-
[ ] Brackets
Brackets enclose optional syntax items.
DATETIME [start-field TO] end-field
A group of items enclosed in brackets is a list from which you can choose one item or none.
The items in the list can be arranged either vertically, with aligned brackets on each side of the list, or horizontally, enclosed in a pair of brackets and separated by vertical lines.
For example:
DROP SCHEMA schema [CASCADE]
DROP SCHEMA schema [ CASCADE | RESTRICT ]
-
{ } Braces
Braces enclose required syntax items.
FROM { grantee [, grantee ] ... }
A group of items enclosed in braces is a list from which you are required to choose one item.
The items in the list can be arranged either vertically, with aligned braces on each side of the list, or horizontally, enclosed in a pair of braces and separated by vertical lines.
For example:
INTERVAL { start-field TO end-field }
         { single-field }
INTERVAL { start-field TO end-field | single-field }
-
| Vertical Line
A vertical line separates alternatives in a horizontal list that is enclosed in brackets or braces.
{expression | NULL}
-
… Ellipsis
An ellipsis immediately following a pair of brackets or braces indicates that you can repeat the enclosed sequence of syntax items any number of times.
ATTRIBUTE[S] attribute [, attribute] ...
{, sql-expression } ...
An ellipsis immediately following a single syntax item indicates that you can repeat that syntax item any number of times.
For example:
expression-n ...
-
Punctuation
Parentheses, commas, semicolons, and other symbols not previously described must be typed as shown.
DAY (datetime-expression)
@script-file
Quotation marks around a symbol such as a bracket or brace indicate the symbol is a required character that you must type as shown.
For example:
"{" module-name [, module-name] ... "}"
-
Item Spacing
Spaces shown between items are required unless one of the items is a punctuation symbol such as a parenthesis or a comma.
DAY (datetime-expression)
DAY(datetime-expression)
If there is no space between two items, spaces are not permitted. In this example, no spaces are permitted between the period and any other items:
myfile.sh
-
Line Spacing
If the syntax of a command is too long to fit on a single line, each continuation line is indented three spaces and is separated from the preceding line by a blank line.
This spacing distinguishes items in a continuation line from items in a vertical list of selections.
match-value [NOT] LIKE pattern [ESCAPE esc-char-expression]
1.4. Comments Encouraged
We encourage your comments concerning this document. We are committed to providing documentation that meets your needs. Send any errors found, suggestions for improvement, or compliments to user@trafodion.apache.org.
Include the document title and any comment, error found, or suggestion for improvement you have concerning this document.
2. Introduction
2.1. What is odb
odb is a platform independent, multi-threaded, ODBC command-line tool you can use as a:
-
Parallel data loader/extractor
-
Query driver (Technology Preview)
-
SQL interpreter (Technology Preview)
odb is written in ANSI C. Currently, odb is available only in a 64-bit version for the Linux platform, linked to the unixODBC driver manager.
odb executables use the following naming convention, odbAABCC, where:
- AA can be 64 (bit) (32-bit is not currently supported).
- B identifies the platform/compiler:
  - l = Linux/gcc
  - w = Windows/MS Visual Studio (not yet tested)
- CC identifies the ODBC Driver Manager to which odb was linked:
  - uo = unixODBC Driver Manager
  - ms = Microsoft ODBC Driver Manager (not yet tested)
So, for example:
- odb64luo is the 64-bit executable for Linux linked with the unixODBC Driver Manager.
This document contains examples run with the odb64luo executable.
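The naming convention can be decomposed mechanically. A small bash sketch (the substring offsets simply follow the odbAABCC layout described above):

```shell
# Decode an odb executable name into its AA / B / CC parts (bash substring syntax).
name=odb64luo
aa=${name:3:2}   # AA: word size (64)
b=${name:5:1}    # B : platform/compiler (l = Linux/gcc)
cc=${name:6}     # CC: Driver Manager (uo = unixODBC)
echo "bits=$aa platform=$b dm=$cc"
```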
3. Installation and Configuration
See the Trafodion Client Installation Guide for install instructions.
Refer to the unixODBC documentation for additional information about unixODBC.
4. Basic Concepts
4.1. Get Help
The following command shows the odb help:
~/Devel/odb $ ./odb64luo -h
odb version 1.3.0
Build: linux, amd64, gcc generic m64, uodbc, mreadline, dynamic gzip, dynamic libhdfs, dynamic libxml2 [Mar 30 2015 00:29:25]
-h: print this help
-version: print odb version and exit
-lsdrv: list available drivers @ Driver Manager level
-lsdsn: list available Data Sources
Connection related options. You can connect using either:
-u User: (default $ODB_USER variable)
-p Password: (default $ODB_PWD variable)
-d Data_Source_Name: (default $ODB_DSN variable)
-ca Connection_Attributes (normally used instead of -d DSN)
-U sets SQL_TXN_READ_UNCOMMITTED isolation level
-ndsn [+]<number>: adds 1 to <number> to DSN
-nps <nbytes>[:<nbytes>]: specify source[:target] network packet size
SQL interpreter options:
-I [$ODB_INI SECTION]: interactive mode shell
-noconnect: do not connect on startup
General options:
-q [cmd|res|all|off]: do not print commands/results/both
-i [TYPE[MULT,WIDE_MULT]:CATALOG.SCHEMA[.TABLE]]: lists following object types:
(t)ables, (v)iews, s(y)nonyns, (s)chemas, (c)atalogs, syst(e)m tables
(l)ocal temp, (g)lobal temp, (m)at views, (M)mat view groups, (a)lias
(A)ll object types, (T)table desc, (D)table DDL, (U) table DDL with multipliers
-r #rowset: rowset to be used insert/selects (default 100)
-soe: Stop On Error (script execution/loading task)
-N : Null run. Doesn't SQLExecute statements
-v : be verbose
-vv : Print execution table
-noschema : do not use schemas: CAT.OBJ instead of CAT.SCH.OBJ
-nocatalog : do not use catalogs: SCH.OBJ instead of CAT.SCH.OBJ
-nocatnull : like -nocatalog but uses NULL instead of empty CAT strings
-ucs2toutf8 : set UCS-2 to UTF-8 conversion in odb
-var var_name var_value: set user defined variables
-ksep char/code: Thousands Separator Character (default ',')
-dsep char/code: Decimal Separator Character (default '.')
SQL execution options [connection required]:
-x [#inst:]'command': runs #inst (default 1) command instances
-f [#inst:]'script': runs #inst (default 1) script instances
-P script_path_regexp: runs in parallel scripts_path_regexp if script_path_regexp ends with / all files in that dir
-S script_path_regexp: runs serially scripts_path_regexp if script_path_regexp ends with / all files in that dir
-L #loops: runs everything #loops times
-T max_threads: max number of execution threads
-dlb: use Dynamic Load Balancing
-timeout #seconds: stops everything after #seconds (no Win32)
-delay #ms: delay (ms) before starting next thread
-ldelay #ms: delay (ms) before starting next loop in a thread
-ttime #ms[:ms]: delay (ms) before starting next command in a thread random delay if a [min:max] range is specified
-F #records: max rows to fetch
-c : output in csv format
-b : print start time in the headers when CSV output
-pcn: Print Column Names
-plm: Print Line Mode
-fs char/code: Field Sep <char> ASCII_dec> 0<ASCII_OCT> X<ASCII_HEX>
-rs char/code: Rec Sep <char> ASCII_dec> 0<ASCII_OCT> X<ASCII_HEX>
-sq char/code: String Qualifier (default none)
-ec char/code: Escape Character (default '\')
-ns nullstring: print nullstring when a field is NULL
-trim: Trim leading/trailing white spaces from txt cols
-drs: describe result set (#cols, data types...) for each Q)
-hint: do not remove C style comments (treat them as hints)
-casesens: set case sensitive DB
-Z : shuffle the execution table randomizing Qs start order
Data loading options [connection required]:
-l src=[-]file:tgt=table[:map=mapfile][:fs=fieldsep][:rs=recsep][:soe]
[:skip=linestoskip][:ns=nullstring][:ec=eschar][:sq=stringqualifier]
[:pc=padchar][:em=embedchar][:errmax=#max_err][:commit=auto|end|#rows|x#rs]
[:rows=#rowset][:norb][:full][:max=#max_rec][:truncate][:show][:bpc=#][:bpwc=#]
[:nomark][:parallel=number][:iobuff=#size][:buffsz=#size]][:fieldtrunc={0-4}]
[:pre={@sqlfile}|{[sqlcmd]}][:post={@sqlfile}|{[sqlcmd]}][:ifempty]
[:direct][:bad=[+]badfile][:tpar=#tables][:maxlen=#bytes][:time][:loadcmd=IN|UP|UL]
[:xmltag=[+]element][:xmlord][:xmldump]
Defaults/notes:
* src file: local file or {hdfs,mapr}[@host,port[,huser]].<HDFS_PATH>
* fs: default ','. Also <ASCII_dec> 0<ASCII_OCT> X<ASCII_HEX>
* rs: default '\n'. Also <ASCII_dec> 0<ASCII_OCT> X<ASCII_HEX>
* ec: default '\'. Also <ASCII_dec> 0<ASCII_OCT> X<ASCII_HEX>
* pc: no default. Also <ASCII_dec> 0<ASCII_OCT> X<ASCII_HEX>
* direct: only for Vertica databases
* bpc: default 1,bpwc: default 4
* loadcmd: default IN. only for Trafodion databases
Data extraction options [connection required]:
-e {src={table|-file}|sql=<custom sql>}:tgt=[+]file[:pwhere=where_cond]
[:fs=fieldsep][:rs=recsep][:sq=stringqualifier][:ec=escape_char][:soe]
[:ns=nullstring][es=emptystring][:rows=#rowset][:nomark][:binary][:bpc=#][:bpwc=#]
[:max=#max_rec][:[r]trim[+]][:cast][:multi][parallel=number][:gzip[=lev]]
[:splitby=column][:uncommitted][:iobuff=#size][hblock=#size][:ucs2toutf8]
[:pre={@sqlfile}|{[sqlcmd]}[:mpre={@sqlfile}|{[sqlcmd]}[:post={@sqlfile}|{[sqlcmd]}]
[tpar=#tables][:time][:cols=[-]columns]][:maxlen=#bytes][:xml]
Defaults/notes:
* tgt file: local file or {hdfs,mapr}.[@host,port[,huser]].<HDFS_PATH>
* fs: default ','. Also <ASCII_dec> 0<ASCII_OCT> X<ASCII_HEX>
* rs: default '\n'. Also <ASCII_dec> 0<ASCII_OCT> X<ASCII_HEX>
* ec: default '\'. Also <ASCII_dec> 0<ASCII_OCT> X<ASCII_HEX>
* sq: no default. Also <ASCII_dec> 0<ASCII_OCT> X<ASCII_HEX>
* gzip compression level between 0 and 9
* bpc: default 1,bpwc: default 4
Data copy options [connection required]:
-cp src={table|-file:tgt=schema[.table][pwhere=where_cond][:soe][:roe=#][:roedel=#ms]
[:truncate][:rows=#rowset][:nomark][:max=#max_rec][:bpc=#][:bpwc=#][:[r]trim[+]]
[:parallel=number][errmax=#max_err][:commit=auto|end|#rows|x#rs][:time] ][:cast]
[:direct][:uncommitted][:norb][:splitby=column][:pre={@sqlfile}|{[sqlcmd]}]
[:post={@sqlfile}|{[sqlcmd]}][:mpre={@sqlfile}|{[sqlcmd]}][:ifempty]
[:loaders=#loaders][:tpar=#tables][:cols=[-]columns][:errdmp=file] ][:loadcmd=IN|UP|UL]
[sql={[sqlcmd]|@sqlfile|-file}[:bind=auto|char|cdef][:seq=field#[,start]]
[tmpre={@sqlfile}|{[sqlcmd]}][:ucs2toutf8=[skip,force,cpucs2,qmark]]
Defaults/notes:
* loaders: default 2 load threads for each 'extractor'
* direct: only work if target database is Vertica
* ucs2toutf8: default is 'skip'
* roe: default 3 if no arguments
* bpc: default 1,bpwc: default 4
* loadcmd: default IN. only for Trafodion databases
Data pipe options [connection required]:
-pipe sql={[sqlcmd]|@sqlscript|-file}:tgtsql={@sqlfile|[sqlcmd]}[:soe]
[:rows=#rowset][:nomark][:max=#max_rec][:bpc=#][:bpwc=#][:errdmp=file]
[:parallel=number][errmax=#max_err][:commit=auto|end|#rows|x#rs][:time]
[:pre={@sqlfile}|{[sqlcmd]}][:post={@sqlfile}|{[sqlcmd]}]
[:mpre={@sqlfile}|{[sqlcmd]}][:tmpre={@sqlfile}|{[sqlcmd]}]
[:loaders=#loaders][:tpar=#tables][:bind=auto|char|cdef]
Defaults/notes:
* loaders: default 1 load threads for each extraction thread
* bpc: default 1,bpwc: default 4
Table diff options [connection required]:
-diff src={table|-file}:tgt=table:[key=columns][:output=[+]file][:pwhere=where_cond]
[:pwhere=where_cond][:nomark][:rows=#rowset][:odad][:fs=fieldsep][:time][trim[+]]
[:rs=recsep][:quick][:splitby=column][:parallel=number][:max=#max_rec]
[:print=[I][D][C]][:ns=nullstring][:es=emptystring][:bpc=#][:bpwc=#][:uncommitted]
[:pre={@sqlfile}|{[sqlcmd]}][:post={@sqlfile}|{[sqlcmd]}][tpar=#tables]
Defaults/notes:
* bpc: default 1,bpwc: default 4
* print: default is Inserted Deleted Changed
4.2. Connect to Database
odb uses standard ODBC APIs to connect to a database.
Normally you have to provide the following information: user, password and ODBC data source.
Example
$ ./odb64luo -u user -p password -d dsn ...
You can provide driver-specific connection attributes using the -ca command-line option.
Command-line passwords are protected against ps -ef sniffing attacks under *nix, so you can safely pass your password via -p. An alternative approach is to use environment variables or the odb password prompt (see below).
odb will use the following environment variables (if defined):
Variable | Meaning | Corresponding Command-Line Option |
---|---|---|
ODB_USER | User name to use for database connections | -u |
ODB_PWD | Password for database connections | -p |
ODB_DSN | DSN for database connection | -d |
ODB_INI | Init file for interactive shell | -I |
ODB_HIST | History file name to save command history on exit | |
Command-line options take precedence over environment variables.
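For example, a session can rely on the environment instead of connection flags (the help text above shows -u, -p, and -d defaulting to $ODB_USER, $ODB_PWD, and $ODB_DSN). A minimal sketch; the user, password, and DSN values are placeholders:

```shell
# Placeholder credentials - substitute your own user, password, and data source.
export ODB_USER=mauro
export ODB_PWD=secret
export ODB_DSN=traf
# With these set, "./odb64luo -i" connects without -u/-p/-d;
# explicit command-line options still override the variables.
```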
4.3. List ODBC Drivers and Data Sources
You can list available drivers with -lsdrv:
~/Devel/odb $ ./odb64luo -lsdrv
Trafodion - Description=Trafodion ODBC Stand Alone Driver
...
You can list locally configured data sources with -lsdsn:
~/Devel/odb $ ./odb64luo -lsdsn
traf - Trafodion
VMFELICI – Vertica
...
4.4. Obtain Database Information
The -i option allows you to get information about the database you’re connecting to, as well as the ODBC driver.
It’s a simple way to check your credentials and database connection.
Example
~/mauro/odb $ ./odb64luo -u xxx -p xxx -d traf -i
odb [2015-04-20 21:20:47]: starting ODBC connection(s)... 0
[odb version 1.3.0]
Build: linux, amd64, gcc generic m64, uodbc, mreadline, dynamic gzip, dynamic libhdfs, dynamic libxml2 [Apr 8 2015 16:47:49]
DBMS product name (SQL_DBMS_NAME) : Trafodion
DBMS product version (SQL_DBMS_VER) : 01.03.0000
Database name (SQL_DATABASE_NAME) : TRAFODION
Server name (SQL_SERVER_NAME) : --name--
Data source name (SQL_DATA_SOURCE_NAME) : traf
Data source RO (SQL_DATA_SOURCE_READ_ONLY) : N
ODBC Driver name (SQL_DRIVER_NAME) : libhpodbc64.so
ODBC Driver version (SQL_DRIVER_VER) : 03.00.0000
ODBC Driver level (SQL_DRIVER_ODBC_VER) : 03.51
ODBC Driver Manager version (SQL_DM_VER) : 03.52.0002.0002
ODBC Driver Manager level (SQL_ODBC_VER) : 03.52
Connection Packet Size (SQL_ATTR_PACKET_SIZE): 0
odb [2015-04-20 21:20:48]: exiting. Session Elapsed time 0.229 seconds (00:00:00.229)
4.5. List Database Objects
The previous section used the -i option without any argument.
This option accepts arguments with the following syntax:
[TYPE:][CATALOG.SCHEMA][.OBJECT]
where type can be:
Type | Meaning |
---|---|
A | All database object types |
t | Tables |
v | Views |
a | Aliases |
y | Synonyms |
l | Local Temporary |
g | Global Temporary |
m | Materialized views |
M | Materialized view groups |
s | Schemas |
c | Catalogs |
T | Table descriptions |
D | Table DDL |
U | Table DDL multiplying wide columns by Y and non-wide columns by X |
Example | Action |
---|---|
-i c: | List all catalogs. |
-i s: | List all schemas. |
-i TRAFODION.MAURIZIO | List all objects in TRAFODION.MAURIZIO. |
-i t:TRAFODION.MAURIZIO | List all tables in TRAFODION.MAURIZIO. |
-i t:TRAFODION.CIV04 | List all tables in TRAFODION.CIV04. |
-i v:TRAFODION.MAURIZIO | List all views in TRAFODION.MAURIZIO. |
-i v:TRAFODION.CIV04 | List all views in TRAFODION.CIV04. |
-i T:TRAFODION.MAURIZIO.T% | Describe all tables starting with T in TRAFODION.MAURIZIO. |
Extended Examples
~/mauro/odb $ ./odb64luo -u MFELICI -p xxx -d MFELICI -i T:TRAFODION.MAURIZIO.T%
odb [2011-12-07 14:43:51]: starting (1) ODBC connection(s)... 1
Describing: TRAFODION.MAURIZIO.T1
+------+--------------+----+-------+-------+
|COLUMN|TYPE |NULL|DEFAULT|INDEX |
+------+--------------+----+-------+-------+
|ID |INTEGER SIGNED|YES | | |
|NAME |CHAR(10) |YES | | |
|LASTN |VARCHAR(20) |YES | | |
+------+--------------+----+-------+-------+
Describing: TRAFODION.MAURIZIO.T11
+------+--------------+----+-------+-------+
|COLUMN|TYPE |NULL|DEFAULT|INDEX |
+------+--------------+----+-------+-------+
|ID |INTEGER SIGNED|NO | |T11 1 U|
|NAME |CHAR(10) |YES | | |
+------+--------------+----+-------+-------+
The INDEX column (when using type T) contains the following information:
- name of the INDEX (in Trafodion, indexes having the same name as the table are Primary Keys).
- ordinal number to identify the order of that field in the index.
- (U)nique or (M)ultiple values allowed.
- (+) means that more than one index includes that field.
4.6. Perform Actions on Multiple Database Objects
odb uses extended SQL syntax to execute actions on multiple objects: &<type>:<path>, where <type> is one of the object types listed in the previous section.
Example
Example | Action |
---|---|
purgedata &t:TRAFODION.MAURIZIO.TPCH% | Purge ALL tables (t:) starting with TPCH in TRAFODION.MAURIZIO. |
drop view &v:TRAFODION.MAURIZIO.%V | Drop ALL views (v:) ending with V in TRAFODION.MAURIZIO. |
update statistics for table &t:TRAFODION.MAURIZIO.% on every column | Update Stats for ALL tables in TRAFODION.MAURIZIO. |
You can use this extended SQL syntax in the SQL Interpreter or generic SQL scripts.
4.7. Run Commands and Scripts
The -x switch can be used to run generic SQL commands. You can also use -f to run SQL scripts:
- -x "SQL command" to run a specific SQL command.
- -f <script> to run a script file.
Example
~/Devel/odb $ ./odb64luo -x "select count(*) from customer"
150000
[0.0.0]--- 1 row(s) selected in 0.137s (prep 0.000s, exec 0.137s, 1st fetch 0.000s,
fetch 0.000s)
The meaning of [0.0.0] will be explained later.
~/Devel/odb $ cat script.sql
SELECT COUNT(*) FROM T1;
-- This is a comment
SELECT
L_RETURNFLAG
, L_LINESTATUS
, SUM(L_QUANTITY) AS SUM_QTY
, SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE
, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS SUM_DISC_PRICE
, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE
, AVG(L_QUANTITY) AS AVG_QTY
, AVG(L_EXTENDEDPRICE) AS AVG_PRICE
, AVG(L_DISCOUNT) AS AVG_DISC
, COUNT(*) AS COUNT_ORDER
FROM
LINEITEM
WHERE
L_SHIPDATE <= DATE '1998-12-01' - INTERVAL '90' DAY
GROUP BY
L_RETURNFLAG, L_LINESTATUS
ORDER BY
L_RETURNFLAG, L_LINESTATUS
;
~/Devel/odb $ ./odb64luo -f script.sql
[0.0.0]Executing: 'SELECT COUNT(*) FROM T1;'
5
[0.0.0]--- 1 row(s) selected in 0.015s (prep 0.000s, exec 0.015s, 1st fetch -0.000s,
fetch -0.000s)
[0.0.1]Executing: 'SELECT L_RETURNFLAG, L_LINESTATUS, SUM(L_QUANTITY) AS SUM_QTY,
SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS
SUM_DISC_PRICE, SUM(L_EXTENDEDPRICE*(1- L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE,
AVG(L_QUANTITY) AS AVG_QTY, AVG(L_EXTENDEDPRICE) AS AVG_PRICE, AVG(L_DISCOUNT) AS
AVG_DISC, COUNT(*) AS COUNT_ORDER FROM LINEITEM WHERE L_SHIPDATE <= DATE '1998-12-01'
- INTERVAL '90' DAY GROUP BY L_RETURNFLAG, L_LINESTATUS ORDER BY L_RETURNFLAG,
L_LINESTATUS;'
A,F,37734107.00,56586554400.73,53758257134.8700,55909065222.827692,25.522006,
38273.129735,0.049985,1478493
...
R,F,37719753.00,56568041380.90,53741292684.6040,55889619119.831932,25.505794,
38250.854626,0.050009,1478870
[0.0.1]--- 4 row(s) selected in 21.344s (prep 0.000s, exec 21.344s, 1st fetch 0.000s, fetch 0.000s)
You can use the -q switch to omit selected output components.
Example
-q cmd will not print the commands being executed:
~/Devel/odb $ ./odb64luo -f script.sql -q cmd
5
[0.0.0]--- 1 row(s) selected in 0.015s (prep 0.000s, exec 0.015s, 1st fetch -0.000s,
fetch -0.000s)
A,F,37734107.00,56586554400.73,53758257134.8700,55909065222.827692,25.522006,
38273.129735,0.049985,1478493
...
R,F,37719753.00,56568041380.90,53741292684.6040,55889619119.831932,25.505794,
38250.854626,0.050009,1478870
[0.0.1]--- 4 row(s) selected in 21.344s (prep 0.000s, exec 21.344s, 1st fetch 0.000s,
fetch 0.000s)
While -q res will not print the results:
~/Devel/odb $ ./odb64luo -f script.sql -q res
[0.0.0]Executing: 'SELECT COUNT(*) FROM T1;'
[0.0.0]--- 1 row(s) selected in 0.015s (prep 0.000s, exec 0.015s, 1st fetch -0.000s,
fetch -0.000s)
[0.0.1]Executing: 'SELECT L_RETURNFLAG,L_LINESTATUS, SUM(L_QUANTITY) AS
SUM_QTY, SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT))
AS SUM_DISC_PRICE, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE,
AVG(L_QUANTITY) AS AVG_QTY, AVG(L_EXTENDEDPRICE) AS AVG_PRICE, AVG(L_DISCOUNT) AS
AVG_DISC, COUNT(*) AS COUNT_ORDER FROM LINEITEM WHERE L_SHIPDATE <= DATE '1998-12-01'
- INTERVAL '90' DAY GROUP BY L_RETURNFLAG, L_LINESTATUS ORDER BY L_RETURNFLAG,
L_LINESTATUS;'
[0.0.1]--- 4 row(s) selected in 21.344s (prep 0.000s, exec 21.344s, 1st fetch 0.000s,
fetch 0.000s)
-q all (or just -q) will print neither the commands nor the results:
~/Devel/odb $ ./odb64luo -f script.sql -q all
[0.0.0]--- 1 row(s) selected in 0.015s (prep 0.000s, exec 0.015s, 1st fetch -0.000s,
fetch -0.000s)
[0.0.1]--- 4 row(s) selected in 21.344s (prep 0.000s, exec 21.344s, 1st fetch 0.000s,
fetch 0.000s)
This is often used with odb as a query driver.
Even when odb doesn’t print query results (-q res), the result set is still fetched and data is
transferred from the database server to the client. In other words, -q res is somewhat similar (but
not exactly equivalent) to a /dev/null output redirection.
A special file name you can use with -f is - (dash).
It means: read the script to be executed from the standard input.
Example
The following command will copy table definitions from one system to another recreating, on the target system, the same table structures as in the source system:
$ odb64luo -u u1 -p p1 -d SRC -i t:TRAFODION.CIV04 -x "SHOWDDL &1" \
| odb64luo -u u2 -p p2 -d TGT -f -
4.8. Shell Script "here document" Syntax
Commonly, there’s a need to embed SQL commands in shell scripts.
Use the -f - (read commands from standard input) odb syntax.
Example
odb64luo -f - <<-EOF 2>&1 | tee -a ${LOG}
drop table &t:TRAFODION.maurizio.ml%;
create table ml2
(
id integer
, fname char(10)
, bdate date
, lname char(10) default 'Felici'
, comment char(20)
, city char(10)
) no partitions;
EOF
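The plumbing can be tried without a database connection: the sketch below substitutes cat for odb64luo -f - so you can see exactly what the here document feeds to standard input (LOG is a throwaway file for this demo only):

```shell
# "cat -" stands in for "odb64luo -f -": both simply read the script from stdin.
LOG=$(mktemp)
cat - <<EOF 2>&1 | tee -a "$LOG"
drop table t1;
create table t1 (id integer);
EOF
```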
4.9. Parallelize Multiple Commands and Scripts
odb uses threads to run multiple commands in parallel. Each command (-x) or script (-f) will be executed, independently from the others, using a different thread.
Example
Running scripts in parallel.
~/Devel/odb $ ./odb64luo -x "select count(*) from types" -f script1.sql
Uses two independent threads executed in parallel: the first thread runs select count(*) from types and the other runs script1.sql.
You can also run multiple copies of the same command by adding <num>: before the -x or -f arguments.
The following command runs three instances of select count(*) from types, five instances
of script1.sql, and three instances of script2.sql in parallel, using 3 + 5 + 3 = 11
threads in total:
Example
Running eleven commands and scripts in parallel
~/Devel/odb $ ./odb64luo -x 3:"select count(*) from types" -f 5:script1.sql \
-f 3:script2.sql -q
[1.0.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[0.0.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[2.0.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[4.0.0]--- 1 row(s) selected in 0.001s (prep 0.000s, exec 0.001s, fetch 0.000s/0.000s)
[6.0.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[5.0.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[3.0.0]--- 1 row(s) selected in 0.001s (prep 0.000s, exec 0.001s, fetch 0.000s/0.000s)
[8.0.0]--- 1 row(s) selected in 0.001s (prep 0.000s, exec 0.001s, fetch 0.000s/0.000s)
[7.0.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[9.0.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[10.0.0]--- 1 row(s) selected in 0.001s prep 0.000s, exec 0.001s, fetch 0.000s/0.000s)
The first number in [1.0.0] is the thread ID. Thread IDs are assigned by odb starting from zero.
You can limit the maximum number of threads with the -T option.
Example
The following command runs the same 11 commands/scripts limiting the number of threads (and ODBC connections) to 4:
~/Devel/odb $ ./odb64luo -x 3:"select count(*) from types" -f 5:script1.sql \
-f 3:script2.sql -q -T 4
[1.0.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[0.0.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[2.0.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[1.3.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[2.1.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[0.1.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[3.0.0]--- 1 row(s) selected in 0.001s (prep 0.000s, exec 0.001s, fetch 0.000s/0.000s)
[2.2.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[3.1.0]--- 1 row(s) selected in 0.001s (prep 0.000s, exec 0.001s, fetch 0.000s/0.000s)
[0.2.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
[1.2.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, fetch 0.000s/0.000s)
The thread ID is now in the 0-3 range because the 11 executions were queued into four threads.
odb offers several alternatives to queue M executions into N (< M) threads. See below.
4.10. Limit Number of odb Threads
By default, odb creates as many threads as the numbers of executions.
The command in the following example creates 1 + 3 + 3 = 7
threads. Each thread will start its own ODBC connection.
Example
~/src/C/odb $ ./odb64luo -f script1.sql -f 3:script2.sql -x 3:"<mysqlcmd>"
You can limit the max number of threads using -T.
Example
~/src/C/odb $ ./odb64luo -f script1.sql -f 3:script2.sql -x 3:"<mysqlcmd>" -T 2
This command creates just two threads to execute the seven commands/scripts. odb will never create more threads than needed:
Example
~/Devel/odb $ ./odb64luo -f 2:script1.sql -f 3:script2.sql -T 8 -c -q
odb [main(1017)] - Warning: won’t be created more thread (8) than needed (5).
4.11. Change How Executions Are Distributed Across Threads
By default, executions are distributed in round-robin across threads.
Example
~/src/C/odb $ ./odb64luo -f script1.sql -f 3:script2.sql -x 3:"<mysqlcmd>" -T 3
Using the command above, the execution queue will be as follows:
| Thread 1 | Thread 2 | Thread 3 |
---|---|---|---|
Third Execution | <mysqlcmd> | | |
Second Execution | script2.sql | <mysqlcmd> | <mysqlcmd> |
First Execution | script1.sql | script2.sql | script2.sql |
This (standard) behavior can be modified using the following options:
- -Z (shuffle): This option randomizes the execution order.
- factor sign with -P option: See Run All Scripts With a Given Path.
- -dlb (Dynamic Load Balancing): See Dynamic Load Balancing.
4.12. Dynamic Load Balancing
As discussed in the previous section, executions are normally pre-assigned to threads using a simple round-robin algorithm. This way, the total elapsed time for each thread depends on the complexity of its own executions.
Example
Suppose you have two threads and two executions per thread:
| Thread 1 | Thread 2 |
---|---|---|
Second Execution | Script1.3 | Script2.2 |
First Execution | Script1.2 | Script2.1 |
If Script2.1 and Script2.2 require a very short time to execute, you can have a situation where Thread 2 has nothing to do (it will be terminated) while Thread 1 is still busy with its own Script1.3 and Script1.2.
In some cases, for example during data extractions (see Load Binary Files), you might want to keep all
threads busy at any given time. In these cases, you can use Dynamic Load Balancing (-dlb). With Dynamic
Load Balancing, jobs are not pre-assigned to threads when odb starts; each thread picks the
next job to run from the job list at run-time.
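A toy model of the difference: with -dlb each new job goes to whichever worker is free first instead of to a pre-assigned slot. The job costs below are hypothetical, and the code only illustrates the scheduling idea, not odb's implementation:

```shell
# Greedy dispatch: the next job always goes to the least-loaded worker.
costs="5 1 1 1"       # hypothetical per-job elapsed times
w1=0; w2=0; n=1
for c in $costs; do
  if [ "$w1" -le "$w2" ]; then
    w1=$((w1 + c)); echo "worker1 takes job$n (cost $c)"
  else
    w2=$((w2 + c)); echo "worker2 takes job$n (cost $c)"
  fi
  n=$((n + 1))
done
echo "final loads: worker1=$w1 worker2=$w2"
```

Round-robin pre-assignment would load the two workers 6 and 2 for the same jobs; the greedy rule ends at a more balanced 5 and 3.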
4.13. Use Variables in odb Scripts
odb lets you use two kinds of variables:
- Internal variables, defined through the set param command and identified by an ampersand (&);
- Environment variables, defined at the operating-system level and identified by a dollar sign ($).
You can mix internal and environment variables in your scripts. If a variable is not expanded to a valid internal/environment variable, the text remains unchanged.
~/Devel/odb $ cat scr.sql
set param region1 ASIA
-- region1 is defined as an internal odb parameter
select * from tpch.region where r_name = '&region1';
-- region2 is defined as an environment variable
select * from tpch.region where r_name = '$region2';
-- you can mix internal and environment variables
select * from tpch.region where r_name = '$region2' or r_name = '&region1';
-- region3 variable does not exist so it won't be expanded
select * from tpch.region where r_name = '&region3';
After you define region2 at operating system level:
~/Devel/odb $ export region2=AMERICA
Output:
~/Devel/odb $ ./odb64luo -u mauro -p xx -d pglocal -f scr.sql
odb [2011-12-12 08:01:31]: starting (1) ODBC connection(s)... 1 [0.0.0]Executing:
'select * from tpch.region where r_name = 'ASIA';' 2,ASIA,ges. thinly even pinto beans ca
[0.0.0]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, 1st fetch 0.000s, fetch 0.000s)
[0.0.1]Executing: 'select * from tpch.region where r_name = 'AMERICA';' 1,AMERICA,hs use ironic, even requests. s
[0.0.1]--- 1 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, 1st fetch 0.000s, fetch 0.000s)
[0.0.2]Executing: 'select * from tpch.region where r_name = 'AMERICA' or r_name = 'ASIA';' 1,AMERICA,hs use ironic,
even requests.s2,ASIA,ges. thinly even pinto beans ca
[0.0.2]--- 2 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, 1st fetch 0.000s, fetch 0.000s)
[0.0.3]Executing: 'select * from tpch.region where r_name = '&region3';'
[0.0.3]--- 0 row(s) selected in 0.000s (prep 0.000s, exec 0.000s, 1st fetch 0.000s, fetch 0.000s)
4.14. Thread ID, Thread Execution#, and Script Command#
Consider a script containing two commands:
~/odb $ cat script.sql
SELECT COUNT(*) FROM ORDERS;
SELECT COUNT(*) FROM SUPPLIER;
Running this script ten times using two threads yields:
~/odb $ ./odb64luo -f 10:script.sql -q -T 2
[0.0.0]--- 1 row(s) selected in 0.102s (prep 0.022s, exec 0.044s, 1st fetch 0.037s, fetch 0.037s)
[1.0.0]--- 1 row(s) selected in 0.125s (prep 0.022s, exec 0.068s, 1st fetch 0.036s, fetch 0.036s)
[0.0.1]--- 1 row(s) selected in 0.520s (prep 0.022s, exec 0.048s, 1st fetch 0.450s, fetch 0.450s)
[1.0.1]--- 1 row(s) selected in 0.564s (prep 0.017s, exec 0.480s, 1st fetch 0.067s, fetch 0.067s)
[0.1.0]--- 1 row(s) selected in 0.131s (prep 0.022s, exec 0.060s, 1st fetch 0.048s, fetch 0.048s)
[0.1.3]--- 1 row(s) selected in 0.086s (prep 0.022s, exec 0.057s, 1st fetch 0.007s, fetch 0.007s)
[1.3.0]--- 1 row(s) selected in 0.136s (prep 0.035s, exec 0.058s, 1st fetch 0.042s, fetch 0.042s)
[0.2.0]--- 1 row(s) selected in 0.123s (prep 0.029s, exec 0.068s, 1st fetch 0.026s, fetch 0.026s)
[1.3.1]--- 1 row(s) selected in 0.119s (prep 0.016s, exec 0.082s, 1st fetch 0.021s, fetch 0.021s)
[0.2.1]--- 1 row(s) selected in 0.089s (prep 0.031s, exec 0.054s, 1st fetch 0.004s, fetch 0.004s)
[1.2.0]--- 1 row(s) selected in 0.138s (prep 0.023s, exec 0.041s, 1st fetch 0.073s, fetch 0.073s)
[0.3.0]--- 1 row(s) selected in 0.144s (prep 0.038s, exec 0.045s, 1st fetch 0.061s, fetch 0.061s)
[1.2.1]--- 1 row(s) selected in 0.127s (prep 0.016s, exec 0.041s, 1st fetch 0.070s, fetch 0.070s)
[0.3.1]--- 1 row(s) selected in 0.136s (prep 0.033s, exec 0.056s, 1st fetch 0.048s, fetch 0.048s)
[1.3.0]--- 1 row(s) selected in 0.131s (prep 0.023s, exec 0.037s, 1st fetch 0.071s, fetch 0.071s)
[0.4.0]--- 1 row(s) selected in 0.111s (prep 0.033s, exec 0.045s, 1st fetch 0.033s, fetch 0.033s)
[0.4.1]--- 1 row(s) selected in 0.076s (prep 0.033s, exec 0.037s, 1st fetch 0.005s, fetch 0.006s)
[1.3.1]--- 1 row(s) selected in 0.098s (prep 0.016s, exec 0.065s, 1st fetch 0.017s, fetch 0.017s)
[1.4.0]--- 1 row(s) selected in 0.133s (prep 0.023s, exec 0.074s, 1st fetch 0.035s, fetch 0.035s)
[1.4.1]--- 1 row(s) selected in 0.098s (prep 0.017s, exec 0.064s, 1st fetch 0.016s, fetch 0.016s)
The numbers between square brackets have the following meaning:
-
The first digit is the thread ID. The example above has two threads; the ID is either 0 or 1.
-
The second digit is the execution# for a given thread. The example above runs ten script executions across two threads, so each thread executes the script five times; the execution# ranges from 0 to 4.
-
The third (last) digit is the command# in a given script. The script in the example above contains two commands; this value is 0 or 1.
Example
[0.3.1]
means that the first thread (thread id=0) was executing its fourth job
(thread execution#=3) and, more specifically, the second command in that script
(script command#=1).
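The tag arithmetic can be checked with a short sketch (Python, for illustration only; real odb output interleaves the threads nondeterministically, while this sketch simply enumerates every tag in order):

```python
def labels(nthreads: int, executions: int, ncommands: int) -> list[str]:
    """Enumerate the [thread.execution.command] tags odb prints.

    Each thread runs executions // nthreads copies of the script,
    and every copy executes each script command once.
    """
    out = []
    for tid in range(nthreads):
        for ex in range(executions // nthreads):
            for cmd in range(ncommands):
                out.append(f"[{tid}.{ex}.{cmd}]")
    return out

# -f 10:script.sql -T 2 with a two-command script: 20 tags in total
print(len(labels(2, 10, 2)))
```

Running it for the example above yields exactly the 20 tags seen in the output, from [0.0.0] through [1.4.1].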
4.15. Validate SQL Scripts
You can check commands and SQL scripts with odb using the -N
(null run) flag. This just prepares
(compiles) the commands without executing them or fetching the results.
4.16. Different Data Sources for Different Threads
Normally, all ODBC connections started by odb use the same Data Source. However, there are
special cases where you may want to use a different DSN for each thread. In these cases you can use the
-ndsn <number>
option. This appends to the Data Source name specified via -d
a suffix from 1
to the -ndsn
argument.
Example
$ ./odb64luo ... -d MYDSN -ndsn 4
This uses the following (round-robin) DSN/thread association: MYDSN1
for the first thread, MYDSN2
for the
second thread, and so on. The fifth thread (if any) uses MYDSN1
again. You can use a sequential
DSN/thread association by adding a +
sign in front of the -ndsn
argument.
Example
If you have 16 threads and -d MYDSN
:
Thread ID | DSN with -ndsn 8 | DSN with -ndsn +8 |
---|---|---|
0 | MYDSN1 | MYDSN1 |
1 | MYDSN2 | MYDSN1 |
2 | MYDSN3 | MYDSN2 |
3 | MYDSN4 | MYDSN2 |
4 | MYDSN5 | MYDSN3 |
5 | MYDSN6 | MYDSN3 |
6 | MYDSN7 | MYDSN4 |
7 | MYDSN8 | MYDSN4 |
8 | MYDSN1 | MYDSN5 |
9 | MYDSN2 | MYDSN5 |
10 | MYDSN3 | MYDSN6 |
11 | MYDSN4 | MYDSN6 |
12 | MYDSN5 | MYDSN7 |
13 | MYDSN6 | MYDSN7 |
14 | MYDSN7 | MYDSN8 |
15 | MYDSN8 | MYDSN8 |
This technique has been used to maximize extraction throughput from a multi-segment Trafodion system. Each (local) Data Source was linked to a corresponding remote Data Source extracting its own data through its own network interface card.
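Both associations are simple arithmetic on the thread ID. A Python sketch of the two mappings (illustrative only; the sequential formula is my reading of the -ndsn +N behavior, where consecutive threads share a DSN):

```python
def dsn_for_thread(base: str, ndsn: int, tid: int, nthreads: int,
                   sequential: bool = False) -> str:
    """Pick the DSN suffix for a thread, mimicking -ndsn N / -ndsn +N."""
    if sequential:
        # -ndsn +N: groups of consecutive threads share the same DSN
        suffix = tid * ndsn // nthreads + 1
    else:
        # -ndsn N: round-robin through the suffixes
        suffix = tid % ndsn + 1
    return f"{base}{suffix}"

# 16 threads, -d MYDSN -ndsn 8: thread 8 wraps around to MYDSN1
print(dsn_for_thread("MYDSN", 8, 8, 16))
```

With 16 threads and -ndsn +8, threads 0 and 1 both map to MYDSN1, threads 2 and 3 to MYDSN2, and so on, matching the table above.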
4.17. Format Query Results
Normally odb prints query results using a very basic CSV format.
Example
$ ./odb64luo -x "select s_suppkey, s_name, s_phone from tpch.supplier limit 5"
1,Supplier#000000001,27-918-335-1736
2,Supplier#000000002,15-679-861-2259
3,Supplier#000000003,11-383-516-1199
4,Supplier#000000004,25-843-787-7479
5,Supplier#000000005,21-151-690-3663
Adding the -pad
option generates the output in table format:
$ ./odb64luo -x "select s_suppkey, s_name, s_phone from tpch.supplier limit 5" -pad
s_suppkey |s_name |s_phone
---------------+-------------------------+---------------
1 |Supplier#000000001 |27-918-335-1736
2 |Supplier#000000002 |15-679-861-2259
3 |Supplier#000000003 |11-383-516-1199
4 |Supplier#000000004 |25-843-787-7479
5 |Supplier#000000005 |21-151-690-3663
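The -pad layout is easy to approximate: compute each column's maximum width and left-justify every value. A hypothetical Python sketch (not odb's actual code):

```python
def pad_table(header: list[str], rows: list[list[str]]) -> list[str]:
    """Render rows the way odb's -pad option does: fixed-width, '|'-separated."""
    table = [header] + [[str(v) for v in r] for r in rows]
    widths = [max(len(r[i]) for r in table) for i in range(len(header))]
    lines = ["|".join(v.ljust(w) for v, w in zip(r, widths)) for r in table]
    # separator line between header and data, as in the -pad output above
    lines.insert(1, "+".join("-" * w for w in widths))
    return lines

for line in pad_table(["s_suppkey", "s_name"],
                      [["1", "Supplier#000000001"], ["2", "Supplier#000000002"]]):
    print(line)
```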
4.18. Extract Table DDL
You can extract DDL from one or several tables using either the -i D…
or
-i U…
option.
Example
$ ./odb64luo -u xxx -p xxx -d traf -i D:TRAFODION.SEABASE.REGIONS
odb [2015-04-20 21:25:35]: starting ODBC connection(s)... 0
Connected to Trafodion
CREATE TABLE TRAFODION.SEABASE."REGIONS" ( REGION_ID INTEGER NOT NULL
, REGION_NAME VARCHAR(25)
);
The %
wildcard allows you to extract the DDL for multiple objects.
Example
The following command will extract the DDL for all tables in schema tpch
starting with P
:
$ ./odb64luo -u xxx -p xxx -d traf -i D:TRAFODION.TPCH.P%
odb [2015-04-20 21:33:43]: starting ODBC connection(s)... 0
Connected to Trafodion
CREATE TABLE TRAFODION.TPCH."PART" ( P_PARTKEY BIGINT NOT NULL
,P_NAME VARCHAR(55) NOT NULL
,P_MFGR CHAR(25) NOT NULL
,P_BRAND CHAR(10) NOT NULL
,P_TYPE VARCHAR(25) NOT NULL
,P_SIZE INTEGER NOT NULL
,P_CONTAINER CHAR(10) NOT NULL
,P_RETAILPRICE BIGINT NOT NULL
,P_COMMENT VARCHAR(23) NOT NULL
,PRIMARY KEY (P_PARTKEY)
);
CREATE TABLE TRAFODION.TPCH."PARTSUPP" ( PS_PARTKEY BIGINT NOT NULL
,PS_SUPPKEY BIGINT NOT NULL
,PS_AVAILQTY INTEGER NOT NULL
,PS_SUPPLYCOST BIGINT NOT NULL
,PS_COMMENT VARCHAR(199) NOT NULL
,PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY)
);
odb [2015-04-20 21:33:45]: exiting. Session Elapsed time 2.069 seconds (00:00:02.069)
You should consider possible differences in text column length semantics when porting DDL from one database to another; some databases measure text column lengths in characters while others measure them in bytes.
You can ask odb to multiply text column lengths when printing DDL by using the
-i U[non-wide_char_multiplier,wide_char_multiplier]
form of the option.
Example
$ ./odb64luo -u xxx -p xxx -d traf -i U2,4:TRAFODION.TPCH.P%
odb [2015-04-20 21:35:17]: starting ODBC connection(s)... 0
Connected to Trafodion
CREATE TABLE TRAFODION.TPCH."PART" ( P_PARTKEY BIGINT NOT NULL
,P_NAME VARCHAR(110) NOT NULL
,P_MFGR CHAR(50) NOT NULL
,P_BRAND CHAR(20) NOT NULL
,P_TYPE VARCHAR(50) NOT NULL
,P_SIZE INTEGER NOT NULL
,P_CONTAINER CHAR(20) NOT NULL
,P_RETAILPRICE BIGINT NOT NULL
,P_COMMENT VARCHAR(46) NOT NULL
,PRIMARY KEY (P_PARTKEY)
);
CREATE TABLE TRAFODION.TPCH."PARTSUPP" ( PS_PARTKEY BIGINT NOT NULL
,PS_SUPPKEY BIGINT NOT NULL
,PS_AVAILQTY INTEGER NOT NULL
,PS_SUPPLYCOST BIGINT NOT NULL
,PS_COMMENT VARCHAR(398) NOT NULL
,PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY)
);
odb [2015-04-20 21:35:18]: exiting. Session Elapsed time 1.620 seconds (00:00:01.620)
The command in the above example multiplies the length of "non-wide" text fields by 2 and the length of wide text fields by 4.
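The effect of the multipliers can be mimicked with a regular expression over the generated DDL. A Python sketch (illustrative only; it handles just the simple CHAR(n)/VARCHAR(n) forms, not character-set clauses):

```python
import re

def scale_char_lengths(ddl: str, multiplier: int) -> str:
    """Multiply every CHAR(n)/VARCHAR(n) length in a DDL string."""
    return re.sub(
        r"\b(VARCHAR|CHAR)\((\d+)\)",
        lambda m: f"{m.group(1)}({int(m.group(2)) * multiplier})",
        ddl,
    )

# VARCHAR(55) -> VARCHAR(110), matching the -i U2,4 output above
print(scale_char_lengths("P_NAME VARCHAR(55) NOT NULL", 2))
```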
5. Load, Extract, Copy
5.1. Load Files
You can load a data file using -l
option.
Example
$ odb64luo –u user –p xx –d dsn –l src=customer.tbl:tgt=TRAFODION.MAURIZIO.CUSTOMER \
:fs=\|:rows=1000:loadcmd=UL:truncate:parallel=4
This command:
-
Loads the file named
customer.tbl
(src=customer.tbl
) -
in the table
TRAFODION.MAURIZIO.CUSTOMER
(tgt=TRAFODION.MAURIZIO.CUSTOMER
) -
using
|
(vertical bar) as a field separator (fs=\|
) -
using
1000 rows
as row-set buffer (rows=1000
) -
using
UPSERT USING LOAD
syntax to achieve better throughput as described in Trafodion Load and Transform Guide -
truncating the target table before loading (
truncate
) -
using
4 parallel threads
to load the target table (parallel=4
)
5.1.1. Data Loading Operators
-l src=[-]file:tgt=table[:map=mapfile][:fs=fieldsep][:rs=recsep][:soe]
[:skip=linestoskip][:ns=nullstring][:ec=eschar][:sq=stringqualifier]
[:pc=padchar][:em=embedchar][:errmax=#max_err][:commit=auto|end|#rows|x#rs]
[:rows=#rowset][:norb][:full][:max=#max_rec][:truncate][:show][:bpc=#][:bpwc=#]
[:nomark][:parallel=number][:iobuff=#size][:buffsz=#size]][:fieldtrunc=\{0-4}]
[:pre=\{@sqlfile}|\{[sqlcmd]}][:post=\{@sqlfile}|\{[sqlcmd]}][:ifempty]
[:direct][:bad=[+]badfile][:tpar=#tables][:maxlen=#bytes][:time]
[:xmltag=[+]element][:xmlord][:xmldump][:loadcmd=IN|UP|UL]
The following table describes each data loading operator:
Load option | Meaning |
---|---|
src=<file> | Input file. You can use the following keywords for this field: |
tgt=<table> | This is the target table |
fs=<char> | This is the field separator. You can define the field separator: |
rs=<char> | This is the record separator. You can define the record separator the same way as the field separator. The default record separator is the newline character |
pc=<char> | Pad character used when loading fixed format files. You can use the same notation as the field separator. |
map=<mapfile> | Uses mapfile to map source file to target table columns. See Map Source File Fields to Target Table Columns. |
skip=<lines> | Skips a given number of lines when loading. This can be useful to skip headers in the source file. |
max=<num> | The max number of records to load. Default is to load all records in the input file |
ns=<nullstring> | odb inserts NULL when it finds nullstring in the input file. By default the nullstring is the empty string |
sq=<char> | The string qualifier character used to enclose strings. You can define the string qualifier the same way as the field separator. |
ec=<char> | The character used as escape character. You can define the escape character the same way as the field separator. |
iobuff=<size> | This defines the size of the I/O buffer for each loading thread. |
bad=[+]file | Where to write rejected rows. If you omit this parameter, then rejected rows are printed to standard error together with the error returned by the ODBC Driver. |
truncate | Truncates the target table before loading. |
ifempty | Loads the target table only if it contains no records. |
rows=<num> | Loads rows in row-set buffers of the given size. |
nomark | Don't print the number of records loaded so far during loads. |
soe | Stop On Error — stop as soon as odb encounters an error. |
parallel=<num> | Number of loading threads. odb uses: |
errmax=<num> | odb prints up to num error messages per rowset. Normally used with soe to limit the number of error messages printed to the standard error stream. |
commit=auto|end|#rows|x#rs | Defines how odb commits the inserts. You have the following choices: |
norb | Adds WITH NO ROLLBACK to the INSERT statement. |
fieldtrunc={0-4} | Defines how odb manages fields longer than the destination target column: |
em=<char> | Character used to embed binary files. See Load Default Values. You can define the embed character the same way as the field separator. No default value. |
pre={@sqlfile}|{[sqlcmd]} | odb runs a single instance of either the given SQL script or SQL command before loading. |
post={@sqlfile}|{[sqlcmd]} | odb runs a single instance of either the given SQL script or SQL command after loading. |
tpar=#tables | odb loads the given number of tables in parallel. |
show | odb prints what would be loaded in each column but no data is actually loaded. This is useful if you want to see how the input file fits into the target table. Normally used with max to analyze the first few rows of CSV files. |
maxlen=#bytes | odb limits the amount of memory allocated in the ODBC buffers for CHAR/VARCHAR fields to the given number of bytes. |
time | odb prints a time line (milliseconds from start) for each insert. |
bpc=# | Bytes allocated in the ODBC buffer for each (non wide) CHAR/VARCHAR column length unit. (Default: 1) |
bpwc=# | Bytes allocated in the ODBC buffer for each (wide) CHAR/VARCHAR column length unit. (Default: 4) |
xmltag=[+]element | Input file is XML. Load all XML nodes under the one specified with this option. If a plus sign is specified, then odb loads node-attribute values. |
xmlord | By default, odb matches target table columns with XML nodes or attributes using their names. If this option is specified, then odb loads the first node/attribute to the first column, the second node/attribute to the second column and so on without checking node/attribute names. |
xmldump | odb does not load the XML file content. Instead, XML attribute/tag names are printed to standard output so you can check what is going to be loaded. |
loadcmd=IN|UP|UL | SQL operation to be used for load: INSERT, UPSERT, or UPSERT USING LOAD. (Default: IN) |
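The -l argument is a colon-separated list of key=value operators and bare flags. A Python sketch of a parser for it (simplified and hypothetical: a real parser must also honor escaped separators such as fs=\|, which this version ignores):

```python
def parse_load_spec(spec: str) -> dict:
    """Split an odb -l option string into a dict of operators."""
    opts = {}
    for part in spec.split(":"):          # every ':' starts a new operator
        key, sep, value = part.partition("=")
        opts[key] = value if sep else True  # bare flags like truncate
    return opts

print(parse_load_spec("src=customer.tbl:tgt=TRAFODION.MAURIZIO.CUSTOMER"
                      ":rows=1000:loadcmd=UL:truncate:parallel=4"))
```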
You can load multiple files using different -l
options. By default odb creates as many threads (and ODBC connections) as the sum of
parallel load threads. You can limit this number using -T
option.
Example
$ odb64luo –u user –p xx –d dsn –T 5 \
-l src=./data/%t.tbl.gz:tgt=TRAFODION.MAURO.CUSTOMER:fs=\
|:rows=m2:truncate:norb:parallel=4 \
-l src=./data/%t.tbl.gz:tgt=TRAFODION.MAURO.ORDERS:fs=\
|:rows=1000:truncate:norb:parallel=4 \
-l src=./data/%t.tbl.gz:tgt=TRAFODION.MAURO.LINEITEM:fs=\
|:rows=m10:truncate:norb:parallel=4
The above command truncates and loads the CUSTOMER, ORDERS, and LINEITEM tables. The input files have the same names as the target tables (in lower case). Loads are distributed among the available threads this way:
Load Order | Thread 0 | Thread 1 | Thread 2 | Thread 3 | Thread 4 |
---|---|---|---|---|---|
Third | Read LINEITEM | Load LINEITEM | Load LINEITEM | Load LINEITEM | Load LINEITEM |
Second | Read ORDERS | Load ORDERS | Load ORDERS | Load ORDERS | Load ORDERS |
First | Read CUSTOMER | Load CUSTOMER | Load CUSTOMER | Load CUSTOMER | Load CUSTOMER |
If you want to load more than one table in parallel you should use a number of threads defined as:
(parallel + 1) * tables_to_load_in_parallel
You can load gzipped files without any special option. odb automatically checks input files and decompresses them on the fly when needed.
odb with a single loading thread (parallel=1
) is faster than odb without the parallel option. If you do not specify
parallel, odb uses one thread to both read from the file and write into the target table:
Read buffer #1>Write Buffer #1>Read Buffer #2>Write Buffer #2>Read Buffer #3>Write Buffer#3>…
parallel=1
defines that there is one thread to read from file and one thread to write:
-
Read buffer #1>Read Buffer #2>Read Buffer #3>…
-
Write Buffer #1>Write Buffer #2>Write Buffer #3>…
Reading from a file is normally much faster than writing via ODBC, so a single reading thread can serve several loading threads. What is the right number of loading threads?
To find out, run a few tests and monitor the Wait Cycles reported by odb. Wait Cycles represent the number of times the reading thread had to wait for a loading thread to become available.
-
When the Wait Cycles/Total Cycles ratio is high, it is better to increase the number of writers.
-
When the Wait Cycles/Total Cycles ratio is less than 5%, adding more loading threads is useless or counterproductive.
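The one-reader/many-writers pipeline described above can be sketched with a bounded queue (Python; an illustrative model, not odb's C implementation). When the queue is full the reader blocks, which is exactly the situation a Wait Cycle counts:

```python
import queue
import threading

def load_parallel(rows, parallel=2, bufsize=4):
    """One reading thread feeds `parallel` writer threads via a bounded queue."""
    buf = queue.Queue(maxsize=bufsize)   # reader blocks here when writers lag
    written, lock = [], threading.Lock()

    def writer():
        while True:
            row = buf.get()
            if row is None:              # poison pill: no more work
                return
            with lock:
                written.append(row)      # stands in for the ODBC insert

    workers = [threading.Thread(target=writer) for _ in range(parallel)]
    for w in workers:
        w.start()
    for row in rows:                     # this thread plays the reader role
        buf.put(row)
    for _ in workers:
        buf.put(None)
    for w in workers:
        w.join()
    return written

print(len(load_parallel(range(1000), parallel=4)))
```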
5.2. Map Source File Fields to Target Table Columns
odb, by default, assumes that input files contain as many fields as the target table columns, and that file fields and target table columns are in the same order. This means that the first field in the input file is loaded in the first table column, second input field goes to the second column and so on.
If this basic assumption is not true and you need more flexibility to link input fields to target table columns, then
odb provides mapping/transformation capabilities through mapfiles. By specifying the map=<mapfile>
load option you can:
-
Associate any input file field to any table column
-
Skip input file fields
-
Generate sequences
-
Insert constants
-
Transform dates/timestamps formats
-
Extract substrings
-
Replace input file strings. For example: insert
Maurizio Felici
when you readMF
-
Generate random values
-
… and much more
A generic mapfile contains:
-
Comments (line starting with
#
) -
Mappings to link input file fields to the corresponding target table columns.
Mappings use the following syntax:
<colname>:<field>[:transformation operator]
Where:
-
<colname>
is the target table column name. (Case sensitive) -
<field>
is one of the following: -
The ordinal position (
starting from zero
) of the input file field.First input field is
0
(zero), second input field is1
and so on -
CONST:<CONSTANT>
to load a constant value -
SEQ:<START>
to generate/load a sequence starting from<START>
-
IRAND:<MIN>:<MAX>
to generate/load a random integer between<MIN>
and<MAX>
-
DRAND:<MIN_YEAR>:<MAX_YEAR>
to generate/load a random date (YYYY-MM-DD
) between<MIN_YEAR>
and<MAX_YEAR>
-
TMRAND
: to generate/load a random time (hh:mm:ss
) between00:00:00
and23:59:59
-
TSRAND
: to generate/load a random timestamp (YYYY-MM-DD hh:mm:ss
) between midnight UTC of 01 Jan 1970 and the current timestamp
CRAND:<LENGTH>
generates/loads a string of<LENGTH>
characters randomly selected in the following ranges:a-z
,A-Z
,0-9
-
NRAND:<PREC>:<SCALE>
generates/loads a random NUMERIC field with precision<PREC>
and scale<SCALE>
-
DSRAND:<file>
selects and loads a random line from<file>
-
TXTRAND:<MIN_LENGTH>:<MAX_LENGTH>:<file>:
selects and loads a random portion of text from<file>
with length between<MIN_LENGTH>
and<MAX_LENGTH>
-
LSTRAND:<VALUE1,VALUE2,…>
selects and loads a random value from<VALUE1,VALUE2,…>
-
EMRAND:<MIN_ULENGTH>:<MAX_ULENGTH>:<MIN_DLENGTH>:<MAX_DLENGTH>:<SUFFIX1,SUFFIX2,…>
generates and loads a string made oflocal@domain.suffix
where:-
local is a string of random characters (
a-z
,A-Z
,0-9
) with length between<MIN_ULENGTH>
and<MAX_ULENGTH>
-
domain is a string of random characters (
a-z
,A-Z
,0-9
) with length between<MIN_DLENGTH>
and<MAX_DLENGTH>
-
suffix is a randomly selected suffix from
<SUFFIX1,SUFFIX2,…>
-
-
CDATE
: to load the current date (YYYY-MM-DD
) -
CTIME
: to load the current time (hh:mm:ss
) -
CTSTAMP
: to load the current timestamp (YYYY-MM-DD hh:mm:ss
) -
FIXED:<START>:<LENGTH>
to load fixed format fields made of<LENGTH>
characters starting at<START>
.<START>
starts from zero. -
EMPTYASEMPTY
: loads empty strings in the input file as empty strings (default is to load empty string as NULLs). -
EMPTYASCONST:<CONSTANT>
: loads empty fields in the input file as<CONSTANT>
.Currently, only EMPTYASEMPTY
andEMPTYASCONST
are valid empty mapping options. If the empty mapping option is specified in the mapfile, then an empty string is treated as an empty string or a constant.
Ifns
option is specified, then an empty string is treated as an empty string.
odb always prefers to use the empty mapping option if it conflicts with thens
option.
If neither the empty mapping option nor thens
option is specified, then an empty string is treated as NULL. -
NULL
: insertsNULL
-
:transformation operators
(optional): -
SUBSTR:<START>:<END>
. For example, if you have an input field containingTib:student
, a transformation rule likeSUBSTR:0:3
loadsTib
into the database. -
TSCONV:<FORMAT>
. Converts timestamps from the input file format defined through<FORMAT>
toYYYY-MM-DD HH:MM:SS
before loading. The input format is defined through any combination of the following characters:Char Meaning b
abbreviated month name
B
full month name
d
day of the month
H
hour (24 hour format)
m
month number
M
Minute
S
Second
y
year (four digits)
D#
#decimal digits
.
ignore a single char
_
ignore up to the next digit
-
DCONV:<FORMAT>
. Converts dates from the input file format defined through<FORMAT>
toYYYY-MM-DD
(seeTSCONV
operator).
Example:DCONV:B.d.y
convertsAugust,24 1991
to1991-08-24
-
TCONV:<FORMAT>
. Converts times from the input file format defined through<FORMAT>
toHH:MM:SS
(seeTSCONV
operator). -
REPLACE:<READ>:<WRITTEN>
. Loads the string<WRITTEN>
when the input file fields contains<READ>
. If the input file string doesn’t match<READ>
, then it is loaded as is. -
TOUPPER
. Converts the string read from the input file to uppercase before loading.Example:
proGRAmMEr —> PROGRAMMER
-
TOLOWER
. Converts the string read from the input file to lowercase before loading.Example:
proGRAmMEr —> programmer
-
FIRSTUP
. Converts the first character of the string read from the input file to uppercase and the remaining characters to lowercase before loading.Example:
proGRAmMEr —> Programmer
-
TRANSLIT:<LIST OF CHARS>:<LIST OF CHARS>
. Lets you to delete or change any character with another.Examples
-
WORK:7:translit:Gp:HP
loads the seventh input field into the target column namedWORK
and replaces allG
withH
and allp
withP
-
WORK:7:translit:Gp\r:HP\d
behaves like the previous example but also deletes allcarriage returns
(\r
) -
CSUBSTR
. This operator is similar toSUBSTR
but instead of using fixed positions to extract substrings, it uses delimiting characters. For example, suppose your input fields (comma is the field separator) are:... other fields...,name_Maurizio.programmer,...other fields ... other fields...,_name_Lucia.housewife, ...other fields... ... other fields...,first_name_Giovanni.farmer,... other fields... ... other fields...,_Antonella,... other fields... ... other fields...,Martina,...other fields... ... other fields...,Marco.student, ...other fields...
Using a transformation like:
NAME:4:CSUBSTR:95:46
(where95
is the ASCII code for_
and 46 is the ASCII code for.
) results in loading the following values into the target (NAME
) column:Maurizio Lucia Giovanni Antonella Martina Marco
-
-
COMP
. Transforms a packed binaryCOMP
into a target database number.For example:
hex 80 00 00 7b
is loaded as-123
-
COMP3:PRECISION:SCALE
. Transforms a packed binaryCOMP-3
format into a target database number.For example:
hex 12 34 56 78 90 12 34 56 78 9b
is loaded as-1234567890123456.789
-
ZONED:PRECISION:SCALE
. Transforms a packed binaryZONED
format into a target database number.For example:
hex 31 32 33 34 35 36
is loaded as+.123456
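Packed-decimal (COMP-3) decoding is compact enough to show concretely. A Python sketch of the standard encoding (my implementation, not odb's code): each byte holds two 4-bit digits, and the final nibble is the sign, with 0xB or 0xD meaning negative.

```python
from decimal import Decimal

def decode_comp3(data: bytes, scale: int) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field into a Decimal."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign_nibble = nibbles.pop()              # last nibble is the sign
    sign = -1 if sign_nibble in (0x0B, 0x0D) else 1
    value = 0
    for digit in nibbles:
        value = value * 10 + digit
    return Decimal(sign * value).scaleb(-scale)

# hex 12 34 56 78 90 12 34 56 78 9b -> -1234567890123456.789 (scale 3)
print(decode_comp3(bytes.fromhex("1234567890123456789b"), 3))
```

This reproduces the COMP-3 example above: the trailing nibble b marks the value as negative and the scale places the decimal point three digits from the right.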
5.3. Use mapfiles to Ignore and/or Transform Fields When Loading
The following example explains mapfile usage to skip/transform or generate fields. Suppose you have a target table like this:
+------+---------------+----+-------+------------+
|COLUMN|TYPE |NULL|DEFAULT|INDEX |
+------+---------------+----+-------+------------+
|ID |INTEGER SIGNED |NO | |mf_pkey 1 U |
|NAME |CHAR(10) |YES | | |
|AGE |SMALLINT SIGNED|YES | | |
|BDATE |DATE |YES | | |
+------+---------------+----+-------+------------+
And an input file like this:
uno,00,51,due,Maurizio,tre,07 Mar 1959, ignore,remaining, fields
uno,00,46,due,Lucia,tre,13 Oct 1964, ignore, this
uno,00,34,due,Giovanni,tre,30 Mar 1976
uno,00,48,due,Antonella,tre,24 Apr 1962
-
The third comma-separated field (51, 46, 34, 48) represents age.
-
The fifth field (Maurizio, Lucia, Giovanni, Antonella) represents name.
-
The seventh field (for example, 07 Mar 1959) represents birth date.
You want to load the marked fields into the appropriate column, generate a unique key for ID and ignore the remaining fields.
In addition: you need to convert date format and replace all occurrences of Lucia
with Lucy
.
The following map file accomplishes these goals:
$ cat test/load_map/ml1.map
# Map file to load TRAFODION.MFTEST.FRIENDS from friends.dat
ID:seq:1 # Inserts into ID column a sequence starting from 1
NAME:4:REPLACE:Lucia:Lucy # Loads field #4 into NAME and replace all occurrences of Lucia with Lucy
AGE:2 # Loads field #2 (they start from zero) into AGE
BDATE:6:DCONV:d.b.y # Loads field #6 into BDATE converting date format from dd mmm yyyy
Load as follows:
$ odb64luo –u user –p xx –d dsn \
-l src=friends.dat:tgt=TRAFODION.MFTEST.FRIENDS:map=ml1.map:fs=,
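What the loader does with ml1.map can be simulated line by line. A Python sketch (illustrative; it hard-codes the four rules rather than parsing the mapfile):

```python
from datetime import datetime

def apply_ml1_map(line: str, seq: int):
    """Mimic ml1.map: ID from a sequence, NAME with REPLACE, AGE, BDATE with DCONV."""
    fields = [f.strip() for f in line.split(",")]
    name = "Lucy" if fields[4] == "Lucia" else fields[4]      # REPLACE:Lucia:Lucy
    age = int(fields[2])                                       # field #2, zero-based
    bdate = datetime.strptime(fields[6], "%d %b %Y").date()    # DCONV:d.b.y
    return (seq, name, age, bdate.isoformat())

row = apply_ml1_map(
    "uno,00,51,due,Maurizio,tre,07 Mar 1959, ignore,remaining, fields", 1)
print(row)
```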
5.4. Use mapfiles to Load Fixed Format Files
Suppose you have a target table like this:
+------+---------------+----+-------+------------+
|COLUMN|TYPE |NULL|DEFAULT|INDEX |
+------+---------------+----+-------+------------+
|NAME |CHAR(10) |YES | | |
|JOB |CHAR(10) |YES | | |
|BDATE |DATE |YES | | |
+------+---------------+----+-------+------------+
And an input file like this:
GiovanniXXX30 Mar 1976YFarmer
Lucia XXX13 Oct 1964YHousewife
Martina XXX28 Oct 1991Y?
Marco XXX06 Nov 1994Y?
MaurizioXXX07 Mar 1959YProgrammer
You want to load the fixed-position fields into the appropriate columns and to convert date format. Null values in the input file are represented by question marks. In this case you can use a mapfile like this:
~/Devel/odb $ cat test/fixed/ff.map
NAME:FIXED:0:8 <- insert into NAME characters starting at position 0, length 8
BDATE:FIXED:11:11:DCONV:d.b.y <- insert into BDATE characters starting at col 11, length 11 and convert date
JOB:FIXED:23:10 <- insert into JOB characters starting at position 23, length 10
Load as follows:
$ odb64luo –u user –p xx –d dsn \
-l src=frends1.dat:tgt=TRAFODION.MFTEST.FRIENDS1:map=ff.map:ns=\?:pc=32
Where: pc=32
identifies the pad character in the input file (space
= ASCII 32) and ns=?
defines
the null string in the input file.
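The FIXED rules in ff.map translate directly into string slices, since the start positions are zero-based. A Python sketch of the same extraction:

```python
def parse_fixed(line: str):
    """Slice a fixed-format record as ff.map does (zero-based start, length)."""
    name = line[0:8].rstrip()        # NAME:FIXED:0:8
    bdate = line[11:22]              # BDATE:FIXED:11:11
    job = line[23:33].rstrip()       # JOB:FIXED:23:10
    return name, bdate, job

print(parse_fixed("MaurizioXXX07 Mar 1959YProgrammer"))
```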
5.5. Generate and Load Data
odb can generate and load data for testing purposes. The following example illustrates the odb capabilities in this area.
Suppose you want to fill with test data a table like this:
CREATE TABLE TRAFODION.MAURIZIO."PERSON"
( PID BIGINT SIGNED NOT NULL
, FNAME CHAR(20) NOT NULL
, LNAME CHAR(20) NOT NULL
, COUNTRY VARCHAR(40) NOT NULL
, CITY VARCHAR(40) NOT NULL
, BDATE DATE NOT NULL
, SEX CHAR(1) NOT NULL
, EMAIL VARCHAR(40) NOT NULL
, SALARY NUMERIC(9,2) NOT NULL
, EMPL VARCHAR(40) NOT NULL
, NOTES VARCHAR(80)
, LOADTS TIMESTAMP(0)
, PRIMARY KEY (PID)
)
;
You can use a mapfile like this:
~/Devel/odb $ cat person.map
PID:SEQ:100
FNAME:DSRAND:datasets/first_names.txt
LNAME:DSRAND:datasets/last_names.txt
COUNTRY:DSRAND:datasets/countries.txt
CITY:DSRAND:datasets/cities.txt
BDATE:DRAND:1800:2012
SEX:LSTRAND:M,F,U
EMAIL:EMRAND:3:12:5:8:com,edu,org,net
SALARY:NRAND:9:2
EMPL:DSRAND:datasets/fortune500.txt
NOTES:TXTRAND:20:80:datasets/lorem_ipsum.txt
LOADTS:CTSTAMP
Where:
-
PID:SEQ:100
— Loads a sequence starting from100
intoPID
-
FNAME:DSRAND:datasets/first_names.txt
— LoadsFNAME
with a randomly selected value fromfirst_names.txt
. There are plenty of sample datasets available to generate all sorts of data using realistic values. -
LNAME:DSRAND:datasets/last_names.txt
— LoadsLNAME
with a random value fromlast_names.txt
. -
COUNTRY:DSRAND:datasets/countries.txt
— LoadsCOUNTRY
with a random value fromcountries.txt
. -
CITY:DSRAND:datasets/cities.txt
— LoadsCITY
with a random value fromcities.txt
. -
BDATE:DRAND:1800:2012
— Generates and loads intoBDATE
a random date between1800-01-01
and2012-12-31
. -
SEX:LSTRAND:M,F,U
— LoadsSEX
with a random value in theM
,F
,U
range. -
EMAIL:EMRAND:3:12:5:8:com,edu,org,net
— Generates and loadslocal@domain.suffix email
addresses where: -
local
is made of 3 to 12 random characters. -
domain
is made of 5 to 8 random characters. -
suffix
iscom
,org
,edu
, ornet
. -
SALARY:NRAND:9:2
— Generates and loads a random NUMERIC(9,2). -
EMPL:DSRAND:datasets/fortune500.txt
— LoadsEMPL
with a random value fromfortune500.txt
. -
NOTES:TXTRAND:20:80:datasets/lorem_ipsum.txt
— LoadsNOTES
with a random section oflorem_ipsum.txt
with length between 20 and 80 characters. -
LOADTS:CTSTAMP
— Loads the current timestamp intoLOADTS
.
You generate and load test data with a command like this:
$ bin/odb64luo -u user -p password -d traf -l src=nofile:
tgt=traf.maurizio.person:max=1000000:
map=person.map:rows=5000:parallel=8:loadcmd=INSERT
Note src=nofile
(it means there is no input file) and max=1000000
(generate and load one million rows). The above command
generated and loaded 1M rows of realistic data in about ten seconds:
[0] odb Loading statistics:
[0] Target table: TRAFODION.MAURIZIO.PERSON
[0] Source: nofile
[0] Pre-loading time: 2.911 s
[0] Loading time: 7.466 s
[0] Total records read: 1,000,000
[0] Total records inserted: 1,000,000
[0] Total number of columns: 12
[0] Total bytes read: 3,963
[0] Average input row size: 0.0 B
[0] ODBC row size: 323 B (data) + 88 B (len ind)
[0] Rowset size: 5,000
[0] Rowset buffer size: 2,006.83 KiB
[0] Load Performances (real data): 0.518 KiB/s
[0] Load Performances(ODBC): 42,243.161 KiB/s
[0] Reader Total/Wait Cycles: 200/16
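A few of these generators are easy to model. A Python sketch of EMRAND-style email generation (illustrative; odb's random sources and character sets may differ):

```python
import random
import string

ALNUM = string.ascii_lowercase + string.digits

def emrand(min_u, max_u, min_d, max_d, suffixes):
    """Build local@domain.suffix like EMRAND:3:12:5:8:com,edu,org,net."""
    local = "".join(random.choices(ALNUM, k=random.randint(min_u, max_u)))
    domain = "".join(random.choices(ALNUM, k=random.randint(min_d, max_d)))
    return f"{local}@{domain}.{random.choice(suffixes)}"

print(emrand(3, 12, 5, 8, ["com", "edu", "org", "net"]))
```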
5.6. Load Default Values
The simpler way to load database generated defaults is to ignore the associated columns in the map file. For example, suppose you have a table like this under Trafodion:
create table TRAFODION.maurizio.dtest
( id largeint generated by default as identity not null
, fname char(10)
, lname char(10) default 'Felici'
, bdate date
, comment varchar(100)
)
;
If you have an input file containing:
ignoreme,Maurizio,xyz,commentM, ignore,remaining, fields
ignoreme,Lucia,xyz,commentL, ignore, this
ignoreme,Giovanni,xyz,commentG,
ignoreme,Antonella,xyz,commentA
and a map-file like this:
FNAME:2
BDATE:CDATE
COMMENT:4
Then:
-
First column (
ID
) is loaded with its default value (not in the map file) -
Second column (
FNAME
) is loaded with the second input field from file (FNAME:2
) -
Third column (
LNAME
) is loaded with its default value (not in the map file) -
Fourth column (
BDATE
) is loaded with the current date generated by odb (BDATE:CDATE
) -
Fifth column (
COMMENT
) is loaded with the fourth column in the input file (COMMENT:4
)
When loaded, the result will look like (your results in ID and BDATE will vary):
>>select * from trafodion.maurizio.dtest;
ID FNAME LNAME BDATE COMMENT
-------------------- ---------- ---------- ---------- ----------------------------------------------------------------------------------------------------
6 Maurizio Felici 2019-08-06 commentM
7 Lucia Felici 2019-08-06 commentL
8 Giovanni Felici 2019-08-06 commentG
9 Antonella Felici 2019-08-06 commentA
--- 4 row(s) selected.
>>
5.7. Load Binary Files
Assuming that your back-end database (and your ODBC Driver) supports BLOB data types, or equivalent,
you can use odb to directly load binary (or any other) files into a database column using the [:em=char]
symbol
to identify the file to be loaded into that specific database field.
Example
Suppose you have a table like this (MySQL):
create table pers.myphotos
( id integer
, image mediumblob
, phts timestamp
)
;
Then, you can load a file like this:
$ cat myphotos.csv
001,@/home/mauro/images/image1.jpg,2012-10-21 07:31:21
002,@/home/mauro/images/image2.jpg,2012-10-21 07:31:21
003,@/home/mauro/images/image3.jpg,2012-10-21 07:31:21
by running a command like this:
$ odb64luo –u user –p xx –d dsn -l src=myphotos.csv:tgt=pers.myphotos:em=\@
odb considers the string following the “em” character as the path of the file to be loaded in that specific field.
odb does not load rows where the size of the input file is greater than the target database column. |
5.8. Load XML Files
Trafodion odb supports loading XML files into tables. The key construct for XML files can be an element or an attribute.
5.8.1. Load XML Files Where Data is Stored in Element Nodes
-
Create a table.
./odb64luo -x "create table testxmlload(id int, name char(20))"
-
Suppose you have an XML file where data is stored in element nodes like the following.
-bash-4.1$ cat test.xml
<?xml version="1.0" encoding="UTF-8"?>
<data>
<id>1</id>
<name>hello</name>
</data>
To check what will be loaded before loading the XML file into the table, run the following command.
./odb64luo -l src=test.xml:tgt=testxmlload:xmltag=data:xmldump
|
-
To load the test.xml file into the table, run the following command.
./odb64luo -l src=test.xml:tgt=testxmlload:xmltag=data
xmltag=data means odb will load data from the element nodes. For more information, see Data Loading Operators.
|
5.8.2. Load XML Files Where Data is Stored in Attribute Nodes
-
Create a table.
./odb64luo -x "create table testxmlload(id int, name char(20))"
-
Suppose you have an XML file where data is stored in attribute nodes like the following.
-bash-4.1$ cat test.xml
<?xml version="1.0" encoding="UTF-8"?>
<data id="1" name="hello"></data>
To check what will be loaded before loading the XML file into the table, run the following command.
./odb64luo -l src=test.xml:tgt=testxmlload:xmltag=data:xmldump
|
-
To load the test.xml file into the table, run the following command.
./odb64luo -l src=test.xml:tgt=testxmlload:xmltag=+data
xmltag=+data (with a plus sign specified) means odb will load data from the attribute nodes. For more information, see Data Loading Operators.
|
5.9. Reduce the ODBC Buffer Size
odb allocates memory for the ODBC buffers during load/extract operations based on the max possible length of the source/target columns.
If you have a column defined as VARCHAR(2000
), then odb allocates enough space for 2,000 characters in the ODBC buffer.
If you know in advance that you will never load/extract 2,000 characters, then you can limit the amount of space allocated by odb. This reduces memory usage and increases performance because of the reduced network traffic.
Given the following table:
~/Devel/odb $ ./odb64luo -u xxx -p xxx -d traf -i D:TRAFODION.USR.TMX
odb [2015-04-20 21:41:38]: starting ODBC connection(s)... 0
Connected to Trafodion
CREATE TABLE TRAFODION.USR."TMX"
( ID INTEGER NOT NULL
, NAME VARCHAR(400)
, PRIMARY KEY (ID)
)
;
And an input file that contains:
~/Devel/odb $ cat tmx.dat
1,Maurizio
2,Lucia
3,Martina
4,Giovanni
5,Marco
6,Roland
7,Randy
8,Paul
9,Josef
10,Some other name
The max length of the second field in this file is:
~/Devel/odb $ awk -F, 'BEGIN{max=0} {if(NF==2){len=length($2);if(len>max)max=len}}
END{print max}' tmx.dat
15
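If awk is not handy, the same measurement can be done in Python (a hypothetical helper using the same NF==2 guard as the awk one-liner; pass it an open file handle or any iterable of lines):

```python
def max_field_len(lines, sep: str = ",", field: int = 1) -> int:
    """Return the longest value of the given field across delimited records."""
    longest = 0
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        if len(parts) == 2:                  # mirror the awk NF==2 guard
            longest = max(longest, len(parts[field]))
    return longest

sample = ["1,Maurizio", "2,Lucia", "10,Some other name"]
print(max_field_len(sample))   # 15 for the tmx.dat data above
```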
In this case you can use :maxlen=15
to limit the size of the ODBC buffer:
~/Devel/odb $ ./odb64luo -u xxx -p xxx -d traf -l src=tmx.dat:tgt=usr.tmx:truncate:maxlen=15
odb [2015-04-20 21:46:11]:starting ODBC connection(s)... 0
Connected to Trafodion
[0.0.0]--- 0 row(s) deleted in 0.052s (prep 0.012s, exec 0.040s, fetch 0.000s/0.000s)
[0] 10 records inserted [commit]
[0] odb version 1.3.0 Load(2) statistics:
[0] Target table: (null).USR.TMX
[0] Source: tmx.dat
[0] Pre-loading time: 1.254 s (00:00:01.254)
[0] Loading time: 0.026 s(00:00:00.026)
[0] Total records read: 10
[0] Total records inserted: 10
[0] Total number of columns: 2
[0] Total bytes read: 99
[0] Average input row size: 9.9 B
[0] ODBC row size: 26 B (data) + 16 B (len ind)
[0] Rowset size: 100
[0] Rowset buffer size: 4.10 KiB
[0] Load throughput (real data): 3.718 KiB/s
[0] Load throughput (ODBC): 9.766 KiB/s
odb [2015-04-20 21:46:12]: exiting. Session Elapsed time 1.294 seconds (00:00:01.294)
If you do not specify this parameter, odb allocates the buffer for the max possible length of each field:
~/Devel/odb $ ./odb64luo -u xxx -p xxx -d traf -l src=tmx.dat:tgt=usr.tmx:truncate
odb [2015-04-20 21:47:13]: starting ODBC connection(s)... 0
Connected to Trafodion
[0.0.0]--- 10 row(s) deleted in 0.107s (prep 0.012s, exec 0.095s, fetch 0.000s/0.000s)
[0] 10 records inserted [commit]
[0] odb version 1.3.0 Load(2) statistics:
[0] Target table: (null).USR.TMX
[0] Source: tmx.dat
[0] Pre-loading time: 1.330 s (00:00:01.330)
[0] Loading time: 0.032 s(00:00:00.032)
[0] Total records read: 10
[0] Total records inserted: 10
[0] Total number of columns: 2
[0] Total bytes read: 99
[0] Average input row size: 9.9 B
[0] ODBC row size: 411 B (data) + 16 B (len ind)
[0] Rowset size: 100
[0] Rowset buffer size: 41.70 KiB
[0] Load throughput (real data): 3.021 KiB/s
[0] Load throughput (ODBC): 125.427 KiB/s
odb [2015-04-20 21:47:14]: exiting. Session Elapsed time 1.373 seconds (00:00:01.373)
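The two rowset buffer sizes reported above follow directly from the rowset size times the per-row bytes (data plus length indicators). A quick Python sketch of that arithmetic, using the figures from the statistics:

```python
def rowset_buffer_kib(row_data_bytes, len_ind_bytes, rowset):
    """ODBC rowset buffer size in KiB: rows * (data + length indicators)."""
    return rowset * (row_data_bytes + len_ind_bytes) / 1024.0

# With maxlen=15: 26 B data + 16 B length indicators per row, rowset of 100.
print(round(rowset_buffer_kib(26, 16, 100), 2))   # 4.1
# Without maxlen: VARCHAR(400) forces 411 B of data per row.
print(round(rowset_buffer_kib(411, 16, 100), 2))  # 41.7
```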
5.10. Extract Tables
You can use odb to extract tables from a database and write them to standard files (or named pipes).
Example
$ odb64luo -u user -p xx -d dsn -T 3 \
-e src=TRAFODION.MAURIZIO.LIN%:tgt=${DATA}/ext_%t.csv.gz:rows=m10:fs=\|:trim:gzip \
-e src=TRAFODION.MAURIZIO.REGION:tgt=${DATA}/ext_%t.csv.gz:rows=m10:fs=\|:trim:gzip \
-e src=TRAFODION.MAURIZIO.NATION:tgt=${DATA}/ext_%t.csv.gz:rows=m10:fs=\|:trim:gzip
The example above:
-
Extracts tables REGION, NATION, and all tables starting with LIN from the TRAFODION.MAURIZIO schema.
-
Saves data into files ext_%t.csv.gz (%t is expanded to the real table name).
-
Compresses the output file (gzip) on the fly (uncompressed data never lands on disk).
-
Trims text fields.
-
Uses a 10 MB IO buffer.
-
Uses three threads (ODBC connections) for the extraction process.
5.10.1. Extraction Options
-e {src={table|-file}|sql=<customsql>}:tgt=[+]file[:pwhere=where_cond]
[:fs=fieldsep][:rs=recsep][:sq=stringqualifier][:ec=escape_char][:soe]
[:ns=nullstring][:es=emptystring][:rows=#rowset][:nomark][:binary][:fwc]
[:max=#max_rec][:trim=[cCvVdt]][:rtrim][:cast][:multi][:efs=string]
[:parallel=number][:gzip][:gzpar=wb??][:uncommitted][:splitby=column]
[:pre={@sqlfile}|{[sqlcmd]}][:mpre={@sqlfile}|{[sqlcmd]}][:post={@sqlfile}|{[sqlcmd]}]
[:tpar=#tables][:time][:nts][:cols=[-]columns][:maxlen=#bytes][:xml]
The following table describes each extract operator:
Extract option | Meaning |
---|---|
|
Defines the source table(s). You can use: |
|
A custom SQL command you can use to extract data. This is an alternative to |
|
Output file. You can use the following keywords for this field: |
|
Field separator. You can define the field separator as: |
|
Record separator. You can define the record separator the same way as the field separator. |
|
Max number of records to extract. |
|
The string qualifier character used to enclose strings. You can define the string qualifier the same way as the field separator |
|
Character used as escape character. You can define the
escape character the same way as the field separator. |
|
Defines the size of the I/O buffer for each extraction thread. You
can define the size of this buffer in two different ways: |
|
How odb represents NULL values in the output file. |
|
How odb represents VARCHAR empty strings (NOT NULL with zero
length) values in the output file. |
|
These are extra parameters you can pass to tune the gzip compression algorithm. |
|
Accepts the following optional parameters: |
|
Don’t print the number of records extracted so far by each thread. |
|
Stop On Error. odb stops as soon as it encounters an error. |
|
odb uses as many threads as the parallel argument to extract data from partitioned source tables. You have to use splitby. |
|
This option can be used in conjunction with parallel operator to write as many output files as the number of extraction
threads. Output file names are built adding four digits at the end of the file identified by the |
|
This option is used in conjunction with parallel limiting the
extraction to records satisfying the where condition. |
|
odb prints up to num error messages per rowset. Normally used with soe to limit the number of error messages printed to the standard error stream. |
|
Adds FOR READ UNCOMMITTED ACCESS to the select(s) command(s). |
|
RTRIM() CHAR columns on the server. From a functional point of view this is equivalent to |
|
Performs a (server side) cast to VARCHAR for all non-text columns. Main scope of this operator is to move CPU cycles from the client to
the database server. It increases network traffic. To be used when: |
|
This operator lets you use parallel extract from any database. <column> has to be a SINGLE, numeric column.
odb calculates min()/max() value for |
|
odb runs a single instance of either the |
|
Each odb thread runs either a sqlfile script or a sqlcmd SQL command (enclosed between
square brackets) on the source system immediately before table extraction. You can use |
|
odb runs a single instance of either a |
|
odb extracts |
|
odb limits the amount of memory allocated in the ODBC buffers for CHAR/VARCHAR fields to |
|
Writes output file in XML format |
|
odb prints a timeline (milliseconds from start). |
-
The following characters are considered spaces: blank, tab, new line, carriage return, form feed, and vertical tab.
-
When the source table column is defined as NOT NULL and the specific field contains only blanks, odb leaves a single blank in the output file. This helps to distinguish between NULL fields (
<field_sep><field_sep>
) and NOT NULL fields containing all blanks (<field_sep><blank><field_sep>
).
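A consumer of the extracted file can rely on this convention to tell NULL from all-blank NOT NULL values. A minimal Python sketch, assuming | as the field separator:

```python
def classify(field):
    """Distinguish NULL (empty) from NOT NULL all-blank fields."""
    if field == "":
        return "NULL"
    if field.strip() == "":
        return "blank (NOT NULL)"
    return "data"

# Fields: data, NULL, all-blank NOT NULL, data.
fields = "42|| |x".split("|")
print([classify(f) for f in fields])
# ['data', 'NULL', 'blank (NOT NULL)', 'data']
```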
5.11. Extract a List of Tables
You can use odb to extract all tables listed in a file.
Example
~/Devel/odb $ cat tlist.txt
# List of tables to extract
src=TRAFODION.MAURIZIO.ORDERS
src=TRAFODION.MAURIZIO.CUSTOMER
src=TRAFODION.MAURIZIO.PART
src=TRAFODION.MAURIZIO.LINEITEM
You can extract all these tables by running:
$ odb64luo -u user -p xx -d dsn -e src=-tlist.txt:tgt=%t_%d%m:rows=m20:sq=\"
Note the src=-tlist.txt syntax: the leading minus sign tells odb to read the list of source tables from the file.
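A table-list file is just lines of src= entries, with # starting a comment. A Python sketch of how such a file could be parsed (illustrative; this is not odb's actual parser):

```python
def parse_table_list(text):
    """Collect src= entries, skipping blank lines and # comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("src="):
            entries.append(line[len("src="):])
    return entries

tlist = """# List of tables to extract
src=TRAFODION.MAURIZIO.ORDERS
src=TRAFODION.MAURIZIO.CUSTOMER
"""
print(parse_table_list(tlist))
# ['TRAFODION.MAURIZIO.ORDERS', 'TRAFODION.MAURIZIO.CUSTOMER']
```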
5.12. Copy Tables From One Database to Another
odb can directly copy tables from one data source to another (for example, from Trafodion to Teradata or vice versa). Data never lands on disk when using this option.
The target table has to be created in advance and should have a compatible structure.
5.12.1. Copy Operators
-cp src={table|-file}:tgt=schema[.table][:pwhere=where_cond][:soe][:nts]
[:truncate][:rows=#rowset][:nomark][:max=#max_rec][:fwc][:bpwc=#]
[:parallel=number][:errmax=#max_err][:commit=auto|end|#rows|x#rs][:time]
[:direct][:uncommitted][:norb][:splitby=column][:pre={@sqlfile}|{[sqlcmd]}]
[:post={@sqlfile}|{[sqlcmd]}][:mpre={@sqlfile}|{[sqlcmd]}][:ifempty]
[:loaders=#loaders][:tpar=#tables][:cols=[-]columns]
[:sql={[sqlcmd]|@sqlfile|-file}][:bind=auto|char|cdef]
[:tmpre={@sqlfile}|{[sqlcmd]}][:seq=field#[,start]]
Complete list of the Copy Operators:
Copy Operator | Meaning | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
Defines the source table(s). You can use: |
||||||||||||||||||
|
Target table(s). You can use the following keywords for this field: |
||||||||||||||||||
|
odb uses a generic SQL statement (instead of a real table) as the source. |
||||||||||||||||||
|
This is the max number of records to copy. Default is to copy all records in the source table. |
||||||||||||||||||
|
Defines the size of the I/O buffer for each copy thread. You can
define the size of this buffer in two different ways: |
||||||||||||||||||
|
Truncates the target table before loading. |
||||||||||||||||||
|
Loads the target table only if empty. |
||||||||||||||||||
|
Don’t print the number of records loaded so far during loads. |
||||||||||||||||||
|
Stop On Error. odb stops as soon as it encounters an error. |
||||||||||||||||||
|
odb uses two kinds of threads: • To extract data from partitioned source table. • To write data to the target table. Example If you specify parallel argument = 4 and the source table has 32 partitions, then odb starts: • 4 threads (4 ODBC connections) to read from the source table. • 8 threads (8 ODBC connections = 4 [parallel argument] * 2 [default number of loaders]) to write to the target table.
You have to specify splitby. |
||||||||||||||||||
|
Used in conjunction with parallel to copy only records satisfying the where condition. |
||||||||||||||||||
|
Defines how odb will commit the inserts. You have the following choices: |
||||||||||||||||||
|
Adds |
||||||||||||||||||
|
odb prints up to num error messages per rowset. Normally used with soe to limit the number of error messages printed to the standard error stream. |
||||||||||||||||||
|
Adds |
||||||||||||||||||
|
Lets you use parallel copy from any database.
<column> has to be a SINGLE, numeric column. odb calculates min()/max() value for |
||||||||||||||||||
|
odb runs a single instance of either a |
||||||||||||||||||
|
Each odb thread runs either a |
||||||||||||||||||
|
Each odb thread runs either a |
||||||||||||||||||
|
odb runs a single instance of either a |
||||||||||||||||||
|
odb copies |
||||||||||||||||||
|
odb uses |
||||||||||||||||||
|
Force Wide Characters. odb treats SQL_CHAR/SQL_VARCHAR fields as if they were defined as SQL_WCHAR/SQL_WVARCHAR. |
||||||||||||||||||
|
odb internally allocates 4 bytes/char for SQL_WCHAR/SQL_WVARCHAR columns. You can modify the number of bytes allocated for each char using this parameter. |
||||||||||||||||||
|
odb can bind columns to ODBC buffer as characters (char) or |
||||||||||||||||||
|
odb adds a sequence when loading the target system on column number |
||||||||||||||||||
|
odb prints a timeline (milliseconds from start). |
When copying data from one data source to another, odb needs user/password/dsn for both source and target system. User credentials and DSN for the target system are specified this way:
$ odb64luo -u src_user:tgt_user -p src_pwd:tgt_pwd -d src_dsn:tgt_dsn ... -cp src=...:tgt=...
5.13. Copy a List of Tables
You can use odb to copy a list of tables from one database to another.
Example
~/Devel/odb $ cat tlist.txt
# List of tables to extract
src=TRAFODION.MAURIZIO.ORDERS
src=TRAFODION.MAURIZIO.CUSTOMER
src=TRAFODION.MAURIZIO.PART
src=TRAFODION.MAURIZIO.LINEITEM
You can copy all these tables by running:
$ odb64luo -u user1:user2 -p xx:yy -d dsn1:dsn2 \
-cp src=-tlist.txt:tgt=tpch.stg_%t:rows=m2:truncate:parallel=4 -T 8
Note the src=-tlist.txt. This command copies:
Source | Target |
---|---|
TRAFODION.MAURIZIO.ORDERS |
TPCH.STG_ORDERS |
TRAFODION.MAURIZIO.CUSTOMER |
TPCH.STG_CUSTOMER |
TRAFODION.MAURIZIO.PART |
TPCH.STG_PART |
TRAFODION.MAURIZIO.LINEITEM |
TPCH.STG_LINEITEM |
Optionally, you can define any other command line options in the input file.
Example
Using different splitby columns.
~/Devel/odb $ cat tlist2.txt
# List of tables to extract and their “splitby columns”
src=TRAFODION.MAURIZIO.ORDERS:splitby=O_ORDERKEY
src=TRAFODION.MAURIZIO.CUSTOMER:splitby=C_CUSTOMERKEY
src=TRAFODION.MAURIZIO.PART:splitby=P_PARTKEY
src=TRAFODION.MAURIZIO.LINEITEM:splitby=L_PARTKEY
5.14. Case-Sensitive Table and Column Names
Your database configuration determines whether you can use case-sensitive table/column names. odb maintains table/column case sensitivity when the names are enclosed in double quotes.
Example
The following command creates a TRAFODION.MAURIZIO.Names
table made of three columns:
"name", "NAME", and "Name".
create table trafodion.maurizio."Names"
( "name" char(10)
, "NAME" char(10)
, "Name" char(10)
)
no partitions;
Double quotes have to be escaped under *nix. A few examples:
~/Devel/odb $ ./odb64luo -i T:trafodion.maurizio.\"Names\"
~/Devel/odb $ ./odb64luo -x "select * from trafodion.maurizio.\"Names\""
~/Devel/odb $ ./odb64luo -l src=names.txt:tgt=trafodion.maurizio.\"Names\":map=names.map:pc=32
You can omit double quotes around column names when using mapfiles.
5.15. Determine Appropriate Number of Threads for Load/Extract/Copy/Diff
If you have to load, extract, or copy multiple tables in parallel, the best option is to use the options
:tpar=number and :parallel=number. :tpar defines how many tables are copied/extracted
in parallel; :parallel defines how many data streams to use for each table. This way, odb automatically
allocates and starts the "right" number of threads.
A rule of thumb when copying/loading or extracting tables is to use as many data streams as:
min(number of middle-tier CPUs, number of source CPUs, number of target CPUs)
The number of threads started for each data stream depend on the operation type:
Operation | Total threads | Explanation | Example with parallel=4 |
---|---|---|---|
Load |
parallel + 1 |
One thread to read from file + one thread per parallel to load. |
5 |
Extract |
parallel |
One thread per parallel to extract. |
4 |
Copy |
parallel * (1+loaders) |
One reader thread plus loaders writer threads per parallel (read from source, write to target). |
12 (if loaders=2) |
Diff |
parallel * 3 |
Three threads per parallel: read from source, read from target, compare. |
12 |
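The formulas in the table above can be captured in a small helper. A Python sketch (loaders defaults to 2 here, the default number of loaders mentioned in the copy section):

```python
def total_threads(operation, parallel, loaders=2):
    """Threads started per data stream, following the table above."""
    formulas = {
        "load":    parallel + 1,              # 1 file reader + parallel loaders
        "extract": parallel,                  # 1 extractor per stream
        "copy":    parallel * (1 + loaders),  # reader + loaders per stream
        "diff":    parallel * 3,              # source + target + compare
    }
    return formulas[operation]

for op in ("load", "extract", "copy", "diff"):
    print(op, total_threads(op, 4))
# load 5 / extract 4 / copy 12 / diff 12
```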
5.16. Integrating With Hadoop
There are basically two ways to integrate a generic database with Hadoop using odb:
-
Use HIVE (Hadoop DWH) and its ODBC Driver: odb can access HIVE like any other relational database. For example, you can copy to/from HIVE and other databases using odb’s copy option.
-
Add the
hdfs.
prefix to the input or output file during loads/extracts: The file is read/written from/to Hadoop. odb interacts directly with the HDFS file system using libhdfs. This option is currently available only under Linux.
6. Comparing Tables (Technology Preview)
You can use odb to compare two tables with the same structure on different databases. odb does the following to compare two tables:
-
Extracts source/target tables ordered by Primary Key or any other set of columns.
-
Compares source/target ODBC buffers without unpacking them into columns/rows.
Each comparison stream is made up of three threads:
-
One thread reading from the source table.
-
One thread reading from the target table.
-
One thread comparing the source/target buffers.
These three threads work in parallel: the compare thread checks buffer N
while the other two threads
extract the next block of data from the source/target databases in parallel.
You can have multiple triplets working in parallel on different sections of the table using the splitby
operator.
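With splitby, odb computes min()/max() of the split column and gives each stream a sub-range of keys. The range arithmetic can be sketched as follows (the even-split strategy here is an assumption for illustration, not necessarily odb's exact algorithm):

```python
def split_ranges(lo, hi, streams):
    """Divide the [lo, hi] key range into contiguous sub-ranges."""
    width = (hi - lo + 1) // streams
    ranges = []
    for i in range(streams):
        start = lo + i * width
        end = hi if i == streams - 1 else start + width - 1
        ranges.append((start, end))
    return ranges

# 8 parallel streams over order keys 1..6000000
print(split_ranges(1, 6000000, 8)[0])   # (1, 750000)
print(split_ranges(1, 6000000, 8)[-1])  # (5250001, 6000000)
```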
Example
$ odb64luo -u MFELICI:maurizio.felici@hp.com \
-d MFELICI:VMFELICI \
-p xx:yy -diff src=trafodion.maurizio.lineitem:tgt=mftest.lineitem: \
key=l_orderkey,l_linenumber:output=lineitem.diff: \
rows=m2:print=IDC:splitby=l_orderkey:parallel=8
The command above compares two tables using eight streams (parallel=8
) made of three threads each.
The comparison threads use double buffering and advanced memory-comparison techniques. odb can provide the following information in output as a CSV file:
-
Missing rows on target (
D
– deleted – rows) based on the key columns. -
New rows on target (
I
– inserted – rows) based on the key columns. -
Changed rows (same key columns but with different values in other fields).
For these rows odb can print the original source version (
C
rows) and/or the modified target version (U
rows).
Example
odb output when comparing two tables:
$ cat lineitem.diff
DTYPE,L_ORDERKEY,L_LINENUMBER,L_SUPPKEY,L_PARTKEY,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,
L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT
D,4532896,1,5974,100953,42.00,82065.90,0.03,0.00,R,F,1994-12-15,1995-01-17,1995-01-07,COLLECT COD,TRUCK,leep across the ca
D,4532896,2,2327,102326,48.00,63759.36,0.07,0.05,A,F,1995-02-18,1994-12-10,1995-03-12,TAKE BACK RETURN,RAIL,usly regular platelets. careful
D,4532896,3,612,193054,12.00,13764.60,0.05,0.02,R,F,1994-11-17,1994-11-23,1994-12-06,COLLECT COD,SHIP,s haggle quickly. ideas after the
D,4532896,4,9867,47362,36.00,47136.96,0.10,0.06,A,F,1995-01-05,1994-11-29,1995-01-06,COLLECT COD,RAIL,s haggle carefully bo
D,4532896,5,9576,2075,19.00,18564.33,0.00,0.05,R,F,1994-11-26,1995-01-17,1994-12-03,COLLECT COD,TRUCK,en sauternes integrate blithely alon
D,4532896,6,1016,68509,9.00,13297.50,0.07,0.00,R,F,1995-02-16,1995-01-05,1995-02-24,TAKE BACK RETURN,RAIL,ily above the blithel
C,1652227,3,2298,87281,28.00,35511.84,0.06,0.05,R,F,1993-05-04,1993-03-12,1993-05-12,TAKE BACK RETURN,MAIL,lly final acco
U,1652227,3,2298,87281,99.99,35511.84,0.06,0.05,R,F,1993-05-04,1993-03-12,1993-05-12,TAKE BACK RETURN,MAIL,lly final acco
D,3456226,1,8161,148160,22.00,26579.52,0.06,0.02,A,F,1994-06-26,1994-06-08,1994-07-10,DELIVER IN PERSON,FOB,uriously. furio
D,3456226,2,6293,108762,20.00,35415.20,0.10,0.05,R,F,1994-05-07,1994-06-03,1994-05-15,NONE,RAIL,ously bold requests along the b
D,3456226,3,4542,159511,33.00,51826.83,0.05,0.03,A,F,1994-07-04,1994-05-15,1994-07-26,NONE,FOB,wake carefully al
D,3456226,4,154,95135,33.00,37294.29,0.04,0.08,A,F,1994-05-27,1994-05-10,1994-06-14,DELIVER IN PERSON,AIR,ests. unusual dependencies wake fluffily
D,3456226,5,9027,126514,31.00,47755.81,0.08,0.01,R,F,1994-06-13,1994-06-18,1994-07-10,TAKE BACK RETURN,FOB,according to the arefully regular instruct
D,3456226,6,8477,110943,14.00,27355.16,0.03,0.01,R,F,1994-07-03,1994-05-28,1994-07-13,TAKE BACK RETURN,FOB,onic accounts. ironic,pend
D,3456226,7,1773,4272,34.00,39993.18,0.08,0.00,A,F,1994-05-01,1994-05-29,1994-05-15,TAKE BACK RETURN,MAIL,ounts are finally ca
D,3456227,7,3722,101211,22.00,26668.62,0.02,0.01,N,O,1997-12-16,1998-02-05,1997-12-19,NONE,TRUCK,uriously even platelets are fu
I,3456227,8,3722,101211,22.00,26668.62,0.02,0.01,N,O,1997-12-16,1998-02-05,1997-12-19,NONE,TRUCK,uriously even platelets are fu
I,9999999,1,8161,148160,22.00,26579.52,0.06,0.02,A,F,1994-06-26,1994-06-08,1994-07-10,DELIVER IN PERSON,FOB,uriously. furio
I,9999999,2,6293,108762,20.00,35415.20,0.10,0.05,R,F,1994-05-07,1994-06-03,1994-05-15,NONE,RAIL,ously bold requests along the b
I,9999999,3,4542,159511,33.00,51826.83,0.05,0.03,A,F,1994-07-04,1994-05-15,1994-07-26,NONE,FOB,wakecarefully al
I,9999999,4,154,95135,33.00,37294.29,0.04,0.08,A,F,1994-05-27,1994-05-10,1994-06-14,DELIVER IN PERSON,AIR,ests. unusual dependencies wake fluffily
I,9999999,5,9027,126514,31.00,47755.81,0.08,0.01,R,F,1994-06-13,1994-06-18,1994-07-10,TAKE BACK RETURN,FOB,according to the carefully regular instruct
I,9999999,6,8477,110943,14.00,27355.16,0.03,0.01,R,F,1994-07-03,1994-05-28,1994-07-13,TAKE BACK RETURN,FOB,onic accounts. ironic, pend
I,9999999,7,1773,4272,34.00,39993.18,0.08,0.00,A,F,1994-05-01,1994-05-29,1994-05-15,TAKE BACK RETURN,MAIL,ounts are finally ca
As you can see, the first column defines the type of difference.
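Because the output is plain CSV with the difference type in the first column, it is easy to post-process. A Python sketch that tallies the row types from a fragment like the one above:

```python
import csv
import io

sample = """D,4532896,1,5974,100953
C,1652227,3,2298,87281
U,1652227,3,2298,87281
I,9999999,1,8161,148160
"""

# Count each difference type: D (deleted), I (inserted), C/U (changed).
counts = {}
for row in csv.reader(io.StringIO(sample)):
    counts[row[0]] = counts.get(row[0], 0) + 1
print(counts)  # {'D': 1, 'C': 1, 'U': 1, 'I': 1}
```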
6.1. Diff Operators
-diff
src={table|-file}:tgt=table[:key=columns][:output=[+]file][:pwhere=where_cond]
[:nomark][:rows=#rowset][:odad][:fs=fieldsep][:time]
[:rs=recsep][:quick][:splitby=column][:parallel=number][:max=#max_rec]
[:print=[I][D][C]][:ns=nullstring][:es=emptystring][:fwc][:uncommitted]
[:pre={@sqlfile}|{[sqlcmd]}][:post={@sqlfile}|{[sqlcmd]}][:tpar=#tables]
Detailed Descriptions of the Diff Operators:
Diff Operator | Meaning |
---|---|
|
Defines the source table(s). You can use: |
|
Target table. |
|
Defines how to order records extracted from both the source and target table. If you do not specify any key column, then odb uses the Primary Key. |
|
Output file where the differences are reported. You can use stdout to print odb
output on the standard output. A |
|
Field separator of the output file. You can define the field separator as follows: |
|
Record separator used in the output file. You can define the
record separator the same way as the field separator. |
|
The max number of records to compare. Default is to compare all records. |
|
Defines the size of the I/O buffer for each extraction thread. You
can define the size of this buffer in two different ways: |
|
How odb represents NULL values in the output file. Default is the empty string (two field separators one after the other) |
|
How odb represents VARCHAR empty strings (NOT NULL with zero length) values in the output file. Default is the empty string (two field separators one after the other) |
|
Don’t print the number of records extracted so far by each thread. |
|
Stop On Error. odb stops as soon as it encounters an error. |
|
odb uses as many thread triplets (extract from source, extract from target, compare) as the parallel argument. |
|
odb adds |
|
Lets you use parallel extract from any database. |
|
Specifies which rows are printed in the output file: |
|
odb runs a single instance of either a |
|
odb runs a single instance of either a |
|
The number of tables to compare in parallel when you have a list of tables in input. |
|
odb uses |
|
This option is used in conjunction with parallel to diff only records
satisfying the where condition. |
|
Limits the comparison to the columns in the key option (PK by default). This is a fast way to check for new/missing records but it will not find rows with differences in non-key columns. |
|
odb prints a timeline (milliseconds from start). |
7. odb as a Query Driver (Technology Preview)
7.1. Getting CSV Output
It’s often handy to get a CSV output ready to be imported into your spreadsheet while running performance tests. You can easily get this kind
of output with the -c
odb option.
Example
$ ./odb64luo -u mauro -p xxx -d pglocal -x 3:"select count(*) from tpch.region" \
-f 5:Q01.sql -f 3:Q02.sql -T 4 -q -c
This command runs:
-
Three copies of the
select count(*): -x 3:"select count(*) from tpch.region"
-
Five copies of
Q01.sql: -f 5:Q01.sql
-
Three copies of
Q02: -f 3:Q02.sql
-
Queuing the resulting 11 executions into four threads:
-T 4
-
Omitting query text and query results (
-q
is equivalent to -q all
): -q
-
Printing a CSV output:
-c
The command produces the following output:
odb [2011-12-12 08:08:43]: starting (4) threads...
Thread id,Proc id,Thread Exec#,Script
Cmd#,File,Label,Command,Rows,Rsds,Prepare(s),Exec(s),1st
Fetch(s),Fetch(s),Total(s),STimeline,ETimeline
1,1,0,0,(none),,"select count(*) from tpch.region",1,20,0.000,0.109,0.000,0.000,0.109,94,203
0,0,0,0,(none),,"select count(*) from tpch.region",1,20,0.000,0.125,0.000,0.000,0.125,94,219
2,2,0,0,(none),,"select count(*) from tpch.region",1,20,0.000,0.109,0.000,0.000,0.109,110,219
2,6,1,0,Q01.sql,,"SELECT L_RETURNFLAG, L_LINESTATUS,SUM(L_QUANTITY)>",4,234,0.000,136.297,0.000,0.000,136.297,141,136438
2,10,2,0,Q02.sql,,"SELECT S_ACCTBAL, S_NAME, N_NAME,P_PARTKEY,P_MF>",0,274,0.000,0.468,0.000,0.016,0.484,136438,136922
0,4,1,0,Q01.sql,,"SELECT L_RETURNFLAG, L_LINESTATUS,SUM(L_QUANTITY)>",4,234,0.000,139.667,0.016,0.016,139.683,0,139683
0,8,2,0,Q02.sql,,"SELECT S_ACCTBAL, S_NAME, N_NAME, P_PARTKEY,P_MFG>",0,274,0.000,0.015,0.000,0.000,0.015,139683,139698
1,5,1,0,Q01.sql,,"SELECT L_RETURNFLAG, L_LINESTATUS,SUM(L_QUANTITY)>",4,234,0.000,144.347,0.015,0.015,144.362,141,144503
1,9,2,0,Q02.sql,,"SELECT S_ACCTBAL, S_NAME, N_NAME, P_PARTKEY,P_MFG>",0,274,0.000,0.000,0.000,0.016,0.016,144503,144519
3,3,0,0,Q01.sql,,"SELECT L_RETURNFLAG, L_LINESTATUS,SUM(L_QUANTITY)>",4,234,0.000,144.394,0.016,0.016,144.410,390,144800
3,7,1,0,Q01.sql,,"SELECT L_RETURNFLAG, L_LINESTATUS,SUM(L_QUANTITY)>",4,234,0.000,69.373,0.000,0.000,69.373,144800,214173
odb statistics:
Init timestamp: 2011-12-12 08:08:42
Start timestamp: 2011-12-12 08:08:43
End timestamp: 2011-12-12 08:12:17
Elapsed [Start->End] (s): 214.173
The CSV output columns have the following meaning:
Column | Meaning |
---|---|
Thread id |
Thread ID. Number of threads limited to 4 -> thread id values are 0, 1, 2, 3 |
Proc id |
Execution number. 11 executions in the 0-10 range. |
Thread Exec# |
Progressive number (starting from 0) of execution for a specific thread. |
Script Cmd# |
If your script contains multiple SQL statements, then they are numbered starting from zero. |
File |
Script file name or (none). |
Label |
The label assigned through SET QLABEL. |
Command |
First 30 characters of the SQL command. It will end with > if the command is longer. |
Rows |
The number of returned rows. |
Rsds |
Record Set Display Size. Gives you an idea of how big the result set is. |
Prepare(s) |
Prepare (compile) time in seconds. |
Exec(s) |
Execution time in seconds. |
1st Fetch(s) |
Time needed to fetch the first row in seconds. |
Fetch(s) |
Total Fetch time in seconds. |
Total(s) |
Total query elapsed time from prepare to fetch in seconds. |
STimeline |
Queries start time line in milliseconds. |
ETimeline |
Queries end time line in milliseconds. |
7.2. Assign Label to a Query
Sometimes it’s not easy to recognize a query by reading the first 30 characters. Therefore, odb lets you assign a label to a generic query using:
SET QLABEL <label>
Example
~/Devel/odb $ cat script.sql
-- TPC-H Query 1
SET QLABEL Q01
SELECT
L_RETURNFLAG
, L_LINESTATUS
, SUM(L_QUANTITY) AS SUM_QTY
...
-- TPC-H/TPC-R Minimum Cost Supplier Query (Q2)
SET QLABEL Q02
SELECT
S_ACCTBAL
, S_NAME
...
Running this script includes the Query Label in the CSV output:
~/Devel/odb $ ./odb64luo -u mauro -p xxx -d pglocal -f script.sql -q -c
odb [2011-12-12 09:06:28]: starting (1) threads...
Thread id,Proc id,Thread Exec#,Script
Cmd#,File,Label,Command,Rows,Rsds,Prepare(s),Exec(s),1st
Fetch(s),Fetch(s),Total(s),STimeline,ETimeline
0,0,0,0,script.sql,Q01,"SELECT L_RETURNFLAG, L_LINESTATUS,
SUM(L_QUANTITY)>",4,234,0.000,43.102,0.000,0.000,43.102,0,43102
0,0,0,1,script.sql,Q02,"SELECT S_ACCTBAL, S_NAME, N_NAME, P_PARTKEY,
P_MFG>",0,274,0.000,0.016,0.000,0.000,0.016,43102,43118
odb statistics:
Init timestamp: 2011-12-12 09:06:28
Start timestamp: 2011-12-12 09:06:28
End timestamp: 2011-12-12 09:07:11
Elapsed [Start->End] (s): 43.118
7.3. Run All Scripts With a Given Path
Using the -S <path> or -P <path> options, you can run all scripts with a given path
(for example, all files in a directory) either serially (-S) or in parallel (-P).
Both options let you use multiplying factors to run all scripts multiple times.
These multiplying factors are defined with a <number>: preceding the script path.
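The <number>: prefix is plain text in front of the path, so it can be modeled as a simple split. A Python sketch (illustrative; this is not odb's actual option parser, and the leading minus for round-robin follows the examples in this section):

```python
def parse_script_arg(arg):
    """Split an optional '<number>:' multiplier from a script path.

    Returns (repetitions, round_robin, path); a leading '-' on the
    number selects round-robin assignment.
    """
    head, sep, tail = arg.partition(":")
    if sep and head.lstrip("-").isdigit():
        return int(head.lstrip("-")), head.startswith("-"), tail
    return 1, False, arg

print(parse_script_arg("3:/scripts/*.sql"))   # (3, False, '/scripts/*.sql')
print(parse_script_arg("-3:/scripts/*.sql"))  # (3, True, '/scripts/*.sql')
print(parse_script_arg("/scripts/a.sql"))     # (1, False, '/scripts/a.sql')
```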
Examples
odb Command Line | Action |
---|---|
|
Executes serially all scripts with extension |
|
Runs in parallel all files under |
|
Runs in parallel three times ( |
|
Runs in parallel three times ( |
To understand the difference between standard and round-robin assignments, imagine you have four scripts in the target path. This is how the executions will be assigned to threads:
Standard Assignment (e.g. -P 3:) |
Round-Robin Assignment (e.g. -P -3:) |
|||||
---|---|---|---|---|---|---|
Thread 1 |
Thread 2 |
Thread 3 |
Thread 1 |
Thread 2 |
Thread 3 |
|
nth execution |
… |
… |
… |
… |
||
4th execution |
|
|
|
|
|
|
3rd execution |
|
|
|
|
|
|
2nd execution |
|
|
|
|
|
|
1st execution |
|
|
|
|
|
|
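The two assignment modes amount to chunked versus cyclic distribution of the execution list. A simplified Python model (an illustration of the idea, not odb's scheduler), with 4 scripts run 3 times over 3 threads:

```python
def standard(executions, threads):
    """Contiguous blocks: thread i gets the i-th slice of the list."""
    per = -(-len(executions) // threads)  # ceiling division
    return [executions[i * per:(i + 1) * per] for i in range(threads)]

def round_robin(executions, threads):
    """Cyclic deal: thread i gets every threads-th execution."""
    return [executions[i::threads] for i in range(threads)]

execs = [f"s{n}" for n in (1, 2, 3, 4)] * 3  # 4 scripts, 3 repetitions
print(standard(execs, 3)[0])     # ['s1', 's2', 's3', 's4']
print(round_robin(execs, 3)[0])  # ['s1', 's4', 's3', 's2']
```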
7.4. Randomizing Execution Order
You can use the -Z
option to shuffle the odb internal execution table.
This way the execution order is not predictable.
Examples
odb Command Line | Action |
---|---|
|
Executes three times ( |
|
Executes three times ( |
7.5. Defining a Timeout
You can stop odb after a given timeout (assuming the execution is not already completed) using -maxtime <seconds>
option.
Example
~/Devel/odb $ ./odb64luo -S /home/mauro/scripts/*.sql -maxtime 7200
The command executes serially all scripts with extension .sql under /home/mauro/scripts/; if the execution is not completed after two hours (7200 seconds), then odb stops.
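Outside of odb, the same guard can be approximated with any process-timeout wrapper. A Python sketch (illustrative; the command shown is a placeholder, not odb):

```python
import subprocess
import sys

def run_with_timeout(cmd, seconds):
    """Run a command, stopping it if it exceeds the timeout (like -maxtime)."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None  # stopped before completion, like odb after -maxtime

# A quick placeholder command finishes well within the timeout.
rc = run_with_timeout([sys.executable, "-c", "print('done')"], 7200)
print(rc)  # 0
```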
7.6. Simulating User Thinking Time
You can simulate user thinking time using the -ttime <delay>
option.
This argument introduces a <delay>
millisecond pause between two consecutive executions in the same thread.
Example
~/src/C/odb $ ./odb64luo -f 5:script1.sql -c -q -ttime 75 -T 2
This command runs script1.sql five times using two threads. Each thread waits 75 milliseconds before starting
the next execution. You can also use a random thinking time in a given min:max
range.
Example
The following command starts commands within a thread with a random delay between 50 and 500 milliseconds:
~/src/C/odb $ ./odb64luo -f 5:script1.sql -c -q -ttime 50:500 -T 2
7.7. Starting Threads Gracefully
You might want to wait a little before starting the next thread. This can be obtained using the -delay
option.
Example
~/src/C/odb $ ./odb64luo -f 5:script1.sql -c -q -delay 200 -T 2
This command runs script1.sql five times using two threads. Each thread is started 200 milliseconds after the previous one.
-delay introduces a delay during thread start-up, while -ttime introduces a delay between one command and another within the same
thread.
|
7.8. Re-looping a Given Workload
Using -L
option you can re-loop the workload defined through -x
, -f
, -P
, and -S
commands a given number of times.
Each thread will re-loop the same number of times.
Example
~/src/C/odb $ ./odb64luo -f 5:script1.sql -c -q -M 75 -T 2 -L 3
This command re-loops three times (-L 3
) the same five executions, using two threads (-T 2
) with a 75 millisecond pause (-M 75
) between two
consecutive executions in the same thread.
8. odb as a SQL Interpreter (Technology Preview)
To start the odb SQL Interpreter, you have to use the -I
(uppercase i) switch with an optional argument.
Example
$ odb64luo -u user -p xx -d dsn -I MFTEST
The optional -I
argument (MFTEST
in this example) is used to specify the .odbrc
section containing
commands to be automatically executed when odb starts. See Run Commands When Interpreter Starts.
8.1. Main odb SQL Interpreter Features
-
It uses the
mreadline
library to manage command-line editing and history. History keeps track of the whole command, not just single lines. If you enter a SQL command in more than one line:
S01_Maurizio@TRAFODION64[MFTEST]SQL> select
S01_Maurizio@TRAFODION64[MFTEST]...> count(*)
S01_Maurizio@TRAFODION64[MFTEST]...> from
S01_Maurizio@TRAFODION64[MFTEST]...> t1;
When you press the up arrow key the whole command (up to semi-colon) will be ready for editing and/or re-run.
mreadline
provides several useful extra features:-
CTRL-V to edit the current command using your preferred editor (
$EDITOR
is used). When the editing session is closed the current command is automatically updated. -
CTRL-U/CTRL-L to change the command case.
-
CTRL-X to kill the current command.
-
See on-line help for the other
mreadline
commands.
-
-
History is saved when you exit the SQL Interpreter in a file identified by the
ODB_HIST
environment variable. You can change the number of commands saved in the history file (default 100):
S01_Maurizio@TRAFODION64[MFTEST]SQL> SET HIST 200
-
Customizable prompt. You can personalize your prompt through the set prompt command. Under Unix/Linux/Cygwin you can use the standard ANSI codes to create color prompts. See Customize Interpreter Prompt.
-
Multi-threaded odb instances can be run from within the single-threaded Interpreter with the
odb
keyword. This runs another odb instance using the same credentials, data source, and connection attributes used to start the interpreter:
S01_Maurizio@TRAFODION64[MFTEST]SQL> odb -l src=myfile:tgt=mytable:parallel=8:...
S01_Maurizio@TRAFODION64[MFTEST]SQL> odb -e src=mytable:tgt=myfile:parallel=8:...
-
Define Aliases with parameter substitution.
Example
root@MFDB[MFDB]SQL> SET ALIAS count "SELECT ROW COUNT FROM &1;"
When you call the alias
count
the first argument will be substituted to&1
. You can use up to nine positional parameters (&1
to&9
). -
You can run operating system commands with
!command
. -
You can run scripts with
@script
. -
You can spool to file with
set spool <myfile>
and stop spooling with set spool off
. -
You can switch to a special prepare only mode with
set prepare on
. This way, commands you type are only prepared, not executed. -
Different databases use different commands to set default schema(s):
-
Trafodion:
set schema <name>;
-
MySQL:
use <name>;
-
PostgreSQL/Vertica:
set search_path to <name1,name2,…>;
-
Teradata:
set database <name>;
set chsch <command>
is used to define database-specific commands to change your schema. When odb recognizes the change schema
command, it updates its internal catalog (if any) and schema names accordingly.
-
-
To list database objects, you can use the
ls
command.
Examples
S01_Maurizio@MFTEST[MFTEST]SQL> ls . # list all objects in the current schema
TABLE : CITIES
TABLE : CUSTOMER
TABLE : LINEITEM
TABLE : NATION
TABLE : ORDERS
TABLE : PART
TABLE : PARTSUPP
TABLE : REGION
TABLE : SUPPLIER
TABLE : T1
VIEW : V_CITIES
S01_Maurizio@MFTEST[MFTEST]SQL> ls -t %S << list tables (-t) ending with S
CITIES
ORDERS
S01_Maurizio@MFTEST[MFTEST]SQL> ls -v << list views (-v)
V_CITIES
S01_Maurizio@MFTEST[MFTEST]SQL> ls -s << list schemas (-s)
... and so on ...
-
To get a table's DDL, you can use either
ls -T <table>
or
ls -D <table>
.
Examples
mauro pglocal[PUBLIC] (09:12:56) SQL> ls -T tpch.orders
Describing: postgres.TPCH.orders
+---------------+-------------+----+-------+---------------+
|COLUMN |TYPE |NULL|DEFAULT|INDEX |
+---------------+-------------+----+-------+---------------+
|o_orderkey |int8 |NO | |orders_pkey 1 U|
|o_custkey |int8 |NO | | |
|o_orderstatus |bpchar(1) |NO | | |
|o_totalprice |numeric(15,2)|NO | | |
|o_orderdate |date |NO | | |
|o_orderpriority|bpchar(15) |NO | | |
|o_clerk |bpchar(15) |NO | | |
|o_shippriority |int4 |NO | | |
|o_comment |varchar(80) |NO | | |
+---------------+-------------+----+-------+---------------+
mauro pglocal[PUBLIC] (09:13:20) SQL> ls -D tpch.orders
CREATE TABLE postgres.TPCH.orders
(
o_orderkey int8
,o_custkey int8
,o_orderstatus bpchar(1)
,o_totalprice numeric(15,2)
,o_orderdate date
,o_orderpriority bpchar(15)
,o_clerk bpchar(15)
,o_shippriority int4
,o_comment varchar(80)
,primary key (o_orderkey)
);
-
You can define your own variables or use odb internal variables or environment variables directly from the Interpreter.
-
You can
set pad fit
to automatically shrink CHAR/VARCHAR fields in order to fit one record in one line. Line length is defined through set scols #
. Each record will be printed in one line, truncating the length of CHAR/VARCHAR fields proportionally to their original display size. In case of field truncation, a >
character will be printed at the end of the truncated string.
Example
MFELICI [MAURIZIO] (03:30:32) SQL> select [first 5] * from part;
P_PARTKEY|P_NAME |P_MFGR |P_BRAND|P_TYPE | P_SIZE |P_CONTAINER|P_RETAILPRICE|P_COMMENT
---------+-------------------------------+---------------+-------+---------------+----------+-------+----------------+--------------
33 |maroon beige mint cyan peru |Manufacturer#2>|Brand#>|ECONOMY PLATED>| 16|LG PKG>| 933.03|ly eve
39 |rose dodger lace peru floral |Manufacturer#5>|Brand#>|SMALL POLISHED>| 43|JUMBO >| 939.03|se slowly abo>
60 |sky burnished salmon navajo hot|Manufacturer#1>|Brand#>|LARGE POLISHED>| 27|JUMBO >| 960.06| integ
81 |misty salmon cornflower dark f>|Manufacturer#5>|Brand#>|ECONOMY BRUSHE>| 21|MED BA>| 981.08|ove the furious
136 |cornsilk blush powder tan rose |Manufacturer#2>|Brand#>|SMALL PLATED S>| 2|WRAP B>| 1036.13|kages print c>
-
You can
set plm
to print one field per row. This is useful when you have to carefully analyze a few records.
Example
MFELICI [MAURIZIO] (03:38:12) SQL> SET PLM ON
MFELICI [MAURIZIO] (03:38:12) SQL> select * from part where p_partkey =136;
P_PARTKEY 136
P_NAME :cornsilk blush powder tan rose
P_MFGR :Manufacturer#2
P_BRAND :Brand#22
P_TYPE :SMALL PLATED STEEL
P_SIZE 2
P_CONTAINER :WRAP BAG
P_RETAILPRICE:1036.13
P_COMMENT :kages print carefully
-
Check the rest on your own.
8.1.1. odb SQL Interpreter help
mauro pglocal[PUBLIC] (06:51:20) SQL> help
All the following are case insensitive:
h | help : print this help
i | info : print database info
q | quit : exit SQL Interpreter
c | connect { no | [user[/pswd][;opts;…] (re/dis)connect using previous or new user
odb odb_command : will run an odb instance using the same DSN/credentials
ls -[type] [pattern] : list objects. Type=(t)ables, (v)iews, s(y)nonyms, (s)chemas
: (c)atalogs, syst(e)m tables, (l)ocal temp, (g)lobal temp
: (m)at views, (M)mat view groups, (a)lias, (A)ll object types
: (D)table DDL, (T)table desc
print <string> : print <string>
!cmd : execute the operating system cmd
@file [&0]...[&9] : execute the sql script in file
set : show all settings
set alias [name] [cmd|-]: show/set/change/delete aliases
set chsch [cmd] : show/set change schema command
set cols [#cols] : show/set ls number of columns
set cwd [<directory>] : show/set current working directory
set drs [on|off] : show/enable/disable describe result set mode
set fs [<char>] : show/set file field separator
set hist [#lines] : show/set lines saved in the history file
set maxfetch [#rows] : show/set max lines to be fetched (-1 = unlimited)
set nocatalog [on|off] : show/enable/disable "no catalog" database mode
set nocatnull [on|off] : show/enable/disable "no catalog as null" database mode
set noschema [on|off] : show/enable/disable "no schema" database mode
set nullstr [<string>] : show/set string used to display NULLs ( to make it Null)
set pad [fit|full|off] : show/set column padding
set param name [value|-]: show/set/change/delete a parameter
set pcn [on|off] : show/enable/disable printing column names
set plm [on|off] : show/enable/disable print list mode (one col/row)
set prepare [on|off] : show/enable/disable 'prepare only' mode
set prompt [string] : show/set prompt string
set query_timeout [s] : show/set query timeout in seconds (def = 0 no timeout)
set quiet [cmd|res|all|off] : show/enable/disable quiet mode
set rowset [#] : show/set rowset used to fetch rows
set soe [on|off] : show/enable/disable Stop On Error mode
set spool [<file>|off] : show/enable/disable spooling output on <file>
<SQL statement>; : everything ending with ';' is sent to the database
mreadline keys:
Control-A : move to beginning of line Control-P : history Previous
Control-E : move to end of line Up Arrow : history Previous
Control-B : move cursor Back Control-N : history Next
Left Arrow : move cursor Back Down Arrow : history Next
Control-F : move cursor Forward Control-W : history List
Right Arrow: move cursor Forward Control-R : Redraw
Control-D : input end (exit) - DEL right Control-V : Edit current line
Control-L : Lowercase Line Control-X : Kill line
Control-U : Uppercase Line # Control-G : load history entry #
8.2. Run Commands When the Interpreter Starts
When the odb SQL Interpreter starts, it looks for the Initialization File, which is made of Sections containing the commands to be executed.
To find the Initialization File, odb checks the ODB_INI
environment variable. If this variable is not set,
then odb looks for a file named .odbrc
(*nix) or _odbrc
(Windows) under your HOME directory.
The Initialization File contains Sections identified by names between square brackets.
For example, the following section is named MFTEST
:
[MFTEST]
set pcn on
set pad fit
set fs |
set cols 3 30
set editor "vim -n --noplugin"
set efile /home/felici/.odbedit.sql
set prompt "%U %D [%S] (%T) %M> "
set alias count "select row count from &1;"
set alias size "select sum(current_eof) from table (disk label statistics (&1) );"
set alias ll "select left(object_name, 40) as object_name, sum(row_count) as nrows,
count(partition_num) as Nparts, sum(current_eof) as eof from table(disk label statistics(
using (select * from (get tables in schema &catalog.&schema, no header, return full names)
s(b) ))) group by object_name order by object_name;"
set schema TRAFODION.MAURIZIO;
The odb SQL Interpreter automatically runs all commands in the section identified by the -I
argument (for example -I MFTEST
).
A section named DEFAULT
will be executed when -I
has no arguments.
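For example, a sketch of a startup command that runs every command in the [MFTEST] section before the first prompt (the user name, password, and data source below are placeholder values; odb64luo is the binary name used elsewhere in this guide):

```
$ odb64luo -u myuser -p mypassword -d MFTEST -I MFTEST
```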
8.3. Customizing the Interpreter Prompt
You can define your prompt through the set prompt
command when running the SQL Interpreter. set prompt
can be
executed interactively or included in your ($ODB_INI
) Initialization File. set prompt
recognizes and expands
the following variables:
-
%U
—> User name -
%D
—> Data Source name -
%S
—> Schema name -
%T
—> Current Time -
%M
—> odb mode:SQL
when running sql commandsPRE
if you’re in "prepare only" modeSPO
if you are spooling output somewhereNDC
(No Database Connection)
Example
SET PROMPT "Prompt for %U connected via %D to %S in %M mode > "
Generates the following prompt:
Prompt for S01_Maurizio connected via CIV to CIV03 in SQL mode >
Under Cygwin, Unix and Linux (and probably under Windows too using ANSI.SYS driver - not tested), you can use standard ANSI escape color codes.
Example
set prompt "\^A^[[01;32m\^A%U@%D^A\^[[01;34m^A[%S]\^A^[[00m\ ^A (%T) %M> "
Where:
-
^A is a real Control-A (ASCII 001 and 002) before and after each color code sequence.
-
^[ is a real Escape character. The meanings of the ANSI color codes are:
^[[01;32m —> green
^[[01;34m —> blue
^[[00m —> reset.
Example Prompt
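The same escape sequences can be previewed directly from a shell. In this sketch, printf's \033 escape produces the real Escape character shown as ^[ above; the user, data source, and schema strings are placeholders:

```shell
# ANSI color codes from the prompt example; \033 is ESC (shown as ^[ above)
green='\033[01;32m'   # ^[[01;32m -> green
blue='\033[01;34m'    # ^[[01;34m -> blue
reset='\033[00m'      # ^[[00m    -> reset

# Render a sample prompt: green user@datasource, blue [schema], then reset
printf "${green}S01_Maurizio@CIV${reset}${blue}[CIV03]${reset} SQL> \n"
```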
9. Appendixes
9.1. A. Troubleshooting
-
odb uses Condition Variables to synchronize threads during copy and parallel load operations.
-
Most of the memory allocation operations are dynamic. For example, you can execute an SQL command of any length. However, some limits are hard coded:
#define MAX_VNLEN 32 /* Max variable name length */
#define MAXCOL_LEN 128 /* Max column name length */
#define MAXOBJ_LEN 128 /* Max catalog/schema/table name length */
#define MAX_CLV 64 /* Max command line variables (-var) */
-
Some Linux/UNIX systems (notably the Linux Loader) have a huge default stack size. Due to this extremely large value, you can get errors like the following when starting tens or hundreds of threads:
Error starting cmd thread #: cannot allocate memory
If you get this error, then check your default stack size:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
max nice (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 137215
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
max rt priority (-r) 0
stack size (kbytes, -s) 204800
cpu time (seconds, -t) unlimited
max user processes (-u) 2047
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
In the example above, the
stack size
value is the problem. Reset it to a reasonable value. (The value will revert to the initial setting when you start a new session.)
Example
$ ulimit -s 4096
-
If you get errors such as:
C:\Users\felici> odb64luo -u xx -p yy -d oraxe -l src=region.tbl.gz:tgt=region:fs=^|:truncate
odb [2012-10-11 13:27:22]: starting ODBC connection(s)... 0
[0] odb(5020) - [Oracle][ODBC]Optional feature not implemented. (State: HYC00 Native Err: 0)
Try adding
-nocatnull
to your command line. When the back-end database doesn't use catalogs, you should use an empty string as the catalog name. Unfortunately, some flawed ODBC drivers want NULL here instead of an empty string. -
You can have errors loading
TIME(N)
fields with N>0 under Trafodion because the ODBC Driver does not manage the field display size when N>0
. -
If you have problems starting odb on Unix/Linux check:
-
The shared library dependencies with
ldd <odb_executable_name>
. -
The shared lib path defined in the following environment variables used by the shared library loader:
-
Linux:
LD_LIBRARY_PATH
-
IBM IAX:
LIBPATH
(not currently supported) -
HP/UX:
SHLIB_PATH
(not currently supported)
-
-
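A minimal sketch of those two checks on Linux. It is shown here against /bin/ls, since the odb binary name (for example, odb64luo) varies by build; the unixODBC directory below is an assumption to adjust for your installation:

```shell
# 1. List shared-library dependencies; any line containing "not found"
#    indicates a library the loader cannot resolve.
#    Replace /bin/ls with your odb executable, e.g. odb64luo.
ldd /bin/ls

# 2. Prepend the directory holding the unixODBC libraries to the path
#    searched by the shared-library loader (hypothetical directory)
export LD_LIBRARY_PATH=/usr/local/unixODBC/lib:${LD_LIBRARY_PATH:-}
```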
9.2. B. Develop and Test odb
9.2.1. Develop
odb is coded in "ANSI C" (K&R programming style) and is compiled in a 64-bit version on the Linux platform, linked to the unixODBC driver manager. Other platforms, compilers, and ODBC libraries have not yet been tested.
Platform | Compiler | ODBC Libraries | Note |
---|---|---|---|
Linux |
gcc |
unixODBC (supported), iODBC (not currently supported), Data Direct (not currently supported), Teradata (not currently supported) |
64 bit (32 bit is not currently supported) |
MS-Windows |
Visual Studio (not tested) |
MS-Windows (not tested) |
64 bit |
C compilers are set with “all warnings” enabled and odb has to compile, on each platform, with no errors (of course) AND no warnings. Tools used to code odb:
-
vim (http://www.vim.org) as editor (or Visual Studio embedded editor)
-
splint (http://www.splint.org) to statically check the source code
9.2.2. Test
The info, load, extract, and copy operations of odb have been fully tested. In addition, odb has been tested using a set of 137 standard tests to check functionality and identify memory/thread issues.