Data Migration From GaussDB to GBase8a

Exporting Data From GaussDB

Comparison of Export Methods

For each export tool, the sections below describe the export steps, the applicable scenarios, and notes.
Using GDS Tool to Export Data to a Regular File System

Note: The GDS tool must be installed on the server to which the data files are exported.
Remote Export Mode: Export business data from the cluster to an external host.
1. Plan the export path, create GDS operation users, and set write permissions for the GDS user on the export path.
2. Install, configure, and start GDS on the server where data will be exported.
3. Create an external table in the cluster, with the location path in the format "gsfs://192.168.0.90:5000/".

Local Export Mode: Export business data from the cluster to the hosts where the cluster nodes reside. This mode is suited to exports that produce many small files.
1. Plan the export path and create directories to store exported data files on each DN in the cluster, such as "/output_data", and change the owner of this path to omm.
2. Install, configure, and start GDS on the server where data will be exported.
3. Create an external table in the cluster, with the location path in the format "file:///output_data/".
The GDS tool suits scenarios with high concurrency and large export volumes. It uses multi-DN parallelism to export data from the database into data files, improving overall export performance. It cannot export directly to an HDFS file system.

Notes on Remote Export:
1. Supports concurrent export by multiple GDS services, but one GDS can only provide export services for one cluster at a time.
2. Configure GDS services within the same intranet as the cluster nodes. Export speed is affected by network bandwidth. The recommended network configuration is 10GE.
3. Supported data file formats: TEXT, CSV, and FIXED. A single row must be smaller than 1 GB.

Notes on Local Export:
1. Data will be evenly split and generated in the specified folders on the cluster nodes, occupying disk space on the cluster nodes.
2. Supported data file formats: TEXT, CSV, and FIXED. A single row must be smaller than 1 GB.
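Because local-mode export writes files into the export directory on every cluster node, it helps to confirm free disk space there before a large export. A minimal sketch (the /output_data path follows the local export example in this guide):

```shell
# Check free space on the export path; fall back to the root filesystem
# if the directory has not been created yet.
df -h /output_data 2>/dev/null || df -h /
```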
gs_dump and gs_dumpall Tools
gs_dump supports exporting a single database or its objects.
gs_dumpall supports exporting all databases in the cluster or common global objects in each database.
The tools support exporting at the database, schema, and table levels. At each level, you can export the full content, only the object definitions, or only the data.
Step 1: As the omm operating system user, log in to any host where the MPPDB service is installed and load the environment variables:

source ${BIGDATA_HOME}/mppdb/.mppdbgs_profile

Step 2: Use gs_dump to export the postgres database: gs_dump -W Bigdata@123 -U jack -f /home/omm/backup/postgres_backup.tar -p 25308 postgres -F t

The two tools cover four export combinations:
1. Export a single database in full, including its data and all object definitions.
2. Export all databases in full, including every database in the cluster and the common global objects (roles and tablespaces).
3. Export only object definitions, including tablespaces, database definitions, function definitions, schema definitions, table definitions, index definitions, and stored procedure definitions.
4. Export only data, excluding all object definitions.
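The four combinations above map onto gs_dump/gs_dumpall invocations. A hedged sketch (connection values are copied from this guide's examples; the commands are guarded so they are skipped where the tools are not installed, and the -a flag for data-only export is assumed by analogy with pg_dumpall, so verify it before relying on it):

```shell
# Connection values copied from this guide's examples.
PORT=25308
OUT=/home/omm/backup

run_if_present() {            # helper: run a tool only where it is installed
  command -v "$1" >/dev/null && "$@"
}

# 1. One database in full (data + definitions), tar format:
run_if_present gs_dump -W Bigdata@123 -U jack -p "$PORT" -F t \
  -f "$OUT/postgres_backup.tar" postgres
# 2. All databases plus global objects (roles, tablespaces):
run_if_present gs_dumpall -W Bigdata@123 -U omm -p "$PORT" -f "$OUT/MPPDB_backup.sql"
# 3. Object definitions only (-s, as in the gs_dumpall example in this guide):
run_if_present gs_dumpall -W Bigdata@123 -U omm -p "$PORT" -s -f "$OUT/MPPDB_defs.sql"
# 4. Data only (-a is an assumption here; verify against your version's docs):
run_if_present gs_dumpall -W Bigdata@123 -U omm -p "$PORT" -a -f "$OUT/MPPDB_data.sql"
echo "sketch complete"
```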


GDS External Table Remote Export Example

Shell
 
# On the GDS server: prepare the export directory and a dedicated GDS user
mkdir -p /output_data 
groupadd gdsgrp 
useradd -g gdsgrp gds_user 
chown -R gds_user:gdsgrp /output_data
# Start GDS, listening on 192.168.0.90:5000 and allowing the 10.10.0.1/24 segment
/opt/bin/gds/gds -d /output_data -p 192.168.0.90:5000 -H 10.10.0.1/24 -D 
-- In the cluster (gsql): create the write-only external table and export
CREATE FOREIGN TABLE foreign_tpcds_reasons 
( 
r_reason_sk integer not null, 
r_reason_id char(16) not null, 
r_reason_desc char(100) 
) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://192.168.0.90:5000/', FORMAT 'CSV',
ENCODING 'utf8', DELIMITER E'\x08', QUOTE E'\x1b', NULL '') WRITE ONLY; 
INSERT INTO foreign_tpcds_reasons SELECT * FROM reasons; 
# Back on the GDS server: find and stop GDS after the export completes
ps -ef | grep gds 
gds_user 128954 1 0 15:03 ? 00:00:00 gds -d /output_data -p 192.168.0.90:5000 -D 
gds_user 129003 118723 0 15:04 pts/0 00:00:00 grep gds 
kill -9 128954 
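After the INSERT completes, the exported files appear in the GDS data directory. A quick check on the GDS server (a sketch; GDS derives the actual file names from the external table name):

```shell
# List exported files; guard in case the directory is absent on this host.
RESULT=$( [ -d /output_data ] && ls -lh /output_data || echo "/output_data not found" )
echo "$RESULT"
```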


GDS External Table Local Export Example

Shell
 
# On each cluster node: prepare the export directory, owned by omm
mkdir -p /output_data 
chown -R omm:wheel /output_data 
-- In the cluster (gsql): create the write-only external table and export
CREATE FOREIGN TABLE foreign_tpcds_reasons 
( 
r_reason_sk integer not null, 
r_reason_id char(16) not null, 
r_reason_desc char(100) 
) SERVER gsmpp_server OPTIONS (LOCATION 'file:///output_data/', FORMAT 'CSV',
ENCODING 'utf8', DELIMITER E'\x08', QUOTE E'\x1b', NULL '') WRITE ONLY; 
INSERT INTO foreign_tpcds_reasons SELECT * FROM reasons; 


gs_dumpall Export Example

Export the global tablespace and user information shared by all databases (run as the administrator omm); the export file is plain text.

Shell
 
gs_dumpall -W Bigdata@123 -U omm -f /home/omm/backup/MPPDB_globals.sql -p 25308 -g 
gs_dumpall[port='25308'][2018-11-14 19:06:24]: dumpall operation successful 
gs_dumpall[port='25308'][2018-11-14 19:06:24]: total time: 1150 ms


Export all databases in full (run as the administrator omm); the export file is plain text. The command prints a long log; the export has succeeded when the final line shows "total time".

Shell
 
gs_dumpall -W Bigdata@123 -U omm -f /home/omm/backup/MPPDB_backup.sql -p 25308 
gs_dumpall[port='25308'][2017-07-21 15:57:31]: dumpall operation successful 
gs_dumpall[port='25308'][2017-07-21 15:57:31]: total time: 9627 ms 


Export only the definitions of all databases (run as the administrator omm); the export file is plain text.

Shell
 
gs_dumpall -W Bigdata@123 -U omm -f /home/omm/backup/MPPDB_backup.sql -p 25308 -s 
gs_dumpall[port='25308'][2018-11-14 11:28:14]: dumpall operation successful 
gs_dumpall[port='25308'][2018-11-14 11:28:14]: total time: 4147 ms 


GBase 8a MPP Data Import

Execute SQL File To Import Database Definitions

Shell
 
gccli -ugbase -pgbase20110531 -Dtestdb -vvv -f < guessdb_out.sql >> guessdb_out.result 2> guessdb_out.err


Note: The -D option must name an existing database in the GBase cluster. The statements in guessdb_out.sql operate on whatever databases the SQL file itself specifies, regardless of the database given with -D.
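A minimal sketch of checking for that, using a fabricated sample file: list the statements in the export file that create or switch databases, since these take precedence over gccli's -D setting.

```shell
# Build a tiny sample export file for illustration.
cat > /tmp/sample_out.sql <<'EOF'
CREATE DATABASE db1;
USE db1;
CREATE TABLE t1 (a int);
EOF
# Show every statement that changes the effective database.
grep -inE '^(use|create database)' /tmp/sample_out.sql
```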

GBase 8a MPP Import Text Data

Step 1

The data server where the data exported from GaussDB is located needs to be configured with FTP service. Ensure that all nodes in the GBase 8a MPP cluster can access the data files on the data server via FTP.
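One way to sanity-check that connectivity from a GBase node is a plain TCP probe of the FTP port (the 192.168.0.1 host is taken from the load examples below; expect "not reachable" unless such a server exists on your network):

```shell
# Probe TCP port 21 on the data server with a short timeout; /dev/tcp is a
# bash feature, so the probe runs inside bash -c.
if timeout 2 bash -c 'cat < /dev/null > /dev/tcp/192.168.0.1/21' 2>/dev/null; then
  echo "ftp port reachable"
else
  echo "ftp port not reachable"
fi
```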

Step 2

Catalog the characteristics of the data files exported from GaussDB, such as the field delimiter, enclosure (quote) character, line terminator, character set, and NULL representation.
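A sketch of that inspection (the sample file is fabricated here; the \x08 delimiter from the GDS examples is non-printing, so od reveals it):

```shell
# Create a two-row sample using the \x08 (backspace) delimiter from the
# GDS examples above.
printf '1\bok\bfirst\n2\bok\bsecond\n' > /tmp/sample.dat
od -c /tmp/sample.dat | head -3        # shows \b where the delimiter sits
awk 'END { print NR " data rows" }' /tmp/sample.dat
```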

Step 3

Based on the characteristics organized in Step 2, write and execute the SQL for importing data in GBase 8a MPP.

Syntax format:

Shell
 
LOAD DATA INFILE 'file_list' 
INTO TABLE [dbname.]tbl_name 
[options] 
options: 
[CHARACTER SET charset_name] 
[DATA_FORMAT number [HAVING LINES SEPARATOR]] 
[NULL_VALUE 'string'] 
[FIELDS 
[TERMINATED BY 'string'] 
[ENCLOSED BY 'string'] 
[PRESERVE BLANKS] 
[AUTOFILL] 
[LENGTH 'string'] 
[TABLE_FIELDS 'string'] 
] 
[LINES 
[TERMINATED BY 'string'] 
] 
[MAX_BAD_RECORDS number] 
[DATETIME FORMAT format]
[DATE FORMAT format] 
[TIMESTAMP FORMAT format] 
[TIME FORMAT format] 
[TRACE number] 
[TRACE_PATH 'string'] 
[NOSPLIT] 
[PARALLEL number] 
[MAX_DATA_PROCESSORS number] 
[MIN_CHUNK_SIZE number] 
[SKIP_BAD_FILE number] 
[SET col_name = value[,...]] 
[IGNORE NUM LINES] 
[FILE_FORMAT format] 


Load Examples

Multi-data file load:

Shell
 
gbase> LOAD DATA INFILE 'ftp://192.168.0.1/pub/lineitem.tbl, 
http://192.168.0.2/lineitem.tbl' INTO TABLE test.lineitem FIELDS 
TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY '\n';


Import statement with wildcards for multiple files:

Shell
 
gbase> LOAD DATA INFILE 'ftp://192.168.10.114/data/*' INTO TABLE test.t; 


Import statement with column, row delimiters, and enclosing characters:

Shell
 
gbase> LOAD DATA INFILE 'ftp://192.168.0.1/pub/lineitem.tbl' INTO TABLE test.lineitem FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY '\n';


Import statement with date format:

Shell
 
load data infile 
'ftp://192.168.88.141/load_data/table_fields.tbl' into table test.t 
fields terminated by ',' table_fields 'i, vc, dt date "%H:%i:%s %Y-%m-%d", dt1 date "%Y-%m-%d %H:%i:%s"'; 


Import statement with auto-fill:

Shell
 
load data infile 'ftp://192.168.88.141/load_data/autofill.tbl' into table test.t fields terminated by '|' autofill;


Import statement with constant values:

Shell
 
gbase> Load data infile 'data.tbl' into table t fields terminated by '|' set c='2016-06-06 18:08:08',d='default',e=20.6; 


Import statement ignoring header:

Shell
 
gbase> load data infile 'http://192.168.6.39/test.tbl' into table data_test fields terminated by '|' ignore 3 lines;


Import statement with Blob data:

Shell
 
gbase> load data infile 'http://192.168.6.39/test.tbl' into table 
data_test fields terminated by '|' table_fields 'a,b,c type_text,d'; 
gbase> load data infile 'http://192.168.6.39/test.tbl' into table 
data_test fields terminated by '|' table_fields 'a,b,c type_base64,d'; 
gbase> load data infile 'http://192.168.6.39/test.tbl' into table 
data_test fields terminated by '|' table_fields 'a,b,c type_url,d'; 

