Data Migration From GaussDB to GBase8a
Exporting Data From GaussDB
Comparison of Export Methods
Export Tool | Export Steps | Applicable Scenarios and Notes |
---|---|---|
Using GDS Tool to Export Data to a Regular File System Note: The GDS tool must be installed on the server where the data files are exported |
Remote Export Mode: Export business data from the cluster to an external host. 1. Plan the export path, create GDS operation users, and set write permissions for the GDS user on the export path. 2. Install, configure, and start GDS on the server where data will be exported. 3. Create an external table in the cluster, with the location path in the format "gsfs://192.168.0.90:5000/". Local Export Mode: Export business data from the cluster to the host where the cluster nodes are located. This strategy is tailored for numerous small files. 1. Plan the export path and create directories to store exported data files on each DN in the cluster, such as "/output_data" , and change the owner of this path to omm .2. Install, configure, and start GDS on the server where data will be exported. 3. Create an external table in the cluster, with the location path in the format "file:///output_data/" . |
GDS tools suitable for scenarios with high concurrency and large data exports. Utilizes multi-DN parallelism to export data from the database to data files, improving overall export performance. Does not support direct export to HDFS file system. Notes on Remote Export: 1. Supports concurrent export by multiple GDS services, but one GDS can only provide export services for one cluster at a time. 2. Configure GDS services within the same intranet as the cluster nodes. Export speed is affected by network bandwidth. The recommended network configuration is 10GE. 3. Supported data file formats: TEXT, CSV, and FIXED. Single-row data size must be <1GB. Notes on Local Export: 1. Data will be evenly split and generated in the specified folders on the cluster nodes, occupying disk space on the cluster nodes. 2. Supports data file formats: TEXT, CSV, and FIXED. Single-row data size must be <1GB. |
gs_dump and gs_dumpall Toolsgs_dump supports exporting a single database or its objects.gs_dumpall supports exporting all databases in the cluster or common global objects in each database.The tools support exporting content at the database level, schema level, and second level. Each level can be separately defined to export the entire content, only object definitions, or only data files. |
Step 1: The omm operating system user logs into any host with MPPDB service installed and executes: source $ {BIGDATA_HOME}/mppdb/.mppdb gs_profile command to start environment variablesStep 2: Use gs_dump to export the postgres database: gs_dump -W Bigdata@123 -U jack -f /home/omm/backup/postgres_backup.tar -p 25308 postgres -F t |
1. Export the entire database information, including data and all object definitions. 2. Export the full information of all databases, including each database in the cluster and common global objects (including roles and tablespace information). 3. Export only all object definitions, including tablespace, database definitions, function definitions, schema definitions, table definitions, index definitions, and stored procedure definitions. 4. Export only data, excluding all object definitions. |
GDS External Table Remote Export Example
mkdir -p /output_data
groupadd gdsgrp
useradd -g gdsgrp gds_user
chown -R gds_user:gdsgrp /output_data
/opt/bin/gds/gds -d /output_data -p 192.168.0.90:5000 -H 10.10.0.1/24 -D
CREATE FOREIGN TABLE foreign_tpcds_reasons
(
r_reason_sk integer not null,
r_reason_id char(16) not null,
r_reason_desc char(100)
) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://192.168.0.90:5000/', FORMAT 'CSV',ENCODING
'utf8',DELIMITER E'\x08', QUOTE E'\x1b', NULL '') WRITE ONLY;
INSERT INTO foreign_tpcds_reasons SELECT * FROM reasons;
ps -ef|grep gds
gds_user 128954 1 0 15:03 ? 00:00:00 gds -d /output_data -p 192.168.0.90:5000 -D
gds_user 129003 118723 0 15:04 pts/0 00:00:00 grep gds
kill -9 128954
GDS External Table Local Export Example
mkdir -p /output_data
chown -R omm:wheel /output_data
CREATE FOREIGN TABLE foreign_tpcds_reasons
(
r_reason_sk integer not null,
r_reason_id char(16) not null,
r_reason_desc char(100)
) SERVER gsmpp_server OPTIONS (LOCATION 'file:///output_data/', FORMAT 'CSV',ENCODING
'utf8', DELIMITER E'\x08', QUOTE E'\x1b', NULL '') WRITE ONLY;
INSERT INTO foreign_tpcds_reasons SELECT * FROM reasons;
gs_dumpall
Export Example
Export all global tablespace and user information of all databases (omm user as the administrator), the export file is in text format.
gs_dumpall -W Bigdata@123 -U omm -f /home/omm/backup/MPPDB_globals.sql -p 25308 -g
gs_dumpall[port='25308'][2018-11-14 19:06:24]: dumpall operation successful
gs_dumpall[port='25308'][2018-11-14 19:06:24]: total time: 1150 ms
Export all database information (omm user as the administrator), the export file is in text format. After executing the command, there will be a long printout, and finally, when "total time" appears, it means the execution was successful.
gs_dumpall -W Bigdata@123 -U omm -f /home/omm/backup/MPPDB_backup.sql -p 25308
gs_dumpall[port='25308'][2017-07-21 15:57:31]: dumpall operation successful
gs_dumpall[port='25308'][2017-07-21 15:57:31]: total time: 9627 ms
Export all database definitions (omm user as the administrator), the export file is in text format.
gs_dumpall -W Bigdata@123 -U omm -f /home/omm/backup/MPPDB_backup.sql -p 25308 -s
gs_dumpall[port='25308'][2018-11-14 11:28:14]: dumpall operation successful
gs_dumpall[port='25308'][2018-11-14 11:28:14]: total time: 4147 ms
GBase 8a MPP Data Import
Execute SQL File To Import Database Definitions
gccli -ugbase -pgbase20110531 -Dtestdb -vvv -f <guessdb_out.sql >>guessdb_out.result 2>guessdb_out.err
Note: The -D parameter must be followed by an existing database within the GBase cluster. The executed guessdb_out.sql
file will operate according to the databases specified within the SQL file, regardless of the database specified after the -D parameter.
GBase 8a MPP Import Text Data
Step 1
The data server where the data exported from GaussDB is located needs to be configured with FTP service. Ensure that all nodes in the GBase 8a MPP cluster can access the data files on the data server via FTP.
Step 2
Organize the characteristics of the data files exported from GaussDB:
- Encoding format
- Field delimiter
- Quote character
- Null value in data files
- Escape character (default is double quotes)
- Whether the data file contains a header row
- Line break style of the exported data files
- Date format in date columns, etc.
Step 3
Based on the characteristics organized in Step 2, write and execute the SQL for importing data in GBase 8a MPP.
Syntax format:
LOAD DATA INFILE 'file_list'
INTO TABLE [dbname.]tbl_name
[options]
options:
[CHARACTER SET charset_name]
[DATA_FORMAT number [HAVING LINES SEPARATOR]]
[NULL_VALUE 'string']
[FIELDS
[TERMINATED BY 'string']
[ENCLOSED BY 'string']
[PRESERVE BLANKS]
[AUTOFILL]
[LENGTH 'string']
[TABLE_FIELDS 'string']
]
[LINES
[TERMINATED BY 'string']
]
[MAX_BAD_RECORDS number]
[DATETIME FORMAT format]
[DATE FORMAT format]
[TIMESTAMP FORMAT format]
[TIME FORMAT format]
[TRACE number]
[TRACE_PATH 'string']
[NOSPLIT]
[PARALLEL number]
[MAX_DATA_PROCESSORS number]
[MIN_CHUNK_SIZE number]
[SKIP_BAD_FILE number]
[SET col_name = value[,...]]
[IGNORE NUM LINES]
[FILE_FORMAT format]
Load Examples
Multi-data file load:
gbase> LOAD DATA INFILE 'ftp://192.168.0.1/pub/lineitem.tbl,
http://192.168.0.2/lineitem.tbl' INTO TABLE test.lineitem FIELDS
TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Import statement with wildcards for multiple files:
gbase> LOAD DATA INFILE 'ftp://192.168.10.114/data/*' INTO TABLE test.t;
Import statement with column, row delimiters, and enclosing characters:
gbase> LOAD DATA INFILE 'ftp://192.168.0.1/pub/lineitem.tbl' INTO TABLE test.lineitem FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY '\n'
Import statement with date format:
load data infile
'ftp://192.168.88.141/load_data/table_fields.tbl' into table test.t
fields terminated by ',' table_fields 'i, vc, dt date "%H:%i:%s %Y-%m-%d", dt1 date "%Y-%m-%d %H:%i:%s"';
Import statement with auto-fill:
load data infile 'ftp://192.168.88.141/load_data/autofill.tbl' into table test.t fields terminated by '|' autofill;
Import statement with constant values:
gbase> Load data infile 'data.tbl' into table t fields terminated by '|' set c='2016-06-06 18:08:08',d='default',e=20.6;
Import statement ignoring header:
gbase>load data infile ‘http://192.168.6.39/test.tbl’ into table data_test fields terminated by ‘|’ ignore 3 lines;
Import statement with Blob data:
gbase>load data infile ‘http://192.168.6.39/test.tbl’ into table
data_test fields terminated by ‘|’ table_fields ‘a,b,c type_text,d’;
gbase>load data infile ‘http://192.168.6.39/test.tbl’ into table
data_test fields terminated by ‘|’ table_fields ‘a,b,c type_base64,d’;
gbase>Load data infile ‘http://192.168.6.39/test.tbl’ into table
data_test fields terminated by ‘|’ table_fields ‘a,b,c type_url,d’;