Quick and Dirty HP Vertica Schema Clone

Migrate one Schema fast and dirty from one Vertica Cluster to another. This approach is used when :

  • you don't have Vertica Cluster configuration the same in both Clusters and copy cluster won't work.
  • no INBOUND access is available to the source Vertica Database(Production security needs to be strict).
  • different Vertica Server Version.
  • to many constraints dependencies in the tables for the schema you wanna export, so vbr.py backup on the schema is to painful.
-anyway here it is:

 Step One 

  • export your schema catalog and create it on the target Vertica database.
[dbadmin@aodba]$ vsql
Password:
Welcome to vsql, the Vertica Analytic Database interactive terminal.

Type:  h or ? for help with vsql commands
       g or terminate with semicolon to execute query
       q to quit

- export ddl for the schema
dbadmin= SELECT EXPORT_TABLES('/tmp/testschema.sql','testschema');
            EXPORT_TABLES
-------------------------------------
 Catalog data exported successfully

(1 row)

-- execute the ddl schema import
dbadmin= ! vsql -U dbadmin -w password -h 111.133.444.222 -d db_name -f /tmp/testschema.sql
CREATE SCHEMA
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE SEQUENCE
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE

Step Two

  • run the following script to get the code that will export and import the data from one server to another.
The script has to receive the following parameters:
  • if you have DbVisualizer installed the $parameter_name$ will receive a value that you will need to input at run time.
  • if you run it by hand at the vsql command line you need to edit the script bellow with their respective values.
$source_passwd$ - source db database password $source_ip$ - source node ip address $source_db_name$ - source database name $source_schema$ - source schema that you wanna export $target_passwd$ - source db database password $target_ip$ - target node ip address $target_db_name$ - target database name $target_schema$ - target schema
SELECT DISTINCT
    'vsql -U dbadmin -w $source_passwd$ -h $source_ip$ -d $source_db_name$-At -c "SELECT * from $source_schema$.'
    ||table_name
    ||'"| vsql -U dbadmin -w $target_passwd$ -h $target_ip$ -d $target_db_name$ -c "DROP TABLE IF EXISTS  public.'
    ||table_name
    ||'_reject; truncate table '
    ||'$target_schema$'
    ||'.'
    ||table_name
    ||'; COPY '
    ||'$target_schema$'
    ||'.'
    ||table_name
    ||' FROM STDIN DELIMITER ''|'' direct null as '''' REJECTED DATA AS TABLE public.'
    ||table_name
    ||'_reject  NO ESCAPE;"; vsql -U dbadmin -w $target_passwd$ -h $target_ip$ -d $target_db_name$ -c "SELECT '''
    ||table_name
    ||'_reject'' as Table_Name , count(*) as Rejects ,CASE count(*) WHEN 0 THEN ''No Rejects'' ELSE ''You got rejects'' END FROM '
    ||'public.'
    ||table_name
    ||'_reject;"'
FROM
    tables
WHERE
    table_name IN (select table_name from tables where table_schema=$source_schema$)
The output of this sql will generate the code to migrate the data from one server to another. Here is a snippet of the output: Where i have used some dummy input values. So let me describe what the code will do: Maybe most will understand it fast, but ill go ahead and put my words in.
  1. The script will create connection to the source database followed by a "select * table "(this table name was picked-up dynamically by the script when we told him the schema name).
  2. The result of the query is piped to the source database connection when is picked up by the copy command as a STDIN input.
  3. In the command i state the the rejects will be stored in a table that will carry the same name as the target table followed by the _reject string.
  4. Finally after the load is complete the rejects table will be queried and you will know if you got rejects and how many.
  5. The rejected data is stored in the reject table and you ca go over the rejects when load is done and you can see what went wrong.
Note:
  • you need to have the vsql client installed and also the whole command bellow needs to be on the same line(so no line breaks).
  • i recommend you run the command from the source Linux box.
vsql -U dbadmin -w sourcepasswd -h 11.111.11.11 -d sourcedbname -At -c
"SELECT * from testschema.tbltest"|
vsql -U dbadmin -w targetpasswd -h 99.999.99.99 -d sourcedbname -c
"DROP TABLE IF EXISTS  public.testschema.tbltest_reject; truncate table testschema.tbltest;
 COPY testschema.tbltest FROM STDIN DELIMITER '|' direct null as ''
 REJECTED DATA AS TABLE public.testschema.tbltest_reject  NO ESCAPE;";
 vsql -U dbadmin -w targetpasswd -h 99.999.99.99 -d sourcedbname -c
"SELECT 'testschema.tbltest_reject' as Table_Name , count(*) as Rejects ,
CASE count(*) WHEN 0 THEN 'No Rejects' ELSE 'You got rejects' END
FROM public.testschema.tbltest_reject;"
When you got rejects this message will be shown:
          Table_Name           | Rejects |      case
-------------------------------+---------+-----------------
 testschema.tbltest            |  11     | You got rejects
(1 row)
If not rejects are generated this message will be shown:
          Table_Name           | Rejects |      case
-------------------------------+---------+-----------------

 testschema.tbltest            |  0      |  No Rejects

(1 row)
Step three
  • go over the rejects and fix them.
I hope this will be useful for who wants to read it and fell free to post your pros and cons opinions.