HP Vertica Backup and Recover with examples

Vertica uses vbr.py tools that is written in python. This utility lets you create full and incremental database snapshots, as well as snapshots of specific schema's or tables for use with a multitenant database. You can run the Vertica vbr.py from a crontab job or other task scheduler.

Location where you can save the backups:

  • A local directory on the nodes in the cluster
  • One or more hosts outside of the cluster
  • A different Vertica cluster (effectively cloning your database)
Snapshots create with the vbr.py tool requires the same to tool to be restored. As well as snapshots made with backup.sh tools are only to be restored with backup.sh tool. Snapshot Configuration Parameters. Parameter:
  • RemoveSnapshotInterval
s the number of seconds that Vertica checks for snapshots that can be automatically removed. Default value -3600 To alter this parameter :
SELECT SET_CONFIG_PARAMETER('RemoveSnapshotInterval',7200);
  • SnapshotRetentionTime
-How long a snapshot can exist before it can be automatically removed. Default value -3600 To alter this parameter :

VBR Terminology

  • Snapshots
  • A consistent image of all objects and data in the database at the time the snapshot is taken. -Object-level snapshots consist of a subset of database objects, selected by the user. They include other objects in the dependency graph, and associated data in the database at the time the object-level snapshot is taken. -We can name our snapshot as we like (blablasnap,1snap,fullsnap,etc)
  • Archive
  • A number of same-name snapshots, past and psent. Each archive can have a different retention policy. Example : If TSnap names a snapshot of table T, and you take the snapshot daily, keeping 7 snapshots in the archive would let you revert back to any of the last week's snapshots of table T.
  • Backup location
The directory location on a backup host where snapshots are saved. This location can comprise multiple snapshots, including associated archives. All snapshots in the same backup location share data files (through hard links). The snapshots are also compatible, meaning that after restoring a full database snapshot, any object snapshot from the same backup location is eligible to be restored.
  • Object-level snapshot
One or more schemas or tables, or group of such objects, saved on a backup host. The conglomerate parts of the object-level snapshot do not contain the entire database. In earlier Vertica versions, object-level snapshots could not exist because a snapshot always contained the entire database.
  • Incremental backups
A successive backup consisting only of new or changed data.
  • Selected objects
The objects chosen to be part of an object-level snapshot. For example, if tables T1 and T2 are backed up in a snapshot, these objects comprise the selected objects.
  • Dependent objects
Objects that should be part of an object-level snapshot due to their dependency. For example, a table with a foreign key can exist on its own, but must be backed up with the primary key table, due to table constraints. Projections anchored on a table in the selected objects are also dependent objects.
  • Principal objects
The objects on which both selected and dependent objects depend. For instance, each table and projection has an owner. The owner is a principal object.

When to Back-up the Database

Whenever you choose to, depending on your needs
  • Before you upgrade Vertica to another release.
  • Before you drop a partition.
  • After you load a large volume of data.
  • If the epoch in the latest snapshot is earlier than the current ancient history mark .
  • Before and after you add, remove, or replace nodes in your database cluster.
  • After recovering a cluster from a crash.
Note: When you restore a database snapshot, you must restore to a cluster that is identical to the one on which you created the snapshot. For this reason, always create a new snapshot after adding, removing, or replacing nodes.

Configuring Backup Hosts

You can backup you database in one or more location, being use full in creating offsite data backups. The backup hosts you use must:
  • Have sufficient backup disk space.
  • Be accessible from your database cluster.
  • Have password less SSH access for the database administrator account.
  • Have a copy of the same versions of Python and rsync that were installed by the Vertica installer.

Configuring Single-Node Database Hosts for Backup

Installing Vertica on a single-node database host automatically sets up the node with passwordless SSH access. The vbr.py utility requires that all database hosts (including single-node hosts) and backup location hosts have passwordless SSH access. Estimating Backup Host Disk Requirements Wherever you plan to save data backups, consider the disk requirements for incremental backups at your site. To see the total used space on your cluster:
select sum(used_bytes) as total_size from v_monitor.storage_containers;
To see space used on your cluster separated for each node
select node_name,sum(used_bytes) as size_in_bytes from v_monitor.storage_containers group by node_name;

Generating the vbr.py Configuration File

The vbr.py utility uses a configuration file for the information it requires to back up and restore a full- or object level snapshot. You cannot run vbr.py without a configuration file, and no default file exists.
#vbr.py --setupconfig
Creating a vbr Configuration File
 vbr.py --setupconfig
Snapshot name (snapshotName): ExampleBackup
Backup Vertica configurations (n) [y/n] y
Number of restore points (1): 5
Specify objects (no default): dim, dim2
Vertica user name (current_user): dbadmin
Save password to avoid runtime prompt (n)[y/n]: y
Password to save in vbr config file (no default): mypw234
Node v_example_node0001
Backup host name (no default): backup01
Backup directory (no default): /home/dbadmin/backups
Node #1
Node name: v_exampledb_node0002
Backup host name: backup02
Backup directory: /home/dbadmin/backups
Node #2
Node name: v_exampledb_node0003
Backup host name: backup03
Backup directory: /home/dbadmin/backups
Config file name: exampleBackup.ini
Change advanced settings (n)[y/n]: n
Saved vbr configuration to exampleBackup.ini.
You can alter the generated file manually as well. Now to back-up the database follow the syntax :
/opt/vertica/bin/vbr.py -ask backup -onfig-file
The command to restore the database is :
/opt/vertica/bin/vbr.py task restore --config-file

Follow a full example of backup-destroy-restore(full/all objects) of a cluster using vbr.py tool:

1 -see cluster state. 2 -see cluster objects 3 -backup the cluster with the full_backup.ini script 4 -drop all objects in the cluster 5 -stop the cluster database 6 -restore the cluster database 7 -start cluster database 8- Verify cluster database content And there you go we have restored our cluster database successfully. Generating Different types of vbr.py Configuration File To create a configuration file:
  • 1. From a command terminal, enter vbr.py command:
$ /opt/vertica/bin/vbr.py setupconfig
Sample Session Configuring Required Parameters: We are going to describe each parameter of the vbr.py backup configuration file: Note you can have as many backup configuration file as you need (full/incremental, objects/schema specific, location specific).
Snapshot name (snapshotName): --It is the name of the SnapShot
Backup Vertica configurations (n) [y/n] y default is false (will backup
                                configuration file as well)
Number of restore points (1): -default is 1 but can be from 1-99 (it repsents the
                              number of snapshots to retain. Having multiple snapshots
                              allows us to create incremental backups.
                              If the value is 1(default) then the backup will overwrite.

Specify objects (no default): - default is none , you can specify the names of the
                              specific objects you want to include in a backup.
                              You need to list the objects separated by comma .
                              Non-alphanumeric characters, use a backslash () followed by a hex value.
                        Example:
                        Object name my table
                        You can listed like this :
                              Objects=my20table,objectname,etc

Vertica user name (current_user): -is the admin user in this case, who can run the
                              Backup/restore tasks.

Save password to avoid runtime prompt (n)[y/n]:-chose to save the password in the
                              backup config file or now, by default is not.

Password to save in vbr config file (no default):- if you chose to store the passwd
                              In the backup config file then you must specify it now.

Backup host name (no default): - indicates where the backup will be stored, can be
                             on one of the cluster hosts or another host outside the
                             cluster(user must have password-less ssh access and specific
                             rights on the host).

Backup directory (no default):- full path location of the backup storage place.
                             Directory must already exist and must be writable by the user that
                                      runs the backup utility.

Config file name: -         it is the name of the backup configuration exampleBackup.ini

Change advanced settings (n)[y/n]: -chose to configure advanced settings, default
                             value is alse
We saved the vbr.py configuration to exampleBackup.ini in the actual work directory, or if you specify the full path when setting config file name Example:
Config file name: /opt/vertica/backupConf/exampleBackup.ini
Note: Directory ackupConfdoes not exists by default you can create one for the purpose of a better organization. Configuring Advanced VBR Parameter To use advanced settings in configuring the backup file choose when:
Change advanced settings (n)[y/n]:Y
What are the options:
Temp directory (/tmp/vbr):
- absolute path to a temporary storage area on the cluster  nodes.(default  value  tmp
                            Note: The tmp path must be the same on all nodes in the cluster.

Number of times to retry (2):
- repsentes the  number of times the backup operation attempts to
complete execution after an error occurs. If the failure continues to occur
after the number of retry attempts, the utility reports an error and stops processing.

Seconds between retry attempts (1):
-  number of seconds to wait between backup retry attempts in the event of a failure.

Encrypt data during transmission (n) [y/n]:
- default value is false. Choose this option if you are performing a backup over an
untrusted network (for example, backing up to a remote host across the Internet).
Note:
 - Encrypting data transmission causes significant processing overhead and slows transfer.

Use checksum for data integrity (not file date and size) (n) [y/n]:
- default value is false.
- determine whether files are identical before and after network transmission.
Note: Calculating checksum values increases processor usage during the backup process.

Port number for rsync daemon (50000):
-default value is 50000
-you change the port number of the rsync as desired/needed.

Transfer bandwidth limit in KBps or 0 for unlimited (0):
- Indicates the transfer bandwidth limit.
- by default is has no limit.
Examples of backup config files: (All examples are made on a 3 node cluster called luster) Before all we must choose our backup host and create specific directories that can be written by the user who running the backup/restore duties. We will chose our monitoring host for the backup storage (Vertica_Master1) And we will create the directories needed.
[root@Vertica_Master1 /]# rm -rf /vertica_db/
[root@Vertica_Master1 /]# mkdir -p /vert_backup/backup
[root@Vertica_Master1 /]# cd /vert_backup/
[root@Vertica_Master1 vert_backup]# mkdir backup_conf
[root@Vertica_Master1 vert_backup]# cd /
[root@Vertica_Master1 /]# chown -R dbadmin:dbadmin /vert_backup/
All directories are created now let us create the backup config files: 1-Full database backup config file with 1 level of retention (will be overwritten every time a backup occurs).
[dbadmin@Vertica_Master1 cluster]$ vbr.py --setupconfig
Snapshot name (backup_snapshot): full_backup
Backup vertica configurations (n) [y/n]: y
Number of restore points (1):
Specify objects (no default):
Vertica user name (dbadmin): dbadmin
Save password to avoid runtime prompt (n) [y/n]: n
Node v_cluster_node0001
Backup host name (no default): 10.253.1.216
Backup directory (no default): /vert_backup/backup
Node v_cluster_node0002
Backup host name (no default): 10.253.1.216
Backup directory (no default): /vert_backup/backup
Node v_cluster_node0003
Backup host name (no default): 10.253.1.216
Backup directory (no default): /vert_backup/backup
Config file name (full_backup.ini): /vert_backup/backup_conf/full_backup.ini
Change advanced settings (n) [y/n]: n
Saved vbr configuration to /vert_backup/backup_conf/full_backup.ini.
[dbadmin@Vertica_Master1 cluster]$
[dbadmin@Vertica_Master1 cluster]$
[dbadmin@Vertica_Master1 cluster]$
[dbadmin@Vertica_Master1 cluster]$ cd /vert_backup/backup_conf/
[dbadmin@Vertica_Master1 backup_conf]$ ls -la
total 4
-rw-rw-r--. 1 dbadmin dbadmin 475 Sep  6 14:00 full_backup.ini
See content of the full_backup.ini file 2-Full/Incremental database backup config file with more than one level of retention (values 1-99).
[dbadmin@Vertica_Master1 backup_conf]$ vbr.py --setupconfig
Snapshot name (backup_snapshot): backup_incremental_7
Backup vertica configurations (n) [y/n]: y
Number of restore points (1): 7
Specify objects (no default):
Vertica user name (dbadmin): dbadmin
Save password to avoid runtime prompt (n) [y/n]: n
Node v_cluster_node0001
Backup host name (no default): 10.253.1.216
Backup directory (no default): /vert_backup/backup
Node v_cluster_node0002
Backup host name (no default): 10.253.1.216
Backup directory (no default): /vert_backup/backup
Node v_cluster_node0003
Backup host name (no default): 10.253.1.216
Backup directory (no default): /vert_backup/backup
Config file name (backup_incremental_7.ini): /vert_backup/backup_conf/backup_incremental_7.ini
Change advanced settings (n) [y/n]: n
Saved vbr configuration to /vert_backup/backup_conf/backup_incremental_7.ini.
[dbadmin@Vertica_Master1 backup_conf]$
[dbadmin@Vertica_Master1 backup_conf]$
[dbadmin@Vertica_Master1 backup_conf]$ ls -la
total 16
drwxr-xr-x. 2 dbadmin dbadmin 4096 Sep  6 14:05 .
drwxr-xr-x. 4 dbadmin dbadmin 4096 Sep  6 13:40 ..
-rw-rw-r--. 1 dbadmin dbadmin  470 Sep  6 14:05 backup_incremental_7.ini
-rw-rw-r--. 1 dbadmin dbadmin  475 Sep  6 14:04 full_backup.ini
See content of the backup_incremental_7.ini file 3- Object specific database backup config file (as well can have more levels of retention). Chose some object/objects in the database to backup only
cluster= dt
               List of tables
 Schema | Name  | Kind  |  Owner  | Comment
--------+-------+-------+---------+---------
 public | one   | table | dbadmin |
 public | three | table | dbadmin |
 public | two   | table | dbadmin |
(3 rows)
We will backup table public.one
[dbadmin@Vertica_Master1 /]$ vbr.py --setupconfig
Snapshot name (backup_snapshot): backup_table.one
Backup vertica configurations (n) [y/n]: n
Number of restore points (1):
Specify objects (no default): public.one
Vertica user name (dbadmin): dbadmin
Save password to avoid runtime prompt (n) [y/n]: n
Node v_cluster_node0001
Backup host name (no default): 10.253.1.216
Backup directory (no default): /vert_backup/backup
Node v_cluster_node0002
Backup host name (no default): 10.253.1.216
Backup directory (no default): /vert_backup/backup
Node v_cluster_node0003
Backup host name (no default): 10.253.1.216
Backup directory (no default): /vert_backup/backup
Config file name (backup_table.one.ini): /vert_backup/backup_conf/backup_table_one.ini
Change advanced settings (n) [y/n]: n
Saved vbr configuration to /vert_backup/backup_conf/backup_table_one.ini.
[dbadmin@Vertica_Master1 /]$
[dbadmin@Vertica_Master1 /]$
[dbadmin@Vertica_Master1 /]$
[dbadmin@Vertica_Master1 /]$ cd /vert_backup/backup_conf/
[dbadmin@Vertica_Master1 backup_conf]$ ls -la
total 20
drwxr-xr-x. 2 dbadmin dbadmin 4096 Sep  6 14:21 .
drwxr-xr-x. 4 dbadmin dbadmin 4096 Sep  6 13:40 ..
-rw-rw-r--. 1 dbadmin dbadmin  470 Sep  6 14:12 backup_incremental_7.ini
-rw-rw-r--. 1 dbadmin dbadmin  488 Sep  6 14:21 backup_table_one.ini
-rw-rw-r--. 1 dbadmin dbadmin  475 Sep  6 14:04 full_backup.ini
See content of the backup_table_one.ini file Testing our backup configuration files To run backup on Vertica using the configuration file created use the following sintax:
# vbr.py task backup --config-file
To run restore a Vertica backup using the configuration file created use the following sintax:
# vbr.py task restore --config-file
NOTE: -to restore/recover a full database you need to have the database shutdown.

Example of test backups

Full backup
 [dbadmin@Vertica_Master1 backup_conf]$ vbr.py --task backup --config-file full_backup.ini
Please input vertica password:
pparing...
Found Database port:  5433
Copying...
251955 out of 251955, 100%
All child processes terminated successfully.
Committing changes on all backup sites...
backup done!
The backup files were generated in the specified location Incremental backup
[dbadmin@Vertica_Master1 backup_conf]$ vbr.py --task backup --config-file backup_incremental_7.ini
Please input vertica password:
pparing...
Found Database port:  5433
Copying...
249828 out of 249828, 100%
All child processes terminated successfully.
Committing changes on all backup sites...
backup done!
The backup files were generated in the specified location Single object backup
[dbadmin@Vertica_Master1 backup_conf]$ vbr.py --task backup --config-file backup_table_one.ini
Please input vertica password:
pparing...
Found Database port:  5433
Copying...
70926 out of 70926, 100%
All child processes terminated successfully.
Committing changes on all backup sites...
backup done!
The backup files were generated in the specified location Verify backup location
[dbadmin@Vertica_Master1 /]$ cd vert_backup/
[dbadmin@Vertica_Master1 vert_backup]$ ls -la
total 16
drwxr-xr-x.  4 dbadmin dbadmin 4096 Sep  6 13:40 .
dr-xr-xr-x. 28 root    root    4096 Sep  6 13:53 ..
drwxr-xr-x.  5 dbadmin dbadmin 4096 Sep  6 15:22 backup
drwxr-xr-x.  2 dbadmin dbadmin 4096 Sep  6 15:29 backup_conf
[dbadmin@Vertica_Master1 vert_backup]$ cd backup/
[dbadmin@Vertica_Master1 backup]$ ll
total 12
drwxrwxr-x. 8 dbadmin dbadmin 4096 Sep  6 15:29 v_cluster_node0001
drwxrwxr-x. 8 dbadmin dbadmin 4096 Sep  6 15:29 v_cluster_node0002
drwxrwxr-x. 8 dbadmin dbadmin 4096 Sep  6 15:29 v_cluster_node0003
[dbadmin@Vertica_Master1 backup]$ cd v_cluster_node0001/
[dbadmin@Vertica_Master1 v_cluster_node0001]$ ll
total 12
drwx------. 3 dbadmin dbadmin 4096 Sep  6 15:26 backup_incremental_7
drwx------. 3 dbadmin dbadmin 4096 Sep  6 15:29 BackupTableOne
drwx------. 3 dbadmin dbadmin 4096 Sep  6 15:22 full_backup
So backup were created and placed in the supplied directories during backup configuration file.

Restoring the backup of a database from an actual backup

Syntax:
#vbr.py task restore --config-file
Let's drop all the objects in our database and restore it with our recently made backup
cluster= dt
               List of tables
 Schema | Name  | Kind  |  Owner  | Comment
--------+-------+-------+---------+---------
 public | one   | table | dbadmin |
 public | three | table | dbadmin |
 public | two   | table | dbadmin |
(3 rows)

cluster= drop table one,two,three;
DROP TABLE
cluster= dt
No relations found.
Great, now stop the database so we can restore it. Other-wise you will get an error telling you :
[dbadmin@Vertica_Master1 backup_conf]$ vbr.py --task restore --config-file full_backup.ini
Please input vertica password:
Preparing...
Found Database port:  5433
Warning: trying to restore to an UP cluster
Warning: Node state of v_cluster_node0001 is UP; node must be DOWN for restore; ignoring restore on this node.
Warning: Node state of v_cluster_node0002 is UP; node must be DOWN for restore; ignoring restore on this node.
Warning: Node state of v_cluster_node0003 is UP; node must be DOWN for restore; ignoring restore on this node.
Nothing to do
restore done!
So the database is down. Now restore the database:
[dbadmin@Vertica_Master1 backup_conf]$ vbr.py --task restore --config-file full_backup.ini
Please input vertica password:
Preparing...
Found Database port:  5433
Copying...
248556 out of 248556, 100%
All child processes terminated successfully.
restore done!
Startup the database and check that your objects are available :
cluster= dt
               List of tables
 Schema | Name  | Kind  |  Owner  | Comment
--------+-------+-------+---------+---------
 public | one   | table | dbadmin |
 public | three | table | dbadmin |
 public | two   | table | dbadmin |
(3 rows)
Now let simulate the loss of a table and in this case we will restore the single backup option:
cluster= drop table one;
DROP TABLE
cluster= dt
               List of tables
 Schema | Name  | Kind  |  Owner  | Comment
--------+-------+-------+---------+---------
 public | three | table | dbadmin |
 public | two   | table | dbadmin |
(2 rows)
Note:For this restore database must be up and running
[dbadmin@Vertica_Master1 backup_conf]$ vbr.py --task restore --config-file backup_table_one.ini
Please input vertica password:xxxxxxxx
Preparing...
Found Database port:  5433
Copying...
0 out of 0, 100%
All child processes terminated successfully.
Copying...
2763 out of 2763, 100%
All child processes terminated successfully.
restore done!
Now check that the object is in the database:
[dbadmin@Vertica_Master1 backup_conf]$ vsql
Password:xxxxxxx

dbadmin= dt
               List of tables
 Schema | Name  | Kind  |  Owner  | Comment
--------+-------+-------+---------+---------
 public | one   | table | dbadmin |
 public | three | table | dbadmin |
 public | two   | table | dbadmin |
(3 rows)

dbadmin= d one ;
                                    List of Fields by Tables
 Schema | Table | Column |     Type     | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+--------+--------------+------+---------+----------+-------------+-------------
 public | one   | one    | numeric(3,0) |    8 |         | f        | f           |
Great out table is back and ready to use.

Automating Regular Backups

The vbr.py utility helps to automate backing up you database, because you can configure vbr with the required run-time parameters. The ability to configure runtime parameters facilitates adding the utility to a cron or other task scheduler to fully automate regular backing up your database. Example of crontab syntax:
# Minute   Hour   Day of Month       Month          Day of Week        Command
# (0-59)  (0-23)     (1-31)    (1-12 or Jan-Dec)  (0-6 or Sun-Sat)
    0        2          12             *               0,6           /usr/bin/find
How i put it inside my crontab.
0 1 * * * /opt/vertica/bin/vbr.py task backup --config-file /vertica_db/backup_conf/ Incremental_full   /vertica_db/backup/cronjobs.log
So our crontab job will do a full back-up every day at 1 am and will write the output into a log file that resides into