Migrate SQL Server to HP Vertica in a single step

In this short article i will demonstrate how you can migrate/move data from a SQL Server database to a HP Vertica database in a single line of code. As we already know HP Vertica database is not handling very well OLTP transactional volumes and for this as a front end there will always be a database that will take care of the transactional jobs. in this case we have SQL Server doing this for us. Even thow you can use some ETL tools like Pentaho, Informatica, Talend and many more out there on the market to do the ETL process you can choose to build/write your own scripts that will do the job.

So how can we achieve this ? Here are the steps 1- Install the Microsoft Connectivity Pack and ADO.NET Driver on Windows 2- Replicate the table structure from SQL Server to HP Vertica 
  • here how you export the table definition in SQL Server Management Studio.
export sql server ddl
  • you have to alter the data type so that they can fit Vertica data types.
3- Create the table in Vertica 4- Create the script to extract your data from the SQL Server database 
SET NOCOUNT ON
SELECT * FROM dbo.TABLE_NAME
go
  • save the file, call it extract_script.sql.

  • this script will do the E&T steps out of the ETL.(in this case we will do a simple select).

  • you can do all kind of stuff(transform) with the data while extracting it(getting it ready for Loading in Vertica).

5- Create the rest of the script that will load the data into Vertica database

  • if you choose to do you transformation on SQL Server side you don't have to custom to much you Vertica loading script.
vsql -h 11.111.1.11 -d db01 -U dbadmin -w db01  -c "COPY public.HD_BANCO FROM LOCAL STDIN
 DELIMITER '|' ;"
Where the options are: -h is the host ip address -d  database name -U user name -w password -c will be the copy command -P the port number(since is default in my case i didn't specified it) Now here is the sqlcmd script 
sqlcmd -S SQL_Server_host.com -d SCHEMA_NAME -E -s"|" -W -ic:Usersadrian.opreaDownloads
query.sql | findstr /V /C:"-" /B

Where the options are: -S SQL Server Instance -d Schema name -E tells it is a trusted connection -s specified the column separator  -W removes trailing spaces -i  script file location findstr /V /C:"-" /B  - will remove all the extra characters The final script should look like this 
sqlcmd -S SQL_Server_host.com -d SCHEMA_NAME -E -s"|" -W -ic:Usersadrian.opreaDownloads
query.sql | findstr /V /C:"-" /B | vsql -h 11.111.1.11 -d db01 -U dbadmin -w db01 -c "COP
Y public.TABLE_NAME FROM LOCAL STDIN DELIMITER '|' ;"
As you can see i use the pipe "|" to throw the query results into STDOUT which is going to grabbed by the vsql copy command and inserted into Vertica database. Example:
sqlcmd -S SQL_Server_host.com -d SCHEMA_NAME -E -s"|" -W -ic:Usersadrian.opreaDownloads
query.sql | findstr /V /C:"-" /B | vsql -h 11.111.1.11 -d db01 -U dbadmin -w db01 -c "COP
Y public.TABLE_NAME FROM LOCAL STDIN DELIMITER '|' ;"

Rows Loaded
-------------
         21
(1 row)

Is this the best solution to do your ETL ? maybe not ! But if you don't have resources(Fancy software or the right people to handle software like Informatica or any other ETL tool) this type of work can do it for you. There are many things to adjust in this process such as load size(when you will need to store your data into csv files) or the how you will handle data cleansing and so on. I hope this was helpfull.