One of the many tasks that needs to be completed when cloning a Vertica cluster from an on-premise DW to the Amazon cloud is changing the default Spread daemon messaging service type. See here the full HP Vertica installation manual on AWS.

   What is Spread ? 

Spread is an open-source toolkit used in Vertica to provide a high-performance messaging service that is resilient to network faults. Spread daemons start automatically when a database starts up for the first time, and the Spread process runs on the control nodes in the cluster.

 

So what do you need to do?

If you have tried to clone a database cluster using the vbr.py copycluster tool, you end up with a database that will not start after the copycluster is complete, with no clues about what is failing or why.

 Why ? 

When copycluster runs, it seems that it overwrites the catalog definition of the AWS cluster installation (which has to be pt2pt in AWS) to reflect that of the source cluster (broadcast).

 

 

So?

The cluster won't start because of the Spread misconfiguration.

 

How do I fix this?

In this case you need to contact Vertica Support so they can provide the guided documentation, or you can use the script below:

Note: This is not recommended in production and I will not be held responsible for any damage caused by it. Use at your own risk.

 

This script must be executed while logged in as dbadmin on the target cluster nodes, and it will change the Spread control mode to pt2pt. You must run it on all nodes.
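The per-node script itself is not reproduced here. As a rough pointer to the same idea, more recent Vertica releases expose this through admintools; the flag below is from memory, so verify it against the documentation for your version before relying on it:

    # sketch only - run as dbadmin; -T is documented as switching the control messaging mode to pt2pt
    admintools -t re_ip -d <database_name> -T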

In this short tutorial we will see how to change the location of our data storage in Vertica.

Check out the video tutorial on how this is done.

Use the script here to guide yourself through the procedure:

 

This Vertica script will create a report on your load streams. I use it to identify my loading streams and to output how long they have been running and the statistics status of their target tables.

I also generate an analyze_statistics SQL script in case you need to run it.


 

The output will be the following:

 

HP Vertica Database Series

The new HP Vertica 8 “Frontloader” was released and it comes with a lot of great new features. We will go over the most important features of this new Vertica release.

  • Vertica is now available on the Microsoft Azure cloud platform (more info on this at this link).
  • Amazon AWS documentation for HP Vertica running on the AWS cloud is now part of the official HP Vertica documentation (see more here).
  • Advanced Analytics Package – the Machine Learning package is now included in the server install (see more here).
  • Apache Kafka integration that provides a high-performance loading mechanism for streaming data from a third-party message bus into your Vertica database (see more here).
  • A Python SDK was added. This SDK allows you to use Vertica Python APIs to add additional functionality to your Vertica cluster (see more here).
  • Management Console now comes with Kafka and AWS S3 loading job monitoring features; extended monitoring is enabled over longer periods of time; and a table usage monitor (whether any projection has been used or is unused).
  • Apache Spark integration. This new feature enables us to transfer data between Vertica and Apache Spark.
  • User client connections can now be limited per user and are enforced cluster-wide.

 

We can see more and more that HP Vertica is becoming a closely integrated component of the Apache Hadoop ecosystem. It would be great to see Apache NiFi integrate with HP Vertica.

 

 

 

Many times when we load data into HP Vertica we really do not know what is happening, and unless we have a tool such as Vertica Management Console installed and configured, capturing information about the load streams is quite hard.

I am going to walk you through some simple steps that will help you manage and better understand where your load stream currently is.

The table that captures information about data load streams is called load_streams. It contains all the information you need to understand the status of your loads.

So before you jump in and copy-paste the script, go through some of the bullet points below:

1 – Make sure you set the STREAM NAME of the data load; this will allow you to easily track and measure your data loads over periods of time.

2 – Use the DIRECT option when you load large amounts of data.

     Why DIRECT?

– the DIRECT option enables you to load data straight into the ROS containers.

3 – Put your rejected data into a table.

– I find this very useful, as you can query it and work with it easily to fix any loading issues.

Here is the basic syntax of the COPY command:
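A minimal sketch of such a COPY, putting the three points above together (the table, file path, and stream name are placeholders):

    COPY my_schema.my_table
    FROM '/data/my_file.csv'
    DELIMITER ','
    REJECTED DATA AS TABLE my_schema.my_table_rejects  -- point 3: rejects go to a queryable table
    STREAM NAME 'daily_load_my_table'                  -- point 1: easy to find in load_streams
    DIRECT;                                            -- point 2: load straight into ROS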

 

So while loading the data, this is the script I use to monitor load_streams.

  • the script considers all loading sessions that are executing and outputs details such as accepted_row_count, the duration of the load since it started, and the load average per second; a simplified sketch follows below.
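My full script is not included here; a simplified sketch of the same idea against load_streams looks like this (load_start is stored as text, so it is cast before the date math):

    SELECT stream_name,
           table_name,
           accepted_row_count,
           rejected_row_count,
           DATEDIFF('second', load_start::TIMESTAMP, GETDATE()) AS duration_sec,
           accepted_row_count
             / NULLIF(DATEDIFF('second', load_start::TIMESTAMP, GETDATE()), 0) AS rows_per_sec
    FROM v_monitor.load_streams
    WHERE is_executing;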

I hope this will be useful, and if anyone can improve this script feel free to do so and drop me a message with any ideas or requests related to this topic.

 

In this short tutorial we will see how we can remove a storage location from an HP Vertica database cluster.

This is a common task for a Vertica DW administrator.

Before you go ahead and copy-paste whatever code I have put here, make sure you back up your database and test it in a DEV environment first.

If you don't know how to back up your database you can follow this tutorial here – Full HP Vertica Backup.

So this is how my storage looks at the moment in my Vertica database:

  • we want to remove the vertica_data2 location.

So in order to be able to drop/remove a storage location from a Vertica database, first we need to retire the location.

Retiring a location – this makes the storage location inactive, and it can also move whatever data is stored in this location to other available storage locations.

Syntax:
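A hedged sketch of the call (the path and node name are examples; the third argument is the enforce_storage_move flag described below, and the signature should be checked against your version's docs):

    SELECT RETIRE_LOCATION('/vertica_data2', 'v_mydb_node0001', true);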

enforce_storage_move – if you have data on the storage location, the data is moved elsewhere; by default this is set to false, and to enable it you need to set it to true.

 

Let's run the retire function:

Now we go back to the storage_locations table and check the status of the retired location.

  • see that it is marked as retired.

Now the Storage Location is ready to be dropped.
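A sketch of the drop, assuming the same example path and node name as above (verify the DROP_LOCATION signature for your version):

    SELECT DROP_LOCATION('/vertica_data2', 'v_mydb_node0001');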

  • remember that this is a logical drop at the database level and the folders will persist at the OS level.

Now, before we go ahead and drop the physical location, we need to make sure the changes were reflected in the /opt/vertica/config/admintools.conf file.

– so under the nodes section you will see whatever is left of your storage locations.

Also make sure you check the status of your database:

Now you are ready to drop the storage at the OS level and remove it from your /etc/fstab file (optional).

 

hope this was helpful

 

 

OK, I have spent about two hours today trying to understand what was wrong with a loading script of mine.

The file is getting loaded, except for the first line.

Content of the file:

  • it looks like they are all integer values! Right?

 

The file type is UTF-8, so that is good!

But there is a leading character called the BOM marker in the file, and this was the reason my file was not getting loaded correctly, ending up with this error:

So what is the fix for this?

  • remove the BOM marker using sed (good old sed 🙂).

To do it for a file whose name is randomly generated, sed won't accept a (*) wildcard for an in-place operation, so I came up with this script.
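My script boiled down to looping over the matching files and running the in-place sed on each one; a minimal sketch (the directory and file pattern are placeholders):

    #!/bin/bash
    # strip the UTF-8 BOM (EF BB BF) from the first line of every matching file, in place
    for f in /data/incoming/load_*.csv; do
        sed -i '1s/^\xEF\xBB\xBF//' "$f"
    done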

 

I hope this was helpful for everyone who faces the same error 🙂

 

 


Vertica User Clone

This is a very common task for a DBA (Database Administrator) and it can be time consuming in some cases.
So in this article we will see how we can build a User Defined Procedure in HP Vertica to clone a database user. We have to use a User Defined Procedure due to the lack of programmability features offered by Vertica (Vertica was built for super smooth SQL stuff).

So first we will go over the scripts we will use, and then see how we can call them from the database as a UDP (User Defined Procedure).

1 – First Script:

. /home/dbadmin/.profile

  • this file will have to contain the user and password used for the procedure to run and execute (in most cases this will be the dbadmin user).

  • we will use this “trick” to avoid clear-text/hard-coded passwords when running scripts against the database.

Note:

  • this file is of extreme importance!!! Make sure only the owner has access to it.

 

2 – Second Script

/vertica/catalog/database_name/procedures/clone_user.sh

  • this script will generate the final SQL that will contain the new user's SQL definition.
  • note that we have 3 parameters that we pass to this script: $1 (new_user), $2 (password), $3 (old_user); they will be echoed into /tmp/create_user.sql.

3 – Third Script

Script execution.

  • this part is pretty obvious; it will execute the content of the /tmp/create_user.sql script.

 

Full script

  • this will hold all of the scripts described above.
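The full script is not reproduced here; a simplified sketch of the idea is below. It assumes the .profile from step 1 exports DB_USER and DB_PASSWD (those variable names are my own placeholders), and it only generates CREATE USER plus table and role grants from v_catalog.grants, so treat it as a starting point rather than a complete clone:

    #!/bin/bash
    # clone_user.sh  $1 = new_user   $2 = password   $3 = old_user   (sketch)
    . /home/dbadmin/.profile
    NEW_USER=$1; NEW_PASS=$2; OLD_USER=$3
    OUT=/tmp/create_user.sql

    echo "CREATE USER ${NEW_USER} IDENTIFIED BY '${NEW_PASS}';" > $OUT

    # copy the old user's object grants (simplified to tables)
    vsql -U "$DB_USER" -w "$DB_PASSWD" -At -c \
      "SELECT 'GRANT ' || privileges_description || ' ON ' || object_schema || '.' || object_name || ' TO ${NEW_USER};'
         FROM v_catalog.grants WHERE grantee = '${OLD_USER}' AND object_type = 'TABLE';" >> $OUT

    # copy the old user's role grants
    vsql -U "$DB_USER" -w "$DB_PASSWD" -At -c \
      "SELECT 'GRANT ' || object_name || ' TO ${NEW_USER};'
         FROM v_catalog.grants WHERE grantee = '${OLD_USER}' AND object_type = 'ROLE';" >> $OUT

    # execute the generated SQL
    vsql -U "$DB_USER" -w "$DB_PASSWD" -f $OUT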

 

Now let's go and install this script as a UDP, so we can call it as a Vertica SQL action.

 1 - We need to create the script in the procedures folder of the Vertica database.

The folder is located at /vertica/catalog/database_name/procedures/; the folder naming convention is specific to each database.

  • after you create the script (let's call it clone_user.sh) you need to change the privileges on it so that dbadmin owns it and can execute it.

 

2 – Install the stored procedure on the Vertica cluster.

  • you must have the script on all nodes that are part of the cluster.

  • you must be logged in as the dbadmin user.

3 – Install the procedure in your database.

  • you must be logged in as the dbadmin user. A sketch of both install steps follows below.
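Hedged sketches of the two install steps (database name, password, and argument types are placeholders; verify the admintools syntax against your release):

    # step 2 - register the script with the cluster (run as dbadmin)
    admintools -t install_procedure -d my_database -f /vertica/catalog/my_database/procedures/clone_user.sh -p 'dbadmin_password'

    -- step 3 - create the procedure in the database
    CREATE PROCEDURE clone_user(new_user VARCHAR, password VARCHAR, old_user VARCHAR)
    AS 'clone_user.sh' LANGUAGE 'external' USER 'dbadmin';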

Let's see the procedure in action.

I have a user called test_user that has the following access:

and definition:

 

Let's run our clone_user procedure and create a new user called test_user_clone.

  • the output of the procedure is:

 

Let's check the new user definition:

And the grants:

… I guess it worked, boys!!!

 

I hope this was useful, and if you have comments or suggestions please feel free to leave them here.

Note:

Use this script at your own RISK, and make sure you test it in DEV before you break your production 🙂

 

 

When running a big data warehouse on top of HP Vertica, the last thing you want is some smart BI analyst running queries on your system tables.
Sensitive information might be up for grabs by ill-intentioned users.
Also, when running a multi-tenant DW where you have multiple users separated by project, you don't really want to give them the opportunity to see resource pool definitions or any user-specific definitions.

So you want to keep this access limited to dbadmin-role users only.

By default the parameter RestrictSystemTables is set to 0, where 0 allows all users to access the system tables.

  • To see the actual value of the RestrictSystemTables parameter:

To restrict access for normal users, the RestrictSystemTables parameter value needs to be 1 (this limits access to system tables to database administrator users).

  • To alter the value of the RestrictSystemTables parameter:
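Hedged sketches of both calls (older releases use SET_CONFIG_PARAMETER; newer ones prefer ALTER DATABASE ... SET, so check your version):

    -- see the current value
    SELECT parameter_name, current_value
    FROM v_monitor.configuration_parameters
    WHERE parameter_name = 'RestrictSystemTables';

    -- restrict system tables to database administrators (requires a restart)
    SELECT SET_CONFIG_PARAMETER('RestrictSystemTables', 1);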

Note:

This requires a database restart.

Oh, but what if my user needs to see this type of data?

  • create views on these system tables, limited to each user's scope.

Example : 
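A minimal sketch of the idea (the dba schema and column choices are placeholders): a view over v_catalog.users filtered to the calling user, which you grant instead of the system table itself.

    CREATE VIEW dba.my_user_info AS
    SELECT user_name, is_super_user, resource_pool
    FROM v_catalog.users
    WHERE user_name = CURRENT_USER;

    GRANT SELECT ON dba.my_user_info TO PUBLIC;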

 

  • the user will be able to see only his own user definition.

 

In this short article I will go over the steps used to create a simple Vertica UDP (user defined procedure) that will help us create a new schema and also create the associated database roles in a single SQL command.

This will help you work faster and spend less time writing simple SQL code.

Normally when I create a schema I make sure that the default roles also come with it.

My default roles:

  • this is optional; I only do this to avoid the headache of not being able to see who has access to what.
  • I never give direct access to a user; I always prefer roles to propagate privileges.

<schema_name>_datareader – grants select rights on all objects in this schema.

<schema_name>_datawriter – grants insert, update, and delete rights on all objects in this schema.

<schema_name>_schemausage – grants usage on this schema.

Here is the content of my .profile file.

  • the main reason I use this approach is to avoid clear-text passwords, which are a serious security issue.
  • the file can only be accessed by the dbadmin user.

 

The add_schema.sh content:

  • the location of add_schema.sh should be inside the default procedures folder of your catalog.

  • the script will use the username and password provided by the .profile file and will execute the SQL, attaching the parameter value we pass to the procedure at execution time.
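The original add_schema.sh is not included here; a simplified sketch under the same assumptions (the .profile exports DB_USER / DB_PASSWD, and the role names follow the convention above) could look like this:

    #!/bin/bash
    # add_schema.sh  $1 = new schema name   (sketch)
    . /home/dbadmin/.profile
    S=$1

    vsql -U "$DB_USER" -w "$DB_PASSWD" -c "
      CREATE SCHEMA ${S};
      CREATE ROLE ${S}_schemausage;
      CREATE ROLE ${S}_datareader;
      CREATE ROLE ${S}_datawriter;
      GRANT USAGE ON SCHEMA ${S} TO ${S}_schemausage;
      GRANT SELECT ON ALL TABLES IN SCHEMA ${S} TO ${S}_datareader;
      GRANT INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA ${S} TO ${S}_datawriter;"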

Make sure you run the following command when you create the add_schema.sh script:

 

Create the UDP using the admintools

Create the Procedure in your database

  • I normally have a specific schema for my own use that I call dba; here I store all my procedures and all the other tables & views used to manage the database.

  • I called my procedure add_schema; I am trying to make it as obvious as possible.

After the procedure add_schema is created, try it out and see if it works.

  • See if schema was created.

  • See if the roles have been created.

  • See if the roles have the right grants on the new schema.

OK, that looks good, so the procedure works well.

Feel free to use it and make it better.

 

During the build of an HP Vertica object backup script I faced the following error when trying to initialize the backup location.

Solution for this error:

  • it appears that port 50000, which rsync needs, was already in use by another process running on my OS (you can check which process holds the port with the sketch below).
  • after I terminated this process, the Vertica init command completed with success.
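A quick way to see which process is holding the port (a sketch; either command works on most Linux boxes):

    netstat -tulpn | grep 50000
    # or
    lsof -i :50000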

 Completed Vertica VBR init task.

I hope this was helpful.


This is a simple step-by-step Vertica Management Console upgrade guide.

Before you start, make sure you download the latest version of Vertica Management Console from the my.vertica.com website.

Here we go !

Stop the Vertica Console:

Take a backup of the existing Vertica Console:

Upgrade the Vertica Console rpm
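The shell for the three steps above is not reproduced; roughly, it looks like the sketch below (the rpm file name is a placeholder, and the service name and paths are from memory, so verify them on your install):

    # stop the console
    /etc/init.d/vertica-consoled stop

    # back up the existing console install (simple copy as a sketch)
    cp -rp /opt/vconsole /opt/vconsole_backup_$(date +%Y%m%d)

    # upgrade the rpm
    rpm -Uvh vertica-console-<version>.x86_64.rpm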

 

Open the browser and go to the Console web UI.

  • accept the conditions.


 

 

Next you need to fill in the API key that is stored in the /opt/vertica/config/ folder.

  • this is the apikey string.
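As a hint, on my installs the key lives in a file similar to the one below (verify the exact file name on your host):

    cat /opt/vertica/config/apikeys.dat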

 


 

Done! Your Vertica Management Console was upgraded successfully.


 

 

 

 

See here the similar process on a different version, VMC 8.0.

 

On running the following command:

I got the following error

ERROR: AHM must lag behind the create epoch of unrefreshed projection projection name.

Here is a small fix for this error:

  • something is wrong!

What is the reason?

  • this error has to do with data consistency across all projections.

  • we can see that the projection is not up to date, which means data was not replicated across all of the projections of the base table; you can check this with the query sketched below.
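A sketch of the check against v_catalog.projections:

    SELECT projection_schema, projection_name, anchor_table_name, is_up_to_date
    FROM v_catalog.projections
    WHERE NOT is_up_to_date;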

To fix this we need to refresh the base table:
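A sketch of the refresh call (the table name is a placeholder):

    SELECT REFRESH('my_schema.my_table');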

Now if you run the same query again:

  • we can see that the projections are up-to-date.

Make dimension tables using a dimension table design!!! I cannot stress this enough.

Dimension tables are typically small, ranging from a few to several thousand rows. Occasionally dimensions can grow fairly large, but these tables are very skinny, meaning they are not wide.



In Vertica we have UNSEGMENTED and SEGMENTED projections to ensure high availability and recovery.

UNSEGMENTED projections:

  • this type of projection design applies to small tables like lookup tables or dimension tables, where the entire data set can be replicated across all nodes in the cluster.
  • so, in a few words, if you have a 3-node cluster with a K-safety of 1 you will have 3 copies of the table, one on each node.

Where table B is unsegmented and every node contains 100% of its data.

 

 

SEGMENTED projections:

  • this type of projection design applies to large tables with many millions, billions, trillions… of rows, such as fact tables.

The idea behind it is that since they are too big to be replicated, they are segmented into smaller chunks containing certain amounts of data.

These chunks of data, the so-called projection segments, provide high availability and recovery and optimize query execution.



What do you gain from doing this ? 

  •   Ensures high availability and recovery through K-Safety.
  •   Spreads the query execution workload across multiple nodes.
  •   Allows each node to be optimized for different query workloads.

How can we properly segment our projections?

In order to have proper segmentation in place for a fact table, you need to provide an even distribution of data across all nodes in the cluster.

Vertica provides you with the HASH function, which gives an even distribution of data across multiple nodes. The ideal candidate column for segmentation is the primary key of your table or a combination of columns that translates into a unique value after HASH has been applied to it. A sketch of both projection flavors is below.
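Hedged sketches of the two designs, with table and column names as placeholders:

    -- small dimension table: replicate it on every node
    CREATE PROJECTION dim_customer_unseg
    AS SELECT * FROM dim_customer
    ORDER BY customer_id
    UNSEGMENTED ALL NODES;

    -- large fact table: segment it on a high-cardinality key
    CREATE PROJECTION fact_sales_seg
    AS SELECT * FROM fact_sales
    ORDER BY sale_date, customer_id
    SEGMENTED BY HASH(sale_id) ALL NODES KSAFE 1;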

Note: – segmented projections that have low-cardinality columns in the segmentation clause suffer from data skew.

What is Data Skew?

Data skew is when your projection data is segmented across all nodes but the segments contain different amounts of data.

  • here is a graphical example of how a projection with bad segmentation looks; we can see that the data distribution is not even across nodes.


 

So how would data skew affect my Vertica database?

  • well, when issuing a query in Vertica the initiator node will split the execution workload across all nodes; some nodes might work more than others, and the bottom line is that your Vertica cluster is only as fast as your slowest node.

 

Here is a graphical example of a segmented projection (buddy projections).

  • where a single projection will have buddy projections spread across all nodes. If any one node goes down, all data will still be available since at least one buddy projection will remain available.

 



 

 

Disclaimer – the images used in this article belong to HP Vertica

 

After all the projection bla bla bla… (“hope this was helpful to some of you”) we should get back to our bad dimension design and see how we can find the bad dimensions and how we can fix them.



So the plan is :

1 – Identify the Dimension tables with bad design

2 – Identify the right order by keys (optimize query performance)

– best sort orders are determined by query WHERE clauses (or the so called predicates).

3  – Prepare your new Projection Definition.

4 – Apply changes.

5 – Remove old projections with bad Design.

6 – Run analyze_histogram on 100 % of the data.

 

So lets get to work:

 1 - Identify the Dimension tables with bad design

If you are strict about table naming conventions:

Use this query to find dimension tables that have segmented super projections (a sketch is below).
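The original query is not reproduced exactly; a sketch of the same idea (assuming your dimension tables really are named dim*):

    SELECT projection_schema, anchor_table_name, projection_name
    FROM v_catalog.projections
    WHERE is_segmented
      AND anchor_table_name ILIKE 'dim%'
    ORDER BY projection_schema, anchor_table_name;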

Also, you can identify tables that are not named Dim* but fit into the dimension-type category:

Now that you have the tables, go ahead and move on to the next step.



2 - Identify the right order by keys (optimize query performance)

- the best sort orders are determined by query WHERE clauses (the so-called predicates).

  • I am a firm believer that database performance is very dependent on your table design.

Some basic rules on sort order of your projections:

1 - Do you use this column in any of your filters? YES (consider it for the sort order) / NO (do not consider it for the sort order).

2 - Keep the same sort order for all buddy projections, as refresh and recovery will otherwise be a pain for them.

3 - Try to avoid the default sort order given by Vertica at creation time (it will be the order of the columns in which the anchor table was created).

4 - Cardinality is an important factor when choosing your sort order. Use low-cardinality columns with the smallest number of distinct values before the high-cardinality columns.

5 - Even if you have high-cardinality columns used as predicates, make sure you give sort-order priority to them; in return, some extra space will be used.

6 - The size of the columns is important when choosing the sort order. Try to put a larger-footprint column before a smaller-footprint column.

Eg:

A CHAR(50) column takes up 50 bytes, but an INTEGER column takes up 8 bytes. By putting the CHAR(50) column ahead of the INTEGER column, your projection compresses better.



7 - AccessRank can be applied to predicate columns as well, but only if your database runs on multiple types of storage and you are using storage tiers. I will not go into details about this.

 

How do I find my most used predicates?

1 - One way is to talk to your developers and see what they have to say.

2 - Start looking through the DC (data collector) tables:

  • this is a query I have used a couple of times:

  • this will output the columns used as filters in your queries.

3  - Prepare your new Projection Definition.

Before going ahead and changing the table definition, make sure you have picked the best encoding for the table.

The best way to do this is by using the DBD built-in function called DESIGNER_DESIGN_PROJECTION_ENCODINGS.

  • this function will analyze your table and data and will come up with the best encoding fit.

Example:
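A sketch of the call (the projection name and output file are placeholders; the third argument is the deploy flag explained below, so check the full signature for your version):

    SELECT DESIGNER_DESIGN_PROJECTION_ENCODINGS('my_schema.dim_customer_unseg', '/tmp/dim_customer_encodings.sql', false);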

Where true stands for deploy and false stands for don't deploy, just generate the script.

After you have your predicates and encoding sorted out, put together your final table & projection design.

Please remember the sort-order tips on predicate use, cardinality, and encoding type.

4 – Apply changes.

Run the Create Projection Script.

5 – Remove old projections with bad Design.

Make sure the new projection is up to date before you try to drop the old projection. It won't let you anyway if it is not :).



6 – Run analyze_histogram on 100 % of the data.

Run analyze_histogram using 100% as the option parameter.
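A sketch of the call (the table name is a placeholder; on newer releases analyze_statistics with a percent argument replaces this):

    SELECT ANALYZE_HISTOGRAM('my_schema.dim_customer', 100);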

analyze_histogram is nothing more than the bigger brother of analyze_statistics.

 

I hope this was not too boring and somebody will find it useful.

 

Starting with Vertica version 7.2.x, HP Vertica comes with a great feature called inherited privileges. This concept is similar to SQL Server schema roles, dynamically propagating privileges across all objects in a schema and associating them with the users that have access to the schema roles.

Thanks HP for implementing this, as it makes my life much easier when dealing with new objects and access rules.

So let's dive in and see what this is all about.

 

To enable or disable inherited privileges at the database level, use the syntax below:

With this new feature, all privileges granted to a schema are automatically granted to all newly created tables and views in that schema.

Note:

Existing tables or views remain unchanged when you alter the schema to include or exclude inherited privileges.

To be able to apply inherited privileges you need to be the owner of the schema or a superuser.

 

Below is a full step-by-step example of how inheritance works:

We will create a new schema called no_grants and put one table in it. Next, create a new user.

Log in with the new user.

Try to query the new table.

  • the error makes sense, since we haven't granted anything to the new user.

Now let's alter the schema definition, enable inheritance using the “include privileges” option, and add a new table.

Try to query the second table, and we can see that even though no direct grant on the table was given to the user, he is able to query it.

  • this is because of the inheritance we enabled at the schema level, in combination with the USAGE + SELECT grants on the schema given to the user.

Run this as user_nograt.

Now try to query the old table logged in as user_nograt, and we get an error due to a permission fault.

  • the reason for this is that even though we applied inheritance at the schema level, it only applies to new objects and omits the existing objects.

To fix this we need to alter each object in the database using the following syntax:
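A sketch for the table created before inheritance was enabled (the schema name follows the example above; the table name is a placeholder):

    ALTER TABLE no_grants.table_one INCLUDE SCHEMA PRIVILEGES;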

Now the user can query the table.

Looking at the grants table, we can see that we only have entries for the schema-level granted privileges:

Also, there is no reference in the v_catalog.tables system table to the inheritance setting, whether it is enabled or disabled.

But there is a way for you to find and list the tables that have inheritance enabled; the table is vs_tables.

Now, to make your life a bit easier when implementing inheritance, use the script below to pick up the existing tables and views and alter their definitions to apply schema privilege inheritance.

Note:

  • replace the $Schema_Name$ with the schema of your choice.
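A sketch of such a generator for tables (views would need a similar pass over v_catalog.views):

    SELECT 'ALTER TABLE ' || table_schema || '.' || table_name || ' INCLUDE SCHEMA PRIVILEGES;'
    FROM v_catalog.tables
    WHERE table_schema = '$Schema_Name$';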

 

Also, we can revoke an inherited privilege for a single object that resides in a schema that has inheritance enabled.

To see if the privileges were excluded you can use the query below :

Or try to query the table:

 

Hope this was helpful !

 

When you go through regular database maintenance you might need to drop some objects.

In some cases these objects have dependencies; this is the case with sequences.

So this is the error you get when you try to drop a sequence while that sequence is used in some table's definition.

So you need to hunt for the tables that use the sequence and see if they really use it:

  • this happens a lot after a migration or when a development project is done, etc…
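A sketch of the hunt, using the column defaults in the catalog (the sequence name is a placeholder):

    SELECT table_schema, table_name, column_name, column_default
    FROM v_catalog.columns
    WHERE column_default ILIKE '%my_sequence%';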

After you remove the sequence from the default values or drop the table, you can go ahead and drop the sequence with no problem.

 

How do you add a new node to a single-node Vertica database?

1- Make sure you have a host with the same operating system installed as the running Vertica O.S.

2- Prepare the box to receive the Vertica software and create the same O.S. layout (partitions).

3- Set up password-less ssh for the root user between the two hosts (the running one and the host that will be added).

4- Choose the way you want to add the new host to the cluster:

A – Using Vertica Management Console.

B – Using the update_vertica utility.

In this article we will use the update_vertica utility.

Syntax:
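A sketch of the call (the IP, rpm path, and dba user are placeholders; check the update_vertica options for your release):

    /opt/vertica/sbin/update_vertica --add-hosts 10.0.0.2 \
        --rpm /tmp/vertica-<version>.x86_64.rpm --dba-user dbadmin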

 

 

 Example :

 

Now we have the cluster running on two nodes, but our database still resides on one node (the initial one).

See the example after the node add.

 

From inside the vsql command line

 

5- Add the node to Vertica cluster

Syntax :
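A sketch of the admintools call (database name, host IP, and password are placeholders):

    admintools -t db_add_node -d my_database -s 10.0.0.2 -p 'dbadmin_password'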

Example:

 

You will need to redesign your schema to take advantage of the new nodes.

Done ! Now we have our database running on two nodes.

Let’s check to see the changes:

– we will run the same commands as before adding the node.

 

From inside the vsql command line

All we need to do now is re-balance the data across all nodes
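A sketch of the rebalance call:

    SELECT REBALANCE_CLUSTER();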

 

 

Migrate one schema fast and dirty from one Vertica cluster to another.

This approach is used when:

  • you don't have the same Vertica cluster configuration in both clusters and copycluster won't work.
  • no INBOUND access is available to the source Vertica database (production security needs to be strict).
  • the Vertica server versions are different.
  • there are too many constraint dependencies in the tables of the schema you want to export, so a vbr.py backup of the schema is too painful.

– anyway, here it is:

 

 Step One 

  • export your schema catalog and create it on the target Vertica database.
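A sketch using EXPORT_OBJECTS (the file path and schema name are placeholders; check the exact signature for your version), after which you run the generated DDL on the target cluster:

    SELECT EXPORT_OBJECTS('/tmp/my_schema_ddl.sql', 'my_schema', false);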

 

Step Two

  • run the following script to get the code that will export and import the data from one server to another.

The script has to receive the following parameters:

  • if you have DbVisualizer installed, each $parameter_name$ will receive a value that you will be prompted to input at run time.
  • if you run it by hand at the vsql command line, you need to edit the script below with the respective values.

$source_passwd$ – source db database password

$source_ip$ – source node ip address

$source_db_name$ – source database name

$source_schema$ – source schema that you wanna export

$target_passwd$ – target database password

$target_ip$ – target node ip address

$target_db_name$ – target database name

$target_schema$ – target schema

The output of this sql will generate the code to migrate the data from one server to another.

Here is a snippet of the output:
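The generated output is not reproduced exactly; one generated line looks roughly like the sketch below, with dummy IPs, passwords, and table name (and remember it must stay on a single line when you actually run it):

    vsql -U dbadmin -w source_pwd -h 10.10.10.1 -d source_db -At -c "SELECT * FROM my_schema.my_table;" | vsql -U dbadmin -w target_pwd -h 10.10.10.2 -d target_db -c "COPY my_schema.my_table FROM STDIN DELIMITER '|' REJECTED DATA AS TABLE my_schema.my_table_reject DIRECT;"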

Where I have used some dummy input values.

So let me describe what the code will do:

Maybe most will understand it quickly, but I'll go ahead and put it in my own words.

  1. The script will create a connection to the source database followed by a “select * from table” (the table name was picked up dynamically by the script when we told it the schema name).
  2. The result of the query is piped into the target database connection, where it is picked up by the COPY command as STDIN input.
  3. In the command I state that the rejects will be stored in a table that will carry the same name as the target table followed by the _reject string.
  4. Finally, after the load is complete, the rejects table is queried and you will know whether you got rejects and how many.
  5. The rejected data is stored in the reject table, so you can go over the rejects when the load is done and see what went wrong.

Note:

  • you need to have the vsql client installed, and the whole command below needs to be on a single line (so no line breaks).
  • I recommend you run the command from the source Linux box.

When you get rejects, this message will be shown:

If no rejects are generated, this message will be shown:

Step three

  • go over the rejects and fix them.

I hope this will be useful for whoever wants to read it, and feel free to post your pros and cons.

 

Following a question I saw in one of the online forums, I decided to do a step-by-step paper on how you can recover a failed node (when the host/node is no longer available).

  • if you lose one node in your cluster (the node becomes completely inaccessible), you won't be able to remove it from the database, as it is not up (so you have to fix this another way).

Just before you jump in head first, remember that if you use this solution in your environment I shall not be held responsible for any damage caused.

So let's see how we can destroy one of the hosts and then recover the entire host + Vertica node.

In this scenario I will be using a Vertica cluster with 3 nodes running on AWS, as shown below:

I have my database up and running, as can be seen.

I'll go ahead and terminate one of the nodes (this is to simulate the loss of the node):

 

Next I can see that my database is still running (because of the built-in HA) but it is missing a node (the one we dropped).

So how can we fix this?

Here is the list of steps:

  1. create new EC2 instance.
  2. install the required packages.
  3. download the vertica rpm and install it.
  4. create the dbadmin user and the verticadba group.
  5. enable password-less ssh access between the nodes for dbadmin user.
  6. create the same directory structure as in the lost node.
  7. copy admintools.conf file.
  8. create catalog location.
  9.  recover the database using the force option.

 

 

Create new EC2 instance.

  • it has to be on the same VPC and the same subnet.
  • provide the same IP address as the lost node, in our case 10.333.333.226.

 

Install the required packages and set up the host for Vertica installation.

For this task use the script below, unless you have an AMI template ready.

Download the vertica rpm and install it.

Make sure you use the same version as the one in the other two nodes.

 

Create the dbadmin user and the verticadba group.

  Run the command in the 10.333.333.226 node.

 

Enable password-less ssh access between the nodes for dbadmin user.

This has to be enabled between all the nodes.

  • I won't describe how it is done here, since it is trivial and there are many sources on the internet.

 

Create the same directory structure as in the lost node.

To see the directories you require open the /opt/vertica/config/admintools.conf file in the good nodes.

 

Copy admintools.conf file.

Copy the admintools.conf file from one of the good nodes onto the new node.

Create catalog location.

 You need to create the catalog location in the new node.

  • since it is node 3 that we lost, we need to create the catalog directory structure as it was before. You just need to see how it looks on one of the good nodes and replicate it with the node0003 naming.

 

 Recover the database using the force option.

Now that we have everything ready, we need to recover the node. We will use the --force option, which enables auto-recovery of the node.
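A sketch of the recovery call (the database name and IP follow this example; verify the --force behaviour for your version before using it):

    admintools -t restart_node -s 10.333.333.226 -d my_database --force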

See if all nodes are up:

 

 Conclusion: 

  • yes, it can be done! It is not a solution I recommend; it is just a workaround. But sometimes when you get stuck you have to handle it the way you can :).

Hope this was helpful

In this article we will see how we can install the AWS client using two simple shell scripts.
The scripts are:

conf_vars.sh

  • will hold your AWS account credentials and your region data.

installAWSClient.sh

  • will download, install and configure your AWS client.

Lets see the content of the scripts:

conf_vars.sh

  • edit the file and add your credentials and region.

 

installAWSClient.sh
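The scripts themselves are not reproduced exactly; a simplified sketch of the pair is below (the variable names, the bundled-installer URL, and the install paths reflect the older AWS CLI v1 bundle, so adjust them for current versions):

    #!/bin/bash
    # conf_vars.sh - fill in your own values (sketch)
    export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
    export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
    export AWS_DEFAULT_REGION="us-east-1"

    #!/bin/bash
    # installAWSClient.sh - download, install and configure the AWS client (sketch)
    . ./conf_vars.sh
    curl -o awscli-bundle.zip "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip"
    unzip awscli-bundle.zip
    sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
    aws configure set aws_access_key_id "$AWS_ACCESS_KEY_ID"
    aws configure set aws_secret_access_key "$AWS_SECRET_ACCESS_KEY"
    aws configure set default.region "$AWS_DEFAULT_REGION"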

Create these two shell scripts on your Linux box and make them executable.

Let's demo the script execution.

  • it is quite straightforward.

Test your AWS client installation.

or 

 

Great your AWS client is ready to be used.

 

 

Want to see the video version as well? See here!

In this article I will discuss something I have started to cover before: Vertica storage concepts.
This is something that DW admins who migrate from the SQL Server or Oracle platforms must understand when working with Vertica as their DW database engine.

So we all know by now that Vertica stores its data in WOS or ROS containers (I explained these concepts in previous articles).

ROS containers are write-once, and again I will say write-once; what does write-once really mean?

Well, if you have 1 or many rows of data stored in one ROS container and you decide you want to remove or update one of them, the ENTIRE ROS CONTAINER MUST BE RE-CREATED.
So after we decide to update existing data, these are the steps that Vertica will follow:

  • Insert new data
    – if you use a default insert with no DIRECT option, data will go to a WOS container.
  • Initial storage id after the default insert:

          45035996273721463 (this is stored in memory/not persistent, so there is no reference to it on disk).

  • After move-out happens we have a ROS container created and its id is:
    45035996273721571 (stored on disk/physical file/persistent).
  • You decide you want to update a row that is contained in this ROS container.
    – after the update you will end up with 2 ROS containers, as below.
    45035996273721571 — old ROS
    45035996273722143 — new ROS
  • Next the database will trigger the merge-out operation and will consolidate the existing ROS containers.
  • After merge-out we end up with one container:
    45035996273722437 — consolidated ROS
  • Finally the purge operation runs and removes the delete vectors, and we end up with:
    45035996273722797 — final, consistent, and up-to-date ROS container.

The whole process I have described is part of the Tuple Mover (TM) operation and it happens at a set interval. For a deeper understanding of this operation follow the link.

Here is an example of how this actually works in reality.

I will go and create a table and simulate all the steps I described previously.

Create a new table.

 

Insert data into the table with the default option (into WOS storage).
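Sketches for the two steps above (column definitions and values are placeholders):

    CREATE TABLE test (id INT, name VARCHAR(20));

    INSERT INTO test VALUES (1, 'one');
    COMMIT;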

 

See the storage containers for that table (by default the initial projection name will be <table_name>_super).
– see that it is stored in WOS, as per the storage_type column value.
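A sketch of the check (this era of Vertica still reports WOS in storage_type):

    SELECT node_name, projection_name, storage_oid, storage_type, total_row_count
    FROM v_monitor.storage_containers
    WHERE projection_name ILIKE 'test%';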

 

Move your data from WOS to ROS by using the DO_TM_TASK('moveout') operation.
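A sketch of the manual moveout:

    SELECT DO_TM_TASK('moveout', 'test');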

 

Query the storage containers for that table again and see that the storage_oid has changed and the storage_type is now ROS.

 

Now look for the ROS container file location on disk.
– find the location

 

Look for the storage_oid in your host file system

See the content of the 571 folder.

  • this will contain two files: one is the .fdb, which is the ROS container itself, and the second is the .pidx, which holds indexes and statistics.
  • a small hint on the folder and file naming: the last three digits of the storage container id will always correspond to the folder/directory name that holds it.

 

Also count the number of ROS containers that are present for our TEST table.

 

Update the TEST table.
Note:
– an update statement (in general, not only in Vertica) will do a delete and an insert.

Count again the number of containers that are present for our TEST table, now after the update statement.

See, we have a new container present. Normally in a traditional database such as Oracle or SQL Server this is not done this way; other types of structures are generated (REDO LOGS, TRAN LOGS, etc.).

Consolidate your containers now by running the DO_TM_TASK('mergeout') operation.
– remember that this operation is done by your database at a set interval; we only run it manually for the purpose of this example.

 

After the mergeout occurs we end up with only one ROS container for our TEST table; you can see the storage_oid changed and we have 2 rows in the row count (this is because of the delete vectors).

Delete/purge the delete vectors using the PURGE_TABLE() function.

 

See your data files on the file system now.

Look for the initial ROS container.

  • it is gone.

Conclusion:

There is nothing to update in Vertica! It only re-creates data, so high I/O is required for this type of operation. The concept of a Tablespace (Oracle) or FileGroup (SQL Server) does not exist in Vertica; this is very important to understand when planning a DW that uses Vertica as its storage engine.

 

I just installed a new Vertica cluster on a host where I already had Vertica installed, and I got this error.
After searching a bit I realized that I was using the client from my previous Vertica installation, which was a higher version than my actual install.
So the solution was to install the Vertica client with the same version as my Vertica server.

 

In Vertica we can take advantage of “partition pruning” to speed up our queries.
So what happens when we query a partitioned table in Vertica?
– all ROS containers that are not needed will be eliminated/excluded from the query, which is done by comparing the predicate values with the partition-related metadata.
– it is very important that when using partitions you provide the query with a predicate on the partitioning column.

Let's create a partitioned table and identify whether Vertica is able to use partition pruning or not.
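A sketch of such a table (names are placeholders), partitioned by year so that a predicate on the date column can prune whole partitions:

    CREATE TABLE sales (
        sale_id   INT,
        sale_date DATE,
        amount    NUMERIC(10,2)
    )
    PARTITION BY EXTRACT(YEAR FROM sale_date);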

Run the query that we assume partition pruning will be used for.

Next, run the following query to see whether you have any events of type “PARTITIONS_ELIMINATED”.
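A sketch of the check against query_events:

    SELECT event_type, event_description, event_details
    FROM v_monitor.query_events
    WHERE event_type = 'PARTITIONS_ELIMINATED';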

  • so what this tells us is that partition pruning was used to resolve the query.

  • this tells us that only 1 ROS container was used in the query execution out of the 3 ROS containers available, and the rest were pruned (“excluded from the query plan”).

Another way to see this is to use the EXPLAIN command with the LOCAL VERBOSE option.

  • since the output of EXPLAIN will be too long to analyze at the command line, I will spool it into a file and then grep the file looking for the “Pruned storages” string.

Here is a visual and maybe more intuitive demonstration of how Partition Pruning works.

Partition Pruning

How can we compute the average of the first two columns and load it into the third column, using a stored value in a Vertica vsql variable?

Extract the number of columns in the csv file, minus the one column where you will store the output, and store it in a variable called count.

  • our data sample is as below:

Extract the column count:
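A sketch of the extraction with awk (assuming a comma-separated file named data.csv):

    count=$(( $(head -1 data.csv | awk -F',' '{print NF}') - 1 ))
    echo $count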

Run the full script

Done, happy transforming 🙂

 

With the help of the options that COPY offers, we have the ability to skip a column or even a part of a column.

In these examples we will demonstrate how this can be done.

We will use a csv data file called data.csv as our loading data.

 

And the table we load data into is called test; it has 3 columns.

Let's see how we can skip a column when loading data.
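A sketch of the load (the file path and filler type are placeholders); the second input field is read into a FILLER that is never mapped to a table column, so col2 stays NULL:

    COPY test (col1, v2 FILLER VARCHAR(20), col3)
    FROM '/tmp/data.csv' DELIMITER ',' DIRECT;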

  • you can see that no data was loaded into column col2; it is also important that col2 accepts null values.

As you can see, we declare the input value as v2 but we don't tell Vertica to pass it into our load batch.

How can we skip a part of a column?

  • The data we will use for this example:

  • And the same table test:

Load example:

  • you can see that we “declare” 4 variables that receive the input data from the data.csv file, but we pass only 3 of them to the load command. Since we have established that the separator between the first and second variable data sets is 'x', the data will be separated into 4 columns.

This can be very useful when loading data.

 

In Vertica all data should be loaded using the COPY command. The COPY command comes with many options that are very useful when loading data into your Vertica database.

How do we skip rows when loading data from a csv file?

Our data.csv file has the following content.

 

And the table that will receive the load is called test.

 

To be able to skip one or more rows we will use the SKIP option of the COPY command:
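A sketch of the load with SKIP (the file path is a placeholder):

    COPY test FROM '/tmp/data.csv' DELIMITER ',' SKIP 1 DIRECT;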

  • we can see that the line number 1 was skipped and only the rest of the lines were loaded.

HP Vertica comes with a set of validation utility scripts that can help you find performance issues before or after you install a Vertica database.

The location of the validation scripts is: /opt/vertica/bin

We have three scripts:

  • vcpuperf – a CPU performance test used to verify your CPU performance.
  • vioperf – an input/output test used to verify the speed and consistency of your hard drives.
  • vnetperf – a network test used to test the latency and throughput of the network between hosts.

 

Let's go and see how we can use these scripts to identify possible performance degradation.

vcpuperf

The script will test the speed of your CPU and will provide you with a set of expected values per CPU model.

  • the returned values are in seconds, and the lower the returned value the better your CPUs perform.

Use syntax:

  • this utility comes with only one option, -q, which makes the command run in quiet mode.

Example of a test run:

As we can see, in my test the expected CPU time for my type of CPU is 8.0 seconds and my measured time is above this value at 13+ seconds (not that bad).

What can we do to make this better?

  • well, as HP recommends, you should disable CPU frequency scaling, though this might cost you in terms of energy.

To disable CPU frequency scaling in the BIOS, follow this link.

 

Vioperf

  • the next tool measures the input and output performance of your Vertica nodes.

The vioperf utility will perform the following tests:

  • sequential write
  • sequential rewrite
  • sequential read
  • skip read (read non-contiguous data blocks)

Syntax :
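A sketch of a run (the duration format and data directory are placeholders; check vioperf --help for the exact option syntax on your version):

    /opt/vertica/bin/vioperf --duration='1 m' /vertica/data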

 

Example :

  • I chose to run the script using the duration option, which controls how long the test runs; by default, if you don't use any options, the script will run for 5 minutes.

  • you can also use the DIR option to point to a specific directory; otherwise the current working directory will be used.

 

The output can be read as follows:

We will describe the columns in the output:

test – this represents the type of test done.

directory – what directory is used for the test.

counter name – can be MB/s or seeks/s

counter value – the value of the counter

counter value average – avg value for the counter

thread count – number of threads used for the test

%CPU – the available CPU % during the test

%IO wait – time that the process is locked waiting for I/O operation to complete

elapsed time  – duration of the test

Note:

The script will offer you some baseline values that you can use to benchmark your system, such as the minimum required I/O value and the recommended I/O value per physical core on each node.

 

After you have installed the Vertica client on your SQL Server database host you can create a Linked Server to Vertica; this can be very useful.

Here is the syntax you can use to make this happen.

Then, to query through it, use the OPENQUERY statement.
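Hedged sketches of both pieces (the linked server name, DSN, and credentials are placeholders; this assumes an ODBC DSN created with the Vertica client and the MSDASQL provider):

    EXEC master.dbo.sp_addlinkedserver
         @server     = 'VERTICA_LINK',
         @srvproduct = 'Vertica',
         @provider   = 'MSDASQL',
         @datasrc    = 'VerticaDSN';

    EXEC master.dbo.sp_addlinkedsrvlogin
         @rmtsrvname  = 'VERTICA_LINK',
         @useself     = 'FALSE',
         @rmtuser     = 'dbadmin',
         @rmtpassword = 'password';

    -- query through the linked server
    SELECT * FROM OPENQUERY(VERTICA_LINK, 'SELECT * FROM my_schema.my_table');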

Here is the format of a JDBC connection string that can be used with a Vertica database.

Base syntax format:

Example:
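The general shape and a dummy example (host, port, and credentials are placeholders; 5433 is the default Vertica port):

    jdbc:vertica://<host>:<port>/<database_name>?user=<user>&password=<password>

    jdbc:vertica://10.10.10.1:5433/my_database?user=dbadmin&password=secret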

 

 

A very important statement in Vertica is the COMMENT ON statement; it allows you to create a sort of extended properties on the objects you create in the database.

Among these objects, you can comment on:

  • COLUMN
  • CONSTRAINT
  • FUNCTION
  • LIBRARY
  • NODE
  • PROJECTION
  • SCHEMA
  • SEQUENCE
  • TABLE
  • TRANSFORM FUNCTION
  • VIEW

The advantage of using the comment statement/extended properties is that the documentation, notes, and so on stay with the database and can be used as a “live documentation” mechanism. They are backed up with the database and scripted out with the build scripts.

Let's go and see some examples of how we can add comments to our objects.

Add a comment to a schema:

Add a comment on a table:

Add a comment on a projection:
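The snippets for the three cases above are not reproduced; hedged sketches with placeholder object names:

    COMMENT ON SCHEMA my_schema IS 'Staging area for the daily sales feed';

    COMMENT ON TABLE my_schema.sales IS 'Raw sales rows loaded from the ERP extract';

    COMMENT ON PROJECTION my_schema.sales_unseg IS 'Replicated projection for my_schema.sales';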