Vertica Database Storage Concepts

Write optimized Storage (WOS)

  • row store - a memory-based method to add data to Vertica.
  • is used to reduce load latency.
  • is used to receive large numbers of transactions containing small amounts of data; this pattern is also called TRICKLE LOAD.

Commands to load data in WOS

  • LCOPY - used with ODBC loads
  • COPY - used with other types of loads

Read optimized Storage (ROS)

  • column store - a disk-based method to add data to Vertica.
  • is used to receive large amounts of data in a single operation, called a BULK LOAD.
  • LCOPY DIRECT - used with ODBC loads
  • COPY DIRECT - used with other types of loads

LOAD Commands :

  • LCOPY - best to use this command when moving data from a remote location using ODBC.
  • COPY - best to use this command when moving data from within the cluster.
After the commands run, the data is redistributed across the database nodes. Data is moved from WOS to ROS by the TUPLE MOVER; this data movement is called "moveout". The data in WOS is:
  • uncompressed
  • unsorted
  • stored in row format
  • stored in insertion order
During moveout the data is transformed so that it becomes compressed, sorted, and converted to columnar format.

Database parameters that control the behavior of the "moveout" process:

  • MoveOutInterval - how often the Tuple Mover checks the WOS fill levels
    - default is 5 min
  • MoveOutMaxAgeTime - how long data can reside in the WOS before being moved out to ROS
    - default is 30 min
  • MoveOutSizePct - the WOS fill percentage required before data is moved to ROS
    - default is 0% (data will always move to ROS)
  • WOSDATA resource pool parameters - adjust the memory area and priority for WOS activities
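As a sketch, these parameters can be inspected and adjusted from vsql. The configuration_parameters system table and the set_config_parameter function exist in Vertica, though their exact shape varies by version, and the value used here is only an illustration:

```sql
-- Check the current moveout-related settings
SELECT parameter_name, current_value
FROM configuration_parameters
WHERE parameter_name LIKE 'MoveOut%';

-- Example: check the WOS fill level every 2 minutes instead of the 5-minute default
SELECT set_config_parameter('MoveOutInterval', 120);
```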

Adding data in Vertica

  • INSERT
  • COPY (to WOS or ROS)
  • LCOPY

INSERT command

  • is used frequently and has high per-row overhead.
  • rows are stored in the WOS unless the /*+DIRECT*/ hint is used.
Ex:
   INSERT [/*+DIRECT*/] INTO schema_name.table_name ...;

COPY command

-bulk loads data from files into Vertica. Options:
  • AUTO - loads into the WOS (in memory); suited for frequent trickle loads of small amounts of data.
  • DIRECT - loads directly to ROS (on disk); the faster approach for bulk loading a data warehouse.
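As a sketch of the two load paths (the table name, file path, and delimiter are illustrative):

```sql
-- Trickle load: rows land in the WOS first (AUTO is the default)
COPY store.sales FROM '/data/sales.csv' DELIMITER ',' AUTO;

-- Bulk load: rows go straight to ROS on disk, bypassing the WOS
COPY store.sales FROM '/data/sales_big.csv' DELIMITER ',' DIRECT;
```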

COPY DIRECT command

  • used for very large loads
  • is more efficient than AUTO
  • the WOS is not involved
  • no moveouts of data from WOS to ROS are needed.
  • data in ROS is fully available and optimized upon commit.
Too many ROS containers can cause ROS pushback (this occurs when the maximum container number is reached; to monitor it, use the ros_count column of the projection_storage table).
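A quick way to watch for approaching pushback is to query the ros_count column mentioned above:

```sql
-- Show the projections with the most ROS containers, per node
SELECT projection_name, node_name, ros_count
FROM projection_storage
ORDER BY ros_count DESC;
```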

ROS pushback solutions :

1. Increase MAXCONCURRENCY in the Tuple Mover resource pool.
2. Reduce the number of partitions in the load table.
3. Stop the load and allow the database to process and merge containers.
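For the first solution, the built-in Tuple Mover pool in Vertica is named tm; the concurrency value below is only an example, not a recommendation:

```sql
-- Allow more concurrent Tuple Mover operations (moveout/mergeout)
ALTER RESOURCE POOL tm MAXCONCURRENCY 5;
```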

Updating data in Vertica

  • DELETE
  • UPDATE

DELETE

  • deleted data is not immediately removed from the DB.
  • the records are marked for delete by the "delete vector".
  • records marked by the delete vector cannot be queried, even though they are still physically present.
  • marking records with delete vectors is faster than physical deletion.
  • the PURGE command will remove records that are older than the AHM (Ancient History Mark).
  • purge can be run manually.
  • purge is run automatically during mergeout.
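A manual purge can be run from vsql using Vertica's purge functions; the table name is illustrative:

```sql
-- Purge all eligible deleted records in the database
SELECT purge();

-- Purge a single table
SELECT purge_table('store.sales');
```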

UPDATE

  • an UPDATE is implemented as a DELETE plus an INSERT.

PARTITIONING

  • Auto-partition in Vertica
  • Partition pruning

Auto-partition in Vertica

  • in Vertica you can set up your schema to have partitioned tables
  • when loading data into partitioned tables, multiple ROS containers will be created, since a container can hold data from a single partition only.
  • partitions should not exceed 700 per projection.
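As a sketch of a partitioned table definition (table and column names are illustrative), partitioning by year and month keeps the partition count well under the limit noted above:

```sql
-- One partition per calendar month, e.g. 201401, 201402, ...
CREATE TABLE store.sales (
    sale_id   INT,
    sale_date DATE,
    amount    NUMERIC(10,2)
)
PARTITION BY EXTRACT(year FROM sale_date) * 100 + EXTRACT(month FROM sale_date);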

Post-Loading Tasks

Update the statistics - when to run:
  • after an initial load
  • when data has changed more than 50% since the last statistics collection
  • when query plans change over time

How to update statistics:

  • we can run statistics for a single table or for all tables:
     Single table:
     vsql: SELECT analyze_statistics('table_name');
     All tables:
     vsql: SELECT analyze_statistics('');
