Workflow Process

What happens when the code is run.

It creates a new table, partitioned on a monthly schema, from the previously giant un-partitioned one.
This process reduces the production cost of inserts, selects and deletes.
It also reduces table bloat. The overall index size might increase, but with the advantage that each index no longer has to deal with an unbalanced B-tree.
Once the table is partitioned, a manual snapshot of the instance needs to be taken.
This snapshot serves as the historical RDS, which can be restored to get the data as of that point in time.
Furthermore, the current RDS being partitioned can get rid of historical data by issuing drop commands against the child tables, without impacting any other process or resource.
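
As a rough illustration of the monthly layout and of dropping historical children, the sketch below computes the date range one monthly child would cover and builds a drop command for it. The child naming pattern (`<prefix>_YYYY_MM`) and the helper names are assumptions made for this example, not something the tool is confirmed to use.

```python
from datetime import date
from typing import Tuple


def month_bounds(year: int, month: int) -> Tuple[date, date]:
    """Return the [start, end) date range covered by one monthly child."""
    start = date(year, month, 1)
    # December rolls over into January of the following year.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def child_table_name(prefix: str, year: int, month: int) -> str:
    """Hypothetical naming pattern: <prefix>_YYYY_MM."""
    return f"{prefix}_{year:04d}_{month:02d}"


def drop_child_statement(prefix: str, year: int, month: int) -> str:
    """Build the drop command for one historical child table."""
    return f"DROP TABLE IF EXISTS {child_table_name(prefix, year, month)};"
```

Dropping a whole child this way removes a month of data without issuing row-level deletes on the master.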

Flow

Process workflow

How to use

There are six variables that need to be filled in the conf before execution.

1. database_connection_string

A dictionary that contains 5 keys and defines the connection parameters.

  1. host : The connection endpoint for the rds.
  2. port : The port number.
  3. user : The user with the privilege to select/insert/delete/alter on the tables.
  4. passw : The password.
  5. database : The working database.
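
A minimal sketch of what this entry might look like in the conf. Every value is a placeholder, and the port number and its type (integer rather than string) are assumptions.

```python
# All values below are placeholders; replace them with your own RDS details.
database_connection_string = {
    "host": "my-instance.example.com",  # placeholder endpoint
    "port": 5432,                       # assumed port; use your instance's port
    "user": "archiver",                 # needs select/insert/delete/alter
    "passw": "change-me",
    "database": "appdb",
}
```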

2. table_list

A list of dictionaries where each dictionary should contain 9 keys and is responsible for the archival of a table. Each individual dictionary covers one table only.

  1. table_name : Defines the name of the table to be archived
  2. temp_table_name : The intermediate table created by the process for the archival to happen
  3. new_table_name : Placeholder name of the table where the partitioned data will be placed.
  4. backup_table_name : Name for when the original table will be stored as a backup.
  5. short_hand_name_for_child : Prefix for the child tables for the new partitioned table
  6. id : Primary key column name of the table; should be an int and auto-incrementing
  7. date_column_name : The comma-separated column(s) used for partitioning the table in a monthly manner
  8. new_date_column_name : Combined column of preference, e.g. coalesce(date_column_name)
  9. all_columns : A dictionary containing all the columns as keys and their datatypes as values, excluding the primary key.
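
A sketch of a single `table_list` entry, assuming a hypothetical `events` table. The table, column, and prefix names are invented for illustration, and the exact value formats the tool expects (for example, spacing inside the comma-separated list) are assumptions.

```python
# One dictionary per table to archive; all names below are hypothetical.
table_list = [
    {
        "table_name": "events",
        "temp_table_name": "events_temp",
        "new_table_name": "events_partitioned",
        "backup_table_name": "events_backup",
        "short_hand_name_for_child": "evt",
        "id": "id",
        "date_column_name": "created_at,updated_at",
        "new_date_column_name": "coalesce(created_at,updated_at)",
        # Every column except the primary key, mapped to its datatype.
        "all_columns": {
            "payload": "text",
            "created_at": "timestamp",
            "updated_at": "timestamp",
        },
    },
]
```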

3. cfg_complete_run

A Boolean. When set to true, the process starts from the very beginning: it recreates the new master, child, and temp tables, and attempts to insert into new_master from index 0.
When set to false, the process picks up from where it stopped or halted the last time: it compares the index in the new master with the index in the old master, and pushes only the difference, starting from the max index in the new master.
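
The resume rule above can be sketched as a small function. The function name and the use of `None` for an empty new master are assumptions made for this example; the real script may track progress differently.

```python
from typing import Optional


def starting_index(complete_run: bool, max_id_new_master: Optional[int]) -> int:
    """Return the id after which rows still have to be copied.

    complete_run      -- value of cfg_complete_run
    max_id_new_master -- highest id already in the new master, or None if empty
    """
    if complete_run or max_id_new_master is None:
        # Full run (or nothing copied yet): start from index 0.
        return 0
    # Resume: only rows with an id above this value are still missing.
    return max_id_new_master
```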

4. cfg_are_you_sure

A Boolean. When set to true, the process will attempt to replace the old master table with the new master table. This switcheroo takes place by:

  1. Locking the old master.
  2. Renaming the old master to backup.
  3. Renaming the new master to master.
  4. Creating the function for new inserts.
  5. Creating a trigger that calls the function on every new row insert.
  6. Changing the sequence to the current max+1.
  7. Committing.

Otherwise, it will just print out the alter statements.
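
The sketch below builds the lock, rename, and sequence statements from the list above as plain strings, which is roughly what a dry run would print. It assumes PostgreSQL syntax (the source does not name the engine), the function name and the `sequence` parameter are hypothetical, and steps 4 and 5 (the insert function and trigger) are left out because their definitions are not given here.

```python
from typing import List


def build_swap_statements(table: str, new_table: str, backup_table: str,
                          id_column: str, sequence: str) -> List[str]:
    """Build the swap statements in order; assumes PostgreSQL syntax."""
    return [
        "BEGIN;",
        # Step 1: block other sessions while the swap happens.
        f"LOCK TABLE {table} IN ACCESS EXCLUSIVE MODE;",
        # Step 2: the old master becomes the backup.
        f"ALTER TABLE {table} RENAME TO {backup_table};",
        # Step 3: the new master takes over the original name.
        f"ALTER TABLE {new_table} RENAME TO {table};",
        # Step 6: the next generated id will be current max + 1.
        f"SELECT setval('{sequence}', "
        f"(SELECT COALESCE(MAX({id_column}), 0) + 1 FROM {table}), false);",
        # Step 7.
        "COMMIT;",
    ]


# Dry run with hypothetical names: print the statements instead of running them.
statements = build_swap_statements(
    "events", "events_partitioned", "events_backup", "id", "events_id_seq")
for statement in statements:
    print(statement)
```

Wrapping everything between `BEGIN` and `COMMIT` means the renames either all take effect or none do.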

5. cfg_verified

A Boolean parameter required after the rename. When set to True, the process will attempt to drop the old master, which at this point is named as the backup.
Dev input: Better to keep it False.

6. cfg_divisor

An integer that defines how many rows should be considered in each run.
Should be a multiple of 10; the preferred value is 1000000.
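
One way to picture the effect of cfg_divisor is as windows over the id column, each at most cfg_divisor ids wide. This is a sketch under that assumption; the function name is hypothetical, and the actual script may count rows rather than ids.

```python
from typing import Iterator, Tuple


def id_batches(start: int, max_id: int, divisor: int) -> Iterator[Tuple[int, int]]:
    """Yield (low, high) windows meaning: copy rows with low < id <= high."""
    low = start
    while low < max_id:
        # The last window is cut short at max_id.
        high = min(low + divisor, max_id)
        yield (low, high)
        low = high
```

For example, with ids up to 2500000 and a divisor of 1000000, `list(id_batches(0, 2500000, 1000000))` gives three windows, the last one covering the remaining 500000 ids.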