WorkFlow Process

What happens when the code is ran.

It creates a new partitioned table of the previously giant un-partitioned one a monthly schema.
This process reduces the stress in the production cost for inserts, select and deletes.
This process reduces table bloating, although it might increase the overall index size, but with an advantage as the index would not be dealing with unbalance Btree.
Once the table is partitioned, the instance will then needs to have a manual snapshot.
This snapshot will be the historical rds which can be restored to get the data for point in time.
Furthermore the current rds being partitioned can get rid of historical data by issuing drop commands to the child table, without impacting any process and resource

Flow

How to use

There are six variables that needs to be filled before execution in the conf

1. database_connection_string

A dictionary that contains 5 keys, and defines the connection parameter.

host : The connection endpoint for the rds.
port : The port number.
user : The user with the priviledge to select/insert/delete/alter on the tables.
passw : The password.
database : The working database.

2.table_list

A list of dictionaries where each dictionary should contains 9 keys, and is responsible for the archival of table. Each individual dictionary is reposible for the audit of one table only.

table_name : Defines the name of the table to be archived
temp_table_name : The intermediate table created by the process for the archival to happen
new_table_name : Placeholder name of the table where the partitioned data will be placed.
backup_table_name : Name for when the original table will be stored as a backup.
short_hand_name_for_child : Prefix for the child tables for the new partitioned table
id : Primary column name of the table, should be int, and auto-incremental
date_column_name : The comma separated column(s) for table partitioning in a monthly manner
new_date_column_name : Combined column of preference, eg. coalesce(date_column_name)
all_columns : A dictionary containing all the columns as key and datatype as value, excluding primary.

3. cfg_complete_run

A Boolean that when set to true will start from the very beginning of the process. It will recreate new master, child, and temp table. Will attempt to insert in new_master from the index 0.
Else, if it is false, the process will pick up from where it stopped/halted the last time by comparing the index in the new master and old master, and will puch the difference only from the max of index

4. cfg_are_you_sure

A boolean when set to true will attempt to replace the new master table with the old master table, this switcheroo will take place by

Locking the old master.
Renaming the old master to backup.
Renaming the new master to master.
Create function for new insert.
Create trigger for every new row insert calling the function
Changing sequence to current max+1.
Commit.

Else, will just print out the alter statements.

5. cfg_verified

A Boolean parameter required after the rename. When set to True will attempt to drop the old master, currently named as backup.
Dev input: Better to keep it False.

6. cfg_divisor

An integer that defines that for each run how many row should be considered.
Should be a multiple of 10, prefered 1000000

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

how_to.md

how_to.md

WorkFlow Process

Flow

How to use

1. database_connection_string

2.table_list

3. cfg_complete_run

4. cfg_are_you_sure

5. cfg_verified

6. cfg_divisor

Files

how_to.md

Latest commit

History

how_to.md

File metadata and controls

WorkFlow Process

Flow

How to use

1. database_connection_string

2.table_list

3. cfg_complete_run

4. cfg_are_you_sure

5. cfg_verified

6. cfg_divisor