Project to handle table partitioning so that data can be processed faster in a Postgres environment. It can also be used for data archival if the PostRun step below is followed.
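For context, here is a minimal sketch of the kind of declarative range partitioning (Postgres 10+) that a project like this automates. The connection string, table, and column names are hypothetical, and psycopg2 is assumed to be among the packages in requirements.txt.

```python
# Hypothetical illustration of declarative range partitioning (Postgres 10+).
import psycopg2  # assumed to be listed in requirements.txt

conn = psycopg2.connect("postgresql://user:pass@host:5432/dbname")
with conn, conn.cursor() as cur:
    # Parent ("master") table, partitioned by a timestamp column.
    # Note: the partition key must be part of the primary key.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id         BIGSERIAL,
            created_at TIMESTAMPTZ NOT NULL,
            payload    JSONB,
            PRIMARY KEY (id, created_at)
        ) PARTITION BY RANGE (created_at);
    """)
    # One child partition per month; queries filtered on created_at
    # only scan the relevant children.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events_2024_01 PARTITION OF events
            FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
    """)
conn.close()
```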
- Python environment (preferably 3.7)
- Virtual env with packages installed from requirements.txt
- Host machine with access to the Postgres instance endpoint
- The instance's storage should be checked and set to increase dynamically, up to at most twice the current size depending on the run cycle (a quick storage check is sketched after this list)
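One quick way to verify the storage headroom prerequisite is to check the current database size before a run. This is a hedged sketch with placeholder connection details, again assuming psycopg2 from requirements.txt:

```python
# Hypothetical storage check before a run (connection details are placeholders).
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@host:5432/dbname")
with conn, conn.cursor() as cur:
    # Database size as a human-readable string, e.g. "225 GB".
    cur.execute("SELECT pg_size_pretty(pg_database_size(current_database()));")
    print("Current database size:", cur.fetchone()[0])
conn.close()
```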
This works best when the Postgres instance is running as an RDS instance in AWS.
Check out the How To? file to understand the process.
Once the process is complete and all tests are done, take a manual snapshot of the RDS instance (serving as a historical point-in-time recovery) and drop the past child from the current master.
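A hedged sketch of this PostRun step is below. The snapshot and instance identifiers, region, and table names are hypothetical, and boto3 is assumed as an extra dependency for the RDS snapshot call:

```python
# Hypothetical PostRun sketch: snapshot first, then drop the old child.
import boto3     # assumed extra dependency for the RDS snapshot call
import psycopg2

rds = boto3.client("rds", region_name="us-east-1")
rds.create_db_snapshot(
    DBSnapshotIdentifier="pre-drop-snapshot-2024-01",
    DBInstanceIdentifier="my-postgres-instance",
)
# Block until the snapshot is available before dropping anything.
rds.get_waiter("db_snapshot_available").wait(
    DBSnapshotIdentifier="pre-drop-snapshot-2024-01"
)

conn = psycopg2.connect("postgresql://user:pass@host:5432/dbname")
with conn, conn.cursor() as cur:
    # Detach the past child from the master, then drop it.
    cur.execute("ALTER TABLE events DETACH PARTITION events_2024_01;")
    cur.execute("DROP TABLE events_2024_01;")
conn.close()
```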
A test was performed on RDS using a "test table" as the reference table, with 50 columns and 225 GB of data (an extra 250 GB was needed for new data inserts).
The process took about 10-14 hours to complete.
The master switch was run separately while a script performing rapid data insertion was running.
The master switch took at most 1 minute to complete.
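The switch logic itself lives in the project's scripts, but as a rough illustration of why a master switch can complete this quickly: a rename-based swap only touches catalog metadata. This is a hypothetical sketch of that common pattern, not necessarily the exact mechanism used here:

```python
# Hypothetical rename-based swap; both renames commit atomically.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@host:5432/dbname")
with conn, conn.cursor() as cur:
    # Renames only touch catalog metadata, so the swap is near-instant;
    # concurrent writers block briefly on the table lock, then see the
    # new master under the original name.
    cur.execute("ALTER TABLE events RENAME TO events_old;")
    cur.execute("ALTER TABLE events_new RENAME TO events;")
conn.close()
```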
After the switch was performed, there was some data diff between the old master and the new master.
This can be resolved by setting the new master's sequence to max+1 of the old master's id, and then inserting the diff records separately, with the primary key column explicitly defined in the insert statements.
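A minimal sketch of that fix, assuming a serial primary key named `id` and hypothetical table/sequence names (`events` as the new master, `events_old` as the old):

```python
# Hypothetical post-switch reconciliation ("events" = new master,
# "events_old" = old master, "id" = serial primary key).
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@host:5432/dbname")
with conn, conn.cursor() as cur:
    # Move the new master's sequence to max+1 of the old master's ids,
    # so fresh inserts cannot collide with the copied rows.
    cur.execute("""
        SELECT setval('events_id_seq',
                      (SELECT COALESCE(MAX(id), 0) + 1 FROM events_old),
                      false);
    """)
    # Insert the diff records, listing the primary key column explicitly
    # so the original ids are preserved rather than regenerated.
    cur.execute("""
        INSERT INTO events (id, created_at, payload)
        SELECT o.id, o.created_at, o.payload
        FROM events_old o
        LEFT JOIN events n ON n.id = o.id
        WHERE n.id IS NULL;
    """)
conn.close()
```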