abhushansahu/Postgres-partitioner
Data Partitioner

Project to handle table partitioning so that data can be processed faster in a Postgres environment. It can also be used for data archival if the Post Run step below is followed.
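For context, PostgreSQL 10+ supports declarative range partitioning. A minimal sketch of the kind of DDL involved (the table name `events` and column `created_at` are hypothetical examples, not names from this project, and the project's actual SQL may differ):

```python
from typing import List


def partition_ddl(table: str, column: str, suffix: str, start: str, end: str) -> List[str]:
    """DDL for a range-partitioned master table plus one child partition.

    Illustrative sketch only; names are placeholders.
    """
    return [
        # Master table: holds no rows itself, routes inserts to children.
        f"CREATE TABLE {table} (id BIGSERIAL, {column} TIMESTAMP NOT NULL) "
        f"PARTITION BY RANGE ({column});",
        # One child partition covering [start, end).
        f"CREATE TABLE {table}_{suffix} PARTITION OF {table} "
        f"FOR VALUES FROM ('{start}') TO ('{end}');",
    ]


for stmt in partition_ddl("events", "created_at", "y2023m01", "2023-01-01", "2023-02-01"):
    print(stmt)
```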

Pre-requisite

  1. Python environment (preferably 3.7)
  2. Virtual environment with the packages from requirements.txt installed
  3. Host machine with access to the Postgres instance endpoint
  4. The instance should be checked for available storage and must be set to increase storage dynamically, up to twice the current size depending on the run cycle
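The storage headroom in step 4 can be estimated with a small helper (a sketch; the 2x factor is the "at most twice the current size" guidance above):

```python
def required_storage_gb(current_gb: float, headroom_factor: float = 2.0) -> float:
    """Storage the instance may need to grow to during a run.

    Per the prerequisite above, plan for at most twice the current size.
    """
    return current_gb * headroom_factor


# e.g. the 225 GB reference table below may need up to 450 GB during a run
print(required_storage_gb(225))  # 450.0
```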

This works best when the Postgres instance is running as an RDS instance in AWS.

Process

Check out the How To? file to understand the process.

Post Run

Once the process is complete and all tests have passed, take a manual snapshot of the RDS instance, which serves as a historical point-in-time recovery, and drop the past children in the current master.
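A sketch of the drop half of that step, assuming hypothetical partition names; the snapshot itself can be taken first with the AWS CLI (`aws rds create-db-snapshot`):

```python
from typing import Iterable, List


def drop_old_children_sql(master: str, suffixes: Iterable[str]) -> List[str]:
    """SQL to drop past child partitions of the current master.

    Run only after the manual RDS snapshot exists: dropping a child
    removes its data, and the snapshot is the archival copy.
    """
    return [f"DROP TABLE IF EXISTS {master}_{suffix};" for suffix in suffixes]


print(drop_old_children_sql("events", ["y2022m11", "y2022m12"]))
```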

Test Performed

A test was performed on RDS with a "test table" as the reference table: 50 columns, 225 GB in size (an extra 250 GB was needed for new data inserts).
The process took about 10-14 hours to complete.
The master switch was run separately while the script was running a process of rapid data insertion.
The master switch took at most 1 minute to complete.
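The switch mechanism is not detailed here; one common approach (an assumption on my part, not necessarily what this script does) is an atomic rename inside a single transaction, which is consistent with the sub-minute switch time reported above:

```python
def switch_masters_sql(master: str, new_master: str) -> str:
    """Swap the master tables atomically.

    Both renames commit together or not at all; readers see either
    the old master or the new one, never a half-renamed state.
    """
    return (
        "BEGIN;\n"
        f"ALTER TABLE {master} RENAME TO {master}_old;\n"
        f"ALTER TABLE {new_master} RENAME TO {master};\n"
        "COMMIT;"
    )


print(switch_masters_sql("events", "events_new"))
```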

Concerns

After the switch was performed, there was some data diff between the old master and the new master.
It can be resolved by setting the sequence id of the new master to max+1 of the old master, and then inserting the diff records separately with the primary column explicitly listed in the insert statements.
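The sequence fix can be expressed as a single statement; `pg_get_serial_sequence` resolves the sequence backing the id column (table and column names here are hypothetical placeholders):

```python
def sequence_fix_sql(new_master: str, old_master: str, id_column: str = "id") -> str:
    """Advance the new master's sequence to max(old master id) + 1.

    The third setval() argument `false` makes the next nextval() return
    exactly this value rather than the one after it.
    """
    return (
        f"SELECT setval(pg_get_serial_sequence('{new_master}', '{id_column}'), "
        f"(SELECT COALESCE(MAX({id_column}), 0) + 1 FROM {old_master}), false);"
    )


print(sequence_fix_sql("events", "events_old"))
```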
