Running the default workload causes the disk to fill up, crashing the DB #547
-
Hello, I am running HammerDB with PostgreSQL to stress test SSDs. We want to run the workload for approximately a year. After schema creation is complete, the main workload creates a given number of virtual users and runs for 1.5 hours before destroying the virtual users and starting again with a different VU count. This is configured as an infinite loop, as we want to monitor our disks' performance over a very long time. However, we notice that the disk soon fills up. Ideally we would like a mix of 60% read, 30% update, and 10% read-modify-write operations. I took a quick look at the pgoltp.tcl file and saw that the NEWORD and PAYMENT workloads have INSERT operations, which create new records and eventually fill up the disk. My driver script is pasted below.
Thanks for your help in advance
-
Hi, the TPC-C workload that TPROC-C is based on is designed to grow during the test. If you look at section 6.2.3 of the specification https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf, it describes how to factor in the growth (although HammerDB is not exactly the same).
To be more specific, checking the dynamic tables straight after the build: 831MB of the roughly 2GB schema is in the dynamic tables.
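If you want to check this on your own system, a query along these lines works (a sketch assuming the default HammerDB PostgreSQL table names; adjust if your schema differs, and run it straight after the build and again after a timed test to measure growth):

```sql
-- Report the on-disk size of the TPROC-C tables that grow during a run.
-- Table names assume the default HammerDB PostgreSQL schema.
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
WHERE relname IN ('history', 'new_order', 'orders', 'order_line')
ORDER BY pg_total_relation_size(relid) DESC;
```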
Then, after a 5 minute run at 96699 NOPM, these tables grew to 1610MB, i.e. growth of 779MB for what is a small configuration. (The formula in the specification predicts similar growth.)
So you can run some experiments with the default workload and predict how much space you will need. For a year it is going to be quite significant.
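As a rough back-of-the-envelope extrapolation from the numbers above (assuming growth stays roughly linear at that throughput, which it will not exactly): 779MB per 5 minutes is about 9GB per hour, around 220GB per day, and on the order of 80TB per year. Your rate will scale with your NOPM, so measure on your own configuration first.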
It should also be said that with PostgreSQL you will have WAL files in addition to your data, so you should look into the PostgreSQL documentation on managing the WAL.
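For instance, the checkpoint-related settings bound how much WAL accumulates before it is recycled (a minimal sketch; the values are placeholders, not recommendations, and archiving or replication slots can retain WAL regardless):

```sql
-- Cap WAL accumulation between checkpoints; tune values for your hardware.
ALTER SYSTEM SET max_wal_size = '4GB';
ALTER SYSTEM SET checkpoint_timeout = '15min';
SELECT pg_reload_conf();  -- both settings take effect on reload
```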
You are correct that the stored procedures are interdependent, so you shouldn't just comment out SQL and expect everything to work. (For example, I have seen workloads modified this way that end up running SQL that returns no rows.)
…