The {{ … }}
blocks can be placed before the first CREATE TABLE statement. These expressions would
be evaluated once, and will not be written into the generated files. This is useful to define global
constants used by all rows.
{{ @dirs := array['North', 'West', 'East', 'South'] }}
CREATE TABLE cardinals (
t INTEGER {{ rownum }},
d1 VARCHAR(5) {{ @dirs[rand.zipf(4, 0.8)] }},
d2 VARCHAR(5) {{ @dirs[rand.zipf(4, 0.8)] }}
);
Variables assigned in global expressions can be re-assigned, but the change is localized in the current file generator thread. Every new thread would be initialized by the same evaluated values. For instance if we generate 2 files given this template:
{{ @value := rand.range(0, 100000) }}
CREATE TABLE _ (
p INTEGER {{ @value }},
n INTEGER {{ @value := rand.range(0, 100000) }}
);
We may get
------ first file -------
INSERT INTO _ VALUES
(58405, 87322),
(87322, 41735),
(41735, 91701);
------ second file ------
INSERT INTO _ VALUES
(58405, 3046),
(3046, 8087),
(8087, 26211);
Note that the initial @value
are the same for both files (58405
), because rand.range()
is only
evaluated once. After generation started, though, each file acquires its own state and we see they
evaluate @value
differently without any interference.
In a relational database, contents of tables are related to each other, e.g.
CREATE TABLE "parent" (
"parent_id" UUID PRIMARY KEY,
"child_count" INT UNSIGNED NOT NULL
);
CREATE TABLE "child" (
"child_id" UUID PRIMARY KEY,
"parent_id" UUID NOT NULL REFERENCES "parent"("parent_id")
);
We want the two tables to be related such that:
child.parent_id
refer to real IDs in theparent
tableparent.child_count
is an actual count of rows inchild
table having the specifiedparent_id
.parent.child_count
are still random.
These two tables therefore must be generated together. dbgen
supports generating derived tables
from the previous tables with this syntax:
CREATE TABLE "parent" (
"parent_id" UUID PRIMARY KEY,
/*{{ @parent_id := rand.uuid() }}*/
"child_count" INT UNSIGNED NOT NULL
/*{{ @child_count := rand.range_inclusive(0, 4) }}*/
);
/*{{ for each row of "parent" generate @child_count rows of "child" }}*/
CREATE TABLE "child" (
"child_id" UUID PRIMARY KEY,
/*{{ rand.uuid() }}*/
"parent_id" UUID NOT NULL REFERENCES "parent"("parent_id")
/*{{ @parent_id }}*/
);
This may produce
------ parent.1.sql ------
INSERT INTO "parent" VALUES
('451b789a-3438-4d6b-847e-ac6bb0d61988', 0),
('55200ffe-2304-4b68-a1a8-8467fbcbb339', 4),
('0082fa2d-c553-46df-aa61-7182accf1ea7', 2),
('c488c641-a92e-405c-870b-1e10a213e456', 1),
…
------ child.1.sql -------
INSERT INTO "child" VALUES
('49188e47-d0da-4f1e-8c82-156138bb4887', '55200ffe-2304-4b68-a1a8-8467fbcbb339'),
('0251ec50-8039-4e59-a04f-fc8143a9d278', '55200ffe-2304-4b68-a1a8-8467fbcbb339'),
('4dddc583-b175-4814-a677-02fa4ec295b8', '55200ffe-2304-4b68-a1a8-8467fbcbb339'),
('fb8bab0d-8f3a-4cf8-891d-d2ad6e7aac28', '55200ffe-2304-4b68-a1a8-8467fbcbb339'),
('1feb2f81-6000-4191-8cc3-95acbd3f1723', '0082fa2d-c553-46df-aa61-7182accf1ea7'),
('63e44b85-1779-4508-9598-c94df3eee10e', '0082fa2d-c553-46df-aa61-7182accf1ea7'),
('77d13d62-12ea-4fe7-98c5-35cb0f1daece', 'c488c641-a92e-405c-870b-1e10a213e456'),
…
There can be multiple derived tables, and it can refer to any table before it as the generator.
CREATE TABLE A ( … );
/*{{ for each row of A generate 2 rows of B }}*/
CREATE TABLE B ( … );
/*{{ for each row of B generate 1 row of C }}*/
CREATE TABLE C ( … );
/*{{ for each row of A generate 4 rows of D }}*/
CREATE TABLE D ( … );
All derived rows share the same set of variables. Variables can be used to establish common values among the group of tables.
In a derived table, rownum
refers to the row number of the main table. If we generate 10 derived
rows for each main row, all 10 rows will produce the same rownum
.
You can distinguish between derived rows of the same rownum
using the subrownum
symbol, which
has values 1, 2, …, 10 if we generate 10 rows.
-- INPUT: template.sql
CREATE TABLE main ( … );
/*{{ for each row of main generate 3 rows of derived }}*/
CREATE TABLE derived (
rn INT /*{{ rownum }}*/,
srn INT /*{{ subrownum }}*/,
…
);
-- RESULT: derived.1.sql
INSERT INTO derived VALUES
(1, 1, …),
(1, 2, …),
(1, 3, …),
(2, 1, …),
(2, 2, …),
(2, 3, …),
…
With a derived table hierarchy, the rownum
always refer to the top table, and subrownum
always
refer to the current table. If you need the row numbers of the tables in between, store them into a
variable, e.g.
-- INPUT: template.sql
CREATE TABLE "top" ( top_id INT /*{{ rownum }}*/, … );
/*{{ for each row of "top" generate 2 rows of "middle" }}*/
CREATE TABLE "middle" ( middle_id INT /*{{ @middle_id := subrownum }}*/, … );
/*{{ for each row of "middle" generate 2 rows of "bottom" }}*/
CREATE TABLE "bottom" (
top_id INT /*{{ rownum }}*/,
middle_id INT /*{{ @middle_id }}*/,
bottom_id INT /*{{ subrownum }}*/,
…
);
Derived tables do not have individual --total-count
, --rows-per-file
and --rows-count
settings. In particular, if we set for each row of "main" generate N rows of "derived"
, the actual
number of rows per INSERT statements of the derived table will be N times --row-count
of the main
table (deeper derivatives will cascade). This may produce excessively large tables when the number
of rows to generate is huge. Therefore, if it is possible to generate the values independently, we
recommend using two separate templates instead of derived tables.
NOT recommended. File size of main and derived data cannot be balanced. | Recommended. File size of main and derived data can be balanced. |
-- template.sql
CREATE TABLE main (
main_id INT PRIMARY KEY {{ rownum }}
);
{{ for each row of main generate 3000 rows of derived }}
CREATE TABLE derived (
main_id INT NOT NULL {{ rownum }},
sub_id INT NOT NULL {{ subrownum }},
PRIMARY KEY (main_id, sub_id)
); |
-- main.sql
CREATE TABLE main (
main_id INT PRIMARY KEY {{ rownum }}
); -- derived.sql
CREATE TABLE derived (
main_id INT NOT NULL {{ div(rownum-1, 3000)+1 }},
sub_id INT NOT NULL {{ mod(rownum-1, 3000)+1 }},
PRIMARY KEY (main_id, sub_id)
); |