-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternate implementation for ClassName column #11358
Comments
Some basic test classes to look at enum performance on large databases <?php
use SilverStripe\ORM\DataObject;
class MyBaseDataObject extends DataObject
{
private static $table_name = 'MyBaseDataObject';
private static $db = [
'Title' => 'Varchar'
];
} <?php
use SilverStripe\ORM\DataObject;
use SilverStripe\ORM\DB;
class MyDataObjectA extends MyBaseDataObject
{
private static $table_name = 'MyDataObjectA';
private static $db = [
'SomeField' => 'Varchar'
];
public function requireDefaultRecords()
{
$num = 2000000; // total number of records that should exist (note cannot decrease)
$insert = 100000; // max inserts per query
$count = self::get()->count();
$diff = $num - $count;
$t = '2024-08-29 17:47:10';
parent::requireDefaultRecords();
if ($diff > 0) {
# use raw SQL to make insertions much faster
$loops = ceil($diff / $insert);
for ($loop = 0; $loop < $loops; $loop++) {
$sqlBase = [];
$sqlA = [];
for ($i = 1; $i <= $insert && $i <= $diff; $i++) {
$id = $count + ($loop * $insert) + $i;
$sqlBase[] = "($id, 'MyDataObjectA', 'My Data Object $id', '$t', '$t')";
$sqlA[] = "($id, 'Some value $id')";
}
if (!empty($sqlBase)) {
DB::query('INSERT INTO "MyBaseDataObject" ("ID", "ClassName", "Title", "LastEdited", "Created") VALUES ' . implode(',', $sqlBase) . ';');
DB::query('INSERT INTO "MyDataObjectA" ("ID", "SomeField") VALUES ' . implode(',', $sqlA) . ';
');
}
$diff -= $insert;
}
}
}
} <?php
use SilverStripe\CMS\Controllers\ContentController;
use SilverStripe\ORM\DB;
class PageController extends ContentController
{
protected function init()
{
parent::init();
// 1 million records 2.810 seconds though also 0.009 seconds after the first time
// 2 million records 5.444 seconds
// doesn't matter the number of enums, what matters is the number of records
if (isset($_GET['timer'])) {
$start = microtime(true);
$sql = <<<EOT
ALTER TABLE
`MyBaseDataObject`
MODIFY COLUMN
`ClassName` enum(
'MyDataObjectA',
'SomethingElseA',
'SomethingElseB',
'SomethingElseC',
'SomethingElseD',
'SomethingElseE',
'SomethingElseF',
'SomethingElseG',
'SomethingElseH'
)
NOT NULL AFTER `ID`;
EOT;
DB::query($sql);
$end = microtime(true);
$time = $end - $start;
// format to 3 decimal places
$time = number_format($time, 3);
echo $time;
die;
}
}
} |
https://github.com/emteknetnz/silverstripe-dev-build-benchmark can be used to check how long different queries take to run when doing a dev/build, including during deployments. Works for both CMS 4 and 5 Tested on a deployment to a test environment where there was some though not much existing data on the server to Running the following script to show percentages of time spent on each query <?php
$res = ['other' => 0];
$isHeader = true;
if (($handle = fopen("data.csv", "r")) !== FALSE) {
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
if ($isHeader) {
$isHeader = false;
continue;
}
[$id, $when, $time, $sql] = $data;
$sql = strtolower($sql);
$startsWiths = [
'ALTER TABLE',
'CHECK TABLE',
'SELECT COUNT(*)',
'SELECT DISTINCT',
'SHOW FULL FIELDS IN',
'SHOW FULL TABLES WHERE Table_Type',
'SHOW INDEXES IN',
'SHOW TABLE STATUS LIKE',
'SHOW TABLES LIKE',
'UPDATE',
];
$matched = false;
foreach ($startsWiths as $startsWith) {
if (str_starts_with(strtolower($sql), strtolower($startsWith))) {
$res[$startsWith] ??= 0;
$res[$startsWith] += $time;
$matched = true;
continue 2;
}
}
if (!$matched) {
$res['other'] += $time;
}
}
fclose($handle);
}
uksort($res, function ($a, $b) use ($res) {
return $res[$b] <=> $res[$a];
});
$percs = [];
echo "\nTime in seconds:\n";
foreach ($res as $key => $val) {
printf('%7.2f %s%s', $val, $key, PHP_EOL);
$percs[$key] = $val / array_sum($res) * 100;
}
echo "\nPercentages:\n";
foreach ($percs as $key => $val) {
printf('%6.1f%% %s%s', $val, $key, PHP_EOL);
}
|
PRs merged |
Very large sites with huge databases tables can take a very long time to deploy due to a slow dev/build
A known performance issue on these sorts of sites is the use of an Enum field to represent the polymorhpic ClassName on DataObject tables, which often needs to be updated via an ALTER TABLE query that performs very poorly at scale
Enums do have some advantages though:
A couple of possible alternative implementations:
- Use a lookup table instead, e.g. a table called "ClassNames" that has columns "ID", "AncestorClassName", "DescendantClassName", where "AncestorClassName" might be "SilverStripe\CMS\SiteTree" and "DescendantClassName" might be "Page". Replace the "ClassName" column on DataObject tables with "ClassNameID". Disadvantages here are the lose of string value in select statements e.g. SELECT * FROM $table, and the fact we need to JOIN an extra table on just about every query(this is honestly a bad solution, would result in worse website performance across the board)It might be worth putting some abstraction around how the ClassName column is implemented, defaulting to enum and letting projects switch to a different implementation as required
Notes
Acceptance criteria (added after implementation)
New issues created
Kitchen sink CI - using DBClassNameVarchar instead of DBClassName
PRs
The text was updated successfully, but these errors were encountered: