This isn't a typical scenario, but I manage a site with a little over 4.3k pages, and I found that with proper caching applied you can dramatically reduce server-side load. For example, some areas that could use improvement:
Caching the filtered results from calls to ->canIncludeInGoogleSitemap() somehow
Adding <% cached ... %> blocks for partial template caching to save overhead in generating the XML (see the template sketch after this list)
Incorporating sane defaults in cache-control headers (minor benefit if you have a CDN and your sitemap.xml is popular)
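To make the last two points concrete, here's a rough sketch of the partial-caching idea at the template level; the cache key parts and variable names ($ClassName, $Items) are assumptions on my part, not the module's actual template variables:

```
<%-- hypothetical wrapper around the existing item loop in the sitemap template --%>
<% cached 'googlesitemap', $ClassName, $Items.Count, $Items.max(LastEdited) %>
    <% loop $Items %>
        ... existing <url> markup for each record ...
    <% end_loop %>
<% end_cached %>
```

For the Cache-Control side, on SilverStripe 4.2+ a controller extension could call something like HTTPCacheControlMiddleware::singleton()->enableCache()->setMaxAge(3600); on 3.x you'd add the header to the response directly.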
The performance boost is particularly noticeable if you have any custom logic inside ->canIncludeInGoogleSitemap() on your pages. Even if not, caching these calls helps a lot: each call has to travel through the extension call stack before the extension's implementation of that method even runs, and the extension's own ->canIncludeInGoogleSitemap() then calls ->hasPublishedParent(), which recurses up the parent chain. That's an expensive operation when done 4.3k times (or even 500 times per paginated page), particularly if the site content has a lot of organizational depth.
On my site, after some quick hacking, I found I could take the load time from ~4s on every hit down to ~4s on only the first hit and ~300ms on subsequent loads, as long as no SiteTree instances listed on the current sitemap.xml page have been modified (you want to spoil the cache on any modification that could have affected the result of that function call). This would be especially beneficial in a distributed cluster where the cache layer is backed by a distributed k/v store like Memcached, Redis, or whatever.
Two main approaches:
My first approach was to cache per request: take the current paginated page number combined with the class being looked at, check that the list of instances is the same as the last time we cached the output and, if so, just reuse the cached output. This can be done very quickly by using ->column(...) on the list to pull out the fields used to spoil/vary the cache.
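A minimal sketch of that first approach for SilverStripe 4, assuming a PSR-16 cache registered with the Injector under a made-up "googlesitemap" name; the helper function, its parameters, and the TTL are all my own invention, not the module's API:

```php
use Psr\SimpleCache\CacheInterface;
use SilverStripe\Core\Injector\Injector;
use SilverStripe\ORM\DataList;

/**
 * Hypothetical helper: returns cached sitemap XML for one paginated page of
 * one class, re-rendering only when a listed record has changed.
 */
function getCachedSitemapXml(string $class, int $page, DataList $items, callable $render): string
{
    // Named PSR-16 cache; the "googlesitemap" name would need registering via Injector config.
    $cache = Injector::inst()->get(CacheInterface::class . '.googlesitemap');

    // Vary the key on class + page number + the newest LastEdited among the
    // listed records, so editing any listed page spoils the cached output.
    $lastEdited = $items->column('LastEdited');
    $key = md5(implode('|', [$class, $page, $lastEdited ? max($lastEdited) : 'empty']));

    $xml = $cache->get($key);
    if ($xml === null) {
        // Cache miss: do the expensive work (canIncludeInGoogleSitemap(),
        // hasPublishedParent(), template rendering) once and store the result.
        $xml = $render();
        $cache->set($key, $xml, 3600);
    }

    return $xml;
}
```

The $render callable would wrap whatever the controller currently does to produce the XML for that class and page.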
Alternatively, you could cache the value of ->canIncludeInGoogleSitemap() on ->write() in a special column on the base SiteTree object itself, instead of calling it every time. The benefit is extremely fast performance, since you just filter on that column. The downside is that you lose any logic in ->canIncludeInGoogleSitemap() that doesn't rely purely on the state of that DataObject (e.g. if it depends on the state of some other unrelated object or a global value). So doing this would be a breaking change, I'd imagine.
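And a rough sketch of that second approach as a DataExtension applied to SiteTree; the class and column names are made up, and as noted above this changes behaviour for any canIncludeInGoogleSitemap() logic that depends on state outside the record itself:

```php
use SilverStripe\ORM\DataExtension;

/**
 * Hypothetical extension: memoise the sitemap-inclusion check in a column
 * at write time so sitemap generation can filter on it directly.
 */
class SitemapInclusionCacheExtension extends DataExtension
{
    private static $db = [
        'CachedCanIncludeInSitemap' => 'Boolean',
    ];

    public function onBeforeWrite()
    {
        // Recompute the (potentially expensive) check once per write.
        $this->owner->CachedCanIncludeInSitemap =
            (bool) $this->owner->canIncludeInGoogleSitemap();
    }
}
```

Sitemap generation could then use ->filter('CachedCanIncludeInSitemap', true) instead of calling the method per record, with the caveat above about external state.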
Just some ideas. I would have submitted a PR, but my site is still stuck on SilverStripe 3.x so there's no point. I'm also going to leave my modified version of the extension (mentioned above in my testing) out of production code, since we have a pretty performant cluster, send proper HTTP cache headers (via an extension to GoogleSitemapController), and have a CDN that will force clients to use the cached version (in case a bot notices this weakness and uses it to pound our server, for example).
@patricknelson happy to see any improvements you've got. If you want to email anything through, I can adapt for v4.
Even though caching logic can be quite site-specific, for this usage we can rely on the LastEdited timestamp on the record 95% of the time. It would need some logical APIs to override whatever the system does out of the box, but it seems like an easy improvement.
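One possible shape for that override point, purely as a sketch (the extension class and method name are invented, not an existing module API): the default key leans on LastEdited, and a project whose inclusion logic depends on other state could override or extend it to fold that state in:

```php
use SilverStripe\ORM\DataExtension;

/**
 * Hypothetical extension point: the module would fold this value into its
 * cache key for the record, so the default invalidates on LastEdited and
 * projects can override it for more exotic invalidation rules.
 */
class GoogleSitemapCacheKeyExtension extends DataExtension
{
    public function getGoogleSitemapCacheKey(): string
    {
        $owner = $this->getOwner();
        return sprintf(
            '%s-%d-%d',
            str_replace('\\', '_', get_class($owner)),
            $owner->ID,
            strtotime($owner->LastEdited)
        );
    }
}
```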