You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It appears there's a significant issue with newtype Query = Query ByteString. Namely, heap fragmentation.
See haskell/bytestring#193 — TL;DR: the strict ByteString uses ForeignPtr as the underlying type (and that's unlikely to change anytime soon). And it means that all long-lived ByteStrings are pinned in memory — GC is not allowed to move them, which affects heap compaction.
In some usage patterns, the heap will expand and turn into a metaphorical colander or sieve: with pinned allocations sprinkled around with wide unused gaps in-between. In a case I've got at hand currently, a DB-heavy test scenario causes large increase in heap blocks (pages) with <10% utilization. Following @mpickering's work, I tend to agree it's a symptom of fragmentation. Using ghc-debug, I've found pinned BS constructors in lowest-utilization heap blocks; among the worst examples, 6-byte COMMIT pinning a whole 4096-byte page.
No need to reach far — Yesod's persistent-postgresql performs query caching which permanently¹ saves prepared statements in a Map Text Statement in memory. The Text key is good — unlike ByteString, it's not pinned — what's bad is this line where persistent-postgresql is forced to pre-encode the Text query to a ByteString so that Query of this package can be constructed. That Statement closure then goes into the StatementCache, and the original SQL query is "forever" retained twice:
as a Text in the map key,
as a pinned ByteString in Query argument in Statement closure environment.
The latter is really bad.
What are your thoughts about changing to newtype Query = Query ShortByteString ? Docs of ShortByteString explain it's not contributing to heap fragmentation, like ByteString does.
Ideally though, newtype Query = Query Text would be best IMO — enabling sharing of the same Text form of query which StatementCache keeps already anyway. Any reasons to not go with Text I'm missing?
¹ until the cache gets explicitly cleared, which can be done with Internal APIs there.
The text was updated successfully, but these errors were encountered:
I'm testing a conduit-streaming content from ~thousands of Large Objects per request in a Yesod webapp. Repro not easy to minimize.
What's observed is RSS (OS memory) growth — while "heap live bytes" metric is flat.
I agree — there's only a small amount of queries in the StatementCache, it's a small contribution but definitely part of the problem. There're other pieces which contribute too; I just have to start somewhere.
Edit: @phadej this app has hundreds different queries. I'm certainly not generating them dynamically. My targeted repro test raises StatementCache from 11 entries to 15 and it stays constant at 15 as expected on further iterations. Still, the pinned queries do retain memory.
Hey @phadej 👋
It appears there's a significant issue with
newtype Query = Query ByteString
. Namely, heap fragmentation.See haskell/bytestring#193 — TL;DR: the strict
ByteString
usesForeignPtr
as the underlying type (and that's unlikely to change anytime soon). And it means that all long-lived ByteStrings are pinned in memory — GC is not allowed to move them, which affects heap compaction.In some usage patterns, the heap will expand and turn into a metaphorical colander or sieve: with pinned allocations sprinkled around with wide unused gaps in-between. In a case I've got at hand currently, a DB-heavy test scenario causes large increase in heap blocks (pages) with <10% utilization. Following @mpickering's work, I tend to agree it's a symptom of fragmentation. Using
ghc-debug
, I've found pinnedBS
constructors in lowest-utilization heap blocks; among the worst examples, 6-byteCOMMIT
pinning a whole 4096-byte page.No need to reach far — Yesod's
persistent-postgresql
performs query caching which permanently¹ saves prepared statements in aMap Text Statement
in memory. TheText
key is good — unlike ByteString, it's not pinned — what's bad is this line wherepersistent-postgresql
is forced to pre-encode the Text query to a ByteString so that Query of this package can be constructed. ThatStatement
closure then goes into the StatementCache, and the original SQL query is "forever" retained twice:Query
argument inStatement
closure environment.The latter is really bad.
What are your thoughts about changing to
newtype Query = Query ShortByteString
? Docs of ShortByteString explain it's not contributing to heap fragmentation, like ByteString does.Ideally though,
newtype Query = Query Text
would be best IMO — enabling sharing of the same Text form of query which StatementCache keeps already anyway. Any reasons to not go with Text I'm missing?¹ until the cache gets explicitly cleared, which can be done with
Internal
APIs there.The text was updated successfully, but these errors were encountered: