
How to use supavisor with asyncpg? #287

Open
awalias opened this issue Jan 29, 2024 · 11 comments
Labels
documentation Improvements or additions to documentation

Comments


awalias commented Jan 29, 2024

Discussed in https://github.com/orgs/supabase/discussions/20775

Originally posted by nsbradford January 26, 2024
For reference:

TLDR: asyncpg is the go-to async Postgres client, but it is incompatible with PgBouncer's transaction mode because of how it handles prepared statements. Supabase uses its own in-house pooler, Supavisor, which still seems to have the same issue.

There are workarounds if you have access to the PgBouncer instance, but there doesn't seem to be an easy way to configure Supavisor. Are there any other workarounds for this?

Workaround details: https://docs.sqlalchemy.org/en/20/dialects/postgresql.html#prepared-statement-name

To prevent a buildup of useless prepared statements in your application, it’s important to use the NullPool pool class, and to configure PgBouncer to use DISCARD when returning connections. The DISCARD command is used to release resources held by the db connection, including prepared statements. Without proper setup, prepared statements can accumulate quickly and cause performance issues.
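For reference, the PgBouncer side of that workaround (not something you can apply to hosted Supavisor) usually looks like the following pgbouncer.ini sketch; the DISCARD ALL reset query is what releases prepared statements between clients:

```ini
[pgbouncer]
pool_mode = transaction
; Release session state (including prepared statements) when a
; server connection is handed to the next client.
server_reset_query = DISCARD ALL
; In transaction mode the reset query is ignored unless this is set.
server_reset_query_always = 1
```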

@awalias awalias transferred this issue from supabase/supabase Jan 29, 2024
@J0 J0 added the documentation Improvements or additions to documentation label Jan 30, 2024
@nsbradford

Would love a solution/guidance here 🙏 So far the likely option is moving to session mode, though since our Supabase max client connections limit is fixed, I would be pretty concerned about exhausting the allowed connections.

@StacySiz

Any updates on this? Besides moving to sync Session mode?

@gregm711

Same, would love this functionality as well - Any updates?

@nileshtrivedi

I ran into this too. I have IPv6 at home, so I can switch to port 5432 and bypass it, but at work where I don't have IPv6, I'm forced to use Supavisor which leads to this error:

An error occurred: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.InvalidSQLStatementNameError'>: prepared statement "asyncpg_stmt_7" does not exist
HINT:
NOTE: pgbouncer with pool_mode set to "transaction" or
"statement" does not support prepared statements properly.
You have two options:

  • if you are using pgbouncer for connection pooling to a
    single server, switch to the connection pool functionality
    provided by asyncpg, it is a much better option for this
    purpose;

  • if you have no option of avoiding the use of pgbouncer,
    then you can set statement_cache_size to 0 when creating
    the asyncpg connection object.

[SQL: SELECT * FROM public.users WHERE identifier = $1]
[parameters: ('admin',)]
(Background on this error at: https://sqlalche.me/e/20/dbapi)

@jordanparker6

I ran into this too. I have IPv6 at home, so I can switch to port 5432 and bypass it, but at work where I don't have IPv6, I'm forced to use Supavisor which leads to this error: […]

Running into the same issue.

@ShravanSunder

Seems to work with the below; you need the prepared_statement_name_func:

    self.async_engine: AsyncEngine = create_async_engine(
        options.connection,
        future=True,
        json_deserializer=from_json,
        json_serializer=to_json_str,
        echo=options.echo,
        pool_size=16,
        max_overflow=128,
        connect_args={
            # Disable asyncpg's client-side caches, which break behind
            # transaction-mode poolers.
            "statement_cache_size": 0,
            "prepared_statement_cache_size": 0,
            # Unique names avoid collisions across pooled server
            # connections; uuid7 here comes from an external package.
            "prepared_statement_name_func": lambda: f"__asyncpg_{uuid7()}__",
        },
    )

@ManuelAngel99

Any news on this issue?


GrigoryEvko commented Nov 2, 2024

from sqlalchemy import pool
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession, async_sessionmaker
import uuid

db_url = f"postgresql+asyncpg://{DB_USER}:{DB_PASSWORD}@{DB_SERVER}:{DB_PORT}/postgres"

engine = create_async_engine(
    db_url,
    poolclass=pool.NullPool,  # important setting #1, though it may not actually be required
    future=True,
    connect_args={  # important settings for asyncpg
        "prepared_statement_name_func": lambda: f"__asyncpg_{uuid.uuid4()}__",
        "statement_cache_size": 0,
        "prepared_statement_cache_size": 0,
    },
)

async_session = async_sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False,
)

async_session = async_sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False
)

Then use the select construct, as described here: https://stackoverflow.com/questions/68360687/sqlalchemy-asyncio-orm-how-to-query-the-database

After that it works fine.
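For completeness, a querying sketch through that session factory; User is a hypothetical declaratively-mapped model with an identifier column, and async_session is the factory defined above:

```python
from sqlalchemy import select

async def get_user(identifier: str):
    # Open a short-lived session from the async_sessionmaker factory;
    # with NullPool this dials a fresh connection to Supavisor each time.
    async with async_session() as session:
        result = await session.execute(
            select(User).where(User.identifier == identifier)
        )
        return result.scalar_one_or_none()
```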


ManuelAngel99 commented Nov 22, 2024

Has anyone else noticed that queries run with poolclass=pool.NullPool are much slower than those executed using the PostgREST client?


GrigoryEvko commented Nov 22, 2024

Has anyone else noticed that queries run with poolclass=pool.NullPool are much slower than those executed using the postgREST client?

Yes, that's kind of expected. I had a long discussion in a Supabase support ticket with @TheOtherBrian1 about Supavisor; I hope he can help here as well.

Part of email from him:

=================================================================
Prepared statements are only supported via SQL commands (PREPARE/EXECUTE). However, most client libraries create them at the protocol level instead.

Unfortunately, Supavisor cannot accommodate prepared statements with most ORMs. I created a pull request to update Supavisor's repo documentation.

A complete explanation of pooling

You do not have to read this explanation, it's just to provide a mental model for how connections work in Postgres.

Establishing a new PostgreSQL connection is both slow and resource-intensive.

I recently created connection benchmarks for a presentation I'm working on. In the first scenario, connections were pre-established and persistent; in the second, a new connection was opened for each query.

Queries using pre-established connections averaged 21.65 ms to complete, while those that opened a new connection took an average of 707.213 ms.

This difference is understandable: establishing a new connection involves multiple networking requests and the forking of a new process.

SQLAlchemy and other ORMs create pools specifically to avoid the "first connection" penalty. This is known as application-side pooling.

Serverside poolers, such as Supavisor, serve a different purpose.

Postgres forks a new process to handle each connection. This makes it more crash-resistant - if a single process fails, the overall system will stay online. Sadly, maintaining them is resource-intensive, and, more importantly, imposes an orchestration cost.

At some point, the benefit of having more connections is dwarfed by management overhead. EnterpriseDB graphed how transactions per second begin to decline with excessive connections on an enterprise-sized server.

Because Postgres cannot handle an infinite amount of connections, serverside poolers were built to extract the most performance from the existing ones.

An app server may make a 20 ms query, then perform some calculations, before requesting an external API. All of this may take place over the course of 400 ms, but the database connection was left idle for 380 ms. Because the server didn't release it, no other server could have taken advantage of that finite but vital resource.

Supavisor, when in transaction mode, acts as a connection load balancer. It maintains a pool of connections and forwards incoming queries to whichever connection is available.

This is great for maximizing throughput, but Supavisor is a proxy. It adds network complexity and is another resource that has to be configured. If you do not need it, you shouldn't use it.

=================================================================

Honestly, I was very impressed to receive such a detailed explanation by email. I suggested several updates to the documentation; I hope it gets described somewhere publicly available, not only in my personal email!

@ManuelAngel99

Has anyone else noticed that queries run with poolclass=pool.NullPool are much slower than those executed using the PostgREST client?

Yes, that's kind of expected. I had a long discussion in a Supabase support ticket with @TheOtherBrian1 about Supavisor […]

Thank you so much for the reply @GrigoryEvko. I still don't get why using NullPool slows queries down so much, given that we are connecting to Supavisor rather than to Postgres directly. I may not have the complete picture, but since Supavisor keeps its own pool of ready-to-use connections to Postgres, I would expect that reconnecting to it for every query (when using NullPool) would not add much latency to the system, yet apparently it does.
