
How to use supavisor with asyncpg? #287

Open
awalias opened this issue Jan 29, 2024 · 11 comments
Labels
documentation Improvements or additions to documentation

Comments


awalias commented Jan 29, 2024

Discussed in https://github.com/orgs/supabase/discussions/20775

Originally posted by nsbradford January 26, 2024
For reference:

TLDR: asyncpg is the go-to async Postgres client, but it is incompatible with PgBouncer's transaction mode because of how it handles prepared statements. Supabase uses its own in-house pooler, Supavisor, which still seems to have the same issue.

There are workarounds if you have access to the PgBouncer instance, but there doesn't seem to be an easy way to configure Supavisor. Are there any other workarounds for this?

Workaround details: https://docs.sqlalchemy.org/en/20/dialects/postgresql.html#prepared-statement-name

To prevent a buildup of useless prepared statements in your application, it’s important to use the NullPool pool class, and to configure PgBouncer to use DISCARD when returning connections. The DISCARD command is used to release resources held by the db connection, including prepared statements. Without proper setup, prepared statements can accumulate quickly and cause performance issues.
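For reference, the PgBouncer side of that workaround (not something you can apply to hosted Supavisor) usually looks like the following pgbouncer.ini sketch; the DISCARD ALL reset query is what releases prepared statements between clients:

```ini
[pgbouncer]
pool_mode = transaction
; Release session state (including prepared statements) when a
; server connection is handed to the next client.
server_reset_query = DISCARD ALL
; In transaction mode the reset query is ignored unless this is set.
server_reset_query_always = 1
```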

@awalias awalias transferred this issue from supabase/supabase Jan 29, 2024
@J0 J0 added the documentation Improvements or additions to documentation label Jan 30, 2024
@nsbradford

Would love a solution/guidance here 🙏 So far the likely option is moving to session mode, though since our Supabase max client connections limit is fixed, I would be pretty concerned about exhausting the allowed connections.

@StacySiz

Any updates on this? Besides moving to sync Session mode?

@gregm711

Same, would love this functionality as well - Any updates?

@nileshtrivedi

I ran into this too. I have IPv6 at home, so I can switch to port 5432 and bypass it, but at work where I don't have IPv6, I'm forced to use Supavisor which leads to this error:

An error occurred: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.InvalidSQLStatementNameError'>: prepared statement "asyncpg_stmt_7" does not exist
HINT:
NOTE: pgbouncer with pool_mode set to "transaction" or
"statement" does not support prepared statements properly.
You have two options:

  • if you are using pgbouncer for connection pooling to a
    single server, switch to the connection pool functionality
    provided by asyncpg, it is a much better option for this
    purpose;

  • if you have no option of avoiding the use of pgbouncer,
    then you can set statement_cache_size to 0 when creating
    the asyncpg connection object.

[SQL: SELECT * FROM public.users WHERE identifier = $1]
[parameters: ('admin',)]
(Background on this error at: https://sqlalche.me/e/20/dbapi)

@jordanparker6

I ran into this too. I have IPv6 at home, so I can switch to port 5432 and bypass it, but at work where I don't have IPv6, I'm forced to use Supavisor which leads to this error: […]

Running into the same issue.

@ShravanSunder

Seems to work with the below; you need the prepared_statement_name_func:

    self.async_engine: AsyncEngine = create_async_engine(
        options.connection,
        future=True,
        json_deserializer=from_json,
        json_serializer=to_json_str,
        echo=options.echo,
        pool_size=16,
        max_overflow=128,
        connect_args={
            # Disable asyncpg's client-side caches, which break behind
            # transaction-mode poolers.
            "statement_cache_size": 0,
            "prepared_statement_cache_size": 0,
            # Unique names avoid collisions across pooled server
            # connections; uuid7 here comes from an external package.
            "prepared_statement_name_func": lambda: f"__asyncpg_{uuid7()}__",
        },
    )

@ManuelAngel99

Any news on this issue?


GrigoryEvko commented Nov 2, 2024

from sqlalchemy import pool
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession, async_sessionmaker
import uuid

db_url = f"postgresql+asyncpg://{DB_USER}:{DB_PASSWORD}@{DB_SERVER}:{DB_PORT}/postgres"

engine = create_async_engine(
    db_url,
    poolclass=pool.NullPool,  # important setting #1, though it may not actually be required
    future=True,
    connect_args={  # important settings for asyncpg
        "prepared_statement_name_func": lambda: f"__asyncpg_{uuid.uuid4()}__",
        "statement_cache_size": 0,
        "prepared_statement_cache_size": 0,
    },
)

async_session = async_sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False,
)

async_session = async_sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False
)

Then use the select construct, as described here: https://stackoverflow.com/questions/68360687/sqlalchemy-asyncio-orm-how-to-query-the-database

After that it works fine.
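For completeness, a querying sketch through that session factory; User is a hypothetical declaratively-mapped model with an identifier column, and async_session is the factory defined above:

```python
from sqlalchemy import select

async def get_user(identifier: str):
    # Open a short-lived session from the async_sessionmaker factory;
    # with NullPool this dials a fresh connection to Supavisor each time.
    async with async_session() as session:
        result = await session.execute(
            select(User).where(User.identifier == identifier)
        )
        return result.scalar_one_or_none()
```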


ManuelAngel99 commented Nov 22, 2024

Has anyone else noticed that queries run with poolclass=pool.NullPool are much slower than those executed using the PostgREST client?


GrigoryEvko commented Nov 22, 2024

Has anyone else noticed that queries run with poolclass=pool.NullPool are much slower than those executed using the postgREST client?

Yes, that's kind of expected. I had a long discussion in a Supabase support ticket with @TheOtherBrian1 about Supavisor; I hope he can help here as well.

Part of email from him:

=================================================================
Prepared statements are only supported via SQL commands (PREPARE/EXECUTE). However, most client libraries create them at the protocol level instead.

Unfortunately, Supavisor cannot accommodate prepared statements with most ORMs. I created a pull request to update Supavisor's repo documentation.

A complete explanation of pooling

You do not have to read this explanation, it's just to provide a mental model for how connections work in Postgres.

Establishing a new PostgreSQL connection is both slow and resource-intensive.

I recently created connection benchmarks for a presentation I'm working on. In the first scenario, connections were pre-established and persistent; in the second, a new connection was opened for each query.

Queries using pre-established connections averaged 21.65 ms to complete, while those that opened a new connection took an average of 707.213 ms.

This difference is understandable: establishing a new connection involves multiple networking requests and the forking of a new process.

SQLAlchemy and other ORMs create pools specifically to avoid the "first connection" penalty. This is known as application-side pooling.

Serverside poolers, such as Supavisor, serve a different purpose.

Postgres forks a new process to handle each connection. This makes it more crash-resistant - if a single process fails, the overall system will stay online. Sadly, maintaining them is resource-intensive, and, more importantly, imposes an orchestration cost.

At some point, the benefit of having more connections is dwarfed by management overhead. EnterpriseDB graphed how transactions per second begin to decline with excessive connections on an enterprise-sized server.

Because Postgres cannot handle an infinite amount of connections, serverside poolers were built to extract the most performance from the existing ones.

An app server may make a 20 ms query, then perform some calculations, before requesting an external API. All of this may take place over the course of 400 ms, but the database connection was left idle for 380 ms. Because the server didn't release it, no other server could have taken advantage of that finite but vital resource.

Supavisor, when in transaction mode, acts as a connection load balancer. It maintains a pool of connections and forwards incoming queries to whichever connection is available.

This is great for maximizing throughput, but Supavisor is a proxy. It adds network complexity and is another resource that has to be configured. If you do not need it, you shouldn't use it.

=================================================================

Honestly, I was very impressed to receive such a detailed explanation by email. I suggested several updates to the documentation; I hope it gets described somewhere publicly available, not only in my personal email!

@ManuelAngel99

Has anyone else noticed that queries run with poolclass=pool.NullPool are much slower than those executed using the PostgREST client?

Yes, that's kind of expected. I had a long discussion in a Supabase support ticket with @TheOtherBrian1 about Supavisor […]

Thank you so much for the reply @GrigoryEvko. I still don't get why using NullPool slows queries down so much, given that we are connecting to Supavisor rather than to Postgres directly. I may not have the complete picture, but since Supavisor keeps its own pool of ready-to-use connections to Postgres, I would expect that reconnecting to it for every query (when using NullPool) would not add much latency to the system, yet apparently it does.
