diff --git a/README.md b/README.md
index 53aba76..5fc9dd3 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ Want to contribute to the docs? See [CONTRIBUTING](CONTRIBUTING.md) for details
 ## Join the Community
-For questions or support, join us on the [ReadySet Community Discord](https://discord.gg/readyset), post questions on our [Github forum](https://github.com/readysettech/readyset/discussions), or schedule an [office hours chat](https://calendly.com/d/d5n-y44-mbg/office-hours-with-ready-set) with our team.
+For questions or support, join us on the [ReadySet Community Discord](https://discord.gg/readyset), post questions on our [GitHub forum](https://github.com/readysettech/readyset/discussions), or schedule an [office hours chat](https://calendly.com/d/d5n-y44-mbg/office-hours-with-ready-set) with our team.
 Everyone is welcome!
diff --git a/docs/concepts/dataflow.md b/docs/concepts/dataflow.md
index 943d090..4e0a719 100644
--- a/docs/concepts/dataflow.md
+++ b/docs/concepts/dataflow.md
@@ -1,10 +1,12 @@
 # ReadySet Concepts
+
 The heart of ReadySet is a query engine based on **partially-stateful, streaming dataflow**. What's that? Let's break it down. First, we'll take a look at the basics of **stateful, streaming dataflow**, then in a later section we'll consider how to improve memory overhead using **partial state**.
 ## Streaming dataflow
+
 The basic premise of [streaming dataflow](https://en.wikipedia.org/wiki/Stream_processing) is that a **series of operations** is applied to each element of a **stream** (a given sequence of data).
@@ -33,6 +35,7 @@ cache the final query results, and all non-leaf nodes effectively cache intermed
 ![High Level](../assets/high-level-graph.png)
 ## Putting it all together
+
 As writes are applied to your database, the resulting data changes are immediately replicated to ReadySet. ReadySet incrementally updates its cached query results to reflect these changes, thus replacing any hand-written cache eviction logic.
 When using ReadySet, you just write traditional SQL queries, and ReadySet will keep the results up-to-date for you.
diff --git a/docs/concepts/efficiency.md b/docs/concepts/efficiency.md
index 54becca..f14ff95 100644
--- a/docs/concepts/efficiency.md
+++ b/docs/concepts/efficiency.md
@@ -1,5 +1,5 @@
-
 # Memory Efficiency
+
 In the first section, we discussed the **stateful, streaming dataflow** model and how to use it to maintain cached state in real time. In this model, both reader and internal nodes of the graph store result sets. Without care, this could lead to an impractical memory footprint.
@@ -14,7 +14,7 @@ What does this look like in practice? Let's come back to the query in the prior
 SELECT id, author, title, url, vcount
 FROM stories
 JOIN (SELECT story_id, COUNT(*) AS vcount
-      FROM votes GROUP BY story_id)
+      FROM votes GROUP BY story_id)
 AS VoteCount
 ON VoteCount.story_id = stories.id WHERE stories.id = ?;
 ```
@@ -30,7 +30,6 @@ After this initial computation, ReadySet will keep those results up-to-date base
 For example, after the info for story `42` has been cached, if any users upvote that story, then ReadySet will increment the cached vote count for story `42` by `1` to reflect this data change.
-
 When ReadySet is initially deployed, the cache starts off cold and the dataflow graph is entirely empty. During the initial cache warming phase, most queries will be ones that ReadySet has never seen before (i.e., cache misses) so ReadySet will have to compute their results from scratch.
diff --git a/docs/concepts/example.md b/docs/concepts/example.md
index 90f7ff1..30b1980 100644
--- a/docs/concepts/example.md
+++ b/docs/concepts/example.md
@@ -1,7 +1,9 @@
 # Example: News Forum
+
 To illustrate these concepts, we will walk through an example of using ReadySet for a news forum application inspired by HackerNews.
 ## Schema
+
 First we define two tables to keep track of HackerNews stories and votes.
 ```sql
@@ -15,13 +17,13 @@ CREATE TABLE votes (user int, story_id int);
 ## Query
 Next, we'll write a query that computes the vote count for each story and joins the
-vote counts with other story metadata such as the author, title, and ID.
+vote counts with other story metadata such as the author, title, and ID.
 ```sql
 SELECT id, author, title, url, vcount
 FROM stories
 JOIN (SELECT story_id, COUNT(*) AS vcount
-      FROM votes GROUP BY story_id)
+      FROM votes GROUP BY story_id)
 AS VoteCount
 ON VoteCount.story_id = stories.id WHERE stories.id = ?;
 ```
diff --git a/docs/concepts/overview.md b/docs/concepts/overview.md
index 235b0e9..c4d2009 100644
--- a/docs/concepts/overview.md
+++ b/docs/concepts/overview.md
@@ -1,4 +1,5 @@
 # ReadySet Key Concepts
+
 ReadySet is a lightweight caching solution that turns even the most complex SQL reads into **lightning fast lookups** with **no extra code**.
 ReadySet slots between your application and database. It is wire-compatible with both MySQL and Postgres, so all you have to
@@ -8,11 +9,12 @@ queries are cached. Queries that aren't cached are proxied through ReadySet.
 ![Basic ReadySet Stack Diagram](../assets/rs_stack_diagram.png)
 ## How does ReadySet work under the hood?
+
 Imagine a basic online forum application with `posts`, `users`, and `upvotes`. A simple database schema for this application might look like:
 ![Example DB Schema](../assets/reddit_sql_schema.png)
-You can imagine a query like the one below, which returns all of the posts authored by a particular user:
+You can imagine a query like the one below, which returns all the posts authored by a particular user:
 ```sql
 SELECT
@@ -31,11 +33,12 @@ The graph for the query would look something like this:
 ![Example ReadySet Dataflow Graph](../assets/rs_example_dataflow.png)
-Once the graph is constructed, if a user queries all of the posts authored by user id 4, ReadySet has the results ready so reads can be performed with **no additional compute**.
+Once the graph is constructed, if a user queries all the posts authored by user id 4, ReadySet has the results ready, so reads can be performed with **no additional compute**.
 Results are therefore returned instantaneously, regardless of the size of your database.
 ## How does ReadySet handle more complex queries?
+
 One of the biggest advantages of this model is that latencies are not affected by query complexity. Let's take a look at a few more queries in this application:
 Here's a point query for an article:
@@ -68,8 +71,8 @@ With ReadySet, read performance is not impacted by the size of the base tables o
 ### Memory Overhead
-There's no free lunch– ReadySet trades off the cost of maintaining the dataflow graph in memory for excellent read performance. However, there are a few key ways we can mitigate this cost, such as
-through **partial materialization**. You can think of partial materialization as a demand-driven cache-filling mechanism. With it, only a subset of the query results are stored in memory
+
+There's no free lunch: ReadySet trades off the cost of maintaining the dataflow graph in memory for excellent read performance. However, there are a few key ways we can mitigate this cost, such as through **partial materialization**. You can think of partial materialization as a demand-driven cache-filling mechanism. With it, only a subset of the query results are stored in memory
 based on common input parameters to the query. For example, if a query is parameterized on user IDs, then ReadySet would only cache the results of that query for the active subset of users, since they are the ones issuing requests. Once ReadySet surpasses a developer-specified memory limit, cache entries are evicted from memory based on a specified eviction strategy (e.g., LRU).
@@ -80,6 +83,7 @@ won’t take up any memory real estate in your ReadySet cluster.
 ### No Strong Consistency
+
 ReadySet supports **eventual consistency**. There will be a small delay between when the write is issued and the cached result is updated in ReadySet to reflect that write.
@@ -87,9 +91,11 @@ ReadySet supports **eventual consistency**. There will be a small delay between
 ## Is ReadySet a good fit for my application?
+
 Like most caching solutions, ReadySet will have the greatest impact on read-heavy applications with non-uniform access patterns. Most web applications have a high read-to-write traffic ratio, and therefore fit the bill. ReadySet generally provides immediate performance improvements in these contexts.
 ## Can I try it?
+
 Yes! ReadySet is source-available under the BSL 1.1 license. Check out our [GitHub](https://github.com/readysettech/readyset) for more info.
diff --git a/docs/guides/cache/cache-queries.md b/docs/guides/cache/cache-queries.md
index e6ace5d..3350d7e 100644
--- a/docs/guides/cache/cache-queries.md
+++ b/docs/guides/cache/cache-queries.md
@@ -43,7 +43,7 @@ CREATE CACHE [ALWAYS] [] FROM ;
 - `` is optional. If a cache is not named, ReadySet automatically assigns an identifier.
 - `` is the full text of the query or the unique identifier assigned to the query by ReadySet, as seen in output of `SHOW PROXIED QUERIES`.
-- `ALWAYS` is optional. If the `CREATE CACHE` command is executed inside a transaction (e.g., due to an ORM), use `ALWAYS` to run the command against ReadySet; otherwise, the command will be proxied to the upstream database with the rest of the transaction.
+- `ALWAYS` is optional. If the `CREATE CACHE` command is executed inside a transaction (e.g., due to an ORM), use `ALWAYS` to run the command against ReadySet; otherwise, the command will be proxied to the upstream database with the rest of the transaction.
 ## View cached queries
diff --git a/docs/guides/cache/profile-queries.md b/docs/guides/cache/profile-queries.md
index 7ab84df..ab2f449 100644
--- a/docs/guides/cache/profile-queries.md
+++ b/docs/guides/cache/profile-queries.md
@@ -16,14 +16,14 @@ If you already have performance monitoring in place, use that tooling to identif
     SELECT calls, query FROM pg_stat_statements LIMIT 1;
     ```
-    If an error is returned, enable pg_stat_statments with the following command
+    If an error is returned, enable `pg_stat_statements` with the following command:
     ```sh
     CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
     ```
     !!! warning
-        In some environments, the pg_stat_statements extension may not be available. In that case, run `ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';` and restart your Postgres instance before re-running the `CREATE EXTENSION` command.
+        In some environments, the `pg_stat_statements` extension may not be available. In that case, run `ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';` and restart your Postgres instance before re-running the `CREATE EXTENSION` command.
 === "Using ReadySet metrics"
diff --git a/docs/guides/connect/new-app/python.md b/docs/guides/connect/new-app/python.md
index 6ae363b..03aa9c0 100644
--- a/docs/guides/connect/new-app/python.md
+++ b/docs/guides/connect/new-app/python.md
@@ -337,5 +337,4 @@ This page gives you examples for a few common Postgres drivers and ORMS for Pyth
 - [Learn how ReadySet works under the hood](/concepts/overview.md)
-- [Deploy with the ReadySet binary](/deploy/deploy-readyset-binary.md)
-
+- [Deploy with the ReadySet binary](/deploy/deploy-readyset-binary.md)
\ No newline at end of file
diff --git a/docs/guides/connect/new-app/ruby.md b/docs/guides/connect/new-app/ruby.md
index 8edc7c5..d9ceab2 100644
--- a/docs/guides/connect/new-app/ruby.md
+++ b/docs/guides/connect/new-app/ruby.md
@@ -299,5 +299,4 @@ This page gives you examples for a few common Postgres drivers and ORMS for Ruby
 - [Learn how ReadySet works under the hood](/concepts/overview.md)
-- [Deploy with the ReadySet binary](/deploy/deploy-readyset-binary.md)
-
+- [Deploy with the ReadySet binary](/deploy/deploy-readyset-binary.md)
\ No newline at end of file
diff --git a/docs/guides/deploy/production-notes.md b/docs/guides/deploy/production-notes.md
index 412df08..0b86019 100644
--- a/docs/guides/deploy/production-notes.md
+++ b/docs/guides/deploy/production-notes.md
@@ -79,7 +79,7 @@ The upstream database must be configured to allow ReadySet to connect to the dat
 ReadySet uses Postgres' logical replication feature to keep the cache up-to-date as the underlying database changes.
-- ReadySet must be connected to the primary database instance. ReadySet cannot work off an RDS [read replica](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html).
+- ReadySet must be connected to the primary database instance. ReadySet cannot work off an RDS [read replica](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html).
 - ReadySet does not support [row-level security](https://www.postgresql.org/docs/current/ddl-rowsecurity.html). Make sure any RLS policies are disabled.
@@ -93,11 +93,11 @@ The upstream database must be configured to allow ReadySet to connect to the dat
 - The [binary logging format](https://dev.mysql.com/doc/refman/5.7/en/binary-log-setting.html) must be set to `ROW`.
-- ReadySet must be connected to the primary database instance. ReadySet cannot work off a [read replica](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html).
+- ReadySet must be connected to the primary database instance. ReadySet cannot work off a [read replica](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html).
 ### Supabase
-- In Supabase, [replication](https://www.postgresql.org/docs/current/logical-replication.html) is already enabled. However, you must change the `postgres` user's permissions to `SUPERUSER` so that ReadySet can create a replication slot.
+- In Supabase, [replication](https://www.postgresql.org/docs/current/logical-replication.html) is already enabled. However, you must change the `postgres` user's permissions to `SUPERUSER` so that ReadySet can create a replication slot.
 - ReadySet does not support [row-level security](https://www.postgresql.org/docs/current/ddl-rowsecurity.html). Make sure any RLS policies are disabled.
diff --git a/docs/guides/intro/intro.md b/docs/guides/intro/intro.md
index 1b9c1c3..01909d8 100644
--- a/docs/guides/intro/intro.md
+++ b/docs/guides/intro/intro.md
@@ -47,7 +47,7 @@ To run through this process on a server, see [Deploy with binary](../deploy/depl
 ## How do you connect to ReadySet?
-Once you have a ReadySet instance up and running, the next step is to connect your application by swapping out your database connection string to point to ReadySet instead. The specifics of how to do this vary by database client library, ORM, and programming language. See [Connect an App](../connect/existing-app.md) for examples.
+Once you have a ReadySet instance up and running, the next step is to connect your application by swapping out your database connection string to point to ReadySet instead. The specifics of how to do this vary by database client library, ORM, and programming language. See [Connect an App](../connect/existing-app.md) for examples.
 ## When can you start caching queries?
@@ -69,4 +69,4 @@ To view a list of queries that are cached in ReadySet, connect a database SQL sh
 ## How do you stop caching a query?
-To stop caching a query in ReadySet, connect a database SQL shell and run the the custom [`DROP CACHE`](../cache/cache-queries.md#remove-cached-queries) SQL command.
+To stop caching a query in ReadySet, connect a database SQL shell and run the custom [`DROP CACHE`](../cache/cache-queries.md#remove-cached-queries) SQL command.
diff --git a/docs/guides/intro/quickstart.md b/docs/guides/intro/quickstart.md
index fde745a..8b48820 100644
--- a/docs/guides/intro/quickstart.md
+++ b/docs/guides/intro/quickstart.md
@@ -34,7 +34,7 @@ In this step, you'll use Docker Compose to start Postgres, load some sample data
     Compose then does the following:
     - Starts Postgres in a container called `db` and imports two tables from the [IMDb dataset](https://www.imdb.com/interfaces/).
-    - Starts ReadySet in a container called `cache`. For details about the CLI options used to start ReadySet, see the [CLI reference docs](../../reference/cli/readyset.md).
+    - Starts ReadySet in a container called `cache`. For details about the CLI options used to start ReadySet, see the [CLI reference docs](../../reference/cli/readyset.md).
     - Creates a container called `app` for running a sample Python app against ReadySet.
 ## Step 2. Check snapshotting
@@ -101,7 +101,7 @@ Snapshotting can take between a few minutes to several hours, depending on the s
 ## Step 3. Cache queries
-With snapshotting finished, ReadySet is ready for caching, so in this step, you'll get to know the dataset, run some queries, check if ReadySet supports them, and then cache them.
+With snapshotting finished, ReadySet is ready for caching, so in this step, you'll get to know the dataset, run some queries, check if ReadySet supports them, and then cache them.
 1. If necessary, reconnect the `psql` shell to ReadySet:
@@ -146,7 +146,7 @@ With snapshotting finished, ReadySet is ready for caching, so in this step, you'
       tconst   | averagerating | numvotes
     -----------+---------------+----------
      tt0093779 |           8.0 |   427192
-    (1 row)
+    (1 row)
     ```
 1. Run a query that joins results from `title_ratings` and `title_basics` to count how many titles released in 2000 have an average rating higher than 5:
@@ -196,8 +196,8 @@ With snapshotting finished, ReadySet is ready for caching, so in this step, you'
     WHERE title_basics.startyear = 2000 AND title_ratings.averagerating > 5;
     ```
-    !!! tip
-
+    !!! tip
+
         To cache a query, you can provide either the full `SELECT` (as shown here) or the query ID listed in the `SHOW PROXIED QUERIES` output.
     !!! note
diff --git a/docs/reference/sql-support.md b/docs/reference/sql-support.md
index ab38f7a..350a6bb 100644
--- a/docs/reference/sql-support.md
+++ b/docs/reference/sql-support.md
@@ -199,7 +199,7 @@ ReadySet supports the UTF-8 character set for strings and compares strings case-
 ### Writes
-All `INSERT`, `UPDATE`, and `DELETE` statements sent to ReadySet are proxied to the upstream database. ReadySet receives new/changed data via the database's replication stream and updates its snapshot and cache automatically.
+All `INSERT`, `UPDATE`, and `DELETE` statements sent to ReadySet are proxied to the upstream database. ReadySet receives new/changed data via the database's replication stream and updates its snapshot and cache automatically.
 ### Schema changes
@@ -304,7 +304,7 @@ But the following queries are not supported:
 SELECT * FROM t1 JOIN t2 ON t1.x = t1.y;
 ```
-``` sql
+``` sql
 -- This query doesn't compare using equality
 SELECT * FROM t1 JOIN t2 ON t1.x > t2.x;
@@ -403,13 +403,13 @@ ReadySet supports the following components of the SQL expression language:
     - `JSONB_SET_LAX()`
     - `JSONB_STRIP_NULLS()`
     - `JSONB_TYPEOF()`
     - `LEAST()`
     - `MONTH()`
     - `ROUND()`
     - `SPLIT_PART()`
     - `SUBSTR()` and `SUBSTRING()`
     - `TIMEDIFF()`
-- Aggregate functions (see [Aggregations](#aggregations))
+- Aggregate functions (see [Aggregations](#aggregations))
 ReadySet does not support the following components of the SQL expression language (this is not an exhaustive list):
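The concept pages changed in this diff (`dataflow.md`, `efficiency.md`, `overview.md`) describe partially-stateful dataflow: a result is computed from scratch on a cache miss, kept up to date incrementally as replicated writes arrive, and evicted once a memory limit is surpassed (e.g., LRU). As an illustration only, the Python sketch below models that behaviour for the per-story vote count in the news-forum example. It is not ReadySet's implementation: the class `PartialVoteCountCache`, its methods, and the use of an entry count as a stand-in for the memory limit are all hypothetical.

```python
from collections import OrderedDict


class PartialVoteCountCache:
    """Hypothetical sketch of partial, incrementally maintained state.

    Caches the vote count (vcount) per story_id. Entries are filled on demand
    (cache miss), updated in place by replicated writes, and evicted in LRU
    order once more than `capacity` entries are held. Not ReadySet's code.
    """

    def __init__(self, votes, capacity):
        self.votes = list(votes)     # base table rows: (user, story_id)
        self.capacity = capacity     # assumed stand-in for a memory limit
        self.state = OrderedDict()   # story_id -> vcount, least recent first
        self.misses = 0

    def _compute(self, story_id):
        # Cache miss: compute the result from scratch over the base table,
        # like the COUNT(*) ... GROUP BY story_id subquery for one story.
        return sum(1 for _, sid in self.votes if sid == story_id)

    def read(self, story_id):
        if story_id in self.state:
            self.state.move_to_end(story_id)  # mark as most recently used
            return self.state[story_id]
        self.misses += 1
        count = self._compute(story_id)
        self.state[story_id] = count
        if len(self.state) > self.capacity:
            self.state.popitem(last=False)    # evict least recently used key
        return count

    def on_vote(self, user, story_id):
        # A new vote arrives via replication: record it in the base table,
        # then apply the +1 delta only if this story's count is materialized.
        self.votes.append((user, story_id))
        if story_id in self.state:
            self.state[story_id] += 1


if __name__ == "__main__":
    cache = PartialVoteCountCache(votes=[(1, 42), (2, 42), (3, 7)], capacity=2)
    print(cache.read(42))  # 2 (miss: computed from scratch)
    cache.on_vote(4, 42)   # incremental update of the cached entry
    print(cache.read(42))  # 3 (hit: no recomputation)
```

In this sketch, `on_vote` drops the delta for stories that are not materialized; that is what keeps the state partial, because the next read for such a story recomputes its count from the base table. Deletes (e.g., removed votes) and concurrent writes are ignored here for brevity.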