Documentation: Refactor "SQLAlchemy: Getting started" section

The previous version was a bit of a potpourri of different topics. From that mix, two new documents have been created. One is about "advanced querying", including aggregations and full-text capabilities, and the other one exercises operations on the CrateDB container and geospatial special types concisely.
crate · Dec 20, 2022 · 6090d53 · 6090d53
1 parent 88b50e9
commit 6090d53
Show file tree

Hide file tree

Showing 6 changed files with 586 additions and 376 deletions.
diff --git a/docs/by-example/index.rst b/docs/by-example/index.rst
@@ -43,6 +43,8 @@ its corresponding API interfaces, see also :ref:`sqlalchemy-support`.
 
     sqlalchemy/getting-started
     sqlalchemy/crud
+    sqlalchemy/working-with-types
+    sqlalchemy/advanced-querying
     sqlalchemy/inspection-reflection
 
 

diff --git a/docs/by-example/sqlalchemy/advanced-querying.rst b/docs/by-example/sqlalchemy/advanced-querying.rst
@@ -0,0 +1,267 @@
+.. _sqlalchemy-advanced-querying:
+
+=============================
+SQLAlchemy: Advanced querying
+=============================
+
+This section of the documentation demonstrates running queries with
+aggregations, and queries using a fulltext index with analyzer, both
+using the CrateDB SQLAlchemy dialect.
+
+
+.. rubric:: Table of Contents
+
+.. contents::
+   :local:
+
+
+Introduction
+============
+
+Import the relevant symbols:
+
+    >>> import sqlalchemy as sa
+    >>> from sqlalchemy.ext.declarative import declarative_base
+    >>> from sqlalchemy.orm import sessionmaker
+    >>> from uuid import uuid4
+
+Establish a connection to the database, see also :ref:`sa:engines_toplevel`
+and :ref:`connect`:
+
+    >>> engine = sa.create_engine(f"crate://{crate_host}")
+    >>> connection = engine.connect()
+
+Create an SQLAlchemy :doc:`Session <sa:orm/session_basics>`:
+
+    >>> session = sessionmaker(bind=engine)()
+    >>> Base = declarative_base()
+
+
+Introduction to fulltext indexes
+================================
+
+:ref:`crate-reference:fulltext-indices` take the contents of one or more fields
+and split it up into tokens that are used for fulltext-search. The transformation
+from a text to separate tokens is done by an analyzer. In order to create
+fulltext search queries a :ref:`fulltext index with an analyzer
+<crate-reference:sql_ddl_index_fulltext>` must be defined for the related columns.
+
+In order to support fulltext query operations, the CrateDB SQLAlchemy dialect
+provides the :ref:`crate-reference:predicates_match` through its ``match`` function.
+
+For exercising those features, let's define a schema using SQLAlchemy's
+:ref:`sa:orm_declarative_mapping`:
+
+    >>> def gen_key():
+    ...     return str(uuid4())
+
+    >>> class Character(Base):
+    ...     __tablename__ = 'characters'
+    ...     id = sa.Column(sa.String, primary_key=True, default=gen_key)
+    ...     name = sa.Column(sa.String)
+    ...     quote = sa.Column(sa.String)
+    ...     name_ft = sa.Column(sa.String)
+    ...     quote_ft = sa.Column(sa.String)
+    ...     __mapper_args__ = {
+    ...         'exclude_properties': ['name_ft', 'quote_ft']
+    ...     }
+
+Please note that the schema defined above is provisioned to the database using
+the following SQL DDL statement.
+
+.. code-block:: sql
+
+    CREATE TABLE characters (
+        id STRING PRIMARY KEY,
+        name STRING,
+        quote STRING,
+        INDEX name_ft USING fulltext(name) WITH (analyzer = 'english'),
+        INDEX quote_ft USING fulltext(quote) WITH (analyzer = 'english')
+    )
+
+.. note::
+
+    Currently, it is not supported to define a *named index column definition*
+    using a :ref:`crate-reference:sql_ddl_index_fulltext` on behalf of the
+    SQLAlchemy declarative schema.
+
+Let's add two records we use for testing.
+
+    >>> arthur = Character(name='Arthur Dent')
+    >>> arthur.quote = "Let's go somewhere."
+    >>> session.add(arthur)
+
+    >>> trillian = Character(name='Tricia McMillan')
+    >>> trillian.quote = "We're on a space ship Arthur. In space."
+    >>> session.add(trillian)
+
+    >>> session.commit()
+
+After ``INSERT`` statements are submitted to the database, the newly inserted
+records aren't immediately available for retrieval because the index is only
+updated periodically (default: each second). In order to synchronize that,
+refresh the table:
+
+    >>> _ = connection.execute(sa.text("REFRESH TABLE characters"))
+
+
+Aggregates: Counting and grouping
+=================================
+
+SQLAlchemy supports different approaches to issue a query with a count
+aggregate function. Take a look at the `count result rows`_ documentation
+for a full overview.
+
+CrateDB currently does not support all variants as it can not handle the
+sub-queries yet.
+
+This means that queries using ``count()`` have to be written in one of the
+following ways:
+
+    >>> session.query(sa.func.count(Character.id)).scalar()
+    2
+
+    >>> session.query(sa.func.count('*')).select_from(Character).scalar()
+    2
+
+Using the ``group_by`` clause is similar:
+
+    >>> session.query(sa.func.count(Character.id), Character.name) \
+    ...     .group_by(Character.name) \
+    ...     .order_by(sa.desc(sa.func.count(Character.id))) \
+    ...     .order_by(Character.name).all()
+    [(1, 'Arthur Dent'), (1, 'Tricia McMillan')]
+
+
+Fulltext search with MATCH predicate
+====================================
+
+Fulltext search in CrateDB is performed using the :ref:`crate-reference:predicates_match`.
+The CrateDB SQLAlchemy dialect comes with a ``match`` function, which can be used to
+search on one or multiple fields.
+
+    >>> from crate.client.sqlalchemy.predicates import match
+
+    >>> session.query(Character.name) \
+    ...     .filter(match(Character.name_ft, 'Arthur')) \
+    ...     .all()
+    [('Arthur Dent',)]
+
+To get the relevance of a matching row, an internal system column ``_score``
+can be selected. It is a numeric value which is relative to the other rows.
+The higher the score value, the more relevant the row.
+
+In most cases, ``_score`` is not part of the SQLAlchemy table definition,
+so it must be passed as a string:
+
+    >>> session.query(Character.name, sa.literal_column('_score')) \
+    ...     .filter(match(Character.quote_ft, 'space')) \
+    ...     .all()
+    [('Tricia McMillan', ...)]
+
+To search on multiple columns, pass a dictionary with columns and ``boost``
+attached. ``boost`` is a factor that increases the relevance of a column in
+respect to the other columns:
+
+    >>> session.query(Character.name) \
+    ...           .filter(match({Character.name_ft: 1.5, Character.quote_ft: 0.1},
+    ...                         'Arthur')) \
+    ...     .order_by(sa.desc(sa.literal_column('_score'))) \
+    ...     .all()
+    [('Arthur Dent',), ('Tricia McMillan',)]
+
+The ``match_type`` argument determines how a single ``query_term`` is applied
+and how the resulting ``_score`` is computed. Thus, it influences which
+documents are considered more relevant. The default selection is ``best_fields``.
+For more information, see :ref:`crate-reference:predicates_match_types`.
+
+Results are ordered by ``_score`` by default, but can be overridden by adding
+an ``order_by()`` clause.
+
+    >>> session.query(Character.name) \
+    ...     .filter(
+    ...         match(Character.name_ft, 'Arth',
+    ...                 match_type='phrase',
+    ...                 options={'fuzziness': 3})
+    ...     ) \
+    ...     .all()
+    [('Arthur Dent',)]
+
+It is not possible to specify options without the ``match_type`` argument:
+
+    >>> session.query(Character.name) \
+    ...     .filter(
+    ...         match(Character.name_ft, 'Arth',
+    ...                 options={'fuzziness': 3})
+    ...     ) \
+    ...     .all()
+    Traceback (most recent call last):
+    ValueError: missing match_type. It's not allowed to specify options without match_type
+
+
+``INSERT...FROM SELECT``
+========================
+
+In SQLAlchemy, the ``insert().from_select()`` function returns a new ``Insert``
+construct, which represents an ``INSERT...FROM SELECT`` statement. This
+functionality is supported by the CrateDB client library. Here is an example
+that uses ``insert().from_select()``.
+
+First, let's define and create the tables:
+
+    >>> from sqlalchemy import select, insert
+
+    >>> class Todos(Base):
+    ...     __tablename__ = 'todos'
+    ...     __table_args__ = {
+    ...         'crate_number_of_replicas': '0'
+    ...     }
+    ...     id = sa.Column(sa.String, primary_key=True, default=gen_key)
+    ...     content = sa.Column(sa.String)
+    ...     status = sa.Column(sa.String)
+
+    >>> class ArchivedTasks(Base):
+    ...     __tablename__ = 'archived_tasks'
+    ...     __table_args__ = {
+    ...         'crate_number_of_replicas': '0'
+    ...     }
+    ...     id = sa.Column(sa.String, primary_key=True)
+    ...     content = sa.Column(sa.String)
+
+    >>> Base.metadata.create_all(bind=engine)
+
+Let's add a task to the ``Todo`` table:
+
+    >>> task = Todos(content='Write Tests', status='done')
+    >>> session.add(task)
+    >>> session.commit()
+    >>> _ = connection.execute(sa.text("REFRESH TABLE todos"))
+
+Now, let's use ``insert().from_select()`` to archive the task into the
+``ArchivedTasks`` table:
+
+    >>> sel = select([Todos.id, Todos.content]).where(Todos.status == "done")
+    >>> ins = insert(ArchivedTasks).from_select(['id','content'], sel)
+    >>> result = session.execute(ins)
+    >>> session.commit()
+
+This will emit the following ``INSERT`` statement to the database::
+
+    INSERT INTO archived_tasks (id, content)
+        (SELECT todos.id, todos.content FROM todos WHERE todos.status = 'done')
+
+Now, verify that the data is present in the database:
+
+    >>> _ = connection.execute(sa.text("REFRESH TABLE archived_tasks"))
+    >>> pprint([str(r) for r in session.execute("SELECT content FROM archived_tasks")])
+    ["('Write Tests',)"]
+
+
+.. hidden: Disconnect from database
+
+    >>> session.close()
+    >>> connection.close()
+    >>> engine.dispose()
+
+
+.. _count result rows: http://docs.sqlalchemy.org/en/14/orm/tutorial.html#counting
diff --git a/docs/by-example/sqlalchemy/crud.rst b/docs/by-example/sqlalchemy/crud.rst
@@ -253,30 +253,10 @@ Reset the record:
     >>> location.details
     []
 
-Update nested dictionary:
-
-    >>> from crate.client.sqlalchemy.types import Craty
-    >>> class Character(Base):
-    ...     __tablename__ = 'characters'
-    ...     id = sa.Column(sa.String, primary_key=True)
-    ...     details = sa.Column(Craty)
-    >>> char = Character(id='1234id')
-    >>> char.details = {"name": {"first": "Arthur", "last": "Dent"}}
-    >>> session.add(char)
-    >>> session.commit()
-
-    >>> char = session.query(Character).filter_by(id='1234id').one()
-    >>> char.details['name']['first'] = 'Trillian'
-    >>> char.details['size'] = 45
-    >>> session.commit()
-
-Refresh "characters" table:
-
-    >>> _ = connection.execute("REFRESH TABLE characters")
+.. seealso::
 
-    >>> session.refresh(char)
-    >>> pprint(char.details)
-    {'name': {'first': 'Trillian', 'last': 'Dent'}, 'size': 45}
+    The documentation section :ref:`sqlalchemy-working-with-types` has more
+    details about this topic.
 
 
 Delete