Skip to content

Commit

Permalink
Documentation: Refactor "SQLAlchemy: Getting started" section
Browse files Browse the repository at this point in the history
The previous version was a bit of a potpourri of different topics. From
that mix, two new documents have been created. One is about "advanced 
querying", including aggregations and full-text capabilities, and the
other one exercises operations on the CrateDB container and geospatial
special types concisely.
  • Loading branch information
amotl committed Dec 20, 2022
1 parent 88b50e9 commit 55cac25
Show file tree
Hide file tree
Showing 9 changed files with 670 additions and 450 deletions.
7 changes: 2 additions & 5 deletions docs/by-example/blob.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@
Blob container API
==================

The connection object provides a convenience API for easy access to `blob
tables`_.
The connection object provides a convenience API for easy access to
:ref:`blob tables <crate-reference:blob_support>`.


Get blob container handle
Expand Down Expand Up @@ -98,6 +98,3 @@ Close connection
Close the connection to clear the connection pool::

>>> client.close()


.. _blob tables: https://crate.io/docs/crate/reference/en/latest/general/blobs.html
11 changes: 6 additions & 5 deletions docs/by-example/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,8 @@ DBAPI, HTTP, and BLOB interfaces

The examples in this section are all about CrateDB's `Python DBAPI`_ interface,
the plain HTTP API interface, and a convenience interface for working with
`blob tables`_. It details attributes, methods, and behaviors of the
``Connection`` and ``Cursor`` objects.
:ref:`blob tables <crate-reference:blob_support>`. It details attributes,
methods, and behaviors of the ``Connection`` and ``Cursor`` objects.

.. toctree::
:maxdepth: 1
Expand All @@ -32,8 +32,8 @@ the plain HTTP API interface, and a convenience interface for working with

.. _sqlalchemy-by-example:

SQLAlchemy interface
====================
SQLAlchemy by example
=====================

The examples in this section are all about CrateDB's `SQLAlchemy`_ dialect, and
its corresponding API interfaces, see also :ref:`sqlalchemy-support`.
Expand All @@ -43,9 +43,10 @@ its corresponding API interfaces, see also :ref:`sqlalchemy-support`.

sqlalchemy/getting-started
sqlalchemy/crud
sqlalchemy/working-with-types
sqlalchemy/advanced-querying
sqlalchemy/inspection-reflection


.. _blob tables: https://crate.io/docs/crate/reference/en/latest/general/blobs.html
.. _Python DBAPI: https://peps.python.org/pep-0249/
.. _SQLAlchemy: https://www.sqlalchemy.org/
267 changes: 267 additions & 0 deletions docs/by-example/sqlalchemy/advanced-querying.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,267 @@
.. _sqlalchemy-advanced-querying:

=============================
SQLAlchemy: Advanced querying
=============================

This section of the documentation demonstrates running queries with
aggregations, and queries using a fulltext index with analyzer, both
using the CrateDB SQLAlchemy dialect.


.. rubric:: Table of Contents

.. contents::
:local:


Introduction
============

Import the relevant symbols:

>>> import sqlalchemy as sa
>>> from sqlalchemy.ext.declarative import declarative_base
>>> from sqlalchemy.orm import sessionmaker
>>> from uuid import uuid4

Establish a connection to the database, see also :ref:`sa:engines_toplevel`
and :ref:`connect`:

>>> engine = sa.create_engine(f"crate://{crate_host}")
>>> connection = engine.connect()

Create an SQLAlchemy :doc:`Session <sa:orm/session_basics>`:

>>> session = sessionmaker(bind=engine)()
>>> Base = declarative_base()


Introduction to fulltext indexes
================================

:ref:`crate-reference:fulltext-indices` take the contents of one or more fields
and split it up into tokens that are used for fulltext-search. The transformation
from a text to separate tokens is done by an analyzer. In order to create
fulltext search queries a :ref:`fulltext index with an analyzer
<crate-reference:sql_ddl_index_fulltext>` must be defined for the related columns.

In order to support fulltext query operations, the CrateDB SQLAlchemy dialect
provides the :ref:`crate-reference:predicates_match` through its ``match`` function.

For exercising those features, let's define a schema using SQLAlchemy's
:ref:`sa:orm_declarative_mapping`:

>>> def gen_key():
... return str(uuid4())

>>> class Character(Base):
... __tablename__ = 'characters'
... id = sa.Column(sa.String, primary_key=True, default=gen_key)
... name = sa.Column(sa.String)
... quote = sa.Column(sa.String)
... name_ft = sa.Column(sa.String)
... quote_ft = sa.Column(sa.String)
... __mapper_args__ = {
... 'exclude_properties': ['name_ft', 'quote_ft']
... }

Please note that the schema defined above is provisioned to the database using
the following SQL DDL statement.

.. code-block:: sql
CREATE TABLE characters (
id STRING PRIMARY KEY,
name STRING,
quote STRING,
INDEX name_ft USING fulltext(name) WITH (analyzer = 'english'),
INDEX quote_ft USING fulltext(quote) WITH (analyzer = 'english')
)
.. note::

Currently, it is not supported to define a *named index column definition*
using a :ref:`crate-reference:sql_ddl_index_fulltext` on behalf of the
SQLAlchemy declarative schema.

Let's add two records we use for testing.

>>> arthur = Character(name='Arthur Dent')
>>> arthur.quote = "Let's go somewhere."
>>> session.add(arthur)

>>> trillian = Character(name='Tricia McMillan')
>>> trillian.quote = "We're on a space ship Arthur. In space."
>>> session.add(trillian)

>>> session.commit()

After ``INSERT`` statements are submitted to the database, the newly inserted
records aren't immediately available for retrieval because the index is only
updated periodically (default: each second). In order to synchronize that,
refresh the table:

>>> _ = connection.execute(sa.text("REFRESH TABLE characters"))


Aggregates: Counting and grouping
=================================

SQLAlchemy supports different approaches to issue a query with a count
aggregate function. Take a look at the `count result rows`_ documentation
for a full overview.

CrateDB currently does not support all variants as it can not handle the
sub-queries yet.

This means that queries using ``count()`` have to be written in one of the
following ways:

>>> session.query(sa.func.count(Character.id)).scalar()
2

>>> session.query(sa.func.count('*')).select_from(Character).scalar()
2

Using the ``group_by`` clause is similar:

>>> session.query(sa.func.count(Character.id), Character.name) \
... .group_by(Character.name) \
... .order_by(sa.desc(sa.func.count(Character.id))) \
... .order_by(Character.name).all()
[(1, 'Arthur Dent'), (1, 'Tricia McMillan')]


Fulltext search with MATCH predicate
====================================

Fulltext search in CrateDB is performed using the :ref:`crate-reference:predicates_match`.
The CrateDB SQLAlchemy dialect comes with a ``match`` function, which can be used to
search on one or multiple fields.

>>> from crate.client.sqlalchemy.predicates import match

>>> session.query(Character.name) \
... .filter(match(Character.name_ft, 'Arthur')) \
... .all()
[('Arthur Dent',)]

To get the relevance of a matching row, an internal system column ``_score``
can be selected. It is a numeric value which is relative to the other rows.
The higher the score value, the more relevant the row.

In most cases, ``_score`` is not part of the SQLAlchemy table definition,
so it must be passed as a string:

>>> session.query(Character.name, sa.literal_column('_score')) \
... .filter(match(Character.quote_ft, 'space')) \
... .all()
[('Tricia McMillan', ...)]

To search on multiple columns, pass a dictionary with columns and ``boost``
attached. ``boost`` is a factor that increases the relevance of a column in
respect to the other columns:

>>> session.query(Character.name) \
... .filter(match({Character.name_ft: 1.5, Character.quote_ft: 0.1},
... 'Arthur')) \
... .order_by(sa.desc(sa.literal_column('_score'))) \
... .all()
[('Arthur Dent',), ('Tricia McMillan',)]

The ``match_type`` argument determines how a single ``query_term`` is applied
and how the resulting ``_score`` is computed. Thus, it influences which
documents are considered more relevant. The default selection is ``best_fields``.
For more information, see :ref:`crate-reference:predicates_match_types`.

Results are ordered by ``_score`` by default, but can be overridden by adding
an ``order_by()`` clause.

>>> session.query(Character.name) \
... .filter(
... match(Character.name_ft, 'Arth',
... match_type='phrase',
... options={'fuzziness': 3})
... ) \
... .all()
[('Arthur Dent',)]

It is not possible to specify options without the ``match_type`` argument:

>>> session.query(Character.name) \
... .filter(
... match(Character.name_ft, 'Arth',
... options={'fuzziness': 3})
... ) \
... .all()
Traceback (most recent call last):
ValueError: missing match_type. It's not allowed to specify options without match_type


``INSERT...FROM SELECT``
========================

In SQLAlchemy, the ``insert().from_select()`` function returns a new ``Insert``
construct, which represents an ``INSERT...FROM SELECT`` statement. This
functionality is supported by the CrateDB client library. Here is an example
that uses ``insert().from_select()``.

First, let's define and create the tables:

>>> from sqlalchemy import select, insert

>>> class Todos(Base):
... __tablename__ = 'todos'
... __table_args__ = {
... 'crate_number_of_replicas': '0'
... }
... id = sa.Column(sa.String, primary_key=True, default=gen_key)
... content = sa.Column(sa.String)
... status = sa.Column(sa.String)

>>> class ArchivedTasks(Base):
... __tablename__ = 'archived_tasks'
... __table_args__ = {
... 'crate_number_of_replicas': '0'
... }
... id = sa.Column(sa.String, primary_key=True)
... content = sa.Column(sa.String)

>>> Base.metadata.create_all(bind=engine)

Let's add a task to the ``Todo`` table:

>>> task = Todos(content='Write Tests', status='done')
>>> session.add(task)
>>> session.commit()
>>> _ = connection.execute(sa.text("REFRESH TABLE todos"))

Now, let's use ``insert().from_select()`` to archive the task into the
``ArchivedTasks`` table:

>>> sel = select([Todos.id, Todos.content]).where(Todos.status == "done")
>>> ins = insert(ArchivedTasks).from_select(['id','content'], sel)
>>> result = session.execute(ins)
>>> session.commit()

This will emit the following ``INSERT`` statement to the database::

INSERT INTO archived_tasks (id, content)
(SELECT todos.id, todos.content FROM todos WHERE todos.status = 'done')

Now, verify that the data is present in the database:

>>> _ = connection.execute(sa.text("REFRESH TABLE archived_tasks"))
>>> pprint([str(r) for r in session.execute("SELECT content FROM archived_tasks")])
["('Write Tests',)"]


.. hidden: Disconnect from database
>>> session.close()
>>> connection.close()
>>> engine.dispose()
.. _count result rows: http://docs.sqlalchemy.org/en/14/orm/tutorial.html#counting
Loading

0 comments on commit 55cac25

Please sign in to comment.