-
Notifications
You must be signed in to change notification settings - Fork 31
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Documentation: Refactor "SQLAlchemy: Getting started" section
The previous version was a bit of a potpourri of different topics. From that mix, two new documents have been created. One is about "advanced querying", including aggregations and full-text capabilities, and the other one exercises operations on the CrateDB container and geospatial special types concisely.
- Loading branch information
Showing
9 changed files
with
670 additions
and
450 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,267 @@ | ||
.. _sqlalchemy-advanced-querying: | ||
|
||
============================= | ||
SQLAlchemy: Advanced querying | ||
============================= | ||
|
||
This section of the documentation demonstrates running queries with | ||
aggregations, and queries using a fulltext index with analyzer, both | ||
using the CrateDB SQLAlchemy dialect. | ||
|
||
|
||
.. rubric:: Table of Contents | ||
|
||
.. contents:: | ||
:local: | ||
|
||
|
||
Introduction | ||
============ | ||
|
||
Import the relevant symbols: | ||
|
||
>>> import sqlalchemy as sa | ||
>>> from sqlalchemy.ext.declarative import declarative_base | ||
>>> from sqlalchemy.orm import sessionmaker | ||
>>> from uuid import uuid4 | ||
|
||
Establish a connection to the database, see also :ref:`sa:engines_toplevel` | ||
and :ref:`connect`: | ||
|
||
>>> engine = sa.create_engine(f"crate://{crate_host}") | ||
>>> connection = engine.connect() | ||
|
||
Create an SQLAlchemy :doc:`Session <sa:orm/session_basics>`: | ||
|
||
>>> session = sessionmaker(bind=engine)() | ||
>>> Base = declarative_base() | ||
|
||
|
||
Introduction to fulltext indexes | ||
================================ | ||
|
||
:ref:`crate-reference:fulltext-indices` take the contents of one or more fields | ||
and split it up into tokens that are used for fulltext-search. The transformation | ||
from a text to separate tokens is done by an analyzer. In order to create | ||
fulltext search queries a :ref:`fulltext index with an analyzer | ||
<crate-reference:sql_ddl_index_fulltext>` must be defined for the related columns. | ||
|
||
In order to support fulltext query operations, the CrateDB SQLAlchemy dialect | ||
provides the :ref:`crate-reference:predicates_match` through its ``match`` function. | ||
|
||
For exercising those features, let's define a schema using SQLAlchemy's | ||
:ref:`sa:orm_declarative_mapping`: | ||
|
||
>>> def gen_key(): | ||
... return str(uuid4()) | ||
|
||
>>> class Character(Base): | ||
... __tablename__ = 'characters' | ||
... id = sa.Column(sa.String, primary_key=True, default=gen_key) | ||
... name = sa.Column(sa.String) | ||
... quote = sa.Column(sa.String) | ||
... name_ft = sa.Column(sa.String) | ||
... quote_ft = sa.Column(sa.String) | ||
... __mapper_args__ = { | ||
... 'exclude_properties': ['name_ft', 'quote_ft'] | ||
... } | ||
|
||
Please note that the schema defined above is provisioned to the database using | ||
the following SQL DDL statement. | ||
|
||
.. code-block:: sql | ||
CREATE TABLE characters ( | ||
id STRING PRIMARY KEY, | ||
name STRING, | ||
quote STRING, | ||
INDEX name_ft USING fulltext(name) WITH (analyzer = 'english'), | ||
INDEX quote_ft USING fulltext(quote) WITH (analyzer = 'english') | ||
) | ||
.. note:: | ||
|
||
Currently, it is not supported to define a *named index column definition* | ||
using a :ref:`crate-reference:sql_ddl_index_fulltext` on behalf of the | ||
SQLAlchemy declarative schema. | ||
|
||
Let's add two records we use for testing. | ||
|
||
>>> arthur = Character(name='Arthur Dent') | ||
>>> arthur.quote = "Let's go somewhere." | ||
>>> session.add(arthur) | ||
|
||
>>> trillian = Character(name='Tricia McMillan') | ||
>>> trillian.quote = "We're on a space ship Arthur. In space." | ||
>>> session.add(trillian) | ||
|
||
>>> session.commit() | ||
|
||
After ``INSERT`` statements are submitted to the database, the newly inserted | ||
records aren't immediately available for retrieval because the index is only | ||
updated periodically (default: each second). In order to synchronize that, | ||
refresh the table: | ||
|
||
>>> _ = connection.execute(sa.text("REFRESH TABLE characters")) | ||
|
||
|
||
Aggregates: Counting and grouping | ||
================================= | ||
|
||
SQLAlchemy supports different approaches to issue a query with a count | ||
aggregate function. Take a look at the `count result rows`_ documentation | ||
for a full overview. | ||
|
||
CrateDB currently does not support all variants as it can not handle the | ||
sub-queries yet. | ||
|
||
This means that queries using ``count()`` have to be written in one of the | ||
following ways: | ||
|
||
>>> session.query(sa.func.count(Character.id)).scalar() | ||
2 | ||
|
||
>>> session.query(sa.func.count('*')).select_from(Character).scalar() | ||
2 | ||
|
||
Using the ``group_by`` clause is similar: | ||
|
||
>>> session.query(sa.func.count(Character.id), Character.name) \ | ||
... .group_by(Character.name) \ | ||
... .order_by(sa.desc(sa.func.count(Character.id))) \ | ||
... .order_by(Character.name).all() | ||
[(1, 'Arthur Dent'), (1, 'Tricia McMillan')] | ||
|
||
|
||
Fulltext search with MATCH predicate | ||
==================================== | ||
|
||
Fulltext search in CrateDB is performed using the :ref:`crate-reference:predicates_match`. | ||
The CrateDB SQLAlchemy dialect comes with a ``match`` function, which can be used to | ||
search on one or multiple fields. | ||
|
||
>>> from crate.client.sqlalchemy.predicates import match | ||
|
||
>>> session.query(Character.name) \ | ||
... .filter(match(Character.name_ft, 'Arthur')) \ | ||
... .all() | ||
[('Arthur Dent',)] | ||
|
||
To get the relevance of a matching row, an internal system column ``_score`` | ||
can be selected. It is a numeric value which is relative to the other rows. | ||
The higher the score value, the more relevant the row. | ||
|
||
In most cases, ``_score`` is not part of the SQLAlchemy table definition, | ||
so it must be passed as a string: | ||
|
||
>>> session.query(Character.name, sa.literal_column('_score')) \ | ||
... .filter(match(Character.quote_ft, 'space')) \ | ||
... .all() | ||
[('Tricia McMillan', ...)] | ||
|
||
To search on multiple columns, pass a dictionary with columns and ``boost`` | ||
attached. ``boost`` is a factor that increases the relevance of a column in | ||
respect to the other columns: | ||
|
||
>>> session.query(Character.name) \ | ||
... .filter(match({Character.name_ft: 1.5, Character.quote_ft: 0.1}, | ||
... 'Arthur')) \ | ||
... .order_by(sa.desc(sa.literal_column('_score'))) \ | ||
... .all() | ||
[('Arthur Dent',), ('Tricia McMillan',)] | ||
|
||
The ``match_type`` argument determines how a single ``query_term`` is applied | ||
and how the resulting ``_score`` is computed. Thus, it influences which | ||
documents are considered more relevant. The default selection is ``best_fields``. | ||
For more information, see :ref:`crate-reference:predicates_match_types`. | ||
|
||
Results are ordered by ``_score`` by default, but can be overridden by adding | ||
an ``order_by()`` clause. | ||
|
||
>>> session.query(Character.name) \ | ||
... .filter( | ||
... match(Character.name_ft, 'Arth', | ||
... match_type='phrase', | ||
... options={'fuzziness': 3}) | ||
... ) \ | ||
... .all() | ||
[('Arthur Dent',)] | ||
|
||
It is not possible to specify options without the ``match_type`` argument: | ||
|
||
>>> session.query(Character.name) \ | ||
... .filter( | ||
... match(Character.name_ft, 'Arth', | ||
... options={'fuzziness': 3}) | ||
... ) \ | ||
... .all() | ||
Traceback (most recent call last): | ||
ValueError: missing match_type. It's not allowed to specify options without match_type | ||
|
||
|
||
``INSERT...FROM SELECT`` | ||
======================== | ||
|
||
In SQLAlchemy, the ``insert().from_select()`` function returns a new ``Insert`` | ||
construct, which represents an ``INSERT...FROM SELECT`` statement. This | ||
functionality is supported by the CrateDB client library. Here is an example | ||
that uses ``insert().from_select()``. | ||
|
||
First, let's define and create the tables: | ||
|
||
>>> from sqlalchemy import select, insert | ||
|
||
>>> class Todos(Base): | ||
... __tablename__ = 'todos' | ||
... __table_args__ = { | ||
... 'crate_number_of_replicas': '0' | ||
... } | ||
... id = sa.Column(sa.String, primary_key=True, default=gen_key) | ||
... content = sa.Column(sa.String) | ||
... status = sa.Column(sa.String) | ||
|
||
>>> class ArchivedTasks(Base): | ||
... __tablename__ = 'archived_tasks' | ||
... __table_args__ = { | ||
... 'crate_number_of_replicas': '0' | ||
... } | ||
... id = sa.Column(sa.String, primary_key=True) | ||
... content = sa.Column(sa.String) | ||
|
||
>>> Base.metadata.create_all(bind=engine) | ||
|
||
Let's add a task to the ``Todo`` table: | ||
|
||
>>> task = Todos(content='Write Tests', status='done') | ||
>>> session.add(task) | ||
>>> session.commit() | ||
>>> _ = connection.execute(sa.text("REFRESH TABLE todos")) | ||
|
||
Now, let's use ``insert().from_select()`` to archive the task into the | ||
``ArchivedTasks`` table: | ||
|
||
>>> sel = select([Todos.id, Todos.content]).where(Todos.status == "done") | ||
>>> ins = insert(ArchivedTasks).from_select(['id','content'], sel) | ||
>>> result = session.execute(ins) | ||
>>> session.commit() | ||
|
||
This will emit the following ``INSERT`` statement to the database:: | ||
|
||
INSERT INTO archived_tasks (id, content) | ||
(SELECT todos.id, todos.content FROM todos WHERE todos.status = 'done') | ||
|
||
Now, verify that the data is present in the database: | ||
|
||
>>> _ = connection.execute(sa.text("REFRESH TABLE archived_tasks")) | ||
>>> pprint([str(r) for r in session.execute("SELECT content FROM archived_tasks")]) | ||
["('Write Tests',)"] | ||
|
||
|
||
.. hidden: Disconnect from database | ||
>>> session.close() | ||
>>> connection.close() | ||
>>> engine.dispose() | ||
.. _count result rows: http://docs.sqlalchemy.org/en/14/orm/tutorial.html#counting |
Oops, something went wrong.