SQLAlchemy Support

Since most people prefer to use an ORM the crate client library provides support for SQLAlchemy out of the box.

The Dialect is installed and registered by default (e.g. if crate was installed using pip) and can be used without further configuration.

Connection String

In SQLAlchemy a connection is established using the create_engine function. This function takes a connection string that varies from database to database.

In order to connect to a crate cluster the following connection strings are valid:

>>> sa.create_engine('crate://')
Engine(crate://)

This will connect to the default server (‘127.0.0.1:4200’). In order to connect to a different server the following syntax can be used:

>>> sa.create_engine('crate://otherserver:4200')
Engine(crate://otherserver:4200)

Since Crate is a clustered database that usually consists of multiple server it is recommended to connect to all of them. This enables the DB-API layer to use round-robin to distribute the load and skip a server if it becomes unavailable.

The connect_args parameter has to be used to do so:

>>> sa.create_engine('crate://', connect_args={
...     'servers': ['host1:4200', 'host2:4200']
... })
Engine(crate://)

Supported Operations and Limitations

Currently Crate only implements a subset of the SQL standard. Therefore many operations that SQLAlchemy provides are currently not supported.

An overview of supported operations:

- Simple select statements including filter operations, limit and offset,
  group by and order by. But without joins or subselects
- Inserting new rows.
- Updating rows.

Unlike other databases that are usually used with SQLAlchemy, Crate isn’t a RDBMS but instead is document oriented with eventual consistency.

This means that there is no transaction support and the database should be modeled without relationships.

In SQLAlchemy a Session object is used to query the database. This Session object contains a rollback method that in Crate won’t do anything at all. Similar the commit method will only flush but not actually commit, as there is no commit in crate.

Please refer to the SQLAlchemy documentation for more information about the session management and the concept of flush.

Complex Types

In a document oriented database it is a common pattern to store complex objects within a single field. For such cases the crate package provides the Object type. The Object type can be seen as some kind of dictionary or map like type.

Below is a schema definition using SQLAlchemy’s declarative approach:

>>> from crate.client.sqlalchemy.types import Object, ObjectArray
>>> from uuid import uuid4

>>> def gen_key():
...     return str(uuid4())

>>> class Character(Base):
...     __tablename__ = 'characters'
...     id = sa.Column(sa.String, primary_key=True, default=gen_key)
...     name = sa.Column(sa.String)
...     details = sa.Column(Object)
...     more_details = sa.Column(ObjectArray)

Using the Session two characters are added that have additional attributes inside the details column that weren’t defined in the schema:

>>> arthur = Character(name='Arthur Dent')
>>> arthur.details = {}
>>> arthur.details['gender'] = 'male'
>>> arthur.details['species'] = 'human'
>>> session.add(arthur)

>>> trillian = Character(name='Tricia McMillan')
>>> trillian.details = {}
>>> trillian.details['gender'] = 'female'
>>> trillian.details['species'] = 'human'
>>> trillian.details['female_only_attribute'] = 1
>>> session.add(trillian)
>>> session.commit()

After INSERT statements are sent to the database the newly inserted rows aren’t immediately available for search because the index is only updated periodically:

>>> from time import sleep
>>> sleep(2)

Note

Newly inserted rows can still be queried immediately if a lookup by the primary key is done.

A regular select query will then fetch the whole documents:

>>> query = session.query(Character).order_by(Character.name)
>>> [(c.name, c.details['gender']) for c in query]
[('Arthur Dent', 'male'), ('Tricia McMillan', 'female')]

But it is also possible to just select a part of the document, even inside the Object type:

>>> sorted(session.query(Character.details['gender']).all())
[('female',), ('male',)]

In addition, filtering on the attributes inside the details column is also possible:

>>> query = session.query(Character.name)
>>> query.filter(Character.details['gender'] == 'male').all()
[('Arthur Dent',)]

Updating Complex Types

The SQLAlchemy Crate dialect supports change tracking deep down the nested levels of a Object type field. For example the following query will only update the gender key. The species key which is on the same level will be left untouched.

>>> char = session.query(Character).filter_by(name='Arthur Dent').one()
>>> char.details['gender'] = 'manly man'
>>> session.commit()
>>> session.refresh(char)
>>> char.details['gender']
u'manly man'
>>> char.details['species']
u'human'

Object Array

in addition to the Object type the Crate sqlalchemy dialect also includes a type called ObjectArray. This type maps to a Python list of dictionaries.

Note that opposed to the Object type the ObjectArray type isn’t smart and doesn’t have an intelligent change tracking. Therefore the generated UPDATE statement will affect the whole list:

>>> char.more_details = [{'foo': 1, 'bar': 10}, {'foo': 2}]
>>> session.commit()

>>> char.more_details.append({'foo': 3})
>>> session.commit()

This will generate a UPDATE statement roughly like this:

"UPDATE characters set more_details = ? ...", ([{'foo': 1, 'bar': 10}, {'foo': 2}, {'foo': 3}],)

>>> refresh()

If a query is done on a element inside a dictionary only one of the dictionaries inside the dictionaries has to match in order for the result to be returned:

>>> query = session.query(Character.name)
>>> query.filter(Character.more_details['foo'] == 1).all()
[(u'Arthur Dent',)]

Count and Group By

SQLAlchemy supports different approaches to issue a query with a count aggregate function. Take a look at the Counting section in the tutorial for a full overview.

Crate currently doesn’t support all variants as it can’t handle the sub-queries yet.

This means that queries with count have to be written in one of the following ways:

>>> session.query(sa.func.count(Character.id)).scalar()
2

>>> session.query(sa.func.count('*')).select_from(Character).scalar()
2

Note

The column that is passed to the count method has to be the primary key. Other columns won’t work.

Using the group_by clause is similar:

>>> session.query(sa.func.count(Character.id), Character.name) \
...     .group_by(Character.name) \
...     .order_by(sa.desc(sa.func.count(Character.id))) \
...     .order_by(Character.name).all()
[(1, u'Arthur Dent'), (1, u'Tricia McMillan')]