Data Definition in Crate

Table Basics

To create a table, use the CREATE TABLE command. You must specify at least a name for the table and the names and types of its columns. See Data Types for information about the supported data types.

Let’s create a simple table with two columns of type integer and string:

cr> create table my_table (
...   first_column integer,
...   second_column string
... )
CREATE OK (... sec)

A table can be removed by using the DROP TABLE command:

cr> drop table my_table
DROP OK (... sec)

Constraints

Primary Key

The primary key constraint combines a unique constraint and a not-null constraint. It also defines the default routing value used for sharding. Example:

cr> create table my_table1 (
...   first_column integer primary key,
...   second_column string
... )
CREATE OK (... sec)

Currently, primary keys cannot be auto-generated. They have to be specified when data is inserted, otherwise an error is returned.

Note

Multiple primary keys are not supported yet.
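
The two halves of the primary key constraint can be illustrated with a minimal Python sketch. This is not Crate's implementation: the Table class, its insert method and the PrimaryKeyError exception are hypothetical names, and the sketch only models that a key value must be present and unique.

```python
class PrimaryKeyError(Exception):
    """Raised when a row violates the primary key constraint."""


class Table:
    """Hypothetical in-memory table with a single primary key column."""

    def __init__(self, primary_key):
        self.primary_key = primary_key
        self.rows = {}  # primary key value -> row

    def insert(self, row):
        key = row.get(self.primary_key)
        if key is None:
            # Not-null part: the key is never generated automatically.
            raise PrimaryKeyError("primary key value is required")
        if key in self.rows:
            # Unique part: a second row with the same key is rejected.
            raise PrimaryKeyError("duplicate primary key value")
        self.rows[key] = row


table = Table(primary_key="first_column")
table.insert({"first_column": 1, "second_column": "a"})
```

Inserting a second row with first_column set to 1, or a row without first_column, raises PrimaryKeyError in this sketch.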

Data Types

boolean

A basic boolean type that accepts true and false as values. Example:

cr> create table my_bool_table (
...   first_column boolean
... )
CREATE OK (... sec)

cr> drop table my_bool_table
DROP OK (... sec)

string

A text-based basic type containing one or more characters. Example:

cr> create table my_table2 (
...   first_column string
... )
CREATE OK (... sec)

number

Crate supports a set of number types: integer, long, short, double, float and byte. All types have the same ranges as the corresponding Java types. You can insert any number for any type, be it a float, integer, or byte, as long as it is within the corresponding range. Example:

cr> create table my_table3 (
...   first_column integer,
...   second_column long,
...   third_column short,
...   fourth_column double,
...   fifth_column float,
...   sixth_column byte
... )
CREATE OK (... sec)
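
As a rough illustration of the range rule, the following Python sketch checks a value against the ranges of the Java integral types. The INTEGRAL_RANGES table and the fits function are hypothetical helpers for this example, not part of Crate. The float and double types are left out to keep the sketch short.

```python
# Inclusive ranges of the Java integral types of the same name.
INTEGRAL_RANGES = {
    "byte": (-2**7, 2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "integer": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
}


def fits(type_name, value):
    """Return True if value lies within the range of the given type."""
    low, high = INTEGRAL_RANGES[type_name]
    return low <= value <= high


print(fits("byte", 127))       # True
print(fits("byte", 128))       # False
print(fits("integer", 2**31))  # False
```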

timestamp

The timestamp type is a special type which maps to a formatted string. Internally, a timestamp is stored as a long holding the UTC milliseconds since 1970-01-01T00:00:00Z, and timestamps are always returned as long. The default format is dateOptionalTime and cannot be changed currently. Formatted date strings containing timezone offset information will be converted to UTC. Formatted strings without timezone offset information will be treated as UTC. Timestamps also accept a long representing UTC milliseconds since the epoch, or a float or double representing UTC seconds since the epoch with milliseconds as fractions. Example:

cr> create table my_table4 (
...   id integer,
...   first_column timestamp
... )
CREATE OK (... sec)

cr> insert into my_table4 (id, first_column) values (0, '1970-01-01T00:00:00')
INSERT OK, 1 row affected (... sec)

cr> insert into my_table4 (id, first_column) values (1, '1970-01-01T00:00:00+0100')
INSERT OK, 1 row affected (... sec)

cr> insert into my_table4 (id, first_column) values (2, 0)
INSERT OK, 1 row affected (... sec)

cr> insert into my_table4 (id, first_column) values (3, 1.0)
INSERT OK, 1 row affected (... sec)

cr> insert into my_table4 (id, first_column) values (3, 'wrong')
ValidationException[Validation failed for first_column: wrong type 'string'. expected: 'timestamp']
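
The conversion rules described above can be sketched in Python as follows. This is only an approximation under stated assumptions: to_utc_millis is a hypothetical helper, it understands just the one string layout used in the inserts above (with or without an offset) rather than the full dateOptionalTime format, and it raises ValueError where Crate reports a validation error.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_millis(value):
    """Normalize a timestamp input to UTC milliseconds since the epoch."""
    if isinstance(value, int):
        return value  # a long is already UTC milliseconds
    if isinstance(value, float):
        return int(round(value * 1000))  # seconds with fractional millis
    if isinstance(value, str):
        for fmt in ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%S"):
            try:
                parsed = datetime.strptime(value, fmt)
            except ValueError:
                continue
            if parsed.tzinfo is None:
                # No offset given: treat the string as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return (parsed - EPOCH) // timedelta(milliseconds=1)
    raise ValueError("not a timestamp: %r" % (value,))


print(to_utc_millis("1970-01-01T00:00:00"))       # 0
print(to_utc_millis("1970-01-01T00:00:00+0100"))  # -3600000
print(to_utc_millis(1.0))                         # 1000
```

The second call yields a negative value because midnight at offset +0100 is one hour before the epoch in UTC.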

object

The object type allows you to define nested documents instead of old-n-busted flat tables. An object can contain other fields of any type, even further object columns. An object column can either be schemaless or enforce its defined schema. It can even be used as a kind of JSON blob.

Syntax:

<columnName> OBJECT [ ({DYNAMIC|STRICT|IGNORED}) ] [ AS ( <columnDefinition>* ) ]

The only required part of this column definition is OBJECT. The object type, which defines the object's behaviour, is optional; if left out, DYNAMIC is used. The list of subcolumns is optional as well; if left out, the object has no schema (in the case of DYNAMIC, a schema is created on the fly by the first inserts).

Example:

cr> create table my_table11 (
...   title string,
...   col1 object,
...   col3 object(strict) as (
...     age integer,
...     name string,
...     col31 object as (
...       birthday timestamp
...     )
...   )
... )
CREATE OK (... sec)

strict

An object column can be configured to be strict, rejecting any subcolumn that is not defined upfront in the schema. As you might have guessed, defining a strict object without subcolumns results in an unusable column that will always be null, which is the most useless column one could create.

Example:

cr> create table my_table12 (
...   title string,
...   author object(strict) as (
...     name string,
...     birthday timestamp
...   )
... )
CREATE OK (... sec)

dynamic

Another option is dynamic, which means that new subcolumns can be added to the object.

Note that adding new columns to a dynamic object affects the schema of the table. Once a column is added, it shows up in the information_schema.columns and information_schema.indices tables, and its type and attributes are fixed. Such a column has the type that was guessed from the inserted or updated value, and it is always not_indexed, which means it is analyzed with the plain analyzer, that is, as-is. For example, if a new column a was added with type integer, adding strings to this column will result in an error.

Examples:

cr> create table my_table13 (
...   title string,
...   author object as (
...     name string,
...     birthday timestamp
...   )
... )
CREATE OK (... sec)

which is exactly the same as:

cr> create table my_table14 (
...   title string,
...   author object(dynamic) as (
...     name string,
...     birthday timestamp
...   )
... )
CREATE OK (... sec)

Once added, new columns in dynamic objects are usable like any other subcolumn. One can retrieve them, sort by them and use them in where clauses.

ignored

The third option is ignored, which results in an object that allows inserting new subcolumns without affecting the schema. The new subcolumns are not mapped according to their type, so no type is guessed either. You can in fact add a value of any type to an added column of the same name: unlike with dynamic objects, the first value added does not determine what you can add later. An object configured like this will simply accept and return the columns inserted into it, but otherwise ignore them.

cr> create table my_table15 (
...   title string,
...   details object(ignored) as (
...     num_pages integer,
...     font_size float
...   )
... )
CREATE OK (... sec)

New columns added to ignored objects can be retrieved as result columns in a SELECT statement, but one cannot order by them or use them in a where clause. They are simply there for fetching, nothing else.
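
The difference between the three object types can be summarized in a small Python model. It is a sketch of the behaviour described in this section, not of Crate's internals: the ObjectColumn class and its validate method are hypothetical, Python types stand in for column types, and the assumption that declared subcolumns are type-checked under every policy is ours.

```python
class ObjectColumn:
    """Hypothetical model of an object column and its object type."""

    def __init__(self, policy="dynamic", schema=None):
        if policy not in ("dynamic", "strict", "ignored"):
            raise ValueError("unknown policy: " + policy)
        self.policy = policy
        self.schema = dict(schema or {})  # subcolumn name -> Python type

    def validate(self, value):
        """Check one inserted object; extends the schema if dynamic."""
        for name, item in value.items():
            if name in self.schema:
                # Known subcolumn: its type is fixed (assumed for all policies).
                if type(item) is not self.schema[name]:
                    raise TypeError("wrong type for %r" % name)
            elif self.policy == "strict":
                raise KeyError("unknown subcolumn %r" % name)
            elif self.policy == "dynamic":
                # The first value decides the type from now on.
                self.schema[name] = type(item)
            # "ignored": accept the value, leave the schema untouched.
        return value


author = ObjectColumn("dynamic", {"name": str})
author.validate({"name": "Douglas", "age": 42})  # adds age with type int

details = ObjectColumn("ignored", {"num_pages": int})
details.validate({"extra": 1})
details.validate({"extra": "one"})  # still accepted, no type was fixed
```

After these calls, author.validate({"age": "old"}) raises TypeError in the sketch, while the schema of details still contains only num_pages.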

Sharding

Number of shards

Crate supports sharding natively and uses 5 shards by default if nothing else is specified. The number of shards can be defined by using the CLUSTERED INTO <number> SHARDS statement on table creation. Example:

cr> create table my_table5 (
...   first_column int
... ) clustered into 10 shards
CREATE OK (... sec)

Note

The number of shards can only be set on table creation; it cannot be changed later on.

Routing

The column used for routing can be freely defined using the CLUSTERED BY (<column>) statement and is used to route a row to a particular shard. Example:

cr> create table my_table6 (
...   first_column int primary key,
...   second_column string
... ) clustered by (first_column)
CREATE OK (... sec)

By default, Crate uses the primary key for routing requests to the involved shards, so the following two examples result in the same behaviour:

cr> create table my_table7 (
...   first_column int primary key,
...   second_column string
... )
CREATE OK (... sec)

cr> create table my_table8 (
...   first_column int primary key,
...   second_column string
... ) clustered by (first_column)
CREATE OK (... sec)

If no primary key is defined, an internally generated unique id is used for routing.

Note

Defining a routing column that is not a primary key or a member of a composite primary key is currently not supported.

Example for combining custom routing and shard definition:

cr> create table my_table9 (
...   first_column int primary key,
...   second_column string
... ) clustered by (first_column) into 10 shards
CREATE OK (... sec)
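
The idea behind routing can be sketched in Python as hashing the routing value and taking the result modulo the number of shards. This is a generic illustration only: the hash function Crate actually uses is not described here, so zlib.crc32 serves as a stand-in, and shard_for is a hypothetical helper name.

```python
import zlib


def shard_for(routing_value, num_shards=5):
    """Pick a shard by hashing the routing value (crc32 is a stand-in)."""
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % num_shards


# Rows with the same routing value always land on the same shard.
first = shard_for(42, num_shards=10)
second = shard_for(42, num_shards=10)
print(first == second)  # True
```

Whatever hash is used, the property that matters is that it is deterministic: the same routing value must map to the same shard on every request.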

Replication

By default, Crate uses a replication factor of 1. For example, if a cluster with 2 nodes is set up and a table is created using 5 shards, there are 5 primary shards and 5 replica shards, so each node will have 5 shards. Defining the number of replicas is done using the REPLICAS <number_of_replicas> statement. Example:

cr> create table my_table10 (
...   first_column int,
...   second_column string
... ) replicas 1
CREATE OK (... sec)

Note

The number of replicas can be changed at any time.
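
The arithmetic behind the two-node example can be written down as a short Python sketch. The function name shard_copies_per_node is made up for this illustration, and the calculation assumes that shard copies are spread evenly across the nodes.

```python
def shard_copies_per_node(num_shards, num_replicas, num_nodes):
    """Average number of shard copies per node, assuming an even spread."""
    # Every shard exists once as a primary plus once per replica.
    total_copies = num_shards * (1 + num_replicas)
    return total_copies / num_nodes


# 5 shards with 1 replica each on a 2-node cluster.
print(shard_copies_per_node(5, 1, 2))  # 5.0
```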

System Columns

Crate implements several implicitly defined system columns on every table. Their names are reserved and cannot be used as user-defined column names. All system columns are prefixed with an underscore and therefore must be quoted on usage.

_version
Crate uses internal versioning for every row; the version number is increased on every write. This column can be used for Optimistic Concurrency Control, see Optimistic Concurrency Control with Crate for usage details.
_score
This internal system column is available on all documents retrieved by a SELECT query. It represents the scoring ratio of the document relative to the query filter used and makes most sense on fulltext searches. The scoring ratio is always relative to the highest score determined by a search, so scores are not directly comparable across searches. If the query does not include a fulltext search, the value is 1.0f in most cases.
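
How a version number that increases on every write enables optimistic concurrency control can be shown with a generic Python sketch. It illustrates the general technique rather than Crate's API: the Row class, the VersionConflict exception and the starting version of 1 are assumptions made for this example.

```python
class VersionConflict(Exception):
    """Raised when a row changed since the caller last read it."""


class Row:
    """Hypothetical row carrying a version that grows with every write."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = 1  # assumption: the first write yields version 1

    def update(self, changes, expected_version):
        # Apply the write only if nobody else wrote in the meantime.
        if expected_version != self.version:
            raise VersionConflict(
                "expected %d, found %d" % (expected_version, self.version)
            )
        self.data.update(changes)
        self.version += 1


row = Row({"title": "draft"})
seen = row.version                                  # remember the version read
row.update({"title": "final"}, expected_version=seen)
```

A second update that still passes the old value of seen raises VersionConflict in this sketch, because the version has moved on; the caller would read the row again and retry.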