Data Definition in Crate

Table Basics

To create a table use the CREATE TABLE command. You must at least specify a name for the table and names and types of the columns. See Data Types for information about the supported data types.

Let’s create a simple table with two columns of type integer and string:

cr> create table my_table (
...   first_column integer,
...   second_column string
... )
CREATE OK (... sec)

A table can be removed by using the DROP TABLE command:

cr> drop table my_table
DROP OK (... sec)

Constraints

Primary Key

The primary key constraint combines a unique constraint and a not-null constraint. It also defines the default routing value used for sharding. Example:

cr> create table my_table1 (
...   first_column integer primary key,
...   second_column string
... )
CREATE OK (... sec)

Currently primary key’s cannot be auto generated and have to be specified if data is inserted, otherwise an error is returned.

Note

Multiple primary keys are not supported yet.

Index

At Crate, every column’s data is plain indexed by default.

Disable indexing

Indexing can be turned off by using the INDEX OFF constraint. Without an index the column can never be hit by a query, and is only available as a result column:

cr> create table my_table1b (
...   first_column string INDEX OFF
... )
CREATE OK (... sec)

Plain index (Default)

Using a plain index can also declared explicit:

cr> create table my_table1b1 (
...   first_column string INDEX using plain
... )
CREATE OK (... sec)

Which results in the same behaviour than without any index declaration:

cr> create table my_table1b2 (
...   first_column string
... )
CREATE OK (... sec)

Fulltext index with analyzer

By defining an index on a column, it’s analyzed data is indexed instead of the raw data. Thus, depending on the used analyzer, querying for the exact data may not work anymore. See Builtin Analyzer for details about available builtin analyzer or Create custom analyzer.

If no analyzer is specified using a fulltext index, the standard analyzer is used:

cr> create table my_table1c (
...   first_column string INDEX using fulltext
... )
CREATE OK (... sec)

Defining the usage of a concrete analyzer is straight forward by defining the analyzer as a parameter using the WITH statement:

cr> create table my_table1d (
...   first_column string INDEX using fulltext with(analyzer='english')
... )
CREATE OK (... sec)

Defining a named index column definition

It’s also possible to define an index column which treat the data of a given column as input. This is especially useful if you want to search for both, the exact and analyzed data:

cr> create table my_table1e (
...   first_column string,
...   INDEX first_column_ft using fulltext(first_column)
... )
CREATE OK (... sec)

Of course defining a custom analyzer is possible here too:

cr> create table my_table1f (
...   first_column string,
...   INDEX first_column_ft using fulltext(first_column) with(analyzer='english')
... )
CREATE OK (... sec)

Defining a composite index

Defining a composite (or combined) index is done using the same syntax as above despite multiple columns are given to the fulltext index method:

cr> create table documents (
...   title string,
...   body string,
...   INDEX title_body_ft using fulltext(title, body) with(analyzer='english')
... )
CREATE OK (... sec)

Warning

Creating a column index/analysis after a table was already created is currently not supported, so think carefully while designing your table definition.

Data Types

string

A text-based basic type containing one or more character. Example:

cr> create table my_table2 (
...   first_column string
... )
CREATE OK (... sec)

number

Crate supports a set of number types: integer,``long``, short, double, float and byte. All types have the same ranges as corresponding Java types. Example:

cr> create table my_table3 (
...   first_column integer,
...   second_column long,
...   third_column short,
...   fourth_column double,
...   fifth_column float,
...   sixth_column byte
... )
CREATE OK (... sec)

timestamp

The timestamp type is a special type which maps to a formatted string. Internally it maps to a long, adding the parsing from long to string and vice versa. All timestamps a treat as UTC. The default format is dateOptionalTime and cannot be changed currently. It will also accept a long representing UTC milliseconds since the epoch. Example:

cr> create table my_table4 (
...   first_column timestamp
... )
CREATE OK (... sec)

Sharding

Number of shards

Crate supports sharding natively, it even uses 5 shards by default if not further defined. The number of shards can be defined by using the CLUSTERED INTO <number> SHARDS statement on table creation. Example:

cr> create table my_table5 (
...   first_column int
... ) clustered into 10 shards
CREATE OK (... sec)

Note

The number of shards can only be set on table creation, it cannot be changed later on.

Routing

The column used for routing can be freely defined using the CLUSTERED BY (<column>) statement and is used to route a row to a particular shard. Example:

cr> create table my_table6 (
...   first_column int primary key,
...   second_column string
... ) clustered by (first_column)
CREATE OK (... sec)

By default Crate is using the primary keys for routing the request to the involved shards. So following two examples resulting in the same behaviour:

cr> create table my_table7 (
...   first_column int primary key,
...   second_column string
... )
CREATE OK (... sec)

cr> create table my_table8 (
...   first_column int primary key,
...   second_column string
... ) clustered by (first_column)
CREATE OK (... sec)

If no primary is defined an internal generated unique id is used for routing.

Note

It is currently not supported to define a column for routing which is not a primary key or member of a composite primary key.

Example for combining custom routing and shard definition:

cr> create table my_table9 (
...   first_column int primary key,
...   second_column string
... ) clustered by (first_column) into 10 shards
CREATE OK (... sec)

Replication

By default Crate uses an replication factor of 1. If e.g. a cluster with 2 nodes is set up and an index is created using 5 shards, each node will have 5 shards. Defining the number of replicas is done using the REPLICAS <number_of_replicas> statement. Example:

cr> create table my_table10 (
...   first_column int,
...   second_column string
... ) replicas 1
CREATE OK (... sec)

Note

The number of replicas can be changed at any time.

System Columns

On every table Crate implements several implicitly defined system columns. Their names are reserved and cannot be used as user-defined column names. All system columns are prefixed with an underscore and therefore must be quoted on usage.

_version
Crate uses an internal versioning for every row, the version number is increased on every write. This column can be used for Optimistic Concurrency Control, see Optimistic Concurrency Control with Crate for usage details.
_score
This internal system column is available on all documents retrieved by a SELECT query. It is representing the scoring ratio of the document related to the used query filter and makes most sense on fulltext searches. The scoring ratio is always related to the highest score determined by a search, thus scores are not directly comparable across searches. If the query does not include a fulltext search the value is 1.0f in most cases.