Data Definition in Crate

Table Basics

To create a table, use the CREATE TABLE command. You must specify at least a name for the table and the names and types of its columns. See Data Types for information about the supported data types.

Let’s create a simple table with two columns of type integer and string:

cr> create table my_table (
...   first_column integer,
...   second_column string
... )
CREATE OK (... sec)

A table can be removed by using the DROP TABLE command:

cr> drop table my_table
DROP OK (... sec)

Constraints

Primary Key

The primary key constraint combines a unique constraint and a not-null constraint. It also defines the default routing value used for sharding. Example:

cr> create table my_table1 (
...   first_column integer primary key,
...   second_column string
... )
CREATE OK (... sec)

Currently, primary keys cannot be auto-generated. A primary key value has to be specified whenever data is inserted, otherwise an error is returned.

Note

Multiple primary keys are not supported yet.
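
The combined effect of the unique and not-null constraints can be pictured with a small Python sketch. It is an in-memory illustration only, not Crate's implementation; the class PrimaryKeySketch and its insert method are hypothetical names introduced here.

```python
class PrimaryKeySketch:
    """Toy in-memory table that enforces a primary key (hypothetical, not Crate code)."""

    def __init__(self, pk_column):
        self.pk_column = pk_column
        self.rows = {}  # primary key value -> row

    def insert(self, row):
        key = row.get(self.pk_column)
        # not-null part: a key value must be supplied, it is never generated
        if key is None:
            raise ValueError("primary key value is required")
        # unique part: the same key must not be stored twice
        if key in self.rows:
            raise ValueError("duplicate primary key: %r" % (key,))
        self.rows[key] = row


table = PrimaryKeySketch("first_column")
table.insert({"first_column": 1, "second_column": "a"})
```

Inserting a second row with first_column set to 1, or a row without first_column, raises an error in this sketch.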

Data Types

string

A text-based basic type containing one or more characters. Example:

cr> create table my_table2 (
...   first_column string
... )
CREATE OK (... sec)

number

Crate supports a set of numeric types: integer, long, short, double, float and byte. All types have the same ranges as the corresponding Java types. Example:

cr> create table my_table3 (
...   first_column integer,
...   second_column long,
...   third_column short,
...   fourth_column double,
...   fifth_column float,
...   sixth_column byte
... )
CREATE OK (... sec)
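
As a quick reference, the ranges of the integral types follow from the bit widths of the corresponding Java types (byte 8, short 16, int 32 and long 64 bits, all signed). The Python sketch below computes them; signed_range is a hypothetical helper name and not part of Crate.

```python
# Bit widths of the Java types that correspond to the integral column types.
JAVA_BITS = {"byte": 8, "short": 16, "integer": 32, "long": 64}


def signed_range(bits):
    """Return (min, max) of a signed two's-complement integer with `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, bits in JAVA_BITS.items():
    low, high = signed_range(bits)
    print("%-8s %d .. %d" % (name, low, high))
```

float and double correspond to Java's 32-bit and 64-bit IEEE 754 floating point types and are not covered by this helper.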

timestamp

The timestamp type is a special type which maps to a formatted string. Internally it maps to a long, with conversion between the long and string representations handled automatically. All timestamps are treated as UTC. The default format is dateOptionalTime and currently cannot be changed. A long representing UTC milliseconds since the epoch is also accepted. Example:

cr> create table my_table4 (
...   first_column timestamp
... )
CREATE OK (... sec)
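
The relation between the string form and the long form of a timestamp can be sketched in Python. This is an illustration under simplifying assumptions: it handles only the plain YYYY-MM-DDTHH:MM:SS form rather than the full dateOptionalTime format, and to_epoch_millis and from_epoch_millis are hypothetical helper names.

```python
from datetime import datetime, timezone


def to_epoch_millis(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS' as UTC and return milliseconds since the epoch."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp()) * 1000


def from_epoch_millis(millis):
    """Format milliseconds since the epoch as a UTC 'YYYY-MM-DDTHH:MM:SS' string."""
    moment = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


millis = to_epoch_millis("1970-01-02T00:00:00")
print(millis)                     # 86400000
print(from_epoch_millis(millis))  # 1970-01-02T00:00:00
```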

Sharding

Number of shards

Crate supports sharding natively. If nothing else is specified, a table is created with 5 shards by default. The number of shards can be defined using the CLUSTERED INTO <number> SHARDS statement on table creation. Example:

cr> create table my_table5 (
...   first_column int
... ) clustered into 10 shards
CREATE OK (... sec)

Note

The number of shards can only be set on table creation. It cannot be changed later on.

Routing

The column used for routing can be freely defined using the CLUSTERED BY (<column>) statement and is used to route a row to a particular shard. Example:

cr> create table my_table6 (
...   first_column int primary key,
...   second_column string
... ) clustered by (first_column)
CREATE OK (... sec)

By default, Crate uses the primary key for routing requests to the involved shards, so the following two examples result in the same behaviour:

cr> create table my_table7 (
...   first_column int primary key,
...   second_column string
... )
CREATE OK (... sec)

cr> create table my_table8 (
...   first_column int primary key,
...   second_column string
... ) clustered by (first_column)
CREATE OK (... sec)

If no primary key is defined, an internally generated unique id is used for routing.

Note

Defining a routing column that is neither a primary key nor a member of a composite primary key is currently not supported.

Example for combining custom routing and shard definition:

cr> create table my_table9 (
...   first_column int primary key,
...   second_column string
... ) clustered by (first_column) into 10 shards
CREATE OK (... sec)
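
The idea of routing a row to a shard can be sketched as hashing the routing value and taking the result modulo the number of shards. This is an illustration only: the hash function Crate actually uses is not described here, so zlib.crc32 serves as a stand-in, and shard_for is a hypothetical helper name.

```python
import zlib


def shard_for(routing_value, number_of_shards):
    """Map a routing value to a shard number in [0, number_of_shards)."""
    # crc32 is a stand-in hash (assumption); any stable hash illustrates the idea.
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % number_of_shards


# Rows with the same routing value always land on the same shard.
for key in (1, 2, 3, 1):
    print(key, "->", shard_for(key, 10))
```

Under this assumption the shard number depends on the shard count, which suggests one reason why the number of shards is fixed at table creation.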

Replication

By default, Crate uses a replication factor of 1. If, for example, a cluster with 2 nodes is set up and a table is created using 5 shards, each node will hold 5 shards, because the 5 primary shards and their 5 replicas are spread across the 2 nodes. The number of replicas is defined using the REPLICAS <number_of_replicas> statement. Example:

cr> create table my_table10 (
...   first_column int,
...   second_column string
... ) replicas 1
CREATE OK (... sec)
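
The arithmetic behind the two-node example can be written out as a short Python sketch. It assumes that shard copies are spread evenly across the nodes; shard_copies and copies_per_node are hypothetical helper names.

```python
def shard_copies(number_of_shards, number_of_replicas):
    """Total shard copies: each primary shard plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)


def copies_per_node(number_of_shards, number_of_replicas, number_of_nodes):
    """Average copies per node, assuming an even spread across nodes."""
    return shard_copies(number_of_shards, number_of_replicas) / number_of_nodes


print(shard_copies(5, 1))        # 10
print(copies_per_node(5, 1, 2))  # 5.0
```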

Note

The number of replicas can be changed at any time.

System Columns

On every table Crate implements several implicitly defined system columns. Their names are reserved and cannot be used as user-defined column names. All system columns are prefixed with an underscore and therefore must be quoted on usage.

_version
Crate keeps an internal version for every row. The version number is increased on every write. This column can be used for Optimistic Concurrency Control, see Optimistic Concurrency Control with Crate for usage details.
_score
This internal system column is available on all documents retrieved by a SELECT query. It represents the scoring ratio of the document in relation to the query filter used and is most meaningful for fulltext searches. The scoring ratio is always relative to the highest score determined by a search, so scores are not directly comparable across searches. If the query does not include a fulltext search, the value is 1.0f in most cases.
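
How a per-row version such as _version supports optimistic concurrency control can be pictured with a toy Python model. It is not Crate code: VersionedRow and its update method are hypothetical, and starting the counter at 1 is an assumption. The sketch only shows the general pattern of applying a write when the version the client read is still current.

```python
class VersionedRow:
    """Toy row with a version counter (hypothetical, not Crate code)."""

    def __init__(self, value):
        self.value = value
        self.version = 1  # assumption: the first write yields version 1

    def update(self, new_value, expected_version):
        """Apply the write only if nobody changed the row in the meantime."""
        if expected_version != self.version:
            return False  # stale read: caller should re-read and retry
        self.value = new_value
        self.version += 1  # every successful write increases the version
        return True


row = VersionedRow("a")
seen = row.version            # client reads the row and remembers its version
print(row.update("b", seen))  # True, version is now 2
print(row.update("c", seen))  # False, `seen` is stale
```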