Table Basics
To create a table, use the CREATE TABLE command. At a minimum, you must specify a name for the
table and the names and types of its columns.
See Data Types for information about the supported data types.
Let’s create a simple table with two columns of type integer and string:
cr> create table my_table (
... first_column integer,
... second_column string
... )
CREATE OK (... sec)
A table can be removed by using the DROP TABLE command:
cr> drop table my_table
DROP OK (... sec)
Data Types
boolean
A basic boolean type accepting true and false as values. Example:
cr> create table my_bool_table (
... first_column boolean
... )
CREATE OK (... sec)
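For example, a boolean value is inserted using the true or false literal:
cr> insert into my_bool_table (first_column) values (true)
INSERT OK, 1 row affected (... sec)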
cr> drop table my_bool_table
DROP OK (... sec)
string
A text-based basic type containing one or more characters. Example:
cr> create table my_table2 (
... first_column string
... )
CREATE OK (... sec)
number
Crate supports a set of number types: integer, long, short, double,
float and byte. All types have the same ranges as their corresponding Java types.
You can insert any number into any of these types, be it a float, integer, or byte,
as long as its value is within the corresponding range.
Example:
cr> create table my_table3 (
... first_column integer,
... second_column long,
... third_column short,
... fourth_column double,
... fifth_column float,
... sixth_column byte
... )
CREATE OK (... sec)
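For illustration, here is an insert that stays within each type's range, using
the Java maximum values for long, short and byte:
cr> insert into my_table3 (first_column, second_column, third_column,
... fourth_column, fifth_column, sixth_column) values
... (42, 9223372036854775807, 32767, 3.0, 3.0, 127)
INSERT OK, 1 row affected (... sec)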
timestamp
The timestamp type is a special type which maps to a formatted string. Internally it is stored
as a long holding the UTC milliseconds since 1970-01-01T00:00:00Z, and timestamps are always
returned as long values. The default format is dateOptionalTime and cannot currently be changed.
Formatted date strings containing timezone offset information will be converted to UTC, while
formatted strings without timezone offset information will be treated as UTC.
Timestamps will also accept a long representing UTC milliseconds since the epoch, or
a float or double representing UTC seconds since the epoch with milliseconds as
fractions. Example:
cr> create table my_table4 (
... id integer,
... first_column timestamp
... )
CREATE OK (... sec)
cr> insert into my_table4 (id, first_column) values (0, '1970-01-01T00:00:00')
INSERT OK, 1 row affected (... sec)
cr> insert into my_table4 (id, first_column) values (1, '1970-01-01T00:00:00+0100')
INSERT OK, 1 row affected (... sec)
cr> insert into my_table4 (id, first_column) values (2, 0)
INSERT OK, 1 row affected (... sec)
cr> insert into my_table4 (id, first_column) values (3, 1.0)
INSERT OK, 1 row affected (... sec)
cr> insert into my_table4 (id, first_column) values (3, 'wrong')
ValidationException[Validation failed for first_column: Invalid timestamp: Invalid ISO-date string]
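Because timestamps are stored and returned as long values, selecting first_column for the row
inserted with the +0100 offset above yields -3600000, i.e. one hour before the epoch in UTC
milliseconds. A sketch (after a REFRESH TABLE so the rows are visible; output shape approximate):
cr> refresh table my_table4
REFRESH OK (... sec)
cr> select first_column from my_table4 where id = 1
+--------------+
| first_column |
+--------------+
|     -3600000 |
+--------------+
SELECT 1 row in set (... sec)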
object
The object type allows you to define nested documents instead of old-n-busted flat tables.
An object can contain other fields of any type, even further object columns.
An object column can either be schemaless or enforce its defined schema.
It can even be used as a kind of JSON blob.
Syntax:
<columnName> OBJECT [ ({DYNAMIC|STRICT|IGNORED}) ] [ AS ( <columnDefinition>* ) ]
The only required part of this column definition is OBJECT.
The object type defining this object's behaviour is optional; if left out, DYNAMIC will be used.
The list of subcolumns is optional as well; if left out, the object is schemaless and will
accept any subcolumns.
Example:
cr> create table my_table11 (
... title string,
... col1 object,
... col3 object(strict) as (
... age integer,
... name string,
... col31 object as (
... birthday timestamp
... )
... )
... )
CREATE OK (... sec)
strict
An object column can be configured to be strict, rejecting any subcolumn that is not defined
upfront in the schema. As you might have guessed, defining strict objects without subcolumns
results in an unusable column that will always be null, which is the most useless column one
could create.
Example:
cr> create table my_table12 (
... title string,
... author object(strict) as (
... name string,
... birthday timestamp
... )
... )
CREATE OK (... sec)
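Inserting a subcolumn that was not declared is then rejected. A hypothetical sketch, assuming
object literal syntax of the form {key = value} is available to your client (the exact error
output will differ):
cr> insert into my_table12 (title, author) values
... ('A Title', {name='Douglas', age=49})
ValidationException[...]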
dynamic
Another option is dynamic, which means that new subcolumns can be added to this object.
Note that adding new columns to a dynamic object will affect the schema of the
table. Once a column is added, it shows up in the information_schema.columns and
information_schema.indices tables and its type and attributes are fixed. New columns get the
type that was guessed from their first inserted/updated value, and they are always indexed
as-is using the plain index method.
If a new column a was added with type integer,
inserting strings into this column will result in an error.
Examples:
cr> create table my_table13 (
... title string,
... author object as (
... name string,
... birthday timestamp
... )
... )
CREATE OK (... sec)
which is exactly the same as:
cr> create table my_table14 (
... title string,
... author object(dynamic) as (
... name string,
... birthday timestamp
... )
... )
CREATE OK (... sec)
Once added, new columns of dynamic objects are usable like any other subcolumn.
One can retrieve them, sort by them
and use them in where clauses.
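Subcolumns are addressed with subscript notation, the same notation used in the composite index
example further below. A sketch of such a query (the table is still empty here, so no rows are
returned):
cr> select author['name'] from my_table13
... where author['name'] = 'Douglas'
... order by author['birthday']
SELECT 0 rows in set (... sec)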
ignored
The third option is ignored, which results in an object that allows inserting new subcolumns,
but these additions will not affect the schema: the new subcolumns are not mapped, so no type
is guessed for them either. Unlike with dynamic objects, the first value added to such a column
does not determine what you can add afterwards; you can in fact add values of any type to an
added column of the same name.
An object configured like this will simply accept and return the columns inserted into it,
but otherwise ignore them.
cr> create table my_table15 (
... title string,
... details object(ignored) as (
... num_pages integer,
... font_size float
... )
... )
CREATE OK (... sec)
New columns added to ignored objects can be retrieved as result columns in a SELECT statement,
but one cannot order by them or use them in a where clause. They are simply there for fetching,
nothing else.
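As a hypothetical sketch, an added subcolumn such as details['comment'] could be fetched like
this (the table is empty here, so no rows come back), while ordering or filtering by it would
raise an error:
cr> select title, details['comment'] from my_table15
SELECT 0 rows in set (... sec)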
Sharding
Number of shards
Crate supports sharding natively and, if not defined otherwise, even uses 5 shards by default.
The number of shards can be defined by using the CLUSTERED INTO <number> SHARDS statement on
table creation. Example:
cr> create table my_table5 (
... first_column int
... ) clustered into 10 shards
CREATE OK (... sec)
Note
The number of shards can only be set on table creation, it cannot be changed later on.
Routing
The column used for routing can be freely defined using the CLUSTERED BY (<column>)
statement and is used to route a row to a particular shard. Example:
cr> create table my_table6 (
... first_column int primary key,
... second_column string
... ) clustered by (first_column)
CREATE OK (... sec)
By default, Crate uses the primary keys for routing requests to the involved shards, so the
following two examples result in the same behaviour:
cr> create table my_table7 (
... first_column int primary key,
... second_column string
... )
CREATE OK (... sec)
cr> create table my_table8 (
... first_column int primary key,
... second_column string
... ) clustered by (first_column)
CREATE OK (... sec)
If no primary key is defined, an internally generated unique id is used for routing.
Note
It is currently not supported to define a column for routing which is neither a primary key nor
a member of a composite primary key.
Example for combining custom routing and shard definition:
cr> create table my_table9 (
... first_column int primary key,
... second_column string
... ) clustered by (first_column) into 10 shards
CREATE OK (... sec)
Replication
By default, Crate uses a replication factor of 1. If, for example, a cluster with 2 nodes is set
up and a table is created using 5 shards, each node will hold 5 shards.
Defining the number of replicas is done using the REPLICAS <number_of_replicas> statement.
Example:
cr> create table my_table10 (
... first_column int,
... second_column string
... ) replicas 1
CREATE OK (... sec)
Note
The number of replicas can be changed at any time.
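In more recent versions this is done with an ALTER TABLE statement; a hedged sketch, assuming
such support in your version:
cr> alter table my_table10 set (number_of_replicas=2)
ALTER OK (... sec)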
Indices and fulltext search
Fulltext indices take the contents of one or more fields and split them up into tokens that are
used for fulltext search. The transformation from text to separate tokens is done by an
analyzer. In order to create fulltext search queries, a
fulltext index with an analyzer must be defined for the related
columns.
Index Definition
In Crate, every column’s data is indexed using the plain index method by default.
Currently, 3 choices related to index definition exist:
Warning
Creating an index after a table was already created is currently not supported,
so think carefully while designing your table definition.
Disable indexing
Indexing can be turned off by using the INDEX OFF column definition.
Without an index the column can never be matched by a query, and is only available as a result
column:
cr> create table my_table1b (
... first_column string INDEX OFF
... )
CREATE OK (... sec)
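The column can still be written and fetched; only matching on it is impossible. For example,
after the following insert, first_column can be returned by a SELECT, but a WHERE clause
referencing it would fail:
cr> insert into my_table1b (first_column) values ('nothing to see here')
INSERT OK, 1 row affected (... sec)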
Plain index (Default)
An index of type plain indexes the input data as-is, without analyzing it.
Using the plain index method is the default behaviour, but it can also be declared explicitly:
cr> create table my_table1b1 (
... first_column string INDEX using plain
... )
CREATE OK (... sec)
This results in the same behaviour as without any index declaration:
cr> create table my_table1b2 (
... first_column string
... )
CREATE OK (... sec)
Fulltext index with analyzer
By defining an index on a column, its analyzed data is indexed instead of the raw data.
Thus, depending on the analyzer used, querying for the exact data may not work anymore.
See Builtin Analyzer for details about the available builtin analyzers, or see
Create custom analyzer.
If a fulltext index is used without specifying an analyzer, the standard
analyzer is used:
cr> create table my_table1c (
... first_column string INDEX using fulltext
... )
CREATE OK (... sec)
Declaring a concrete analyzer is straightforward: define the analyzer as a parameter using the
WITH statement:
cr> create table my_table1d (
... first_column string INDEX using fulltext with(analyzer='english')
... )
CREATE OK (... sec)
Defining a named index column definition
It’s also possible to define an index column which treats the data of a given column as its
input. This is especially useful if you want to search for both the exact and the analyzed data:
cr> create table my_table1e (
... first_column string,
... INDEX first_column_ft using fulltext(first_column)
... )
CREATE OK (... sec)
Of course defining a custom analyzer is possible here too:
cr> create table my_table1f (
... first_column string,
... INDEX first_column_ft using fulltext(first_column) with(analyzer='english')
... )
CREATE OK (... sec)
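Such a named fulltext index column is typically queried with the match predicate, sketched here
assuming it is available in your version (the table is empty, so no rows are returned):
cr> select first_column from my_table1f
... where match(first_column_ft, 'gathering')
SELECT 0 rows in set (... sec)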
Defining a composite index
Defining a composite (or combined) index uses the same syntax as above, except that multiple
columns are given to the fulltext index method:
cr> create table documents (
... title string,
... body string,
... INDEX title_body_ft using fulltext(title, body) with(analyzer='english')
... )
CREATE OK (... sec)
Composite indices can include nested columns within object columns as well:
cr> create table my_table1g (
... title string,
... author object(dynamic) as (
... name string,
... birthday timestamp
... ),
... INDEX author_title_ft using fulltext(title, author['name'])
... )
CREATE OK (... sec)
Create custom analyzer
An analyzer consists of one tokenizer, zero or more token-filters, and zero or more char-filters.
When field content is analyzed to become a stream of tokens, the char-filters are applied first.
They are used to filter special characters out of the stream of characters that makes up the
content.
Tokenizers then split the possibly filtered stream of characters into tokens.
Token-filters can add tokens, delete tokens or transform them to finally produce the desired
stream of tokens.
With these elements in place, analyzers provide fine-grained control over building a token
stream used for fulltext search.
For example, you can use language-specific analyzers, tokenizers and token-filters to get proper
search results for data provided in a certain language.
Create Analyzer Syntax:
CREATE ANALYZER <analyzer_name> [EXTENDS <analyzer_name>] (
[
TOKENIZER <tokenizer_name> [WITH] (
<tokenizer_property>=<value>,
...
),
]
[
TOKEN_FILTERS [WITH] (
<token_filter_name>
[ [WITH] (
<token_filter_property>=<value>,
...
)
],
...
),
]
[
CHAR_FILTERS [WITH] (
<char_filter_name>
[ [WITH] (
<char_filter_property>=<value>,
...
)
],
...
)
]
)
Multiple char-filters and token-filters are allowed, but at most one tokenizer.
Order does not matter.
A simple example:
cr> create ANALYZER myanalyzer (
... TOKENIZER whitespace,
... TOKEN_FILTERS WITH (
... lowercase,
... kstem
... ),
... CHAR_FILTERS (
... html_strip,
... mymapping WITH (
... type='mapping',
... mappings = ['ph=>f', 'qu=>q', 'foo=>bar']
... )
... )
... )
CREATE OK (... sec)
This example creates an analyzer called myanalyzer to be used in index definitions and
index constraints.
It will use a whitespace tokenizer, a lowercase token-filter,
a kstem token-filter, an html_strip char-filter
and a custom char-filter that extends the mapping char-filter.
You can use Builtin Tokenizer, Builtin Token Filter and Builtin Char Filter
by just writing their names, and you can extend and parameterize them,
see for example the mymapping char-filter above. You have to give these extended ones a
unique name.
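Once created, the analyzer can be referenced by name in a fulltext index definition just like a
builtin one, for example (table name hypothetical):
cr> create table my_analyzed_table (
... content string INDEX using fulltext with(analyzer='myanalyzer')
... )
CREATE OK (... sec)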
Note
Until release 0.18.0 it was possible to extend custom tokenizers,
token-filters and char-filters. This is not possible anymore.
Nonetheless you can still extend custom analyzers to reuse their elements.
We might reintroduce this feature when we support creating tokenizers etc. standalone,
e.g. by a CREATE TOKENIZER statement.
Extending Builtin Analyzer
Existing Analyzers can be used to create custom Analyzers by means of extending them.
You can extend and parameterize Builtin Analyzer like this:
cr> create ANALYZER "german_snowball" extends snowball WITH (
... language='german'
... )
CREATE OK (... sec)
If you extend a Builtin Analyzer, no tokenizer, char-filter or token-filter can be defined;
in this case use the parameters available for the extended Builtin Analyzer.
If you extend custom analyzers, every part of the analyzer that is omitted will be taken from
the extended one.
Example:
cr> create ANALYZER e2 EXTENDS myanalyzer (
... TOKENIZER mypattern (
... type='pattern',
... pattern='.*'
... )
... )
CREATE OK (... sec)
This analyzer will use the char-filters and token-filters from myanalyzer
and will override the tokenizer with mypattern.