Information Schema

The Information Schema is a special schema that contains virtual tables which are read-only and can be queried to get information about the state of the cluster.

Note

currently only the tables tables and columns are implemented.

Tables

The information schema contains a table called tables.

This table can be queried to get a list of all available tables and their settings like the number of shards or number of replicas:

cr> select * from information_schema.tables
... where table_name not like 'my_table%' order by schema_name asc, table_name asc;
+--------------------+-------------------+------------------+--------------------+--------------+----------------+
| schema_name        | table_name        | number_of_shards | number_of_replicas | clustered_by | partitioned_by |
+--------------------+-------------------+------------------+--------------------+--------------+----------------+
| blob               | myblobs           | 3                | 1                  | digest       | NULL           |
| doc                | documents         | 5                | 1                  | _id          | NULL           |
| doc                | locations         | 2                | 0                  | id           | NULL           |
| doc                | partitioned_table | 5                | 1                  | _id          | [u'date']      |
| doc                | quotes            | 2                | 0                  | id           | NULL           |
| information_schema | columns           | 1                | 0                  | NULL         | NULL           |
| information_schema | routines          | 1                | 0                  | NULL         | NULL           |
| information_schema | table_constraints | 1                | 0                  | NULL         | NULL           |
| information_schema | table_partitions  | 1                | 0                  | NULL         | NULL           |
| information_schema | tables            | 1                | 0                  | NULL         | NULL           |
| sys                | cluster           | 1                | 0                  | NULL         | NULL           |
| sys                | nodes             | 1                | 0                  | NULL         | NULL           |
| sys                | shards            | 1                | 0                  | NULL         | NULL           |
+--------------------+-------------------+------------------+--------------------+--------------+----------------+
SELECT 13 rows in set (... sec)

Columns

This table can be queried to get a list of all available columns of all tables and their definition like data type and ordinal position inside the table:

cr> select * from information_schema.columns
... where schema_name='doc' and table_name not like 'my_table%'
... order by table_name asc, column_name asc;
+-------------+-------------------+------------------+------------------+--------------+
| schema_name | table_name        | column_name      | ordinal_position | data_type    |
+-------------+-------------------+------------------+------------------+--------------+
| doc         | documents         | body             | 1                | string       |
| doc         | documents         | title            | 2                | string       |
| doc         | locations         | date             | 1                | timestamp    |
| doc         | locations         | description      | 2                | string       |
| doc         | locations         | id               | 3                | string       |
| doc         | locations         | kind             | 4                | string       |
| doc         | locations         | name             | 5                | string       |
| doc         | locations         | position         | 6                | integer      |
| doc         | locations         | race             | 7                | object       |
| doc         | locations         | race.description | 8                | string       |
| doc         | locations         | race.interests   | 9                | string_array |
| doc         | locations         | race.name        | 10               | string       |
| doc         | partitioned_table | date             | 1                | timestamp    |
| doc         | partitioned_table | id               | 2                | long         |
| doc         | partitioned_table | title            | 3                | string       |
| doc         | quotes            | id               | 1                | integer      |
| doc         | quotes            | quote            | 2                | string       |
+-------------+-------------------+------------------+------------------+--------------+
SELECT 17 rows in set (... sec)

You can even query this tables’ own columns (attention: this might lead to infinite recursion of your mind, beware!):

cr> select column_name, data_type, ordinal_position from information_schema.columns
... where schema_name = 'information_schema' and table_name = 'columns' order by ordinal_position asc;
+------------------+-----------+------------------+
| column_name      | data_type | ordinal_position |
+------------------+-----------+------------------+
| schema_name      | string    | 1                |
| table_name       | string    | 2                |
| column_name      | string    | 3                |
| ordinal_position | short     | 4                |
| data_type        | string    | 5                |
+------------------+-----------+------------------+
SELECT 5 rows in set (... sec)

Note

Columns at Crate are always sorted alphabetically in ascending order despite in which order they were defined on table creation. Thus the ordinal_position reflects the alphabetical position.

Table Constraints

This table can be queries to get a list of all defined table constraints, their type, name and which table they are defined in.

Note

Currently only PRIMARY_KEY constraints are supported.

cr> select * from information_schema.table_constraints
... where table_name not like 'my_table%'
... order by schema_name desc, table_name desc limit 10;
+--------------------+-------------------+------------------------------------------------------------+-----------------+
| schema_name        | table_name        | constraint_name                                            | constraint_type |
+--------------------+-------------------+------------------------------------------------------------+-----------------+
| sys                | shards            | [u'schema_name', u'table_name', u'id', u'partition_ident'] | PRIMARY_KEY     |
| sys                | nodes             | [u'id']                                                    | PRIMARY_KEY     |
| information_schema | tables            | [u'schema_name', u'table_name']                            | PRIMARY_KEY     |
| information_schema | columns           | [u'schema_name', u'table_name', u'column_name']            | PRIMARY_KEY     |
| doc                | quotes            | [u'id']                                                    | PRIMARY_KEY     |
| doc                | partitioned_table | [u'_id']                                                   | PRIMARY_KEY     |
| doc                | locations         | [u'id']                                                    | PRIMARY_KEY     |
| doc                | documents         | [u'_id']                                                   | PRIMARY_KEY     |
| blob               | myblobs           | [u'digest']                                                | PRIMARY_KEY     |
+--------------------+-------------------+------------------------------------------------------------+-----------------+
SELECT 9 rows in set (... sec)

Table Partitions

This table can be queried to get information about all partitioned tables, Each partition of a table is represented as one row. The row contains the information table name, schema name, partition ident, and the values of the partition. values is a key-value object with the ‘partitioned by column’ as key(s) and the corresponding value as value(s).

For further information see Partitioned Tables.

The following example shows a table where the column ‘content’ of table ‘a_partitioned_table’ has been used to partition the table. The table has two partitions. The partitions are introduced when data is inserted where ‘content’ is ‘content_a’, and ‘content_b’:

cr> select * from information_schema.table_partitions
... order by table_name, partition_ident
+---------------------+-------------+--------------------+----------------------------+
| table_name          | schema_name | partition_ident    | values                     |
+---------------------+-------------+--------------------+----------------------------+
| a_partitioned_table | doc         | 04566rreehimst2vc4 | {u'content': u'content_a'} |
| a_partitioned_table | doc         | 04566rreehimst2vc8 | {u'content': u'content_b'} |
+---------------------+-------------+--------------------+----------------------------+
SELECT 2 rows in set (... sec)

Indices

Note

currently not implemented

This table can be queried to get a list of all defined indices of all columns and their definition like index method, expression list and property list. Using a plain index for every column is the default behaviour at Crate, so almost all columns are listed as an index as well:

cr> select * from information_schema.indices
... where table_name not like 'my_table%' order by table_name asc, index_name asc; #doctest: +SKIP
+------------+---------------------+----------+---------------------------+------------------+
| table_name | index_name          | method   | columns                   | properties       |
+------------+---------------------+----------+---------------------------+------------------+
| documents  | body                | plain    | [u'body']                 |                  |
| documents  | title               | plain    | [u'title']                |                  |
| documents  | title_body_ft       | fulltext | [u'body', u'title']       | analyzer=english |
| locations  | date                | plain    | [u'date']                 |                  |
| locations  | description         | plain    | [u'description']          |                  |
| locations  | id                  | plain    | [u'id']                   |                  |
| locations  | kind                | plain    | [u'kind']                 |                  |
| locations  | name                | plain    | [u'name']                 |                  |
| locations  | name_description_ft | fulltext | [u'description', u'name'] | analyzer=english |
| locations  | position            | plain    | [u'position']             |                  |
| locations  | race                | plain    | [u'race']                 |                  |
| quotes     | id                  | plain    | [u'id']                   |                  |
| quotes     | quote               | plain    | [u'quote']                |                  |
+------------+---------------------+----------+---------------------------+------------------+
SELECT 13 rows in set (... sec)

Routines

The routines table contains all custom analyzers, tokenizers, token-filters and char-filters and all custom analyzers created by CREATE ANALYZER statements (see Create custom analyzer):

cr> select routine_name, routine_type from information_schema.routines
... group by routine_name, routine_type order by routine_name asc limit 5;
+----------------------+--------------+
| routine_name         | routine_type |
+----------------------+--------------+
| arabic               | ANALYZER     |
| arabic_normalization | TOKEN_FILTER |
| arabic_stem          | TOKEN_FILTER |
| armenian             | ANALYZER     |
| asciifolding         | TOKEN_FILTER |
+----------------------+--------------+
SELECT 5 rows in set (... sec)

For example you can use this table to list existing tokenizers like this:

cr> select routine_name from information_schema.routines
... where routine_type='TOKENIZER'
... order by routine_name asc limit 10;
+----------------+
| routine_name   |
+----------------+
| classic        |
| e2_mypattern   |
| edgeNGram      |
| edge_ngram     |
| keyword        |
| letter         |
| lowercase      |
| nGram          |
| ngram          |
| path_hierarchy |
+----------------+
SELECT 10 rows in set (... sec)

Or get an overview of how many routines and routine types are available:

cr> select count(*), routine_type from information_schema.routines
... group by routine_type order by routine_type;
+----------+--------------+
| count(*) | routine_type |
+----------+--------------+
| 45       | ANALYZER     |
| 5        | CHAR_FILTER  |
| 14       | TOKENIZER    |
| 46       | TOKEN_FILTER |
+----------+--------------+
SELECT 4 rows in set (... sec)