The Information Schema is a special schema that contains virtual tables which are read-only and can be queried to get information about the state of the cluster.
Note
currently only the tables tables and columns are implemented.
The information schema contains a table called tables.
This table can be queried to get a list of all available tables and their settings like the number of shards or number of replicas:
cr> select * from information_schema.tables
... where table_name not like 'my_table%' order by schema_name asc, table_name asc;
+--------------------+-------------------+------------------+--------------------+--------------+----------------+
| schema_name | table_name | number_of_shards | number_of_replicas | clustered_by | partitioned_by |
+--------------------+-------------------+------------------+--------------------+--------------+----------------+
| blob | myblobs | 3 | 1 | digest | NULL |
| doc | documents | 5 | 1 | _id | NULL |
| doc | locations | 2 | 0 | id | NULL |
| doc | partitioned_table | 5 | 1 | _id | [u'date'] |
| doc | quotes | 2 | 0 | id | NULL |
| information_schema | columns | 1 | 0 | NULL | NULL |
| information_schema | routines | 1 | 0 | NULL | NULL |
| information_schema | table_constraints | 1 | 0 | NULL | NULL |
| information_schema | table_partitions | 1 | 0 | NULL | NULL |
| information_schema | tables | 1 | 0 | NULL | NULL |
| sys | cluster | 1 | 0 | NULL | NULL |
| sys | nodes | 1 | 0 | NULL | NULL |
| sys | shards | 1 | 0 | NULL | NULL |
+--------------------+-------------------+------------------+--------------------+--------------+----------------+
SELECT 13 rows in set (... sec)
This table can be queried to get a list of all available columns of all tables and their definition like data type and ordinal position inside the table:
cr> select * from information_schema.columns
... where schema_name='doc' and table_name not like 'my_table%'
... order by table_name asc, column_name asc;
+-------------+-------------------+------------------+------------------+--------------+
| schema_name | table_name | column_name | ordinal_position | data_type |
+-------------+-------------------+------------------+------------------+--------------+
| doc | documents | body | 1 | string |
| doc | documents | title | 2 | string |
| doc | locations | date | 1 | timestamp |
| doc | locations | description | 2 | string |
| doc | locations | id | 3 | string |
| doc | locations | kind | 4 | string |
| doc | locations | name | 5 | string |
| doc | locations | position | 6 | integer |
| doc | locations | race | 7 | object |
| doc | locations | race.description | 8 | string |
| doc | locations | race.interests | 9 | string_array |
| doc | locations | race.name | 10 | string |
| doc | partitioned_table | date | 1 | timestamp |
| doc | partitioned_table | id | 2 | long |
| doc | partitioned_table | title | 3 | string |
| doc | quotes | id | 1 | integer |
| doc | quotes | quote | 2 | string |
+-------------+-------------------+------------------+------------------+--------------+
SELECT 17 rows in set (... sec)
You can even query this tables’ own columns (attention: this might lead to infinite recursion of your mind, beware!):
cr> select column_name, data_type, ordinal_position from information_schema.columns
... where schema_name = 'information_schema' and table_name = 'columns' order by ordinal_position asc;
+------------------+-----------+------------------+
| column_name | data_type | ordinal_position |
+------------------+-----------+------------------+
| schema_name | string | 1 |
| table_name | string | 2 |
| column_name | string | 3 |
| ordinal_position | short | 4 |
| data_type | string | 5 |
+------------------+-----------+------------------+
SELECT 5 rows in set (... sec)
Note
Columns at Crate are always sorted alphabetically in ascending order despite in which order they were defined on table creation. Thus the ordinal_position reflects the alphabetical position.
This table can be queries to get a list of all defined table constraints, their type, name and which table they are defined in.
Note
Currently only PRIMARY_KEY constraints are supported.
cr> select * from information_schema.table_constraints
... where table_name not like 'my_table%'
... order by schema_name desc, table_name desc limit 10;
+--------------------+-------------------+------------------------------------------------------------+-----------------+
| schema_name | table_name | constraint_name | constraint_type |
+--------------------+-------------------+------------------------------------------------------------+-----------------+
| sys | shards | [u'schema_name', u'table_name', u'id', u'partition_ident'] | PRIMARY_KEY |
| sys | nodes | [u'id'] | PRIMARY_KEY |
| information_schema | tables | [u'schema_name', u'table_name'] | PRIMARY_KEY |
| information_schema | columns | [u'schema_name', u'table_name', u'column_name'] | PRIMARY_KEY |
| doc | quotes | [u'id'] | PRIMARY_KEY |
| doc | partitioned_table | [u'_id'] | PRIMARY_KEY |
| doc | locations | [u'id'] | PRIMARY_KEY |
| doc | documents | [u'_id'] | PRIMARY_KEY |
| blob | myblobs | [u'digest'] | PRIMARY_KEY |
+--------------------+-------------------+------------------------------------------------------------+-----------------+
SELECT 9 rows in set (... sec)
This table can be queried to get information about all partitioned tables, Each partition of a table is represented as one row. The row contains the information table name, schema name, partition ident, and the values of the partition. values is a key-value object with the ‘partitioned by column’ as key(s) and the corresponding value as value(s).
For further information see Partitioned Tables.
The following example shows a table where the column ‘content’ of table ‘a_partitioned_table’ has been used to partition the table. The table has two partitions. The partitions are introduced when data is inserted where ‘content’ is ‘content_a’, and ‘content_b’:
cr> select * from information_schema.table_partitions
... order by table_name, partition_ident
+---------------------+-------------+--------------------+----------------------------+
| table_name | schema_name | partition_ident | values |
+---------------------+-------------+--------------------+----------------------------+
| a_partitioned_table | doc | 04566rreehimst2vc4 | {u'content': u'content_a'} |
| a_partitioned_table | doc | 04566rreehimst2vc8 | {u'content': u'content_b'} |
+---------------------+-------------+--------------------+----------------------------+
SELECT 2 rows in set (... sec)
Note
currently not implemented
This table can be queried to get a list of all defined indices of all columns and their definition like index method, expression list and property list. Using a plain index for every column is the default behaviour at Crate, so almost all columns are listed as an index as well:
cr> select * from information_schema.indices
... where table_name not like 'my_table%' order by table_name asc, index_name asc; #doctest: +SKIP
+------------+---------------------+----------+---------------------------+------------------+
| table_name | index_name | method | columns | properties |
+------------+---------------------+----------+---------------------------+------------------+
| documents | body | plain | [u'body'] | |
| documents | title | plain | [u'title'] | |
| documents | title_body_ft | fulltext | [u'body', u'title'] | analyzer=english |
| locations | date | plain | [u'date'] | |
| locations | description | plain | [u'description'] | |
| locations | id | plain | [u'id'] | |
| locations | kind | plain | [u'kind'] | |
| locations | name | plain | [u'name'] | |
| locations | name_description_ft | fulltext | [u'description', u'name'] | analyzer=english |
| locations | position | plain | [u'position'] | |
| locations | race | plain | [u'race'] | |
| quotes | id | plain | [u'id'] | |
| quotes | quote | plain | [u'quote'] | |
+------------+---------------------+----------+---------------------------+------------------+
SELECT 13 rows in set (... sec)
The routines table contains all custom analyzers, tokenizers, token-filters and char-filters and all custom analyzers created by CREATE ANALYZER statements (see Create custom analyzer):
cr> select routine_name, routine_type from information_schema.routines
... group by routine_name, routine_type order by routine_name asc limit 5;
+----------------------+--------------+
| routine_name | routine_type |
+----------------------+--------------+
| arabic | ANALYZER |
| arabic_normalization | TOKEN_FILTER |
| arabic_stem | TOKEN_FILTER |
| armenian | ANALYZER |
| asciifolding | TOKEN_FILTER |
+----------------------+--------------+
SELECT 5 rows in set (... sec)
For example you can use this table to list existing tokenizers like this:
cr> select routine_name from information_schema.routines
... where routine_type='TOKENIZER'
... order by routine_name asc limit 10;
+----------------+
| routine_name |
+----------------+
| classic |
| e2_mypattern |
| edgeNGram |
| edge_ngram |
| keyword |
| letter |
| lowercase |
| nGram |
| ngram |
| path_hierarchy |
+----------------+
SELECT 10 rows in set (... sec)
Or get an overview of how many routines and routine types are available:
cr> select count(*), routine_type from information_schema.routines
... group by routine_type order by routine_type;
+----------+--------------+
| count(*) | routine_type |
+----------+--------------+
| 45 | ANALYZER |
| 5 | CHAR_FILTER |
| 14 | TOKENIZER |
| 46 | TOKEN_FILTER |
+----------+--------------+
SELECT 4 rows in set (... sec)