Copy data from files into a table.
COPY table_ident FROM uri [ WITH ( option = value [, ...] ) ]
where option can be one of:
COPY FROM copies data from a URI to the specified table.
The nodes in the cluster will attempt to access the resources available under the URI and import the data.
The file(s) must contain one JSON formatted row per line.
For examples see: Importing data.
The default scheme used if the scheme isn’t part of the used URI.
The filepath given must be an absolute path and accessable by the crate process.
By default each node will attempt to read the files specified. If the URI points to a shared folder the shared option must be set to true in order to avoid duplicate imports.
Can be used to access buckets on the Amazon AWS S3 Service:
s3://[<accesskey>:<secretkey>]@<bucketname>/<path>
If the accesskey and secret key is ommited an attempt to load the credentials from the environment or java settings is made.
Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY Java System Properties - aws.accessKeyId and aws.secretKey
Using a s3 URI will set the shared option implicitly.
table_ident: | The name (optionally schema-qualified) of an existing table where the data should be put. |
---|---|
uri: | An expression which evaluates to a uri as defined in RFC2396. The supported schemes are listed above. The last part of the path may also contain * wildcards to match multiple files. |
The optional WITH clause can specify options for the COPY FROM statement.
[ WITH ( option = value [, ...] ) ]
Crate will process the lines it reads from the path in bulks. This option specifies the size of such a bulk. The default is 10000.
Must be set to a number greater than 0
Warning
A bulk_size that is too high will cause a very high load on the cluster and some lines that should be inserted might even be dropped.
The number of parallel bulk actions that should be executed. Default is 4. Must be set to a number greater than 0.
Warning
Similar to a high bulk_size a high concurrency setting will cause a very high load on the cluster.
The number of nodes that will read the resources specified in the URI. Defaults to the number of nodes available in the cluster. If the option is set to a number greater than the number of available nodes it will still only use each node only once to do the import.
Must be an integer that is greater than 0.
The default value is null. Can be set to gzip to read gzipped files.