COPY FROM does not work on all files inside folder
jphjsoares opened this issue · comments
CrateDB version
4.6
CrateDB setup information
Number of nodes: 1
Problem description
According to crate documentation, the following syntax is used to import all files inside a directory: COPY quotes FROM '/tmp/import_data/qu*.json';
but unfortunately I was unable to use that query when trying to import files that were exported by cratedb as well.
When instead of providing a directory with *, a filename is specified, the COPY
works as expected
Steps to Reproduce
- Export data from a table I want to copy, into a directory, using
COPY TO DIRECTORY
- Import that data into a new table, with the same schema using
COPY FROM
Actual Result
io.crate.exceptions.SQLParseException: Illegal char <*> at index 9: /test/*
at io.crate.exceptions.SQLExceptions.esToCrateException(SQLExceptions.java:137)
at io.crate.exceptions.SQLExceptions.prepareForClientTransmission(SQLExceptions.java:126)
at io.crate.rest.action.SqlHttpHandler.sendResponse(SqlHttpHandler.java:161)
at io.crate.rest.action.SqlHttpHandler.lambda$channelRead0$0(SqlHttpHandler.java:117)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883)
at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2321)
at io.crate.rest.action.SqlHttpHandler.channelRead0(SqlHttpHandler.java:115)
at io.crate.rest.action.SqlHttpHandler.channelRead0(SqlHttpHandler.java:78)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.crate.protocols.http.HttpBlobHandler.channelRead0(HttpBlobHandler.java:166)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.crate.auth.HttpAuthUpstreamHandler.handleHttpChunk(HttpAuthUpstreamHandler.java:135)
at io.crate.auth.HttpAuthUpstreamHandler.channelRead0(HttpAuthUpstreamHandler.java:84)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at org.elasticsearch.http.netty4.cors.Netty4CorsHandler.channelRead(Netty4CorsHandler.java:85)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:831)
Caused by: java.nio.file.InvalidPathException: Illegal char <*> at index 9: /test/*
at java.base/sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)
at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
at java.base/sun.nio.fs.WindowsPath.parse(WindowsPath.java:92)
at java.base/sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:230)
at java.base/java.nio.file.Path.of(Path.java:147)
at java.base/java.nio.file.Paths.get(Paths.java:69)
at io.crate.execution.engine.collect.files.FileReadingIterator.toURI(FileReadingIterator.java:350)
at io.crate.execution.engine.collect.files.FileReadingIterator.getUrisWithGlob(FileReadingIterator.java:311)
at io.crate.execution.engine.collect.files.FileReadingIterator.<init>(FileReadingIterator.java:105)
at io.crate.execution.engine.collect.files.FileReadingIterator.newInstance(FileReadingIterator.java:132)
at io.crate.execution.engine.collect.sources.FileCollectSource.getIterator(FileCollectSource.java:80)
at io.crate.execution.engine.collect.sources.ProjectorSetupCollectSource.getIterator(ProjectorSetupCollectSource.java:49)
at io.crate.execution.engine.collect.MapSideDataCollectOperation.createIterator(MapSideDataCollectOperation.java:58)
at io.crate.execution.engine.collect.CollectTask.start(CollectTask.java:162)
at io.crate.execution.jobs.RootTask.start(RootTask.java:191)
at io.crate.execution.engine.JobLauncher.setupTasks(JobLauncher.java:240)
at io.crate.execution.engine.JobLauncher.execute(JobLauncher.java:153)
at io.crate.planner.statement.CopyFromPlan.executeOrFail(CopyFromPlan.java:123)
at io.crate.planner.Plan.execute(Plan.java:73)
at io.crate.action.sql.Session.singleExec(Session.java:678)
at io.crate.action.sql.Session.exec(Session.java:541)
at io.crate.action.sql.Session.triggerDeferredExecutions(Session.java:515)
at io.crate.action.sql.Session.sync(Session.java:499)
at io.crate.rest.action.SqlHttpHandler.executeSimpleRequest(SqlHttpHandler.java:266)
at io.crate.rest.action.SqlHttpHandler.handleSQLRequest(SqlHttpHandler.java:205)
at io.crate.rest.action.SqlHttpHandler.channelRead0(SqlHttpHandler.java:114)
... 49 more
Expected Result
All the data should be imported correctly from all files and the * should work as expectedly.
Hi @jphjsoares, thanks for reporting! Looks like Windows specific bug.
I forgot to mention it, but I'm also unable to copy all the data from one table to the other by using the good and old: INSERT INTO table2 (SELECT * FROM table1)
. The command will copy the shards, but Crate reports 0 total records. Does that also have to do with this bug or should I create a separate issue?
I forgot to mention it, but I'm also unable to copy all the data from one table to the other by using the good and old:
INSERT INTO table2 (SELECT * FROM table1)
. The command will copy the shards, but Crate reports 0 total records. Does that also have to do with this bug or should I create a separate issue?
Yes please, with details of SHOW CREATE TABLE
for both tables and any error you get in the logs.
Keep in mind, that before opening a concrete github issue, you can also use our community portal: https://community.cratedb.com/
Keep in mind we have an open issue to address this scenario of INSERT INTO ... SELECT
where no error is returned, just number of inserted rows: #12218
Sure, here it is:
You can see that the only difference is in the shards. It was an attempt to migrate to a table with less shards.
The query is executed with success, but 0 records are actually copied. Only shards.
CREATE TABLE IF NOT EXISTS "my_schema"."test" (
"test_int" INTEGER,
"test_string" TEXT,
"timestamp" TIMESTAMP WITH TIME ZONE,
"edate" TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('week', "timestamp")
)
CLUSTERED INTO 4 SHARDS
PARTITIONED BY ("timestamp")
WITH (
"allocation.max_retries" = 5,
"blocks.metadata" = false,
"blocks.read" = false,
"blocks.read_only" = false,
"blocks.read_only_allow_delete" = false,
"blocks.write" = false,
codec = 'default',
column_policy = 'strict',
"mapping.total_fields.limit" = 1000,
max_ngram_diff = 1,
max_shingle_diff = 3,
number_of_replicas = '0-1',
"routing.allocation.enable" = 'all',
"routing.allocation.total_shards_per_node" = -1,
"store.type" = 'fs',
"translog.durability" = 'REQUEST',
"translog.flush_threshold_size" = 536870912,
"translog.sync_interval" = 5000,
"unassigned.node_left.delayed_timeout" = 60000,
"write.wait_for_active_shards" = '1'
)
CREATE TABLE IF NOT EXISTS "my_schema"."test2" (
"test_int" INTEGER,
"test_string" TEXT,
"timestamp" TIMESTAMP WITH TIME ZONE,
"edate" TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('week', "timestamp")
)
CLUSTERED INTO 1 SHARDS
PARTITIONED BY ("timestamp")
WITH (
"allocation.max_retries" = 5,
"blocks.metadata" = false,
"blocks.read" = false,
"blocks.read_only" = false,
"blocks.read_only_allow_delete" = false,
"blocks.write" = false,
codec = 'default',
column_policy = 'strict',
"mapping.total_fields.limit" = 1000,
max_ngram_diff = 1,
max_shingle_diff = 3,
number_of_replicas = '0-1',
"routing.allocation.enable" = 'all',
"routing.allocation.total_shards_per_node" = -1,
"store.type" = 'fs',
"translog.durability" = 'REQUEST',
"translog.flush_threshold_size" = 536870912,
"translog.sync_interval" = 5000,
"unassigned.node_left.delayed_timeout" = 60000,
"write.wait_for_active_shards" = '1'
)
@jphjsoares thanks for the additional information.
Since it's not related to COPY FROM,
please create a new issue with the table schemas you posted above. Also, please post there a result of select count(*) from my_schema.test2
Asking because not sure what you mean by
The command will copy the shards, but Crate reports 0 total records
So you are saying that INSERT works but reports incorrect number of inserted records or you meant something different? In any case, please elaborate in the new dedicated issue.
Thanks!
Just created it here, hopefully I was able to explain it a little bit better there 😄
Hi @jphjsoares, I could reproduce on a Windows machine but the problem can be resolved by applying a tip from the docs
Tip
If you are using Microsoft Windows, you must include the drive letter in the file URI.
For example:
file://C:/tmp/import_data/quotes.json
copy quotes from 'file:///C:/full/path/te*.json' ;
worked for me locally. Could you please try it?
ftr: We do have tests covering such scenario and we even used to run Windows tests on CI in the past.
Tests passes on Windows since we don't use path as a string but compute it like getResource("test_file").toURI()...
and this gets resolved to a correct path with drive, as hinted in docs.
Closing this since I have tested documented hint and it worked on a Windows machine (see comment above).