crate / crate

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.

Home Page: https://cratedb.com/product

COPY FROM does not work on all files inside folder

jphjsoares opened this issue

CrateDB version

4.6

CrateDB setup information

Number of nodes: 1

Problem description

According to the CrateDB documentation, a glob pattern can be used to import all matching files inside a directory, for example: COPY quotes FROM '/tmp/import_data/qu*.json'; Unfortunately, that kind of query failed for me when I tried to import files that had themselves been exported by CrateDB.
When I specify a single filename instead of a path containing *, COPY FROM works as expected.

Steps to Reproduce

  1. Export data from a table I want to copy, into a directory, using COPY TO DIRECTORY
  2. Import that data into a new table, with the same schema using COPY FROM
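
The two steps above can be written out as concrete statements. The following is only a sketch: the table names are borrowed from the schemas posted later in this thread and the /test directory from the error message, so all three are assumptions about what was actually run, and the helper name reproduction_statements is hypothetical.

```python
def reproduction_statements(source_table: str, target_table: str, export_dir: str) -> list[str]:
    """Return the export and import statements from the steps above, in order."""
    # Step 1: export every row of the source table into a directory.
    export_stmt = f"COPY {source_table} TO DIRECTORY '{export_dir}';"
    # Step 2: import all exported files into the target table using a glob.
    import_stmt = f"COPY {target_table} FROM '{export_dir}/*';"
    return [export_stmt, import_stmt]


# Assumed names; the issue does not state the exact tables or directory used.
for stmt in reproduction_statements("my_schema.test", "my_schema.test2", "/test"):
    print(stmt)
```

The second statement produces the source path /test/*, which is the path quoted in the exception below.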

Actual Result

io.crate.exceptions.SQLParseException: Illegal char <*> at index 9: /test/*
	at io.crate.exceptions.SQLExceptions.esToCrateException(SQLExceptions.java:137)
	at io.crate.exceptions.SQLExceptions.prepareForClientTransmission(SQLExceptions.java:126)
	at io.crate.rest.action.SqlHttpHandler.sendResponse(SqlHttpHandler.java:161)
	at io.crate.rest.action.SqlHttpHandler.lambda$channelRead0$0(SqlHttpHandler.java:117)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883)
	at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2321)
	at io.crate.rest.action.SqlHttpHandler.channelRead0(SqlHttpHandler.java:115)
	at io.crate.rest.action.SqlHttpHandler.channelRead0(SqlHttpHandler.java:78)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.crate.protocols.http.HttpBlobHandler.channelRead0(HttpBlobHandler.java:166)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.crate.auth.HttpAuthUpstreamHandler.handleHttpChunk(HttpAuthUpstreamHandler.java:135)
	at io.crate.auth.HttpAuthUpstreamHandler.channelRead0(HttpAuthUpstreamHandler.java:84)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at org.elasticsearch.http.netty4.cors.Netty4CorsHandler.channelRead(Netty4CorsHandler.java:85)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.base/java.lang.Thread.run(Thread.java:831)
Caused by: java.nio.file.InvalidPathException: Illegal char <*> at index 9: /test/*
	at java.base/sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
	at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)
	at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
	at java.base/sun.nio.fs.WindowsPath.parse(WindowsPath.java:92)
	at java.base/sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:230)
	at java.base/java.nio.file.Path.of(Path.java:147)
	at java.base/java.nio.file.Paths.get(Paths.java:69)
	at io.crate.execution.engine.collect.files.FileReadingIterator.toURI(FileReadingIterator.java:350)
	at io.crate.execution.engine.collect.files.FileReadingIterator.getUrisWithGlob(FileReadingIterator.java:311)
	at io.crate.execution.engine.collect.files.FileReadingIterator.<init>(FileReadingIterator.java:105)
	at io.crate.execution.engine.collect.files.FileReadingIterator.newInstance(FileReadingIterator.java:132)
	at io.crate.execution.engine.collect.sources.FileCollectSource.getIterator(FileCollectSource.java:80)
	at io.crate.execution.engine.collect.sources.ProjectorSetupCollectSource.getIterator(ProjectorSetupCollectSource.java:49)
	at io.crate.execution.engine.collect.MapSideDataCollectOperation.createIterator(MapSideDataCollectOperation.java:58)
	at io.crate.execution.engine.collect.CollectTask.start(CollectTask.java:162)
	at io.crate.execution.jobs.RootTask.start(RootTask.java:191)
	at io.crate.execution.engine.JobLauncher.setupTasks(JobLauncher.java:240)
	at io.crate.execution.engine.JobLauncher.execute(JobLauncher.java:153)
	at io.crate.planner.statement.CopyFromPlan.executeOrFail(CopyFromPlan.java:123)
	at io.crate.planner.Plan.execute(Plan.java:73)
	at io.crate.action.sql.Session.singleExec(Session.java:678)
	at io.crate.action.sql.Session.exec(Session.java:541)
	at io.crate.action.sql.Session.triggerDeferredExecutions(Session.java:515)
	at io.crate.action.sql.Session.sync(Session.java:499)
	at io.crate.rest.action.SqlHttpHandler.executeSimpleRequest(SqlHttpHandler.java:266)
	at io.crate.rest.action.SqlHttpHandler.handleSQLRequest(SqlHttpHandler.java:205)
	at io.crate.rest.action.SqlHttpHandler.channelRead0(SqlHttpHandler.java:114)
	... 49 more

Expected Result

All the data should be imported from all the files, and the * wildcard should work as expected.

commented

Hi @jphjsoares, thanks for reporting! This looks like a Windows-specific bug.

I forgot to mention it, but I'm also unable to copy all the data from one table to the other using the good old INSERT INTO table2 (SELECT * FROM table1). The command will copy the shards, but Crate reports 0 total records. Is that related to this bug, or should I create a separate issue?

Yes please, with the SHOW CREATE TABLE output for both tables and any errors you get in the logs.
Keep in mind that before opening a GitHub issue, you can also use our community portal: https://community.cratedb.com/

Also note that we have an open issue covering this INSERT INTO ... SELECT scenario, where no error is returned, only the number of inserted rows: #12218

Sure, here it is:
You can see that the only difference is the number of shards. It was an attempt to migrate to a table with fewer shards.
The query executes successfully, but 0 records are actually copied. Only shards.

CREATE TABLE IF NOT EXISTS "my_schema"."test" (
   "test_int" INTEGER,
   "test_string" TEXT,
   "timestamp" TIMESTAMP WITH TIME ZONE,
   "edate" TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('week', "timestamp")
)
CLUSTERED INTO 4 SHARDS
PARTITIONED BY ("timestamp")
WITH (
   "allocation.max_retries" = 5,
   "blocks.metadata" = false,
   "blocks.read" = false,
   "blocks.read_only" = false,
   "blocks.read_only_allow_delete" = false,
   "blocks.write" = false,
   codec = 'default',
   column_policy = 'strict',
   "mapping.total_fields.limit" = 1000,
   max_ngram_diff = 1,
   max_shingle_diff = 3,
   number_of_replicas = '0-1',
   "routing.allocation.enable" = 'all',
   "routing.allocation.total_shards_per_node" = -1,
   "store.type" = 'fs',
   "translog.durability" = 'REQUEST',
   "translog.flush_threshold_size" = 536870912,
   "translog.sync_interval" = 5000,
   "unassigned.node_left.delayed_timeout" = 60000,
   "write.wait_for_active_shards" = '1'
)

CREATE TABLE IF NOT EXISTS "my_schema"."test2" (
   "test_int" INTEGER,
   "test_string" TEXT,
   "timestamp" TIMESTAMP WITH TIME ZONE,
   "edate" TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('week', "timestamp")
)
CLUSTERED INTO 1 SHARDS
PARTITIONED BY ("timestamp")
WITH (
   "allocation.max_retries" = 5,
   "blocks.metadata" = false,
   "blocks.read" = false,
   "blocks.read_only" = false,
   "blocks.read_only_allow_delete" = false,
   "blocks.write" = false,
   codec = 'default',
   column_policy = 'strict',
   "mapping.total_fields.limit" = 1000,
   max_ngram_diff = 1,
   max_shingle_diff = 3,
   number_of_replicas = '0-1',
   "routing.allocation.enable" = 'all',
   "routing.allocation.total_shards_per_node" = -1,
   "store.type" = 'fs',
   "translog.durability" = 'REQUEST',
   "translog.flush_threshold_size" = 536870912,
   "translog.sync_interval" = 5000,
   "unassigned.node_left.delayed_timeout" = 60000,
   "write.wait_for_active_shards" = '1'
)
commented

@jphjsoares thanks for the additional information.

Since it's not related to COPY FROM, please create a new issue with the table schemas you posted above. Please also post the result of select count(*) from my_schema.test2 there.

I'm asking because I'm not sure what you mean by

The command will copy the shards, but Crate reports 0 total records

Are you saying that the INSERT works but reports an incorrect number of inserted records, or did you mean something different? In any case, please elaborate in the new dedicated issue.

Thanks!

Just created it here, hopefully I was able to explain it a little bit better there 😄

commented

Hi @jphjsoares, I could reproduce this on a Windows machine, but the problem can be resolved by applying a tip from the docs:

Tip
If you are using Microsoft Windows, you must include the drive letter in the file URI.
For example:
file://C:/tmp/import_data/quotes.json

copy quotes from 'file:///C:/full/path/te*.json' ; worked for me locally. Could you please try it?
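
For anyone scripting this workaround, here is a minimal sketch of turning a Windows path (glob included) into the file:///C:/... form that worked above. The helper name copy_from_statement is hypothetical, and the table name is interpolated without any quoting, so treat it as an illustration rather than something to run against untrusted input.

```python
from pathlib import PureWindowsPath


def copy_from_statement(table: str, windows_path: str) -> str:
    """Build a COPY FROM statement whose source is a file URI with a drive letter."""
    path = PureWindowsPath(windows_path)
    # A path such as '/test/*' has no drive letter, so it is not absolute on
    # Windows; that is the form that failed in the stack trace above.
    if not path.is_absolute():
        raise ValueError(f"not an absolute Windows path with a drive letter: {windows_path!r}")
    # as_posix() converts backslashes to forward slashes and leaves '*' untouched.
    uri = "file:///" + path.as_posix()
    return f"COPY {table} FROM '{uri}';"


print(copy_from_statement("quotes", r"C:\full\path\te*.json"))
```

This prints COPY quotes FROM 'file:///C:/full/path/te*.json'; and raises ValueError for a drive-less path such as /test/*. The URI is assembled by hand rather than with as_uri(), because as_uri() would percent-encode the * and break the glob.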

commented

For the record: we do have tests covering this scenario, and we even used to run the Windows tests on CI in the past.
The tests pass on Windows because we don't pass the path as a string but compute it like getResource("test_file").toURI()..., which resolves to a correct path with the drive letter, as hinted in the docs.

commented

Closing this since I have tested the documented hint and it worked on a Windows machine (see the comment above).