apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine

Home Page: https://datafusion.apache.org/ballista


Could not create or read partition table

smallzhongfeng opened this issue

Describe the bug
After a partitioned table is created, it cannot be read normally.

To Reproduce

mkdir -p tmp/year=2022 tmp/year=2021
echo "1,2" > tmp/year=2022/data.csv
echo "3,4" > tmp/year=2021/data.csv

Then run in ballista-cli:


❯ CREATE EXTERNAL TABLE t2 (a INT, b INT) STORED AS CSV PARTITIONED BY (year) LOCATION 'tmp';
ArrowError(SchemaError("Unable to get field named \"year\". Valid fields: [\"a\", \"b\"]"))

I deployed it in standalone mode.
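A possible workaround to try (this is an assumption based on the error message, not something confirmed in this thread): the planner appears to look for the partition column among the declared fields, so declaring year in the column list as well might get past the schema check:

CREATE EXTERNAL TABLE t2 (a INT, b INT, year INT) STORED AS CSV PARTITIONED BY (year) LOCATION 'tmp';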

Expected behavior
The table should be created successfully, and queries against it should return the data with year exposed as a partition column.

Additional context

[screenshot of the error output]
I deployed the latest released version, and the client is also on the latest version, 0.11.0.

@thinkharderdev @yahoNanJing @Dandandan Have you encountered similar problems? Could you give me some advice?

A similar issue: #747

use ballista::prelude::{BallistaConfig, BallistaContext, Result};
use datafusion::arrow::datatypes::DataType;
use datafusion::datasource::file_format::parquet::DEFAULT_PARQUET_EXTENSION;
use datafusion::prelude::ParquetReadOptions;

#[tokio::main]
async fn main() -> Result<()> {
    let config = BallistaConfig::builder()
        .set("ballista.shuffle.partitions", "1")
        .build()?;

    // Standalone Ballista cluster with two concurrent tasks.
    let ctx = BallistaContext::standalone(&config, 2).await?;

    // Declare "date" as a partition column; its values come from the
    // directory names (date=.../), not from the Parquet files themselves.
    let options = ParquetReadOptions {
        file_extension: DEFAULT_PARQUET_EXTENSION,
        table_partition_cols: vec![("date".to_string(), DataType::Utf8)],
        parquet_pruning: Some(false),
        skip_metadata: Some(true),
    };

    let df = ctx.read_parquet("tmp", options).await?;
    println!("{}", df.schema());

    // Selecting a data column together with the partition column.
    df.clone().select_columns(&["String", "date"]).unwrap();
    df.show().await?;
    Ok(())
}

This case also fails, so are partitioned tables currently not supported?
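For comparison, here is a minimal sketch of the same partitioned Parquet read against a plain DataFusion SessionContext (assuming the same tmp/date=... layout and a DataFusion version where table_partition_cols takes (name, type) pairs). If this works while the Ballista version above fails, the problem is in Ballista's handling of partition columns rather than in DataFusion itself:

use datafusion::arrow::datatypes::DataType;
use datafusion::error::Result;
use datafusion::prelude::{ParquetReadOptions, SessionContext};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    // Same partition column declaration as in the Ballista example above.
    let options = ParquetReadOptions {
        table_partition_cols: vec![("date".to_string(), DataType::Utf8)],
        ..Default::default()
    };

    let df = ctx.read_parquet("tmp", options).await?;
    println!("{}", df.schema());
    df.show().await?;
    Ok(())
}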

Hi @smallzhongfeng, I'll take a look at this issue this week.

Thank you for your reply, @yahoNanJing. My current guess is that the partition column is treated as an ordinary column, which causes an error when the schema is matched.
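To illustrate that guess (a sketch only, not Ballista's actual code path): the data files contain only the data columns, so the table schema has to be built by appending the declared partition columns to the file schema rather than by looking them up in it, roughly like this:

use datafusion::arrow::datatypes::{DataType, Field, Schema};

// Sketch only: build the table schema from the file schema plus the declared
// partition columns. If the partition columns are instead looked up in the
// file schema, the lookup fails with exactly the kind of error reported above.
fn table_schema(file_schema: &Schema, partition_cols: &[(String, DataType)]) -> Schema {
    let mut fields: Vec<Field> = file_schema
        .fields()
        .iter()
        .map(|f| Field::new(f.name().as_str(), f.data_type().clone(), f.is_nullable()))
        .collect();
    for (name, data_type) in partition_cols {
        // Partition values come from the directory names, e.g. year=2022.
        fields.push(Field::new(name.as_str(), data_type.clone(), false));
    }
    Schema::new(fields)
}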

Any update?

It looks like the partition directories are ignored and the files inside are not loaded. Is there any update on how to deal with this?

Any update?