pola-rs / polars-cli

CLI interface for running SQL queries with Polars as backend

Home Page: https://pola.rs/

Cross join is treated as inner join

l1t1 opened this issue

commented

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of the Polars CLI.

Reproducible example

〉select count(*) from read_parquet('slow3.parquet');
┌────────┐
│ count  │
│ ---    │
│ u32    │
╞════════╡
│ 100000 │
└────────┘
〉select count(*) from read_parquet('slow3.parquet') t1,read_parquet('slow3.parquet') t2;
┌────────┐
│ count  │
│ ---    │
│ u32    │
╞════════╡
│ 100000 │
└────────┘
〉select count(*) from read_parquet('slow3.parquet') t1 cross join read_parquet('slow3.parquet') t2;
Error: cross joins would produce more rows than fits into 2^32; consider compiling with polars-big-idx feature, or set 'streaming'


### Issue description

The second query should return 10000000000, but it returns 100000.
The third query should return 10000000000 too, but it fails with the error above.

### Expected behavior

The second and third queries should both return 10000000000.
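
For reference, the expected count is plain arithmetic on the row count reported by the first query. The short sketch below (variable names are illustrative, not taken from the Polars source) works it out and compares it against the 2^32 limit named in the error message, which is likely why the explicit cross join refuses to run on a default build:

```python
# Row count reported by the first query in the example above.
rows = 100_000

# A cross join pairs every row of t1 with every row of t2.
expected = rows * rows

# The limit named in the error message for the explicit cross join.
u32_limit = 2**32

print(expected)              # 10000000000
print(expected > u32_limit)  # True
```

So the correct answer does not fit in a 32-bit row index, while the 100000 returned by the comma-separated form is simply the row count of a single table.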

### Installed version

0.6.0

Could you make a minimal reproducible example, e.g. without reading parquet files? I tried reproducing this on the latest Polars main branch from Python but am unable to do so:

import polars as pl

df1 = pl.DataFrame({"a": [1, 1], "b": [3, 4]})
df2 = pl.DataFrame({"a": [1, 2], "c": [5, 6]})

result = df1.join(df2, how="cross")
print(result)

sql = pl.SQLContext({"df1": df1, "df2": df2})
result = sql.execute("select * from df1 cross join df2;", eager=True)
print(result)  # same result
commented

Using your example with a comma-separated FROM clause, compare the Polars result with the DuckDB result:

>>> result = sql.execute("select * from df1, df2;", eager=True)
>>> print(result)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 1   ┆ 4   │
└─────┴─────┘
>>> import pandas as pd
>>> import duckdb as dd
>>> dd.sql("select * from df1, df2;")
┌───────┬───────┬───────┬───────┐
│   a   │   b   │   a   │   c   │
│ int64 │ int64 │ int64 │ int64 │
├───────┼───────┼───────┼───────┤
│     1 │     3 │     1 │     5 │
│     1 │     3 │     2 │     6 │
│     1 │     4 │     1 │     5 │
│     1 │     4 │     2 │     6 │
└───────┴───────┴───────┴───────┘
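
As an independent cross-check, here is a minimal sketch using Python's built-in sqlite3 module. SQLite was not used anywhere in this thread; it is only an assumption of this example, chosen because it needs no extra install. It loads the same two tables and shows that a comma-separated FROM clause and an explicit CROSS JOIN return the same four rows, matching the DuckDB output above:

```python
import sqlite3

# In-memory database holding the same data as df1 and df2 above.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE df1 (a INTEGER, b INTEGER);
    CREATE TABLE df2 (a INTEGER, c INTEGER);
    INSERT INTO df1 VALUES (1, 3), (1, 4);
    INSERT INTO df2 VALUES (1, 5), (2, 6);
""")

# Implicit (comma) join and explicit cross join.
comma_rows = con.execute("SELECT * FROM df1, df2").fetchall()
cross_rows = con.execute("SELECT * FROM df1 CROSS JOIN df2").fetchall()

# Sort before comparing, since row order is not guaranteed.
print(sorted(comma_rows))
print(sorted(comma_rows) == sorted(cross_rows))
```

Both forms produce 2 x 2 = 4 rows here, which is the behaviour the issue expects from the Polars SQL interface as well.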

Right, closing as a duplicate then.

commented

This still returns the wrong result in version 10. 20.31