h2oai / datatable

A Python package for manipulating 2-dimensional tabular data structures

Home Page: https://datatable.readthedocs.io

Created zero-column frame has wrong number of rows

hallmeier opened this issue

from datatable import dt

dt.Frame([()])
#    |
#    |
# -- +
# [0 rows x 0 columns]
dt.Frame([{}])
#    |
#    |
# -- +
# [0 rows x 0 columns]

Expected behavior:

dt.Frame([()])
#    |
#    |
# -- +
#  0 |
# [1 row x 0 columns]
dt.Frame([{}])
#    |
#    |
# -- +
#  0 |
# [1 row x 0 columns]

Well, you pass 1 column and 0 rows to dt.Frame(), so I'm not sure why you expect datatable to create 0 columns and 1 row. Note that a datatable frame is a column-oriented container of data: https://datatable.readthedocs.io/en/latest/api/frame.html

The result is still wrong, but I would say that the expected behavior in both cases should be:

   |   C0
   | void
-- + ----
[0 rows x 1 column]

At least, this is what happens for a list containing an empty list, and it seems reasonable:

>>> dt.Frame([[]])
   |   C0
   | void
-- + ----
[0 rows x 1 column]

When creating a frame from a list of lists, each list in the list marks a column, so this behavior is correct. But when creating a frame from a list of tuples or a list of dicts, each tuple/dict marks a row, so I am passing 0 columns and 1 row. See these examples:

>>> dt.Frame([(0,), (0,)])
   |   C0
   | int8
-- + ----
 0 |    0
 1 |    0
[2 rows x 1 column]
>>> dt.Frame([{"A": 0}, {"A": 0}])
   |    A
   | int8
-- + ----
 0 |    0
 1 |    0
[2 rows x 1 column]

>>> from datatable import dt, f
>>> dt.Frame([0])[:, f[[]]].to_tuples()
[()]
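
To make the two readings concrete, here is a minimal pure-Python sketch of the shape I would expect for a non-empty list source. The helper name expected_shape is hypothetical and this is not datatable's actual constructor logic; it assumes that tuples and dicts are rows, that inner lists are columns, and that dict rows contribute the union of their keys as columns.

```python
def expected_shape(source):
    """Return (nrows, ncols) for a non-empty list source.

    Hypothetical helper, not part of datatable: it only models the
    row-oriented vs. column-oriented reading discussed in this issue.
    """
    if not isinstance(source, list) or not source:
        raise ValueError("this sketch only handles non-empty lists")

    # A list of dicts: each dict is one row, columns are the union of keys
    # (assumption made for this sketch).
    if all(isinstance(item, dict) for item in source):
        keys = set()
        for row in source:
            keys.update(row)
        return len(source), len(keys)

    # A list of tuples: each tuple is one row. Taking the longest tuple is a
    # simplification; ragged rows are not handled here.
    if all(isinstance(item, tuple) for item in source):
        return len(source), max(len(row) for row in source)

    # A list of lists: each inner list is one column.
    if all(isinstance(item, list) for item in source):
        return max(len(col) for col in source), len(source)

    # A list of scalars: a single column.
    return len(source), 1


print(expected_shape([()]))                  # (1, 0)
print(expected_shape([{}]))                  # (1, 0)
print(expected_shape([[]]))                  # (0, 1)
print(expected_shape([(0,), (0,)]))          # (2, 1)
print(expected_shape([{"A": 0}, {"A": 0}]))  # (2, 1)
```

Under this reading, [()] and [{}] give 1 row and 0 columns, while [[]] gives 0 rows and 1 column, which matches what dt.Frame([[]]) already returns.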

You are right, but at the same time, in the docs we say:

When the source is a non-empty list containing other lists or compound objects, then each item will be interpreted as a column initializer, and the resulting frame will have as many columns as the number of items in the list.

My feeling is that we need to review this part of the functionality and the docs to make them consistent, and obviously fix the bug you have discovered.