data-8 / textbook

The textbook Computational and Inferential Thinking: The Foundations of Data Science

Home Page: http://www.inferentialthinking.com

Cannot run Ch. 1 code in Colab: 'Table' object has no attribute 'cumsum'

murphyk opened this issue

Hi. I don't have a Berkeley account (so I cannot use your JupyterHub), and Binder is far too slow, so I tried to run your code in Google Colab. I added %pip install datascience to the top of your notebook and could run section 1.3.0, but I get the error below when running section 1.3.1.

AttributeError                            Traceback (most recent call last)
<ipython-input-13-f9eef04bf6f8> in <cell line: 12>()
     10 # how many times in Chapter 1, how many times in Chapters 1 and 2, and so on.
     11 
---> 12 cum_counts = counts.cumsum().with_column('Chapter', np.arange(1, 44, 1))
     13 cum_counts.plot(column_for_xticks=3)
     14 plots.title('Cumulative Number of Times Each Name Appears', y=1.08);

AttributeError: 'Table' object has no attribute 'cumsum'

A Colab notebook that reproduces the problem is here. See the screenshot below.

[Screenshot 2023-05-12 at 5 16 24 PM]

I am experiencing the same issue. I suspect this PR in the datascience package caused it by removing Table.__getattr__, which had previously made it possible to call cumsum() on a Table in the code above. As a workaround, reverting to datascience v0.17.5 works for me and makes the above code run.
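To see why removing a __getattr__ hook would break counts.cumsum(), here is a minimal sketch of that delegation pattern. MiniTable is a hypothetical stand-in, not the datascience source; it assumes (based on the comment above) that the old hook forwarded an unknown method name to each column's NumPy array and rebuilt a table from the per-column results.

```python
import numpy as np


class MiniTable:
    """Hypothetical, stripped-down stand-in for datascience.Table.

    Illustrates how a __getattr__ hook can forward an unknown method name
    (such as cumsum) to every column array. Not the library's actual code.
    """

    def __init__(self, columns):
        # Map each column label to a NumPy array of its values.
        self._columns = {label: np.asarray(vals) for label, vals in columns.items()}

    def column(self, label):
        return self._columns[label]

    def __getattr__(self, attr):
        # Python calls __getattr__ only after normal lookup fails, so it
        # fires for t.cumsum but not for t.column. Reading __dict__ directly
        # avoids infinite recursion if _columns has not been set yet.
        cols = self.__dict__.get("_columns", {})
        if cols and all(hasattr(c, attr) for c in cols.values()):
            methods = {label: getattr(c, attr) for label, c in cols.items()}
            if all(callable(m) for m in methods.values()):
                def method(*args, **kwargs):
                    # Apply the array method (e.g. ndarray.cumsum) to each
                    # column and wrap the per-column results in a new table.
                    return MiniTable({label: m(*args, **kwargs) for label, m in methods.items()})
                return method
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{attr}'")


# Usage: with the hook in place, cumsum() works column by column.
t = MiniTable({'A': [1, 2, 3], 'B': [10, 20, 30]})
print(t.cumsum().column('A'))  # [1 3 6]
print(t.cumsum().column('B'))  # [10 30 60]
```

Deleting __getattr__ from MiniTable makes t.cumsum() fail with the same kind of AttributeError shown in the traceback.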

Yes, that seems to fix the problem.

%pip install datascience==0.17.5 

from datascience import *

counts = Table().with_columns([
        'A', [1,2,3],
        'B', [10, 20,30]
    ])
print(counts)
print(counts.cumsum()) # fails on v0.17.6


import numpy as np

# Stack the two columns side by side into a 3x2 NumPy array.
def convert_to_np(counts):
    C = np.stack((counts['A'], counts['B']), axis=0).T
    return C

C = convert_to_np(counts)
assert C.shape == (3, 2)

# Expected result: running totals down each column.
CS_expected = np.cumsum(C, axis=0)
CS_obs = convert_to_np(counts.cumsum())  # only works on v0.17.5
assert np.array_equal(CS_expected, CS_obs)

Hi folks, you're right that the linked PR removed the Table.cumsum() functionality. If I get some time, I'll try to propose a PR for the textbook, but please feel free to open a PR and tag me for review here!
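Until the textbook is updated, one way to avoid pinning an old version might be to compute the running totals explicitly with np.cumsum. The sketch below uses a hypothetical helper, cumulative_columns, which is not part of the datascience API; it assumes a Table exposes .labels and .column(label), and that Table().with_columns(...) accepts a single flat list of labels and values, as in the v0.17.5 example above.

```python
import numpy as np


def cumulative_columns(table):
    """Hypothetical helper: return a flat [label, values, label, values, ...]
    list holding the running total of every column of `table`.

    Assumes `table` has `.labels` and `.column(label)` like a datascience
    Table; the result is meant to be passed to Table().with_columns(...).
    """
    args = []
    for label in table.labels:
        # np.cumsum on a 1-D column gives the running total down that
        # column, matching what counts.cumsum() produced per column.
        args.extend([label, np.cumsum(table.column(label))])
    return args


# Possible use in the textbook cell (requires the datascience package):
#   cum_counts = Table().with_columns(cumulative_columns(counts))
#   cum_counts = cum_counts.with_column('Chapter', np.arange(1, 44, 1))
```

Building the list explicitly keeps the fix independent of any attribute-forwarding behavior in Table, so it should not depend on which datascience version is installed.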