fastai / courses

fast.ai Courses

ml1 notebooks improvements

stas00 opened this issue

I'm not sure how best to contribute improvements to the course notebooks.

Here are a few corrections/improvements:

  1. courses/ml1/lesson1-rf.ipynb

replace:

*todo* define r^2

with:

In statistics, the coefficient of determination, denoted R² or r² and pronounced "R squared", is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). https://en.wikipedia.org/wiki/Coefficient_of_determination
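
For readers who prefer code to a definition, here is a minimal sketch of the usual computational form, R² = 1 - SS_res / SS_tot. The function name `r_squared` and the sample numbers are made up for illustration and are not taken from the notebook. Note that with this form the value can be negative when the predictions are worse than always predicting the mean.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Illustrative helper (hypothetical name): R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares: error of always predicting the mean of y_true.
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up data, only to exercise the function.
y = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y, y)                      # predictions match exactly
baseline = r_squared(y, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
close = r_squared(y, [1.1, 1.9, 3.2, 3.8])     # small errors
print(perfect, baseline, close)
```

A perfect model scores 1, and a model that always predicts the mean scores 0.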

  2. In the same notebook, before:

df_raw.UsageBand = df_raw.UsageBand.cat.codes

add explanation:

"Normally pandas keeps displaying the text categories while storing them internally as integer codes. Optionally, we can replace the text categories with those codes, which makes this variable non-categorical, like so:"
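
A small self-contained sketch of what that explanation describes. The four-row frame below is made up for illustration (it is not the course data), and the categories are left in pandas' default sorted order rather than the order used in the notebook.

```python
import pandas as pd

# Made-up sample standing in for the real df_raw.
df_raw = pd.DataFrame({"UsageBand": ["High", "Low", "Medium", "Low"]})
df_raw["UsageBand"] = df_raw["UsageBand"].astype("category")

# The column still displays text labels...
labels = df_raw.UsageBand.tolist()
# ...but each label is backed by an integer code.
codes = df_raw.UsageBand.cat.codes.tolist()
print(labels, codes)

# Replacing the column with its codes makes it a plain integer column.
df_raw.UsageBand = df_raw.UsageBand.cat.codes
is_still_categorical = isinstance(df_raw["UsageBand"].dtype, pd.CategoricalDtype)
print(df_raw.UsageBand.dtype, is_still_categorical)
```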

  3. courses/ml1/lesson2-rf_interpretation.ipynb

The "One-hot encoding" section has no explanation of what the encoding does. Here is my attempt to explain:

"Using proc_df's *max_n_cat* argument, we can turn some categorical variables into new one-hot encoded columns.
For example, a column MyCol with the categories (small, medium, large) is replaced by 3 new columns:
MyCol_small, MyCol_medium, and MyCol_large (the original column is removed).

This only happens to columns whose number of categories is no more than max_n_cat.

Now some of the new columns may turn out to be more important features than the original column was when all of its categories were in one column."
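
The behaviour described above can be sketched with plain pandas. This is an approximation for illustration only, not proc_df's actual code: the helper name `one_hot_small_cats`, the Brand column, and the sample data are all hypothetical.

```python
import pandas as pd

def one_hot_small_cats(df, max_n_cat):
    """Hypothetical helper: one-hot encode categorical columns that have
    at most max_n_cat categories; turn larger ones into integer codes."""
    out = df.copy()
    for name in df.columns:
        col = df[name]
        if isinstance(col.dtype, pd.CategoricalDtype):
            if len(col.cat.categories) > max_n_cat:
                # Too many categories: keep one column of integer codes.
                out[name] = col.cat.codes
    # Remaining categorical columns are expanded into <name>_<category>.
    return pd.get_dummies(out)

# Made-up data: MyCol has 3 categories, Brand has 5.
df = pd.DataFrame({
    "MyCol": pd.Categorical(
        ["small", "medium", "large", "small", "large"],
        categories=["small", "medium", "large"],
    ),
    "Brand": pd.Categorical(["a", "b", "c", "d", "e"]),
})

encoded = one_hot_small_cats(df, max_n_cat=3)
print(sorted(encoded.columns))
```

With `max_n_cat=3`, MyCol is split into three indicator columns while Brand, which has five categories, stays a single numeric column.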

  4. courses/ml1/lesson3-rf_foundations.ipynb

A small fix is described here:
http://forums.fast.ai/t/another-treat-early-access-to-intro-to-machine-learning-videos/6826/615

Thanks.

If there is a better way to do this, please let me know how (link?)

I'm currently trying to figure out how to use nbdime for notebook diff patches. I'll post more once nbdime's developer has resolved all the issues.