sryza / aas

Code to accompany Advanced Analytics with Spark from O'Reilly Media

Running the code on p68 of the book takes a very long time

sanderlenselink opened this issue

Hi Sean,
During the Datanomics event in Berlin (November 2015) you gave me your book "Advanced Analytics with Spark" as a gift. I have been studying the chapters since then and I like the material very much! At the moment I am trying to execute the code on page 68 (printed book). However, my computer has already been running for an hour and still has not finished.

Of course the performance of a PC depends on how much memory it has, but is it normal for the program on p68 to take so much time? If not, do you have any suggestions (other than running to the shop to buy more memory)?

Regards

-Sander
[Screenshot from 2016-08-23 15-00-31]

Hi Sean,

grrrrrr . . . I found that, for the code on p68, I have to add a line such as "metrics.confusionMatrix" from p69.

Maybe this is helpful for others. (It wasn't clear to me, but that is due to my limited knowledge of Spark. That is why I am following the examples in the book :-) )

-Sander

Training the decision tree could take some time on a laptop. I wouldn't expect an hour, though. Evaluating metrics should be fairly fast, and printing the confusion matrix should be immediate. Are you saying you were just looking for the confusion matrix and then found the command to execute? In any event, this will change in the 2nd edition.

What I was trying to say is that you have to add the command "metrics.confusionMatrix" or "metrics.precision" (see p69) to the code on p68.
This wasn't clear to me (due to my limited know-how of Spark).

Those lines of code appear on page 69, though, along with their output. They don't have to be entered into the shell together with the code on page 68; they can be entered after it at any time.
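
For anyone else reading this thread, the two-step pattern can be sketched roughly as follows. This is not the book's code: it assumes a spark-shell session (where `sc` is already defined) with the MLlib RDD API on the classpath, and the `predictionsAndLabels` pairs are made-up values standing in for the output of the trained model on the cross-validation set.

```scala
// Assumes spark-shell, where the SparkContext `sc` already exists.
import org.apache.spark.mllib.evaluation.MulticlassMetrics

// Hypothetical (prediction, label) pairs; NOT the book's data set.
// In the book these would come from running the trained model
// over the cross-validation examples.
val predictionsAndLabels = sc.parallelize(Seq(
  (0.0, 0.0), (0.0, 0.0), (1.0, 0.0),
  (1.0, 1.0), (1.0, 1.0), (0.0, 1.0)
))

// Step 1 (the kind of code on p68): build the metrics object.
// On its own this line prints no evaluation results.
val metrics = new MulticlassMetrics(predictionsAndLabels)

// Step 2 (the kind of command on p69): ask for a result.
// This can be typed at any later point in the same shell session.
println(metrics.confusionMatrix)

// The thread also mentions metrics.precision, which is the Spark 1.x name.
// As far as I know, later releases deprecated it in favor of metrics.accuracy.
```

The only point of the sketch is the ordering: the `val metrics = ...` line and the `metrics.confusionMatrix` line are separate shell inputs, and the second one is what actually displays something.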