deep-learning-with-pytorch / dlwpt-code

Code for the book Deep Learning with PyTorch by Eli Stevens, Luca Antiga, and Thomas Viehmann.

Home Page: https://www.manning.com/books/deep-learning-with-pytorch

Misleading code [3.8.2]

OverLordGoldDragon opened this issue

We can easily verify that the two tensors share the same storage:
id(points.storage()) == id(points_t.storage())

The is operator should be used instead, and it returns False here. Comparing id() values is unreliable, since a temporary object can be freed and its address reused; try id(float(3)) == id(float(4)). See here.
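For concreteness, a minimal demonstration of the pitfall (pure Python, nothing PyTorch-specific): CPython can free the first temporary and hand its address to the second, so two distinct objects may report the same id.

```python
# id() comparison on temporaries is unreliable: the first float is
# garbage-collected before the second is created, and CPython often
# reuses the freed memory slot.
print(id(float(3)) == id(float(4)))   # often True in CPython!

# `is` keeps both operands alive for the comparison, so it correctly
# reports that they are distinct objects.
print(float(3) is float(4))           # False
```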

Hi. You are absolutely right that there is a problem here, thank you!
The issue is that there isn't a one-to-one mapping between the C++ storage (which would be the same) and the Python Storage objects (which are recreated with every .storage() call). So if the comparison returns True here, it is due to the incorrect use of ==.
Incidentally, as far as I know, the longer-term idea seems to be that PyTorch won't use a storage class at all (but will instead have views of a tensor), but it's not a priority.

@t-vi The main point, that two tensors share the same Storage, is correct, but it's also important not to confuse that with identical Python ids; I think the existing examples of editing a storage and affecting both tensors do well enough to explain.
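As a sketch of what a reliable check could look like (comparing the underlying data pointers rather than the ids of the per-call Python wrapper objects; the tensor values here are made up):

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()   # transpose: a view over the same storage

# Each .storage() call builds a fresh Python wrapper object, so
# comparing their ids says nothing about the underlying C++ storage:
print(id(points.storage()) == id(points_t.storage()))   # unreliable

# Comparing the data pointers checks the actual memory instead:
print(points.storage().data_ptr() == points_t.storage().data_ptr())  # True

# And the book's mutation test: editing through one tensor is
# visible through the other.
points[0, 0] = 100.0
print(points_t[0, 0])   # tensor(100.)
```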

Otherwise a fine textbook you've made available freely, thank you.

@t-vi Also a misleading statement in 5.5.2:

[screenshot of the quoted passage from section 5.5.2]

This only holds for this baby example and completely breaks down for larger models, so, to avoid understating the importance of input normalization, maybe add "... blink (in this simple case):"

Well, the key takeaway we wanted to get across here is that Adam will automatically "equilibrate" the step size between the parameters, while SGD needs you to do this manually. We could have made it clearer that it isn't universal that you can use this learning rate, but at this point we are focusing closely on this one example.
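For illustration, a small synthetic variant of the chapter's setup (the data, model, and learning rates below are made up for this sketch, not taken from the book): with unnormalized inputs the weight's gradient dwarfs the bias's, so SGD is forced to a tiny learning rate and the bias crawls, while Adam's per-parameter scaling copes directly.

```python
import torch

t_u = torch.linspace(20.0, 90.0, 11)   # unnormalized inputs, large scale
t_c = 0.5 * t_u - 17.0                 # targets from a known linear rule

def train(optim_cls, lr, steps=5000):
    params = torch.zeros(2, requires_grad=True)   # [w, b]
    optimizer = optim_cls([params], lr=lr)
    for _ in range(steps):
        w, b = params
        loss = ((w * t_u + b - t_c) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return params.detach(), loss.item()

# SGD's largest stable learning rate is dictated by the big w-gradient
# (roughly lr < 3e-4 here), which then makes the bias converge slowly;
# Adam rescales each parameter's step, so lr=1e-1 works out of the box.
print(train(torch.optim.SGD, lr=1e-4))
print(train(torch.optim.Adam, lr=1e-1))
```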

@t-vi Done reading; ended at ch12. Likely typos:

  • 8.2.4: MaxPool3d -> MaxPool2d
  • 8.3.1: self.pool3 -> self.pool2
  • 9.4.4: code/p2ch09_explore_data.ipynb -> p2ch10?
  • 10.4.4: getRawNodule -> getRawCandidate
  • Listing 10.12: add # before 1((CO10-1)) and 1((CO10-2))? They read as if they were part of the code
  • 11.1: lets* you move forward

Some feedback on part 2: I'd say the data pipeline complexity was a bit of overkill, particularly the caching. There wasn't really a need for it when one can simply extract the candidates/nodules into their own files (sketched below), which is both simpler and faster. It's good to know the how-to, but among everything else the reader is supposed to learn, this may add redundant overhead. Otherwise, quality material, especially in part 1, which gradually adds layers of complexity starting at w*x + b.
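For concreteness, a minimal sketch of the "one file per candidate" alternative; the directory layout, filename convention, and crop shape here are hypothetical, not the book's actual pipeline:

```python
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import Dataset

class ExtractedCandidateDataset(Dataset):
    """Reads candidate crops that were extracted once, up front, to
    one .npy file each, instead of caching CT chunks on the fly."""

    def __init__(self, root="data/candidates"):
        self.paths = sorted(Path(root).glob("*.npy"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        arr = np.load(self.paths[idx])   # e.g. a (32, 48, 48) crop
        # assumed filename convention: "<series_uid>_<i>_<label>.npy"
        label = int(self.paths[idx].stem.split("_")[-1])
        return torch.from_numpy(arr).unsqueeze(0), label  # add channel dim
```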

Heya, yeah. Between the three of us, we've had mixed feelings about the caching, too, and the main reason to have it is to provide the perspective that such a thing exists. Obviously, some of us also have inclinations towards more targeted data tooling, e.g. hangar (video).
If you later read the remainder, I'd be keen to hear your feedback on that, too, in particular ch14 and 15.

And thank you for your very thorough and critical reading, and for sharing your comments. It is very interesting and, indeed, helpful.