amueller / COMS4995-s20

COMS W4995 Applied Machine Learning - Spring 20

Home Page: https://www.cs.columbia.edu/~amueller/comsw4995s20/

Best practices for comparing continuum data in a map

kaelgabriel opened this issue

Hey, Andreas Mueller, could you share the "consensus" on best practices for this question from the matplotlib lecture?

2.3 Can you find a better way to compare the two distributions? [10pts]

Thanks.

P.S.: If there is a more appropriate venue for these questions than GitHub issues, I would gladly post there as well.

Hey! Here is probably fine (there's also a Piazza, but that's internal to the actual class from last spring).

There's no consensus; it's tricky to do and an open-ended question. You could potentially do contour maps in different colors, but I think small multiples (with hexbins or similar) are probably the best bet. You could also do splatter plots, but I haven't tried that. Or you could partition the map into local regions and use a glyph to show the distribution in each region. There are many options but no single answer.
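For anyone who wants to try the small-multiples idea, here is a minimal sketch with matplotlib's `hexbin`. The data is synthetic and the names `group_a`, `group_b`, and `plot_small_multiples` are made up for illustration; nothing here comes from the homework itself. The main point is that both panels share the same bin extent and color scale, so the same color means the same count in each panel.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_small_multiples(group_a, group_b, gridsize=30):
    """Draw two (n, 2) point sets side by side as hexbin density maps."""
    # Shared spatial extent so the hexagons line up across panels.
    both = np.vstack([group_a, group_b])
    extent = (both[:, 0].min(), both[:, 0].max(),
              both[:, 1].min(), both[:, 1].max())

    fig, axes = plt.subplots(1, 2, figsize=(9, 4), sharex=True, sharey=True)
    panels = []
    for ax, data, title in zip(axes, (group_a, group_b), ("Group A", "Group B")):
        hb = ax.hexbin(data[:, 0], data[:, 1], gridsize=gridsize,
                       extent=extent, cmap="viridis", mincnt=1)
        ax.set_title(title)
        panels.append(hb)

    # One color scale for both panels, so equal colors mean equal counts.
    vmax = max(hb.get_array().max() for hb in panels)
    for hb in panels:
        hb.set_clim(0, vmax)

    fig.colorbar(panels[0], ax=axes, label="points per hexagon")
    return fig, panels


# Synthetic stand-in data (an assumption, not the course dataset).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=[1.0, 1.0], size=(5000, 2))
group_b = rng.normal(loc=[1.0, 0.5], scale=[1.5, 0.7], size=(5000, 2))

fig, panels = plot_small_multiples(group_a, group_b)
```

Calling `fig.savefig("comparison.png")` afterwards writes the figure to disk. If the two groups differ a lot in size, a shared count scale will wash out the smaller one, so normalizing the counts per panel might be the better trade-off in that case.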

An important takeaway is that there's not necessarily a "best" plot for a particular dataset. There are better and worse plots, but generally there's a trade-off between emphasizing different aspects of the data.

Ah, I see! I was hoping for an easy answer, but I think how best to approach this is somewhat subjective. I agree that using a glyph in each region of the partition would be a good idea. By the way, I was pretty amazed by the quality of the glyphs in Eamonn Maguire's thesis, great stuff.

Thanks for your answer, Andreas.

Well, one of the lessons about visualization (and many other things in the class) is that there's usually not one clear-cut answer, and it's important to consider the pros and cons of different approaches.