datacarpentry / python-ecology-lesson

Data Analysis and Visualization in Python for Ecologists

Home Page: https://datacarpentry.org/python-ecology-lesson

describe better joins

sheraaronhurt opened this issue

How could the content be improved?

I think there is a significant opportunity to describe joins better here. Inner joins and left joins are described in detail, while right and full joins are only defined in a smaller section. It would be helpful to describe all types of joins, including cross joins and self-joins. I recommend adding:
First, cross join:
A cross join generates the Cartesian product of the two data frames, producing all possible combinations of rows. The number of rows in the result equals the number of rows in the first data frame multiplied by the number of rows in the second. While this join often produces large output, it can be helpful for seeing all the columns of both tables together, including columns that the two tables share.
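A minimal sketch of what this could look like in pandas, assuming pandas 1.2 or later (where `merge` accepts `how="cross"`). The `sites` and `years` tables are made up for illustration and are not the lesson's survey data:

```python
import pandas as pd

# Hypothetical toy tables, not taken from the lesson data
sites = pd.DataFrame({"site": ["A", "B", "C"]})
years = pd.DataFrame({"year": [2001, 2002]})

# how="cross" pairs every row of `sites` with every row of `years`;
# no key column is given because a cross join does not match on keys
combos = sites.merge(years, how="cross")

# The row count is the product of the two input row counts: 3 * 2
print(len(combos))
print(combos)
```

Because the output grows multiplicatively, it is worth checking the row counts of both inputs before running a cross join on the full survey tables.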
Second, self-join. A self-join joins a data frame with itself. It can be useful when you want to, for instance, compare records within the same dataset based on a given criterion.
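As a rough sketch, a self-join in pandas could be written by merging a data frame with itself. The small `surveys` table and the `_a`/`_b` suffixes are assumptions chosen for illustration, not the lesson's actual data or naming:

```python
import pandas as pd

# Hypothetical toy table, not taken from the lesson data
surveys = pd.DataFrame({
    "record_id": [1, 2, 3, 4],
    "species_id": ["DM", "DM", "PF", "DM"],
    "weight": [40, 48, 7, 44],
})

# Merge the table with itself on species_id; the suffixes distinguish
# the columns coming from the left copy and the right copy
pairs = surveys.merge(surveys, on="species_id", suffixes=("_a", "_b"))

# Drop rows matched with themselves and keep each unordered pair once
pairs = pairs[pairs["record_id_a"] < pairs["record_id_b"]].copy()

# Compare the two records in each pair, here by weight difference
pairs["weight_diff"] = pairs["weight_b"] - pairs["weight_a"]
print(pairs[["species_id", "record_id_a", "record_id_b", "weight_diff"]])
```

The filter on `record_id` is one design choice among several; without it, every record is also paired with itself and each pair appears twice.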
Additionally, figures such as the ones shown here could be added: https://javarevisited.blogspot.com/2013/05/difference-between-left-and-right-outer-join-sql-mysql.html#axzz8PwmFS4FN

Submitted on behalf of trainee for checkout.

Which part of the content does your suggestion apply to?

https://datacarpentry.org/python-ecology-lesson/05-merging-data.html

Thank you and the trainee for taking the time to engage with the curriculum and provide suggestions. With so much already coming at the learners, I am reluctant to cover each of the join types in full detail, but I think it would improve things to expand the description of cross joins with the provided description, and to add self-joins to the list of other join types along with a link out to https://blog.devgenius.io/self-join-and-cross-join-in-pandas-dataframe-b30bfbc0e52a for additional discussion of those two types. If there is additional feedback or anything I need to do for the trainee to get credit, please let me know.