PhantomInsights / baby-names-analysis

Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.


These are not baby names

Prooffreader opened this issue

Social Security started in 1935. That means those born in 1880 self-reported their names at 55 years old. This makes the database tremendously biased towards those rich enough to survive to 55 years old. It is also tremendously sex-biased, as only widows of professionals were eligible at first. Working black women were not eligible for Social Security until the 1960s.

None of this is your fault; it is the Social Security Administration's fault for calling this a "baby names dataset" when the first actual babies in it were rich babies born in 1935. Of course, a little investigating on your part would show many, many anomalies in the data until about the 1970s. Look at the ratio of male to female "babies" over time: for real human births that ratio is pretty constant, so any big swings in the data come from who got registered, not from who was born.
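A quick way to run that check is to total the counts per year and sex and divide. The sketch below is only an illustration: the function name `sex_ratio_by_year` and the `(year, sex, count)` row layout are assumptions, not something taken from this repo's code, and the sample rows are made up rather than real SSA counts.

```python
from collections import defaultdict


def sex_ratio_by_year(records):
    """Return {year: male_total / female_total}.

    `records` is assumed to be an iterable of (year, sex, count) tuples,
    with sex given as "M" or "F".
    """
    totals = defaultdict(lambda: {"M": 0, "F": 0})
    for year, sex, count in records:
        totals[year][sex] += count
    # Skip years with no female records to avoid dividing by zero.
    return {
        year: t["M"] / t["F"]
        for year, t in sorted(totals.items())
        if t["F"] > 0
    }


# Made-up rows for illustration only, NOT real SSA counts.
sample = [
    (1900, "M", 100), (1900, "F", 200),
    (2000, "M", 105), (2000, "F", 100),
]
ratios = sex_ratio_by_year(sample)
for year, ratio in ratios.items():
    print(year, round(ratio, 2))
```

Plotting the ratio for every year in the real data should make it easy to see where it drifts away from a roughly constant value.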