datacarpentry / python-ecology-lesson

Data Analysis and Visualization in Python for Ecologists

Home Page: https://datacarpentry.org/python-ecology-lesson

Add a section to read and parse data into a DataFrame

anuradhawick opened this issue

How could the content be improved?

The following section introduces how data can be processed using loops:

Automating data processing using For Loops

I believe it would also be advantageous to have a similar section in the following

Reading CSV Data Using Pandas

Here we can briefly introduce Python generators as well. For example, consider a CSV file whose columns are name, age, and location. We can parse this data into a DataFrame using a generator. Imagine that location is a comma-separated string field and we want to read the latitude and longitude separately.

name   age  location
John   50   123341,123321
Emily  25   321321,123321
Wick   35   123341,654789
Raj    40   987789,123321

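import csv
import pandas as pd


def transform_lines(csv_path):
    # Yield one parsed row at a time; the file is closed once the generator finishes.
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (name, age, location)

        for name, age, location in reader:
            # location must be a single quoted field in the file, e.g. "123341,123321"
            lat, lng = location.split(",")
            yield [name, int(age), float(lat), float(lng)]


lines = transform_lines("./data.csv")
# Pass the column names here so the header does not end up as a data row.
df = pd.DataFrame(lines, columns=["Name", "Age", "Latitude", "Longitude"])

print(df.head())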

This is especially useful for large datasets, where loading a large amount of data in text form consumes a lot of memory.
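To illustrate the point about memory, here is a minimal sketch of how a generator produces rows only when they are requested. It assumes the location field is quoted in the file so that csv.reader treats it as one column. The in-memory SAMPLE text and the transform_rows helper are hypothetical stand-ins for data.csv and the function above, used only so the example runs on its own.

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for data.csv; location is quoted because it contains a comma.
SAMPLE = '''name,age,location
John,50,"123341,123321"
Emily,25,"321321,123321"
Wick,35,"123341,654789"
Raj,40,"987789,123321"
'''


def transform_rows(handle):
    """Yield [name, age, latitude, longitude] for each data row of an open CSV file."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    for name, age, location in reader:
        lat, lng = location.split(",")
        yield [name, int(age), float(lat), float(lng)]


rows = transform_rows(io.StringIO(SAMPLE))

# Nothing has been parsed yet; islice pulls only the first two rows from the generator.
first_two = list(islice(rows, 2))
print(first_two)

# The generator resumes where it stopped, so the remaining rows are still available.
remaining = list(rows)
print(len(remaining))  # 2
```

A list of rows like first_two can be passed to pd.DataFrame in the same way as the generator, which makes it possible to preview or process a large file in pieces.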

Hi, @anuradhawick
Thanks for taking the time to suggest this modification. It definitely addresses an issue that many researchers will likely run into at some point. That said, what are your thoughts on whether this is a good match for potentially absolute beginners? I fear that if someone is brand new to all this, there is a lot of automagical stuff introduced by the yield keyword that might be a bridge too far for some to wrap their heads around. It might be a better fit for the instructor notes. Also, there is the potential for a community-based rewrite (https://carpentries.slack.com/archives/C03LE48AY/p1711535383742769), so I will likely table any major changes like this until that is settled one way or another. If you wanted to open a PR to put it into the instructor section before then, though, I would be happy to consider it.