Data Wrangling

In-person class challenge for data wrangling.

Assigned play:

I am in section 01, group B, and my number is 3, so I was assigned Shakespeare's play 'The Comedy of Errors'.

http://shakespeare.mit.edu/comedy_errors/full.html

My Speakers:

Speaker 1 - LUCIANA
Speaker 2 - ADRIANA

Question asked:

Who speaks more? Speaker 1 or speaker 2?

Commands used (BASH):

I used the curl command to download the play from the URL and piped the result through sed to strip the HTML tags, storing the remaining plain text in an input file called "ac-input.txt".

$ curl "http://shakespeare.mit.edu/comedy_errors/full.html" | sed 's/<\/*[^>]*>//g' > "ac-input.txt"
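To see what the sed expression does without downloading anything, here is a minimal sketch that runs it on a single line. The sample line is a hypothetical one written in the style of the play's HTML, not copied from the page.

```shell
# Hypothetical sample line (an assumption for illustration only).
sample='<A NAME=speech1><b>ADRIANA</b></a>'

# Same sed expression as in the command above:
# delete every "<...>" or "</...>" tag and keep the text between them.
stripped=$(printf '%s\n' "$sample" | sed 's/<\/*[^>]*>//g')

echo "$stripped"
```

Only the text between the tags survives, which is why the names can later be matched with grep.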

I used the grep command to display the lines containing 'LUCIANA' in the input text file.

$ grep 'LUCIANA' ac-input.txt

Then, I used the following command to count the number of lines in which 'LUCIANA' appears.

$ grep 'LUCIANA' ac-input.txt -c
50

Similarly, I used the same command for counting 'ADRIANA'.

$ grep 'ADRIANA' ac-input.txt -c
84
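Note that grep -c counts matching lines, not individual occurrences, and it also counts lines where another character merely mentions the name. The sketch below illustrates the difference on a made-up three-line sample. The sample text is hypothetical, and the anchored pattern only helps if a speaker label sits alone on its line in ac-input.txt, which I have not verified.

```shell
# Hypothetical three-line sample (made up for illustration).
sample=$(printf '%s\n' 'ADRIANA' 'Say, ADRIANA, is ADRIANA here?' 'LUCIANA')

# -c counts lines that contain the name at least once.
line_count=$(printf '%s\n' "$sample" | grep -c 'ADRIANA')

# -o prints every match on its own line, so wc -l counts occurrences.
# tr removes the padding that some wc implementations add.
match_count=$(printf '%s\n' "$sample" | grep -o 'ADRIANA' | wc -l | tr -d ' ')

# Anchoring with ^ and $ keeps only lines that are exactly the name.
exact_count=$(printf '%s\n' "$sample" | grep -c '^ADRIANA$')

echo "lines: $line_count, occurrences: $match_count, exact lines: $exact_count"
```

On the real file, the anchored form would be one way to count speeches rather than mentions, under the assumption stated above.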

I then stored these counts in separate output text files.

$ grep 'ADRIANA' ac-input.txt -c > "ac-output.txt"
$ grep 'LUCIANA' ac-input.txt -c > "ac-output2.txt"
$ grep 'LUCIANA' ac-input.txt -c > "ac-output-luciana.txt"
$ grep 'ADRIANA' ac-input.txt -c > "ac-output-adriana.txt"

Note: I have since deleted ac-output.txt and ac-output2.txt, so only ac-output-luciana.txt and ac-output-adriana.txt remain.

Link to the input file:

(after stripping the HTML tags) ac-input.txt

Links to the count:

ac-output-luciana.txt
ac-output-adriana.txt

Answer:

Clearly, Adriana likes to speak more than Luciana: her name matches 84 lines against Luciana's 50 ;)
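The comparison itself can also be scripted. This is a minimal sketch that hard-codes the two counts reported by grep -c; the variable names are my own and are not part of the original commands.

```shell
# Counts taken from the grep -c output reported in this README.
luciana=50
adriana=84

# Decide which name matched more lines.
if [ "$adriana" -gt "$luciana" ]; then
  winner="ADRIANA"
elif [ "$luciana" -gt "$adriana" ]; then
  winner="LUCIANA"
else
  winner="tie"
fi

echo "More matching lines: $winner"
```

In practice the two values could be read from the output files instead, for example with luciana=$(cat ac-output-luciana.txt).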
