In-person class challenge for data wrangling.
I am from section 01, group B and my number is 3, so I was assigned the play, 'The Comedy of Errors' by Shakespeare.
http://shakespeare.mit.edu/comedy_errors/full.html
- Speaker 1 - LUCIANA
- Speaker 2 - ADRIANA
Who speaks more? Speaker 1 or speaker 2?
- I used the curl command to download the play from the URL, stripped the HTML tags with sed, and stored the resulting text in an input file called "ac-input.txt".
$ curl "http://shakespeare.mit.edu/comedy_errors/full.html" | sed 's/<\/*[^>]*>//g' > "ac-input.txt"
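To check what the sed expression in this command does without downloading anything, here is a minimal sketch on a single made-up line. The sample string is my assumption of what the speaker markup roughly looks like; it is not copied from the page.

```shell
# Hypothetical sample line (an assumption, not copied from the real page).
sample='<A NAME=speech1><b>ADRIANA</b></a>'

# Same sed expression as in the curl pipeline:
# delete every <...> or </...> tag and keep the text between them.
stripped=$(printf '%s\n' "$sample" | sed 's/<\/*[^>]*>//g')

echo "$stripped"
```

Only the text between the tags should remain, which is why the speaker names can be searched with grep afterwards.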
- I used the grep command to display the lines containing 'LUCIANA' in the input text file.
$ grep 'LUCIANA' ac-input.txt
- Then, I used the following command to count the number of lines in which 'LUCIANA' appears.
$ grep 'LUCIANA' ac-input.txt -c
50
- Similarly, I used the same command for counting 'ADRIANA'.
$ grep 'ADRIANA' ac-input.txt -c
84
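One caveat about these counts: grep -c counts every line that contains the name, so stage directions that mention a speaker are counted as well. A minimal sketch of the difference, using a few made-up lines rather than the play itself, and assuming that a speaker heading sits alone on its line with no extra whitespace:

```shell
# Made-up sample lines (an assumption, not taken from ac-input.txt).
sample=$(printf '%s\n' 'ADRIANA' 'Ay, ay.' 'Enter ADRIANA and LUCIANA' 'LUCIANA' 'ADRIANA')

# -c counts every line that contains the name, stage directions included.
any_line=$(printf '%s\n' "$sample" | grep -c 'ADRIANA')

# Adding -x keeps only lines that are exactly the name, i.e. speaker headings.
heading_only=$(printf '%s\n' "$sample" | grep -c -x 'ADRIANA')

echo "lines containing ADRIANA: $any_line"
echo "lines that are exactly ADRIANA: $heading_only"
```

If the cleaned file has trailing spaces or carriage returns after the names, the -x variant would miss those lines, so the plain -c count is the safer first pass.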
- I have then stored these values in two different output text files.
$ grep 'ADRIANA' ac-input.txt -c > "ac-output.txt"
$ grep 'LUCIANA' ac-input.txt -c > "ac-output2.txt"
$ grep 'LUCIANA' ac-input.txt -c > "ac-output-luciana.txt"
$ grep 'ADRIANA' ac-input.txt -c > "ac-output-adriana.txt"
Note: I have since deleted ac-output.txt and ac-output2.txt, keeping only the output files named after each speaker.
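The two store-the-count commands could also be written as one loop that names each output file after the speaker and then compares the stored numbers. This is only a sketch under assumptions: it runs in a scratch directory on a made-up sample file, and the names sample-input.txt and sample-output-*.txt are hypothetical stand-ins for the real files.

```shell
# Work in a scratch directory so no real file gets overwritten.
workdir=$(mktemp -d)
input="$workdir/sample-input.txt"   # hypothetical stand-in for ac-input.txt
printf '%s\n' 'ADRIANA' 'LUCIANA' 'ADRIANA' 'Exit LUCIANA' 'ADRIANA' > "$input"

# One output file per speaker, named after the speaker in lower case.
for name in LUCIANA ADRIANA; do
  lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
  grep -c "$name" "$input" > "$workdir/sample-output-$lower.txt"
done

# Read the stored counts back from the output files.
luciana=$(cat "$workdir/sample-output-luciana.txt")
adriana=$(cat "$workdir/sample-output-adriana.txt")

# Compare the two counts and report who has more matching lines.
if [ "$adriana" -gt "$luciana" ]; then
  winner=ADRIANA
elif [ "$luciana" -gt "$adriana" ]; then
  winner=LUCIANA
else
  winner=tie
fi
echo "LUCIANA=$luciana ADRIANA=$adriana more lines: $winner"

rm -r "$workdir"
```

Naming the files after the speakers from the start avoids having to delete and recreate generically named output files later.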
Input file (after stripping the HTML tags): ac-input.txt
Clearly, Adriana (84 matching lines) likes to speak more than Luciana (50 matching lines) ;)