keiraqz / flink-training-ex

My solutions to the Flink Training Exercises from data Artisans

Flink Training Exercises

My solutions to the Flink Training Exercises from data Artisans: http://dataartisans.github.io/flink-training/index.html.

All my solutions are here.

Exercise 1: Mail Count

  • Description: The task of the “Mail Count” exercise is to count the number of emails in the archive of the Flink development mailing list for each unique combination of email address and month.
  • My solution: FlinkMail.java
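The counting logic described above can be sketched in plain Java, without the Flink API. This is not the code in FlinkMail.java; it only illustrates the grouping key. The `Mail` class, the `address|month` string key, and the input layout (a timestamp starting with `yyyy-MM` and a sender of the form `Name <address>`) are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MailCountSketch {

    /** Hypothetical input record: only the two fields the count needs. */
    static final class Mail {
        final String timestamp; // assumed to start with "yyyy-MM"
        final String sender;    // assumed to look like "Name <address>"

        Mail(String timestamp, String sender) {
            this.timestamp = timestamp;
            this.sender = sender;
        }
    }

    /** Returns the "yyyy-MM" prefix of the timestamp. */
    static String month(String timestamp) {
        return timestamp.substring(0, 7);
    }

    /** Returns the text between the last '<' and '>', or the trimmed input if absent. */
    static String address(String sender) {
        int open = sender.lastIndexOf('<');
        int close = sender.lastIndexOf('>');
        if (open >= 0 && close > open) {
            return sender.substring(open + 1, close);
        }
        return sender.trim();
    }

    /** Counts mails per unique (address, month) pair, keyed as "address|month". */
    static Map<String, Integer> countMails(List<Mail> mails) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Mail m : mails) {
            String key = address(m.sender) + "|" + month(m.timestamp);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Mail> mails = Arrays.asList(
                new Mail("2014-09-10-18:43:08", "Alice <alice@example.org>"),
                new Mail("2014-09-12-09:00:00", "Alice <alice@example.org>"),
                new Mail("2014-10-01-12:30:00", "Bob <bob@example.org>"));
        System.out.println(countMails(mails));
    }
}
```

In a Flink job the same idea would typically appear as a map to (month, address, 1) followed by a group-by on the first two fields and a sum over the third.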

Exercise 2: Reply Graph

  • Description: The task of the “Reply Graph” exercise is to extract reply connections between mails in the archives of Apache Flink’s developer mailing list. A reply connection is a pair of email addresses (Tuple2<String, String>) in which the first address replied to an email sent from the second address. The goal is to compute all reply connections in the Mail Data Set and count them for each unique pair of email addresses.
  • My solution: ReplyGraph.java
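As a rough illustration of the reply-connection idea, the following plain-Java sketch joins each mail with the mail it replies to and counts the resulting address pairs. It is not the code in ReplyGraph.java. The `Mail` class, its field names, the `replier -> original` string key, and the choice to skip replies whose target is missing from the input are all assumptions of this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReplyGraphSketch {

    /** Hypothetical mail record; repliedTo is null when the mail is not a reply. */
    static final class Mail {
        final String id;
        final String sender;    // assumed to look like "Name <address>"
        final String repliedTo; // id of the mail being answered, or null

        Mail(String id, String sender, String repliedTo) {
            this.id = id;
            this.sender = sender;
            this.repliedTo = repliedTo;
        }
    }

    /** Returns the text between the last '<' and '>', or the trimmed input if absent. */
    static String address(String sender) {
        int open = sender.lastIndexOf('<');
        int close = sender.lastIndexOf('>');
        if (open >= 0 && close > open) {
            return sender.substring(open + 1, close);
        }
        return sender.trim();
    }

    /** Counts reply connections, keyed as "replier -> original sender". */
    static Map<String, Integer> countReplies(List<Mail> mails) {
        // Index: message id -> sender address (the build side of the join).
        Map<String, String> senderById = new HashMap<>();
        for (Mail m : mails) {
            senderById.put(m.id, address(m.sender));
        }

        Map<String, Integer> counts = new TreeMap<>();
        for (Mail m : mails) {
            if (m.repliedTo == null) {
                continue; // not a reply
            }
            String original = senderById.get(m.repliedTo);
            if (original == null) {
                continue; // the answered mail is not in the input
            }
            counts.merge(address(m.sender) + " -> " + original, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Mail> mails = Arrays.asList(
                new Mail("1", "Alice <alice@example.org>", null),
                new Mail("2", "Bob <bob@example.org>", "1"),
                new Mail("3", "Alice <alice@example.org>", "2"));
        System.out.println(countReplies(mails));
    }
}
```

The in-memory index stands in for what would be a self-join of the mail data set on the replied-to id in a distributed setting.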

Exercise 3: TF-IDF

  • Description: The task of the TF-IDF exercise is to compute the term-frequency/inverse-document-frequency (TF-IDF) metric for words in mails of the Flink developer mailing list archives.
  • My solution: TfIdf.java
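A minimal plain-Java sketch of the metric follows; it is not the code in TfIdf.java. It assumes one common formulation: TF is the raw count of a word in a mail, and IDF is the plain ratio of the number of mails to the number of mails containing the word. A logarithmic IDF is a frequent variant and may be what a given solution uses. The tokenizer (lower-casing and splitting on non-letters) is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class TfIdfSketch {

    /** Lower-cases the text and splits it on runs of non-letter characters. */
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Returns one map per document: word -> tf * (N / df). */
    static List<Map<String, Double>> tfIdf(List<String> docs) {
        int n = docs.size();
        List<Map<String, Integer>> termFreqs = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();

        // First pass: term frequency per document, document frequency per word.
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String w : tokenize(doc)) {
                tf.merge(w, 1, Integer::sum);
            }
            termFreqs.add(tf);
            for (String w : tf.keySet()) {
                docFreq.merge(w, 1, Integer::sum);
            }
        }

        // Second pass: combine both frequencies into the score.
        List<Map<String, Double>> result = new ArrayList<>();
        for (Map<String, Integer> tf : termFreqs) {
            Map<String, Double> scores = new TreeMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = (double) n / docFreq.get(e.getKey());
                scores.put(e.getKey(), e.getValue() * idf);
            }
            result.add(scores);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Flink flink stream", "Flink batch");
        System.out.println(tfIdf(docs));
    }
}
```

Words that appear in every mail get an IDF of 1 under this formulation, so their score equals their raw count; rarer words are weighted up proportionally.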

Languages

Language:Java 100.0%