The project consists of the following high-level components:
Kafka Connect source connector: reads the list of Github accounts from the /tmp/github-accounts.txt file (JSON format) and writes them to the Kafka topic github-accounts.
Github Accounts Analyzer: consumes Github accounts and produces commits to the topic github-commits:
- receives the records with the specified accounts from Kafka,
- fetches the accounts' commits using the Github REST API,
- writes the commits back to Kafka.
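The README does not name the GitHub endpoint the analyzer calls, nor does it define the "interval" field. Assuming "interval" means "look this far back from now" and that commits are found through GitHub's commit search endpoint (GET https://api.github.com/search/commits with the author: and author-date: qualifiers), building the request URI for one account could look like the sketch below. The class, its methods and the "w" unit are hypothetical; only "1d" appears in this README.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class CommitsQuerySketch {

    // Hypothetical interval syntax: a positive integer followed by 'd' (days)
    // or 'w' (weeks). The README only shows "1d"; 'w' is an assumption.
    static Duration parseInterval(String interval) {
        if (interval == null || interval.length() < 2) {
            throw new IllegalArgumentException("Bad interval: " + interval);
        }
        long amount = Long.parseLong(interval.substring(0, interval.length() - 1));
        if (amount <= 0) {
            throw new IllegalArgumentException("Interval must be positive: " + interval);
        }
        char unit = interval.charAt(interval.length() - 1);
        switch (unit) {
            case 'd':
                return Duration.ofDays(amount);
            case 'w':
                return Duration.ofDays(7 * amount);
            default:
                throw new IllegalArgumentException("Unknown interval unit: " + unit);
        }
    }

    // Builds a URI for GitHub's commit search, restricted to commits authored by
    // the account on or after the UTC date at which the interval starts.
    static URI commitsSearchUri(String account, String interval, Instant now) {
        LocalDate since = now.minus(parseInterval(interval)).atZone(ZoneOffset.UTC).toLocalDate();
        String query = "author:" + account + " author-date:>=" + since;
        return URI.create("https://api.github.com/search/commits?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-05-02T10:00:00Z");
        System.out.println(commitsSearchUri("ilyavy", "1d", now));
    }
}
```

A real client would also send an Authorization header and follow pagination, since search results are returned page by page.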
Commits Metrics Kafka Stream: analyzes commits from Kafka and writes the metrics computed from them back to Kafka, with exactly-once semantics. The metrics are written to the following topics: github-metrics-total-commits, github-metrics-total-committers, github-metrics-top-committers and github-metrics-languages.
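The README does not describe how each metric is computed. Reading the topic names literally, a plain-Java sketch of the four aggregations over an in-memory list might look like this. The Commit class and its fields are hypothetical (the schema of github-commits is not shown), the ", " separator between top committers is an assumption, and the real component presumably computes these as stateful Kafka Streams aggregations rather than over a List.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CommitsMetricsSketch {

    // Hypothetical commit record: the README does not show the schema of github-commits.
    static final class Commit {
        final String author;
        final String language;

        Commit(String author, String language) {
            this.author = author;
            this.language = language;
        }
    }

    // Total number of commits seen.
    static long totalCommits(List<Commit> commits) {
        return commits.size();
    }

    // Number of distinct commit authors.
    static long totalCommitters(List<Commit> commits) {
        return commits.stream().map(c -> c.author).distinct().count();
    }

    // Commit count per language, sorted by language name.
    static Map<String, Long> commitsPerLanguage(List<Commit> commits) {
        return commits.stream()
                .collect(Collectors.groupingBy(c -> c.language, TreeMap::new, Collectors.counting()));
    }

    // Top-n authors by commit count (ties broken by name), formatted like "ilyavy (1)".
    static String topCommitters(List<Commit> commits, int n) {
        Map<String, Long> perAuthor = commits.stream()
                .collect(Collectors.groupingBy(c -> c.author, Collectors.counting()));
        return perAuthor.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(e -> e.getKey() + " (" + e.getValue() + ")")
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<Commit> commits = List.of(new Commit("ilyavy", "Java"));
        System.out.println("total_commits: " + totalCommits(commits));
        commitsPerLanguage(commits).forEach((lang, count) -> System.out.println(lang + ": " + count));
        System.out.println("top5_committers: " + topCommitters(commits, 5));
        System.out.println("total_committers: " + totalCommitters(commits));
    }
}
```

With a single Java commit by ilyavy, main prints the same four lines as the metrics file example in this README. In Kafka Streams, exactly-once processing is enabled through the processing.guarantee setting (exactly_once_v2 since Kafka 3.0, exactly_once in older versions); which value this project uses is not stated here.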
Kafka Connect sink connector: reads the metrics from Kafka and writes them to the /tmp/github-metrics.txt file (CSV format with ":" as the separator).
It is assumed that all the commands are run from the root directory of the project.
- First, run the Kafka cluster and create the required topics. The commands for this, as well as the required properties files, are provided in kafka-cluster/commands.md.
- Then run the Kafka Connect workers and create the necessary connectors, using the commands provided in kafka-connect/commands.md.
- Maven is required to build the Java components of the system:
mvn clean package
- To run the Github Accounts Analyzer, use the following command:
java -jar github-accounts-analyzer/target/github-accounts-analyzer-1.0-jar-with-dependencies.jar
- To run the Commits Metrics Kafka Stream:
java -jar commits-metrics-kafka-stream/target/commits-metrics-kafka-stream-1.0-jar-with-dependencies.jar
- Now, when records are written to the file /tmp/github-accounts.txt, the corresponding metrics appear in the file /tmp/github-metrics.txt. Example of input for the accounts file (JSON syntax, one record per line):
{"account": "ilyavy", "interval": "1d"}
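Records can be appended to the accounts file with any editor or with echo from a shell; the following Java sketch does the same programmatically. It assumes, as the example above suggests, that each line is one standalone JSON object with the keys "account" and "interval". The class and method names are hypothetical, and the string building skips JSON escaping, which is only safe because GitHub logins consist of letters, digits and hyphens.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class AccountsFileWriterSketch {

    // Builds one record as a single-line JSON object. No JSON escaping is done:
    // this assumes the account and interval contain no quotes or backslashes.
    static String toJsonLine(String account, String interval) {
        return "{\"account\": \"" + account + "\", \"interval\": \"" + interval + "\"}";
    }

    // Appends one record plus a newline, creating the file if it does not exist yet.
    static void appendAccount(Path file, String account, String interval) throws IOException {
        byte[] line = (toJsonLine(account, interval) + "\n").getBytes(StandardCharsets.UTF_8);
        Files.write(file, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        appendAccount(Paths.get("/tmp/github-accounts.txt"), "ilyavy", "1d");
    }
}
```

Running main appends the example record above to /tmp/github-accounts.txt. Appending rather than overwriting keeps earlier records in place, so each new line is a new request.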
All the metrics are written to the metrics file (CSV format with ":" as the separator). Example:
total_commits: 1
Java: 1
top5_committers: ilyavy (1)
total_committers: 1
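Each line of the metrics file holds a metric name before the first ":" and its value after it. A minimal sketch for reading the file back, for example in a test or a monitoring script (the class name is hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetricsFileParserSketch {

    // Splits each "name: value" line at the first ':' only, so values may contain ':'.
    // A repeated name overwrites the earlier value; first-seen order is preserved.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> metrics = new LinkedHashMap<>();
        for (String line : lines) {
            int sep = line.indexOf(':');
            if (sep < 0) {
                continue; // skip lines without a separator, e.g. blank lines
            }
            metrics.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
        }
        return metrics;
    }

    public static void main(String[] args) {
        List<String> example = List.of(
                "total_commits: 1",
                "Java: 1",
                "top5_committers: ilyavy (1)",
                "total_committers: 1");
        System.out.println(parse(example));
    }
}
```

To read the real file, pass Files.readAllLines(Paths.get("/tmp/github-metrics.txt")) to parse. If the sink keeps appending updated values (the README does not say whether it does), the map ends up holding the last value seen for each metric.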
Project structure:
.
├── kafka-cluster # properties files and CLI commands needed to run the Kafka cluster
├── kafka-connect # properties files and CLI commands needed to run Kafka Connect and create connectors
├── github-accounts-analyzer # Github Accounts Analyzer
├── commits-metrics-kafka-stream # Commits Metrics Kafka Stream
└── modules-integration-test # end-to-end integration tests for the modules of the project