linvieson / wikimedia-api

Big Data Processing coursework project


WikiMedia project

Description

This system consists of several servers whose main purpose is to provide an API for users to get statistics on data from WikiMedia.

The system exposes two kinds of APIs; call them API A and API B.

Architecture

There are several servers processing the requests. The reliability of the system is ensured by adding a queue of requests and responses. Its availability, as well as its consistency, may be questionable, though (you can experience that while using the API) :)
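The queue-based decoupling described above can be sketched in miniature with Python's standard library. This is only an illustration of the idea, not the project's actual implementation: in-process `queue.Queue` objects stand in for whatever external request/response queues the real system uses, and the worker function is hypothetical.

```python
import queue
import threading

# A request queue feeding a worker "server", with results placed on a
# response queue. If the worker crashes, pending requests stay queued
# instead of being lost -- the reliability property the README mentions.
requests_q: queue.Queue = queue.Queue()
responses_q: queue.Queue = queue.Queue()

def worker() -> None:
    while True:
        req = requests_q.get()
        if req is None:          # sentinel value: shut the worker down
            break
        responses_q.put(f"processed {req}")

t = threading.Thread(target=worker, daemon=True)
t.start()

requests_q.put("stats-request")  # a client enqueues a request
requests_q.put(None)             # tell the worker to stop
t.join()

result = responses_q.get()       # the client reads the response later
```

Because producers and consumers only touch the queues, more worker servers can be added without changing the client side.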

The detailed design documentation can be found in the Wikimedia System Document file.

Data models

The data is persisted in a Cassandra database. It is modeled so that the API calls can be answered as quickly as possible.
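To illustrate what "modeled for fast API responses" means in Cassandra, here is a hypothetical, query-first table sketch. The keyspace, table, and column names are assumptions for illustration only; the project's real schema is described in the Wikimedia System Document file.

```sql
-- Hypothetical example of a query-first Cassandra table: rows are
-- partitioned by domain, so an API call asking for one domain's
-- statistics reads a single partition instead of scanning the cluster.
CREATE TABLE IF NOT EXISTS wikimedia.pages_by_domain (
    domain     text,
    created_at timestamp,
    page_id    text,
    page_title text,
    PRIMARY KEY ((domain), created_at, page_id)
) WITH CLUSTERING ORDER BY (created_at DESC, page_id ASC);
```

In Cassandra, tables are typically designed per query rather than normalized, which is why the same data may be duplicated across several tables, one per API endpoint.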

The detailed data models description can be found in the Wikimedia System Document file.

Usage

Clone the repository on your local machine.

Run start.sh in your terminal.

Check that you can run cqlsh in the Cassandra container. If you can, the program has started. If not, wait 1-2 minutes and run start.sh again (the Cassandra container should be ready by then, so rerunning start.sh ensures the CQL tables are created properly).
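Instead of manually retrying start.sh, you can poll for Cassandra's readiness. Below is a minimal sketch using only the Python standard library; it assumes Cassandra's default CQL port 9042 is published on localhost (the port and host are assumptions based on Cassandra defaults, not taken from this repository's configuration).

```python
import socket
import time

def cassandra_ready(host: str = "localhost", port: int = 9042,
                    timeout: float = 120.0) -> bool:
    """Return True once something accepts TCP connections on the CQL port.

    Keeps retrying until `timeout` seconds have elapsed, then gives up.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # container not up yet; retry shortly
    return False

# Usage sketch: wait for the container, then (re)run start.sh, e.g.
#   if cassandra_ready():
#       subprocess.run(["./start.sh"], check=True)
```

A port check only proves the socket is open, not that the keyspace exists, so rerunning start.sh afterwards is still the authoritative step.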

You can now send requests to the provided API endpoints. There are two ways to do that.

  1. On your localhost, port 8000, set the URL to one of the endpoints.
  2. Use the requests library in Python and send requests to the endpoint URLs from code.

Results

There are two APIs implemented, each with its own logic, but both working within the same system.

You can see an image of what the system looks like in the Wikimedia System Document file.
