
Gitpip

Intro - 1

Gitpip is a small microservice that communicates with two APIs: Pipedrive and GitHub.

In short, it tracks a number of GitHub users’ gists. Gists are scanned routinely, and any gist that hasn’t been seen before is saved and published as a Pipedrive activity.

Task - 2

Using the GitHub API, you should be able to query a user’s publicly available gists and create a deal/activity in Pipedrive for each gist. Implement an application that periodically checks for a user’s publicly available gists; this application should also have a web endpoint showing the gists for that user that were added since the last visit.

Functionality - 3

All in all, the app has the following requirements:

  1. it must query a user’s gists through GitHub’s **Gist** API
  2. it must post a gist from some user as an activity or deal through **Pipedrive’s** API
  3. it must run some kind of periodic check that adds unseen gists (see the sketch after this list)
  4. it must have an endpoint where recently added gists are shown, for a given user
  5. it must have an endpoint that shows all users that are being tracked
  6. it must have proper logging
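
To make requirements 1 and 3 concrete, here is a minimal sketch of what the periodic check might look like. The `fetchGists` helper and the hard-coded username are illustrative, not the actual implementation (the real API-access logic lives in pkg/utils.go); only GitHub’s public `GET /users/{username}/gists` endpoint is assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

// gist mirrors only the fields we need from GitHub's Gist API response.
type gist struct {
	ID    string `json:"id"`
	Files map[string]struct {
		Filename string `json:"filename"`
		RawURL   string `json:"raw_url"`
	} `json:"files"`
}

// fetchGists queries a user's publicly available gists.
func fetchGists(username string) ([]gist, error) {
	resp, err := http.Get(fmt.Sprintf("https://api.github.com/users/%s/gists", username))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var gists []gist
	if err := json.NewDecoder(resp.Body).Decode(&gists); err != nil {
		return nil, err
	}
	return gists, nil
}

func main() {
	// Every three hours, scan tracked users for gists that weren't seen before.
	ticker := time.NewTicker(3 * time.Hour)
	defer ticker.Stop()
	for range ticker.C {
		gists, err := fetchGists("brurucy") // illustrative username
		if err != nil {
			log.Println("routine failed:", err)
			continue
		}
		log.Printf("fetched %d gists", len(gists))
		// Unseen gists would be persisted and posted to Pipedrive here.
	}
}
```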

Assumptions - 4

  1. I understand posting a gist as posting its contents as an activity, with the deal bearing the user’s username.
  2. The periodic checks obviously imply the need for some kind of state.
  3. It should show the gists for that user that were added since the *last visit*. I’m assuming a visit is some kind of session: getting the latest gists first looks up the last session and uses its timestamp as the lower bound, returning all gists for that specific user added at or after that timestamp; a new session is then recorded (see the sketch after this list).
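
A sketch of assumption 3, using only database/sql. The table and column names follow the schema in section 5, but the SQL here is illustrative, not the exact queries from repository.go, and a registered Postgres driver (e.g. lib/pq) is assumed.

```go
package pkg

import (
	"database/sql"
	"time"
)

// latestGistsSince looks up the previous session's timestamp, returns every
// gist added for the user since then, and records a new session.
func latestGistsSince(db *sql.DB, userID, username string) ([]string, error) {
	// 1. The last session's timestamp is the lower bound (epoch if none).
	var since time.Time
	if err := db.QueryRow(
		`SELECT COALESCE(MAX(created_at), to_timestamp(0))
		 FROM session WHERE user_id = $1`, userID,
	).Scan(&since); err != nil {
		return nil, err
	}

	// 2. Collect every gist for that user added at or after the bound.
	rows, err := db.Query(
		`SELECT raw_url_link FROM gists
		 WHERE username = $1 AND created_at >= $2`, username, since,
	)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var urls []string
	for rows.Next() {
		var u string
		if err := rows.Scan(&u); err != nil {
			return nil, err
		}
		urls = append(urls, u)
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}

	// 3. Record a new session so the next call starts from now.
	_, err = db.Exec(
		`INSERT INTO session (user_id, created_at) VALUES ($1, now())`, userID,
	)
	return urls, err
}
```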

Tech Rationale - 5

I wrote it in Go, and I tried to make it as raw as possible, that is, with the least amount of external libraries. I am a huge advocate of YAGNI and KISS; only two external libraries, gorilla/mux and logrus, were used.

From the functionality and logging requirements, we can derive the following relational structure (I’m omitting the types), which also explains the rationale; a sketch of the matching Go types follows the schema:

Users

  • user_id (does not have to be GitHub’s user id)
  • username
  • created_at

Gists

  • gist_unique_id (this is the gist’s file id, every gist can have multiple files)
  • gist_id (the full gist hash)
  • raw_url_link (the link to the gist file’s text, to be posted as an activity)
  • username (the user’s username, breaking the normal form for the sake of simplicity)
  • gist_file_title
  • created_at

Routine

  • routine_id (this is the id of each every-three-hours gist fetch)
  • created_at

Route_gist_user

  • (the point of this table is to keep track of all gists of all users that were added on each routine)
  • routine_id
  • gist_id
  • user_id

Session

  • session_id (this will hold each GetLatestGists session. Every time that endpoint is queried it will look for the previous session and respond accordingly)
  • user_id
  • created_at
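
Here is a sketch of how this schema might map onto the Go types in pkg/domain_types.go; the field and JSON tag names are my guesses from the schema above, not the actual file.

```go
package pkg

import "time"

// User is a tracked GitHub user.
type User struct {
	UserID    string    `json:"user_id"` // does not have to be GitHub's user id
	Username  string    `json:"username"`
	CreatedAt time.Time `json:"created_at"`
}

// Gist is one file of a gist; a gist can have multiple files.
type Gist struct {
	GistUniqueID  string    `json:"gist_unique_id"` // the gist file's id
	GistID        string    `json:"gist_id"`        // the full gist hash
	RawURLLink    string    `json:"raw_url_link"`   // link to the file's text, posted as an activity
	Username      string    `json:"username"`       // denormalized for simplicity
	GistFileTitle string    `json:"gist_file_title"`
	CreatedAt     time.Time `json:"created_at"`
}

// Routine is one every-three-hours gist fetch.
type Routine struct {
	RoutineID string    `json:"routine_id"`
	CreatedAt time.Time `json:"created_at"`
}

// RouteGistUser records which gists were added for which users on each routine.
type RouteGistUser struct {
	RoutineID string `json:"routine_id"`
	GistID    string `json:"gist_id"`
	UserID    string `json:"user_id"`
}

// Session is recorded on every GetLatestGists call.
type Session struct {
	SessionID string    `json:"session_id"`
	UserID    string    `json:"user_id"`
	CreatedAt time.Time `json:"created_at"`
}
```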

Folder Structure - 6

kubernetes_files

  • Here I keep the kubernetes deployments and services generated from the docker-compose file by kompose

pkg

  • Where all the code lives.
  • domain_types.go - holds all types used for SQL and JSON
  • handler.go - where the endpoint handlers live
  • repository.go - where all the database logic lies
  • utils.go - where the GitHub and Pipedrive API access logic is

scripts

  • The SQL init script.

wait-for-it

  • The script that waits for another service to come up. The licensing is in it, and it gives the needed credits.

Docker - 7

Everything was developed with Docker, attempting to use lightweight images since the very first commit.

The goal of using docker-compose is, other than the obvious, to easily convert it to kubernetes later using kompose.

All sensitive information is required as environment variables, which ought to be passed by modifying the given docker-compose.yml or kubernetes files. I know environment variables can be a security risk: if a malicious actor execs into the container, they can easily read the credentials.
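
As a sketch, the service can fail fast at startup when any of the required variables (listed in section 9) is missing; the `mustEnv` helper here is illustrative, not the actual configuration code.

```go
package main

import (
	"log"
	"os"
)

// mustEnv aborts startup when a required environment variable is missing,
// so a misconfigured container fails loudly instead of at the first request.
func mustEnv(key string) string {
	v := os.Getenv(key)
	if v == "" {
		log.Fatalf("required environment variable %s is not set", key)
	}
	return v
}

func main() {
	token := mustEnv("PIPEDRIVE_TOKEN")
	org := mustEnv("PIPEDRIVE_ORG")
	dsn := mustEnv("POSTGRES_CONNECTION_STRING")
	log.Printf("configuration loaded for Pipedrive org %q", org)
	_, _ = token, dsn // handed to the Pipedrive client and the database pool in the real service
}
```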

Endpoints - 8

/users

POST
  • /users/<some_user>/<some_unique_id> - Adds a new user
GET
  • /users - Gets all tracked users.

/health

GET
  • /health - Returns “Alive” with status 200 if the service is functional and not blocked.

/latestgists

POST
  • /latestgists/<some_user> - Gets all newly added gists for a given, already tracked, user, AND records the session from which the next call will filter gists. It will only show gists if a routine has happened since the last session; otherwise it won’t show any. With the kubernetes deployment it is not possible to know when the last routine happened. (A sketch of how these routes might be registered follows.)
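
A sketch of how these routes might be registered with gorilla/mux; the handler names are stand-ins for the real functions in pkg/handler.go.

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/gorilla/mux"
)

// Illustrative stand-ins for the real handlers in handler.go.
func addUser(w http.ResponseWriter, r *http.Request)     { /* persist mux.Vars(r)["username"] */ }
func getUsers(w http.ResponseWriter, r *http.Request)    { /* list all tracked users */ }
func latestGists(w http.ResponseWriter, r *http.Request) { /* gists since the last session */ }

func main() {
	r := mux.NewRouter()
	r.HandleFunc("/users/{username}/{id}", addUser).Methods("POST")
	r.HandleFunc("/users", getUsers).Methods("GET")
	r.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "Alive") // a 200 status is written implicitly on first write
	}).Methods("GET")
	r.HandleFunc("/latestgists/{username}", latestGists).Methods("POST")
	log.Fatal(http.ListenAndServe(":8080", r))
}
```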

How to run - 9

I have compiled two binaries, for macOS and generic Linux. Either should suffice to get it up and running, given that you have the following environment variables set up:

  1. PIPEDRIVE_TOKEN
  2. PIPEDRIVE_ORG
  3. POSTGRES_CONNECTION_STRING

You can also use docker-compose as follows:

First, let’s build the microservice image.

docker build -f Dockerfile -t gistdrive:1.0 .

Then, let us spin up the compose (make sure to fill in your credentials):

docker-compose -f docker-compose.yml up -d
    

And voilà!

curl "localhost:8080/users"
    

Should return nothing, since no users are tracked yet.

curl -X POST -H "Content-Type: application/json" "http://localhost:8080/users/<some_username>/<some_unique_id_not_necessarily_githubs_id>"

Should return a new User.

And at last

curl -X POST -H "Content-Type: application/json" "http://localhost:8080/latestgists/<some_added_username>"

Should return the newly added gists, relative to the last time you made a POST to that endpoint, and given that a routine has happened (EVERY THREE HOURS). So if it didn’t show anything now, come back in 3 hours :) (sorry)

Cloud - 10

All access is secured by RBAC, with firewall rules restricting any and all access other than through the specified microservice endpoint.

Everything is in europe-north1-a, Finland.

Kubernetes

The project was deployed on a Google Kubernetes Engine instance with 3 nodes, a replication factor of 2, and 12 GB of total memory. It uses hardened nodes in order to prevent malicious nodes from trying to take over the cluster.

The health check is done using a liveness probe that queries the /health endpoint, and the readiness check uses a readiness probe with the wait-for-it.sh script in order to wait for the postgres pod. I know that for postgres to have persistence it needs a volume; for the sake of simplicity I decided to make it ephemeral.

I did not use any provisioning tool due to lack of time.

This should suffice for resilience and scalability, within this very specific context.

Docker Images

The images are stored in Google’s Artifact Registry, under private, source-controlled registries.

Stackdriver

All logging (database, service, and general Kubernetes) is routed to Stackdriver.

I know that logrus is writing everything to stderr; I didn’t have enough time to fix it. The fix would be a one-liner at startup, sketched below.
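
A minimal sketch of that fix, assuming nothing beyond logrus’s standard SetOutput; on GKE this matters because Stackdriver tends to classify stderr lines as errors.

```go
package main

import (
	"os"

	"github.com/sirupsen/logrus"
)

func main() {
	// logrus defaults to stderr; redirecting it to stdout keeps ordinary
	// log lines from showing up as errors in Stackdriver.
	logrus.SetOutput(os.Stdout)
	logrus.Info("logging to stdout")
}
```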

External Service Ip

The load-balanced external IP is 35.228.33.forty-six (the actual number is 46; I’m writing it out in order to not get busted by crawlers), on port 8080.

All endpoints are accessible from there.

Tests - 11

I wrote some 300 lines of tests, but they are quite shameful; please don’t look.

However, if you really want to check them out you can just look at the past commit.
