Computer, compute to the last digit the value of pi.
Mobify Data Guide
Welcome to Mobify's data guide! We have put together a list of readings that are useful for getting started with any data set.
Why this guide?
This is an open-source guide intended to gather feedback from people who have worked with data teams. At Mobify, we work closely with talent from a wide variety of backgrounds.
We hope that opening up some of our onboarding materials will give you a taste of our style of work, and help candidates preparing for interviews or data hackathons.
Legend
The resources in this guide fall into three types:
- Articles - expect around 10-15 minutes of reading time
- Tutorials - expect at least a half-day exercise
- Advanced references (optional readings) - reading time varies
What if I am preparing for an interview or hackathon tomorrow?
We recommend that you at least read the articles and:
- take the Python + Pandas tutorial
- set up your environment by following Setting up your data dojo, and do some practice runs
Contents of this guide
This is a selected list of resources that we think is the minimal set needed to start working on data challenges.
- Getting started
- Data Science 101
- Engineering tools 101
- Setting up your data dojo
- Think about the problem
See CONTRIBUTING.md for contribution guidelines.
Getting started
So you would like to work with data, eh? There are many great resources to get you started on that path. We recommend a few of these articles:
- Quora's answer - How can I become a data scientist? - gives a good overview of the background and readings that would be helpful. We dive into a few of these readings in the following sections.
- Applying the Scientific Method to Software Engineering - a good article on the intersection between academia and a real-world engineering scenario.
Data Science 101
If you do not come from a statistics or machine learning background, this is a good starting point:
- Statistics for hackers - a basic list of readings on the statistics knowledge you need.
- Machine Learning for hackers - good coverage of various aspects of machine learning.
- Scikit-learn estimator map - my go-to place for picking the right model to use.
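To give a concrete flavour of the statistical reasoning that comes up in data challenges, here is a minimal sketch of one resampling technique, a percentile bootstrap confidence interval for a mean, using only the Python standard library. It is our own illustration, not taken from the linked readings; the function name `bootstrap_mean_ci`, its defaults, and the sample data are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # private, seeded RNG so the result is repeatable
    means = sorted(
        # Each resample has the same size as the data and is drawn with replacement.
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    # Percentile method: read the alpha/2 and 1 - alpha/2 quantiles off the sorted means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    # Hypothetical session lengths in minutes, for illustration only.
    sessions = [3.1, 4.7, 2.2, 5.9, 3.8, 4.4, 6.1, 2.9, 3.5, 4.0]
    low, high = bootstrap_mean_ci(sessions)
    print(f"mean={statistics.fmean(sessions):.2f}  95% CI=({low:.2f}, {high:.2f})")
```

The appeal of this approach is that it replaces a formula with a loop: if you can resample and sort, you can get an interval without assuming a particular distribution for the data.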
Engineering tools 101
Learning to code is an important step in becoming data literate. These are the four main engineering tools we use.
Python + Pandas
At Mobify, we are a Python shop, so we focus our analysis on Python + Pandas. Below are some of our favourite tutorials to get started:
- DataQuest/Data scientist is a good onboarding course for Python and Pandas.
- (Advanced) Pandas with Seaborn is a simple article on how to make various Seaborn plots for data visualization.
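As a small taste of what these tutorials cover, the sketch below shows the split-apply-combine pattern in Pandas: group rows, aggregate each group, and get a tidy table back. The `views` data and its column names are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical page-load log; the column names are illustrative, not a real schema.
views = pd.DataFrame(
    {
        "site": ["shop-a", "shop-a", "shop-b", "shop-b", "shop-b"],
        "device": ["mobile", "desktop", "mobile", "mobile", "desktop"],
        "load_time_ms": [1200, 800, 1500, 1300, 900],
    }
)

# Split-apply-combine: group by site and device, then aggregate each group.
summary = (
    views.groupby(["site", "device"], as_index=False)
    .agg(
        mean_load_ms=("load_time_ms", "mean"),  # average load time per group
        n_views=("load_time_ms", "count"),  # number of rows per group
    )
    .sort_values(["site", "device"])
    .reset_index(drop=True)
)

print(summary)
```

A DataFrame in this long format is also what Seaborn expects for plotting, for example `sns.barplot(data=views, x="site", y="load_time_ms", hue="device")` after `import seaborn as sns`.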
SQL
SQL is used everywhere.
- The Codecademy SQL course is our favourite tutorial.
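To show the kind of query such a course builds up to, here is a minimal, self-contained sketch using Python's built-in `sqlite3` module with an in-memory database. The `orders` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database, so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5), ("bob", 40.0)],
)

# Aggregate per customer, keep only customers above a threshold, largest total first.
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > ?
    ORDER BY total DESC
    """,
    (10,),
).fetchall()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)

conn.close()
```

The `SELECT ... GROUP BY ... HAVING` text itself is standard SQL and carries over to other databases; what changes between drivers is the connection code and the parameter placeholder style (`?` here).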
Command line
Being comfortable with the command line will help a great deal in your work. We recommend taking the Codecademy command line course for this.
Git
Git solves two big communication challenges of working as a team:
- Letting multiple people work on the same piece of code, each on their own computer. For example, we have a branching strategy that helps us organize code.
- Code review and pull requests on GitHub. For example, see a pull request on this repo.
Setting up your data dojo
So, are you ready to get started? One thing we have found correlates with how well interview candidates do is how comfortable they are with the environment they will use during the interview. Here are a few tips.
Also, see the Disclaimer: Mobify is a Python shop, so our data dojo is likely to be Python-focused. Our tool of choice is Jupyter Notebook.
Hosted version
Local setup (advanced)
If you want to set up a self-hosted version of Jupyter, you might want to check out
Getting familiar with Jupyter notebook
Think about the problem
Most of us take pride in diving into our problems and presenting our solutions. Over time, we have learned a few tools for aligning colleagues and fellow hackers with our thinking. Here are a few:
Focus on the right problem to work on
If I had an hour to solve a problem I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions - Albert Einstein
It is a surprisingly difficult skill to learn how to work on the right problem. Here are a few tips:
- Whiteboarding and canvasing are great ways to open our minds. More at Introducing the Deep Learning Canvas, a variation on the Startup Canvas. You can print it out, or grab a whiteboard and draw it.
- Design sprint - a way of keeping an open mind. We also enjoy a minimal version of it, The 25-Minute Design Sprint, which we find helpful to adjust and adapt.
Communicating the results
I'm not a great programmer; I'm just a good programmer with great habits - Kent Beck
Writing a readable notebook and explaining the results is a great habit.
Keep your analysis reproducible
Reproducibility is important because it is the only thing that an investigator can guarantee about a study. -- Roger Peng
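One practical way to act on this in code is to pin down every source of randomness and record the environment the analysis ran in. Below is a minimal standard-library sketch under our own assumptions: the seed value, the toy `run_analysis` function, and the fields in `environment_record` are all hypothetical.

```python
import json
import platform
import random

SEED = 42  # hypothetical fixed seed; the exact value matters less than recording it


def run_analysis(seed):
    """Toy stand-in for an analysis step that depends on randomness."""
    rng = random.Random(seed)  # private RNG, unaffected by other code's random calls
    return [round(rng.gauss(0.0, 1.0), 6) for _ in range(5)]


def environment_record(seed):
    """Collect the details someone would need to re-run the analysis the same way."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
    }


if __name__ == "__main__":
    results = run_analysis(SEED)
    # Re-running with the same seed and code should give identical numbers.
    print("repeatable:", results == run_analysis(SEED))
    print(json.dumps({"environment": environment_record(SEED), "results": results}, indent=2))
```

In a notebook, the same idea applies: set seeds in the first cell and save the environment record next to the results. Libraries with their own random generators, such as NumPy, need to be seeded separately.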
Disclaimer
We are an engineering-focused data shop, and we are opinionated towards selecting tools that are easy to get started with and that work well with our stack (e.g. Python, Jupyter Notebook). This is the approach we have found works well for us.
We have no affiliation to any of the companies mentioned in this list.