IFDSItauMarketing

This repository provides a solution for the Digital Marketing challenge, part of the Analytics Engineer Exam for the Itaú Decision Science Team.

Requirements:

To run this application, you need the following installed on your machine:

docker

How to launch the application:

From the root folder of this application, run the following command to build and start the solution:

docker-compose up -d

How the application works:

The tables that host the ingested data are defined in alembic scripts.
- They are created during the build phase of the application, as soon as the database is available.
Invoke is used to automate tasks such as checking database availability, running the alembic migrations, and truncating the sor tables.
The ingestion first loads a configuration file (ingestion.yml) that lists the ingestions to be executed.
- Each ingestion entry contains the following information:
  - schema: name of the database schema where the table is created
  - name: name of the table in that schema that will persist the data
  - fields: field names, in the order they are expected to be read from the file and inserted into the table
  - file_format: format of the file to be ingested (the value is passed to pyspark when reading the file)
  - file_path: path of the file to be ingested
  - header: whether the file contains a header
  - parser: Python snippet that initializes the parser used during the ingestion, e.g. PageViewParser() for pageviews, or None when no parser is required
It uses pyspark to read the files.
It creates an INSERT SQL statement for each record.
It executes the insertions in the PostgreSQL database.
It then executes the *.sql scripts registered in sql_scripts.yml, in the same order as they appear in that file.
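
The step that turns a configuration entry into an INSERT statement could be sketched as follows. This is a minimal illustration, not the repository's actual code: the entry values and the helpers `build_insert` and `record_params` are hypothetical, and the sketch assumes a DB-API driver that uses `%s` placeholders (as psycopg2 does) instead of inlining values into the SQL text.

```python
from typing import Any, Dict, Tuple

# Hypothetical ingestion entry using the keys described above.
# All values are made up for illustration only.
ingestion: Dict[str, Any] = {
    "schema": "sor",
    "name": "pageviews",
    "fields": ["device_id", "ip_address", "campaign_id"],
    "file_format": "json",
    "file_path": "data/pageviews.json",
    "header": False,
    "parser": None,
}


def build_insert(entry: Dict[str, Any]) -> str:
    """Build a parameterized INSERT statement for one ingestion entry."""
    columns = ", ".join(entry["fields"])
    placeholders = ", ".join(["%s"] * len(entry["fields"]))
    table = f'{entry["schema"]}.{entry["name"]}'
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"


def record_params(entry: Dict[str, Any], record: Dict[str, Any]) -> Tuple[Any, ...]:
    """Order a record's values to match the configured field order."""
    return tuple(record.get(field) for field in entry["fields"])


statement = build_insert(ingestion)
params = record_params(
    ingestion,
    {"campaign_id": 7, "device_id": "d-1", "ip_address": "10.0.0.1"},
)
print(statement)
print(params)
```

Keeping the values separate from the SQL text lets the driver handle quoting, and the same statement can be reused for every record of a table.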

Information to connect to database

The database into which the data is inserted is available locally after launching the application. The connection information is the following:

host	port	user	password	database
localhost	5432	admin	admin	marketing

Connection string
postgresql://admin:admin@localhost:5432/marketing
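
As a quick check of how the string above maps to the values in the table, it can be split with the Python standard library. This is only a sketch: `parse_connection_uri` is a hypothetical helper, and the source does not say which client is used to connect. A driver such as psycopg2, if installed, would be expected to accept the same string directly.

```python
from typing import Any, Dict
from urllib.parse import urlsplit


def parse_connection_uri(uri: str) -> Dict[str, Any]:
    """Split a postgresql:// URI into host, port, user, password and database."""
    parts = urlsplit(uri)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "password": parts.password,
        # The path component is "/<database>"; drop the leading slash.
        "database": parts.path.lstrip("/"),
    }


settings = parse_connection_uri("postgresql://admin:admin@localhost:5432/marketing")
print(settings)
```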

Findings about the data:

a client can have more than one device
a device can have more than one ip address
a device can have seen more than one campaign
clients can sign a contract even though they haven't seen a campaign
the same campaign can be launched on multiple media channels (Facebook and Google)

Answers

1 - What was the most expensive campaign?

select campaign_name, sum(cost) as cost
from espec.campaign_efficiency
group by campaign_name
order by sum(cost) desc
limit 1;

campaign_name	cost
creditas|home|natal2018	19459.090000000004

2 - What was the most profitable campaign?

select campaign_name, sum(profit) as profit
from espec.campaign_efficiency
group by campaign_name
order by sum(profit) desc
limit 1;

campaign_name	profit
emprestimo_garantia|home|natal2018	34065.47000000004

3 - Which ad creative is the most effective in terms of clicks?

select ad_creative_name, clicks
from espec.campaign_efficiency
order by clicks desc
limit 1;

ad_creative_name	clicks
mulheres_bracos_alto	5711240

4 - Which ad creative is the most effective in terms of generating leads?

select ad_creative_name, leads
from espec.campaign_efficiency
order by leads desc
limit 1;

ad_creative_name	leads
mulheres_bracos_alto	132
