This repository provides a solution for the Marketing Digital challenge, part of the
Analytics Engineer Exam for the Itaú Decision Science Team.
Requirements:
To run this application, you need the following installed on your machine:
docker
How to launch the application:
From the root folder of this application, run the following command to build and start the solution:
docker-compose up -d
How the application works:
The tables that host the ingested data are defined in alembic scripts.
They are created during the build phase of the application, as soon as the database is available.
Invoke is used to automate tasks such as checking database availability, running the alembic migration,
and truncating the sor tables.
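The availability check mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the repository's actual Invoke task: the function name `wait_for_database` and its parameters are hypothetical, and a successful TCP connection is used here as a stand-in for "the database is available".

```python
import socket
import time


def wait_for_database(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: an open port only shows that something is listening,
    not that PostgreSQL is ready to accept queries.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Try to open (and immediately close) a TCP connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_database("localhost", 5432)` before running the migration would block until the port answers or the timeout expires. A stricter check would run a trivial query through a database driver instead.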
The ingestion first loads a configuration file (ingestion.yml) that lists the
ingestions to be executed.
Each ingestion entry contains the following information:
schema: name of the database schema where the table is created
name: name of the table in that schema which will persist the data
fields: field names, in the order they are expected to be read from the file and inserted into the table
file_format: format of the file to be ingested (this value is passed
to pyspark when reading the file)
file_path: path to the file to be ingested
header: whether the file contains a header
parser: Python expression that initializes the parser to be used during the ingestion, e.g. PageViewParser() for pageviews, or None when no parser is required
It uses pyspark to read the files
It creates an INSERT SQL statement for each record
It executes the insertions in the PostgreSQL database
It then executes the *.sql scripts registered in sql_scripts.yml, in the order in which they are listed in that file
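To illustrate how an ingestion entry can drive the INSERT statement built for each record, here is a minimal sketch. It is not the repository's code: `build_insert` is a hypothetical helper, the entry values (table name, field names, file path) are made up, and the `%s` placeholders assume a DB-API driver such as psycopg2. Only the keys of the entry come from the list above.

```python
from typing import Any, Sequence, Tuple


def build_insert(ingestion: dict, record: Sequence[Any]) -> Tuple[str, tuple]:
    """Build a parameterized INSERT statement for one record (hypothetical helper)."""
    fields = ingestion["fields"]
    if len(record) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(record)}")
    columns = ", ".join(fields)
    # One placeholder per field; the driver substitutes the values safely.
    placeholders = ", ".join(["%s"] * len(fields))
    table = f"{ingestion['schema']}.{ingestion['name']}"
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return sql, tuple(record)


# Hypothetical entry: the keys mirror ingestion.yml, the values are invented.
ingestion = {
    "schema": "sor",
    "name": "pageview",
    "fields": ["device_id", "ip", "campaign_id"],
    "file_format": "csv",
    "file_path": "data/pageview.csv",
    "header": True,
    "parser": None,
}

sql, params = build_insert(ingestion, ["d-1", "10.0.0.1", 42])
print(sql)
print(params)
```

Keeping values out of the SQL text avoids quoting problems with the ingested data. Issuing one statement per record is simple but slow for large files, so a batched insert may be preferable in practice.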
Database connection information
The database into which the data is inserted is available locally after launching the application.
Use the following information to connect to it:
host: localhost
port: 5432
user: admin
password: admin
database: marketing
Connection URI
postgresql://admin:admin@localhost:5432/marketing
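The connection string above can be split back into the individual parameters with the standard library, which helps when a client expects separate arguments rather than a single string. This is a small sketch: `parse_connection_uri` is a hypothetical name, and which database driver the project uses is not stated here.

```python
from urllib.parse import urlsplit


def parse_connection_uri(uri: str) -> dict:
    """Split a postgresql:// URI into its connection parameters (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "password": parts.password,
        # The path component is "/<database>"; drop the leading slash.
        "database": parts.path.lstrip("/"),
    }


params = parse_connection_uri("postgresql://admin:admin@localhost:5432/marketing")
print(params)
```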
Findings about the data:
a client can have more than one device
a device can have more than one ip address
a device can have seen more than one campaign
clients can sign a contract even though they haven't seen a campaign
the same campaign can be launched on multiple media (facebook and google)
Answers
1 - What was the most expensive campaign?
select campaign_name, sum(cost)
from espec.campaign_efficiency
group by campaign_name
order by sum(cost) desc
limit 1;
Result:
campaign_name: creditas|home|natal2018
cost: 19459.090000000004
2 - What was the most profitable campaign?
select campaign_name, sum(profit)
from espec.campaign_efficiency
group by campaign_name
order by sum(profit) desc
limit 1;
Result:
campaign_name: emprestimo_garantia|home|natal2018
profit: 34065.47000000004
3 - Which ad creative is the most effective in terms of clicks?
select ad_creative_name, clicks
from espec.campaign_efficiency
order by clicks desc
limit 1;
Result:
ad_creative_name: mulheres_bracos_alto
clicks: 5711240
4 - Which ad creative is the most effective in terms of generating leads?
select ad_creative_name, leads
from espec.campaign_efficiency
order by leads desc
limit 1;
Result:
ad_creative_name: mulheres_bracos_alto
leads: 132