mininao / BI-SQL-Homework

Homework for a College Business Intelligence course

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Rendu de projet BI - Youtube

Maxime Ghermaoui - Christophe Loeur - Vincent Salahkar - Maxence Aïci

Modèle de données

Modèle de données

  • Le modèle de données ci-dessus est disponible ici : DataModel.png
  • Le modèle original a été créé avec le logiciel Visual Paradigm, le fichier source est DataModel.vpp
  • Le script permettant de créer les tables et leur contraintes est DataModel.sql

Données de test

  • Les données de test on été générées avec Mockaroo, Permettant d'en créer autant que souhaité
  • Nous avons tenté d'obtenir un maximum de cohérence dans ces données ainsi qu'un volume important pour pouvoir bien tester nos indicateurs.
  • Les scripts sql créant chacune des tables sont dans le dossier FakeData
  • Un dump complet de la base de données est également disponible dans dump.sql.gz

Voici la quantité de données ajoutées :

Table Lignes
Channel 500
Comment 1000
Interaction 10000
Like 1000
Publicite 40
Pubview 3000
Subscription 2000
User 1000
Video 100
Videoview 5000

Indicateurs

  1. Indicateurs Simples
  1. Indicateurs Moyens
  1. Indicateurs Difficiles
  1. Indicateurs DataMart

Indicateurs Simples

Vidéos les plus vues

SELECT Video.video_id, Video.name, COUNT(Videoview.videoview_id) AS Vues
FROM Videoview
INNER JOIN Interaction
INNER JOIN Video
WHERE Videoview.interaction_id = Interaction.interaction_id
AND Interaction.video_id = Video.video_id
AND Interaction.start_date BETWEEN date_sub(now(),INTERVAL 1 WEEK) AND now()
GROUP BY Video.video_id
ORDER BY Vues DESC, Video.name ASC
LIMIT 12;

Pubs qui poussent à cliquer

SELECT S1.publicite_id, S1.name, S2.Clics/S1.Vues AS Pourcentage
FROM (SELECT Publicite.name,Publicite.publicite_id, COUNT(Pubview.pubview_id) AS Vues
FROM Pubview
INNER JOIN Publicite
WHERE Pubview.publicite_id = Publicite.publicite_id
GROUP BY Pubview.publicite_id) AS S1
INNER JOIN (SELECT Publicite.name,Publicite.publicite_id, SUM(Pubview.clicked) AS Clics
FROM Pubview
INNER JOIN Publicite
WHERE Pubview.publicite_id = Publicite.publicite_id
GROUP BY Pubview.publicite_id) AS S2
WHERE S1.publicite_id = S2.publicite_id
GROUP BY S1.publicite_id
ORDER BY Pourcentage DESC
LIMIT 12;

Chaînes avec le plus d'abonnés

SELECT Channel.channel_id, Channel.channel_name, COUNT(Subscription.subscription_id) AS Subs
FROM Subscription
INNER JOIN Channel
WHERE Subscription.channel_id = Channel.channel_id AND Subscription.end_date IS NULL
GROUP BY Subscription.channel_id
ORDER BY Subs DESC, Channel.channel_name ASC
LIMIT 10;

Indicateurs Moyens

Vidéos les plus populaires

SELECT S1.video_id, S1.name, ((3*S1.Comments + S2.Likes)*100/S3.`Views` + S3.`Views`) AS Popularite, S1.Comments, S2.Likes, S3.`Views`
FROM(SELECT Video.video_id, Video.name, COUNT(Comment.comment_id) AS Comments
FROM Comment
INNER JOIN Interaction
INNER JOIN Video
WHERE Comment.interaction_id = Interaction.interaction_id AND Interaction.video_id = Video.video_id AND Comment.end_date IS NULL
GROUP BY Interaction.video_id) AS S1

INNER JOIN(SELECT Video.video_id, Video.name, COUNT(`Like`.`like_id`) AS `Likes`
FROM `Like`
INNER JOIN Interaction
INNER JOIN Video
WHERE `Like`.interaction_id = Interaction.interaction_id AND Interaction.video_id = Video.video_id AND `Like`.end_date IS NULL
GROUP BY Interaction.video_id) AS S2

INNER JOIN (SELECT Video.video_id, Video.name, COUNT(Videoview.videoview_id) AS `Views`
FROM Videoview
INNER JOIN Interaction
INNER JOIN Video
WHERE Videoview.interaction_id = Interaction.interaction_id AND Interaction.video_id = Video.video_id
GROUP BY Interaction.video_id) AS S3

WHERE S1.video_id = S2.video_id AND S2.video_id = S3.video_id
ORDER BY Popularite DESC
LIMIT 12;

Vidéos qui rapportent le plus d'argent

Select Video.video_id,Video.name,
COUNT(S1.interaction_id) AS PubViews,
SUM(S1.clicked) AS PubClics,
AVG(S1.CPC) AS AverageCPC,
AVG(S1.CPV) AS AverageCPV,
SUM(Revenue) AS TotalRevenue
FROM
(Select Pubview.clicked, Publicite.CPV, Publicite.CPC, Pubview.interaction_id, (Publicite.CPV + Pubview.clicked * Publicite.CPC) AS Revenue
FROM Pubview
INNER JOIN Publicite
WHERE Pubview.publicite_id = Publicite.publicite_id) AS S1

INNER JOIN Interaction ON S1.interaction_id = Interaction.interaction_id
INNER JOIN Video ON Interaction.video_id = Video.video_id
GROUP BY Interaction.video_id
ORDER BY TotalRevenue DESC, Video.name ASC
LIMIT 10;

Genre de vidéos qui poussent à les regarder en entier

SELECT Video.genre,
AVG(Videoview.duration / Video.duration) AS AverageViewRatio,
AVG(Video.duration) AS AverageVideoDuration,
AVG(Videoview.duration) AS AverageViewDuration
FROM Videoview
INNER JOIN Interaction ON Videoview.interaction_id = Interaction.interaction_id
INNER JOIN Video ON Interaction.video_id = Video.video_id
GROUP BY Video.genre
ORDER BY AverageViewRatio DESC, Video.Genre ASC
LIMIT 10;

Indicateurs Difficiles

Vidéos les plus populaires avec pondération temporelle

SELECT S1.video_id, S1.name, ((3*S1.FreshComments + S2.FreshLikes)*100/S3.FreshViews + S3.FreshViews) AS FreshPopularite, S1.FreshComments, S2.FreshLikes, S3.FreshViews
FROM(SELECT Video.video_id, Video.name, COUNT(Comment.comment_id) AS Comments,
SUM(GREATEST(
  TIMESTAMPDIFF(SECOND,Interaction.start_date,date_sub(now(),INTERVAL 1 MONTH))
/ TIMESTAMPDIFF(SECOND,now()                 ,date_sub(now(),INTERVAL 1 MONTH))
,0)) AS FreshComments
FROM Comment
INNER JOIN Interaction
INNER JOIN Video
WHERE Comment.interaction_id = Interaction.interaction_id AND Interaction.video_id = Video.video_id AND Comment.end_date IS NULL
GROUP BY Interaction.video_id) AS S1

INNER JOIN(SELECT Video.video_id, Video.name, COUNT(`Like`.like_id) AS Likes,
SUM(GREATEST(
  TIMESTAMPDIFF(SECOND,Interaction.start_date,date_sub(now(),INTERVAL 1 MONTH))
/ TIMESTAMPDIFF(SECOND,now()                 ,date_sub(now(),INTERVAL 1 MONTH))
,0)) AS FreshLikes
FROM `Like`
INNER JOIN Interaction
INNER JOIN Video
WHERE `Like`.interaction_id = Interaction.interaction_id AND Interaction.video_id = Video.video_id AND `Like`.end_date IS NULL
GROUP BY Interaction.video_id) AS S2

INNER JOIN (SELECT Video.video_id, Video.name, COUNT(Videoview.videoview_id) AS `Views`,
SUM(GREATEST(
  TIMESTAMPDIFF(SECOND,Interaction.start_date,date_sub(now(),INTERVAL 1 MONTH))
/ TIMESTAMPDIFF(SECOND,now()                 ,date_sub(now(),INTERVAL 1 MONTH))
,0)) AS FreshViews
FROM Videoview
INNER JOIN Interaction
INNER JOIN Video
WHERE Videoview.interaction_id = Interaction.interaction_id AND Interaction.video_id = Video.video_id
GROUP BY Interaction.video_id) AS S3

WHERE S1.video_id = S2.video_id AND S2.video_id = S3.video_id
ORDER BY FreshPopularite DESC
LIMIT 12;

Chaines qui rapportent le plus d'argent

Select Channel.channel_id, Channel.channel_name,
COUNT(S1.interaction_id) AS PubViews,
SUM(S1.sponsored) AS SponsoredPubViews,
SUM(S1.clicked) AS PubClics,
AVG(S1.CPC) AS AverageCPC,
AVG(S1.CPV) AS AverageCPV,
COUNT(DISTINCT Video.video_id) AS Videos,
SUM(Revenue) AS TotalRevenue
FROM
(Select Pubview.clicked,Publicite.sponsored, Publicite.CPV, Publicite.CPC, Pubview.interaction_id, (Publicite.CPV + Pubview.clicked * Publicite.CPC) AS Revenue
FROM Pubview
INNER JOIN Publicite
WHERE Pubview.publicite_id = Publicite.publicite_id) AS S1

INNER JOIN Interaction ON S1.interaction_id = Interaction.interaction_id
INNER JOIN Video ON Interaction.video_id = Video.video_id
INNER JOIN Channel ON Video.channel_id = Channel.channel_id
GROUP BY Channel.channel_id
ORDER BY TotalRevenue DESC, Video.name ASC
LIMIT 10;

Chaines qui poussent à rester sur youtube

SELECT Channel.channel_id, Channel.channel_name,
ViewCounter.Views,
COUNT(Videoview.videoview_id) AS CreatedViews,
GREATEST(1-(COUNT(Videoview.videoview_id)/ViewCounter.Views),0) AS BounceRate
FROM Videoview
INNER JOIN Video ON Videoview.previous_video_id = Video.video_id
INNER JOIN Channel ON Video.channel_id = Channel.channel_id

INNER JOIN
(SELECT Channel.channel_name, Channel.channel_id, COUNT(Videoview.videoview_id) AS `Views`
FROM Videoview
INNER JOIN Interaction ON Videoview.interaction_id = Interaction.interaction_id
INNER JOIN Video ON Interaction.video_id = Video.video_id
INNER JOIN Channel ON Video.channel_id = Channel.channel_id
GROUP BY Video.channel_id)
AS ViewCounter ON ViewCounter.channel_id = Channel.channel_id

GROUP BY Channel.channel_id
ORDER BY BounceRate ASC, Channel.channel_name ASC
LIMIT 10;

Indicateurs DataMart

Nous avons intégré au datamart deux indicateurs supplémentaires : le revenu publicitaire gagné chaque jour et la durée passée par tous les utilisateurs à regarder des vidéos chaque jour (en secondes).

Ces deux indicateurs devront être représentés à l'aide d'une courbe présentant le temps en abscisses et la quantité en ordonnées, comme celle-ci :
Exemple de courbe

Revenu Journalier

Select date(Interaction.start_date) AS revenue_date,
SUM(Publicite.CPV + Pubview.clicked * Publicite.CPC) AS revenue_amount
FROM Pubview
INNER JOIN Publicite ON Pubview.publicite_id = Publicite.publicite_id
INNER JOIN Interaction ON Pubview.interaction_id = Interaction.interaction_id
GROUP BY date(Interaction.start_date)

Durée Journalière regardée totale

Select date(Interaction.start_date) AS duration_date,
SUM(Videoview.duration) AS duration_amount
FROM Videoview
INNER JOIN Interaction ON Videoview.interaction_id = Interaction.interaction_id
GROUP BY date(Interaction.start_date)

About

Homework for a College Business Intelligence course