16 游戏玩法分析 IV
astak16 opened this issue · comments
柚子uccs commented
题目
查询首次登录游戏并且第二天再次登录游戏玩家的比率,四舍五入到小数点后两位。
create table activity (
player_id int,
device_id int,
event_date date,
games_played int
);
insert into activity values
(1, 2, '2016-03-01', 5),
(1, 2, '2016-03-02', 6),
(1, 3, '2017-06-25', 1),
(3, 1, '2016-03-02', 0),
(3, 4, '2018-07-03', 5);
SQL:方法一
select round(select count(player_id) from (
select
player_id,
event_date,
lag(event_date)
over(partition by player_id order by event_date) as next_date,
rank() over(partition by player_id order by event_date) as login_times
from activity
) as temp where datediff(event_date, next_date) = 1 and login_times = 2
/
select count(player_id) from (
select player_id from activity group by player_id
) as temp, 2) as fraction;
解析
需要求出两个值:第一个值是总人数,第二个值是连续登录两天的玩家数
- 求总人数:
将 activity
表按照 player_id
进行分组在计算出总人数
select count(player_id) from (
select player_id from activity group by player_id
) as temp
还有一种写法:
select count(distinct player_id) from activity
- 求连续两天登录游戏的玩家数:
使用窗口函数 lag(event_date)
将日期按照 player_id
分组,并按照 event_date
升序排列,然后偏移一天:
lag(event_date) over(partition by player_id order by event_date) as next_date
使用 rank
对日期按照 player_id
分组,并按照 event_date
升序排列,然后排序:
rank() over(partition by player_id order by event_date) as login_times
组合成完整的 SQL
:
select
player_id,
event_date,
lag(event_date)
over(partition by player_id order by event_date) as next_date,
rank() over(partition by player_id order by event_date) as login_times
from activity
筛选出日期相差一天,并且是第一次连续登录,这里 login_times
取值 2
,因为上面按照登录日期进行排序过了,第一天登录是 1
,第二天登录是 2
。
计算出 player_id
的个数就可以算出连续登录两天的玩家了数了:
select count(player_id) from (
select
player_id,
event_date,
lag(event_date)
over(partition by player_id order by event_date) as next_date,
rank() over(partition by player_id order by event_date) as login_times
from activity
) as temp where datediff(event_date, next_date) = 1 and login_times = 2
- 最后:
使用 round()
保留两位小数
SQL:方法二
select round((
(select count(player_id) from (
select
player_id,
datediff(event_date, min(event_date) over(partition by player_id)) as diff
from activity
) as temp where diff = 1) / (select count(distinct player_id) from activity)
), 2) as fraction;
解析
- 计算出每个用户最近两个登陆日期时间差:
select
player_id,
datediff(event_date, min(event_date) over(partition by player_id)) as diff
from activity
- 计算出连续两天登录的用户数,
diff
为1
:
select count(player_id) from (
select
player_id,
datediff(event_date, min(event_date) over(partition by player_id)) as diff
from activity
) as temp where diff = 1
- 总人口计算方法,在方法一种已经给出了
- 最后使用
round()
保留两位小数
SQL:方法三
with temp as (
select
player_id,
datediff(event_date, min(event_date) over(partition by player_id)) as diff
from activity
) select round(
sum(case diff when 1 then 1 else 0 end) /
count(distinct player_id),
2) as fraction from temp;
解析
- 计算出每个用户最近两个登陆日期时间差,在方法二中给出了
- 将它作为临时表
temp
,用with
diff = 1
的和是连续两天登录的用户:
sum(case diff when 1 then 1 else 0 end)
- 临时表
temp
中记录了每个用户每次登录的时间差,去重计算player_id
:
count(distinct player_id)
- 最后使用
round()
保留两位小数
SQL:方法四
select round(avg(event_date is not null), 2) as fraction from (
select player_id, min(event_date) as first_login from activity
group by player_id
) temp left join activity
on temp.player_id = activity.player_id
and datediff(event_date, first_login) = 1;
解析
- 使用
min
算出每个用户第一次登录的日期:
select player_id, min(event_date) as first_login from activity
group by player_id
- 左连
activity
表:
select * from (
select player_id, min(event_date) as first_login from activity
group by player_id
) temp left join activity
on temp.player_id = activity.player_id
and datediff(event_date, first_login) = 1;
- 因为
temp
只有每个用户第一次登录游戏的时间,加上连接条件datediff(event_date, first_login) = 1
后,不满足条件的用户数据都null
- 使用
avg(event_date is not null)
算出平均数temp
表是每个用户第一次登录的时间activity
表是存储着所有数据temp
左连activity
筛选出第一次连续两天登录的玩家,满足条件的玩家是有值的,没有满足条件的玩家是null
event_date is not null
结果是1
或者0
,不是日期了,也没有null
了。
- 使用
round()
计算出平均数