--==========================================
需求:有一个用户登陆日志表,记录用户每次登陆时间,然后想查找用户按天连续登陆的情况,找出每次连续登陆的最早时间和最后时间以及连续登陆天数。
--===========================================
由于长久未写此类SQL,有点手生,本着走一步算一步的精神,慢慢来。
首先查看日志表
SELECT [Uid] ,[loginDate] FROM [dbo].[Member_LoginLog] WHERE [UID]=268
由于按天计算连续登陆,表中时间精确到毫秒,很难肉眼看出数据是否连续,于是考虑转换数据
而又由于我们只关心最早登陆时间和最后登陆时间,因此我们可以先按照天来统计用户最早登陆时间和最后登陆时间,并将时间转换成对应天数
--============================================== --统计出用户每天最早登陆时间和最后登陆时间 SELECT T1.[UID] ,DATEDIFF(DAY,‘2014-01-01‘,LoginDate) AS DiffDays ,MAX(LoginDate) AS MaxLoginDate ,MIN(LoginDate) AS MinLoginDate INTO [dbo].[Member_LoginLog_Status1] FROM [dbo].[Member_LoginLog] T1 GROUP BY T1.[UID],DATEDIFF(DAY,‘2014-01-01‘,LoginDate) --====================================== --查看效果 SELECT [UID] ,[DiffDays] ,[MaxLoginDate] ,[MinLoginDate] FROM [dbo].[Member_LoginLog_Status1] WHERE UID=268
从上图很容易看出第二天没连续登陆,是不是很容易看啊
接下来就是查找联系的天数了,如果我们按照UID分组,然后对DiffDays来排序求出排名来,依据DiffDays的增长量和RID量便可以判断出天数是否连续
SELECT ROW_NUMBER()OVER(PARTITION BY UID ORDER BY [DiffDays] ASC) AS RID, T1.* FROM [dbo].[Member_LoginLog_Status1] T1 WHERE [UID]=268
这样我们便可以使用表的自连接来查找连续的登录,由于需要按照用户和天数来算出排名,因此我们可以先建立索引
CREATE CLUSTERED INDEX CIX_UID_Days ON [dbo].[Member_LoginLog_Status1] ( [UID],[DiffDays] )
然后再求连续区间:
--========================================== --查找连续的登录 ;WITH Tem AS( SELECT ROW_NUMBER()OVER(PARTITION BY UID ORDER BY [DiffDays] ASC) AS RID, T1.* FROM [dbo].[Member_LoginLog_Status1] T1 ) ,Tem1 AS( SELECT ROW_NUMBER()OVER( PARTITION BY T1.[UID],T1.[DiffDays] ORDER BY T2.[diffdays]-T1.[diffdays] DESC) AS RID, T1.[UID], T1.MinLoginDate, T2.MaxLoginDate, T1.[diffdays] AS MinDiffDays, T2.[diffdays] AS MAXDiffDays FROM Tem AS T1 INNER JOIN Tem AS T2 ON T1.UID=T2.UID AND T1.[diffdays]<=T2.[diffdays] AND T2.[diffdays]-T1.[diffdays]= T2.RID-T1.RID ) SELECT [UID], MinLoginDate, MaxLoginDate, MinDiffDays, MAXDiffDays INTO [dbo].[Member_LoginLog_Status2] FROM Tem1 AS T1 WHERE T1.RID=1 --========================================= --检查结果 SELECT [UID] ,[MinLoginDate] ,[MaxLoginDate] ,[MinDiffDays] ,[MAXDiffDays] FROM [dbo].[Member_LoginLog_Status2] WHERE [UID]=268
找出连续的区间后,我们会发现有很多区间不是最大连续区间,如第5天到第17天连续,但是比之更大的区间还有第3天到第17天,对于这种问题,解决办法就是依据maxDiffDays分组,求出最小的minDiffDays
由于此时要按照用户和maxDiffDays分组,然后按照MinDiffDays排序求最小值,因此先建立索引
CREATE CLUSTERED INDEX CIX_UID_MAXDiffDays ON [AccMain_101].[dbo].[Member_LoginLog_Status2] ([UID],MAXDiffDays,MinDiffDays ASC)
然后再查询:
--==================================== --求出最大连续区间 ;WITH CTE1 AS( SELECT ROW_NUMBER()OVER(PARTITION BY [UID],MAXDiffDays ORDER BY MinDiffDays ASC) AS RID, [UID], MinLoginDate, MaxLoginDate, MinDiffDays, MAXDiffDays FROM [AccMain_101].[dbo].[Member_LoginLog_Status2] AS T1 ) INSERT INTO [dbo].[Member_LoginLog_Status3] ([Uid] ,[firstLoginDate] ,[lastLoginDate] ,[loginNumber]) SELECT [UID], MinLoginDate, MaxLoginDate, T1.MAXDiffDays-MinDiffDays AS ContinueDays FROM CTE1 T1 WHERE T1.RID=1 --================================== --查看结果 SELECT [Uid] ,[firstLoginDate] ,[lastLoginDate] ,[loginNumber] FROM [dbo].[Member_LoginLog_Status3] WHERE [UID]=268
查询结果:
结果正是我们想要的,因此打完收工,回家吃饭。
--===============================================
总结:其实查找连续或查找孤岛这类原理,都是利用自连接然后看增长是否连续,多折腾几遍就好。
--===============================================
在wwwwgou的回复中,指出一条更快捷的计算方式,同样使用排名来计算,但不使用关联,而是计算排名与登陆天数的差值,如果登陆天数连续增长,则排名也连续增长,两者的差值保持不变;如果登陆天数不连续,则登陆天数增长的值就会比排名增长的值高,这时两者的差值就会变大。
如下图:
随着天数不连续的次数增加,[天数-排名]的值会不断增大,因此可以使用[天数-排名]来分组,便可以定位到连续区间。
PS: 不会出现两个不同连续区间的[天数-排名]值一样的情况
查找代码:
--======================================== --感谢wwwwgou提供, --此代码已略做修改 SELECT [Uid], mindt = MIN(mindt), maxdt = MAX(maxdt), logdays = COUNT(*) FROM ( SELECT [Uid], RowNo = ROW_NUMBER() OVER(PARTITION BY [Uid] ORDER BY DATEDIFF(DAY,‘2014-01-01‘, loginDate)), DiffDay = DATEDIFF(DAY,‘2014-01-01‘, loginDate), mindt = MIN(loginDate), maxdt = MAX(loginDate) FROM dbo.Member_LoginLog GROUP BY [Uid], DATEDIFF(DAY,‘2014-01-01‘, loginDate) ) T GROUP BY [Uid], [RowNo] - DiffDay ORDER BY [Uid], minDt
对wwwwgou筒子再次表示婶婶地感谢。
--===============================================
请原谅我苍白的讲解,让您们只能看代码。
妹子骚猴就上,不要着急。