MySQL窗口函数

简介

MySQL从8.0开始支持窗口函数,这个功能在大多商业数据库和部分开源数据库中早已支持,有的也叫分析函数。

窗口是什么

窗口的概念非常重要,它可以理解为记录集合,窗口函数也就是在满足某种条件的记录集合上执行的特殊函数。对于每条记录都要在此窗口内执行函数,有的函数随着记录不同,窗口大小都是固定的,这种属于静态窗口;有的函数则相反,不同的记录对应着不同的窗口,这种动态变化的窗口叫滑动窗口。

窗口函数和聚合函数

  • 聚合函数是将多条记录聚合为一条;而窗口函数是每条记录都会执行,有几条记录执行完还是几条。
  • 聚合函数也可以用于窗口函数中,这个后面会举例说明。

基本用法

函数名 OVER 子句
over关键字用来指定函数执行的窗口范围,若后面括号中什么都不写,则意味着窗口包含满足WHERE条件的所有行,窗口函数基于所有行进行计算;如果不为空,则支持以下4中语法来设置窗口。
window_name:给窗口指定一个别名。如果SQL中涉及的窗口较多,采用别名可以看起来更清晰易读;
PARTITION BY 子句:窗口按照哪些字段进行分组,窗口函数在不同的分组上分别执行;
ORDER BY子句:按照哪些字段进行排序,窗口函数将按照排序后的记录顺序进行编号;
FRAME子句:FRAME是当前分区的一个子集,子句用来定义子集的规则,通常用来作为滑动窗口使用。

分类

按照功能划分,可以把MySQL支持的窗口函数分为如下几类:
序号函数:row_number() / rank() / dense_rank()
分布函数:percent_rank() / cume_dist()
前后函数:lag() / lead()
头尾函数:first_val() / last_val()
其他函数:nth_value() / nfile()

序号函数

ROW_NUMBER()

用途:显示分区中的当前行号
应用场景:查询每个学生的分数最高的前3门课程

-- 学生-课程-成绩表
+-------+----------+-------+------------+
| stu_id | lesson_id | score | create_time |
+-------+----------+-------+------------+
|     1 | L005     |    98 | 2021-10-01 |
|     1 | L004     |    88 | 2021-09-30 |
|     1 | L001     |    98 | 2021-09-21 |
|     2 | L002     |    90 | 2021-09-13 |
|     2 | L003     |    86 | 2021-09-16 |
|     2 | L001     |    84 | 2021-09-26 |
|     3 | L002     |    91 | 2021-09-02 |
|     3 | L001     |   100 | 2021-09-03 |
|     3 | L003     |    85 | 2021-09-17 |
|     4 | L001     |    99 | 2021-09-01 |
|     4 | L005     |    98 | 2021-09-10 |
|     4 | L002     |    88 | 2021-09-19 |
|     4 | L010     |   100 | 2021-09-22 |
+-------+----------+-------+------------+

查询SQL

SELECT
    *
FROM
    ( SELECT stu_id, ROW_NUMBER() OVER ( PARTITION BY stu_id ORDER BY score DESC ) AS score_order, lesson_id, score FROM t_score ) t
WHERE
    score_order <= 3;

查询结果

+-------+------------+----------+-------+
| stu_id | score_order | lesson_id | score |
+-------+------------+----------+-------+
|     1 |          1 | L005     |    98 |
|     1 |          2 | L001     |    98 |
|     1 |          3 | L004     |    88 |
|     2 |          1 | L002     |    90 |
|     2 |          2 | L003     |    86 |
|     2 |          3 | L001     |    84 |
|     3 |          1 | L001     |   100 |
|     3 |          2 | L002     |    91 |
|     3 |          3 | L003     |    85 |
|     4 |          1 | L010     |   100 |
|     4 |          2 | L001     |    99 |
|     4 |          3 | L005     |    98 |
+-------+------------+----------+-------+

对于stu_id=1的同学,有两门课程的成绩均为98,序号随机排了1和2。但很多情况下二者应该是并列第一,则他的成绩为88的这门课的序号可能是第2名,也可能为第3名。
这时候,ROW_NUMBER()就不能满足需求,需要RANK()和DENSE_RANK()出场,它们和ROW_NUMBER()非常类似,只是在出现重复值时处理逻辑有所不同。

SELECT
    *
FROM
    (
    SELECT
        stu_id,
        ROW_NUMBER() OVER ( PARTITION BY stu_id ORDER BY score DESC ) AS score_order1,
        RANK() OVER ( PARTITION BY stu_id ORDER BY score DESC ) AS score_order2,
        DENSE_RANK() OVER ( PARTITION BY stu_id ORDER BY score DESC ) AS score_order3,
        lesson_id,
        score
    FROM
        t_score
    ) t
WHERE
    score_order1 <= 3
    AND score_order2 <= 3
    AND score_order3 <= 3;

查询结果

+-------+-------------+-------------+-------------+----------+-------+
| stu_id | score_order1 | score_order2 | score_order3 | lesson_id | score |
+-------+-------------+-------------+-------------+----------+-------+
|     1 |           1 |           1 |           1 | L005     |    98 |
|     1 |           2 |           1 |           1 | L001     |    98 |
|     1 |           3 |           3 |           2 | L004     |    88 |
|     2 |           1 |           1 |           1 | L002     |    90 |
|     2 |           2 |           2 |           2 | L003     |    86 |
|     2 |           3 |           3 |           3 | L001     |    84 |
|     3 |           1 |           1 |           1 | L001     |   100 |
|     3 |           2 |           2 |           2 | L002     |    91 |
|     3 |           3 |           3 |           3 | L003     |    85 |
|     4 |           1 |           1 |           1 | L010     |   100 |
|     4 |           2 |           2 |           2 | L001     |    99 |
|     4 |           3 |           3 |           3 | L005     |    98 |
+-------+-------------+-------------+-------------+----------+-------+

ROW_NUMBER():顺序排序——1、2、3
RANK():并列排序,跳过重复序号——1、1、3
DENSE_RANK():并列排序,不跳过重复序号——1、1、2

分布函数

PERCENT_RANK()

用途:每行按照公式(rank-1) / (rows-1)进行计算。其中,rank为RANK()函数产生的序号,rows为当前窗口的记录总行数
应用场景:不常用

SELECT
    RANK() OVER w_s AS rk,
    PERCENT_RANK() OVER w_s AS prk,
    stu_id,
    lesson_id,
    score
FROM
    t_score
WHERE
    stu_id = 1 WINDOW w_s AS ( PARTITION BY stu_id ORDER BY score );

查询结果

+----+-----+-------+----------+-------+
| rk | prk | stu_id | lesson_id | score |
+----+-----+-------+----------+-------+
|  1 |   0 |     1 | L004     |    88 |
|  2 | 0.5 |     1 | L005     |    98 |
|  2 | 0.5 |     1 | L001     |    98 |
+----+-----+-------+----------+-------+

CUME_DIST()

用途:分组内小于、等于当前rank值的行数 / 分组内总行数
应用场景:查询小于等于当前成绩(score)的比例

SELECT
    stu_id,
    lesson_id,
    score,
    ROUND( CUME_DIST() OVER ( ORDER BY score ), 2 ) AS cd1,
    ROUND( CUME_DIST() OVER ( PARTITION BY lesson_id ORDER BY score ), 2 ) AS cd2
FROM
    t_score
WHERE
    lesson_id IN ( 'L001', 'L002' );

查询结果

+-------+----------+-------+------+------+
| stu_id | lesson_id | score | cd1  | cd2  |
+-------+----------+-------+------+------+
|     2 | L001     |    84 | 0.14 | 0.25 |
|     1 | L001     |    98 | 0.71 |  0.5 |
|     4 | L001     |    99 | 0.86 | 0.75 |
|     3 | L001     |   100 |    1 |    1 |
|     4 | L002     |    88 | 0.29 | 0.33 |
|     2 | L002     |    90 | 0.43 | 0.67 |
|     3 | L002     |    91 | 0.57 |    1 |
+-------+----------+-------+------+------+

前后函数

LAG(expr,n)、LEAD(expr,n)

用途:返回位于当前行的前n行(LAG(expr,n))或后n行(LEAD(expr,n))的expr的值
应用场景:查询前1名同学的成绩和当前同学成绩的差值

SELECT
    stu_id,
    lesson_id,
    score,
    pre_score,
    score - pre_score AS diff1,
    next_score,
    score - next_score AS diff2
FROM
    (
    SELECT
        stu_id,
        lesson_id,
        score,
        LAG( score, 1 ) OVER w AS pre_score,
        LEAD( score, 1 ) OVER w AS next_score
    FROM
        t_score
WHERE
    lesson_id IN ( 'L001', 'L002' ) WINDOW w AS ( PARTITION BY lesson_id ORDER BY score )) t;

查询结果

+-------+----------+-------+-----------+------+------------+------+
| stu_id | lesson_id | score | pre_score | diff1 | next_score | diff2 |
+-------+----------+-------+-----------+------+------------+------+
|     2 | L001     |    84 | NULL      | NULL |         98 |  -14 |
|     1 | L001     |    98 |        84 |   14 |         99 |   -1 |
|     4 | L001     |    99 |        98 |    1 |        100 |   -1 |
|     3 | L001     |   100 |        99 |    1 | NULL       | NULL |
|     4 | L002     |    88 | NULL      | NULL |         90 |   -2 |
|     2 | L002     |    90 |        88 |    2 |         91 |   -1 |
|     3 | L002     |    91 |        90 |    1 | NULL       | NULL |
+-------+----------+-------+-----------+------+------------+------+

头尾函数

FIRST_VALUE(expr)、LAST_VALUE(expr)

用途:返回第一个(FIRST_VALUE(expr))或最后一个(LAST_VALUE(expr))expr的值
应用场景:截止到当前成绩,按照日期排序查询第1个和最后1个同学的分数

SELECT
    stu_id,
    lesson_id,
    score,
    create_time,
    FIRST_VALUE( score ) OVER w AS first_score,
    LAST_VALUE( score ) OVER w AS last_score
FROM
    t_score
WHERE
    lesson_id IN ( 'L001', 'L002' ) WINDOW w AS ( PARTITION BY lesson_id ORDER BY create_time );

查询结果

+-------+----------+-------+------------+-----------+-----------+
| stu_id | lesson_id | score | create_time | first_score | last_score |
+-------+----------+-------+------------+-----------+-----------+
|     4 | L001     |    99 | 2021-09-01 |        99 |        99 |
|     3 | L001     |   100 | 2021-09-03 |        99 |       100 |
|     1 | L001     |    98 | 2021-09-21 |        99 |        98 |
|     2 | L001     |    84 | 2021-09-26 |        99 |        84 |
|     3 | L002     |    91 | 2021-09-02 |        91 |        91 |
|     2 | L002     |    90 | 2021-09-13 |        91 |        90 |
|     4 | L002     |    88 | 2021-09-19 |        91 |        88 |
+-------+----------+-------+------------+-----------+-----------+

其它函数

NTH_VALUE(expr,n)

用途:返回窗口中第n个expr的值。expr可以是表达式,也可以是列名
应用场景:截止到当前成绩,显示每个同学的成绩中排名第2和第3的成绩的分数

SELECT
    stu_id,
    lesson_id,
    score,
    NTH_VALUE( score, 2 ) OVER w AS second_score,
    NTH_VALUE( score, 3 ) OVER w AS third_score
FROM
    t_score
WHERE
    stu_id IN ( 1, 2 ) WINDOW w AS ( PARTITION BY stu_id ORDER BY score );

查询结果

+-------+----------+-------+--------------+------------+
| stu_id | lesson_id | score | second_score | third_score |
+-------+----------+-------+--------------+------------+
|     1 | L004     |    88 | NULL         | NULL       |
|     1 | L005     |    98 |           98 |         98 |
|     1 | L001     |    98 |           98 |         98 |
|     2 | L001     |    84 | NULL         | NULL       |
|     2 | L003     |    86 |           86 | NULL       |
|     2 | L002     |    90 |           86 |         90 |
+-------+----------+-------+--------------+------------+

NTILE(n)

用途:将分区中的有序数据分为n个等级,记录等级数
应用场景:将每门课程按照成绩分成3组

SELECT
    NTILE( 3 ) OVER w AS nf,
    stu_id,
    lesson_id,
    score
FROM
    t_score
WHERE
    lesson_id IN ( 'L001', 'L002' ) WINDOW w AS ( PARTITION BY lesson_id ORDER BY score );

查询结果

+----+-------+----------+-------+
| nf | stu_id | lesson_id | score |
+----+-------+----------+-------+
|  1 |     2 | L001     |    84 |
|  1 |     1 | L001     |    98 |
|  2 |     4 | L001     |    99 |
|  3 |     3 | L001     |   100 |
|  1 |     4 | L002     |    88 |
|  2 |     2 | L002     |    90 |
|  3 |     3 | L002     |    91 |
+----+-------+----------+-------+

NTILE(n)函数在数据分析中应用较多,比如由于数据量大,需要将数据平均分配到n个并行的进程分别计算,此时就可以用NTILE(n)对数据进行分组(由于记录数不一定被n整除,所以数据不一定完全平均),然后将不同桶号的数据再分配。

聚合函数作为窗口函数

用途:在窗口中每条记录动态地应用聚合函数(SUM()、AVG()、MAX()、MIN()、COUNT()),可以动态计算在指定的窗口内的各种聚合函数值
应用场景:截止到当前时间,查询stu_id=1的学生的累计分数、分数最高的科目、分数最低的科目

SELECT
    stu_id,
    lesson_id,
    score,
    create_time,
    SUM( score ) OVER w AS score_sum,
    MAX( score ) OVER w AS score_max,
    MIN( score ) OVER w AS score_min
FROM
    t_score
WHERE
    stu_id = 1 WINDOW w AS ( PARTITION BY stu_id ORDER BY create_time );

查询结果

+-------+----------+-------+------------+-----------+-----------+-----------+
| stu_id | lesson_id | score | create_time | score_sum | score_max | score_min |
+-------+----------+-------+------------+-----------+-----------+-----------+
|     1 | L001     |    98 | 2021-09-21 | 98        |        98 |        98 |
|     1 | L004     |    88 | 2021-09-30 | 186       |        98 |        88 |
|     1 | L005     |    98 | 2021-10-01 | 284       |        98 |        88 |
+-------+----------+-------+------------+-----------+-----------+-----------+
暂无评论

发送评论 编辑评论


				
|´・ω・)ノ
ヾ(≧∇≦*)ゝ
(☆ω☆)
(╯‵□′)╯︵┴─┴
 ̄﹃ ̄
(/ω\)
∠( ᐛ 」∠)_
(๑•̀ㅁ•́ฅ)
→_→
୧(๑•̀⌄•́๑)૭
٩(ˊᗜˋ*)و
(ノ°ο°)ノ
(´இ皿இ`)
⌇●﹏●⌇
(ฅ´ω`ฅ)
(╯°A°)╯︵○○○
φ( ̄∇ ̄o)
ヾ(´・ ・`。)ノ"
( ง ᵒ̌皿ᵒ̌)ง⁼³₌₃
(ó﹏ò。)
Σ(っ °Д °;)っ
( ,,´・ω・)ノ"(´っω・`。)
╮(╯▽╰)╭
o(*////▽////*)q
>﹏<
( ๑´•ω•) "(ㆆᴗㆆ)
😂
😀
😅
😊
🙂
🙃
😌
😍
😘
😜
😝
😏
😒
🙄
😳
😡
😔
😫
😱
😭
💩
👻
🙌
🖕
👍
👫
👬
👭
🌚
🌝
🙈
💊
😶
🙏
🍦
🍉
😣
Source: github.com/k4yt3x/flowerhd
颜文字
Emoji
小恐龙
花!
上一篇
下一篇