SQLクエリチューニングメモ

table: "table_name"

A	B
"a"	1
"a"	2
"a"	4
"b"	1
"b"	4
"c"	1
"c"	3

のようなデータが 100k レコードほど存在するデータベースにおいて、指定のB (b_list) が存在するA を ランダムに一つ抽出 する。

例:

blist: [1] -> mathced A: ["a", "b", "c"] -> ランダムに取り出し -> "b"
blist: [1, 2] -> mathced A: ["a"] -> ランダムに取り出し -> "a"
blist: [1, 4] -> matched A: ["a", "b"] -> ランダムに取り出し -> "b"

このとき、

select A from table_name
    where B in ({ b_list })
    group by A
    having count(distinct B) > { b_list_length - 1 }
    order by random()
    limit 1
;

としていたが、1.5 秒ほどかかって困っていた。

order by random() があまり良くなく、 count して自前でランダムしたほうが良いという話を見る *1 が、そもそも limit 1 を消し、得られた結果を count しても同程度に遅いのであんまり関係無さそう。

count(distinct B) が悪いのではということで、はじめから B でも group by して count() はグループ化された A の個数を数えるだけにした。

そして、

select A from (
    select A from table_name
        where B in ({b_list})
        group by A, B
    )
    group by A
    having count() > { b_list_length - 1 }
    order by random()
    limit 1
;