Postgresql源码(143)统计信息基础知识(带实例)
概念与总结
-
高频值(Most Common Values, MCV)
- 存储在 most_common_vals 中。
- 每个高频值的频率通过 most_common_freqs 单独记录(例如 0.010966667 等)。
- MCV 用于优化等值查询(如 poid = 33),直接通过频率计算选择率,无需依赖直方图。
-
直方图(Histogram Bounds)
- 仅覆盖非高频值(未出现在MCV中的值)的分布。
- 用于优化范围查询(如poid BETWEEN 1050 AND 1200),通过分桶插值估算选择率。
-
correlation:物理与逻辑顺序相关性
- correlation 表示列值的 物理存储顺序 与 逻辑顺序(升序/降序) 的线性相关性。
- 取值范围为 [-1, 1]
- 1:完全正相关(物理顺序与逻辑顺序一致,如自增主键)。
- -1:完全负相关(物理顺序与逻辑顺序相反)
- 0:无相关性(随机存储)。
实例一:3000万高频重复值(1-999)
CREATE TABLE ii (poid INT NOT NULL, value NUMERIC, status int);
-- 分布均匀的话,采样会非常准确。
insert into ii select t.i%1000, t.i, 0 from generate_series(1,30000000) t(i);
CREATE INDEX idx_n_poid ON ii(poid);
analyze ii;
统计信息
postgres=# select * from pg_stats where tablename='ii' and attname='poid';
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | ii
attname | poid
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 1000
most_common_vals | {334,580,881,431,2,33,79,112}
most_common_freqs | {0.0016,0.0015333333,0.0015333333,0.0015,0.0014666667,0.0014666667,0.0014666667,0.0014666667}
histogram_bounds | {0,10,20,30,41,52,61,71,82,91,99,109,119,129,139,149,159,168,178,189,197,207,217,226,236,246,256,266,276,286,296,306,315,325,336,346,356,365,374,386,396,406,417,427,438,448,458,468,478,488,498,508,518,528,537,549,558,569,578,590,599,609,619,629,639,649,659,669,679,687,697,707,719,729,740,750,760,771,781,791,800,810,820,829,839,849,859,869,880,891,901,910,921,931,941,950,961,970,980,990,999}
correlation | 0.002622716
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |postgres=# select count(distinct poid) from ii;count
-------1000
(1 row)postgres=# select count(1) from ii;count
----------30000000
(1 row)
- 默认采样行数:300 × default_statistics_target(默认default_statistics_target=100,即采样30000行)
- n_distinct = 1000
- 唯一值个数与真实值相等,分布均匀的情况下采样最准。
- most_common_vals = {334,580,881,431,2,33,79,112}
- 数据均匀分布,随机采3万MCV的值也比较平均
- histogram_bounds = {0,10,20,30,41,52,61,71,82,91,99,109,119,129,139,149,159,168,178,189,197,207,217,226,236,246,256,266,276,286,296,306,315,325,336,346,356,365,374,386,396,406,417,427,438,448,458,468,478,488,498,508,518,528,537,549,558,569,578,590,599,609,619,629,639,649,659,669,679,687,697,707,719,729,740,750,760,771,781,791,800,810,820,829,839,849,859,869,880,891,901,910,921,931,941,950,961,970,980,990,999}
- 直方图看起来比较平均,因为MCV比较少。
实例二:2800万高频重复值(1-99) + 200万个低频重复值(1000-1899)
CREATE TABLE id (poid INT NOT NULL, value NUMERIC, status int);
INSERT INTO id SELECT t%100, t, 0 FROM generate_series(1, 28000000) t;
INSERT INTO id SELECT 1000 + (t%900), t, 0 FROM generate_series(28000001, 30000000) t;
CREATE INDEX idx_id_poid ON id(poid);
analyze id;
统计信息
postgres=# select * from pg_stats where tablename='id' and attname='poid';
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | id
attname | poid
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 901
most_common_vals | {33,61,96,2,27,17,41,85,40,59,62,65,93,6,15,46,47,71,54,48,88,44,67,75,79,95,18,21,45,68,82,7,72,91,52,49,94,11,34,22,23,25,38,5,74,30,43,55,4,8,35,1,53,56,92,12,39,51,64,86,89,20,87,0,24,99,10,32,19,57,80,81,31,50,70,90,9,66,73,3,78,97,69,77,58,37,42,28,76,98,84,13,63,14,60,26,16,36,29,83}
most_common_freqs | {0.010966667,0.010933333,0.010866666,0.010533334,0.0104,0.0102,0.0102,0.0101666665,0.0101,0.010066667,0.010066667,0.009966667,0.009966667,0.009833333,0.009833333,0.009833333,0.009833333,0.009833333,0.0098,0.009766666,0.009766666,0.009733333,0.009733333,0.0097,0.0097,0.0097,0.009666666,0.009666666,0.009666666,0.009666666,0.009666666,0.009633333,0.009633333,0.009633333,0.009566667,0.009533334,0.009533334,0.0095,0.0095,0.009466667,0.009433334,0.009433334,0.009433334,0.0094,0.0094,0.009366667,0.009366667,0.009366667,0.009333333,0.009333333,0.009333333,0.0093,0.0093,0.0093,0.0093,0.009266667,0.009266667,0.009266667,0.009266667,0.009266667,0.009266667,0.009233333,0.009233333,0.0092,0.009166666,0.009166666,0.009133333,0.009133333,0.0090333335,0.009,0.008966667,0.008966667,0.008933334,0.008933334,0.008933334,0.008933334,0.0089,0.0089,0.0089,0.008866667,0.008866667,0.008866667,0.0088,0.008733333,0.0087,0.008666666,0.008633333,0.0086,0.0086,0.0086,0.008566666,0.008533333,0.008533333,0.0085,0.0085,0.008333334,0.0083,0.008266667,0.0082,0.0079}
histogram_bounds | {1000,1005,1019,1030,1036,1047,1056,1064,1072,1080,1089,1100,1110,1122,1132,1140,1149,1158,1168,1178,1188,1195,1208,1215,1226,1238,1245,1255,1264,1275,1286,1292,1299,1309,1316,1323,1332,1339,1347,1355,1368,1378,1390,1399,1407,1418,1430,1438,1448,1453,1462,1472,1483,1492,1507,1516,1524,1536,1543,1549,1556,1562,1572,1578,1586,1593,1604,1611,1621,1628,1635,1646,1654,1663,1673,1683,1693,1704,1709,1718,1727,1734,1745,1751,1758,1766,1776,1785,1791,1798,1806,1814,1825,1834,1841,1851,1861,1872,1883,1893,1899}
correlation | 0.19235307
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |postgres=# select count(distinct poid) from ii;count
-------1000
(1 row)postgres=# select count(1) from ii;count
----------30000000
(1 row)
- n_distinct = 901
- 评估的唯一值个数比真实的少了99个,因为数据分布不均匀了
- 采样是30000个,但采集到1-99的概率:采集到1000-1899的概率=2800:200=14:1
- 所以这里会比真实值少一些。
- most_common_vals = {33,61,96,2,27,17,41,85,40,59,62,65,93,6,15,46,47,71,54,48,88,44,67,75,79,95,18,21,45,68,82,7,72,91,52,49,94,11,34,22,23,25,38,5,74,30,43,55,4,8,35,1,53,56,92,12,39,51,64,86,89,20,87,0,24,99,10,32,19,57,80,81,31,50,70,90,9,66,73,3,78,97,69,77,58,37,42,28,76,98,84,13,63,14,60,26,16,36,29,83}
- 为什么MCV的个数比实例一多很多?
- histogram_bounds = {1000,1005,1019,1030,1036,1047,1056,1064,1072,1080,1089,1100,1110,1122,1132,1140,1149,1158,1168,1178,1188,1195,1208,1215,1226,1238,1245,1255,1264,1275,1286,1292,1299,1309,1316,1323,1332,1339,1347,1355,1368,1378,1390,1399,1407,1418,1430,1438,1448,1453,1462,1472,1483,1492,1507,1516,1524,1536,1543,1549,1556,1562,1572,1578,1586,1593,1604,1611,1621,1628,1635,1646,1654,1663,1673,1683,1693,1704,1709,1718,1727,1734,1745,1751,1758,1766,1776,1785,1791,1798,1806,1814,1825,1834,1841,1851,1861,1872,1883,1893,1899}
- 直方图显示了非MCV值的分布情况,符合预期。
实例一的MCV只有8个值,实例二MCV个数远大于实例一,差异的原因?
-
实例一(均匀分布):poid 列通过 t.i%1000 生成,每个值重复约 30,000 次(30,000,000 行 / 1000 唯一值)。由于分布均匀,MCV之间的频率差异极小。PostgreSQL只会选择 《频率显著高于平均值》的值作为 MCV。理想情况下应该一个都选不出来,但由于随机采样,这几个值是因为在采样中略微高频,所以被选出来了。所以这里数量少的原因是,数据太平均了,选出来的比较少。
-
实例二(非均匀分布):poid 列分为 100 个高频值(重复 280,000 次)和 900 个低频值(重复约 2,222 次)。高频值的频率(约 0.01)远高于低频值(约 0.0002),PostgreSQL 将 《所有高频值》存入 most_common_vals。由于默认 default_statistics_target=100,系统会尽量填满 MCV 列表。
实例三:2999万高频重复值(1-49) + 1万个低频重复值(1000-1949)
-- 50个值高频重复值(2999万)+950个低频唯一值(1万)
CREATE TABLE iee (poid INT NOT NULL, value NUMERIC, status int);
INSERT INTO iee SELECT t%50, t, 0 FROM generate_series(1, 29990000) t;
INSERT INTO iee SELECT 1000 + (t%950), t, 0 FROM generate_series(29990001, 30000000) t;
CREATE INDEX idx_iee_poid ON iee(poid);
analyze iee;
统计信息
postgres=# select * from pg_stats where tablename='iee' and attname='poid';
-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | iee
attname | poid
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 56
most_common_vals | {26,3,41,1,20,38,15,37,8,22,46,27,32,48,29,7,43,30,44,12,16,42,5,25,40,24,34,39,6,49,0,11,28,21,18,14,17,33,47,35,31,2,4,10,45,9,19,23,13,36}
most_common_freqs | {0.021866666,0.0215,0.021233333,0.021066668,0.021,0.020966666,0.020766666,0.020766666,0.0206,0.0206,0.0206,0.020433333,0.020366667,0.020333333,0.0203,0.020266667,0.020266667,0.0202,0.020166667,0.020133333,0.0201,0.020066667,0.020033333,0.020033333,0.02,0.019933334,0.019933334,0.019933334,0.019866666,0.019866666,0.0198,0.0198,0.019766666,0.019633334,0.0196,0.019566666,0.019533332,0.019533332,0.0195,0.019466667,0.019433333,0.019366667,0.0193,0.019266667,0.0191,0.019033333,0.018933333,0.018833334,0.0187,0.018433332}
histogram_bounds | {1029,1243,1267,1378,1419,1797}
correlation | 0.026665932
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |postgres=# select count(distinct poid) from ir;count
-------1000
(1 row)postgres=# select count(1) from ir;count
----------30000000
(1 row)
- n_distinct = 56
- 唯一值的偏差更大了
- 因为50个重复值占据了2999万数据,采样30000条大概率都会采样到这50个数,所以这里n_distinct只有56个。
- most_common_vals = {26,3,41,1,20,38,15,37,8,22,46,27,32,48,29,7,43,30,44,12,16,42,5,25,40,24,34,39,6,49,0,11,28,21,18,14,17,33,47,35,31,2,4,10,45,9,19,23,13,36}
- histogram_bounds = {1029,1243,1267,1378,1419,1797}
- 直方图的值少了很多,因为采样到最后1万的概率会很低。
附其他测试数据
drop table mm;
CREATE TABLE mm (poid int primary key, edata date, ooid int);
CREATE INDEX idx_mm_1 ON mm(edata);
insert into mm select t.i, '2025-01-01 10:00:00',100 from generate_series(1,200) t(i);
insert into mm select t.i, '2025-01-01 11:00:00',130 from generate_series(201,400) t(i);
insert into mm select t.i, '2025-01-01 12:00:00',100 from generate_series(401,600) t(i);
insert into mm select t.i, '2025-01-01 13:00:00',130 from generate_series(601,800) t(i);
analyze mm;CREATE TABLE ii (poid INT NOT NULL, value NUMERIC, status int);
-- 分布均匀的话,采样会非常准确。
insert into ii select t.i%1000, t.i, 0 from generate_series(1,30000000) t(i);
CREATE INDEX idx_n_poid ON ii(poid);
analyze ii;postgres=# select * from pg_stats where tablename='ii' and attname='poid';
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | ii
attname | poid
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 1000
most_common_vals | {334,580,881,431,2,33,79,112}
most_common_freqs | {0.0016,0.0015333333,0.0015333333,0.0015,0.0014666667,0.0014666667,0.0014666667,0.0014666667}
histogram_bounds | {0,10,20,30,41,52,61,71,82,91,99,109,119,129,139,149,159,168,178,189,197,207,217,226,236,246,256,266,276,286,296,306,315,325,336,346,356,365,374,386,396,406,417,427,438,448,458,468,478,488,498,508,518,528,537,549,558,569,578,590,599,609,619,629,639,649,659,669,679,687,697,707,719,729,740,750,760,771,781,791,800,810,820,829,839,849,859,869,880,891,901,910,921,931,941,950,961,970,980,990,999}
correlation | 0.002622716
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |postgres=# select count(distinct poid) from ii;count
-------1000
(1 row)postgres=# select count(1) from ii;count
----------30000000
(1 row)-- 100个值高频重复值(2800万)+900个低频重复值(200万)
CREATE TABLE id (poid INT NOT NULL, value NUMERIC, status int);
INSERT INTO id SELECT t%100, t, 0 FROM generate_series(1, 28000000) t;
INSERT INTO id SELECT 1000 + (t%900), t, 0 FROM generate_series(28000001, 30000000) t;
CREATE INDEX idx_id_poid ON id(poid);
analyze id;postgres=# select * from pg_stats where tablename='id' and attname='poid';
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | id
attname | poid
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 901
most_common_vals | {33,61,96,2,27,17,41,85,40,59,62,65,93,6,15,46,47,71,54,48,88,44,67,75,79,95,18,21,45,68,82,7,72,91,52,49,94,11,34,22,23,25,38,5,74,30,43,55,4,8,35,1,53,56,92,12,39,51,64,86,89,20,87,0,24,99,10,32,19,57,80,81,31,50,70,90,9,66,73,3,78,97,69,77,58,37,42,28,76,98,84,13,63,14,60,26,16,36,29,83}
most_common_freqs | {0.010966667,0.010933333,0.010866666,0.010533334,0.0104,0.0102,0.0102,0.0101666665,0.0101,0.010066667,0.010066667,0.009966667,0.009966667,0.009833333,0.009833333,0.009833333,0.009833333,0.009833333,0.0098,0.009766666,0.009766666,0.009733333,0.009733333,0.0097,0.0097,0.0097,0.009666666,0.009666666,0.009666666,0.009666666,0.009666666,0.009633333,0.009633333,0.009633333,0.009566667,0.009533334,0.009533334,0.0095,0.0095,0.009466667,0.009433334,0.009433334,0.009433334,0.0094,0.0094,0.009366667,0.009366667,0.009366667,0.009333333,0.009333333,0.009333333,0.0093,0.0093,0.0093,0.0093,0.009266667,0.009266667,0.009266667,0.009266667,0.009266667,0.009266667,0.009233333,0.009233333,0.0092,0.009166666,0.009166666,0.009133333,0.009133333,0.0090333335,0.009,0.008966667,0.008966667,0.008933334,0.008933334,0.008933334,0.008933334,0.0089,0.0089,0.0089,0.008866667,0.008866667,0.008866667,0.0088,0.008733333,0.0087,0.008666666,0.008633333,0.0086,0.0086,0.0086,0.008566666,0.008533333,0.008533333,0.0085,0.0085,0.008333334,0.0083,0.008266667,0.0082,0.0079}
histogram_bounds | {1000,1005,1019,1030,1036,1047,1056,1064,1072,1080,1089,1100,1110,1122,1132,1140,1149,1158,1168,1178,1188,1195,1208,1215,1226,1238,1245,1255,1264,1275,1286,1292,1299,1309,1316,1323,1332,1339,1347,1355,1368,1378,1390,1399,1407,1418,1430,1438,1448,1453,1462,1472,1483,1492,1507,1516,1524,1536,1543,1549,1556,1562,1572,1578,1586,1593,1604,1611,1621,1628,1635,1646,1654,1663,1673,1683,1693,1704,1709,1718,1727,1734,1745,1751,1758,1766,1776,1785,1791,1798,1806,1814,1825,1834,1841,1851,1861,1872,1883,1893,1899}
correlation | 0.19235307
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |postgres=# select count(distinct poid) from ii;count
-------1000
(1 row)postgres=# select count(1) from ii;count
----------30000000
(1 row)-- 50个值高频重复值(2970万)+950个低频唯一值(30万)
CREATE TABLE ie (poid INT NOT NULL, value NUMERIC, status int);
INSERT INTO ie SELECT t%50, t, 0 FROM generate_series(1, 29700000) t;
INSERT INTO ie SELECT 1000 + (t%950), t, 0 FROM generate_series(29700001, 30000000) t;
CREATE INDEX idx_ie_poid ON ie(poid);
analyze ie;postgres=# select * from pg_stats where tablename='ie' and attname='poid';
-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | ie
attname | poid
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 318
most_common_vals | {20,18,44,10,13,9,24,19,29,47,12,30,32,49,48,5,25,2,39,6,7,14,36,43,26,35,16,42,38,31,0,46,4,8,23,11,15,37,41,34,17,45,22,28,21,40,1,3,33,27}
most_common_freqs | {0.021433333,0.021133333,0.021066668,0.021,0.020933334,0.0209,0.020866666,0.020833334,0.020766666,0.0207,0.0206,0.0205,0.0205,0.0204,0.020333333,0.020266667,0.020266667,0.020166667,0.020166667,0.020133333,0.020133333,0.020133333,0.02,0.02,0.019866666,0.019766666,0.019733334,0.019733334,0.019666666,0.019566666,0.019533332,0.019533332,0.019366667,0.019366667,0.019366667,0.0193,0.019233333,0.019166667,0.0191,0.0189,0.018833334,0.018833334,0.018733334,0.0187,0.0186,0.0185,0.0184,0.0184,0.0183,0.0176}
histogram_bounds | {1003,1015,1024,1030,1040,1051,1082,1090,1101,1114,1118,1132,1137,1149,1164,1169,1173,1183,1191,1197,1204,1209,1225,1234,1250,1257,1272,1282,1288,1293,1300,1303,1312,1325,1332,1347,1361,1377,1392,1409,1415,1419,1427,1431,1440,1444,1448,1455,1462,1482,1490,1505,1514,1524,1529,1535,1539,1544,1558,1567,1575,1581,1584,1592,1597,1613,1620,1625,1639,1653,1669,1673,1681,1688,1698,1699,1707,1721,1727,1737,1755,1763,1768,1786,1795,1804,1810,1818,1822,1840,1856,1859,1869,1882,1890,1898,1902,1904,1917,1925,1943}
correlation | 0.057023432
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |postgres=# select count(distinct poid) from ie;count
-------1000
(1 row)postgres=# select count(1) from ie;count
----------30000000
(1 row)postgres=# select * from ie limit 10;poid | value | status
------+-------+--------1 | 1 | 02 | 2 | 03 | 3 | 04 | 4 | 05 | 5 | 06 | 6 | 07 | 7 | 08 | 8 | 09 | 9 | 010 | 10 | 0
(10 rows)-- 50个值高频重复值(2970万)+950个低频唯一值(30万)+ 随机分布CREATE TABLE ir (poid INT NOT NULL, value NUMERIC, status int);
INSERT INTO ir SELECT t%50, t, 0 FROM generate_series(1, 29700000) t ORDER BY random();
INSERT INTO ir SELECT 1000 + (t%950), t, 0 FROM generate_series(29700001, 30000000) t ORDER BY random();
CREATE INDEX idx_ir_poid ON ir(poid);
analyze ir;postgres=# select * from pg_stats where tablename='ir' and attname='poid';
-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | ir
attname | poid
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 330
most_common_vals | {9,29,14,34,25,46,17,39,21,1,18,45,49,27,38,41,42,36,30,11,35,16,47,2,15,40,0,31,24,37,19,44,48,23,3,4,13,6,32,33,43,12,26,10,28,8,22,5,20,7}
most_common_freqs | {0.021433333,0.021233333,0.021,0.0209,0.020833334,0.0208,0.020766666,0.020733334,0.020666666,0.020433333,0.020433333,0.020366667,0.020366667,0.0203,0.0203,0.0202,0.0202,0.020166667,0.020133333,0.0201,0.020066667,0.020033333,0.020033333,0.02,0.019833334,0.0198,0.019733334,0.0197,0.0196,0.0196,0.019566666,0.019566666,0.019566666,0.019466667,0.019433333,0.019366667,0.0193,0.019233333,0.019233333,0.019133333,0.0191,0.019033333,0.018833334,0.018733334,0.0187,0.018533334,0.018466666,0.018366667,0.018333333,0.017566666}
histogram_bounds | {1001,1006,1014,1017,1024,1040,1046,1058,1063,1073,1086,1098,1111,1117,1135,1142,1146,1158,1165,1175,1180,1189,1199,1204,1215,1232,1247,1250,1262,1273,1281,1286,1296,1299,1301,1314,1322,1325,1332,1336,1343,1353,1360,1374,1385,1404,1418,1434,1442,1456,1466,1476,1481,1487,1502,1511,1515,1518,1539,1550,1562,1573,1580,1583,1603,1621,1642,1659,1666,1675,1682,1690,1693,1703,1709,1726,1744,1751,1758,1762,1773,1778,1782,1787,1791,1803,1810,1830,1835,1853,1874,1880,1882,1895,1897,1907,1916,1927,1933,1938,1949}
correlation | 0.057593007
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |postgres=# select count(distinct poid) from ir;count
-------1000
(1 row)postgres=# select count(1) from ir;count
----------30000000
(1 row)postgres=# select * from ir limit 10;poid | value | status
------+----------+--------0 | 26165250 | 043 | 21564993 | 02 | 4975852 | 047 | 28519047 | 037 | 12294337 | 03 | 15888653 | 018 | 18510218 | 00 | 16988450 | 042 | 24585492 | 032 | 19939732 | 0
(10 rows)-- 50个值高频重复值(2999万)+950个低频唯一值(1万)
CREATE TABLE iee (poid INT NOT NULL, value NUMERIC, status int);
INSERT INTO iee SELECT t%50, t, 0 FROM generate_series(1, 29990000) t;
INSERT INTO iee SELECT 1000 + (t%950), t, 0 FROM generate_series(29990001, 30000000) t;
CREATE INDEX idx_iee_poid ON iee(poid);
analyze iee;postgres=# select * from pg_stats where tablename='iee' and attname='poid';
-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | iee
attname | poid
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 56
most_common_vals | {26,3,41,1,20,38,15,37,8,22,46,27,32,48,29,7,43,30,44,12,16,42,5,25,40,24,34,39,6,49,0,11,28,21,18,14,17,33,47,35,31,2,4,10,45,9,19,23,13,36}
most_common_freqs | {0.021866666,0.0215,0.021233333,0.021066668,0.021,0.020966666,0.020766666,0.020766666,0.0206,0.0206,0.0206,0.020433333,0.020366667,0.020333333,0.0203,0.020266667,0.020266667,0.0202,0.020166667,0.020133333,0.0201,0.020066667,0.020033333,0.020033333,0.02,0.019933334,0.019933334,0.019933334,0.019866666,0.019866666,0.0198,0.0198,0.019766666,0.019633334,0.0196,0.019566666,0.019533332,0.019533332,0.0195,0.019466667,0.019433333,0.019366667,0.0193,0.019266667,0.0191,0.019033333,0.018933333,0.018833334,0.0187,0.018433332}
histogram_bounds | {1029,1243,1267,1378,1419,1797}
correlation | 0.026665932
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |postgres=# select count(distinct poid) from ir;count
-------1000
(1 row)postgres=# select count(1) from ir;count
----------30000000
(1 row)
相关文章:
Postgresql源码(143)统计信息基础知识(带实例)
概念与总结 高频值(Most Common Values, MCV) 存储在 most_common_vals 中。每个高频值的频率通过 most_common_freqs 单独记录(例如 0.010966667 等)。MCV 用于优化等值查询(如 poid 33)&…...
【含文档+PPT+源码】基于SpringBoot+vue的疫苗接种系统的设计与实现
项目介绍 本课程演示的是一款 基于SpringBootvue的疫苗接种系统的设计与实现,主要针对计算机相关专业的正在做毕设的学生与需要项目实战练习的 Java 学习者。 1.包含:项目源码、项目文档、数据库脚本、软件工具等所有资料 2.带你从零开始部署运行本套系…...
解决 Dart Sass 的旧 JS API 弃用警告 的详细步骤和解决方案
以下是解决 Dart Sass 的旧 JS API 弃用警告 的详细步骤和解决方案: 错误原因 Dart Sass 1.x 版本中使用的旧 JavaScript API(如 sass.render() 或 sass.compile() 的旧调用方式)将在 2.0.0 版本中被移除。需迁移到新 API 以避免未来报错。…...
Concepts (C++20)
C20 Concepts Concepts 是 C20 引入的核心特性,用于显式约束模板参数,提升代码可读性和错误提示。以下通过代码示例和原理分步骤解析其用法。 1. 基本概念 目标:显式声明模板参数必须满足的条件。优势:替代复杂的 SFINAE 和 ena…...
CVE-2024-23897-Jenkins 2.441之前版本存在任意文件读取漏洞
1.漏洞介绍 Jenkins 2.441及更早版本,以及LTS 2.426.2及更早版本没有禁用其CLI命令解析器的一个功能,该功能会将参数中字符后跟的文件路径替换为该文件的内容,允许未经身份验证的攻击者读取Jenkins控制器文件系统上的任意文件。 2.poc利用 下…...
利用 SSE 实现文字吐字效果:技术与实践
利用 SSE 实现文字吐字效果:技术与实践 引言 在现代 Web 应用开发中,实时交互功能愈发重要。例如,在线聊天、实时数据监控、游戏中的实时更新等场景,都需要服务器能够及时将数据推送给客户端。传统的请求 - 响应模式在处理实时性要求较高的场景时显得力不从心,而 Server…...
离线部署kubernetes
麒麟Linux服务器 AMR架构 🧰 离线部署 Kubernetes v1.25.9(麒麟系统 Docker) 一、验证Docker部署状态 检查Docker服务运行状态 systemctl status docker 预期输出应显示 Active: active (running),表明服务已启动18。 …...
2024武汉邀请赛B.Countless Me
题目链接 #include<bits/stdc.h> using namespace std; using lllong long;int main() {ios::sync_with_stdio(0),cin.tie(0),cout.tie(0);ll n; cin>>n;vector<ll>a(n1);ll res0;for(int i1;i<n;i) cin>>a[i],resa[i];ll ans0;for(int i32;i>…...
第53讲 农学科研中的AI伦理与可解释性——探索SHAP值、LIME等可解释工具与科研可信性建设之道
目录 一、为什么农学科研中需要“可解释AI”? ✅ 场景示例: 二、常见可解释AI工具介绍 1. SHAP(SHapley Additive exPlanations) 2. LIME(Local Interpretable Model-agnostic Explanations) 三、AI伦理问题在农学中的体现 🧭 公平性与偏见 🔐 数据隐私 🤖…...
《Python3网络爬虫开发实战(第二版)》配套案例 spa6
Scrape | Moviehttps://spa6.scrape.center/ 请求影片列表api时,不仅有分页参数,还多了一个token,通过重发请求发现token有时间限制,所以得逆向token的生成代码。 通过xhr断点定位到接口请求位置 刷新页面或者点翻页按钮&#x…...
面试题:Java程序CPU 100%问题排查指南
Java程序CPU 100%问题排查指南 当Java程序出现CPU使用率达到100%的情况时,通常意味着程序存在性能瓶颈或无限循环等问题。以下是系统化的排查方法和解决方案: 1. 快速定位问题线程 使用top命令初步定位 top -H -p <java_pid> # 查看Java进程的所有线程CPU占用线程…...
YOLOv12的注意力机制革新与实时检测性能分析——基于架构优化与历史版本对比
目录 一、摘要 二、引言 三、YOLO架构的技术演变 四、YOLOv12的架构设计 主干网特征提取 头部特征融合和目标检测 五、YOLOv12的架构创新 区域注意力模块 残差高效层聚合网络(R-ELAN) 其他改进和效率提升 六、YOLOv12 的基准评估 延迟与精度…...
Ollama工具调用(Tool Calls)业务应用案例
场景:电商客服自动处理退货请求 业务需求:用户通过聊天界面申请退货,模型需调用外部工具验证订单状态、触发退货流程,并返回处理结果。 1. 定义工具列表 在请求中声明模型可调用的工具(函数)及其参数格式…...
输入捕获模式测频率
前提工作: PA6、PA0通过跳线相连,PA6测试PA0的输出频率 本来只有下列函数,改变占空比 但是我们需要测试频率,需要动态改变频率。 void PWM_SetCompare1(uint16_t Compare) {TIM_SetCompare1(TIM2, Compare); //设置CCR1的值 }…...
【Vue3 实战】插槽封装与懒加载
一、为什么需要插槽?从一个面板组件说起 在电商首页开发中,经常遇到这样的场景: 「新鲜好物」「人气推荐」同样类型模块都需要相同的标题栏,但内容区布局不同 这时候,插槽(Slot)就像一个「内容…...
Matlab 复合多层结构的隔声研究
应用转移矩阵的方法,就平面声波垂直入射的情况,对具有周期结构的无限大多层板的隔声特性进行了理论分析,并对结构不同的多层板的隔声特性进行了数值模拟.理论分析和数值模拟表明:与通常隔声用的单层或双层板相比,在保持面密度不变的条件下,采用多层板结构能够在某些…...
vulkanscenegraph显示倾斜模型(6)-帧循环
前言 上一部分,通过十个章节的内容,对视景器的初始化与准备工作进行了系统性的剖析。本章将在该基础上,探讨vsg中的帧循环机制,主要包含前进到下一帧、事件处理、更新、记录与提交、呈现五个部分,同时整个过程包含了复…...
k8s 1.26版部署
环境规划: pod网段:10.244.0.0/16 service网段:10.10.0.0/16 注意: pod和service网段不可冲突,如果冲突会导致K8S集群安装失败。 容器运行时本次使用containerd。 主机规划: 一、初始化系统(所有节点) 1. 主机名定义以及解析 2. 关闭防火墙 3. 关闭selinux 4. 时间同…...
Android之AI自动化测试--Midscene
文章目录 前言一、准备工作1.安装2.准备 API Key3.安装 adb4.连接设备 二、yaml格式自动化脚本1. 脚本案例2.执行结果 三、文件结构变化android 部分 前言 字节 Web Infra团队官宣Midscene 从 v0.15 开始支持 Android 自动化测试,本篇文章介绍yaml方式的Android自动…...
Cadence 建立复合原理图封装时怎么切换页面
1.在当前页面A绘制完成,若要切换到下一页面B。怎么操作呢? 见下面: CTRLN,切换到下一部分(CTRLB,切换到前一部分)继续放线以及管脚 即:此时在原理图库的A部分 此时按 CTRLN,切换到下一B部分...
Sharding-JDBC 系列专题 - 第八篇:数据治理与高级功能
Sharding-JDBC 系列专题 - 第八篇:数据治理与高级功能 本系列专题旨在帮助开发者全面掌握 Sharding-JDBC,一个轻量级的分布式数据库中间件。本篇作为系列的第八篇文章,将重点探讨 数据治理(Data Governance) 和 高级功能,包括数据加密、影子表、SQL 审计以及 ShardingSp…...
今日行情明日机会——20250424
指数依然是震荡走势,接下来两天调整的概率较大 2025年4月24日涨停主要行业方向分析 一、主要方向 化工(新能源材料) • 涨停家数:8家(最强方向)。 • 代表标的: ◦ 中欣氟材(3连板…...
Kubernetes 常用运维命令整理
目录 Kubernetes 常用运维命令整理一、集群管理二、Pod 和容器管理三、Deployment 和应用管理四、Service 和网络管理五、存储管理六、ConfigMap 和 Secret 管理七、资源使用与监控八、调度和容错九、Role 和权限管理十、清理资源 总结 Kubernetes 常用运维命令整理 Kubernete…...
【Python爬虫基础篇】--4.Selenium入门详细教程
先解释:Selenium:n.硒;硒元素 目录 1.Selenium--简介 2.Selenium--原理 3.Selenium--环境搭建 4.Selenium--简单案例 5.Selenium--定位方式 6.Selenium--常用方法 6.1.控制操作 6.2.鼠标操作 6.3.键盘操作 6.4.获取断言信息 6.5.…...
基于Vulkan Specialization Constants的材质变体系统
材质变体 所谓材质变体,指的是一份材质代码文件,最终对应的是多份运行时gpu程序。比如,shader代码里面有开关或者选项,不同的组合对应不同的最终gpu program。那么,所有的这些组合对应的gpu program,可以统…...
Langchain+RAG+向量数据库
加载数据 import osimport lancedb from langchain_community.document_loaders import TextLoader from langchain_community.embeddings import BaichuanTextEmbeddings from langchain_community.vectorstores import LanceDB from langchain_core.output_parsers import St…...
Stack和Queue和deque的讲解(底层实现 手撕版)
一.底层的基本思路 我们cpp中实现的栈和队列不同于我们数据结构c语言实现的栈和队列,c语言中实现的栈和队列都是通过一个数组指针的形式来完成,每个函数都需要写大量的代码,但是我们的cpp,就是通过函数模板 适配器来完成的。 我们…...
《Pinia 从入门到精通》Vue 3 官方状态管理 -- 插件扩展篇
使用插件扩展功能 可以同时使用多个插件(插件“中间件式”机制)一、使用多个插件的方式二、插件机制简图三、插件互不冲突的关键点四、实战示例:多插件组合使用五、组合使用注意事项推荐插件组合搭配方案(实战模板) 根…...
JavaScript 中的 Reflect 对象:深入理解与应用
JavaScript 中的 Reflect 对象:深入理解与应用 一、引言 在 JavaScript 不断发展的过程中,ES6 引入了许多新的特性和对象,其中 Reflect 对象是一个强大且实用的工具。Reflect 对象提供了一系列静态方法,这些方法与 Proxy 对象的…...
dirsearch 使用教程:详细指南与配置解析
dirsearch 是一款强大的开源命令行工具,用于对 Web 服务器进行目录和文件暴力破解。它通过扫描目标网站,尝试发现隐藏的目录、文件或潜在的敏感资源,广泛应用于渗透测试和安全审计。dirsearch 提供丰富的选项和灵活的配置文件支持,…...
【C++基础知识】C++类型特征组合:`disjunction_v` 和 `conjunction_v` 深度解析
这两个模板是C17引入的类型特征组合工具,用于构建更复杂的类型判断逻辑。下面我将从技术实现到实际应用进行全面剖析: 一、基本概念与C引入版本 1. std::disjunction_v (逻辑OR) 引入版本:C17功能:对多个类型特征进行逻辑或运算…...
ctfhow——web入门214~218(时间盲注开始)
web入门214 #another:uwvwko import requestsurlhttp://b0c11589-31c9-4bf9-8b66-6b5a1fc08726.challenge.ctf.show/api/index.php flag str{-_1234567890qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM}for i in range(1,50):for j in str:# 查数据库# payload "…...
shell练习(2)
1.给脚本service.sh进行修改,当执行的时候要求输入(1、2、3、4、5)时安装对应的httpd、vim、wget、更换aliyum等功能,当输入错误 时提示应该输入正确的值但是不会退出。 [rootbogon yy]# cat service.sh #!/bin/bash while : do cat <<-EOF --------------…...
【GO语言小案例手记】基于GIN的简易代理网关
基于GIN的简易代理网关 背景目标开工依赖主体代码配置文件 后记 背景 正好最近对GO也有点兴趣,搞个小项目练练手。 目标 网关需要能够根据路由自动映射到服务支持轮询、加权轮询、随机轮询三种算法简单好理解好使用,最好一个配置文件就能跑起来网关本…...
Qt 入门 6 之布局管理
Qt 入门 6 之布局管理 对于一个完整的软件,布局管理时必不可少的。其会让界面中嗯嗯部件呈现一个整齐的排列,也可令其大小随着窗口界面的大小变换而变化Qt 主要提供了QLayout 类及其子类作为布局管理器,他们可以实现常用的布局管理功能&…...
Java技术体系的主要产品线详解
Java技术体系的主要产品线详解 Java Card:支持Java小程序(Applets)运行在小内存设备(如智能卡)上的平台。 Java ME(Micro Edition):支持Java程序运行在移动终端(手机、P…...
第四章: 服务集成抽象
Chapter 4: 服务集成抽象 🌟 从上一章到本章 在第三章:传输机制中,我们学习了如何通过STDIO和SSE协议让LLM与不同服务器通信。现在想象这样的场景:你的AI助手需要同时操作本地文件和云端数据库。这时问题来了——如何让LLM像操作…...
高精度并行2D圆弧拟合(C++)
依赖库 Eigen3 GLM Ceres-2.1.0 glog-0.6.0 gflag-2.2.2 基本思路 Step 1: RANSAC找到圆弧,保留inliers点; Step 2:使用ceres非线性优化的方法,拟合inliers点,得到圆心和半径; -------…...
【防火墙 pfsense】1简介
(1) pfSense 有以下可能的用途: 边界防火墙 路由器 交换机 无线路由器 / 无线接入点 从OSI7层模型了解设备在典型网络结构中所处的位置。 (2)边界防火墙 ->要充当边界防火墙,pfSense 系统至少需要两个接…...
GPT-4o最新图像生成完全指南:10大应用场景与提示词模板
引言 OpenAI于近期推出的全新GPT-4o图像生成功能,代表了AI图像创作领域的重大突破。作为一个原生多模态系统,GPT-4o将文本理解和图像生成无缝整合,为创作者、教育工作者和专业人士提供了前所未有的视觉创作灵活性。本文将分享10个GPT-4o图像…...
32单片机——外部中断
STM32F103ZET6的系统中断有10个,外部中断有60个 1、中断的概念 中断是为使单片机具有对外部或内部随机发生的事件实时处理而设置的,中断功能的存在,很大程度上提高了单片机处理外部或内部事件的能力 eg::你打开火&…...
《Pinia 从入门到精通》Vue 3 官方状态管理 -- 进阶使用篇
《Pinia 从入门到精通》Vue 3 官方状态管理 – 基础入门篇 《Pinia 从入门到精通》Vue 3 官方状态管理 – 进阶使用篇 《Pinia 从入门到精通》Vue 3 官方状态管理 – 插件扩展篇 目录 Store 的模块化设计4.1 多模块结构设计✅ 推荐目录结构(中大型项目) …...
HarmonyOs @hadss/hmrouter路由接入
参考文档:官方文档 在根目录oh-package.json5配置 {"dependencies": {"hadss/hmrouter": "^1.0.0-rc.11"} }加入路由编译插件 hvigor/hvigor-config.json文件 {"dependencies": {"hadss/hmrouter-plugin": &…...
第九节:性能优化高频题-首屏加载优化策略
路由懒加载:component: () > import(‘…’) CDN加速第三方库、Tree-Shaking移除未使用代码 前端首屏加载优化核心策略解析 一、路由懒加载:按需拆分代码块 实现原理 通过动态导入语法 import() 将路由组件拆分为独立代码块,仅在用户访问…...
ESP32_IDF_VScode安装多版本共存
ESP32_IDF_VScode安装多版本共存 一、安装离线版本idf 详情见文章:ESP32_IDF_基于win11的开发环境搭建 二、windows的VScode安装乐鑫插件 三、导入已经安装好的idf(将VScode插件和本地安装的IDF绑定的一个关系) 1、选择“配置ESP-IDF扩展”…...
JavaScript 的“积木”:函数入门与实践
引言:告别重复,拥抱模块化 想象一下,你在写代码时发现,有几段逻辑几乎一模一样,需要在不同的地方反复使用。你是选择每次都复制粘贴,还是希望能像搭积木一样,把这段逻辑封装起来,需…...
代码注释标记的含义
在代码中,TODO 是一种常用的注释标记,用于标识需要后续处理或完善的任务。它是开发者之间的常见约定,帮助团队协作和任务管理。以下是详细解释: 1. TODO 的核心含义 待办事项:标记代码中需要完成但尚未实现的功能、需…...
深度学习:迁移学习
迁移学习 标题1.什么是迁移学习 迁移学习(Transfer Learning)是一种机器学习方法,就是把为任务 A 开发 的模型作为初始点,重新使用在为任务 B 开发模型的过程中。迁移学习是通过 从已学习的相关任务中转移知识来改进学习的新任务,虽然大多数…...
Nest集成健康检查
文章目录 前言✅ NestJS 健康检查集成思路(标准实践)📦 推荐使用官方包: 🧱 结构设计✅ 1. 创建健康模块✅ 2. 集成 nestjs/terminushealth.module.tshealth.controller.ts ✅ 3. 在 AppModule 注册模块 🔍…...
第十五届蓝桥杯 2024 C/C++组 拼正方形
目录 题目: 题目描述: 题目链接: 思路: 思路详解: 易错点: 代码: 代码详解: 题目: 题目描述: 题目链接: P10898 [蓝桥杯 2024 省 C] 拼正…...