Preface
For the BI logs that Pods write to disk, there are a few ways to handle storage:
- write directly to local disk
- mount NAS
- mount object storage (OSS)
In raw performance these clearly rank local disk > NAS > OSS. From the requirements side, the first concern is that BI logs must not be lost while the cluster autoscales, and that the raw logs must be kept so lost records can be re-sent. Since autoscaling inevitably deletes nodes, writing to local disk raises two problems:
- how to guarantee the logs are fully collected and uploaded before a node is deleted;
- how to retain the source logs so that lost records can be re-sent if something goes wrong.
This post is a quick fio comparison of the three storage types.
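The "drain, then keep a raw copy" requirement can be sketched as a small preStop-style hook. This is only an illustration: the marker-file convention, the paths, and the backup target are all assumptions, not part of the original setup.

```shell
# Hypothetical preStop hook logic: wait for the log collector to drain
# its buffer, then keep a raw copy of the BI logs so lost records can be
# re-sent later. The ".collector-busy" marker file is an assumed convention.
drain_and_backup() {
    log_dir="$1"
    backup_dir="$2"   # e.g. an OSS or NAS mount that survives the node
    # wait until the collector signals it is done (marker file assumed)
    while [ -e "$log_dir/.collector-busy" ]; do
        sleep 1
    done
    # keep the source logs for possible re-transmission
    mkdir -p "$backup_dir"
    cp -r "$log_dir"/. "$backup_dir"/
}
```

In a real Deployment this would run from the container's `lifecycle.preStop` hook, so Kubernetes waits for it before killing the Pod.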
OSS Test
Write
- Test write throughput with sequential writes, using a 1 MB I/O block size, an I/O depth of at least 64, and multiple (16 or more) parallel streams:
fio --name=write_throughput --directory=oss-test --numjobs=16 \
--size=4G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
--direct=1 --verify=0 --bs=1M --iodepth=64 --rw=write \
--group_reporting=1 --iodepth_batch_submit=64 \
--iodepth_batch_complete_max=64
Test results
Jobs: 1
write_throughput: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
write_throughput: (groupid=0, jobs=1): err= 0: pid=2381: Thu May 25 12:51:09 2023
write: IOPS=139, BW=140MiB/s (147MB/s)(8448MiB/60218msec)
slat (msec): min=187, max=1194, avg=456.97, stdev=112.05
clat (nsec): min=2622, max=12165, avg=7049.83, stdev=1409.41
lat (msec): min=187, max=1194, avg=458.17, stdev=111.63
clat percentiles (nsec):
| 1.00th=[ 3280], 5.00th=[ 5408], 10.00th=[ 5920], 20.00th=[ 6112],
| 30.00th=[ 6304], 40.00th=[ 6432], 50.00th=[ 6688], 60.00th=[ 7072],
| 70.00th=[ 7456], 80.00th=[ 7904], 90.00th=[ 8768], 95.00th=[ 9536],
| 99.00th=[12096], 99.50th=[12224], 99.90th=[12224], 99.95th=[12224],
| 99.99th=[12224]
bw ( KiB/s): min=130810, max=262144, per=100.00%, avg=148014.19, stdev=44173.75, samples=116
iops : min= 127, max= 256, avg=144.53, stdev=43.15, samples=116
lat (usec) : 4=1.49%, 10=94.00%, 20=4.51%
cpu : usr=1.40%, sys=0.19%, ctx=67094, majf=0, minf=4
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=116.0%
submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, >=64=0.0%
complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, >=64=0.0%
issued rwts: total=0,8384,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=140MiB/s (147MB/s), 140MiB/s-140MiB/s (147MB/s-147MB/s), io=8448MiB (8858MB), run=60218-60218msec
Jobs: 16
Jobs: 7 (f=7): [f(3),_(4),f(1),_(1),f(2),_(4),f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
write_throughput: (groupid=0, jobs=16): err= 0: pid=646: Thu May 25 08:05:26 2023
write: IOPS=118, BW=133MiB/s (140MB/s)(9216MiB/69119msec)
slat (msec): min=1899, max=14210, avg=7683.33, stdev=1646.84
clat (nsec): min=2149, max=21503, avg=9514.00, stdev=2354.12
lat (msec): min=6406, max=14210, avg=8054.36, stdev=1140.92
clat percentiles (nsec):
| 1.00th=[ 3408], 5.00th=[ 6624], 10.00th=[ 6944], 20.00th=[ 7776],
| 30.00th=[ 8384], 40.00th=[ 8768], 50.00th=[ 9408], 60.00th=[ 9792],
| 70.00th=[10048], 80.00th=[10816], 90.00th=[12096], 95.00th=[13376],
| 99.00th=[16064], 99.50th=[21376], 99.90th=[21376], 99.95th=[21376],
| 99.99th=[21376]
bw ( KiB/s): min=88086, max=131072, per=94.70%, avg=129298.54, stdev=7142.76, samples=128
iops : min= 86, max= 128, avg=126.17, stdev= 7.08, samples=128
lat (usec) : 4=1.38%, 10=65.55%, 20=32.30%, 50=0.77%
cpu : usr=0.09%, sys=0.03%, ctx=65557, majf=0, minf=37
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=127.3%
submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, >=64=0.0%
complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=100.0%, >=64=0.0%
issued rwts: total=0,8192,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=133MiB/s (140MB/s), 133MiB/s-133MiB/s (140MB/s-140MB/s), io=9216MiB (9664MB), run=69119-69119msec
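With several of these runs to compare, it helps to pull the aggregate bandwidth out of fio's summary line. A small sed sketch, matching the `WRITE: bw=...` format in the outputs above:

```shell
# Extract the "bw=" value from a fio summary line, e.g.
#   WRITE: bw=140MiB/s (147MB/s), ...  ->  140MiB/s
extract_bw() {
    sed -n 's/.*bw=\([^ ]*\) .*/\1/p'
}

echo "WRITE: bw=133MiB/s (140MB/s), 133MiB/s-133MiB/s" | extract_bw
# -> 133MiB/s
```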
- Test write IOPS with random writes, using a 4 KB I/O block size and an I/O depth of at least 256:
fio --name=write_iops --directory=./fio-random --size=4G \
Jobs: 1
write_iops: (groupid=0, jobs=1): err= 0: pid=729: Thu May 25 08:12:07 2023
write: IOPS=426, BW=1722KiB/s (1763kB/s)(102MiB/60671msec)
slat (msec): min=153, max=1230, avg=596.26, stdev=325.83
clat (nsec): min=2231, max=44668, avg=8521.19, stdev=3706.08
lat (msec): min=167, max=1230, avg=600.65, stdev=324.44
clat percentiles (nsec):
| 1.00th=[ 6816], 5.00th=[ 7072], 10.00th=[ 7328], 20.00th=[ 7520],
| 30.00th=[ 7776], 40.00th=[ 7904], 50.00th=[ 8096], 60.00th=[ 8256],
| 70.00th=[ 8512], 80.00th=[ 8768], 90.00th=[ 9280], 95.00th=[ 9536],
| 99.00th=[11072], 99.50th=[44800], 99.90th=[44800], 99.95th=[44800],
| 99.99th=[44800]
bw ( KiB/s): min= 2043, max= 6144, per=100.00%, avg=2522.23, stdev=1032.89, samples=82
iops : min= 510, max= 1536, avg=630.51, stdev=258.24, samples=82
lat (usec) : 4=0.39%, 10=96.65%, 20=1.97%, 50=0.99%
cpu : usr=0.09%, sys=0.30%, ctx=25856, majf=0, minf=4
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=124.8%
submit : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=100.0%
issued rwts: total=0,25856,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
WRITE: bw=1722KiB/s (1763kB/s), 1722KiB/s-1722KiB/s (1763kB/s-1763kB/s), io=102MiB (107MB), run=60671-60671msec
Jobs: 16
write_iops: (groupid=0, jobs=16): err= 1 (file:filesetup.c:137, func=unlink, error=Operation not permitted): pid=0: Thu May 25 08:35:50 2023
Read
- Test read throughput with sequential reads, using a 1 MB I/O block size, an I/O depth of at least 64, and multiple (16 or more) parallel streams:
fio --name=read_throughput --directory=./oss-read --numjobs=16 \
Jobs: 16
read_throughput: (groupid=0, jobs=16): err= 0: pid=1481: Thu May 25 09:11:12 2023
Jobs: 1
read_throughput: (groupid=0, jobs=1): err= 0: pid=2524: Thu May 25 13:02:20 2023
- Test read IOPS with random reads, using a 4 KB I/O block size and an I/O depth of at least 256:
fio --name=read_iops --directory=./oss-read --size=4G \
Jobs: 1
read_iops: (groupid=0, jobs=1): err= 0: pid=345: Thu May 25 10:02:24 2023
Jobs: 5
read_iops: (groupid=0, jobs=5): err= 0: pid=265: Thu May 25 09:57:59 2023
ESSD Disk
Alibaba Cloud compute-optimized instances can no longer attach regular cloud disks, so an ESSD had to be used for this test.
100 GB total capacity
Sequential write
Jobs: 1
write_throughput: (groupid=0, jobs=1): err= 0: pid=10150: Thu May 25 17:00:44 2023
write: IOPS=124, BW=126MiB/s (132MB/s)(7564MiB/60250msec); 0 zone resets
slat (usec): min=30, max=392485, avg=7982.01, stdev=11484.74
clat (msec): min=16, max=3178, avg=504.04, stdev=159.30
lat (msec): min=34, max=3197, avg=512.02, stdev=159.51
clat percentiles (msec):
| 1.00th=[ 257], 5.00th=[ 351], 10.00th=[ 443], 20.00th=[ 493],
| 30.00th=[ 502], 40.00th=[ 502], 50.00th=[ 502], 60.00th=[ 510],
| 70.00th=[ 510], 80.00th=[ 510], 90.00th=[ 518], 95.00th=[ 523],
| 99.00th=[ 776], 99.50th=[ 1737], 99.90th=[ 2601], 99.95th=[ 3004],
| 99.99th=[ 3171]
bw ( KiB/s): min=98304, max=190464, per=99.58%, avg=128017.07, stdev=10640.05, samples=120
iops : min= 96, max= 186, avg=125.02, stdev=10.39, samples=120
lat (msec) : 20=0.01%, 50=0.08%, 100=0.25%, 250=0.59%, 500=38.16%
lat (msec) : 750=60.69%, 1000=0.15%, 2000=0.56%, >=2000=0.35%
cpu : usr=0.77%, sys=0.11%, ctx=3335, majf=0, minf=58
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=99.8%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,7500,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=126MiB/s (132MB/s), 126MiB/s-126MiB/s (132MB/s-132MB/s), io=7564MiB (7931MB), run=60250-60250msec
Disk stats (read/write):
nvme0n1: ios=0/31669, merge=0/50, ticks=0/8117333, in_queue=8117333, util=99.88%
Jobs: 16
write_throughput: (groupid=0, jobs=16): err= 0: pid=8207: Thu May 25 16:56:43 2023
write: IOPS=147, BW=164MiB/s (172MB/s)(10.3GiB/64086msec); 0 zone resets
slat (usec): min=38, max=22862k, avg=928750.84, stdev=3152457.07
clat (usec): min=179, max=24708k, avg=5941143.86, stdev=4914369.56
lat (msec): min=17, max=32514, avg=6858.49, stdev=5822.58
clat percentiles (msec):
| 1.00th=[ 209], 5.00th=[ 502], 10.00th=[ 558], 20.00th=[ 1200],
| 30.00th=[ 1989], 40.00th=[ 3339], 50.00th=[ 5000], 60.00th=[ 6477],
| 70.00th=[ 8087], 80.00th=[10000], 90.00th=[12953], 95.00th=[15637],
| 99.00th=[17113], 99.50th=[17113], 99.90th=[17113], 99.95th=[17113],
| 99.99th=[17113]
bw ( KiB/s): min=24344, max=1349169, per=100.00%, avg=245712.02, stdev=19220.22, samples=1185
iops : min= 19, max= 1317, avg=239.52, stdev=18.76, samples=1185
lat (usec) : 250=0.03%, 500=0.07%, 750=0.07%, 1000=0.06%
lat (msec) : 2=0.08%, 20=0.01%, 50=0.08%, 100=0.18%, 250=0.65%
lat (msec) : 500=4.20%, 750=8.02%, 1000=4.24%, 2000=15.60%, >=2000=77.33%
cpu : usr=0.07%, sys=0.01%, ctx=6070, majf=0, minf=945
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=95.8%
submit : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
complete : 0=0.0%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.2%, >=64=0.0%
issued rwts: total=0,9482,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=164MiB/s (172MB/s), 164MiB/s-164MiB/s (172MB/s-172MB/s), io=10.3GiB (11.0GB), run=64086-64086msec
Disk stats (read/write):
nvme0n1: ios=0/42211, merge=0/1316, ticks=0/19013423, in_queue=19013423, util=99.91%
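A quick sanity check on these numbers: with a 1 MiB block size, the MiB/s and IOPS figures should roughly coincide, since every I/O moves exactly 1 MiB. Using the single-job ESSD run above as the example:

```shell
# At bs=1M, bandwidth in MiB/s ~= IOPS: 124 IOPS x 1 MiB = 124 MiB/s,
# close to the reported BW=126MiB/s (the small gap comes from averaging
# and the 2 s ramp time).
iops=124
bs_mib=1
echo "$((iops * bs_mib)) MiB/s"
```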
Random write
Jobs: 1
write_iops: (groupid=0, jobs=1): err= 0: pid=11172: Thu May 25 17:03:06 2023
write: IOPS=2986, BW=11.7MiB/s (12.2MB/s)(702MiB/60080msec); 0 zone resets
slat (usec): min=3, max=149890, avg=34096.87, stdev=13557.29
clat (usec): min=2, max=1417.6k, avg=48246.70, stdev=49087.82
lat (msec): min=8, max=1437, avg=82.34, stdev=49.48
clat percentiles (usec):
| 1.00th=[ 4], 5.00th=[ 19792], 10.00th=[ 30278],
| 20.00th=[ 40109], 30.00th=[ 40109], 40.00th=[ 40109],
| 50.00th=[ 40633], 60.00th=[ 49546], 70.00th=[ 50070],
| 80.00th=[ 50070], 90.00th=[ 60031], 95.00th=[ 69731],
| 99.00th=[ 110625], 99.50th=[ 320865], 99.90th=[ 918553],
| 99.95th=[1010828], 99.99th=[1283458]
bw ( KiB/s): min= 9512, max=12912, per=100.00%, avg=11966.60, stdev=545.75, samples=120
iops : min= 2378, max= 3228, avg=2991.65, stdev=136.44, samples=120
lat (usec) : 4=1.31%, 10=1.97%, 20=0.07%, 50=0.06%, 100=0.05%
lat (usec) : 250=0.14%, 500=0.03%
lat (msec) : 10=0.44%, 20=1.20%, 50=75.06%, 100=18.68%, 250=0.45%
lat (msec) : 500=0.29%, 750=0.16%, 1000=0.07%, 2000=0.07%
cpu : usr=0.14%, sys=1.57%, ctx=14436, majf=0, minf=58
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
submit : 0=0.0%, 4=2.2%, 8=2.1%, 16=5.2%, 32=9.8%, 64=23.9%, >=64=56.8%
complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=0.5%, 32=2.6%, 64=15.0%, >=64=81.9%
issued rwts: total=0,179423,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
WRITE: bw=11.7MiB/s (12.2MB/s), 11.7MiB/s-11.7MiB/s (12.2MB/s-12.2MB/s), io=702MiB (736MB), run=60080-60080msec
Disk stats (read/write):
nvme0n1: ios=0/188894, merge=0/11801, ticks=0/8296631, in_queue=8296632, util=99.92%
Jobs: 16
write_iops: (groupid=0, jobs=16): err= 0: pid=12485: Thu May 25 17:05:42 2023
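The same cross-check works for the random-write IOPS runs: at a 4 KiB block size, the single-job result of ~2986 IOPS should line up with fio's reported ~11.7 MiB/s bandwidth.

```shell
# 2986 IOPS x 4 KiB = 11944 KiB/s; 11944 / 1024 ~= 11.7 MiB/s,
# matching the BW=11.7MiB/s line in the jobs=1 output above.
iops=2986
bs_kib=4
bw_kib=$((iops * bs_kib))
echo "${bw_kib} KiB/s"   # -> 11944 KiB/s
```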
Sequential read
Jobs: 1
read_throughput: (groupid=0, jobs=1): err= 0: pid=126765: Thu May 25 21:33:59 2023
Jobs: 16
read_throughput: (groupid=0, jobs=16): err= 0: pid=123513: Thu May 25 21:26:37 2023
Random read
Jobs: 1
read_iops: (groupid=0, jobs=1): err= 0: pid=25020: Thu May 25 17:33:48 2023
Jobs: 16
read_iops: (groupid=0, jobs=16): err= 0: pid=30079: Thu May 25 17:45:55 2023
NAS
Sequential write
Jobs: 1
write_throughput: (groupid=0, jobs=1): err= 0: pid=1621: Thu May 25 14:21:27 2023
Jobs: 16
write_throughput: (groupid=0, jobs=16): err= 0: pid=258: Thu May 25 12:38:56 2023
Random write
Jobs: 1
write_iops: (groupid=0, jobs=1): err= 0: pid=318: Thu May 25 12:41:52 2023
Jobs: 16
write_iops: (groupid=0, jobs=16): err= 0: pid=375: Thu May 25 12:44:15 2023
Random read
fio --name=read_iops --directory=./oss-read --size=4G \
  --time_based --runtime=60s --ramp_time=2s --ioengine=libaio --direct=1 \
  --verify=0 --bs=4K --iodepth=256 --rw=randread --group_reporting=1 \
  --iodepth_batch_submit=256 --iodepth_batch_complete_max=256 --numjobs=16
Jobs: 1
read_iops: (groupid=0, jobs=1): err= 0: pid=1659: Thu May 25 14:23:56 2023
Jobs: 16
read_iops: (groupid=0, jobs=16): err= 0: pid=1753: Thu May 25 14:31:45 2023
Sequential read
fio --name=read_throughput --directory=./oss-read --numjobs=1 \
  --size=4G --time_based --runtime=60s --ramp_time=2s --ioengine=libaio \
  --direct=1 --verify=0 --bs=1M --iodepth=64 --rw=read \
  --group_reporting=1 \
  --iodepth_batch_submit=64 --iodepth_batch_complete_max=64
Jobs: 1
read_throughput: (groupid=0, jobs=1): err= 0: pid=1563: Thu May 25 14:17:27 2023
Jobs: 16
read_throughput: (groupid=0, jobs=16): err= 0: pid=1517: Thu May 25 14:15:44 2023