Hi All :),
I would like to get your feedback on the components below for building a PoC
OSD node (I will build 3 of these):
SSD for OS.
NVMe for cache.
HDD for storage.
The Supermicro motherboard has two 10Gb NICs, and I will use ECC memory.
[image: image.png]
Thanks for your feedback!
--
Ignacio Ocampo
I have been converting SSD OSDs to dmcrypt, and I have noticed that PGs are
being migrated from pools that should be (and seem to be?) on the hdd device class.
On an otherwise healthy cluster, when I set the crush reweight of an SSD OSD
to 0.0, I get this:
17.35 10415 0 0 9907 0
36001743890 0 0 3045 3045
active+remapped+backfilling 2020-09-27 12:55:49.093054 83758'20725398
83758:100379720 [8,14,23] 8 [3,14,23] 3
83636'20718129 2020-09-27 00:58:07.098096 83300'20689151 2020-09-24
21:42:07.385360 0
However, OSDs 3, 14, 23 and 8 are all hdd OSDs.
Since this cluster dates back to Kraken/Luminous, I am not sure whether the
device class of replicated_ruleset [1] was already set when pool 17 was
created.
The weird thing is that all PGs of this pool do seem to be on hdd OSDs [2].
Q: How can I display the definition of 'crush_rule 0' as it was at the time
of pool creation? (To be sure it already had the hdd device class
configured.)
[1]
[@~]# ceph osd pool ls detail | grep 'pool 17'
pool 17 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 83712
flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
[@~]# ceph osd crush rule dump replicated_ruleset
{
    "rule_id": 0,
    "rule_name": "replicated_ruleset",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -10,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
[2]
[@~]# for osd in `ceph pg dump pgs| grep '^17' | awk '{print $17" "$19}'
| grep -oE '[0-9]{1,2}'| sort -u -n`; do ceph osd crush get-device-class
osd.$osd ; done | sort -u
dumped pgs
hdd
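If the mons still have osdmaps from around the time the pool was created
(they only keep a limited window of old epochs, so this may well be trimmed
already), one approach would be to pull an old map and decompile its crush
map. A rough sketch, with <epoch> as a placeholder for an epoch near the
pool creation:
[@~]# ceph osd getmap <epoch> -o osdmap.<epoch>
[@~]# osdmaptool osdmap.<epoch> --export-crush crush.<epoch>
[@~]# crushtool -d crush.<epoch> -o crush.<epoch>.txt
[@~]# grep -A 12 replicated_ruleset crush.<epoch>.txt
If that epoch is no longer available, the check in [2] (all PGs of the pool
actually sitting on hdd OSDs) is probably the best evidence either way.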
On Mon, 21 Sep 2020 at 16:15, Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
> When I create a new encrypted osd with ceph volume[1]
>
> Q4: Where is this luks passphrase stored?
>
I think the OSD asks the mon for it after authenticating, so "in the mon
DBs" somewhere.
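If I remember correctly, ceph-volume puts the dmcrypt secret into the mon
config-key store, so with a sufficiently privileged client something like
this should show it (the exact key path is from memory, and <osd-fsid> is
the OSD's fsid, not the osd id):
ceph config-key ls | grep dm-crypt
ceph config-key get dm-crypt/osd/<osd-fsid>/luks
The per-OSD lockbox keyring is, as far as I know, what has just enough caps
to fetch its own key at startup.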
--
May the most significant bit of your life be positive.
Dear All,
After adding 10 new nodes, each with 10 OSDs, to a cluster, we are unable
to get "objects misplaced" back to zero.
The cluster successfully re-balanced from ~35% to 5% misplaced; however,
every time "objects misplaced" drops below 5%, a number of PGs start to
backfill, pushing "objects misplaced" back up to 5.1%.
I do not believe the balancer is active:
[root@ceph7 ceph]# ceph balancer status
{
    "last_optimize_duration": "",
    "plans": [],
    "mode": "upmap",
    "active": false,
    "optimize_result": "",
    "last_optimize_started": ""
}
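One thing that might be worth ruling out is a gradual pg_num/pgp_num change
(e.g. from the pg autoscaler), since as far as I understand the mgr
throttles such changes via target_max_misplaced_ratio (0.05 by default),
which would match the ~5% ceiling. If I have it right, this should show
whether such a change is in flight (assuming Nautilus or later):
[root@ceph7 ceph]# ceph osd pool autoscale-status
[root@ceph7 ceph]# ceph osd pool ls detail | grep -E 'pg_num|pgp_num'
[root@ceph7 ceph]# ceph config get mgr target_max_misplaced_ratio
(A pool showing a pg_num_target/pgp_num_target different from its current
pg_num/pgp_num would mean a split or merge is still being worked through in
the background.)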
The cluster has now been stuck at ~5% misplaced for a couple of weeks.
The recovery is using ~1GiB/s of bandwidth and is preventing any scrubs.
The cluster contains 2.6PB of CephFS, which is still usable for reads and
writes.
The cluster originally had 10 nodes, each with 45 x 8TB drives; the new
nodes have 10 x 16TB drives each.
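Given the nearfull OSDs and the backfill_toofull PGs, the mix of 45 x 8TB
hosts and 10 x 16TB hosts may also matter; I can share the output of
[root@ceph7 ceph]# ceph osd df tree
if useful, to show whether the fullest OSDs (%USE / PGS columns) are
concentrated on particular hosts.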
To show the cluster before and immediately after an "episode":
***************************************************
[root@ceph7 ceph]# ceph -s
cluster:
id: 36ed7113-080c-49b8-80e2-4947cc456f2a
health: HEALTH_WARN
7 nearfull osd(s)
2 pool(s) nearfull
Low space hindering backfill (add storage if this doesn't
resolve itself): 11 pgs backfill_toofull
16372 pgs not deep-scrubbed in time
16372 pgs not scrubbed in time
1/3 mons down, quorum ceph1b,ceph3b
services:
mon: 3 daemons, quorum ceph1b,ceph3b (age 6d), out of quorum: ceph2b
mgr: ceph3(active, since 3d), standbys: ceph1
mds: cephfs:1 {0=ceph1=up:active} 1 up:standby-replay
osd: 554 osds: 554 up (since 4d), 554 in (since 5w); 848 remapped pgs
task status:
scrub status:
mds.ceph1: idle
mds.ceph2: idle
data:
pools: 3 pools, 16417 pgs
objects: 937.39M objects, 2.6 PiB
usage: 3.2 PiB used, 1.4 PiB / 4.6 PiB avail
pgs: 467620187/9352502650 objects misplaced (5.000%)
7893 active+clean
7294 active+clean+snaptrim_wait
785 active+remapped+backfill_wait
382 active+clean+snaptrim
52 active+remapped+backfilling
11 active+remapped+backfill_wait+backfill_toofull
io:
client: 129 KiB/s rd, 82 MiB/s wr, 3 op/s rd, 53 op/s wr
recovery: 1.1 GiB/s, 364 objects/s
***************************************************
and then seconds later:
***************************************************
[root@ceph7 ceph]# ceph -s
cluster:
id: 36ed7113-080c-49b8-80e2-4947cc456f2a
health: HEALTH_WARN
7 nearfull osd(s)
2 pool(s) nearfull
Low space hindering backfill (add storage if this doesn't
resolve itself): 11 pgs backfill_toofull
16372 pgs not deep-scrubbed in time
16372 pgs not scrubbed in time
1/3 mons down, quorum ceph1b,ceph3b
services:
mon: 3 daemons, quorum ceph1b,ceph3b (age 6d), out of quorum: ceph2b
mgr: ceph3(active, since 3d), standbys: ceph1
mds: cephfs:1 {0=ceph1=up:active} 1 up:standby-replay
osd: 554 osds: 554 up (since 5d), 554 in (since 5w); 854 remapped pgs
task status:
scrub status:
mds.ceph1: idle
mds.ceph2: idle
data:
pools: 3 pools, 16417 pgs
objects: 937.40M objects, 2.6 PiB
usage: 3.2 PiB used, 1.4 PiB / 4.6 PiB avail
pgs: 470821753/9352518510 objects misplaced (5.034%)
7892 active+clean
7290 active+clean+snaptrim_wait
791 active+remapped+backfill_wait
381 active+clean+snaptrim
52 active+remapped+backfilling
11 active+remapped+backfill_wait+backfill_toofull
io:
client: 155 KiB/s rd, 125 MiB/s wr, 2 op/s rd, 53 op/s wr
recovery: 969 MiB/s, 330 objects/s
***************************************************
If it helps, I've tried capturing debug logs (level 1/5) from an OSD.
I'm not sure, but I think this is the way to follow a thread handling one PG
as it decides to rebalance:
[root@ceph7 ceph]# grep 7f2e569e9700 ceph-osd.312.log | less
2020-09-24 14:44:36.844 7f2e569e9700 1 osd.312 pg_epoch: 106808
pg[5.157ds0( v 106803'6043528 (103919'6040524,106803'6043528]
local-lis/les=102671/102672 n=56293 ec=85890/1818 lis/c 102671/10
2671 les/c/f 102672/102672/0 106808/106808/106808)
[148,508,398,457,256,533,137,469,357,306]p148(0) r=-1 lpr=106808
pi=[102671,106808)/1 luod=0'0 crt=106803'6043528 lcod 106801'6043526 active
mbc={} ps=104] start_peering_interval up
[312,424,369,461,546,525,498,169,251,127] ->
[148,508,398,457,256,533,137,469,357,306], acting
[312,424,369,461,546,525,498,169,251,127] -> [148,508,39
8,457,256,533,137,469,357,306], acting_primary 312(0) -> 148, up_primary
312(0) -> 148, role 0 -> -1, features acting 4611087854031667199
upacting 4611087854031667199
2020-09-24 14:44:36.847 7f2e569e9700 1 osd.312 pg_epoch: 106808
pg[5.157ds0( v 106803'6043528 (103919'6040524,106803'6043528]
local-lis/les=102671/102672 n=56293 ec=85890/1818 lis/c 102671/102671
les/c/f 102672/102672/0 106808/106808/106808)
[148,508,398,457,256,533,137,469,357,306]p148(0) r=-1 lpr=106808
pi=[102671,106808)/1 crt=106803'6043528 lcod 106801'6043526 unknown
NOTIFY mbc={} ps=104] state<Start>: transitioning to Stray
2020-09-24 14:44:37.792 7f2e569e9700 1 osd.312 pg_epoch: 106809
pg[5.157ds0( v 106803'6043528 (103919'6040524,106803'6043528]
local-lis/les=102671/102672 n=56293 ec=85890/1818 lis/c 102671/102671
les/c/f 102672/102672/0 106808/106809/106809)
[148,508,398,457,256,533,137,469,357,306]/[312,424,369,461,546,525,498,169,251,127]p312(0)
r=0 lpr=106809 pi=[102671,106809)/1 crt=106803'6043528 lcod
106801'6043526 mlcod 0'0 remapped NOTIFY mbc={} ps=104]
start_peering_interval up [148,508,398,457,256,533,137,469,357,306] ->
[148,508,398,457,256,533,137,469,357,306], acting
[148,508,398,457,256,533,137,469,357,306] ->
[312,424,369,461,546,525,498,169,251,127], acting_primary 148(0) -> 312,
up_primary 148(0) -> 148, role -1 -> 0, features acting
4611087854031667199 upacting 4611087854031667199
2020-09-24 14:44:37.793 7f2e569e9700 1 osd.312 pg_epoch: 106809
pg[5.157ds0( v 106803'6043528 (103919'6040524,106803'6043528]
local-lis/les=102671/102672 n=56293 ec=85890/1818 lis/c 102671/102671
les/c/f 102672/102672/0 106808/106809/106809)
[148,508,398,457,256,533,137,469,357,306]/[312,424,369,461,546,525,498,169,251,127]p312(0)
r=0 lpr=106809 pi=[102671,106809)/1 crt=106803'6043528 lcod
106801'6043526 mlcod 0'0 remapped mbc={} ps=104] state<Start>:
transitioning to Primary
2020-09-24 14:44:38.832 7f2e569e9700 0 log_channel(cluster) log [DBG] :
5.157ds0 starting backfill to osd.137(6) from (0'0,0'0] MAX to
106803'6043528
2020-09-24 14:44:38.861 7f2e569e9700 0 log_channel(cluster) log [DBG] :
5.157ds0 starting backfill to osd.148(0) from (0'0,0'0] MAX to
106803'6043528
2020-09-24 14:44:38.879 7f2e569e9700 0 log_channel(cluster) log [DBG] :
5.157ds0 starting backfill to osd.256(4) from (0'0,0'0] MAX to
106803'6043528
2020-09-24 14:44:38.894 7f2e569e9700 0 log_channel(cluster) log [DBG] :
5.157ds0 starting backfill to osd.306(9) from (0'0,0'0] MAX to
106803'6043528
2020-09-24 14:44:38.902 7f2e569e9700 0 log_channel(cluster) log [DBG] :
5.157ds0 starting backfill to osd.357(8) from (0'0,0'0] MAX to
106803'6043528
2020-09-24 14:44:38.912 7f2e569e9700 0 log_channel(cluster) log [DBG] :
5.157ds0 starting backfill to osd.398(2) from (0'0,0'0] MAX to
106803'6043528
2020-09-24 14:44:38.923 7f2e569e9700 0 log_channel(cluster) log [DBG] :
5.157ds0 starting backfill to osd.457(3) from (0'0,0'0] MAX to
106803'6043528
2020-09-24 14:44:38.931 7f2e569e9700 0 log_channel(cluster) log [DBG] :
5.157ds0 starting backfill to osd.469(7) from (0'0,0'0] MAX to
106803'6043528
2020-09-24 14:44:38.938 7f2e569e9700 0 log_channel(cluster) log [DBG] :
5.157ds0 starting backfill to osd.508(1) from (0'0,0'0] MAX to
106803'6043528
2020-09-24 14:44:38.947 7f2e569e9700 0 log_channel(cluster) log [DBG] :
5.157ds0 starting backfill to osd.533(5) from (0'0,0'0] MAX to
106803'6043528
***************************************************
any advice appreciated,
Jake
--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
Hi all,
I'm trying to troubleshoot an interesting problem with RBD performance for VMs. fio tests run both outside and inside the VMs show that random read/write is 20-30% slower than sequential (bulk) read/write at QD=1. However, at QD=16/32/64, random read/write is sometimes 3X faster than sequential. Inside the VMs, the tests were run with -direct=1 -sync=1 using libaio; outside the VMs, with -direct=1 -sync=1 using both librbd and libaio.
The gap between random and sequential I/O narrows as QD increases towards 128, but random I/O stays 20-30% faster. Read and write tests show similar results both inside and outside the VMs.
Typically, I would expect random I/O performance to be lower (or much lower) than sequential. Any idea what I should be looking at? Thanks.
Tri Hoang
Inside VM
========
At QD=8, randread is around 2.8X read @ 64k
---------------------------------------------------------------
tri@ansible:~$ fio -name=read -ioengine=libaio -iodepth=8 -direct=1 -sync=1 -rw=randread -bs=64k -size=4G -runtime=120 --filename=test.fio
read: (g=0): rw=randread, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=8
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=762MiB/s][r=12.2k IOPS][eta 00m:00s]
read: (groupid=0, jobs=1): err= 0: pid=1551: Wed Sep 30 08:19:56 2020
read: IOPS=11.9k, BW=743MiB/s (779MB/s)(4096MiB/5515msec)
tri@ansible:~$ fio -name=read -ioengine=libaio -iodepth=8 -direct=1 -sync=1 -rw=read -bs=64k -size=4G -runtime=120 --filename=test.fio
read: (g=0): rw=read, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=8
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=268MiB/s][r=4289 IOPS][eta 00m:00s]
read: (groupid=0, jobs=1): err= 0: pid=1554: Wed Sep 30 08:22:03 2020
read: IOPS=4374, BW=273MiB/s (287MB/s)(4096MiB/14981msec)
At QD=128, randread is around 1.4X read
---------------------------------------------------------
tri@ansible:~$ fio -name=read -ioengine=libaio -iodepth=128 -direct=1 -sync=1 -rw=randread -bs=64k -size=4G -runtime=120 --filename=test.fio
read: (g=0): rw=randread, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=128
fio-3.12
Starting 1 process
Jobs: 1 (f=1)
read: (groupid=0, jobs=1): err= 0: pid=1548: Wed Sep 30 08:18:59 2020
read: IOPS=23.1k, BW=1441MiB/s (1511MB/s)(4096MiB/2843msec)
tri@ansible:~$ fio -name=read -ioengine=libaio -iodepth=128 -direct=1 -sync=1 -rw=read -bs=64k -size=4G -runtime=120 --filename=test.fio
read: (g=0): rw=read, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=128
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=974MiB/s][r=15.6k IOPS][eta 00m:00s]
read: (groupid=0, jobs=1): err= 0: pid=1545: Wed Sep 30 08:17:38 2020
read: IOPS=15.9k, BW=997MiB/s (1045MB/s)(4096MiB/4110msec)
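For completeness, one way to take the guest and virtio layers out of the picture would be to point fio's rbd engine straight at the image (assuming fio is built with rbd support); a sketch, where the pool, image and client names are placeholders for whatever the VM disk actually uses:
fio -name=read -ioengine=rbd -clientname=admin -pool=rbd -rbdname=testimage -iodepth=8 -direct=1 -rw=randread -bs=64k -runtime=120
One possible explanation for sequential being slower at high iodepth is that consecutive 64k reads land in the same (default 4MiB) RBD object, and therefore on the same PG/primary OSD, while random reads scatter across objects and OSDs and parallelize better; comparing "rbd info <pool>/<image>" (object size, striping) might help confirm that.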
I have no idea why ceph-volume keeps failing so often. I keep zapping and
re-creating, and then all of a sudden it works. There are no leftover PVs or
links in /dev/mapper; I am checking that with lsblk, dmsetup ls --tree and
ceph-volume inventory.
This is the stdout/stderr I am getting; every time ceph-volume fails, it
ends with the same stderr output.
stdout: Physical volume "/dev/sdh" successfully created.
stdout: Volume group "ceph-0fbb2736-5cb1-4f87-aef2-7591fe979360"
successfully created
stdout: Logical volume "osd-block-3ff3e59c-e752-4560-91b8-b53f38db5c85"
created.
stderr: got monmap epoch 20
stdout: creating /var/lib/ceph/osd/ceph-21/keyring
stdout: creating /var/lib/ceph/osd/ceph-21/lockbox.keyring
stderr: Device i7jC8B-0Z5c-z95F-Cewj-3Jz2-60If-jucHBi already exists.
stderr: failed to read label for
/dev/mapper/i7jC8B-0Z5c-z95F-Cewj-3Jz2-60If-jucHBi: (2) No such file or
directory
stderr: purged osd.21
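The "already exists" plus "failed to read label" pair makes me wonder
whether a stale device-mapper/crypt mapping briefly survives from the
previous attempt. When it fails, checking for and removing a leftover
mapping before the next attempt might be worth a try (the name is taken
from the error above; adjust to whatever dmsetup actually lists):
dmsetup ls | grep i7jC8B
dmsetup remove i7jC8B-0Z5c-z95F-Cewj-3Jz2-60If-jucHBi
ceph-volume lvm zap --destroy /dev/sdh
(If the mapping is a crypt device, cryptsetup close with the same name
should work as well.)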