Hi,
We see that we have 5 'remapped' PGs, but are unclear why, or what to do about
it. We shifted some target ratios for the autobalancer, and that resulted in
this state. While adjusting the ratios we noticed two OSDs go down, but we just
restarted the containers for those OSDs with podman and they came back up.
Here's status output:
###################
root@ceph01:~# ceph status
INFO:cephadm:Inferring fsid x
INFO:cephadm:Inferring config x
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
cluster:
id: 41bb9256-c3bf-11ea-85b9-9e07b0435492
health: HEALTH_OK
services:
mon: 5 daemons, quorum ceph01,ceph04,ceph02,ceph03,ceph05 (age 2w)
mgr: ceph03.ytkuyr(active, since 2w), standbys: ceph01.aqkgbl,
ceph02.gcglcg, ceph04.smbdew, ceph05.yropto
osd: 168 osds: 168 up (since 2d), 168 in (since 2d); 5 remapped pgs
data:
pools: 3 pools, 1057 pgs
objects: 18.00M objects, 69 TiB
usage: 119 TiB used, 2.0 PiB / 2.1 PiB avail
pgs: 1056 active+clean
1 active+clean+scrubbing+deep
io:
client: 859 KiB/s rd, 212 MiB/s wr, 644 op/s rd, 391 op/s wr
root@ceph01:~#
###################
When I look at ceph pg dump, I don't see any marked as remapped:
###################
root@ceph01:~# ceph pg dump |grep remapped
INFO:cephadm:Inferring fsid x
INFO:cephadm:Inferring config x
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
dumped all
root@ceph01:~#
###################
Any idea what might be going on/how to recover? All OSDs are up. Health is
'OK'. This is Ceph 15.2.4 deployed using Cephadm in containers, on Podman
2.0.3.
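In case it helps anyone answering: my working guess is that the leftover count
comes from pg_upmap entries the balancer created for the ratio change, but that
is only a guess. This is what I was planning to check next (plain Ceph CLI):
###################
# list only the PGs the cluster currently counts as remapped
ceph pg ls remapped

# see whether the balancer left pg_upmap_items entries behind
ceph osd dump | grep pg_upmap

# current balancer state, for reference
ceph balancer status
###################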
Hi all,
Following up on a previous issue.
My cephfs MDS is reporting damaged metadata following the addition (and
remapping) of 12 new OSDs.
`ceph tell mds.database-0 damage ls` reports ~85 damaged files, all of type
"backtrace".
`ceph tell mds.database-0 scrub start / recursive repair` seems to have no
effect on the damage.
`ceph tell mds.database-0 scrub start / recursive repair force` also has no
effect.
I understand this seems to be an issue with mapping the file to a filesystem
path. Is there anything I can do to recover these files? Any manual methods?
> ceph status reports:
cluster:
id: 692905c0-f271-4cd8-9e43-1c32ef8abd13
health: HEALTH_ERR
1 MDSs report damaged metadata
300 pgs not deep-scrubbed in time
300 pgs not scrubbed in time
services:
mon: 3 daemons, quorum database-0,file-server,webhost (age 37m)
mgr: webhost(active, since 3d), standbys: file-server, database-0
mds: cephfs:1 {0=database-0=up:active} 2 up:standby
osd: 48 osds: 48 up (since 56m), 48 in (since 13d); 10 remapped pgs
task status:
scrub status:
mds.database-0: idle
data:
pools: 7 pools, 633 pgs
objects: 60.82M objects, 231 TiB
usage: 336 TiB used, 246 TiB / 582 TiB avail
pgs: 623 active+clean
6 active+remapped+backfilling
4 active+remapped+backfill_wait
Thanks for the help.
Best,
Ricardo
Hi,
the installation of the cluster/OSDs went "by the book" (https://docs.ceph.com/),
but now I want to set up the Ceph Object Gateway, and the documentation at
https://docs.ceph.com/en/latest/radosgw/ seems to lack information about what
to restart, and where, when setting [client.rgw.gateway-node1]
in /etc/ceph/ceph.conf, for example. Also, where should we set this? In the
cephadm shell or on the host ...?
Is there a tutorial on how to set up the gateway from the beginning?
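For context, this is roughly what I pieced together from the cephadm and
radosgw pages; the realm/zone names are just examples of mine, and I am not
sure whether the config section should be client.rgw.gateway-node1 or the
cephadm-generated daemon name, so please correct me:

# deploy an RGW daemon with cephadm (Octopus syntax from the docs)
ceph orch apply rgw myrealm myzone --placement="gateway-node1"

# set RGW options centrally instead of editing /etc/ceph/ceph.conf on the host
ceph config set client.rgw.gateway-node1 rgw_frontends "beast port=8080"

# restart the service so the options take effect
ceph orch restart rgw.myrealm.myzone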
Kind regards,
Rok
Dear all,
ceph version: mimic 13.2.10
I'm facing a serious bug with devices converted from "ceph-disk" to "ceph-volume simple". I "converted" all ceph-disk devices using "ceph-volume simple scan ...", and everything worked fine at the beginning. Today I needed to reboot an OSD host, and since then most ceph-disk OSDs are screwed up.
Apparently, "ceph-volume simple scan ..." creates symlinks to the block partition /dev/sd?2 using the "/dev/sd?2" name as the link target. These names are not stable and may change after every reboot. Now I have a bunch of OSDs with new "/dev/sd?2" names that won't start any more, because the link points to the wrong block partition. Doing another "ceph-volume simple scan ..." doesn't help, it just "rediscovers" the wrong location. Here is what a broken OSD looks like (fresh "ceph-volume simple scan --stdout ..." output):
{
    "active": "ok",
    "block": {
        "path": "/dev/sda2",
        "uuid": "b5ac1462-510a-4483-8f42-604e6adc5c9d"
    },
    "block_uuid": "1d9d89a2-18c7-4610-9dcd-167d44ce1879",
    "bluefs": 1,
    "ceph_fsid": "e4ece518-f2cb-4708-b00f-b6bf511e91d9",
    "cluster_name": "ceph",
    "data": {
        "path": "/dev/sdb1",
        "uuid": "c35a7efb-8c1c-42a1-8027-cf422d7e7ecb"
    },
    "fsid": "c35a7efb-8c1c-42a1-8027-cf422d7e7ecb",
    "keyring": "AQAZJ6ddedALDxAAJI7NLJ2CRFoQWK5STRpHuw==",
    "kv_backend": "rocksdb",
    "magic": "ceph osd volume v026",
    "mkfs_done": "yes",
    "none": "",
    "ready": "ready",
    "require_osd_release": "",
    "type": "bluestore",
    "whoami": 241
}
OSD 241's data partition looks like this (after mount /dev/sdb1 /var/lib/ceph/osd/ceph-241):
[root@ceph-adm:ceph-18 ceph-241]# ls -l /var/lib/ceph/osd/ceph-241
total 56
-rw-r--r--. 1 root root 411 Oct 16 2019 activate.monmap
-rw-r--r--. 1 ceph ceph 3 Oct 16 2019 active
lrwxrwxrwx. 1 root root 9 Mar 2 14:19 block -> /dev/sda2
-rw-r--r--. 1 ceph ceph 37 Oct 16 2019 block_uuid
-rw-r--r--. 1 ceph disk 2 Oct 16 2019 bluefs
-rw-r--r--. 1 ceph ceph 37 Oct 16 2019 ceph_fsid
-rw-r--r--. 1 ceph ceph 37 Oct 16 2019 fsid
-rw-------. 1 ceph ceph 58 Oct 16 2019 keyring
-rw-r--r--. 1 ceph disk 8 Oct 16 2019 kv_backend
-rw-r--r--. 1 ceph ceph 21 Oct 16 2019 magic
-rw-r--r--. 1 ceph disk 4 Oct 16 2019 mkfs_done
-rw-r--r--. 1 ceph ceph 0 Nov 23 14:58 none
-rw-r--r--. 1 ceph disk 6 Oct 16 2019 ready
-rw-r--r--. 1 ceph disk 2 Jan 31 2020 require_osd_release
-rw-r--r--. 1 ceph ceph 10 Oct 16 2019 type
-rw-r--r--. 1 ceph ceph 4 Oct 16 2019 whoami
The symlink "block -> /dev/sda2" goes to the wrong disk. How can I fix that in a stable way? Also, why are not stable "/dev/disk/by-uuid/..." link targets created instead? Can I change that myself?
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hello!
I'm trying to understand how Bluestore cooperates with RBD image clones, so
my test is simple:
1. create an image (2G) and fill it with data
2. create a snapshot
3. protect it
4. create a clone of the image
5. write a small portion of data (4K) to the clone
6. check how much changed and whether just 4K is used, to prove CoW allocated
a new extent instead of copying out the snapped data
Unfortunately it turns out that at least rbd du reports that 4M was changed,
and the clone consumes 4M of data instead of the expected 4K...
'''
rbd du rbd/clone1
NAME PROVISIONED USED
clone1 2 GiB 4 MiB
'''
How can I trace/prove that Bluestore CoW really works in this case, and prevent
copying the rest of the 4M stripe like Filestore did?
P.S. Tested on Luminous/Octopus, SSD devices, min_alloc_size: 16k,
block_size: 4k.
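What I was planning to look at next to trace this, in case someone can confirm
it is the right approach (pool/image names are from my test, the object name is
a placeholder):
'''
# object size and object name prefix of the clone
rbd info rbd/clone1

# which objects the clone itself owns (only written extents should have one)
rados -p rbd ls | grep <block_name_prefix_of_clone1>

# logical size of that single object
rados -p rbd stat <object_name>

# raw space used in the pool before/after the 4K write
ceph df detail
'''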
best regards!
--
Pawel S.
Hi All,
I'd like to install Ceph Nautilus on Ubuntu 18.04 LTS and present the storage to 2 Windows servers via iSCSI. I chose Nautilus because of the ceph-deploy function; I don't want another VM for cephadm. I can install Ceph and it works properly, but I can't set up the iSCSI gateway. The services are running (tcmu-runner, rbd-target-gw and rbd-target-api). I can get into gwcli, but I can't create the first gateway; I get this message:
/iscsi-target...-igw/gateways> create cf01 192.168.203.51 skipchecks=true
OS version/package checks have been bypassed
Get gateway hostname failed : 403 Forbidden
Please check api_host setting and make sure host cf01 IP is listening on port 5000
In the syslog at the same time:
Mar 1 15:43:02 cf01 there is no tcmu-runner data avaliable
Mar 1 15:43:06 cf01 ::ffff:127.0.0.1 - - [01/Mar/2021 15:43:06] "GET /api/config HTTP/1.1" 200 -
I can see Python listening on port 5000 (maybe this is my problem):
netstat -tulpn | grep 5000
tcp6 0 0 :::5000 :::* LISTEN 1976/python
I cannot find anything about this error and I can't figure out what the solution is.
Ubuntu 18.04.5 LTS
4.15.0-136-generic
I also tried with 4.20.0-042000-generic but the error was the same.
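In case it matters, the /etc/ceph/iscsi-gateway.cfg format I am working from is
the stock example in the docs, with my gateway's IP substituted into
trusted_ip_list (so the IP below is my assumption, the rest is copied from the
documentation); I restart rbd-target-api (systemctl restart rbd-target-api)
after every change:

[config]
cluster_name = ceph
gateway_keyring = ceph.client.admin.keyring
api_secure = false
trusted_ip_list = 192.168.203.51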
jansz0
I'm in the middle of increasing the PG count for one of our pools by making small increments, waiting for the process to complete, rinse and repeat. I'm doing it this way so I can control when all this activity happens and keep it away from the busier production traffic times.
I'm expecting some imbalance as PGs get created on already unbalanced OSDs, however our monitoring picked up something today that I'm not really understanding. Our total utilization is just over 50% and about 96% of our total data is in this one pool. Because there aren't enough PGs, the amount of data in each is quite large, and since they aren't evenly spread across the OSDs there's a bit of imbalance. That's all cool and to be expected, which is the reason for increasing the PG count in the first place.
However, as some PGs are splitting, the new PGs are sometimes being created on OSDs that already have a disproportionate amount of data. Again, not totally unexpected. Our monitoring detected the usage of this pool to be >85% today as I neared the end of another increase in PG count. What I'm not understanding is how this value is determined. I've read other posts and the calculations suggested don't give a result that equals what shows in my %USED column. I suspect it's somehow related to the MAX AVAIL value (which I believe is somewhat indirectly related to the amount available based on individual OSD utilization), but none of the posts I read mention this in their calculations, and I've been unable to come up with a formula from any of the values I have that ends up with the %USED value I'm seeing.
For the record, my current total utilization based on a 'ceph osd df' looks like this:
TOTAL 39507G(SIZE) 19931G(USE) 17568G(AVAIL) 50.45(%USE)
My most utilised OSD (currently in the process of moving some data off this OSD) is 81.58% used with 188G available and a variance of 1.62.
A cut-down output of 'ceph df' looks like this:
GLOBAL:
    SIZE     AVAIL    RAW USED   %RAW USED
    39507G   17569G   19930G     50.45
POOLS:
    NAME                       ID   USED    %USED   MAX AVAIL   OBJECTS
    default.rgw.buckets.data   30   9552G   86.05   1548G       36285066
I suspect that as I get the utilization of my over-utilized OSDs down, this %USED value will drop. But I'd just love to fully understand how this value is calculated.
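For what it's worth, the only formula I've found that reproduces my number is
USED / (USED + MAX AVAIL), which does line up with the ceph df output above:

9552G / (9552G + 1548G) = 9552 / 11100 = 0.8605  ->  86.05 %USED

If that is right, it would also explain the connection to MAX AVAIL, since (as
I understand it) MAX AVAIL is projected from the fullest OSD under the pool's
CRUSH rule rather than from the raw totals, so getting my most-utilised OSD
down should raise MAX AVAIL and lower %USED.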
Thanks,
Mark J