March 2021 - ceph-users - lists.ceph.io

by Szabo, Istvan (Agoda)

Hi, I've heard many time that to install multiple rados-gateway on the same server is possible, just need to create on a different port. However I've never managed to make it work. Today I gave another try like this: 1. Created a new keyring: ceph auth get-or-create client.rgw.servername.rgw1 mon 'allow rw' osd 'allow rwx' 2. Created the keyring file: /var/lib/ceph/radosgw/ceph-rgw.servername.rgw1/keyring 3. Added another entry in the ceph octopus configuration with different port: [client.rgw.servername.rgw1] host = servername keyring = /var/lib/ceph/radosgw/ceph-rgw.servername.rgw1/keyring log file = /var/log/ceph/ceph-rgw-servername.rgw1.log rgw frontends = beast endpoint=10.104.198.101:8081 rgw thread pool size = 512 rgw_zone=zone 1. Copied another RGW system file in centos 8: cp -pr /etc/systemd/system/ceph-radosgw.target.wants/ceph-radosgw\(a)rgw.servername.rgw0.service /etc/systemd/system/ceph-radosgw.target.wants/ceph-radosgw\(a)rgw.servername.rgw1.service 2. Restarted ceph.target. 3. Result is the same number of rados-gateways. So how is this actually should have been done? Thank you ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

3 years, 1 month

1
0
0 0

Need Clarification on Maintenance Shutdown Procedure

by Dave Hall

Hello, I've had a look at the instructions for clean shutdown given at https://ceph.io/planet/how-to-do-a-ceph-cluster-maintenance-shutdown/, but I'm not clear about some things on the steps about shutting down the various Ceph components. For my current 3-node cluster I have MONs, MDSs, MGRs, and OSDs all running on the same nodes. Also, this is a non-container installation. Since I don't have separate dedicated nodes, as described in the referenced web page, I think the instructions mean that I need to issue SystemD commands to stop the corresponding services/targets on each node for the Ceph components mentioned in each step. Since we want to bring services up in the right order, I should also use SystemD commands to disable these services/targets so they don't automatically restart when I power the nodes back on. After power-on, I would then re-enable and manually start services/targets in the order described. One other specific question: For step 4 it says to shut down my service nodes. Does this mean my MDSs? (I'm not running any Object Gateways or NFS, but I think these would go in this step as well?) Please let me know if I've got this right. The cluster contains 200TB of a researcher's data that has taken a year to collect, so caution is needed. Thanks. -Dave -- Dave Hall Binghamton University kdhall(a)binghamton.edu

3 years, 1 month

3
3
0 0

Remapped PGs

by David Orman

Hi, We see that we have 5 'remapped' PGs, but are unclear why/what to do about it. We shifted some target ratios for the autobalancer and it resulted in this state. When adjusting ratio, we noticed two OSDs go down, but we just restarted the container for those OSDs with podman, and they came back up. Here's status output: ################### root@ceph01:~# ceph status INFO:cephadm:Inferring fsid x INFO:cephadm:Inferring config x INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15 cluster: id: 41bb9256-c3bf-11ea-85b9-9e07b0435492 health: HEALTH_OK services: mon: 5 daemons, quorum ceph01,ceph04,ceph02,ceph03,ceph05 (age 2w) mgr: ceph03.ytkuyr(active, since 2w), standbys: ceph01.aqkgbl, ceph02.gcglcg, ceph04.smbdew, ceph05.yropto osd: 168 osds: 168 up (since 2d), 168 in (since 2d); 5 remapped pgs data: pools: 3 pools, 1057 pgs objects: 18.00M objects, 69 TiB usage: 119 TiB used, 2.0 PiB / 2.1 PiB avail pgs: 1056 active+clean 1 active+clean+scrubbing+deep io: client: 859 KiB/s rd, 212 MiB/s wr, 644 op/s rd, 391 op/s wr root@ceph01:~# ################### When I look at ceph pg dump, I don't see any marked as remapped: ################### root@ceph01:~# ceph pg dump |grep remapped INFO:cephadm:Inferring fsid x INFO:cephadm:Inferring config x INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15 dumped all root@ceph01:~# ################### Any idea what might be going on/how to recover? All OSDs are up. Health is 'OK'. This is Ceph 15.2.4 deployed using Cephadm in containers, on Podman 2.0.3.

3 years, 1 month

2
4
0 0

MDS is reporting damaged metadata damage- followup

by ricardo.re.azevedo＠gmail.com

Hi all, Following up on a previous issue. My cephfs MDS is reporting damaged metadata following the addition (and remapping) of 12 new OSDs. `ceph tell mds.database-0 damage ls` reports ~85 files damaged. All of type "backtrace". ` ceph tell mds.database-0 scrub start / recursive repair` seems to have no effect on the damage. ` ceph tell mds.database-0 scrub start / recursive repair force` also has no effect. I understand this seems to be an issue with mapping the file to a filesystem path. Is there anything I can do to recover these files? Any manual methods? > ceph status reports: cluster: id: 692905c0-f271-4cd8-9e43-1c32ef8abd13 health: HEALTH_ERR 1 MDSs report damaged metadata 300 pgs not deep-scrubbed in time 300 pgs not scrubbed in time services: mon: 3 daemons, quorum database-0,file-server,webhost (age 37m) mgr: webhost(active, since 3d), standbys: file-server, database-0 mds: cephfs:1 {0=database-0=up:active} 2 up:standby osd: 48 osds: 48 up (since 56m), 48 in (since 13d); 10 remapped pgs task status: scrub status: mds.database-0: idle data: pools: 7 pools, 633 pgs objects: 60.82M objects, 231 TiB usage: 336 TiB used, 246 TiB / 582 TiB avail pgs: 623 active+clean 6 active+remapped+backfilling 4 active+remapped+backfill_wait Thanks for the help. Best, Ricardo

3 years, 1 month

1
0
0 0

Ceph Object Gateway setup/tutorial

by Rok Jaklič

Hi, installation of cluster/osds went "by the book" https://docs.ceph.com/, but now I want to setup Ceph Object Gateway, but documentation on https://docs.ceph.com/en/latest/radosgw/ seems to lack information about what and where to restart for example when setting [client.rgw.gateway-node1] in /etc/ceph/ceph.conf. Also where should we set this? In cephadm shell or on the host ...? Is there some tutorial howto setup gateway from the beginning? Kind regards, Rok

3 years, 1 month

1
0
0 0

reboot breaks OSDs converted from ceph-disk to ceph-volume simple

by Frank Schilder

Dear all, ceph version: mimic 13.2.10 I'm facing a serious bug with devices converted from "ceph-disk" to "ceph-volume simple". I "converted" all ceph-disk devices using "ceph-volume simple scan ..." And everything worked fine at the beginning. Today I needed to reboot an OSD host and since then most ceph-disk OSDs are screwed up. Apparently, "ceph-volume simple scan ..." creates symlinks to the block partition /dev/sd?2 using the "/dev/sd?2" name for the link target. These names are not stable and are expected to change after every reboot. Now I have a bunch of OSDs with new /dev/sd?2" names that won't boot any more, because this link points to the wrong block partition. Doing another "ceph-volume simple scan ..." doesn't help, it just "rediscovers" the wrong location. Here is what a broken OSD looks like (fresh "ceph-volume simple scan --stdout ..." output): { "active": "ok", "block": { "path": "/dev/sda2", "uuid": "b5ac1462-510a-4483-8f42-604e6adc5c9d" }, "block_uuid": "1d9d89a2-18c7-4610-9dcd-167d44ce1879", "bluefs": 1, "ceph_fsid": "e4ece518-f2cb-4708-b00f-b6bf511e91d9", "cluster_name": "ceph", "data": { "path": "/dev/sdb1", "uuid": "c35a7efb-8c1c-42a1-8027-cf422d7e7ecb" }, "fsid": "c35a7efb-8c1c-42a1-8027-cf422d7e7ecb", "keyring": "AQAZJ6ddedALDxAAJI7NLJ2CRFoQWK5STRpHuw==", "kv_backend": "rocksdb", "magic": "ceph osd volume v026", "mkfs_done": "yes", "none": "", "ready": "ready", "require_osd_release": "", "type": "bluestore", "whoami": 241 } OSD 241's data partition looks like this (after mount /dev/sdb1 /var/lib/ceph/osd/ceph-241): [root@ceph-adm:ceph-18 ceph-241]# ls -l /var/lib/ceph/osd/ceph-241 total 56 -rw-r--r--. 1 root root 411 Oct 16 2019 activate.monmap -rw-r--r--. 1 ceph ceph 3 Oct 16 2019 active lrwxrwxrwx. 1 root root 9 Mar 2 14:19 block -> /dev/sda2 -rw-r--r--. 1 ceph ceph 37 Oct 16 2019 block_uuid -rw-r--r--. 1 ceph disk 2 Oct 16 2019 bluefs -rw-r--r--. 1 ceph ceph 37 Oct 16 2019 ceph_fsid -rw-r--r--. 1 ceph ceph 37 Oct 16 2019 fsid -rw-------. 1 ceph ceph 58 Oct 16 2019 keyring -rw-r--r--. 1 ceph disk 8 Oct 16 2019 kv_backend -rw-r--r--. 1 ceph ceph 21 Oct 16 2019 magic -rw-r--r--. 1 ceph disk 4 Oct 16 2019 mkfs_done -rw-r--r--. 1 ceph ceph 0 Nov 23 14:58 none -rw-r--r--. 1 ceph disk 6 Oct 16 2019 ready -rw-r--r--. 1 ceph disk 2 Jan 31 2020 require_osd_release -rw-r--r--. 1 ceph ceph 10 Oct 16 2019 type -rw-r--r--. 1 ceph ceph 4 Oct 16 2019 whoami The symlink "block -> /dev/sda2" goes to the wrong disk. How can I fix that in a stable way? Also, why are not stable "/dev/disk/by-uuid/..." link targets created instead? Can I change that myself? Best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

3 years, 1 month

1
1
0 0

RBD clone on Bluestore

by Pawel S

hello! I'm trying to understand how Bluestore cooperates with RBD image clones, so my test is simple 1. create an image (2G) and fill with data 2. create a snapshot 3. protect it 4. create a clone of the image 5. write a small portion of data (4K) to clone 6. check how it changed and if just 4K are used to prove CoW allocated new extent instead of copying out snapped data. Unfortunately it occurs that at least rbd du reports that 4M was changed and the clone consumes 4M of data instead of expected 4K... ''' rbd du rbd/clone1 NAME PROVISIONED USED clone1 2 GiB 4 MiB ''' How can I trace/prove Bluestore CoW really works in this case, and prevent copying the rest of the 4M stripe like Filestore did ? p.s tested on Luminous/Octopus, ssd devices, min_alloc_size: 16k, block_size: 4k best regards! -- Pawel S.

3 years, 1 month

3
4
0 0

Ceph iscsi help

by Várkonyi János

Hi All, I2d like to install a Ceph Nautilus on Ubuntu 18.04 LTS and give the storage to 2 windows server via ISCSI. I choose the Nautilus because of the deploy function I don't want to another VM to cephadm. So I can isntall the ceph and it is working properly but can't setup the icsi gateway. The services running like tcmu-runner, ebd-target-gw and rbd-target-api. I can going into gwcli but I can't create the first gw I get this msessage: /iscsi-target...-igw/gateways> create cf01 192.168.203.51 skipchecks=true OS version/package checks have been bypassed Get gateway hostname failed : 403 Forbidden Please check api_host setting and make sure host cf01 IP is listening on port 5000 In the syslog at the same time: Mar 1 15:43:02 cf01 there is no tcmu-runner data avaliable Mar 1 15:43:06 cf01 ::ffff:127.0.0.1 - - [01/Mar/2021 15:43:06] "GET /api/config HTTP/1.1" 200 - I can see the python listening on port 5000 (mybe this is my problem) netstat -tulpn | grep 5000 tcp6 0 0 :::5000 :::* LISTEN 1976/python I cannot find anything about this error and I can't figure out what is solution. Ubuntu 18.04.5 LTS 4.15.0-136-generic I also tried with 4.20.0-042000-generic but the erorr was the same. jansz0

3 years, 1 month

3
2
0 0

15.2.8 mgr keep crashing every few days

by levin ng

Hi all, I’d recently deployed ceph 15.2.8 with 3(mon,mgr,rgw,mds) and 4 (osd) total 7 host, however I encountered mgr crash a few times a week, the crashing mgr can be any one of 3. I couldn’t identify the problem behind and here is the crash info, appreciate anyone if you have suggestions that I could narrow it down. Thank you very much. { "assert_condition": "ret == 0", "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/common/Thread.cc", "assert_func": "void Thread::create(const char*, size_t)", "assert_line": 157, "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/common/Thread.cc: In function 'void Thread::create(const char*, size_t)' thread 7f833addc700 time 2021-02-10T20:00:32.980508+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.8/rpm/el8/BUILD/ceph-15.2.8/src/common/Thread.cc: 157: FAILED ceph_assert(ret == 0)\n", "assert_thread_name": "mgr-fin", "backtrace": [ "(()+0x12b20) [0x7f835a51cb20]", "(gsignal()+0x10f) [0x7f8358f6d7ff]", "(abort()+0x127) [0x7f8358f57c35]", "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f835c07b735]", "(()+0x27a8fe) [0x7f835c07b8fe]", "(()+0x34cef6) [0x7f835c14def6]", "(DispatchQueue::start()+0x3a) [0x7f835c29697a]", "(AsyncMessenger::ready()+0xcd) [0x7f835c3340cd]", "(Messenger::add_dispatcher_head(Dispatcher*)+0x68) [0x7f835c3f8478]", "(MonClient::get_monmap_and_config()+0xbb) [0x7f835c3f66ab]", "(ceph_mount_info::init()+0x4d) [0x7f834298435d]", "(()+0x3680f) [0x7f8342cd280f]", "(()+0x19d421) [0x7f835ba5c421]", "(_PyEval_EvalFrameDefault()+0x498) [0x7f835ba5ce08]", "(()+0x179c78) [0x7f835ba38c78]", "(()+0x19d1c7) [0x7f835ba5c1c7]", "(_PyEval_EvalFrameDefault()+0x498) [0x7f835ba5ce08]", "(()+0x179c78) [0x7f835ba38c78]", "(()+0x19d1c7) [0x7f835ba5c1c7]", "(_PyEval_EvalFrameDefault()+0x498) [0x7f835ba5ce08]", "(()+0x1221d4) [0x7f835b9e11d4]", "(()+0x122c55) [0x7f835b9e1c55]", "(()+0x19cf27) [0x7f835ba5bf27]", "(_PyEval_EvalFrameDefault()+0x498) [0x7f835ba5ce08]", "(_PyFunction_FastCallDict()+0x122) [0x7f835b9b9ec2]", "(_PyObject_FastCallDict()+0x70e) [0x7f835b9bac9e]", "(()+0x10dc70) [0x7f835b9ccc70]", "(_PyObject_FastCallDict()+0x6ec) [0x7f835b9bac7c]", "(PyObject_CallFunctionObjArgs()+0xe8) [0x7f835b9dbd48]", "(_PyEval_EvalFrameDefault()+0x2588) [0x7f835ba5eef8]", "(()+0xf99b4) [0x7f835b9b89b4]", "(()+0x179e60) [0x7f835ba38e60]", "(()+0x19d1c7) [0x7f835ba5c1c7]", "(_PyEval_EvalFrameDefault()+0x10d5) [0x7f835ba5da45]", "(()+0x179c78) [0x7f835ba38c78]", "(()+0x19d1c7) [0x7f835ba5c1c7]", "(_PyEval_EvalFrameDefault()+0x498) [0x7f835ba5ce08]", "(()+0xfa326) [0x7f835b9b9326]", "(()+0x179e60) [0x7f835ba38e60]", "(()+0x19d1c7) [0x7f835ba5c1c7]", "(_PyEval_EvalFrameDefault()+0x498) [0x7f835ba5ce08]", "(()+0x179c78) [0x7f835ba38c78]", "(()+0x19d1c7) [0x7f835ba5c1c7]", "(_PyEval_EvalFrameDefault()+0x498) [0x7f835ba5ce08]", "(_PyFunction_FastCallDict()+0x122) [0x7f835b9b9ec2]", "(_PyObject_FastCallDict()+0x70e) [0x7f835b9bac9e]", "(()+0x10dc70) [0x7f835b9ccc70]", "(PyObject_Call()+0x4b) [0x7f835b9c1acb]", "(PyObject_CallMethod()+0x10b) [0x7f835ba5ac6b]", "(ActivePyModule::handle_command(ModuleCommand const&, MgrSession const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&, ceph::buffer::v15_2_0::list const&, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >*)+0x222) [0x55bc0b8a0cb2]", "(()+0x1b0fdd) [0x55bc0b8f5fdd]", "(Context::complete(int)+0xd) [0x55bc0b8b0bdd]", "(Finisher::finisher_thread_entry()+0x1a5) [0x7f835c10b465]", "(()+0x814a) [0x7f835a51214a]", "(clone()+0x43) [0x7f8359032f23]" ], "ceph_version": "15.2.8", "crash_id": "2021-02-10T20:00:32.989661Z_201fd5fb-6e0a-4b50-8a95-fdf9ed9aeb81", "entity_name": "mgr.sds01-cp.cwcxek", "os_id": "centos", "os_name": "CentOS Linux", "os_version": "8", "os_version_id": "8", "process_name": "ceph-mgr", "stack_sig": "e1c15d685283e7598b128a37a328ba86ec433dfef97597ac9453b5d52608feda", "timestamp": "2021-02-10T20:00:32.989661Z", "utsname_hostname": "sds01-cp", "utsname_machine": "x86_64", "utsname_release": "4.18.0-240.10.1.el8_3.x86_64", "utsname_sysname": "Linux", "utsname_version": "#1 SMP Wed Dec 16 03:30:52 EST 2020" }

3 years, 1 month

4
5
0 0

'ceph df' %USED explanation

by Mark Johnson

I'm in the middle of increasing PG count for one of our pools by making small increments, waiting for the process to complete, rinse and repeat. I'm doing it this way so I can control when all this activity is happening and keeping it away from the busier production traffic times. I'm expecting some inbalance as PGs get created on already unbalanced OSDs, however our monitoring picked up something today that I'm not really understanding. Our total utilization is just over 50% and about 96% of our total data is in this one pool. Due to there not being enough PGs, the amount of data in each is quite large and since they aren't evenly spread across the OSDs, there's a bit of inbalance. That's all cool and to be expected, which is the reason for increasing the PG count in the first place. However, as some PGs are splitting, the new PGs are sometimes being created on OSDs that already have a disproportionate amount of data. Again, not totally unexpected. Our monitoring detected the usage of this pool to be >85% today as I neared the end of another increase in PG count. What I'm not understanding is how this value is determined. I've read other posts and the calculations suggested don't give a result that equals what shows in my %USED column. I'm suspecting that it's somehow related to the MAX AVAIL value (which I believe is somewhat indirectly related to the amount available based on the individual OSD utilization), but none of the posts I read mention this in their calculations and I've been unable to create a formula with any of the values I have to end up with the &USED value I have. For the record, my current total utilization based on a 'ceph osd df' looks like this: TOTAL 39507G(SIZE) 19931G(USE) 17568G(AVAIL) 50.45(%USE) My most utilised OSD (currently in the process of moving some data off this OSD) is 81.58% used with 188G available and a variance of 1.62. A cut-down output of 'ceph df' looks like this: GLOBAL: SIZE AVAIL RAW USED %RAW USED 39507G 17569G 19930G 50.45 POOLS: NAME ID USED %USED MAX AVAIL OBJECTS default.rgw.buckets.data 30 9552G 86.05 1548G 36285066 I suspect that as I get the utilization of my over-utilized OSDs down, this %USED value will drop. But, I'd just love to fully understand how this value is calculated. Thanks, Mark J

3 years, 1 month

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users March 2021