Is the public_network definition in ceph.conf just used to determine which
interface or IP to use for the public network, or does it need to
encompass the public IP addresses of all cluster nodes?
Specifically, can the public_network be defined differently for different
OSD nodes as long as they have layer 3 connectivity?
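For illustration, a minimal sketch of what I mean (the subnets are
hypothetical):

    # ceph.conf on OSD node A
    [global]
    public_network = 192.168.1.0/24

    # ceph.conf on OSD node B, reachable from node A via layer 3 routing
    [global]
    public_network = 192.168.2.0/24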
thx
Frank
Hi Jason, dongsheng
I found a problem using rbd_open_by_id when the connection times out (errno = 110, ceph version 12.2.8; there is no change to rbd_open_by_id in the master branch).
int r = ictx->state->open(false);
if (r < 0) {   // r = -110 (ETIMEDOUT)
  delete ictx; // crash; the stack is shown below
} else {
  *image = (rbd_image_t)ictx;
}
The stack is:
(gdb) bt
Program terminated with signal 11, Segmentation fault.
#0 Mutex::Lock (this=this@entry=0x8, no_lockdep=no_lockdep@entry=false)
at /var/lib/jenkins/workspace/ceph_L/ceph/src/common/Mutex.cc:92
#1 0x0000ffff765eb814 in Locker (m=..., this=<synthetic pointer>)
at /var/lib/jenkins/workspace/ceph_L/ceph/src/common/Mutex.h:115
#2 PerfCountersCollection::remove (this=0x0, l=0xffff85bf0588 <main_arena+88>)
at /var/lib/jenkins/workspace/ceph_L/ceph/src/common/perf_counters.cc:62
#3 0x0000ffff757f26c4 in librbd::ImageCtx::perf_stop (this=0x2065d2f0)
at /var/lib/jenkins/workspace/ceph_L/ceph/src/librbd/ImageCtx.cc:426
#4 0x0000ffff757f6e48 in librbd::ImageCtx::~ImageCtx (this=0x2065d2f0, __in_chrg=<optimized out>)
at /var/lib/jenkins/workspace/ceph_L/ceph/src/librbd/ImageCtx.cc:239
#5 0x0000ffff757e8e70 in rbd_open_by_id (p=p@entry=0x2065e780,
id=id@entry=0xffff85928bb4 "20376aa706c0", image=image@entry=0xffff75b9b0b8,
snap_name=snap_name@entry=0x0) at /var/lib/jenkins/workspace/ceph_L/ceph/src/librbd/librbd.cc:2692
#6 0x0000ffff75aed524 in __pyx_pf_3rbd_5Image___init__ (
__pyx_v_read_only=0xffff85f02bb8 <_Py_ZeroStruct>, __pyx_v_snapshot=0xffff85f15938 <_Py_NoneStruct>,
__pyx_v_by_id=0xffff85f02ba0 <_Py_TrueStruct>, __pyx_v_name=0xffff85928b90,
__pyx_v_ioctx=0xffff7f0283d0, __pyx_v_self=0xffff75b9b0a8)
at /var/lib/jenkins/workspace/ceph_L/ceph/build/src/pybind/rbd/pyrex/rbd.c:12662
#7 __pyx_pw_3rbd_5Image_1__init__ (__pyx_v_self=0xffff75b9b0a8, __pyx_args=<optimized out>,
__pyx_kwds=<optimized out>)
at /var/lib/jenkins/workspace/ceph_L/ceph/build/src/pybind/rbd/pyrex/rbd.c:12378
#8 0x0000ffff85e02924 in type_call () from /lib64/libpython2.7.so.1.0
#9 0x0000ffff85dac254 in PyObject_Call () from /lib64/libpython2.7.so.1.0
Meanwhile, I have some questions about rbd_open: if state->open fails (return value r < 0), is there a risk of a memory leak for ictx?
extern "C" int rbd_open(rados_ioctx_t p, const char *name, rbd_image_t *image,
const char *snap_name)
{
librados::IoCtx io_ctx;
librados::IoCtx::from_rados_ioctx_t(p, io_ctx);
TracepointProvider::initialize<tracepoint_traits>(get_cct(io_ctx));
librbd::ImageCtx *ictx = new librbd::ImageCtx(name, "", snap_name, io_ctx,
false);
tracepoint(librbd, open_image_enter, ictx, ictx->name.c_str(), ictx->id.c_str(), ictx->snap_name.c_str(), ictx->read_only);
int r = ictx->state->open(false);
if (r >= 0) { // if r < 0, Is there a risk of memory leaks?
*image = (rbd_image_t)ictx;
}
tracepoint(librbd, open_image_exit, r);
return r;
}
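For context, here is the failure path I would have expected if the caller
owned ictx; a minimal sketch, assuming (perhaps wrongly, given the crash
above) that ImageState::open() does not tear down the ImageCtx itself on
error:

    int r = ictx->state->open(false);
    if (r < 0) {
      // Hypothetical cleanup: only safe if a failed ImageState::open()
      // leaves the ImageCtx alive. The segfault in rbd_open_by_id above
      // suggests the opposite, i.e. that open() already destroys ictx on
      // error; in that case this delete would be a double free, and
      // rbd_open would not leak after all.
      delete ictx;
    } else {
      *image = (rbd_image_t)ictx;
    }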
Can you give me some advice? Many thanks.
yangjun(a)cmss.chinamobile.com
Hello,
We are planning to upgrade our cluster from Jewel to Nautilus. From my understanding, the monitors' leveldb and the OSDs' filestore will not be converted to rocksdb and bluestore automatically. So do you suggest converting them manually after upgrading the software? Is there any document or guidance available?
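For example, is a per-OSD replacement along these lines the recommended way
to get from filestore to bluestore? (A rough sketch based on my reading so
far; <osd-id> and <device> are placeholders.)

    ceph osd out <osd-id>
    while ! ceph osd safe-to-destroy osd.<osd-id> ; do sleep 60 ; done
    systemctl stop ceph-osd@<osd-id>
    umount /var/lib/ceph/osd/ceph-<osd-id>
    ceph-volume lvm zap <device>
    ceph osd destroy <osd-id> --yes-i-really-mean-it
    ceph-volume lvm create --bluestore --data <device> --osd-id <osd-id>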
Br,
Xu Yun
Hi,
I want to set up the Ceph iSCSI Gateway, and I am following this
documentation: <https://docs.ceph.com/docs/master/rbd/iscsi-overview/>.
In the "Setup" step of "Configuring the iSCSI target using the command
line interface" <https://docs.ceph.com/docs/master/rbd/iscsi-target-cli/>,
I cannot start the service rbd-target-api.
There's no error message in status or anywhere else:
root@ld5505:~# systemctl status rbd-target-api
● rbd-target-api.service - Ceph iscsi target configuration API
   Loaded: loaded (/lib/systemd/system/rbd-target-api.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2019-12-04 13:47:51 CET; 3min 16s ago
  Process: 4143457 ExecStart=/usr/bin/rbd-target-api (code=exited, status=1/FAILURE)
 Main PID: 4143457 (code=exited, status=1/FAILURE)

Dec 04 13:47:51 ld5505 systemd[1]: rbd-target-api.service: Service RestartSec=100ms expired, scheduling restart.
Dec 04 13:47:51 ld5505 systemd[1]: rbd-target-api.service: Scheduled restart job, restart counter is at 3.
Dec 04 13:47:51 ld5505 systemd[1]: Stopped Ceph iscsi target configuration API.
Dec 04 13:47:51 ld5505 systemd[1]: rbd-target-api.service: Start request repeated too quickly.
Dec 04 13:47:51 ld5505 systemd[1]: rbd-target-api.service: Failed with result 'exit-code'.
Dec 04 13:47:51 ld5505 systemd[1]: Failed to start Ceph iscsi target configuration API.
Can you please advise how to troubleshoot this issue?
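Would something like the following be a sensible way to dig deeper, i.e.
running the daemon in the foreground and pulling the recent journal
entries?

    /usr/bin/rbd-target-api
    journalctl -u rbd-target-api -n 50 --no-pager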
THX
Hi,
The crushmap produced by ceph osd getcrushmap in ceph version 14.2.4
has more info than is described in
https://docs.ceph.com/docs/cuttlefish/rados/operations/crush-map/
There is a second id per bucket:
host a1-df {
        id -3           # do not change unnecessarily
        id -4 class hdd # do not change unnecessarily
        # weight 1.819
        alg straw2
        hash 0          # rjenkins1
        item osd.0 weight 1.819
}
What exactly is this second id which, by the way, comes with a device
class specified?
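In case it is relevant: is the following the right place to look? It seems
to list per-device-class "shadow" entries, which is my guess as to where
the extra id comes from:

    ceph osd crush tree --show-shadow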
Regards,
Rodrigo Severo
Hi all,
How is the following situation handled with bluestore:
1. You have a 200GB OSD (no separate DB/WAL devices).
2. The metadata grows past 30G for some reason, and RocksDB wants to create
a 300GB level but can't.
Where is the metadata over 30G stored?
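For reference, I have been checking the DB usage on an OSD with something
like this (osd.0 is just an example; run it on the node hosting the OSD):

    ceph daemon osd.0 perf dump bluefs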
We utilize 4 iSCSI gateways in a cluster and have noticed the following
during patching cycles, when we sequentially reboot single iSCSI gateways:
"gwcli" often hangs on the still-up iSCSI GWs, but sometimes it still
functions and gives the message:
"1 gateway is inaccessible - updates will be disabled"
This got me thinking about what the course of action would be should an
iSCSI gateway fail permanently or semi-permanently, say due to a hardware
issue. What would be the best course of action to instruct the remaining
iSCSI gateways that one of them is no longer available, and that they
should allow updates again and take ownership of the now-defunct node's
LUNs?
I'm guessing that pulling down the RADOS config object, rewriting it, and
re-put'ing it, followed by an rbd-target-api restart, might do the trick,
but I am hoping there is a more "in-band" and less potentially devastating
way to do this.
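Concretely, by "pulling down the RADOS config object" I mean something like
the following, assuming the default object name gateway.conf in the rbd
pool (which may differ in other setups):

    rados -p rbd get gateway.conf /tmp/gateway.conf
    # edit /tmp/gateway.conf to remove the dead gateway
    rados -p rbd put gateway.conf /tmp/gateway.conf
    systemctl restart rbd-target-api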
Thanks for any insights.
Respectfully,
*Wes Dillingham*
wes(a)wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
On Thu, Dec 5, 2019 at 16:38, Marc Roos
<M.Roos(a)f1-outsourcing.eu> wrote:
> ceph-users(a)lists.ceph.com is the old one; why that is, I also do not know.
OK, Marc. Thanks for the information.
Rodrigo