Hi all,
For a few weeks our Nautilus cluster had been struggling with severe performance issues. When an OSD went down, rebalancing was really slow: long periods with no data transfer at all (neither client nor rebalancing traffic) alternating with phases of rebalancing traffic only. However, client traffic was almost stalled for the whole period until all objects were in place again (VMs were frozen). PGs were stuck in peering or inactive for long times. Sometimes we had to restart the ceph-mon to get the whole process moving again.
The issues started all of a sudden; we don't remember making any changes to the configuration.
The whole cluster had been updated from Mimic to Nautilus (14.2.3) in September, while the issue appeared only a few weeks ago. Updating to 14.2.5 did not resolve the issue back then.
Looking through mailing lists I found the following message: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-July/028035.html
So I ran "ceph osd require-osd-release nautilus" and all of a sudden the problems where gone! I do not recall executing that command right after the upgrade because the documentation states "Complete the upgrade by disallowing pre-Nautilus OSDs and enabling all new Nautilus-only functionality.". As by that point in time all OSDs, MONs and MGRs were successfully updated there was no reason to believe this command would be necessary.
Therefore I have two questions:
1. What exactly does the command do besides preventing old OSDs from joining?
2. What could have been the issue with the cluster and how did this command fix it?
If it is really that important to run the command, the docs should state this more clearly.
I appreciate any insight on this topic.
Thanks,
Georg
Hi,
happy new year to you!
I'm running a multinode cluster with 3 MGR nodes.
The issue I'm facing now is that "ceph balancer <argument>" runs for
minutes or, in the worst case, hangs.
I have documented the runtime of the following executions:
root@ld3955:~# date && time ceph balancer status
Mon Dec 23 10:06:12 CET 2019
{
"active": true,
"plans": [],
"mode": "upmap"
}
real 1m45,045s
user 0m0,315s
sys 0m0,026s
root@ld3955:~# date && time ceph balancer status
Tue Jan 7 08:11:24 CET 2020
^CInterrupted
Traceback (most recent call last):
File "/usr/bin/ceph", line 1263, in <module>
retval = main()
File "/usr/bin/ceph", line 1194, in main
verbose)
File "/usr/bin/ceph", line 619, in new_style_command
ret, outbuf, outs = do_command(parsed_args, target, cmdargs,
sigdict, inbuf, verbose)
File "/usr/bin/ceph", line 593, in do_command
return ret, '', ''
UnboundLocalError: local variable 'ret' referenced before assignment
real 102m44,084s
user 0m2,404s
sys 0m1,065s
root@ld3955:~# date && time ceph balancer off
Tue Jan 7 09:57:36 CET 2020
real 1m45,371s
user 0m0,358s
sys 0m0,013s
root@ld3955:~# date && time ceph balancer on
Tue Jan 7 14:57:03 CET 2020
real 0m0,452s
user 0m0,284s
sys 0m0,020s
root@ld3955:~# date && time ceph balancer status
Tue Jan 7 14:57:11 CET 2020
{
"active": true,
"plans": [],
"mode": "upmap"
}
real 1m52,902s
user 0m0,301s
sys 0m0,042s
root@ld3955:~# date && time ceph balancer off
Wed Jan 8 08:49:26 CET 2020
^CInterrupted
Traceback (most recent call last):
File "/usr/bin/ceph", line 1263, in <module>
retval = main()
File "/usr/bin/ceph", line 1194, in main
verbose)
File "/usr/bin/ceph", line 619, in new_style_command
ret, outbuf, outs = do_command(parsed_args, target, cmdargs,
sigdict, inbuf, verbose)
File "/usr/bin/ceph", line 593, in do_command
return ret, '', ''
UnboundLocalError: local variable 'ret' referenced before assignment
real 14m29,097s
user 0m0,579s
sys 0m0,157s
In correlation with this finding I have identified that the active MGR node
is using more than 100% CPU, to be precise 108-120%.
To work around this issue I must stop the active MGR service and
wait until another node becomes active.
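A possibly lighter-weight way to force that failover (a sketch, assuming the active MGR instance is named after its host, e.g. ld3955) would be:
# fail the active mgr over to a standby
ceph mgr fail ld3955
# check which mgr is active now
ceph -s | grep mgr
I have not verified whether that avoids the hang, though.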
What's the issue with the MGR service here?
Should I open a bug report?
Regards
Happy New Year Ceph Community!
I'm in the process of figuring out RBD mirroring with Ceph and having a really tough time with it. I'm trying to set up just one-way mirroring right now on some test systems (bare-metal servers, all Debian 9). The first cluster has 3 nodes and the second cluster has 2 nodes (I'm not worried about a well-performing setup, just the functionality of RBD mirroring right now). The purpose is to have a passive failover Ceph cluster in a separate DC. Mirroring seems like the best solution, but if we can't get it working, we'll end up resorting to a scheduled rsync, which is less than ideal. I've followed several guides, read through a lot of documentation, and nothing has worked for me thus far. If anyone can offer some troubleshooting help or insight into what I might have missed in this setup, I'd greatly appreciate it! I also don't fully understand the relationship between images and pools and how you're supposed to configure statically sized images for a pool that holds a variable amount of data, but that's a question for afterwards, I think :)
Once RBD mirroring is set up, the mirror test image status shows as down+unknown:
On ceph1-dc2:
rbd --cluster dc1ceph mirror pool status fs_data --verbose
health: WARNING
images: 1 total
1 unknown
mirror_test:
global_id: c335017c-9b8f-49ee-9bc1-888789537c47
state: down+unknown
description: status not found
last_update:
Here are the commands I run using ceph-deploy on both clusters to get everything up and running (run from a deploy directory on the first node of each cluster). The clusters are created at the same time, and rbd setup commands are only run after the clusters are up and healthy, and the fs_data pool is created.
-----------------------------------------------------------
Cluster 1 (dc1ceph):
ceph-deploy new ceph1-dc1 ceph2-dc1 ceph3-dc1
sed -i '$ s,.*,public_network = *.*.*.0/24\n,g' ceph.conf
ceph-deploy install ceph1-dc1 ceph2-dc1 ceph3-dc1 --release luminous
ceph-deploy mon create-initial
ceph-deploy admin ceph1-dc1 ceph2-dc1 ceph3-dc1
ceph-deploy mgr create ceph1-dc1 ceph2-dc1 ceph3-dc1
for x in b c d e f g h i j k; do ceph-deploy osd create --data /dev/sd${x}1 ceph1-dc1 ; done
for x in b c d e f g h i j k; do ceph-deploy osd create --data /dev/sd${x}1 ceph2-dc1 ; done
for x in b c d e f g h i j k; do ceph-deploy osd create --data /dev/sd${x}1 ceph3-dc1 ; done
ceph-deploy mds create ceph1-dc1 ceph2-dc1 ceph3-dc1
ceph-deploy rgw create ceph1-dc1 ceph2-dc1 ceph3-dc1
for f in 1 2 ; do scp ceph.client.admin.keyring ceph$f-dc2:/etc/ceph/dc1ceph.client.admin.keyring ; done
for f in 1 2 ; do scp ceph.conf ceph$f-dc2:/etc/ceph/dc1ceph.conf ; done
for f in 1 2 ; do ssh ceph$f-dc2 "chown ceph.ceph /etc/ceph/dc1ceph*" ; done
ceph osd pool create fs_data 512 512 replicated
rbd --cluster ceph mirror pool enable fs_data image
rbd --cluster dc2ceph mirror pool enable fs_data image
rbd --cluster ceph mirror pool peer add fs_data client.admin@dc2ceph
(generated id: b5e347b3-0515-4142-bc49-921a07636865)
rbd create fs_data/mirror_test --size=1G
rbd feature enable fs_data/mirror_test journaling
rbd mirror image enable fs_data/mirror_test
chown ceph.ceph ceph.client.admin.keyring
Cluster 2 (dc2ceph):
ceph-deploy new ceph1-dc2 ceph2-dc2
sed -i '$ s,.*,public_network = *.*.*.0/24\n,g' ceph.conf
ceph-deploy install ceph1-dc2 ceph2-dc2 --release luminous
ceph-deploy mon create-initial
ceph-deploy admin ceph1-dc2 ceph2-dc2
ceph-deploy mgr create ceph1-dc2 ceph2-dc2
for x in b c d e f g h i j k; do ceph-deploy osd create --data /dev/sd${x}1 ceph1-dc2 ; done
for x in b c d e f g h i j k; do ceph-deploy osd create --data /dev/sd${x}1 ceph2-dc2 ; done
ceph-deploy mds create ceph1-dc2 ceph2-dc2
ceph-deploy rgw create ceph1-dc2 ceph2-dc2
apt install rbd-mirror
for f in 1 2 3 ; do scp ceph.conf ceph$f-dc1:/etc/ceph/dc2ceph.conf ; done
for f in 1 2 3 ; do scp ceph.client.admin.keyring ceph$f-dc1:/etc/ceph/dc2ceph.client.admin.keyring ; done
for f in 1 2 3 ; do ssh ceph$f-dc1 "chown ceph.ceph /etc/ceph/dc2ceph*" ; done
ceph osd pool create fs_data 512 512 replicated
rbd --cluster ceph mirror pool peer add fs_data client.admin@dc1ceph
(generated id: e486c401-e24d-49bc-9800-759760822282)
systemctl enable ceph-rbd-mirror@admin
systemctl start ceph-rbd-mirror@admin
rbd --cluster dc1ceph mirror pool status fs_data --verbose
Cluster 1:
ls /etc/ceph:
ceph.client.admin.keyring
ceph.conf
dc2ceph.client.admin.keyring
dc2ceph.conf
rbdmap
tmpG36OYs
cat /etc/ceph/ceph.conf:
[global]
fsid = 8fede407-50e1-4487-8356-3dc98b30c500
mon_initial_members = ceph1-dc1, ceph2-dc1, ceph3-dc1
mon_host = *.*.*.1,*.*.*.27,*.*.*.41
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = *.*.*.0/24
cat /etc/ceph/dc2ceph.conf
[global]
fsid = 813ff410-02dc-47bd-b678-38add38495bb
mon_initial_members = ceph1-dc2, ceph2-dc2
mon_host = *.*.*.56,*.*.*.0
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = *.*.*.0/24
Cluster 2:
ls /etc/ceph:
ceph.client.admin.keyring
ceph.conf
dc1ceph.client.admin.keyring
dc1ceph.conf
rbdmap
tmp_yxkPs
cat /etc/ceph/ceph.conf
[global]
fsid = 813ff410-02dc-47bd-b678-38add38495bb
mon_initial_members = ceph1-dc2, ceph2-dc2
mon_host = *.*.*.56,*.*.*.70
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = *.*.*.0/24
cat /etc/ceph/dc1ceph.conf
[global]
fsid = 8fede407-50e1-4487-8356-3dc98b30c500
mon_initial_members = ceph1-dc1, ceph2-dc1, ceph3-dc1
mon_host = *.*.*.1,*.*.*.27,*.*.*.41
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = *.*.*.0/24
RBD Mirror daemon status:
ceph-rbd-mirror@admin.service - Ceph rbd mirror daemon
Loaded: loaded (/lib/systemd/system/ceph-rbd-mirror@.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Mon 2020-01-06 16:21:44 EST; 3s ago
Process: 910178 ExecStart=/usr/bin/rbd-mirror -f --cluster ${CLUSTER} --id admin --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
Main PID: 910178 (code=exited, status=0/SUCCESS)
Jan 06 16:21:44 ceph1-dc2 systemd[1]: Started Ceph rbd mirror daemon.
Jan 06 16:21:44 ceph1-dc2 rbd-mirror[910178]: 2020-01-06 16:21:44.462916 7f76ecf88780 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
Jan 06 16:21:44 ceph1-dc2 rbd-mirror[910178]: 2020-01-06 16:21:44.462949 7f76ecf88780 -1 monclient: ERROR: missing keyring, cannot use cephx for authentication
Jan 06 16:21:44 ceph1-dc2 rbd-mirror[910178]: failed to initialize: (2) No such file or directory2020-01-06 16:21:44.463874 7f76ecf88780 -1 rbd::mirror::Mirror: 0x558d3ce6ce20 init: error connecting to local cluster
-------------------------------------------
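From the log above it looks like rbd-mirror is being started against the local cluster (CLUSTER defaulting to "ceph") and cannot find the local admin keyring at the default path. The checks that seem relevant on ceph1-dc2 (a sketch, with paths assumed from the error message) would be:
# the unit runs with --cluster ceph, so it looks for the local cluster's keyring here
ls -l /etc/ceph/ceph.client.admin.keyring
# it must also be readable by the 'ceph' user the daemon drops privileges to
sudo -u ceph test -r /etc/ceph/ceph.client.admin.keyring && echo readable || echo NOT readable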
I also tried running the ExecStart command manually, substituting in different values for the parameters, and just never got it to work. If more info is needed, please don’t hesitate to ask. Thanks in advance!
-Miguel
Hi,
I am trying to copy the contents of our storage server into a CephFS,
but am experiencing stability issues with my MDSs. The CephFS sits on
top of an erasure-coded pool with 5 MONs, 5 MDSs and a max_mds setting
of two. My Ceph cluster runs Nautilus; the client is Mimic and
uses the kernel module to mount the FS.
The index of filenames to copy is about 23GB and I am using 16 parallel
rsync processes over a 10G link to copy the files over to Ceph. This
works perfectly for a while, but then the MDSs start reporting oversized
caches (between 20 and 50GB, sometimes more) and an inode count between
1 and 4 million. The inode count in particular seems quite high to me.
Each rsync job has 25k files to work with, so if all 16 processes open
all their files at the same time, I should not exceed 400k. Even if I
double this number to account for the client's page cache, I should get
nowhere near that number of inodes (a sync flush takes about 1 second).
Then after a few hours, my MDSs start failing with messages like this:
-21> 2019-07-22 14:00:05.877 7f67eacec700 1 heartbeat_map
is_healthy 'MDSRank' had timed out after 15
-20> 2019-07-22 14:00:05.877 7f67eacec700 0 mds.beacon.XXX Skipping
beacon heartbeat to monitors (last acked 24.0042s ago); MDS internal
heartbeat is not healthy!
The standby nodes try to take over, but take forever to become active
and eventually fail as well.
During my research, I found this related topic:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015959.html,
and I tried everything suggested there, from increasing to lowering my cache
size, the number of segments, etc. I also played around with the number
of active MDSs; two appears to work best, whereas one cannot keep
up with the load and three seems to be the worst of all choices.
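For reference, these are the kinds of knobs I have been adjusting (values are only examples, and the filesystem name 'cephfs' is assumed):
# cap the MDS cache memory (Nautilus-style config; 8 GiB here)
ceph config set mds mds_cache_memory_limit 8589934592
# change the number of active MDS daemons
ceph fs set cephfs max_mds 2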
Do you have any ideas how I can improve the stability of my MDS daemons
so they can handle the load properly? A single 10G link is a toy and we could
hit the cluster with many more requests per second, but it is already
buckling under 16 rsync processes.
Thanks
Happy new year to all!
Over the holidays I suffered a disk failure, but I also hit an
'inconsistent pg' error, and I would like to understand what happened.
Ceph 12.2.12, filestore.
Starting on 27/12 I got the classic disk errors:
Dec 27 20:52:21 capitanmarvel kernel: [345907.286795] ata1.00: exception Emask 0x0 SAct 0xfe00000 SErr 0x0 action 0x0
Dec 27 20:52:21 capitanmarvel kernel: [345907.286849] ata1.00: irq_stat 0x40000008
Dec 27 20:52:21 capitanmarvel kernel: [345907.286880] ata1.00: failed command: READ FPDMA QUEUED
Dec 27 20:52:21 capitanmarvel kernel: [345907.286920] ata1.00: cmd 60/00:a8:20:87:3b/04:00:00:00:00/40 tag 21 ncq dma 524288 in
Dec 27 20:52:21 capitanmarvel kernel: [345907.286920] res 41/40:00:46:8a:3b/00:00:00:00:00/40 Emask 0x409 (media error) <F>
Dec 27 20:52:21 capitanmarvel kernel: [345907.287018] ata1.00: status: { DRDY ERR }
Dec 27 20:52:21 capitanmarvel kernel: [345907.287046] ata1.00: error: { UNC }
Dec 27 20:52:21 capitanmarvel kernel: [345907.288676] ata1.00: configured for UDMA/133
Dec 27 20:52:21 capitanmarvel kernel: [345907.288698] sd 1:0:0:0: [sdc] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 27 20:52:21 capitanmarvel kernel: [345907.288702] sd 1:0:0:0: [sdc] tag#21 Sense Key : Medium Error [current]
Dec 27 20:52:21 capitanmarvel kernel: [345907.288705] sd 1:0:0:0: [sdc] tag#21 Add. Sense: Unrecovered read error - auto reallocate failed
Dec 27 20:52:21 capitanmarvel kernel: [345907.288708] sd 1:0:0:0: [sdc] tag#21 CDB: Read(10) 28 00 00 3b 87 20 00 04 00 00
Dec 27 20:52:21 capitanmarvel kernel: [345907.288711] print_req_error: I/O error, dev sdc, sector 3902022
but also:
Dec 27 20:52:24 capitanmarvel ceph-osd[3852]: 2019-12-27 20:52:24.714716 7f821fbfd700 -1 log_channel(cluster) log [ERR] : 4.9b missing primary copy of 4:d97871c4:::rbd_data.142b816b8b4567.0000000000012ae1:head, will try copies on 8,14
The OSD flip-flopped for some days. At the first scrub, I got:
cluster:
id: 8794c124-c2ec-4e81-8631-742992159bd6
health: HEALTH_ERR
1 scrub errors
Possible data damage: 1 pg inconsistent
services:
mon: 5 daemons, quorum blackpanther,capitanmarvel,4,2,3
mgr: hulk(active), standbys: blackpanther, deadpool, thor, capitanmarvel
osd: 12 osds: 12 up, 12 in
data:
pools: 3 pools, 768 pgs
objects: 671.04k objects, 2.54TiB
usage: 7.62TiB used, 9.66TiB / 17.3TiB avail
pgs: 766 active+clean
1 active+clean+inconsistent
1 active+clean+scrubbing+deep
Finally the OSD died, and so I got (after automatic remapping):
cluster:
id: 8794c124-c2ec-4e81-8631-742992159bd6
health: HEALTH_ERR
1 scrub errors
Possible data damage: 1 pg inconsistent
services:
mon: 5 daemons, quorum blackpanther,capitanmarvel,4,2,3
mgr: hulk(active), standbys: blackpanther, deadpool, thor, capitanmarvel
osd: 12 osds: 11 up, 11 in
data:
pools: 3 pools, 768 pgs
objects: 674.26k objects, 2.55TiB
usage: 7.65TiB used, 8.71TiB / 16.4TiB avail
pgs: 767 active+clean
1 active+clean+inconsistent
To fix the issue I tried reading the docs (looking for
'OSD_SCRUB_ERRORS') and found:
https://docs.ceph.com/docs/doc-12.2.0-major-changes/rados/operations/health…
but the link within it is empty:
https://docs.ceph.com/docs/doc-12.2.0-major-changes/rados/operations/pg-rep…
After fiddling a bit with Google, I found:
https://ceph.io/geen-categorie/ceph-manually-repair-object/
which allowed me to fix the issue easily with 'ceph pg repair'.
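For the record, the repair boils down to something like this (a sketch, with the pg id taken from the log above):
# identify the inconsistent pg
ceph health detail | grep inconsistent
# inspect which objects/shards are affected
rados list-inconsistent-obj 4.9b --format=json-pretty
# ask ceph to repair it
ceph pg repair 4.9b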
Two questions:
1) Is the missing 'pg-repair' page a documentation bug? Is there
something I can do about it?
2) What exactly happened?
- Why, if the OSD could no longer access the data, were the objects
not automatically relocated to another OSD? Doesn't this violate the crushmap?
- Why, when the failing OSD went out, was the inconsistent PG not
fixed automatically? I have 3 copies; were the other 2 copies not
coherent? But if so, how was Ceph able to fix them?
Sorry... and thanks. ;)
--
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797
Please file a tracker ticket with the symptoms and examples. Please attach your
OSDMap (ceph osd getmap > osdmap.bin).
Note that https://github.com/ceph/ceph/pull/31956 has the Nautilus
version of improved upmap code. It also changes osdmaptool to match the
mgr behavior, so that one can observe the behavior of the upmap balancer
offline.
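For example, something along these lines should let you replay the balancer's decisions offline (a rough sketch; exact osdmaptool flags may vary by version):
# grab the current osdmap
ceph osd getmap -o osdmap.bin
# compute the upmaps the mgr balancer would generate for a given pool
osdmaptool osdmap.bin --upmap upmaps.sh --upmap-pool <pool> --upmap-max 100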
Thanks
David
On 12/8/19 11:04 AM, Philippe D'Anjou wrote:
> It's only getting worse after raising PGs now.
>
> Anything between:
> 96 hdd 9.09470 1.00000 9.1 TiB 4.9 TiB 4.9 TiB 97 KiB 13 GiB 4.2 TiB 53.62 0.76 54 up
>
> and
>
> 89 hdd 9.09470 1.00000 9.1 TiB 8.1 TiB 8.1 TiB 88 KiB 21 GiB 1001 GiB 89.25 1.27 87 up
>
> How is that possible? I don't know how much more proof I need to
> present that there's a bug.
Hi,
in this blog post <https://ceph.io/community/the-first-telemetry-results-are-in/>
I found the following statement:
"So, in our ideal world so far (assuming equal size OSDs), every OSD now
has the same number of PGs assigned."
My issue is that across all pools the number of PGs per OSD is not equal,
and I conclude that this is causing very unbalanced data placement.
As a matter of fact, the usage of the 1.6TB HDDs backing the specific pool
"hdb_backup" spans a range starting with
osd.228 size: 1.6 usage: 52.61 reweight: 1.00000
and ending with
osd.145 size: 1.6 usage: 81.11 reweight: 1.00000
This heavily impacts the amount of data that can be stored in the cluster.
The Ceph balancer is enabled, but it is not solving this issue.
root@ld3955:~# ceph balancer status
{
"active": true,
"plans": [],
"mode": "upmap"
}
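One thing that may matter here: as far as I understand, the upmap mode only works when all clients are Luminous or newer, so a quick sanity check (a sketch) would be:
# verify the required client compat level in the osdmap
ceph osd dump | grep min_compat_client
# if it is not at least luminous:
ceph osd set-require-min-compat-client luminous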
Therefore I would like to ask for suggestions on how to address this
unbalanced data distribution.
I have attached pastebin for
- ceph osd df sorted by usage <https://pastebin.com/QLQHjA9g>
- ceph osd df tree <https://pastebin.com/SvhP2hp5>
My cluster has multiple crush roots representing different disk types.
In addition I have defined multiple pools, one pool for each disk type:
hdd, ssd, nvme.
THX
On Sun, Dec 29, 2019 at 9:21 PM zhengyin(a)cmss.chinamobile.com
<zhengyin(a)cmss.chinamobile.com> wrote:
>
> Hello dillaman:
>
> Problem: I create a clone from a parent image and then create a snapshot on the clone. When I use the command "rbd export-diff <pool>/clone@snap <file> --whole-object", it does not include the parent image data in the diff. But when I use the command without "--whole-object", it works fine.
>
> steps:
> 1、rbd create volumes/test1 -s 1G
>
> 2、write data to volumes/test1 (offset=0, len=8388608)
>
> 3、rbd snap create volumes/test1@snap
>
> 4、rbd snap protect volumes/test1@snap
>
> 5、rbd clone volumes/test1@snap volumes/clone1
>
> 6、write data to volumes/clone1 (offset=16777216, len=4194304)
>
> 7、rbd snap create volumes/clone1@snap1
>
> 8、rbd export-diff volumes/clone1@snap1 /root/diff1 --whole-object
> It only diffs data [16777216L, 20971520L]
>
> 9、rbd export-diff volumes/clone1@snap1 /root/diff2
> It is OK and diffs data [0L, 8388608L], [16777216L, 20971520L]
>
> If you can confirm that this is a bug, or if you fix it, please let me know. Thank you very much.
Is this the same as this ticket [1] that was recently fixed in master
and is pending backport?
>
> ________________________________
> zhengyin(a)cmss.chinamobile.com
[1] https://tracker.ceph.com/issues/42248
--
Jason
In an exploration of trying to speed up the long tail of backfills resulting
from marking a failing OSD out, I began looking at my PGs to see if I could
tune some settings, and noticed the following:
Scenario: on a 12.2.12 cluster, I am alerted of an inconsistent PG and of
SMART failures on that OSD. I inspect the PG and notice it has a read_error
from the SMART-failing OSD.
Steps I take: set the primary affinity of the failing OSD to 0 (the thought
process being that I don't want a failing drive to be responsible for
backfilling data), wait for peering to complete, then mark the OSD out. At
this point backfill begins.
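For reference, the commands for that sequence are roughly the following (osd 123 is just a placeholder id):
# stop the failing drive from acting as primary for its PGs
ceph osd primary-affinity osd.123 0
# after peering settles, mark it out to trigger backfill
ceph osd out 123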
90% of the PGs complete backfill very quickly. Towards the tail end of the
backfill I have 20 PGs or so in backfill_wait and 1 backfilling (presumably
because osd_max_backfills = 1).
I do a `ceph pg ls backfill_wait` and notice that for 100% of these tail-end
PGs, all OSDs in the up_set are different from those in the acting_set, and
the acting_primary is the OSD that was set with primary affinity 0 and
marked out.
My questions are the following:
- Upon learning a disk has failed SMART and has an inconsistent PG, I want
to prevent its potentially corrupt data from being replicated out to other
OSDs, even for PGs which may not have been discovered to be inconsistent
yet, so I set its primary affinity to 0. At this step shouldn't the
acting_primary be another OSD from the acting_set, and shouldn't backfill be
copied out of a different OSD?
- Should I additionally be marking the OSD as down? That would cause the
PGs to go degraded until backfill finishes, but it would presumably finish
faster, as more OSDs would become the acting_primary and I wouldn't be
throttled by osd_max_backfills. My thought here is that it's best to avoid
degraded PGs, as I do not want to drop below min_size.
I recognize some of these things may be different in Nautilus, but I am
waiting on the 14.2.6 release as I am aware of some bugs I do not want to
contend with. Thanks.
Respectfully,
*Wes Dillingham*
wes(a)wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>