Hi all,
I'm trying to quantify the performance penalty for HDD OSDs when the WAL/DB is
on the same device (HDD) versus on a faster device (SSD/NVMe), for different
workloads (RBD, RGW with the bucket index in an SSD pool, and CephFS with
metadata in an SSD pool). I want to know whether giving up a disk slot for a
WAL/DB device is worth it compared to adding more OSDs.
Unfortunately I cannot find benchmarks for these kinds of workloads. Has
anyone ever done such a benchmark?
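If nobody has published numbers I may end up running this myself; for the RBD case I was thinking of something along these lines, run once with WAL/DB on the HDD and once with it on the SSD/NVMe (pool and image names are just placeholders):
rados bench -p testpool 60 write -t 16 -b 4096
fio --name=rbdtest --ioengine=rbd --pool=testpool --rbdname=testimage \
    --rw=randwrite --bs=4k --iodepth=16 --runtime=60 --time_based
Does that sound like a reasonable way to compare the two layouts?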
Thank you.
Hi all,
I'm having a problem creating an OSD using ceph-volume (by way of cephadm). This is an Octopus installation deployed with cephadm. I use "cephadm shell" and then "ceph-volume", but get the following error:
root@furry:/var/lib# ceph-volume lvm prepare --data /dev/sda --block.db /dev/vg/sda.db --dmcrypt
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 29d95564-733d-4f2a-a2c8-1bb9ceb5a14b
stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
--> RuntimeError: Unable to create a new OSD id
If I pass the cluster ID and use the correct keyring (the client.admin keyring), I get a bit further:
root@furry:/var/lib# ceph-volume --cluster c258000c-f3e4-11ea-9ebe-c3c75e8e9028 lvm prepare --data /dev/sda --block.db /dev/vg/sda.db --dmcrypt
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster c258000c-f3e4-11ea-9ebe-c3c75e8e9028 --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/c258000c-f3e4-11ea-9ebe-c3c75e8e9028.keyring -i - osd new aa13c362-c9cf-4d03-9a86-d6118fbc312c
stderr: Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)
--> RuntimeError: Unable to create a new OSD id
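I suspect the shell is simply missing /etc/ceph/ceph.conf and the bootstrap-osd keyring at their default locations. Would something along these lines be the right way to provide them (just a guess on my part)?
ls /etc/ceph/ceph.conf /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring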
Any idea how to get past these errors? Thanks.
--Tri Hoang
Hey all,
We will be having a Ceph science/research/big cluster call on Wednesday
September 23rd. If anyone wants to discuss something specific they can
add it to the pad linked below. If you have questions or comments you
can contact me.
This is an informal open call for community members, mostly from
HPC/HTC/research environments, where we discuss whatever is on our minds
regarding Ceph: updates, outages, features, maintenance, etc. There is
no set presenter, but I do attempt to keep the conversation lively.
https://pad.ceph.com/p/Ceph_Science_User_Group_20200923
We try to keep it to an hour or less.
Ceph calendar event details:
September 23, 2020
14:00 UTC
4pm Central European
9am Central US
Description: Main pad for discussions:
https://pad.ceph.com/p/Ceph_Science_User_Group_Index
Meetings will be recorded and posted to the Ceph YouTube channel.
To join the meeting on a computer or mobile phone:
https://bluejeans.com/908675367?src=calendarLink
To join from a Red Hat Deskphone or Softphone, dial: 84336.
Connecting directly from a room system?
1.) Dial: 199.48.152.152 or bjn.vc
2.) Enter Meeting ID: 908675367
Just want to dial in on your phone?
1.) Dial one of the following numbers: 408-915-6466 (US)
See all numbers: https://www.redhat.com/en/conference-numbers
2.) Enter Meeting ID: 908675367
3.) Press #
Want to test your video connection? https://bluejeans.com/111
Kevin
--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS
Space Science & Engineering Center
University of Wisconsin-Madison
I'm starting to wonder (again) which I/O scheduler is better for Ceph on SSDs.
My reasoning:
none:
1. Reduces request latency. The lower the latency, the higher the
perceived performance for an unbounded workload with a fixed queue depth
(hello, benchmarks).
2. Can cause latency spikes for requests because of the 'unfair' request
ordering (hello, deep scrub).
mq-deadline:
1. Reduces nr_requests (queue size) to 256 ('noop' shows me 916???).
May introduce latency.
2. May reduce latency spikes because different types of workloads are
served at different rates.
I'm running some benchmarks, and they, of course, give higher marks to
the 'none' scheduler. Nevertheless, I believe most normal workloads on
Ceph do not drive it at an unbounded rate, so a bounded workload (e.g. an
app doing I/O based on external, independent events) can be hurt by the
lack of a disk scheduler in the presence of an unbounded workload.
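In case anyone wants to reproduce this: I switch schedulers and check the queue size via sysfs, roughly like this (device name is just an example):
cat /sys/block/sda/queue/scheduler       # available schedulers, current one in brackets
echo mq-deadline > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/nr_requests     # queue size exposed by the current scheduler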
Any ideas?
Hi,
We have a little problem with deep-scrubbing of PGs in an EC pool.
[root@mon-1 ~]# ceph health detail
HEALTH_WARN 1 pgs not deep-scrubbed in time
PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
pg 14.d4 not deep-scrubbed since 2020-09-05 20:26:02.696191
[root@mon-1 ~]# ceph pg deep-scrub 14.d4
instructing pg 14.d4s0 on osd.113 to deep-scrub
[root@mon-1 ~]# grep deep-scrub /var/log/ceph/ceph.log |grep 14.d4
There is nothing about pg 14.d4 in the log. I checked that pg 14.d4
belongs to the pool default.rgw.buckets.data-ec:
[root@mon-1 ~]# ceph osd pool ls detail |grep
default.rgw.buckets.data-ec
pool 14 'default.rgw.buckets.data-ec' erasure size 8 min_size 6
crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode
warn last_change 74563 flags hashpspool stripe_width 4160 application
rgw
[root@mon-1 ~]# ceph pg ls-by-pool default.rgw.buckets.data-ec |grep
14.d4
14.d4 0 0 0 0 0 0 0
0 active+clean 67m 0'0 74562:10673
[113,40,125,80,16,24,95,32]p113 [113,40,125,80,16,24,95,32]p113 2020-
09-12 14:47:40.603264 2020-09-05 20:26:02.696191
When I try to run a manual scrub or deep-scrub on any PG that belongs
to the EC pool it doesn't work. For other PGs in replicated pools, it
works fine.
Is it possible to manually run pg deep-scrub on an EC pool?
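Or does the deep-scrub just get queued and never run because the primary is busy? This is what I was planning to check next (not sure these are the right knobs):
ceph pg 14.d4 query | grep -i scrub           # last (deep-)scrub stamps as the primary sees them
ceph tell osd.113 config get osd_max_scrubs   # works on recent releases; older ones may need the admin socket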
Hi *,
I have two virtual one-node clusters configured for RGW multisite. In
the beginning replication actually worked for a few hundred MB or so,
and then it stopped. In the meantime I have wiped both RGWs twice to
make sure the configuration is right (including wiping all pools
clean). I don't see any errors in the logs, but nothing happens on the
secondary site. Both clusters are healthy, and the RGWs run with HTTPS.
Uploading data directly to the secondary site also works, so the
configuration seems OK to me.
This is the current RGW sync status:
---snip---
primary:~ # radosgw-admin sync status
realm c7d5fd30-9c06-46a1-baf4-497f95bf3abc (hamburg)
zonegroup 68adec15-aace-403d-bd63-f5182a6437b1 (zg-hamburg)
zone 0fb33fa1-8110-4179-ae45-acf5f5f825c5 (z-primary)
metadata sync no sync (zone is master)
secondary:~ # radosgw-admin sync status
2020-09-17T09:34:59.593+0200 7fdd3e706a40 1 Cannot find zone
id=93ece7a6-beef-4f4e-841a-60ba0405f192 (name=z-secondary), switching
to local zonegroup configuration
realm c7d5fd30-9c06-46a1-baf4-497f95bf3abc (hamburg)
zonegroup 68adec15-aace-403d-bd63-f5182a6437b1 (zg-hamburg)
zone 93ece7a6-beef-4f4e-841a-60ba0405f192 (z-secondary)
metadata sync syncing
full sync: 64/64 shards
full sync: 3 entries to sync
incremental sync: 0/64 shards
metadata is behind on 64 shards
behind shards:
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]
data sync source: 0fb33fa1-8110-4179-ae45-acf5f5f825c5 (z-primary)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
---snip---
Since the data was not replicated, I ran 'radosgw-admin metadata sync
run --source-zone=z-primary', but it never finishes. If I do the same
for data, it shows that all shards are behind on data, but nothing
happens either.
I also don't understand the 'Cannot find zone
id=93ece7a6-beef-4f4e-841a-60ba0405f192 (name=z-secondary), switching
to local zonegroup configuration' message but this didn't break the
replication in the first attempt, so I ignored it. Or is this
something I should fix first (if yes, how)?
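For what it's worth, these are the commands I was planning to look at next; am I using the right ones?
radosgw-admin sync error list
radosgw-admin metadata sync status
radosgw-admin data sync status --source-zone=z-primary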
Can anyone point me to what's going on here? I can provide more
details if necessary, just let me know.
Thank you!
Eugen
Hi all
I have a scenario where I'm upgrading to Ceph Octopus on hardware that groups its drives in trays containing two devices each. Previously these drives were joined in a software RAID1 and the md devices were used as the OSDs. The logic behind this is that should one of those drives fail, both will need to be removed at the same time due to the design of the machine.
For example:
https://www.servethehome.com/supermicro-ssg-6047r-e1r72l-72x-35-drive-4u-st…
As I understand it, using RAID isn't recommended, so how would I best deploy my cluster so that it's smart enough to group drives according to the trays they're in?
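For example, would a custom bucket type below 'host' in the CRUSH map be a sane way to model the trays? A rough sketch of what I mean (type numbers, names and weights are made up):
# types section of a decompiled CRUSH map, with an extra "tray" level
type 0 osd
type 1 tray
type 2 host
type 3 chassis
...
# one bucket per two-drive tray, e.g. the first tray of host node1:
tray node1-tray0 {
        id -101
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 3.638
        item osd.1 weight 3.638
}
# A rule could then use "step chooseleaf firstn 0 type tray" if trays
# rather than hosts should be the failure domain.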
Thanks!
Hi,
We have a 1.2 PB Nautilus installation primarily using CephFS for our HPC resources.
Our OSDs use spinning disks with NVMe devices for WAL and DB in an LVM setup.
The CephFS metadata pool resides on spinning disks, and I wonder whether there is any point, from a performance perspective, in putting it on flash?
Googling this does not give a single straight answer: some say the metadata is heavily cached on the MDS and does not benefit much from flash, while others argue it is significantly faster on flash.
We would regardless like to introduce flash OSDs.
In order to do that without data moving onto the SSD OSDs we add, we need to add CRUSH rules that place data on ssd and hdd devices exclusively.
From reading previous posts to this list, adding rules like that would trigger PGs to start migrating.
But when I (reluctantly) edit the CRUSH map manually, adding a new class ssd (using a previously unused id) and the new rules I need, crushtool seems to think no data will shuffle and that the maps are equivalent. Is it that simple, or am I doing it wrong?
I would have preferred to use a tool like 'crushtool --reclassify' (if applicable) to make these changes, but I can't get the hang of it at all.
The plan is then to change the crush rule for the existing pools to the new ones that use device classes.
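For reference, my (possibly wrong) reading of the reclassify workflow from the docs is roughly the following, with the pool switch afterwards (file names and the pool name are just examples):
crushtool -i original.map --reclassify \
      --set-subtree-class default hdd \
      --reclassify-root default hdd \
      -o adjusted.map
crushtool -i original.map --compare adjusted.map
ceph osd setcrushmap -i adjusted.map
ceph osd pool set cephfs_metadata crush_rule replicated_ssd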
Thank you for any pointers or advice.
[root@cephyr-mon1 crushtest]# diff -u crush_comp crush_comp_corr
--- crush_comp 2020-09-17 17:11:37.125310334 +0200
+++ crush_comp_corr 2020-09-17 17:14:28.022704876 +0200
@@ -574,6 +574,7 @@
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
+ id -29 class ssd
# weight 2087.180
alg straw2
hash 0 # rjenkins1
@@ -710,5 +711,25 @@
step chooseleaf firstn 0 type host
step emit
}
+rule replicated_ssd {
+ id 15
+ type replicated
+ min_size 1
+ max_size 10
+ step take default class ssd
+ step chooseleaf firstn 0 type host
+ step emit
+}
+rule cephyrfs_data_hdd {
+ id 16
+ type erasure
+ min_size 3
+ max_size 12
+ step set_chooseleaf_tries 5
+ step set_choose_tries 100
+ step take default class hdd
+ step chooseleaf indep 0 type host
+ step emit
+}
# end crush map
[root@cephyr-mon1 crushtest]# crushtool -i crush_comp.c --compare crush_comp_corr.c
rule 0 had 0/10240 mismatched mappings (0)
rule 1 had 0/6144 mismatched mappings (0)
rule 6 had 0/10240 mismatched mappings (0)
rule 7 had 0/4096 mismatched mappings (0)
rule 8 had 0/4096 mismatched mappings (0)
rule 9 had 0/4096 mismatched mappings (0)
rule 10 had 0/4096 mismatched mappings (0)
rule 11 had 0/4096 mismatched mappings (0)
rule 12 had 0/4096 mismatched mappings (0)
rule 13 had 0/4096 mismatched mappings (0)
rule 14 had 0/10240 mismatched mappings (0)
maps appear equivalent
Regards,
Mathias Lindberg
Tel: +46 (0)31 7723059
Mob: +46 (0)723 526107
Mathias Lindberg
mathlin(a)chalmers.se
Hello Ceph-Users
After upgrading one of our clusters to Nautilus, we noticed the 'x pgs not scrubbed/deep-scrubbed in time' warnings.
Through some digging we found that the scrubbing seems to take place at random and doesn't take the age of the last scrub/deep-scrub into consideration.
I dumped the dates of the last scrub twice, roughly 90 minutes apart:
ceph pg dump | grep active | awk '{print $22}' | sort | uniq -c
dumped all
2434 2020-08-30
5935 2020-08-31
1782 2020-09-01
2 2020-09-02
2 2020-09-03
5 2020-09-06
3 2020-09-08
5 2020-09-09
17 2020-09-10
259 2020-09-12
26672 2020-09-13
12036 2020-09-14
dumped all
2434 2020-08-30
5933 2020-08-31
1782 2020-09-01
2 2020-09-02
2 2020-09-03
5 2020-09-06
3 2020-09-08
5 2020-09-09
17 2020-09-10
51 2020-09-12
24862 2020-09-13
14056 2020-09-14
It is pretty obvious that PGs that were scrubbed a day ago have been scrubbed again for some reason, while ones that are two weeks old are basically left untouched.
One way we are currently dealing with this is setting osd_scrub_min_interval to 72h to force the cluster to scrub the older PGs.
This can't be intentional.
Has anyone else seen this behavior?
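For reference, this is roughly what we set as the workaround (the option takes seconds, so 72 hours is 259200):
ceph config set osd osd_scrub_min_interval 259200
ceph config get osd osd_scrub_min_interval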
Kind regards
Johannes