Hi all,
I'm trying to quantify the performance penalty for HDD OSDs when the WAL/DB is
on the same device (HDD) versus on a faster device (SSD/NVMe), for different
workloads (RBD, RGW with the bucket index in an SSD pool, and CephFS with
metadata in an SSD pool). I want to know whether giving up a disk slot for a
WAL/DB device is worth it compared to adding more OSDs.
Unfortunately I cannot find benchmarks for these kinds of workloads. Has
anyone ever done such a benchmark?
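If nobody has published numbers I may end up running this myself; for the RBD case I was thinking of something along these lines, run once with WAL/DB on the HDD and once with it on the SSD/NVMe (pool and image names are just placeholders):
rados bench -p testpool 60 write -t 16 -b 4096
fio --name=rbdtest --ioengine=rbd --pool=testpool --rbdname=testimage \
    --rw=randwrite --bs=4k --iodepth=16 --runtime=60 --time_based
Does that sound like a reasonable way to compare the two layouts?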
Thank you.
Hi all,
I'm having a problem creating an OSD using ceph-volume (by way of cephadm). This is an Octopus installation deployed with cephadm. I use "cephadm shell" and then "ceph-volume", but get the following error:
root@furry:/var/lib# ceph-volume lvm prepare --data /dev/sda --block.db /dev/vg/sda.db --dmcrypt
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 29d95564-733d-4f2a-a2c8-1bb9ceb5a14b
stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
--> RuntimeError: Unable to create a new OSD id
If I pass the cluster ID and use the correct keyring (the client.admin keyring), I get a bit further:
root@furry:/var/lib# ceph-volume --cluster c258000c-f3e4-11ea-9ebe-c3c75e8e9028 lvm prepare --data /dev/sda --block.db /dev/vg/sda.db --dmcrypt
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster c258000c-f3e4-11ea-9ebe-c3c75e8e9028 --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/c258000c-f3e4-11ea-9ebe-c3c75e8e9028.keyring -i - osd new aa13c362-c9cf-4d03-9a86-d6118fbc312c
stderr: Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)
--> RuntimeError: Unable to create a new OSD id
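I suspect the shell is simply missing /etc/ceph/ceph.conf and the bootstrap-osd keyring at their default locations. Would something along these lines be the right way to provide them (just a guess on my part)?
ls /etc/ceph/ceph.conf /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring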
Any idea how to get past these errors? Thanks.
--Tri Hoang
Hey all,
We will be having a Ceph science/research/big cluster call on Wednesday
September 23rd. If anyone wants to discuss something specific they can
add it to the pad linked below. If you have questions or comments you
can contact me.
This is an informal open call for community members, mostly from
HPC/HTC/research environments, where we discuss whatever is on our minds
regarding Ceph: updates, outages, features, maintenance, etc. There is
no set presenter, but I do attempt to keep the conversation lively.
https://pad.ceph.com/p/Ceph_Science_User_Group_20200923
We try to keep it to an hour or less.
Ceph calendar event details:
September 23, 2020
14:00 UTC
4pm Central European
9am Central US
Description: Main pad for discussions:
https://pad.ceph.com/p/Ceph_Science_User_Group_Index
Meetings will be recorded and posted to the Ceph YouTube channel.
To join the meeting on a computer or mobile phone:
https://bluejeans.com/908675367?src=calendarLink
To join from a Red Hat Deskphone or Softphone, dial: 84336.
Connecting directly from a room system?
1.) Dial: 199.48.152.152 or bjn.vc
2.) Enter Meeting ID: 908675367
Just want to dial in on your phone?
1.) Dial one of the following numbers: 408-915-6466 (US)
See all numbers: https://www.redhat.com/en/conference-numbers
2.) Enter Meeting ID: 908675367
3.) Press #
Want to test your video connection? https://bluejeans.com/111
Kevin
--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS
Space Science & Engineering Center
University of Wisconsin-Madison
I'm starting to wonder (again) which I/O scheduler is better for Ceph on SSDs.
My reasoning:
none:
1. Reduces request latency. The lower the latency, the higher the
perceived performance for an unbounded workload with a fixed queue depth
(hello, benchmarks).
2. Can cause latency spikes for requests because of the 'unfair' request
ordering (hello, deep scrub).
mq-deadline:
1. Reduces nr_requests (queue size) to 256 ('noop' shows me 916???).
May introduce latency.
2. May reduce latency spikes because different types of workloads are
served at different rates.
I'm running some benchmarks, and they, of course, give higher marks to
the 'none' scheduler. Nevertheless, I believe most normal workloads on
Ceph do not drive it at an unbounded rate, so a bounded workload (e.g. an
app doing I/O based on external, independent events) can be hurt by the
lack of a disk scheduler in the presence of an unbounded workload.
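In case anyone wants to reproduce this: I switch schedulers and check the queue size via sysfs, roughly like this (device name is just an example):
cat /sys/block/sda/queue/scheduler       # available schedulers, current one in brackets
echo mq-deadline > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/nr_requests     # queue size exposed by the current scheduler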
Any ideas?
Hi,
We have a little problem with deep-scrubbing of PGs in an EC pool.
[root@mon-1 ~]# ceph health detail
HEALTH_WARN 1 pgs not deep-scrubbed in time
PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
pg 14.d4 not deep-scrubbed since 2020-09-05 20:26:02.696191
[root@mon-1 ~]# ceph pg deep-scrub 14.d4
instructing pg 14.d4s0 on osd.113 to deep-scrub
[root@mon-1 ~]# grep deep-scrub /var/log/ceph/ceph.log |grep 14.d4
There is nothing about pg 14.d4 in the log. I checked that pg 14.d4
belongs to the pool default.rgw.buckets.data-ec:
[root@mon-1 ~]# ceph osd pool ls detail |grep
default.rgw.buckets.data-ec
pool 14 'default.rgw.buckets.data-ec' erasure size 8 min_size 6
crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode
warn last_change 74563 flags hashpspool stripe_width 4160 application
rgw
[root@mon-1 ~]# ceph pg ls-by-pool default.rgw.buckets.data-ec |grep
14.d4
14.d4 0 0 0 0 0 0 0
0 active+clean 67m 0'0 74562:10673
[113,40,125,80,16,24,95,32]p113 [113,40,125,80,16,24,95,32]p113 2020-
09-12 14:47:40.603264 2020-09-05 20:26:02.696191
When I try to run a manual scrub or deep-scrub on any PG that belongs
to the EC pool it doesn't work. For other PGs in replicated pools, it
works fine.
Is it possible to manually run pg deep-scrub on an EC pool?
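Or does the deep-scrub just get queued and never run because the primary is busy? This is what I was planning to check next (not sure these are the right knobs):
ceph pg 14.d4 query | grep -i scrub           # last (deep-)scrub stamps as the primary sees them
ceph tell osd.113 config get osd_max_scrubs   # works on recent releases; older ones may need the admin socket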
Hi *,
I have two virtual one-node clusters configured for RGW multisite. In
the beginning replication actually worked for a few hundred MB or so,
and then it stopped. In the meantime I have wiped both RGWs twice to
make sure the configuration is right (including wiping all pools
clean). I don't see any errors in the logs, but nothing happens on the
secondary site. Both clusters are healthy, and the RGWs run with HTTPS.
Uploading data directly to the secondary site also works, so the
configuration seems OK to me.
This is the current RGW sync status:
---snip---
primary:~ # radosgw-admin sync status
realm c7d5fd30-9c06-46a1-baf4-497f95bf3abc (hamburg)
zonegroup 68adec15-aace-403d-bd63-f5182a6437b1 (zg-hamburg)
zone 0fb33fa1-8110-4179-ae45-acf5f5f825c5 (z-primary)
metadata sync no sync (zone is master)
secondary:~ # radosgw-admin sync status
2020-09-17T09:34:59.593+0200 7fdd3e706a40 1 Cannot find zone
id=93ece7a6-beef-4f4e-841a-60ba0405f192 (name=z-secondary), switching
to local zonegroup configuration
realm c7d5fd30-9c06-46a1-baf4-497f95bf3abc (hamburg)
zonegroup 68adec15-aace-403d-bd63-f5182a6437b1 (zg-hamburg)
zone 93ece7a6-beef-4f4e-841a-60ba0405f192 (z-secondary)
metadata sync syncing
full sync: 64/64 shards
full sync: 3 entries to sync
incremental sync: 0/64 shards
metadata is behind on 64 shards
behind shards:
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63]
data sync source: 0fb33fa1-8110-4179-ae45-acf5f5f825c5 (z-primary)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
---snip---
Since the data was not replicated, I ran 'radosgw-admin metadata sync
run --source-zone=z-primary', but it never finishes. If I do the same
for data, it shows that all shards are behind on data, but nothing
happens either.
I also don't understand the 'Cannot find zone
id=93ece7a6-beef-4f4e-841a-60ba0405f192 (name=z-secondary), switching
to local zonegroup configuration' message but this didn't break the
replication in the first attempt, so I ignored it. Or is this
something I should fix first (if yes, how)?
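For what it's worth, these are the commands I was planning to look at next; am I using the right ones?
radosgw-admin sync error list
radosgw-admin metadata sync status
radosgw-admin data sync status --source-zone=z-primary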
Can anyone point me to what's going on here? I can provide more
details if necessary, just let me know.
Thank you!
Eugen
Hi all
I have a scenario where I'm upgrading to Ceph Octopus on hardware that groups its drives in trays containing two devices each. Previously these drives were joined in a software RAID1 and the md devices were used as the OSDs. The logic behind this is that should one of those drives fail, both will need to be removed at the same time due to the design of the machine.
For example:
https://www.servethehome.com/supermicro-ssg-6047r-e1r72l-72x-35-drive-4u-st…
As I understand it, using RAID isn't recommended, so how would I best deploy my cluster so that it's smart enough to group drives according to the trays they're in?
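For example, would a custom bucket type below 'host' in the CRUSH map be a sane way to model the trays? A rough sketch of what I mean (type numbers, names and weights are made up):
# types section of a decompiled CRUSH map, with an extra "tray" level
type 0 osd
type 1 tray
type 2 host
type 3 chassis
...
# one bucket per two-drive tray, e.g. the first tray of host node1:
tray node1-tray0 {
        id -101
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 3.638
        item osd.1 weight 3.638
}
# A rule could then use "step chooseleaf firstn 0 type tray" if trays
# rather than hosts should be the failure domain.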
Thanks!
Hi,
We have a 1.2 PB Nautilus installation primarily using CephFS for our HPC resources.
Our OSDs use spinning disks with NVMe devices for WAL and DB in an LVM setup.
The CephFS metadata pool resides on spinning disks, and I wonder whether there is any point, from a performance perspective, in putting it on flash?
Googling this does not give a single straight answer: some say the metadata is heavily cached on the MDS and does not benefit much from flash, while others argue it is significantly faster on flash.
We would regardless like to introduce flash OSDs.
In order to do that without data moving onto the SSD OSDs we add, we need to add CRUSH rules that place data on ssd and hdd devices exclusively.
From reading previous posts to this list, adding rules like that would trigger PGs to start migrating.
But when I (reluctantly) edit the CRUSH map manually, adding a new class ssd (using a previously unused id) and the new rules I need, crushtool seems to think no data will shuffle and that the maps are equivalent. Is it that simple, or am I doing it wrong?
I would have preferred to use a tool like 'crushtool --reclassify' (if applicable) to make these changes, but I can't get the hang of it at all.
The plan is then to change the crush rule for the existing pools to the new ones that use device classes.
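For reference, my (possibly wrong) reading of the reclassify workflow from the docs is roughly the following, with the pool switch afterwards (file names and the pool name are just examples):
crushtool -i original.map --reclassify \
      --set-subtree-class default hdd \
      --reclassify-root default hdd \
      -o adjusted.map
crushtool -i original.map --compare adjusted.map
ceph osd setcrushmap -i adjusted.map
ceph osd pool set cephfs_metadata crush_rule replicated_ssd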
Thank you for any pointers or advice.
[root@cephyr-mon1 crushtest]# diff -u crush_comp crush_comp_corr
--- crush_comp 2020-09-17 17:11:37.125310334 +0200
+++ crush_comp_corr 2020-09-17 17:14:28.022704876 +0200
@@ -574,6 +574,7 @@
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
+ id -29 class ssd
# weight 2087.180
alg straw2
hash 0 # rjenkins1
@@ -710,5 +711,25 @@
step chooseleaf firstn 0 type host
step emit
}
+rule replicated_ssd {
+ id 15
+ type replicated
+ min_size 1
+ max_size 10
+ step take default class ssd
+ step chooseleaf firstn 0 type host
+ step emit
+}
+rule cephyrfs_data_hdd {
+ id 16
+ type erasure
+ min_size 3
+ max_size 12
+ step set_chooseleaf_tries 5
+ step set_choose_tries 100
+ step take default class hdd
+ step chooseleaf indep 0 type host
+ step emit
+}
# end crush map
[root@cephyr-mon1 crushtest]# crushtool -i crush_comp.c --compare crush_comp_corr.c
rule 0 had 0/10240 mismatched mappings (0)
rule 1 had 0/6144 mismatched mappings (0)
rule 6 had 0/10240 mismatched mappings (0)
rule 7 had 0/4096 mismatched mappings (0)
rule 8 had 0/4096 mismatched mappings (0)
rule 9 had 0/4096 mismatched mappings (0)
rule 10 had 0/4096 mismatched mappings (0)
rule 11 had 0/4096 mismatched mappings (0)
rule 12 had 0/4096 mismatched mappings (0)
rule 13 had 0/4096 mismatched mappings (0)
rule 14 had 0/10240 mismatched mappings (0)
maps appear equivalent
Regards,
Mathias Lindberg
Tel: +46 (0)31 7723059
Mob: +46 (0)723 526107
Mathias Lindberg
mathlin(a)chalmers.se
Hello Ceph-Users
After upgrading one of our clusters to Nautilus, we noticed the 'x pgs not scrubbed/deep-scrubbed in time' warnings.
Through some digging we found that the scrubbing seems to take place at random and doesn't take the age of the last scrub/deep-scrub into consideration.
I dumped the dates of the last scrub twice, roughly 90 minutes apart:
ceph pg dump | grep active | awk '{print $22}' | sort | uniq -c
dumped all
2434 2020-08-30
5935 2020-08-31
1782 2020-09-01
2 2020-09-02
2 2020-09-03
5 2020-09-06
3 2020-09-08
5 2020-09-09
17 2020-09-10
259 2020-09-12
26672 2020-09-13
12036 2020-09-14
dumped all
2434 2020-08-30
5933 2020-08-31
1782 2020-09-01
2 2020-09-02
2 2020-09-03
5 2020-09-06
3 2020-09-08
5 2020-09-09
17 2020-09-10
51 2020-09-12
24862 2020-09-13
14056 2020-09-14
It is pretty obvious that PGs that were scrubbed a day ago have been scrubbed again for some reason, while ones that are two weeks old are basically left untouched.
One way we are currently dealing with this is setting osd_scrub_min_interval to 72h to force the cluster to scrub the older PGs.
This can't be intentional.
Has anyone else seen this behavior?
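For reference, this is roughly what we set as the workaround (the option takes seconds, so 72 hours is 259200):
ceph config set osd osd_scrub_min_interval 259200
ceph config get osd osd_scrub_min_interval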
Kind regards
Johannes