Hi all,
I need to list the contents of the stray buckets on one of our MDSes. The MDS reports 772674 stray entries. However, if I dump its cache and grep for stray I get only 216 hits.
How can I get to the contents of the stray buckets?
Please note that Octopus is still hit by https://tracker.ceph.com/issues/57059 so a "dump tree" will not work. In addition, I clearly don't just need the entries in cache, I need a listing of everything. How can I get that? I'm willing to run rados commands and pipe the output through ceph-dencoder if necessary.
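For example, would something like this be the right way to list them? I'm assuming here that the metadata pool is called cephfs_metadata and that the stray dirfrags of mds.0 live in objects 600.00000000 through 609.00000000 (inodes 0x600-0x609) - please correct me if those object names are wrong:

  for i in 0 1 2 3 4 5 6 7 8 9; do
      rados -p cephfs_metadata listomapkeys 60${i}.00000000
  done

and then getomapval + ceph-dencoder on individual entries if I need more than the dentry names.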
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi everyone,
I'd like to know how many pools I should create for multiple CephFS filesystems.
I have two classes of OSDs (HDD and SSD), and I need roughly 20-30 CephFS
filesystems currently, and that number will increase over time.
Should I create
one cephfs_metadata_replicated
one cephfs_data_replicated
a few cephfs_data_erasure_coding (depending on k/m)
and put all my CephFS filesystems inside two of them? Or should I create a
pair of metadata/data pools for each CephFS (as sketched below)?
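By "a pair of pools per CephFS" I mean something like this for each filesystem (names are just examples):

  ceph osd pool create fs01_metadata
  ceph osd pool create fs01_data
  ceph osd pool create fs01_data_ec erasure
  ceph osd pool set fs01_data_ec allow_ec_overwrites true
  ceph fs new fs01 fs01_metadata fs01_data
  ceph fs add_data_pool fs01 fs01_data_ec

repeated 20-30 times, versus only a handful of big shared pools.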
I will also need Ceph S3 storage. Same question: should I have a
dedicated pool for S3 storage, or can/should I use the same
cephfs_data_replicated/erasure pool?
Regards
--
Albert SHIH 🦫 🐸
France
Heure locale/Local time:
Wed 24 Jan 2024 09:33:09 CET
Hi,
We are considering a BlueStore compression test in our cluster. For this we have created an RBD image on our EC pool.
When we run "ceph daemon osd.X perf dump | grep -E '(compress_.*_count|bluestore_compressed_)'", we cannot find the parameters below, even when we try the ceph tell command instead.
"bluestore_compressed_allocated"
"bluestore_compressed_original"
As a result, we are unable to determine the extent of compression for specific RBD images.
Are there any specific configurations required to expose these parameters, or is there an alternate method to assess BlueStore compression?
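By "specific configurations" I mean, for example, whether pool-level settings like the following are all that is needed before those counters appear (the pool name is just a placeholder):

  ceph osd pool set ec_pool compression_mode aggressive
  ceph osd pool set ec_pool compression_algorithm snappy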
Any guidance or insight would be greatly appreciated.
Thanks
Mohammad Saif
Ceph Enthusiast
Hi All,
In regards to the monitoring services on a Ceph Cluster (ie Prometheus,
Grafana, Alertmanager, Loki, Node-Exporter, Promtail, etc.) how many
instances should/can we run for fault tolerance purposes? I can't seem
to recall that advice being in the doco anywhere (but of course, I
probably missed it).
I'm concerned about HA on those services - will they continue to run if
the Ceph Node they're on fails?
At the moment we're running only 1 instance of each in the cluster, but
several Ceph Nodes are capable of running each - e.g. 3 nodes
configured but only count:1 (see the spec sketch below).
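For example, the Grafana spec currently looks roughly like this (hostnames are placeholders), and the other monitoring services are similar:

  service_type: grafana
  placement:
    count: 1
    hosts:
    - ceph-node-1
    - ceph-node-2
    - ceph-node-3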
This is on the latest version of Reef using cephadm (if it makes a
huge difference :-) ).
So any advice, etc, would be greatly appreciated, including whether we should
be running any services not mentioned (not Mgr, Mon, OSD, or iSCSI,
obviously :-) )
Cheers
Dulux-Oz
Am I doing something weird when I do the following on a ceph node (nautilus, el7)?
rbd snap ls vps-test -p rbd
rbd map vps-test@vps-test.snap1 -p rbd
mount -o ro /dev/mapper/VGnew-LVnew /mnt/disk <--- reset/reboot ceph node
Hello,
Could someone please explain how mclock works regarding reads and writes? Does mclock intervene on both read and write iops? Or only on reads or only on writes?
And what type of underlying hardware performance is calculated and considered by mclock? It seems to be only write performance.
The mclock documentation shows HDD- and SSD-specific configuration options (capacity and sequential bandwidth) but nothing regarding hybrid setups, and these configuration options do not distinguish reads from writes. But read and write performance are often not on par for a single drive, and even less so when using hybrid setups.
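For reference, the options I mean are (from the mclock configuration reference, if I'm reading it correctly):

  osd_mclock_max_capacity_iops_hdd
  osd_mclock_max_capacity_iops_ssd
  osd_mclock_max_sequential_bandwidth_hdd
  osd_mclock_max_sequential_bandwidth_ssd

None of them say whether the value is meant to reflect read or write capability.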
With hybrid setups (RocksDB+WAL on SSDs or NVMes and Data on HDD), if mclock only considers write performance, it may fail to properly schedule read iops (does mclock schedule read iops?) as the calculated iops capacity would be way too high for reads.
With HDD only setups (RocksDB+WAL+Data on HDD), if mclock only considers write performance, the OSD may not take advantage of higher read performance.
Can someone please shed some light on this?
Best regards,
Frédéric Nass
Sous-direction Infrastructures et Services
Direction du Numérique
Université de Lorraine
Tél : +33 3 72 74 11 35
Hello, Ceph users,
what is the correct location of keyring for ceph-crash?
I tried to follow this document:
https://docs.ceph.com/en/latest/mgr/crash/
# ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' > /etc/ceph/ceph.client.crash.keyring
and copied this file to all nodes. When I run ceph-crash.service on the node
where client.admin.keyring is available, it works. But on the rest of the nodes,
it tries to access the admin keyring anyway. Journalctl -u ceph-crash
says this:
Jan 18 18:03:35 my.node.name systemd[1]: Started Ceph crash dump collector.
Jan 18 18:03:35 my.node.name ceph-crash[2973164]: INFO:ceph-crash:pinging cluster to exercise our key
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.786+0100 7eff34016640 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.786+0100 7eff34016640 -1 AuthRegistry(0x7eff2c063ce8) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.787+0100 7eff34016640 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.787+0100 7eff34016640 -1 AuthRegistry(0x7eff2c067de0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.788+0100 7eff34016640 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: 2024-01-18T18:03:35.788+0100 7eff34016640 -1 AuthRegistry(0x7eff340150c0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
Jan 18 18:03:35 my.node.name ceph-crash[2973166]: [errno 2] RADOS object not found (error connecting to the cluster)
Jan 18 18:03:35 my.node.name ceph-crash[2973164]: INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s
Is the documentation outdated, or am I doing something wrong? Thanks for
any hint. This is non-containerized Ceph 18.2.1 on AlmaLinux 9.
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall
I am considering indexless buckets to increase small-file performance
and the maximum number of objects using just HDDs.
When I checked the constraints of indexless buckets in the Ceph docs, they
indicate a possible problem with bucket listing, so I understand that it
also causes problems for versioning and sync.
I don't fully understand what that means. I guessed it was similar to the
DB index feature, but it's not.
Could you explain the constraints of indexless buckets?
Hi Ceph users and developers,
You are invited to join us at the User + Dev meeting this week Thursday,
January 18th at 10:00 AM Eastern Time! See below for more meeting details.
The focus topic, "Ceph Feature Request from the DKIST Data Center: Add a
service backed by tape that is analogous to AWS Glacier", will be presented
by Joel Davidow, a Ceph operator from the National Solar Observatory. In
his talk, he will propose a feature request to add support for tape as a
storage class with lifecycle management for object storage.
Feel free to add questions or additional topics under the "Open Discussion"
section on the agenda: https://pad.ceph.com/p/ceph-user-dev-monthly-minutes
If you have an idea for a focus topic you'd like to present at a future
meeting, you are welcome to submit it to this Google Form:
https://docs.google.com/forms/d/e/1FAIpQLSdboBhxVoBZoaHm8xSmeBoemuXoV_rmh4v…
Any Ceph user or developer is eligible to submit!
Thanks,
Laura Flores
Meeting link: https://meet.jit.si/ceph-user-dev-monthly
Time Conversions:
UTC: Thursday, January 18, 15:00 UTC
Mountain View, CA, US: Thursday, January 18, 7:00 PST
Phoenix, AZ, US: Thursday, January 18, 8:00 MST
Denver, CO, US: Thursday, January 18, 8:00 MST
Huntsville, AL, US: Thursday, January 18, 9:00 CST
Raleigh, NC, US: Thursday, January 18, 10:00 EST
London, England: Thursday, January 18, 15:00 GMT
Paris, France: Thursday, January 18, 16:00 CET
Helsinki, Finland: Thursday, January 18, 17:00 EET
Tel Aviv, Israel: Thursday, January 18, 17:00 IST
Pune, India: Thursday, January 18, 20:30 IST
Brisbane, Australia: Friday, January 19, 1:00 AEST
Singapore, Asia: Thursday, January 18, 23:00 +08
Auckland, New Zealand: Friday, January 19, 4:00 NZDT
--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage <https://ceph.io>
Chicago, IL
lflores(a)ibm.com | lflores(a)redhat.com <lflores(a)redhat.com>
M: +17087388804
Hi
Our 17.2.7 cluster:
"
-33 886.00842 datacenter 714
-7 209.93135 host ceph-hdd1
-69 69.86389 host ceph-flash1
-6 188.09579 host ceph-hdd2
-3 233.57649 host ceph-hdd3
-12 184.54091 host ceph-hdd4
-34 824.47168 datacenter DCN
-73 69.86389 host ceph-flash2
-5 252.27127 host ceph-hdd14
-2 201.78067 host ceph-hdd5
-81 288.26501 host ceph-hdd6
-31 264.56207 host ceph-hdd7
-36 1284.48621 datacenter TBA
-77 69.86389 host ceph-flash3
-21 190.83224 host ceph-hdd8
-29 199.08838 host ceph-hdd9
-11 193.85382 host ceph-hdd10
-9 237.28154 host ceph-hdd11
-26 187.19536 host ceph-hdd12
-4 206.37102 host ceph-hdd13
"
We recently created an EC 4+5 pool with failure domain datacenter. The
DCN datacenter only had 2 hdd hosts so we added one more to make it
possible at all, since each DC needs 3 shards, as I understand it.
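(I.e. the EC rule picks 3 datacenters and then 3 hosts in each, something like:

  step take default
  step choose indep 3 type datacenter
  step chooseleaf indep 3 type host
  step emit

so each DC holds 3 of the 9 shards.)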
Backfill was really slow though, so we just added another host to the
DCN datacenter. Backfill looks like this:
"
data:
volumes: 1/1 healthy
pools: 13 pools, 11153 pgs
objects: 311.53M objects, 1000 TiB
usage: 1.6 PiB used, 1.6 PiB / 3.2 PiB avail
pgs: 60/1669775060 objects degraded (0.000%)
373356926/1669775060 objects misplaced (22.360%)
5944 active+clean
5177 active+remapped+backfill_wait
22 active+remapped+backfilling
4 active+recovery_wait+degraded+remapped
3 active+recovery_wait+remapped
2 active+recovery_wait+degraded
1 active+recovering+degraded+remapped
io:
client: 73 MiB/s rd, 339 MiB/s wr, 1.06k op/s rd, 561 op/s wr
recovery: 1.2 GiB/s, 313 objects/s
"
Given that the first host added had 19 OSDs, with none of them anywhere
near the target capacity, and the one we just added has 22 empty OSDs,
having just 22 PGs backfilling and 1 recovering seems somewhat
underwhelming.
Is this to be expected with such a pool? Mclock profile is
high_recovery_ops.
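(Set with something like "ceph config set osd osd_mclock_profile high_recovery_ops", for reference.)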
Mvh.
Torkil
--
Torkil Svensgaard
Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: torkil(a)drcmr.dk