Hello all,
I have a problem with an undeletable image.
Let me try to explain.
There is an image with a protected snapshot, and the snapshot thinks it still has a child:
rbd snap unprotect delete-me-please@995cc2e3-c636-4c43-87c3-dbc729173c09
2020-07-30 09:25:08.492 7f15f77fe700 -1 librbd::SnapshotUnprotectRequest: cannot unprotect: at least 1 child(ren) [f907bc6b8b4567] in pool 'rbd'
2020-07-30 09:25:08.492 7f15f77fe700 -1 librbd::SnapshotUnprotectRequest: encountered error: (16) Device or resource busy
2020-07-30 09:25:08.492 7f15f77fe700 -1 librbd::SnapshotUnprotectRequest: 0x559ca6dd7b80 should_complete_error: ret_val=-16
2020-07-30 09:25:08.496 7f15f77fe700 -1 librbd::SnapshotUnprotectRequest: 0x559ca6dd7b80 should_complete_error: ret_val=-16
rbd: unprotecting snap failed: (16) Device or resource busy
rbd children delete-me-please@995cc2e3-c636-4c43-87c3-dbc729173c09
2020-07-30 09:19:36.650 7f39563deb00 -1 librbd::api::Image: list_descendants: error looking up name for image id f907bc6b8b4567 in pool rbd
rbd: listing children failed: (2) No such file or directory
So I am not able to unprotect and delete the snapshot.
Is there a way to fix this issue?
I know this can happen if deep-flatten is not enabled, and I think that is the root cause. We have activated the feature now, but this image is an artifact from the time when it was disabled.
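For completeness, here is what I plan to check next; these are just guesses on my side (maybe the child image id still exists in the trash, or there is a stale entry in the pool's clone bookkeeping):
# is the child hiding in the trash?
rbd trash ls rbd
# does an image with that id still exist at all?
rbd info --image-id f907bc6b8b4567 --pool rbd
# raw view of the clone bookkeeping (rbd_children object in the pool)
rados -p rbd listomapvals rbd_children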
Best regards & thx for your help.
Torsten
We are seeking information on configuring Ceph to work with Noobaa and
NextCloud.
Randy
--
Randy Morgan
CSR
Department of Chemistry/BioChemistry
Brigham Young University
randym(a)chem.byu.edu
Hi list,
We're wondering if Ceph Nautilus packages will be provided for Ubuntu
Focal Fossa (20.04)?
You might wonder why one would not just use Ubuntu Bionic (18.04)
instead of the latest LTS. Here is why: there is a glibc bug in Ubuntu
Bionic that *might* affect Open vSwitch (OVS) users [1].
We had quite a few issues with OVS deadlocks on hypervisors, and do not
want to risk experiencing the same issues on our Ceph cluster(s). I'm
not sure how many of you use OVS for bridging / bonding, but for those
who do, running Ceph (Nautilus / Octopus) on 20.04 would be preferred.
Gr. Stefan
[1]: https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1839592
Hi everyone,
Get ready for another Ceph Tech Talk on July 23rd at 17:00 UTC, this time at a
different scale: running small Ceph clusters in multiple data centers.
Thanks to Yuval Freund for taking the time to provide us with content this month.
You can find the calendar invite and archive here:
https://ceph.io/ceph-tech-talks/
--
Mike Perez
He/Him
Ceph Community Manager
Red Hat Los Angeles <https://www.redhat.com>
thingee(a)redhat.com
M: 1-951-572-2633 IM: IRC Freenode/OFTC: thingee
494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA
@Thingee <https://twitter.com/thingee>
Hi,
for some days now I have been trying to debug a problem with snaptrimming under
Nautilus.
I have a Nautilus (v14.2.10) cluster with 44 nodes, 24 OSDs per node, 14 TB per OSD.
I create a snapshot every day and keep the last 7 days.
Every time the oldest snapshot is deleted I get bad I/O performance and blocked requests for several seconds until the snaptrim is done.
Settings like osd_snap_trim_sleep and osd_pg_max_concurrent_snap_trims don't affect this behavior.
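For reference, this is roughly how I applied them at runtime (the values here are just examples, not a recommendation):
ceph config set osd osd_snap_trim_sleep 2
ceph config set osd osd_pg_max_concurrent_snap_trims 1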
In the debug_osd 10/10 log I see the following:
2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edda20 prio 196 cost 0 latency 0.019545 osd_repop_reply(client.22731418.0:615257 3.636 e22457/22372) v2 pg pg[3.636( v 22457'100855 (21737'97756,22457'100855] local-lis/les=22372/22374 n=27762 ec=2842/2839 lis/c 22372/22372 les/c/f 22374/22374/0 22372/22372/22343) [411,36,956,763] r=0 lpr=22372 luod=22457'100854 crt=22457'100855 lcod 22457'100853 mlcod 22457'100853 active+clean+snaptrim_wait trimq=[1d~1]]
2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edda20 finish
2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edc2c0 prio 127 cost 0 latency 0.043165 MOSDScrubReserve(2.2645 RELEASE e22457) v1 pg pg[2.2645( empty local-lis/les=22359/22364 n=0 ec=2403/2403 lis/c 22359/22359 les/c/f 22364/22367/0 22359/22359/22359) [379,411,884,975] r=1 lpr=22359 crt=0'0 active mbc={}]
2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edc2c0 finish
2020-07-27 11:45:50.039 7fd8b8404700 10 osd.411 pg_epoch: 22457 pg[3.278e( v 22457'99491 (21594'96426,22457'99491] local-lis/les=22359/22362 n=27669 ec=2859/2839 lis/c 22359/22359 les/c/f 22362/22365/0 22359/22359/22343) [411,379,848,924] r=0 lpr=22359 crt=22457'99491 lcod 22457'99489 mlcod 22457'99489 active+clean+snaptrim trimq=[1d~1]] snap_trimmer posting
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 pg_epoch: 22457 pg[3.278e( v 22457'99493 (21594'96426,22457'99493] local-lis/les=22359/22362 n=27669 ec=2859/2839 lis/c 22359/22359 les/c/f 22362/22365/0 22359/22359/22343) [411,379,848,924] r=0 lpr=22359 luod=22457'99491 crt=22457'99493 lcod 22457'99489 mlcod 22457'99489 active+clean+snaptrim trimq=[1d~1]] snap_trimmer complete
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557880ac3760 prio 127 cost 663 latency 7.761823 osd_repop(osd.217.0:3025 3.1ca5 e22457/22378) v2 pg pg[3.1ca5( v 22457'100370 (21716'97357,22457'100370] local-lis/les=22378/22379 n=27532 ec=2855/2839 lis/c 22378/22378 les/c/f 22379/22379/0 22378/22378/22378) [217,411,551,1055] r=1 lpr=22378 luod=0'0 lua=22294'100006 crt=22457'100370 lcod 22457'100369 active mbc={}]
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557880ac3760 finish
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x5578813e1e40 prio 127 cost 0 latency 7.494296 MOSDScrubReserve(2.37e2 REQUEST e22457) v1 pg pg[2.37e2( empty local-lis/les=22355/22356 n=0 ec=2412/2412 lis/c 22355/22355 les/c/f 22356/22356/0 22355/22355/22355) [245,411,834,768] r=1 lpr=22355 crt=0'0 active mbc={}]
2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x5578813e1e40 finish
The dequeueing of ops runs without pauses until the "snap_trimmer posting" and "snap_trimmer complete" log lines. In this example that step takes about 7 seconds, and the operations dequeued right afterwards show a latency of roughly that duration.
I tried to drill down into this in the code (input from developers would be welcome here).
It seems that the PG is locked for every operation.
The snap_trimmer posting and complete messages come from osd/PrimaryLogPG.cc around line 4700. This tells me that deleting a snapshot object sometimes takes quite a while.
After further poking around I see that in osd/SnapMapper.cc the method SnapMapper::get_next_objects_to_trim takes several seconds to finish. I followed this further into common/map_cacher.hpp, down to line 94: int r = driver->get_next(key, &store);
From there I lost the trail.
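In case someone wants to compare, this is how I watch the stall on an affected OSD (osd.411 is just the example from the log above):
# ops that are currently queued/blocked while the trim holds the PG lock
ceph daemon osd.411 dump_ops_in_flight
# recently finished ops with per-event timestamps, showing where the ~7 s went
ceph daemon osd.411 dump_historic_ops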
The slowness does not hit all OSDs at the same time. Sometimes these few OSDs are affected, sometimes others. Restarting an OSD does not help.
With Luminous and FileStore, snapshot deletion was not an issue at all.
With Nautilus and BlueStore it is not acceptable for my use case.
So far I don't know whether this is a BlueStore-specific problem or some more general issue.
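One thing I still want to try, purely as a guess that the omap iteration behind get_next_objects_to_trim is slowed down by RocksDB, is an offline compaction of one affected OSD (OSD stopped first):
systemctl stop ceph-osd@411
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-411 compact
systemctl start ceph-osd@411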
I wonder a bit why nobody else seems to have this problem.
Regards
Manuel
Hi,
I've read that Ceph has some built-in InfluxDB reporting capabilities (https://docs.ceph.com/docs/master/mgr/influx/).
However, Telegraf, which is the system reporting daemon for InfluxDB, also has a Ceph plugin (https://github.com/influxdata/telegraf/tree/master/plugins/inputs/ceph).
Just curious what people's thoughts on the two are, or what they are using in production?
Which is easier to deploy/maintain, have you found? Or more useful for alerting, or tracking performance gremlins?
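For context, my rough understanding is that the mgr module pushes metrics from the active mgr straight to InfluxDB, e.g. (hostname/database below are made up):
ceph mgr module enable influx
ceph config set mgr mgr/influx/hostname influxdb.example.com
ceph config set mgr mgr/influx/database ceph
while Telegraf's [[inputs.ceph]] plugin runs on each node and reads the local admin sockets, so metrics go through the normal Telegraf pipeline. I haven't run either at scale yet, so take the exact config keys above with a grain of salt.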
Thanks,
Victor
I have an RGW bucket (backups) that is versioned. A nightly job creates
a new version of a few objects. There is a lifecycle policy (see below)
that keeps 18 days of versions. This has been working perfectly and has
not been changed. Until I upgraded Octopus...
The nightly job creates separate log files, including a listing of the
object versions. From these I can see that:
13/7 02:14 versions from 13/7 01:13 back to 24/6 01:17 (correct)
14/7 02:14 versions from 14/7 01:13 back to 25/6 01:14 (correct)
14/7 10:00 upgrade Octopus 15.2.3 -> 15.2.4
15/7 02:14 versions from 15/7 01:13 back to 25/6 01:14 (would have
expected 25/6 to have expired)
16/7 02:14 versions from 16/7 01:13 back to 15/7 01:13 (now all
pre-upgrade versions have wrongly disappeared)
It's not a big deal for me as they are only backups, provided it
continues to work correctly from now on. However, it may affect some
other people much more.
Any ideas on the root cause? And if it is likely to be stable again now?
Thanks, Chris
{
    "Rules": [
        {
            "Expiration": {
                "ExpiredObjectDeleteMarker": true
            },
            "ID": "Expiration & incomplete uploads",
            "Prefix": "",
            "Status": "Enabled",
            "NoncurrentVersionExpiration": {
                "NoncurrentDays": 18
            },
            "AbortIncompleteMultipartUpload": {
                "DaysAfterInitiation": 1
            }
        }
    ]
}
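For completeness, this is how I have been checking the lifecycle state from both sides; the endpoint URL below is made up, the rest should be standard commands:
# what the bucket policy itself says
aws s3api get-bucket-lifecycle-configuration --bucket backups --endpoint-url http://rgw.example.com:8080
# what RGW thinks about pending/processed lifecycle runs
radosgw-admin lc list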
Hello,
I am looking into connecting my rados gateway to LDAP and found the
following documentation.
https://docs.ceph.com/docs/master/radosgw/ldap-auth/
I would like to allow an LDAP group to have access to create and manage
buckets.
The questions I still have are the following:
- Do the LDAP users need to log in to some sort of portal before their
corresponding Ceph user is created? If so, where do they go to do so? Or
does the creation of Ceph users and keys happen automatically?
- How can you access an LDAP user's key and secret after they are
integrated? (My current, untested understanding is sketched below.)
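For context, what I gather from that page is that there is no portal: RGW is pointed at the directory via ceph.conf options such as rgw_s3_auth_use_ldap, rgw_ldap_uri, rgw_ldap_searchdn and rgw_ldap_dnattr, and each user generates an LDAP token on the client side which is then used as the S3 access key (credentials below are made up):
export RGW_ACCESS_KEY_ID="ldapuser"
export RGW_SECRET_ACCESS_KEY="ldappassword"
radosgw-token --encode --ttype=ldap
If that is right, the token effectively replaces the usual key/secret pair, which is part of what I'd like to have confirmed.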
Thanks in advance for any information you can provide.
Regards,
Jared