Hi Mark,
While trying to figure out a random failure in the mempool tests[0], introduced earlier this year when fixing a bug in how mempool selects the shards holding the byte count of a given pool[1], I was intrigued by this "cache line ping pong" problem[2]. I wonder if you have some kind of benchmark, somewhere in your toolbox, that someone could use to demonstrate the problem. Maybe such code could be adapted to show the benefit of the optimization implemented in mempool?
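For what it's worth, here is a rough sketch of the kind of micro-benchmark I have in mind (my own illustrative code, not taken from mempool or from any existing benchmark): several threads hammer a single shared atomic counter, whose cache line keeps bouncing between cores, versus per-thread counters padded to 64 bytes so that each lives on its own cache line, loosely analogous to mempool's per-shard byte counters. The thread count, iteration count and 64-byte cache line size are just assumptions for the sketch.

// Rough false-sharing micro-benchmark (illustrative only):
// kThreads threads increment either one shared atomic counter
// (its cache line ping-pongs between cores) or per-thread counters
// padded so each sits on its own cache line.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

constexpr int  kThreads = 4;          // illustrative values
constexpr long kIters   = 10000000;

std::atomic<long> shared_counter{0};

struct alignas(64) PaddedCounter {    // assumes 64-byte cache lines
  std::atomic<long> v{0};
};
PaddedCounter sharded[kThreads];

template <typename Fn>
double run_threads(Fn fn) {
  auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> threads;
  for (int t = 0; t < kThreads; ++t)
    threads.emplace_back(fn, t);
  for (auto &th : threads)
    th.join();
  return std::chrono::duration<double>(
      std::chrono::steady_clock::now() - start).count();
}

int main() {
  double contended = run_threads([](int) {
    for (long i = 0; i < kIters; ++i)
      shared_counter.fetch_add(1, std::memory_order_relaxed);
  });
  double padded = run_threads([](int t) {
    for (long i = 0; i < kIters; ++i)
      sharded[t].v.fetch_add(1, std::memory_order_relaxed);
  });
  std::printf("shared: %.3fs  sharded+padded: %.3fs\n", contended, padded);
  return 0;
}

In my experience, on a multi-core machine (g++ -O2 -pthread) the shared-counter run is noticeably slower, and dropping the alignas(64) padding makes the sharded run degrade again because neighbouring counters land on the same cache line, which I believe is essentially the effect the mempool sharding is meant to avoid.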
Cheers
[0] https://tracker.ceph.com/issues/49781#note-9
[1] https://github.com/ceph/ceph/pull/39057/files
[2] https://www.drdobbs.com/parallel/understanding-and-avoiding-memory-issues/2…
--
Loïc Dachary, Artisan Logiciel Libre
Hi Folks,
The performance meeting will be starting in about 10 minutes at 8AM
PST! Today we'll hopefully talk a bit about bluestore write path
locking, omap performance with prefetch and buffered IO, and bluestore
cache trimming. Please feel free to add your own topic if you'd like!
Etherpad:
https://pad.ceph.com/p/performance_weekly
Bluejeans:
https://bluejeans.com/908675367
Mark
Hi, I would like to report a serious data loss problem observed in a customer environment.
Data loss on one OSD has been propagated to other OSDs. If backfill is performed while a shard is missing on the primary OSD, the corresponding shard also ends up missing on the OSD that the backfill targets.
With 4+2 erasure coding, if two OSDs are backfilled from that primary during one backfill, three shards end up missing (the primary's plus the two copies). Since 4+2 erasure coding can only tolerate the loss of two shards, data recovery becomes impossible. Whether data is actually lost depends on the erasure coding profile and on how many copies are made during backfill.
In fact, I could reproduce this situation. This is real data loss, and we need to fix this problem.
I will verify this with the latest version of Ceph, file a ticket in Redmine later, and also report detailed information there.
In this mail, I first share basic information about the environment and the procedure to reproduce.
Environment:
- Ceph version: Nautilus
- Erasure coding: 4+2
- Type: filestore
Step to Reproduce:
1. Set up more than 6 OSDs (leaving some extra OSDs out).
2. Store some objects in the pool.
3. Delete a file from the primary OSD in the PG.
(In fact, in the customer environment the shard on the primary OSD could not be read due to a medium error on the primary OSD's disk. To simulate this situation, remove the file with `rm`.)
e.g.) rm -f /var/lib/ceph/osd/ceph-7/current/1.0s0_head/<some file>.04.21.09\:55\:*
4. Cause backfill in the PG.
This time, I triggered backfill by marking an OSD `in` that had been `out`.
e.g.) ceph osd in osd.5
5. `ceph -s` shows active+clean status, but the object is lost on both the primary and the backfilled OSDs.
Hey folks, next week will be APAC-friendly time - 5 May @ 2100 EDT.
That's May 6 @ 0100 UTC - join at:
https://bluejeans.com/908675367
We've got one topic so far - osdmaps specialized for client consumption.
This would reduce the cpu requirements for monitors and clients,
particularly for large clusters.
Please add more topics here:
https://tracker.ceph.com/projects/ceph/wiki/CDM_05-may-2021
Josh
Hi all,
These are the highlights of this week's CLT meeting:[0]
- Next Ceph Developer Monthly (CDM) is approaching (May 5,
APAC-friendly) [1]. There's already a topic proposed (client-side OSDMap),
and another one might come from the RGW team.
- Identify Ceph-volume maintainers: Guillaume Aubrioux, with Andrew
Schoen as back-up.
- Analyzing Ceph User Survey results: [2]
- Some interesting findings arose (like only 10% using Filestore).
- In order to publish the collected open feedback, for the sake of
privacy some anonymization will be required.
  - Concerns about the clarity or interpretation of some questions
(e.g. the net promoter score), whose answers seem to contradict other
answers (perhaps translation issues in the survey platform?).
- Next DocuBetter meeting (APAC-friendly) in ~10 hours (1:00 AM UTC).
Thanks!
Kind Regards,
Ernesto
[0] https://pad.ceph.com/p/clt-weekly-minutes
[1] https://tracker.ceph.com/projects/ceph/wiki/CDM_05-may-2021
[2] https://drive.google.com/file/d/1YjK8Wha6C5lJjQlHIoSpSbt3_Y55XNPl/view
Hi, everyone.
Recently, one of our online clusters experienced a whole-cluster power
outage, and after power was restored, many OSDs started to log the
following error:
2021-04-27 15:38:05.503 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:05.504 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:05.505 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:05.506 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:28.379 2b372c158700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x40000, got 0xce935e16, expected 0x9b502da7,
device location [0xa9a80000~1000], logical extent 0x80000~1000, object
#9:c2a6d9ae:::rbd_data.3b35df93038d.0000000000000696:head#
We are using Nautilus 14.2.10, with RocksDB on SSDs and the BlueStore
data on SATA disks. It seems that BlueStore didn't survive the power
outage. Is it supposed to behave this way? Is there any way to prevent
it?
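For reference, those messages mean that the crc32c recomputed over a 0x1000-byte chunk of the data read back from disk no longer matches the checksum stored when the blob was written, i.e. the on-disk bytes changed underneath BlueStore. Below is a rough sketch of that kind of per-chunk check, purely as an illustration of what the log reports; it is not Ceph's actual _verify_csum code, and the function names and layout are made up for the example.

#include <cstdint>
#include <cstdio>
#include <vector>

// Bitwise crc32c (Castagnoli, reflected polynomial 0x82F63B78);
// slow but dependency-free, for illustration only.
uint32_t crc32c(const uint8_t *data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int b = 0; b < 8; ++b)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc ^ 0xFFFFFFFFu;
}

// Check one stored checksum per 0x1000-byte chunk, as in the
// "bad crc32c/0x1000 ... at blob offset ..." log lines.
// Assumes blob.size() is a multiple of chunk and stored has one
// entry per chunk.
bool verify_chunks(const std::vector<uint8_t> &blob,
                   const std::vector<uint32_t> &stored,
                   size_t chunk = 0x1000) {
  bool ok = true;
  for (size_t i = 0; i < stored.size(); ++i) {
    uint32_t got = crc32c(blob.data() + i * chunk, chunk);
    if (got != stored[i]) {
      std::printf("bad crc32c/0x%zx at blob offset 0x%zx, "
                  "got 0x%08x, expected 0x%08x\n",
                  chunk, i * chunk,
                  (unsigned)got, (unsigned)stored[i]);
      ok = false;
    }
  }
  return ok;
}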
Thanks:-)