Hi Mark,
While trying to figure out a random failure in the mempool tests[0], introduced earlier this year when fixing a bug in how mempool selects the shards holding the byte count of a given pool[1], I was intrigued by this "cache line ping pong" problem[2]. I wonder if you have some kind of benchmark, somewhere in your toolbox, that someone could use to demonstrate the problem. Maybe such code could be adapted to show the benefit of the optimization implemented in mempool?
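For what it's worth, here is a rough sketch of the kind of micro-benchmark I have in mind (my own illustrative code, not taken from mempool or from any existing benchmark): several threads hammer a single shared atomic counter, whose cache line keeps bouncing between cores, versus per-thread counters padded to 64 bytes so that each lives on its own cache line, loosely analogous to mempool's per-shard byte counters. The thread count, iteration count and 64-byte cache line size are just assumptions for the sketch.

// Rough false-sharing micro-benchmark (illustrative only):
// kThreads threads increment either one shared atomic counter
// (its cache line ping-pongs between cores) or per-thread counters
// padded so each sits on its own cache line.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

constexpr int  kThreads = 4;          // illustrative values
constexpr long kIters   = 10000000;

std::atomic<long> shared_counter{0};

struct alignas(64) PaddedCounter {    // assumes 64-byte cache lines
  std::atomic<long> v{0};
};
PaddedCounter sharded[kThreads];

template <typename Fn>
double run_threads(Fn fn) {
  auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> threads;
  for (int t = 0; t < kThreads; ++t)
    threads.emplace_back(fn, t);
  for (auto &th : threads)
    th.join();
  return std::chrono::duration<double>(
      std::chrono::steady_clock::now() - start).count();
}

int main() {
  double contended = run_threads([](int) {
    for (long i = 0; i < kIters; ++i)
      shared_counter.fetch_add(1, std::memory_order_relaxed);
  });
  double padded = run_threads([](int t) {
    for (long i = 0; i < kIters; ++i)
      sharded[t].v.fetch_add(1, std::memory_order_relaxed);
  });
  std::printf("shared: %.3fs  sharded+padded: %.3fs\n", contended, padded);
  return 0;
}

In my experience, on a multi-core machine (g++ -O2 -pthread) the shared-counter run is noticeably slower, and dropping the alignas(64) padding makes the sharded run degrade again because neighbouring counters land on the same cache line, which I believe is essentially the effect the mempool sharding is meant to avoid.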
Cheers
[0] https://tracker.ceph.com/issues/49781#note-9
[1] https://github.com/ceph/ceph/pull/39057/files
[2] https://www.drdobbs.com/parallel/understanding-and-avoiding-memory-issues/2…
--
Loïc Dachary, Artisan Logiciel Libre
Hi Folks,
The performance meeting will be starting in about 10 minutes at 8AM
PST! Today we'll hopefully talk a bit about bluestore write path
locking, omap performance with prefetch and buffered IO, and bluestore
cache trimming. Please feel free to add your own topic if you'd like!
Etherpad:
https://pad.ceph.com/p/performance_weekly
Bluejeans:
https://bluejeans.com/908675367
Mark
Hi, I would like to report a serious data loss problem observed in a customer environment.
Data loss on one OSD has been propagated to other OSDs. If backfill is performed while a shard is missing on the primary OSD, the corresponding shard also ends up missing on the OSD that the backfill targets.
With 4+2 erasure coding, if two OSDs are backfilled from that primary during one backfill, three shards end up missing (the primary's plus the two copies). Since 4+2 erasure coding can only tolerate the loss of two shards, data recovery becomes impossible. Whether data is actually lost depends on the erasure coding profile and on how many copies are made during backfill.
In fact, I could reproduce this situation. This is real data loss, and we need to fix this problem.
I will verify this with the latest version of Ceph, file a ticket in Redmine later, and also report detailed information there.
In this mail, I first share basic information about the environment and the procedure to reproduce.
Environment:
- Ceph version: Nautilus
- Erasure coding: 4+2
- Type: filestore
Step to Reproduce:
1. Set up more than 6 OSDs (leaving some extra OSDs out).
2. Store some objects in the pool.
3. Delete a file from the primary OSD in the PG.
(In fact, in the customer environment the shard on the primary OSD could not be read due to a medium error on the primary OSD's disk. To simulate this situation, remove the file with `rm`.)
e.g.) rm -f /var/lib/ceph/osd/ceph-7/current/1.0s0_head/<some file>.04.21.09\:55\:*
4. Cause backfill in the PG.
This time, I triggered backfill by marking an OSD `in` that had been `out`.
e.g.) ceph osd in osd.5
5. `ceph -s` shows active+clean status, but the object is lost on both the primary and the backfilled OSDs.
Hey folks, next week will be APAC-friendly time - 5 May @ 2100 EDT.
That's May 6 @ 0100 UTC - join at:
https://bluejeans.com/908675367
We've got one topic so far - osdmaps specialized for client consumption.
This would reduce the cpu requirements for monitors and clients,
particularly for large clusters.
Please add more topics here:
https://tracker.ceph.com/projects/ceph/wiki/CDM_05-may-2021
Josh
Hi all,
These are the highlights of this week's CLT meeting:[0]
- Next Ceph Developer Monthly (CDM) is approaching (May 5,
APAC-friendly) [1]. There's already a topic proposed (client-side OSDMap),
and another one might come from the RGW team.
- Identify Ceph-volume maintainers: Guillaume Aubrioux, with Andrew
Schoen as back-up.
- Analyzing Ceph User Survey results: [2]
- Some interesting findings arose (like only 10% using Filestore).
- In order to publish the collected open feedback, for the sake of
privacy some anonymization will be required.
  - Concerns about the clarity or interpretation of some questions
(e.g. the net promoter score), whose answers seem to contradict other
answers (perhaps translation issues in the survey platform?).
- Next DocuBetter meeting (APAC-friendly) in ~10 hours (1:00 AM UTC).
Thanks!
Kind Regards,
Ernesto
[0] https://pad.ceph.com/p/clt-weekly-minutes
[1] https://tracker.ceph.com/projects/ceph/wiki/CDM_05-may-2021
[2] https://drive.google.com/file/d/1YjK8Wha6C5lJjQlHIoSpSbt3_Y55XNPl/view
Hi, everyone.
Recently, one of our online clusters experienced a whole-cluster power
outage, and after power was restored, many OSDs started to log the
following error:
2021-04-27 15:38:05.503 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:05.504 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:05.505 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:05.506 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:28.379 2b372c158700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x40000, got 0xce935e16, expected 0x9b502da7,
device location [0xa9a80000~1000], logical extent 0x80000~1000, object
#9:c2a6d9ae:::rbd_data.3b35df93038d.0000000000000696:head#
We are using Nautilus 14.2.10, with RocksDB on SSDs and the BlueStore
data on SATA disks. It seems that BlueStore didn't survive the power
outage. Is it supposed to behave this way? Is there any way to prevent
it?
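For reference, those messages mean that the crc32c recomputed over a 0x1000-byte chunk of the data read back from disk no longer matches the checksum stored when the blob was written, i.e. the on-disk bytes changed underneath BlueStore. Below is a rough sketch of that kind of per-chunk check, purely as an illustration of what the log reports; it is not Ceph's actual _verify_csum code, and the function names and layout are made up for the example.

#include <cstdint>
#include <cstdio>
#include <vector>

// Bitwise crc32c (Castagnoli, reflected polynomial 0x82F63B78);
// slow but dependency-free, for illustration only.
uint32_t crc32c(const uint8_t *data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int b = 0; b < 8; ++b)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc ^ 0xFFFFFFFFu;
}

// Check one stored checksum per 0x1000-byte chunk, as in the
// "bad crc32c/0x1000 ... at blob offset ..." log lines.
// Assumes blob.size() is a multiple of chunk and stored has one
// entry per chunk.
bool verify_chunks(const std::vector<uint8_t> &blob,
                   const std::vector<uint32_t> &stored,
                   size_t chunk = 0x1000) {
  bool ok = true;
  for (size_t i = 0; i < stored.size(); ++i) {
    uint32_t got = crc32c(blob.data() + i * chunk, chunk);
    if (got != stored[i]) {
      std::printf("bad crc32c/0x%zx at blob offset 0x%zx, "
                  "got 0x%08x, expected 0x%08x\n",
                  chunk, i * chunk,
                  (unsigned)got, (unsigned)stored[i]);
      ok = false;
    }
  }
  return ok;
}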
Thanks:-)