Hi,
Probably a basic/stupid question, but I'm asking anyway. Through lack of knowledge and experience at the time we set up our pools, the pool that holds the majority of our data was created with a pg_num/pgp_num of 64. As the amount of data has grown, this has started causing issues with the balance of data across OSDs. I want to increase the PG count to at least 512, or maybe 1024 - and obviously I want to do this incrementally. However, rather than going from 64 to 128, then 256, etc., I'm considering much smaller increments over a longer period of time, so that the majority of the data movement will hopefully happen during the quieter time of day. So I may start by going in increments of 4 until I get to 128, then go in jumps of 8, and so on.
My question is: will I end up with the same net result going in increments of 4 until I hit 128 as I would by going straight to 128 in one hit? That is, once I reach 128, would I have the exact same level of data balance across PGs as if I had gone straight to 128? Are there any drawbacks to going up in small increments over a long period of time? I know I'll have uneven PG sizes until I get to a power of 2, but that should be OK as long as the end result is the desired result. I suspect I may end up moving a greater amount of data overall doing it this way, but given that my goal is to reduce the amount of intensive data movement during higher-traffic times, that's not a huge concern in the grand scheme of things.
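To be concrete, each step I have in mind would just be a pair of pool settings, waiting for the cluster to settle between steps (the pool name 'data' below is only an example):
ceph osd pool set data pg_num 68
ceph osd pool set data pgp_num 68
# wait for backfill/rebalance to finish, then repeat with 72, 76, ...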
Thanks in advance,
Mark
Hi everyone!
I'm facing a weird issue with one of my Ceph clusters:
OS: CentOS - 8.2.2004 (Core)
CEPH: Nautilus 14.2.11 - stable
RBD using erasure code profile (k=3, m=2)
When I format one of my RBD images (client side), I get the following
kernel messages multiple times, with different sector IDs:
[2417011.790154] blk_update_request: I/O error, dev rbd23, sector 164743869184 op 0x3:(DISCARD) flags 0x4000 phys_seg 1 prio class 0
[2417011.791404] rbd: rbd23: discard at objno 20110336 2490368~1703936 result -1
At first I thought about a faulty disk, BUT the monitoring system is not
showing anything faulty, so I decided to run manual tests on all my OSDs
and look at disk health using smartctl etc.
None of them is marked unhealthy; they don't show any faulty-sector,
read, or write error counters, and the wear level is at 99%.
So, the only particularity of this image is that it is an 80 TB image, but
that shouldn't be an issue, as we already use images of that size on
another pool.
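For reference, a sketch of the kind of checks that seem relevant here (pool/image names below are placeholders, not the real ones):
rbd info mypool/myimage        # image features, object size, data pool
ceph osd pool ls detail        # the EC data pool should carry the ec_overwrites flag
ceph health detail
dmesg | grep rbd23             # full list of the discard errors on the client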
If anyone has a clue how I could sort this out, I'd be more than happy
^^
Kind regards!
I'm trying to use ceph-volume to do various things.
It works fine locally, for things like
ceph-volume lvm zap
But when I want it to do OSD level things, it is unhappy.
To use a trivial example, it wants to do things like
/usr/bin/ceph --cluster ceph --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
but then dies saying,
[errno 13] RADOS permission denied (error connecting to the cluster)
and if I directly run that long command myself, it indeed dies.
(which is not too surprising, since /var/lib/ceph/bootstrap-osd/ceph.keyring does not exist)
However, if I just run from the same command prompt,
/usr/bin/ceph osd tree -f json
it works fine.
How can I get ceph-volume to just use the credentials that are already working elsewhere?
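(I assume the "proper" fix would be something along these lines, exporting the bootstrap-osd key into the expected path, assuming client.bootstrap-osd already exists in the cluster - but I'd like to confirm before doing it:
mkdir -p /var/lib/ceph/bootstrap-osd
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
)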
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
Friends,
Any help or suggestions on this missing-data issue?
Thanks,
-Vikas
From: Vikas Rana <vrana(a)vtiersys.com>
Sent: Tuesday, February 16, 2021 12:20 PM
To: 'ceph-users(a)ceph.io' <ceph-users(a)ceph.io>
Subject: Data Missing with RBD-Mirror
Hi Friends,
We have a very weird issue with rbd-mirror replication. As per the command
output, we are in sync, but the OSD usage on the DR side doesn't match the
Prod side.
On Prod we are using close to 52TB, but on the DR side we have only 22TB.
We took a snapshot on Prod, mounted it on the DR side, and compared the data,
and we found a lot of missing data. Please see the output below.
Please help us resolve this issue or point us in the right direction.
Thanks,
-Vikas
DR# rbd --cluster cephdr mirror pool status cifs --verbose
health: OK
images: 1 total
1 replaying
research_data:
global_id: 69656449-61b8-446e-8b1e-6cf9bd57d94a
state: up+replaying
description: replaying, master_position=[object_number=390133, tag_tid=4,
entry_tid=447832541], mirror_position=[object_number=390133, tag_tid=4,
entry_tid=447832541], entries_behind_master=0
last_update: 2021-01-29 15:10:13
DR# ceph osd pool ls detail
pool 5 'cifs' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins
pg_num 128 pgp_num 128 last_change 1294 flags hashpspool stripe_width 0
application rbd
removed_snaps [1~5]
PROD# ceph df detail
POOLS:
NAME   ID   QUOTA OBJECTS   QUOTA BYTES   USED      %USED   MAX AVAIL   OBJECTS   DIRTY   READ     WRITE    RAW USED
cifs   17   N/A             N/A           26.0TiB   30.10   60.4TiB     6860550   6.86M   873MiB   509MiB   52.1TiB
DR# ceph df detail
POOLS:
NAME   ID   QUOTA OBJECTS   QUOTA BYTES   USED      %USED   MAX AVAIL   OBJECTS   DIRTY   READ      WRITE    RAW USED
cifs   5    N/A             N/A           11.4TiB   15.78   60.9TiB     3043260   3.04M   2.65MiB   431MiB   22.8TiB
PROD#:/vol/research_data# du -sh *
11T Flab1
346G KLab
1.5T More
4.4T ReLabs
4.0T WLab
DR#:/vol/research_data# du -sh *
2.6T Flab1
14G KLab
52K More
8.0K RLabs
202M WLab
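A hedged cross-check we can also run, using the image name from the status output above (rbd du reports provisioned vs. actual usage per image, rather than pool-level df numbers):
PROD# rbd du cifs/research_data
DR# rbd --cluster cephdr du cifs/research_data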
I'm coming back to trying mixed SSD+spinning disks after maybe a year.
It was my vague recollection that if you told Ceph "go auto-configure all the disks", it would automatically carve up the SSDs into the appropriate number of LVM segments and use them as WAL devices for each HDD-based OSD on the system.
Was I wrong?
Because when I tried to bring up a brand new cluster (Octopus, cephadm bootstrapped), with multiple nodes and multiple disks per node...
it seemed to bring up the SSDs as just another set of OSDs.
It clearly recognized them as SSDs: the output of "ceph orch device ls" showed them as ssd vs hdd for the others.
It just... didn't use them as I expected.
?
Maybe I was thinking of ceph ansible.
Is there not a nice way to do this with the new cephadm based "ceph orch"?
I would rather not have to write JSON files or whatever by hand, when a computer should be perfectly capable of auto-generating this stuff itself.
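My current guess is that this needs an explicit OSD service spec rather than happening automatically - something along these lines (the service_id and the rotational-based filters are my assumptions, not something I've verified):
cat > osd_spec.yml <<'EOF'
service_type: osd
service_id: hdd_with_ssd_db
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
EOF
ceph orch apply osd -i osd_spec.yml
If I understand correctly, the rotational filter is what tells cephadm which devices are data devices and which are db/wal devices.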
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
At one point in the life cycle of my test Ceph cluster, I used the
--all-available-devices
flag of ceph orch, which will always attempt to bring up any newly autodetected disks.
I now see in the docs,
"If you want to avoid this behavior (disable automatic creation of OSD on available devices), use the unmanaged parameter:"
But... I believe I did run it with the --unmanaged flag afterwards.
Unfortunately, the original behaviour still seems to persist, and it keeps auto-creating OSDs.
How can I get it to stop?
I also see mention that,
"When the parameter all-available-devices or a DriveGroup specification is used, a cephadm service is created"
However, using "ceph orch ps", I don't see any relevantly named service.
Where else should I be looking?
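My working assumption is that the spec shows up as a service under "ceph orch ls" rather than "ceph orch ps", and that it can be flagged unmanaged like this (unverified on my side):
ceph orch ls osd                 # list OSD services, including the all-available-devices spec
ceph orch apply osd --all-available-devices --unmanaged=true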
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
What is the best way to move an RBD image to a different pool? I want to move some 'old' images (some have snapshots) to a backup pool. For some, there is also a difference in device class.
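One approach I'm aware of is live image migration (available since Nautilus), roughly like this, with placeholder pool/image names ('rbd deep cp' would be an alternative that also copies snapshots):
rbd migration prepare rbd/old-image backup/old-image
rbd migration execute backup/old-image
rbd migration commit backup/old-image
But I'm not sure how well either option handles the snapshots and the device-class difference, hence the question.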
I am bumping this email to hopefully get some more eyes on it.
We are continuing to have this problem. Unfortunately, the cluster is currently very lightly used until we go into full production, so we do not have the level of traffic that would generate a lot of statistics.
We did update to 14.2.16 from 14.2.10 on Feb 1, 2021 and this seems to correlate with when the errors started popping up.
Our current plan is to roll back the version to 14.2.10 again and rerun the test that causes the issue.
I noted there was another email thread regarding latencies from a user who also updated to 14.2.16 recently, and I'm not sure whether that is related to my issue.
Any suggestions you may have are very welcomed.
Cheers,
--
Mike Cave
On 2021-02-11, 8:37 AM, "Mike Cave" <mcave(a)uvic.ca> wrote:
So, as the subject states, I have an issue with buckets returning a 404 error when they are listed immediately after being created; as well, a bucket fails to be deleted if you try to delete it immediately after creation.
The behaviour is intermittent.
If I leave the bucket in place for a few minutes, the bucket behaves normally. I’m thinking this is a metadata issue or something along those lines but I’m out of my depth now.
To the best of our knowledge the cluster has not changed in any way since the same tests were run in December with no errors.
We are running Ceph 14.2.16 on all parts of the cluster.
I am using the python-swift client for the connection on a CentOS7 machine.
I can replicate the results from the mons or from an external client as well.
I’m willing to share my test script as well if you would like to see how I’m generating the error.
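The gist of the test is simply this (swift CLI shown for brevity; the container name and auth settings are placeholders):
swift post 404test        # create the container
swift list 404test        # intermittently returns 404 right after creation
swift delete 404test      # likewise intermittently fails with 404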
Here is a piece of the logs in case I missed something in the interpretation (log level at 20):
14:23:17.069 7faba00df700 1 ====== starting new request req=0x55fb7a138700 =====
14:23:17.069 7faba00df700 2 req 148 0.000s initializing for trans_id = tx000000000000000000094-0060245cd5-2b8949-default
14:23:17.069 7faba00df700 10 rgw api priority: s3=8 s3website=7
14:23:17.069 7faba00df700 10 host=<NameRemoved>
14:23:17.069 7faba00df700 20 subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
14:23:17.069 7faba00df700 -1 res_query() failed
14:23:17.069 7faba00df700 20 final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/swift/v1/404test
14:23:17.069 7faba00df700 10 ver=v1 first=404test req=
14:23:17.069 7faba00df700 10 handler=28RGWHandler_REST_Bucket_SWIFT
14:23:17.069 7faba00df700 2 req 148 0.000s getting op 2
14:23:17.069 7faba00df700 10 req 148 0.000s swift:delete_bucket scheduling with dmclock client=3 cost=1
14:23:17.069 7faba00df700 10 op=30RGWDeleteBucket_ObjStore_SWIFT
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying requester
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::TempURLEngine
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::TempURLEngine denied with reason=-13
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::SignedTokenEngine
14:23:17.069 7faba00df700 10 req 148 0.000s swift:delete_bucket swift_user=xmcc:swift
14:23:17.069 7faba00df700 20 build_token token=0a000000786d63633a73776966748960ea4653df708a55ae2560e58acf01
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::SignedTokenEngine granted access
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket normalizing buckets and tenants
14:23:17.069 7faba00df700 10 s->object=<NULL> s->bucket=404test
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket init permissions
14:23:17.069 7faba00df700 20 get_system_obj_state: rctx=0x55fb7a137770 obj=default.rgw.meta:root:404test state=0x55fb7a060ac0 s->prefetch_data=0
14:23:17.069 7faba00df700 10 cache get: name=default.rgw.meta+root+404test : hit (negative entry)
14:23:17.069 7faba00df700 20 get_system_obj_state: rctx=0x55fb7a137130 obj=default.rgw.meta:users.uid:xmcc state=0x55fb7a060f40 s->prefetch_data=0
14:23:17.069 7faba00df700 10 cache get: name=default.rgw.meta+users.uid+xmcc : hit (requested=0x6, cached=0x17)
14:23:17.069 7faba00df700 20 get_system_obj_state: s->obj_tag was set empty
14:23:17.069 7faba00df700 20 Read xattr: user.rgw.idtag
14:23:17.069 7faba00df700 20 get_system_obj_state: rctx=0x55fb7a137130 obj=default.rgw.meta:users.uid:xmcc state=0x55fb7a060f40 s->prefetch_data=0
14:23:17.069 7faba00df700 10 cache get: name=default.rgw.meta+users.uid+xmcc : hit (requested=0x6, cached=0x17)
14:23:17.069 7faba00df700 20 get_system_obj_state: s->obj_tag was set empty
14:23:17.069 7faba00df700 20 Read xattr: user.rgw.idtag
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket recalculating target
14:23:17.069 7faba00df700 10 Starting retarget
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket reading permissions
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket init op
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying op mask
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket required_mask= 4 user.op_mask=7
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying op permissions
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket -- Getting permissions begin with perm_mask=50
14:23:17.069 7faba00df700 5 req 148 0.000s swift:delete_bucket Searching permissions for identity=rgw::auth::ThirdPartyAccountApplier() -> rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=xmcc, acct_name=xmcc, subuser=swift, perm_mask=15, is_admin=0) mask=50
14:23:17.069 7faba00df700 5 Searching permissions for uid=xmcc
14:23:17.069 7faba00df700 5 Found permission: 15
14:23:17.069 7faba00df700 5 Searching permissions for group=1 mask=50
14:23:17.069 7faba00df700 5 Permissions for group not found
14:23:17.069 7faba00df700 5 Searching permissions for group=2 mask=50
14:23:17.069 7faba00df700 5 Permissions for group not found
14:23:17.069 7faba00df700 5 req 148 0.000s swift:delete_bucket -- Getting permissions done for identity=rgw::auth::ThirdPartyAccountApplier() -> rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=xmcc, acct_name=xmcc, subuser=swift, perm_mask=15, is_admin=0), owner=xmcc, perm=2
14:23:17.069 7faba00df700 10 req 148 0.000s swift:delete_bucket identity=rgw::auth::ThirdPartyAccountApplier() -> rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=xmcc, acct_name=xmcc, subuser=swift, perm_mask=15, is_admin=0) requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying op params
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket pre-executing
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket executing
14:23:17.069 7faba00df700 0 req 148 0.000s swift:delete_bucket ERROR: bucket 404test not found
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket completing
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket op status=-2002
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket http status=404
14:23:17.069 7faba00df700 1 ====== req done req=0x55fb7a138700 op status=-2002 http_status=404 latency=0s ======
--
Mike Cave
I acknowledge and respect the Lekwungen-speaking Peoples on whose traditional territories the university stands and the Songhees, Esquimalt and WSANEC peoples whose historical relationships with the land continue to this day.
Dear cephers,
I was doing some maintenance yesterday involving shutdown/power-up cycles of Ceph servers. With the last server I ran into a problem. The server runs an MDS and a couple of OSDs. After reboot, the MDS joined the MDS cluster without problems, but the OSDs didn't come up. This was 1 out of 12 servers, and I had no such problems with the other 11. I also observed that "ceph status" was responding very slowly.
Upon further inspection, I found out that 2 of my 3 MONs (the leader and a peon) were running at 100% CPU. Client I/O was continuing, probably because the last cluster map remained valid. On our node performance monitoring I could see that the 2 busy MONs were showing extraordinary network activity.
This state lasted for over one hour. After the MONs settled down, the OSDs finally joined as well and everything went back to normal.
The other instance where I have seen similar behaviour was when I restarted a MON on an empty disk and the re-sync was extremely slow due to a too-large value of mon_sync_max_payload_size. This time, I'm pretty sure it was MON-client communication; see below.
Are there any settings similar to mon_sync_max_payload_size that could influence responsiveness of MONs in a similar way?
Why do I suspect it is MON-client communication? In our monitoring, I do not see the huge number of packets sent by the MONs arriving at any other Ceph daemon. They seem to be distributed over client nodes, but since we have a large number of client nodes (>550), this is hidden in the background network traffic. A second clue is that I have had such extended lock-ups before and, whenever I checked, I only observed them when the leader had a large share of the client sessions.
For example, yesterday the client session count per MON was:
ceph-01: 1339 (leader)
ceph-02: 189 (peon)
ceph-03: 839 (peon)
I usually restart the leader when such a critical distribution occurs. As long as the leader has the fewest client sessions, I never observe this problem.
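For completeness, a sketch of the kind of commands involved (the mon name and the payload value below are only examples, not recommendations):
ceph daemon mon.ceph-01 sessions | grep -c client     # run on the MON host; rough count of client sessions
ceph config set mon mon_sync_max_payload_size 4096    # the setting mentioned above, in bytes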
Ceph version is 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable).
Thanks for any clues!
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14