Hi,
I have a fresh Nautilus Ceph cluster with radosgw as a front end. I've been
testing with a slightly modified version of https://github.com/wasabi-tech/s3-benchmark/
I have 5 storage nodes with 4 osds each, for a total of 20 osds. I am
testing locally on a single rgw node. First, I uploaded a bunch of 1GB
objects. Now I'm attempting to download them in random order and measure
the time it takes to fetch them.
My problem is that during the download phase rgw will hang and the process
will consume 100% CPU on the civetweb-worker thread (according to top).
The logs show that it downloads segments of the object but then stops part
way through and never continues.
I tried using beast instead of civetweb as a front-end, but it still hangs
in the same way, leading me to believe that this is a back-end issue.
This is the end of the logs; as you can see, the first three lines show a
successful read, and the last line shows that it starts a read attempt but
never completes:
2019-10-08 13:35:42.673 7fc6cec40700 20 rados->get_obj_iterate_cb oid=2217f6c8-5a9f-4cfc-a1a7-1ced740afb81.127425.2__shadow_.SCoV2VuKnMkiOqi2n3FcWgveOJYu4Io_18 obj-ofs=75497472 read_ofs=0 len=4194304
2019-10-08 13:35:42.673 7fc6cec40700 20 RGWObjManifest::operator++(): rule->part_size=0 rules.size()=1
2019-10-08 13:35:42.673 7fc6cec40700 20 RGWObjManifest::operator++(): result: ofs=79691776 stripe_ofs=79691776 part_ofs=0 rule->part_size=0
2019-10-08 13:35:42.673 7fc6cec40700 20 rados->get_obj_iterate_cb oid=2217f6c8-5a9f-4cfc-a1a7-1ced740afb81.127425.2__shadow_.SCoV2VuKnMkiOqi2n3FcWgveOJYu4Io_19 obj-ofs=79691776 read_ofs=0 len=4194304
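For what it's worth, this is roughly how I've been poking at the gateway
while it sits in that state (the daemon name "client.rgw.gw1" is just what
it's called in my setup, so adjust accordingly):

    # bump rgw and messenger logging on the running gateway (hypothetical daemon name)
    ceph daemon client.rgw.gw1 config set debug_rgw 20
    ceph daemon client.rgw.gw1 config set debug_ms 1
    # list the RADOS reads the gateway still has in flight while it appears hung
    ceph daemon client.rgw.gw1 objecter_requests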
Can someone advise whether I've misconfigured something, or whether I've
stumbled on a bug?
Thanks,
Mike
Hi Everyone,
So it recently came to my attention that on one of our clusters, running
the command "radosgw-admin usage show" returns a blank response. What is
going on behind the scenes with this command, and why might it not be
seeing any of the buckets properly? The data is still accessible over S3
via the rgw service; it's just not showing us either the index or metadata
of the buckets.
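In case it's relevant, these are the sorts of commands I've been looking at
(the rgw instance name below is just an example from our setup):

    radosgw-admin usage show
    # verify that usage logging is actually enabled on the gateway (hypothetical instance name)
    ceph daemon client.rgw.gw1 config get rgw_enable_usage_log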
Greatly appreciate everyone's help in advance.
Thanks,
Mac
Hi all,
I'm evaluating CephFS to serve our business as a file share that spans
across our 3 datacenters. One concern I have is that when using CephFS
with OpenStack Manila, all guest VMs need access to the public
storage net. This feels like a security concern to me. One suggestion
I've seen is to put NFS gateways in between to prevent this, but I would
prefer not having to use NFS. Is there another way to solve this, or is this
not a concern to others, regarding both the network exposure and NFS? We are
a small cloud provider, and having different customers exposed to each other
on the same storage net seems risky to me.
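For what it's worth, the only partial mitigation I've come up with so far is
scoping each share's cephx key to its own path, roughly like this (the
filesystem, client and path names are just examples):

    # hypothetical names: restrict a tenant's key to a single share path
    ceph fs authorize cephfs client.tenant-a /volumes/share-a rw

That limits what each tenant can reach on the filesystem itself, but the
guests still sit on the storage network, which is the part that worries me.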
Regards
Jaan
>If the journal is no longer readable: the safe variant is to
>completely re-create the OSDs after replacing the journal disk. (The
>unsafe way to go is to just skip the --flush-journal part, not
>recommended)
Hello Paul,
Thanks for your reply. We have replaced the journal disk.
Last week we were on vacation, so this email is delayed.
My confusion is: why was the PG stuck?
The PG should be repaired automatically when the OSD is down, shouldn't it?
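For reference, this is roughly what we are looking at on our side (the pg id
below is only a placeholder):

    ceph health detail              # lists the stuck/degraded PGs
    ceph pg 2.1f query | less       # placeholder pg id; shows why peering/recovery stalled
    ceph osd tree down              # confirm which OSDs are still marked down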
Hi,
I'm following the discussion for a tracker issue [1] about spillover
warnings that affect our upgraded Nautilus cluster.
Just to clarify, would resizing the RocksDB volume (and expanding it
with 'ceph-bluestore-tool bluefs-bdev-expand...') resolve that, or do
we have to recreate every OSD?
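For reference, the procedure I had in mind is roughly the following (the OSD
id and LV names are only placeholders from our layout):

    systemctl stop ceph-osd@12                         # placeholder OSD id
    lvextend -L +30G /dev/ceph-db/db-osd12             # grow the underlying DB volume first
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-12
    systemctl start ceph-osd@12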
Regards,
Eugen
[1] https://tracker.ceph.com/issues/38745
For performance questions like this, you’re better off setting up a test environment and benchmarking it yourself.
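For example, something along these lines would populate a batch of test users to benchmark against (the count, naming and pool are just examples):

    # create 1000 rbd-capable cephx users for a test (hypothetical names/pool)
    for i in $(seq 1 1000); do
      ceph auth get-or-create client.bench-$i \
        mon 'profile rbd' osd 'profile rbd pool=testpool'
    done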
Regards,
Wesley Peng
> Am Oct 7, 2019 - 1:17 AM schrieb frankaritchie(a)gmail.com:
>
>
> Would RBD performance be hurt by having thousands of cephx users defined?
>
It’s not that the limit is *ignored*; sometimes the failure of the subtree isn’t *detected*. E.g., I’ve seen this happen when a node experienced kernel weirdness or OOM conditions such that the OSDs didn’t all get marked down at the same time, so the PGs all started recovering. Admittedly it’s been a while since I’ve seen this; my sense is that the detection became a *lot* better with Luminous.
> On Oct 3, 2019, at 9:55 AM, Darrell Enns <darrelle(a)knowledge.ca> wrote:
>
> Thanks for the reply Anthony.
>
> Those are all considerations I am very much aware of. I'm very curious about this though:
>
>> mon_osd_down_out_subtree_limit. There are cases where it doesn’t kick in and a whole node will attempt to rebalance
>
> In what cases is the limit ignored? Do these exceptions also apply to mon_osd_min_in_ratio? Is this in the docs somewhere?
>
[ good Cephers trim their quoted text ]
This is in part a question of *how many* of those dense OSD nodes you have. If you have a hundred of them, then most likely they’re spread across a decent number of racks and the loss of one or two is a tolerable *fraction* of the whole cluster.
If you have a cluster of just, say, 3-4 of these dense nodes, component failure, network glitches, and even maintenance become problematic.
You can *mostly* forestall whole-node rebalancing by careful alignment of fault domains with the value of mon_osd_down_out_subtree_limit. There are cases where it doesn’t kick in and a whole node will attempt to rebalance, which — assuming the CRUSH rules and topology are fault-tolerant — may cause surviving OSDs to reach full or backfillfull states, potentially resulting in an outage.
If the limit does kick in, you’ll have reduced or no redundancy until you either bring the host/OSDs back up, or manually cause the recovery to proceed.
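For reference, the knobs in question can be checked or adjusted like this (the values shown are only examples, not recommendations):

    ceph config get mon mon_osd_down_out_subtree_limit   # CRUSH unit type that won't be auto-marked out (default: rack)
    ceph config set mon mon_osd_down_out_subtree_limit host
    ceph config get mon mon_osd_min_in_ratio              # don't auto-mark OSDs out below this "in" fraction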
As was already mentioned as well, having a small number of fault domains also limits the EC strategies you can safely use.
> Thanks Paul. I was speaking more about total OSDs and RAM, rather than a single node. However, I am considering building a cluster with a large OSD/node count. This would be for archival use, with reduced performance and availability requirements. What issues would you anticipate with a large OSD/node count? Is the concern just the large rebalance if a node fails and takes out a large portion of the OSDs at once?