June 2020 - ceph-users - lists.ceph.io

upgrade ceph and use cephadm - rgw issue

by Andy Goldschmidt

Hi I am trying to upgrade from Mimic (13.2.10) to Octopus (15.x). Im also tryin to sue cephadm and am following this guide.https://docs.ceph.com/docs/master/cephadm/adoption/ It was all going fine until step 11 and deploying the new RGW's. I don't have any realms set for my cluster, so how do I do it? This is also a single site cluster. # radosgw-admin realm list{ "default_info": "", "realms": []} # radosgw-admin zone list{ "default_info": "a15e2aec-a0da-4cad-a1bd-f448f25bbe3d", "zones": [ "default" ]} The step below is what I don't know what to do, as it says I need to specify the realm. 11. Redeploy RGW daemons. Cephadm manages RGW daemons by zone. For each zone, deploy new RGW daemons with cephadm: # ceph orch apply rgw <realm> <zone> <placement> [--port <port>] [--ssl] my ceph.conf has this in it about the rgw's[client.rgw.ceph-mgmt0]host = ceph-mgmt0keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-mgmt0/keyringlog file = /var/log/ceph/ceph-rgw-ceph-mgmt0.logrgw frontends = civetweb port=10.92.135.40:8080 num_threads=100rgw dns name = library.xxxxx.com [client.rgw.ceph-mgmt1]host = ceph-mgmt1keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-mgmt1/keyringlog file = /var/log/ceph/ceph-rgw-ceph-mgmt1.logrgw frontends = civetweb port=10.92.135.41:8080 num_threads=100rgw dns name = library.xxxxx.com [client.rgw.ceph-mgmt2]host = ceph-mgmt2keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-mgmt2/keyringlog file = /var/log/ceph/ceph-rgw-ceph-mgmt2.logrgw frontends = civetweb port=10.92.135.42:8080 num_threads=100rgw dns name = library.xxxxx.com RegardsAndy

3 years, 10 months

1
0
0 0

Radosgw PubSub Traffic

by Dustin Guerrero

Hey all, We’ve been running some benchmarks against Ceph which we deployed using the Rook operator in Kubernetes. Everything seemed to scale linearly until a point where I see a single OSD receiving much higher CPU load than the other OSDs (nearly 100% saturation). After some investigation we noticed a ton of pubsub traffic in the strace coming from the RGW pods like so: [pid 22561] sendmsg(77, {msg_name(0)=NULL, msg_iov(3)=[{"\21\2)\0\0\0\10\0:\1\0\0\10\0\0\0\0\0\10\0\0\0\0\0\0\20\0\0-\321\211K"..., 73}, {"\200\0\0\0pubsub.user.ceph-user-wwITOk"..., 314}, {"\0\303\34[\360\314\233\2138\377\377\377\377\377\377\377\377", 17}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL|MSG_MORE <unfinished …> I’ve checked other OSDs and only a single OSD receives these messages. I suspect its creating a bottleneck. Does anyone have an idea on why these are being generated or how to stop them? The pubsub sync module doesn’t appear to be enabled, and our benchmark is doing simple gets/puts/deletes. We’re running Ceph 14.2.5 nautilus Thank you!

3 years, 10 months

2
2
0 0

Re: Thread::try_create(): pthread_create failed

by 展荣臻（信泰）

Thank your reply Our cluster are runing for two years in production,and it has no problem,so we don't upgrade. I check memory on host.Very little memory of free left.Does creating thread failure have anything to do with this? In addition to the kvm virtual machine, there are 22 osds on the host. free -m total used free shared buff/cache available Mem: 515420 178212 4323 729 332884 335360 Swap: 8191 8145 46 > sysctl: > kernel.pid_max=4194303 > kernel.threads-max=2097152 > vm.max_map_count=524288 > > But really, why are you still running Hammer? Later releases handle a large number of OSDs *much* better. > > > On Jun 1, 2020, at 7:08 PM, 展荣臻（信泰） <zhanrzh_xt(a)teamsun.com.cn> wrote: > > > > Hi all, > > We have a hammer ceph cluster with 3 monitor,324 osds. OSD daemon and kvm is collocated on node; > > The ceph cluster are runing 2 years.Recently we added ~700 osds to the cluster,as process: > > 1.ceph osd create > > 2. mkdir -p /var/lib/ceph/osd/ceph-$osd > > 3. mkfs.xfs -f /dev/$disk > > 4. mount -o inode64,notime /dev/$disk /var/lib/ceph/osd/ceph-$osd > > 5. ceph-osd -i 0 --mkfs --mkkey > > 6.ceph auth add osd.$osd osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-$osd/keyring > > 7.ceph osd crush create-or-move $osd host=kvm101 root=default > > Mabe we do that requently.After add 122 osds, osd.1-osd.8 failed > > > > 2020-05-14 16:48:29.881021 7f6727fb9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f6727fb9700 time 2020-05-14 16:48:29.870051 > > common/Thread.cc: 129: FAILED assert(ret == 0) > > > > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) > > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbc8b55] > > 2: (Thread::create(unsigned long)+0x8a) [0xbac50a] > > 3: (Pipe::accept()+0x37fb) [0xca6c3b] > > 4: (Pipe::reader()+0x1a0f) [0xcaa75f] > > 5: (Pipe::Reader::entry()+0xd) [0xcb351d] > > 6: (()+0x7dc5) [0x7f67a45ebdc5] > > 7: (clone()+0x6d) [0x7f67a30cc1cd] > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. > > > > ulimit -u > > 2061600 > > open files 32768 > > > > > > Does anyone know what's going on? Why create thread faild? > > > > > > > > > > > > > > _______________________________________________ > > ceph-users mailing list -- ceph-users(a)ceph.io > > To unsubscribe send an email to ceph-users-leave(a)ceph.io

3 years, 10 months

1
0
0 0

Thread::try_create(): pthread_create failed

by 展荣臻（信泰）

Hi all, We have a hammer ceph cluster with 3 monitor,324 osds. OSD daemon and kvm is collocated on node; The ceph cluster are runing 2 years.Recently we added ~700 osds to the cluster,as process: 1.ceph osd create 2. mkdir -p /var/lib/ceph/osd/ceph-$osd 3. mkfs.xfs -f /dev/$disk 4. mount -o inode64,notime /dev/$disk /var/lib/ceph/osd/ceph-$osd 5. ceph-osd -i 0 --mkfs --mkkey 6.ceph auth add osd.$osd osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-$osd/keyring 7.ceph osd crush create-or-move $osd host=kvm101 root=default Mabe we do that requently.After add 122 osds, osd.1-osd.8 failed 2020-05-14 16:48:29.881021 7f6727fb9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f6727fb9700 time 2020-05-14 16:48:29.870051 common/Thread.cc: 129: FAILED assert(ret == 0) ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbc8b55] 2: (Thread::create(unsigned long)+0x8a) [0xbac50a] 3: (Pipe::accept()+0x37fb) [0xca6c3b] 4: (Pipe::reader()+0x1a0f) [0xcaa75f] 5: (Pipe::Reader::entry()+0xd) [0xcb351d] 6: (()+0x7dc5) [0x7f67a45ebdc5] 7: (clone()+0x6d) [0x7f67a30cc1cd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. ulimit -u 2061600 open files 32768 Does anyone know what's going on? Why create thread faild?

3 years, 10 months

1
0
0 0

Deploy Ceph on the secondary datacenter for DR

by Nghia Viet Tran

Hi everyone, Currently, our client application and Ceph cluster are running on the primary datacenter. We’re planning to deploy Ceph on the secondary datacenter for DR. The secondary datacenter is in the standby mode. If something went wrong with the primary datacenter, the secondary datacenter will take over. The possible way would work in this case is that adding hosts from the secondary datacenter into the existed Ceph cluster in the primary datacenter. By this way, it would add more latency for client requests since client from primary datacenter might connects to OSD hosts in the secondary datacenter) Are there any special configurations in Ceph that fulfill this requirement? [cid:image001.png@01D6380A.599C1D00] I truly appreciate any comments! Nghia.

3 years, 10 months

2
2
0 0

Ceph Orchestrator 2020-06-01 Meeting recording

by Mike Perez

Hi everyone, Our Ceph Orchestrator meeting recording for 2020-06-01 is now available: https://www.youtube.com/watch?v=4oGb86RNPRs&feature=youtu.be -- Mike Perez He/Him Ceph Community Manager Red Hat Los Angeles <https://www.redhat.com> thingee(a)redhat.com <mailto:thingee@redhat.com> M: 1-951-572-2633 <tel:1-951-572-2633> IM: IRC Freenode/OFTC: thingee 494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA @Thingee <https://twitter.com/thingee> <https://www.redhat.com> <https://redhat.com/summit>

3 years, 10 months

1
0
0 0

Using Ceph-ansible for a luminous -> nautilus upgrade?

by Matthew Vernon

Hi, For previous Ceph version upgrades, we've used the rolling_upgrade playbook from Ceph-ansible - for example, the stable-3.0 branch supports both Jewel and Luminous, so we used it to migrate our clusters from Jewel to Luminous. As I understand it, upgrading direct from Luminous to Nautilus is a supported operation. But there is no Ceph-ansible release that supports both versions. Indeed, stable-4.0 supports Nautilus but no other releases. Is the expected process to use stable-4.0 for the upgrade, or do we have to do the upgrade by hand and only then update our version of ceph-ansible? Thanks, Matthew -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

3 years, 10 months

2
1
0 0

Cache pools at or near target size but no evict happen

by icy chan

Hi, I had configured a cache tier with max object counts 500k. But no evict happens when the object counts hit the configured maximum. Anyone experienced this issue? What should I do? $ ceph health detail HEALTH_WARN 1 cache pools at or near target size CACHE_POOL_NEAR_FULL 1 cache pools at or near target size cache pool 'cached-hdd-cache' with 887.11k objects at/near target max 500k objects $ ceph df | grep -e "POOL\|cached-hdd" POOLS: POOL ID STORED OBJECTS USED %USED MAX AVAIL cached-hdd 24 1.4 TiB 1.52M 1.4 TiB 0.60 78 TiB cached-hdd-cache 25 842 GiB 887.14k 842 GiB 15.97 1.4 TiB $ ceph osd pool get cached-hdd-cache all size: 3 min_size: 1 pg_num: 128 pgp_num: 128 crush_rule: nvme-repl-rule hashpspool: true nodelete: false nopgchange: false nosizechange: false write_fadvise_dontneed: false noscrub: false nodeep-scrub: false hit_set_type: bloom hit_set_period: 1200 hit_set_count: 4 hit_set_fpp: 0.05 use_gmt_hitset: 1 target_max_objects: 500000 target_max_bytes: 1099511627776 cache_target_dirty_ratio: 0 cache_target_dirty_high_ratio: 0.7 cache_target_full_ratio: 0.9 cache_min_flush_age: 0 cache_min_evict_age: 0 min_read_recency_for_promote: 1 min_write_recency_for_promote: 1 fast_read: 0 hit_set_grade_decay_rate: 20 hit_set_search_last_n: 1 pg_autoscale_mode: warn Regs, Icy

3 years, 10 months

2
4
0 0

2024

2023

2022

2021

2020

2019

ceph-users June 2020