Hi all,
I have a problem upgrading a Ceph cluster from Pacific to Quincy with
cephadm. I successfully upgraded the cluster to the latest Pacific
(16.2.11), but when I run the following command to upgrade to 17.2.5, the
upgrade stops with an "Unexpected error" after upgrading 3 of 4 mgrs
(everything is on a private network):
ceph orch upgrade start my-private-repo/quay-io/ceph/ceph:v17.2.5
I also tried the 17.2.4 version.
cephadm fails to check the hosts' status and marks them as offline:
cephadm 2023-04-06T10:19:59.998510+0000 mgr.host9.arhpnd (mgr.4516356) 5782 : cephadm [DBG] host host4 (x.x.x.x) failed check
cephadm 2023-04-06T10:19:59.998553+0000 mgr.host9.arhpnd (mgr.4516356) 5783 : cephadm [DBG] Host "host4" marked as offline. Skipping daemon refresh
cephadm 2023-04-06T10:19:59.998581+0000 mgr.host9.arhpnd (mgr.4516356) 5784 : cephadm [DBG] Host "host4" marked as offline. Skipping gather facts refresh
cephadm 2023-04-06T10:19:59.998609+0000 mgr.host9.arhpnd (mgr.4516356) 5785 : cephadm [DBG] Host "host4" marked as offline. Skipping network refresh
cephadm 2023-04-06T10:19:59.998633+0000 mgr.host9.arhpnd (mgr.4516356) 5786 : cephadm [DBG] Host "host4" marked as offline. Skipping device refresh
cephadm 2023-04-06T10:19:59.998659+0000 mgr.host9.arhpnd (mgr.4516356) 5787 : cephadm [DBG] Host "host4" marked as offline. Skipping osdspec preview refresh
cephadm 2023-04-06T10:19:59.998682+0000 mgr.host9.arhpnd (mgr.4516356) 5788 : cephadm [DBG] Host "host4" marked as offline. Skipping autotune
cluster 2023-04-06T10:20:00.000151+0000 mon.host8 (mon.0) 158587 : cluster [ERR] Health detail: HEALTH_ERR 9 hosts fail cephadm check; Upgrade: failed due to an unexpected exception
cluster 2023-04-06T10:20:00.000191+0000 mon.host8 (mon.0) 158588 : cluster [ERR] [WRN] CEPHADM_HOST_CHECK_FAILED: 9 hosts fail cephadm check
cluster 2023-04-06T10:20:00.000202+0000 mon.host8 (mon.0) 158589 : cluster [ERR] host host7 (x.x.x.x) failed check: Unable to reach remote host host7. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000213+0000 mon.host8 (mon.0) 158590 : cluster [ERR] host host2 (x.x.x.x) failed check: Unable to reach remote host host2. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000220+0000 mon.host8 (mon.0) 158591 : cluster [ERR] host host8 (x.x.x.x) failed check: Unable to reach remote host host8. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000228+0000 mon.host8 (mon.0) 158592 : cluster [ERR] host host4 (x.x.x.x) failed check: Unable to reach remote host host4. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000240+0000 mon.host8 (mon.0) 158593 : cluster [ERR] host host3 (x.x.x.x) failed check: Unable to reach remote host host3. Process exited with non-zero exit status 3
Here is the output of some commands:
[root@host8 ~]# ceph -s
  cluster:
    id:     xxx
    health: HEALTH_ERR
            9 hosts fail cephadm check
            Upgrade: failed due to an unexpected exception

  services:
    mon: 5 daemons, quorum host8,host1,host7,host2,host9 (age 2w)
    mgr: host9.arhpnd(active, since 105m), standbys: host8.jowfih, host1.warjsr, host2.qyavjj
    mds: 1/1 daemons up, 3 standby
    osd: 37 osds: 37 up (since 8h), 37 in (since 3w)

  data:

  io:
    client:

  progress:
    Upgrade to 17.2.5 (0s)
      [............................]
[root@host8 ~]# ceph orch upgrade status
{
    "target_image": "my-private-repo/quay-io/ceph/ceph@sha256:34c763383e3323c6bb35f3f2229af9f466518d9db926111277f5e27ed543c427",
    "in_progress": true,
    "which": "Upgrading all daemon types on all hosts",
    "services_complete": [],
    "progress": "3/59 daemons upgraded",
    "message": "Error: UPGRADE_EXCEPTION: Upgrade: failed due to an unexpected exception",
    "is_paused": true
}
[root@host8 ~]# ceph cephadm check-host host7
check-host failed:
Host 'host7' not found. Use 'ceph orch host ls' to see all managed hosts.
[root@host8 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 5
    },
    "mgr": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 1,
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 3
    },
    "osd": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 37
    },
    "mds": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 4
    },
    "overall": {
        "ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)": 47,
        "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 3
    }
}
The strange thing is that I can roll back the cluster by failing over to a
not-yet-upgraded mgr, like this:
ceph mgr fail
ceph orch upgrade start my-private-repo/quay-io/ceph/ceph:v16.2.11
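(For what it's worth, 'ceph orch upgrade status' and 'ceph versions' can be
used to re-check the state after the rollback.)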
Would you happen to have any idea about this?
Best regards,
Reza
Hi Ceph users,
We are using Ceph Pacific (16) in this specific deployment.
In our use case we do not want our users to be able to generate signature v4 URLs, because these bypass the policies that we set on buckets (e.g. IP restrictions).
Currently we have a sidecar reverse proxy running that filters out requests carrying signature-URL-specific request parameters (roughly as sketched below).
This is obviously not very efficient, and we are looking to replace it somehow in the future.
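For illustration, this is the kind of request the proxy rejects: a SigV4 presigned URL carries its authentication in the query string (hostname, bucket and values here are made up):

  curl "https://rgw.example.com/mybucket/mykey?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ACCESSKEY%2F20240101%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20240101T000000Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=abcdef0123456789"

The proxy matches on the X-Amz-Algorithm/X-Amz-Signature query parameters and returns 403 before the request ever reaches RGW.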
1. Is there an option in RGW to disable these signed URLs (e.g. by returning status 403)?
2. If not, is this planned, or would it make sense to add it as a configuration option?
3. Or is RGW's behaviour of not respecting bucket policies for signature v4 URLs a bug, and should they actually be applied?
Thank you for your help, and let me know if you have any questions.
Marc Singer
Hi,
Other than getting all objects of the pool and filtering by image ID,
is there any easier way to get the number of allocated objects for
an RBD image?
What I really want to know is the actual usage of an image.
An allocated object could be used only partially, but that's fine;
it doesn't need to be 100% accurate. Getting the object count and
multiplying it by the object size should be sufficient.
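For illustration, the counting approach I have in mind looks roughly like
this (pool/image names are placeholders):

  # every data object of the image starts with its block_name_prefix,
  # e.g. rbd_data.<id>
  prefix=$(rbd info mypool/myimage | awk '/block_name_prefix/ {print $2}')
  # count the allocated objects; usage ~= count * object size
  rados -p mypool ls | grep -c "^${prefix}"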
"rbd export" exports actual used data, but to get the actual usage
by exporting the image seems too much. This brings up another
question, is there any way to know the export size before running it?
Thanks!
Tony
Hi Eugen,
please find the details below:
root@meghdootctr1:/var/log/ceph# ceph -s
  cluster:
    id:     c59da971-57d1-43bd-b2b7-865d392412a5
    health: HEALTH_WARN
            nodeep-scrub flag(s) set
            544 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)
    mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3
    mds: 3 up:standby
    osd: 36 osds: 36 up (since 34h), 36 in (since 34h)
         flags nodeep-scrub

  data:
    pools:   2 pools, 544 pgs
    objects: 10.14M objects, 39 TiB
    usage:   116 TiB used, 63 TiB / 179 TiB avail
    pgs:     544 active+clean

  io:
    client: 24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr
Ceph version:
root@meghdootctr1:/var/log/ceph# ceph --version
ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)
Ceph df -h
https://pastebin.com/1ffucyJg
Ceph OSD performance dump
https://pastebin.com/1R6YQksE
ceph tell osd.XX bench (out of 36 OSDs, only 8 give a high IOPS value of
250+; of those, 4 are from HP 3PAR and 4 from DELL EMC. The 4 HP 3PAR OSDs
we use have worked fine from the beginning, without any latency or IOPS
issues, but the remaining 32 OSDs are from DELL EMC, of which only 4 perform
much better than the other 28.)
https://pastebin.com/CixaQmBi
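For reference, the per-OSD numbers above can be collected with a loop like
this:

  for i in $(seq 0 35); do ceph tell osd.$i bench; done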
Please help me identify whether the issue is with the DELL EMC storage, with
Ceph configuration parameter tuning, or with overload in the cloud setup.
On November 1, 2023 at 9:48 PM Eugen Block <eblock(a)nde.ag> wrote:
> Hi,
>
> for starters please add more cluster details like 'ceph status', 'ceph
> versions', 'ceph osd df tree'. Increasing the network to 10G was the right
> thing to do, you don't get far with 1G under real cluster load. How are
> the OSDs configured (HDD only, SSD only, or HDD with rocksdb on SSD)?
> How is the disk utilization?
>
> Regards,
> Eugen
>
> Quoting prabhav(a)cdac.in:
>
> > In a production setup of 36 OSDs (SAS disks) totalling 180 TB,
> > allocated to a single Ceph cluster with 3 monitors and 3 managers,
> > there were 830 volumes and VMs created in OpenStack with Ceph as the
> > backend. On Sep 21, users reported slowness in accessing the VMs.
> > Analysing the logs led us to problems with SAS, network congestion
> > and Ceph configuration (as all default values were used). We updated
> > the network from 1 Gbps to 10 Gbps for both public and cluster
> > networking. There was no change.
> > The Ceph benchmark showed that 28 OSDs out of 36 reported very low
> > IOPS of 30 to 50, while the remaining ones showed 300+ IOPS.
> > We gradually started reducing the load on the Ceph cluster, and the
> > volume count is now 650. The slow operations have gradually reduced,
> > but I am aware that this is not the solution.
> > The Ceph configuration was updated, increasing osd_journal_size to
> > 10 GB and setting:
> > osd_max_backfills = 1
> > osd_recovery_max_active = 1
> > osd_recovery_op_priority = 1
> > bluestore_cache_trim_max_skip_pinned = 10000
> >
> > After one month, we now face another issue: the mgr daemon stopped on
> > all 3 quorum members and 16 OSDs went down. From the ceph-mon and
> > ceph-mgr logs I could not determine the reason. Please guide me, as
> > it is a production setup.
Thanks & Regards,
Ms V A Prabha / श्रीमती प्रभा वी ए
Joint Director / संयुक्त निदेशक
Centre for Development of Advanced Computing(C-DAC) / प्रगत संगणन विकास
केन्द्र(सी-डैक)
Tidel Park”, 8th Floor, “D” Block, (North &South) / “टाइडल पार्क”,8वीं मंजिल,
“डी” ब्लॉक, (उत्तर और दक्षिण)
No.4, Rajiv Gandhi Salai / नं.4, राजीव गांधी सलाई
Taramani / तारामणि
Chennai / चेन्नई – 600113
Ph.No.:044-22542226/27
Fax No.: 044-22542294
Hi,
I'm facing a rather new issue with our Ceph cluster: from time to time,
ceph-mgr on one of the two mgr nodes gets OOM-killed after consuming over
100 GB of RAM:
[Nov21 15:02] tp_osd_tp invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ +0.000010] oom_kill_process.cold+0xb/0x10
[ +0.000002] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ +0.000008] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
[ +0.000697] Out of memory: Killed process 3941610 (ceph-mgr) total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB, UID:167 pgtables:260356kB oom_score_adj:0
[ +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
The cluster is stable and operating normally; there is nothing unusual going
on before, during or after the kill, so it's unclear what causes the mgr to
balloon, use up all RAM and get killed. The systemd logs aren't very helpful
either: they just show normal mgr operations until it fails to allocate
memory and gets killed: https://pastebin.com/MLyw9iVi
The mgr has experienced this issue several times in the last 2 months, and
the events don't appear to correlate with any other events in the cluster;
basically nothing else happened at around those times. How can I investigate
this and figure out what's causing the mgr to consume all memory and get
killed?
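So far the only thing I can think of is tracking the active mgr's memory
over time, so the growth can be lined up with the logs - a crude sketch
(log path is made up):

  # poll the ceph-mgr RSS (kB) once a minute
  while sleep 60; do
      echo "$(date -Is) $(ps -o rss= -p "$(pgrep -f ceph-mgr | head -1)")"
  done >> /var/log/mgr-rss.log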
I would very much appreciate any advice!
Best regards,
Zakhar
Hi,
I've read and thought a lot about this migration, as it is a bigger project, and I was wondering if anyone has done it already and might share some notes or playbooks, because in everything I've read some parts were missing or unclear to me.
I have some different approaches in mind, so maybe you have some suggestions or hints.
a) upgrade Nautilus on CentOS 7, with the few missing features like dashboard and Prometheus. After that, migrate one node after another to Ubuntu 20.04 with Octopus, and then upgrade Ceph to the recent stable version.
b) migrate one node after another to Ubuntu 18.04 with Nautilus, then upgrade to Octopus, and after that move to Ubuntu 20.04.
or
c) upgrade one node after another to Ubuntu 20.04 with Octopus and join it to the cluster, until all nodes are upgraded.
As a test I tried c) with a mon node, but adding it to the cluster fails with a failed state, still probing for the other mons. (I don't have the right log at hand right now.)
So my questions are:
a) What would be the best (most stable) migration path, and
b) is it in general possible to add a new Octopus mon (not an upgraded one) to a Nautilus cluster where the other mons are still on Nautilus?
I hope my thoughts and questions are understandable :)
Thanks for any hints and suggestions. Best, Götz
Hello,
I would like to share a quite worrying experience I've just had on one of my production clusters.
A user successfully created a bucket with the name of a bucket that already exists!
He is not the bucket owner - the original user is - but he is able to see it when he does ListBuckets over the S3 API. (Both accounts are able to see it now; only the original owner is able to interact with it.)
This bucket is also counted towards the new user's usage stats.
Has anyone noticed this before? This cluster is running Quincy - 17.2.6.
Is there a way to detach the bucket from the new owner, so he doesn't have a bucket that doesn't belong to him?
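The closest I've found so far is something like this (bucket and user names
are placeholders; I'm not sure it's the right tool here):

  radosgw-admin bucket stats --bucket=mybucket                # shows the current owner
  radosgw-admin bucket unlink --bucket=mybucket --uid=newuser # detach the bucket from a user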
Regards,
Ondrej
Hey Ceph-Users,
RGW does have options [1] to rate limit ops or bandwidth per bucket or user.
But those only come into play when the request is authenticated.
I'd like to also protect the authentication subsystem from malicious or
invalid requests.
So in case e.g. some EC2 credentials are not valid (anymore) and clients
start hammering the RGW with such requests, I'd like to make it cheap to
deal with them. Especially when external authentication like OpenStack
Keystone [2] is used: valid access tokens are cached within the RGW, but
requests with invalid credentials end up being sent at full rate to the
external API [3], as there is no negative caching. And even if there were,
it would only limit the external auth requests for the same set of invalid
credentials - but it would surely reduce the load in that case.
Since the HTTP request is blocking ...
> [...]
> 2023-12-18T15:25:55.861+0000 7fec91dbb640 20 sending request to https://keystone.example.com/v3/s3tokens
> 2023-12-18T15:25:55.861+0000 7fec91dbb640 20 register_request mgr=0x561a407ae0c0 req_data->id=778, curl_handle=0x7fedaccb36e0
> 2023-12-18T15:25:55.861+0000 7fec91dbb640 20 WARNING: blocking http request
> 2023-12-18T15:25:55.861+0000 7fede37fe640 20 link_request req_data=0x561a40a418b0 req_data->id=778, curl_handle=0x7fedaccb36e0
> [...]
this not only stresses the external authentication API (Keystone in this
case), but it also blocks RGW threads for the duration of the external call.
I am currently looking into using the capabilities of HAProxy to rate limit
requests based on their resulting HTTP response [4] - in essence, to
rate-limit or tarpit clients that "produce" a high number of 403
"InvalidAccessKeyId" responses. To cause less collateral damage, it might
make sense to limit based on the presented credentials themselves, but that
would require extracting and tracking HTTP headers or URL parameters (for
presigned URLs) [5] and putting them into stick tables.
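A minimal sketch of the direction I'm exploring (untested; names and
thresholds are made up, and http_err_rate counts all 4xx responses, not just
the auth failures):

  frontend rgw
      bind :8080
      # track per-client-IP rate of error responses
      stick-table type ip size 100k expire 10m store http_err_rate(60s)
      http-request track-sc0 src
      # slow down clients producing more than 20 error responses per minute
      http-request tarpit deny_status 429 if { sc_http_err_rate(0) gt 20 }
      default_backend rgw_servers

  backend rgw_servers
      server rgw1 127.0.0.1:8000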
* What are your thoughts on the matter?
* What kind of measures did you put in place?
* Does it make sense to extend RGW's capabilities to deal with those cases itself?
** adding negative caching
** rate limits on concurrent external authentication requests (or is there a pool of connections for those requests?)
Regards
Christian
[1] https://docs.ceph.com/en/latest/radosgw/admin/#rate-limit-management
[2]
https://docs.ceph.com/en/latest/radosgw/keystone/#integrating-with-openstac…
[3]
https://github.com/ceph/ceph/blob/86bb77eb9633bfd002e73b5e58b863bc2d0df594/…
[4]
https://www.haproxy.com/documentation/haproxy-configuration-manual/latest/#…
[5]
https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-reque…
Hi,
I'm developing a backup system for RBD images. In my case, backup data
must be stored for at least two weeks. To meet this requirement, I'd like
to take backups as follows (rough commands below):
1. Take a full backup with rbd export first.
2. Take differential backups every day.
3. Merge the full backup and the oldest diff (taken two weeks ago).
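In commands, the scheme looks roughly like this (pool, image and snapshot
names are placeholders):

  rbd export mypool/myimage@day0 full.img                         # 1. full backup
  rbd export-diff --from-snap day0 mypool/myimage@day1 day1.diff  # 2. daily diff
  rbd merge-diff full.img day1.diff merged.img                    # 3. this is what fails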
As a result of my evaluation, I confirmed there is no problem with steps 1
and 2. However, I found that step 3 can't be accomplished by
`rbd merge-diff <full backup> <diff>`, because `rbd merge-diff` only accepts
a diff as its first parameter. Is there any way to merge a full backup and
a diff?
Thanks,
Satoru