Hi,
I have a very old Ceph cluster running the dumpling release (0.67.1). One
of the three monitors suffered a hardware failure, and I am setting up a
new server running Ubuntu 22.04 LTS to replace it (the other monitors are
still on Ubuntu 12.04 LTS).
I originally deployed the cluster with ceph-deploy, but I can't use it now:
it is a very old version that runs into the apt-key deprecation, and since
ceph-deploy is no longer maintained I can't upgrade it. Even if I could, I
am not sure it would work, because the dumpling release is no longer in
Ceph's official repository.
So I tried to install it manually by cloning it from git:
git clone -b dumpling https://github.com/ceph/ceph.git
But when I try to run "git submodule update --init" or "./autogen.sh" as
per the README file, I encounter this error:
====
root@ceph-mon-04:~/ceph-dumpling/ceph# git submodule update --init
Submodule 'ceph-object-corpus' (git://ceph.com/git/ceph-object-corpus.git)
registered for path 'ceph-object-corpus'
Submodule 'src/libs3' (git://github.com/ceph/libs3.git) registered for path
'src/libs3'
Cloning into '/root/ceph-dumpling/ceph/ceph-object-corpus'...
fatal: repository 'https://ceph.com/git/ceph-object-corpus.git/' not found
fatal: clone of 'git://ceph.com/git/ceph-object-corpus.git' into submodule
path '/root/ceph-dumpling/ceph/ceph-object-corpus' failed
Failed to clone 'ceph-object-corpus'. Retry scheduled
Cloning into '/root/ceph-dumpling/ceph/src/libs3'...
Cloning into '/root/ceph-dumpling/ceph/ceph-object-corpus'...
fatal: repository 'https://ceph.com/git/ceph-object-corpus.git/' not found
fatal: clone of 'git://ceph.com/git/ceph-object-corpus.git' into submodule
path '/root/ceph-dumpling/ceph/ceph-object-corpus' failed
Failed to clone 'ceph-object-corpus' a second time, aborting
root@ceph-mon-04:~/ceph-dumpling/ceph# git submodule update --init
--recursive
Cloning into '/root/ceph-dumpling/ceph/ceph-object-corpus'...
fatal: repository 'https://ceph.com/git/ceph-object-corpus.git/' not found
fatal: clone of 'git://ceph.com/git/ceph-object-corpus.git' into submodule
path '/root/ceph-dumpling/ceph/ceph-object-corpus' failed
Failed to clone 'ceph-object-corpus'. Retry scheduled
Cloning into '/root/ceph-dumpling/ceph/ceph-object-corpus'...
fatal: repository 'https://ceph.com/git/ceph-object-corpus.git/' not found
fatal: clone of 'git://ceph.com/git/ceph-object-corpus.git' into submodule
path '/root/ceph-dumpling/ceph/ceph-object-corpus' failed
Failed to clone 'ceph-object-corpus' a second time, aborting
root@ceph-mon-04:~/ceph-dumpling/ceph#
====
It seems that the repositories required for the submodules are no longer
there. Can anyone point me in the right direction for installing the
dumpling version of Ceph so that I can add a new monitor? At the moment
only 2 of the 3 monitors are up, and I am worried that the cluster will go
down if I lose another monitor.
$ ceph status
cluster 1660b11f-1074-4f5d-aa7c-64b479397a2f
health HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
Which approach should I take?
- Keep trying the manual installation/compile route? (see the URL-rewrite
sketch below)
- Keep trying the ceph-deploy route (by fixing the apt-key deprecation
issue)?
- Install the same old OS (Ubuntu 12.04 LTS) on the new server (I'm not
sure I still have the ISO) and see if that works?
- Upgrade the current cluster first and add the monitor after the upgrade?
(Is it risky to upgrade while in HEALTH_WARN?)
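As a side note on the first option: one workaround I am considering for the
dead submodule URLs (untested, and it assumes the GitHub mirrors of
ceph-object-corpus and libs3 still exist) is to rewrite the git:// URLs
before running the submodule update:
====
cd ~/ceph-dumpling/ceph
# point the dead git://ceph.com/git/ URLs at the GitHub mirrors, and use
# https for github.com since the git:// protocol is no longer served there
git config url."https://github.com/ceph/".insteadOf "git://ceph.com/git/"
git config url."https://github.com/".insteadOf "git://github.com/"
git submodule update --init
====
Whether dumpling will then actually build on Ubuntu 22.04 is of course
another question.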
Any advice is greatly appreciated.
Best regards,
-ip-
Hi guys,
In the perf dump of an RGW instance I have two similar sections.
The first one:
"objecter": {
"op_active": 0,
"op_laggy": 0,
"op_send": 38816,
"op_send_bytes": 199927218,
"op_resend": 0,
"op_reply": 38816,
"oplen_avg": {
"avgcount": 38816,
"sum": 90408
},
"op": 38816,
"op_r": 12624,
"op_w": 26192,
"op_rmw": 0,
"op_pg": 0,
…
}
Second one:
"objecter-0x55b63c38fb80": {
"op_active": 0,
"op_laggy": 0,
"op_send": 5540,
"op_send_bytes": 217343,
"op_resend": 0,
"op_reply": 5540,
"oplen_avg": {
"avgcount": 5540,
"sum": 5636
},
"op": 5540,
"op_r": 680,
"op_w": 4860,
"op_rmw": 0,
"op_pg": 0,
…
}
What is 0x55b63c38fb80 ?
I am trying to monitor the "op_active" metric, but it only updates in the "objecter-0x55b63c38fb80" section and is always 0 in the "objecter" section. That makes it hard to monitor, because the id is dynamic and changes on the next RGW restart.
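The only workaround I could come up with is to sum the metric over every objecter-* section, roughly like this (jq and the admin socket path are assumptions from my setup; <name> is a placeholder):

ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok perf dump \
  | jq '[to_entries[] | select(.key | startswith("objecter")) | .value.op_active] | add'

But it would be nicer to understand why the plain "objecter" section stays at 0.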
Hi,
I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6, but the
following problem already existed when I was still on 17.2.5 everywhere.
I had a major issue in my cluster that I was able to solve with a lot of
your help and even more trial and error. Right now it seems that most of it
is fixed, but I can't rule out that some problem is still hidden. The issue
I'm asking about started during that repair.
When I want to orchestrate the cluster, it logs the command but doesn't do
anything, no matter whether I use the Ceph dashboard or "ceph orch" in
"cephadm shell". I don't get any error message when I try to deploy new
services, redeploy them, etc. The log only says "scheduled" and that's it.
The same happens when I change placement rules. Usually I use tags, but
since they don't work anymore either, I tried host placement and unmanaged.
No success. The only way I can actually start and stop containers is via
systemctl on the host itself.
When I run "ceph orch ls" or "ceph orch ps" I see services I deployed for
testing still shown as being deleted (for weeks now), and in particular a
lot of old MDS daemons listed as "error" or "starting". The list doesn't
match reality at all, because I had to start them by hand.
I tried "ceph mgr fail" and even a complete shutdown of the whole
cluster with all nodes including all mgs, mds even osd - everything
during a maintenance window. Didn't change anything.
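In case it matters, this is how I have been collecting the orchestrator
logs so far (following the cephadm troubleshooting docs, if I understood
them correctly):

ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph mgr fail                     # restart the active mgr so it picks up the setting
ceph -W cephadm --watch-debug     # follow the cephadm log live
ceph log last 100 debug cephadm   # or show the recent entries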
Could you help me? To be honest I'm still rather new to Ceph, and since I
didn't find anything in the logs that caught my eye, I would be thankful
for hints on how to debug this.
Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG : 6265BAE6 , A84CB603
Threema: H7AV7D33
Telegram, Signal: widhalmt(a)widhalm.or.at
Dear Ceph folks,
Recently one of our clients approached us with a request for per-user encryption, i.e. using an individual encryption key for each user when encrypting files and objects in the store.
Does anyone know (or have experience with) how to do this with CephFS and Ceph RGW?
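To make the request a bit more concrete: on the object store side, what the client seems to have in mind looks like S3 server-side encryption with customer-provided keys (SSE-C), where each user supplies their own key with every request, roughly like this (just a sketch with a made-up key file, bucket and endpoint; we have not verified this against RGW yet):

openssl rand -out user1.key 32
aws --endpoint-url https://rgw.example.com s3 cp report.pdf s3://user1-bucket/report.pdf \
    --sse-c AES256 --sse-c-key fileb://user1.key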
Any suggestions or comments are highly appreciated,
best regards,
Samuel
huxiaoyu(a)horebdata.cn
Hi guys
I deployed the Ceph cluster with cephadm as the root user, but now I need
to change the user to a non-root user.
These are the steps I took:
1 - Created a non-root user on all hosts with passwordless sudo access:
`$USER_NAME ALL = (root) NOPASSWD:ALL`
2 - Generated an SSH key pair and used ssh-copy-id to copy it to all hosts:
`
ssh-keygen (accept the default file name and leave the passphrase empty)
ssh-copy-id USER_NAME@HOST_NAME
`
3 - Ran `ceph cephadm set-user <user>`
But I get an "Error EINVAL: ssh connection to root@hostname failed" error.
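One thing I am not sure about is whether cephadm connects with the key pair
I generated in step 2 or with the cluster's own SSH key, so I am now also
trying the following (commands as I understand them from the docs;
"cephuser" and "host1" are placeholders):
`
ceph cephadm get-pub-key > ~/ceph.pub
ssh-copy-id -f -i ~/ceph.pub cephuser@host1
ceph cephadm set-user cephuser
ceph cephadm check-host host1
`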
How do I deal with this issue?
What needs to be done to change the user to non-root?
Hi,
I can't find any documentation for this upgrade process. Has anybody already done it?
Does the normal apt-get update method work?
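What I have in mind is roughly the following, per host, monitors first, then managers, then OSDs (this assumes a package-based, non-cephadm install and that the target release's apt repository is already configured):

ceph osd set noout
apt-get update
apt-get install --only-upgrade ceph ceph-mon ceph-mgr ceph-osd radosgw
systemctl restart ceph-mon.target   # on the monitor hosts, one at a time
systemctl restart ceph-mgr.target
systemctl restart ceph-osd.target   # on the OSD hosts, one at a time
ceph osd unset noout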
Thank you
Details of this release are summarized here:
https://tracker.ceph.com/issues/61515#note-1
Release Notes - TBD
Seeking approvals/reviews for:
rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to
merge https://github.com/ceph/ceph/pull/51788 for
the core)
rgw - Casey
fs - Venky
orch - Adam King
rbd - Ilya
krbd - Ilya
upgrade/octopus-x - deprecated
upgrade/pacific-x - known issues, Ilya, Laura?
upgrade/reef-p2p - N/A
clients upgrades - not run yet
powercycle - Brad
ceph-volume - in progress
Please reply to this email with approval and/or trackers of known
issues/PRs to address them.
gibba upgrade was done and will need to be done again this week.
LRC upgrade TBD
TIA
Thank you for your response and for raising an important question regarding
the potential bottlenecks within the RGW or the overall Ceph cluster. I
appreciate your insight and would like to provide more information about
the issues I have been experiencing. In my deployment, RGW instances 17-20
have been encountering problems such as hanging or returning errors,
including "failed to read header: The socket was closed due to a timeout"
and "res_query() failed." These issues have led to disruptions and
congestions within the cluster. The index pool is indeed placed on a large
number of NVMe SSDs to ensure fast access and efficient indexing of data.
The number of Placement Groups (PGs) allocated for the index pool is also
configured to be sufficient for the workload
On Tue, Jun 6, 2023 at 21:27 Anthony D'Atri <anthony.datri(a)gmail.com> wrote:
> Do you have reason to believe that your bottlenecks are within RGW not
> within the cluster?
>
> e.g. is your index pool on a large number of NVMe SSDs with sufficient
> PGs? Is your bucket data on SSD as well?
>
>
> On Jun 6, 2023, at 13:52, Ramin Najjarbashi <ramin.najarbashi(a)gmail.com>
> wrote:
>
> I would like to seek your insights and recommendations regarding the
> practice of workload separation in a Ceph RGW (RADOS Gateway) cluster. I
> have been facing challenges with large queues in my deployment and would
> appreciate your expertise in determining whether workload separation is a
> recommended approach or not.
>
>
>
Hi
I would like to seek your insights and recommendations regarding the
practice of workload separation in a Ceph RGW (RADOS Gateway) cluster. I
have been facing challenges with large queues in my deployment and would
appreciate your expertise in determining whether workload separation is a
recommended approach or not.
In my current Ceph cluster, I have 20 RGW instances. Client requests are
directed to RGW1-16, while RGW17-20 are dedicated to administrative tasks
and backend usage. However, I have been encountering errors and congestion
issues due to the accumulation of large queues within the RGW instances.
Considering the above scenario, I would like to inquire about your opinions
on workload separation as a potential solution. Specifically, I am
interested in knowing whether workload separation is recommended in a Ceph
RGW cluster.
To address the queue congestion and improve performance, my proposed
solution includes separating the RGW instances based on their specific
purposes. This entails allocating dedicated instances for client requests,
backend usage, administrative tasks, metadata synchronization with other
zone groups, garbage collection (GC), and lifecycle (LC) operations.
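Concretely, the separation I have in mind would look something like the
following ceph.conf sketch (the section names are placeholders for the real
instance names, and the option names are my understanding of the RGW knobs
for the GC/LC/sync threads - please correct me if these are not the right
ones):

# client-facing instances (RGW1-16): serve S3 traffic only
[client.rgw.client-facing]
rgw_enable_gc_threads = false
rgw_enable_lc_threads = false
rgw_run_sync_thread = false

# dedicated back-end instances (RGW17-20): run GC, lifecycle and multisite sync
[client.rgw.backend]
rgw_enable_gc_threads = true
rgw_enable_lc_threads = true
rgw_run_sync_thread = true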
I kindly request your feedback and insights on the following points:
1. Is workload separation considered a recommended practice in Ceph RGW
deployments?
2. What are the potential benefits and drawbacks of workload separation in
terms of performance, resource utilization, and manageability?
3. Are there any specific considerations or best practices to keep in mind
while implementing workload separation in a Ceph RGW cluster?
4. Can you share your experiences or any references/documentation that
highlight successful implementations of workload separation in Ceph RGW
deployments?
I truly value your expertise and appreciate your time and effort in
providing guidance on this matter. Your insights will contribute
significantly to optimizing the performance and stability of my Ceph RGW
cluster.