Hi
I hope this email finds you well. I am reaching out to you because I have
encountered an issue with my CEPH Bluestore cluster and I am seeking your
assistance.
I have a cluster with approximately one billion objects, and when I run a PG
query it shows that I have 27,000 objects per PG.
I have run the following command: "rados -p <mypool> ls | wc -l", which
returns the correct number of one billion objects. However, when I run the
same command per PG, the results are much lower, with only 20 million
objects being reported. For example, "rados -p <mypool> --pgid 1.xx ls | wc
-l" shows only three objects in the specified PG.
This is a significant discrepancy and I am concerned about the integrity of
my data.
Do you have any idea about this discrepancy?
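For completeness, one way to cross-check the numbers would be to sum the per-PG object counts that the cluster itself reports and compare them with the pool-level listing; a rough sketch only (the JSON field names may differ between releases, and <mypool> is a placeholder):
```
# Sum the per-PG object counts the cluster reports and compare with the
# pool-level listing. Field names (pg_stats / stat_sum) may vary by release.
ceph pg ls-by-pool <mypool> --format=json | \
  jq '[.pg_stats[].stat_sum.num_objects] | add'
rados -p <mypool> ls | wc -l
```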
p.s:
I have a total of 30 million objects in a single bucket and versioning has
not been enabled for this particular bucket.
Thank you for your time and I look forward to your response.
Best regards,
Ramin
Hello Ceph community.
The company that recently hired me has a 3-node Ceph cluster that has been running and stable. I am the new lone administrator here; I do not know Ceph, and this is my first experience with it.
The issue is that it is/was running out of space, which is why I built a 4th node and attempted to add it into the cluster. Along the way, things have begun to break. The manager daemon on boreal-01 failed over to boreal-02, and I tried to get it to fail back to boreal-01 but was unable to. While working on it yesterday, I realized that the nodes in the cluster are all running different versions of the software. I suspect that might be a huge part of why things aren’t working as expected.
Boreal-01 - the host - 17.2.5:
root@boreal-01:/home/kadmin# ceph -v
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
root@boreal-01:/home/kadmin#
Boreal-01 - the admin docker instance running on the host 17.2.1:
root@boreal-01:/home/kadmin# cephadm shell
Inferring fsid 951fa730-0228-11ed-b1ef-f925f77b75d3
Inferring config /var/lib/ceph/951fa730-0228-11ed-b1ef-f925f77b75d3/mon.boreal-01/config
Using ceph image with id 'e5af760fa1c1' and tag 'v17' created on 2022-06-23 19:49:45 +0000 UTC
quay.io/ceph/ceph@sha256:d3f3e1b59a304a280a3a81641ca730982da141dad41e942631e4c5d88711a66b
root@boreal-01:/# ceph -v
ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable)
root@boreal-01:/#
Boreal-02 - 15.2.16:
root@boreal-02:/home/kadmin# ceph -v
ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)
root@boreal-02:/home/kadmin#
Boreal-03 - 15.2.18:
root@boreal-03:/home/kadmin# ceph -v
ceph version 15.2.18 (f2877ae32a72fc25acadef57597f44988b805c38) octopus (stable)
root@boreal-03:/home/kadmin#
And the host I added - Boreal-04 - 17.2.5:
root@boreal-04:/home/kadmin# ceph -v
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
root@boreal-04:/home/kadmin#
The cluster isn’t rebalancing data, and drives are filling up unevenly despite automatic balancing being on. I can run a df and see that it isn’t working; however, the balancer says it is:
root@boreal-01:/# ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.011905",
    "last_optimize_started": "Fri Feb 3 18:39:02 2023",
    "mode": "upmap",
    "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
    "plans": []
}
root@boreal-01:/#
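For reference, this is roughly what I mean when I say a df shows the imbalance (a sketch only; these are standard diagnostic commands, not something I expect to fix anything):
```
# Rough diagnostic sketch: show the per-OSD utilisation spread and the
# balancer's own score for the current distribution.
ceph osd df tree
ceph balancer eval
```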
root@boreal-01:/# ceph -s
  cluster:
    id:     951fa730-0228-11ed-b1ef-f925f77b75d3
    health: HEALTH_WARN
            There are daemons running an older version of ceph
            6 nearfull osd(s)
            3 pgs not deep-scrubbed in time
            3 pgs not scrubbed in time
            4 pool(s) nearfull
            1 daemons have recently crashed

  services:
    mon: 4 daemons, quorum boreal-01,boreal-02,boreal-03,boreal-04 (age 22h)
    mgr: boreal-02.lqxcvk(active, since 19h), standbys: boreal-03.vxhpad, boreal-01.ejaggu
    mds: 2/2 daemons up, 2 standby
    osd: 89 osds: 89 up (since 5d), 89 in (since 45h)

  data:
    volumes: 2/2 healthy
    pools:   7 pools, 549 pgs
    objects: 227.23M objects, 193 TiB
    usage:   581 TiB used, 356 TiB / 937 TiB avail
    pgs:     533 active+clean
             16  active+clean+scrubbing+deep

  io:
    client: 55 MiB/s rd, 330 KiB/s wr, 21 op/s rd, 45 op/s wr

root@boreal-01:/#
Part of me suspects that I exacerbated the problems by monkeying with boreal-04 for several days, trying to get the drives inside the machine turned into OSDs so that they would be used. One thing I did was attempt to upgrade the code on that machine, and I could have triggered a cluster-wide upgrade that failed everywhere outside of nodes 1 and 4. With 2 and 3 not even running the same major release, if I did make that mistake, I can see why, instead of an upgrade, things got worse.
According to the documentation, I should be able to upgrade the entire cluster by running a single command on the admin node, but when I go to run commands I get errors that even google can’t solve:
root@boreal-01:/# ceph orch host ls
Error ENOENT: Module not found
root@boreal-01:/#
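From what I can gather from the cephadm documentation, checks along these lines might at least show whether the orchestrator module/backend is enabled at all (a sketch; I am not confident about it, and the commented-out commands are only relevant if the module really is disabled):
```
# Sketch: check whether the orchestrator backend is configured at all.
ceph mgr module ls      # is the cephadm/orchestrator module enabled?
ceph orch status        # which orchestrator backend is set, if any
# Only if the module turns out to be disabled:
# ceph mgr module enable cephadm
# ceph orch set backend cephadm
```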
Consequently, I have very little faith that running commands to upgrade everything so that it’s all running the same code will work. I think upgrading each host could fix things, but I do not feel confident doing so and risking our data.
Hopefully that gives a better idea of the problems I am facing. I am hoping for some professional services hours with someone who is a true expert with this software, to get us to a stable and sane deployment that can be managed without it being a terrifying guessing game, trying to get it to work.
If that is you, or if you know someone who can help — please contact me!
Thank you!
Thomas
Question:
What does the future hold with regard to cephadm vs rpm/deb packages? If it is now suggested to use cephadm, and thus containers, to deploy new clusters, is there an intent, at some time in the future, to no longer support rpm/deb packages for Linux systems and only support the cephadm container method?
I am not asking to argue containers vs traditional bare metal installs. I am just trying to plan for the future. Thanks
-Chris
Hi! 😊
It would be very kind of you to help us with that!
We have pools in our ceph cluster that are set to replicated size 2 min_size 1.
Obviously we want to go to size 3 / min_size 2, but we are experiencing problems with that.
USED goes to 100% instantly and MAX AVAIL drops to 0; write operations seemed to stop.
POOLS:
    NAME      ID     USED       %USED     MAX AVAIL     OBJECTS
    Pool1     24     35791G      35.04       66339G     8927762
    Pool2     25     11610G      14.89       66339G     3004740
    Pool3     26     17557G     100.00            0     2666972
Before the change it was like this:
    NAME      ID     USED       %USED     MAX AVAIL     OBJECTS
    Pool1     24     35791G      35.04       66339G     8927762
    Pool2     25     11610G      14.89       66339G     3004740
    Pool3     26     17558G      20.93       66339G     2667013
This was quite surprising to us as we’d expect USED to go to something like 30%.
Going back to 2/1 also gave us back the 20.93% usage instantly.
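For context, the change in question is the standard pool size adjustment, roughly like this (a sketch, with Pool3 as the example pool):
```
# Sketch of the size change described above (Pool3 as the example pool).
ceph osd pool set Pool3 size 3        # raise replication from 2 to 3
ceph osd pool set Pool3 min_size 2
ceph df detail                        # USED / %USED / MAX AVAIL right after the change
# Setting size 2 / min_size 1 again reverts the numbers instantly.
```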
What’s the matter here?
Thank you and best regards
Stefan
Hi everyone,
(sorry for the spam, apparently I was not subscribed to the ml)
I have a Ceph test cluster and a Proxmox test cluster (to try upgrades in test before production).
My ceph cluster is made up of three servers running debian 11, with two separate networks (cluster_network and public_network, in VLANs).
It runs Ceph version 16.2.10 (cephadm with Docker).
Each server has one MGR, one MON and 8 OSDs.
  cluster:
    id:     xxx
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph01,ceph03,ceph02 (age 2h)
    mgr: ceph03(active, since 77m), standbys: ceph01, ceph02
    osd: 24 osds: 24 up (since 7w), 24 in (since 6M)

  data:
    pools:   3 pools, 65 pgs
    objects: 29.13k objects, 113 GiB
    usage:   344 GiB used, 52 TiB / 52 TiB avail
    pgs:     65 active+clean

  io:
    client: 1.3 KiB/s wr, 0 op/s rd, 0 op/s wr
The Proxmox cluster is also made up of 3 servers running Proxmox 7.2-7 (with the Proxmox Ceph Pacific packages, which are on version 16.2.9). The Ceph storage used is RBD (on the Ceph public_network). I added the RBD datastores simply via the GUI.
So far so good. I have several VMs on each of the Proxmox nodes.
When I update ceph to 16.2.11, that's where things go wrong.
I don't like it when the update does everything for me without control, so I did a "staggered upgrade", following the official procedure (https://docs.ceph.com/en/pacific/cephadm/upgrade/#staggered-upgrade). As the version I'm starting from doesn't support staggered upgrades, I followed the procedure for upgrading to a version that does (https://docs.ceph.com/en/pacific/cephadm/upgrade/#upgrading-to-a-version-th…).
When I do the "ceph orch redeploy" of the two standby MGRs, everything is fine.
Then I do the "sudo ceph mgr fail", and everything is fine (it switches over to an MGR that was standby, so I get an MGR on 16.2.11).
However, when I do the "sudo ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.11 --daemon-types mgr", it updates the last MGR which had not yet been updated (so far everything is still fine), but it does a final restart of all the MGRs to finish, and at that point Proxmox visibly loses the RBD and turns off all my VMs.
Here is the message in the proxmox syslog:
Feb 2 16:20:52 pmox01 QEMU[436706]: terminate called after throwing an instance of 'std::system_error'
Feb 2 16:20:52 pmox01 QEMU[436706]: what(): Resource deadlock avoided
Feb 2 16:20:52 pmox01 kernel: [17038607.686686] vmbr0: port 2(tap102i0) entered disabled state
Feb 2 16:20:52 pmox01 kernel: [17038607.779049] vmbr0: port 2(tap102i0) entered disabled state
Feb 2 16:20:52 pmox01 systemd[1]: 102.scope: Succeeded.
Feb 2 16:20:52 pmox01 systemd[1]: 102.scope: Consumed 43.136s CPU time.
Feb 2 16:20:53 pmox01 qmeventd[446872]: Starting cleanup for 102
Feb 2 16:20:53 pmox01 qmeventd[446872]: Finished cleanup for 102
For ceph, everything is fine, it does the update, and tells me everything is OK in the end.
Ceph is now on 16.2.11 and the health is OK.
When I redo a downgrade of the MGRs, I have the problem again, and when I start the procedure again, I still have the problem. It's very reproducible.
According to my tests, the "sudo ceph orch upgrade" command always gives me trouble, even when trying a real staggered upgrade from and to version 16.2.11 with the command:
sudo ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.11 --daemon-types mgr --hosts ceph01 --limit 1
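In case it helps with the diagnosis, here is a small sketch of how the MGR versions and images can be checked between steps (standard commands, not part of the procedure quoted above):
```
# Sketch: confirm which version/image each MGR is actually running.
sudo ceph versions                      # version summary per daemon type
sudo ceph orch ps --daemon-type mgr     # image and version per MGR daemon
```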
Does anyone have an idea?
Thank you everyone !
Pierre.
Hi Team,
We have a ceph cluster with 3 storage nodes:
1. storagenode1 - abcd:abcd:abcd::21
2. storagenode2 - abcd:abcd:abcd::22
3. storagenode3 - abcd:abcd:abcd::23
The requirement is to mount Ceph using the domain name of the MON node.
Note: the domain name resolves via our DNS server.
For this we are using the command:
```
mount -t ceph [storagenode.storage.com]:6789:/ /backup -o name=admin,secret=AQCM+8hjqzuZEhAAcuQc+onNKReq7MV+ykFirg==
```
We are getting the following logs in /var/log/messages:
```
Jan 24 17:23:17 localhost kernel: libceph: resolve 'storagenode.storage.com' (ret=-3): failed
Jan 24 17:23:17 localhost kernel: libceph: parse_ips bad ip 'storagenode.storage.com:6789'
```
We also tried mounting the Ceph storage using the IP of the MON, which works fine.
Query:
Could you please help us out with how we can mount Ceph using the FQDN?
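Since mounting by IP works, one possible workaround (a sketch only; the getent/awk handling is illustrative, and the secret is replaced by a placeholder) would be to resolve the FQDN in userspace at mount time and pass the literal address:
```
# Sketch: resolve the MON FQDN in userspace, then mount with the literal
# IPv6 address, which is the form we already know works.
MON_IP=$(getent ahostsv6 storagenode.storage.com | awk 'NR==1{print $1}')
mount -t ceph "[${MON_IP}]:6789:/" /backup -o name=admin,secret=<admin-key>
```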
My /etc/ceph/ceph.conf is as follows:
[global]
ms bind ipv6 = true
ms bind ipv4 = false
mon initial members = storagenode1,storagenode2,storagenode3
osd pool default crush rule = -1
fsid = 7969b8a3-1df7-4eae-8ccf-2e5794de87fe
mon host = [v2:[abcd:abcd:abcd::21]:3300,v1:[abcd:abcd:abcd::21]:6789],[v2:[abcd:abcd:abcd::22]:3300,v1:[abcd:abcd:abcd::22]:6789],[v2:[abcd:abcd:abcd::23]:3300,v1:[abcd:abcd:abcd::23]:6789]
public network = abcd:abcd:abcd::/64
cluster network = eff0:eff0:eff0::/64
[osd]
osd memory target = 4294967296
[client.rgw.storagenode1.rgw0]
host = storagenode1
keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode1.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-storagenode1.rgw0.log
rgw frontends = beast endpoint=[abcd:abcd:abcd::21]:8080
rgw thread pool size = 512
--
~ Lokendra
skype: lokendrarathour
We are finally going to upgrade our Ceph from Nautilus to Octopus, before looking at moving onward. We are still on Ubuntu 18.04, so once on Octopus, we will then upgrade the OS to 20.04, ready for the next upgrade.
Unfortunately, we already upgraded our rados gateways to Ubuntu 20.04 last September, which had the side effect of upgrading the RGWs to Octopus. So I'm looking to downgrade the rados gateways back to Nautilus, just to be safe; we can then do the upgrade in the right order.
I have no idea whether the newer Octopus rados gateways will have altered any metadata that would affect a downgrade back to Nautilus.
Any advice?
Hi everyone,
Our telemetry service is up and running again.
Thanks Adam Kraitman and Dan Mick for restoring the service.
We thank you for your patience and appreciate your contribution to the
project!
Thanks,
Yaarit
On Tue, Jan 3, 2023 at 3:14 PM Yaarit Hatuka <yhatuka(a)redhat.com> wrote:
> Hi everyone,
>
> We are having some infrastructure issues with our telemetry backend, and
> we are working on fixing it.
> Thanks Jan Horacek for opening this issue
> <https://tracker.ceph.com/issues/58371> [1]. We will update once the
> service is back up.
> We are sorry for any inconvenience you may be experiencing, and appreciate
> your patience.
>
> Thanks,
> Yaarit
>
> [1] https://tracker.ceph.com/issues/58371
>
Hi to all!
We are running a Ceph cluster (Octopus) on (99%) CentOS 7 (deployed at
the time with ceph-deploy) and we would like to upgrade it. As far as I
know, for Pacific (and later releases) there aren't packages for the CentOS 7
distribution (at least not on download.ceph.com), so we need to upgrade
(change) not only Ceph but also the distribution.
What is the recommended path to do so?
We could upgrade (reinstall) all the nodes to Rocky 8 and then upgrade
Ceph to Quincy, but then we would be "stuck" with "not the latest" distribution
and would probably have to upgrade (reinstall) again in the near future.
Our second idea is to leverage cephadm (which we would like to
implement) and switch from rpms to containers, but I don't have a clear
vision of how to do it. I was thinking to:
1. install a new monitor/manager with Rocky 9.
2. prepare the node for cephadm.
3. start the manager/monitor containers on that node.
4. repeat for the other monitors.
5. repeat for the OSD servers.
I'm not sure how to execute points 2 and 3. The documentation explains
how to bootstrap a NEW cluster and how to ADOPT an existing one, but our
situation is a hybrid (or at least in my mind it is...).
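For reference, the adoption path in the documentation looks roughly like this (a sketch only, with placeholder daemon names; whether it applies to our hybrid situation is exactly what I am unsure about):
```
# Sketch of the documented "adopt" path (placeholder daemon names).
cephadm ls                                   # legacy daemons cephadm can see
cephadm adopt --style legacy --name mon.ceph-mon1
cephadm adopt --style legacy --name mgr.ceph-mon1
```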
I also cannot adopt my current cluster into cephadm, because 30% of our
OSDs are still on Filestore. My intention was to drain them, reinstall
them and then adopt them, but I would like to avoid multiple
reinstallations if they are not necessary. In my mind, all the OSD servers
would be drained before being reinstalled, just to be sure to have a "fresh" start.
Have you any ideas and/or advice to give us?
Thanks a lot!
Iztok
P.S. I saw that the cephadm script doesn't support Rocky. I can modify
it to do so and it should work, but is there a plan to officially
support it?
--
Iztok Gregori
ICT Systems and Services
Elettra - Sincrotrone Trieste S.C.p.A.
Telephone: +39 040 3758948
http://www.elettra.eu