This may be too broad a topic, or open a can of worms, but we are
running a Ceph environment and I was wondering whether there is any
guidance on this question:
Given that some group would like to store 50-100 TB of data on Ceph and
use it from a Linux environment, are there any advantages or
disadvantages, in terms of performance, ease of use, or learning curve,
to using CephFS vs. a block device through RBD vs. object storage
through RGW? Here are my general thoughts:
CephFS - until recently, you were not allowed to have multiple
filesystems in one cluster. Not sure about performance.
RBD - an image can safely be mounted on only one system at a time, but I
guess that filesystem could then be served out over NFS.
RGW - a different usage model from a regular Linux file/directory
structure. Are there advantages to making people use this interface?
I'm tempted to set up three separate areas, try them, and compare the
results, but I'm wondering whether somebody has already done a similar
experiment in the past.
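For what it's worth, the three access models look roughly like this from
a client's point of view (host, pool, image, bucket and user names below
are all made up):

```shell
# CephFS: a POSIX filesystem that many clients can mount simultaneously
mount -t ceph mon1:6789:/ /mnt/cephfs -o name=myuser,secretfile=/etc/ceph/myuser.secret

# RBD: a raw block device; put a local filesystem on it and mount it on ONE host
rbd map mypool/myimage
mkfs.xfs /dev/rbd0 && mount /dev/rbd0 /mnt/rbd

# RGW: S3-compatible object storage over HTTP, e.g. via s3cmd
s3cmd put bigfile s3://mybucket/bigfile
```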
Thanks for any help you can provide!
Jorge
Hi,
I'm currently exploring Pacific and the "_admin" label doesn't seem to
work as expected.
pacific1:~ # ceph -v
ceph version 16.2.3-26-g422932e923
(422932e923eb429b9e16c352a663968f4b6f0a52) pacific (stable)
According to the docs [1] the "_admin" label should instruct cephadm
to distribute a ceph.conf and the admin keyring to the labeled host:
> _admin: Distribute client.admin and ceph.conf to this host.
>
> By default, an _admin label is applied to the first host in the
> cluster (where bootstrap was originally run), and the client.admin
> key is set to be distributed to that host via the ceph orch
> client-keyring ... function. Adding this label to additional hosts
> will normally cause cephadm to deploy config and keyring files in
> /etc/ceph.
I cannot confirm this: the bootstrap node (pacific1) does not have the
"_admin" label (although it does have ceph.conf and the admin keyring),
and pacific2 does not have the admin keyring either. Here's my host ls:
pacific1:~ # ceph orch host ls
HOST ADDR LABELS STATUS
pacific1 192.168.124.183 mon
pacific2 192.168.124.49 _admin mon
pacific5 192.168.124.63 mon
pacific1:~ # ssh pacific2 find /etc/ceph
/etc/ceph
/etc/ceph/rbdmap
pacific1:~ # find /etc/ceph/
/etc/ceph/
/etc/ceph/ceph.pub
/etc/ceph/ceph.conf
/etc/ceph/rbdmap
/etc/ceph/ceph.client.admin.keyring
I searched a bit on tracker.ceph.com but couldn't find an existing
issue. Is this worth a bug report?
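For reference, the manual route I would try looks like this; note that
the `client-keyring` subcommand is what the latest docs describe, and I
am not sure it is already present in a 16.2.3 build (host name taken
from the output above):

```shell
# apply the label by hand, then tell cephadm explicitly which hosts
# should receive the admin keyring
ceph orch host label add pacific2 _admin
ceph orch client-keyring set client.admin label:_admin
ceph orch client-keyring ls
```

If `client-keyring` is unknown to the build, that would at least explain
why the label has no effect.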
Regards,
Eugen
[1]
https://docs.ceph.com/en/latest/cephadm/host-management/#special-host-labels
Hi guys,
I'm testing Ceph Pacific 16.2.4 in my lab before deciding whether to put
it into production on a 1 PB+ storage cluster with RGW-only access.
I noticed a weird issue with my mons:
- if I reboot a mon host, the ceph-mon container does not start after
the reboot
- 'ceph orch ps' shows the following output:
mon.node01  node01  running (20h)   4m ago   20h   16.2.4     8d91d370c2b8  0a2e86af94b2
mon.node02  node02  running (115m)  12s ago  115m  16.2.4     8d91d370c2b8  51f4885a1b06
mon.node03  node03  stopped         4m ago   19h   <unknown>  <unknown>     <unknown>
(where node03 is the host that was rebooted).
- I tried to start the mon container manually on node03 with '/bin/bash
/var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run'
and got the following output:
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 3314933069573799936, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 432629308056666112, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 432629308056666112, adjusting msgr requires
debug 2021-05-23T08:24:25.192+0000 7f9a9e358700  0 mon.node03@-1(???).osd e408 crush map has features 432629308056666112, adjusting msgr requires
cluster 2021-05-23T08:07:12.189243+0000 mgr.node01.ksitls (mgr.14164) 36380 : cluster [DBG] pgmap v36392: 417 pgs: 417 active+clean; 33 KiB data, 605 MiB used, 651 GiB / 652 GiB avail; 9.6 KiB/s rd, 0 B/s wr, 15 op/s
debug 2021-05-23T08:24:25.196+0000 7f9a9e358700  1 mon.node03@-1(???).paxosservice(auth 1..51) refresh upgraded, format 0 -> 3
debug 2021-05-23T08:24:25.208+0000 7f9a88176700  1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f9a88176700' had timed out after 0.000000000s
debug 2021-05-23T08:24:25.208+0000 7f9a9e358700  0 mon.node03@-1(probing) e5 my rank is now 1 (was -1)
debug 2021-05-23T08:24:25.212+0000 7f9a87975700  0 mon.node03@1(probing) e6 removed from monmap, suicide.
root@node03:/home/adrian# systemctl status ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service
● ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@mon.node03.service - Ceph mon.node03 for c2d41ac4-baf5-11eb-865d-2dc838a337a3
     Loaded: loaded (/etc/systemd/system/ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3@.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Sun 2021-05-23 08:10:00 UTC; 16min ago
    Process: 1176 ExecStart=/bin/bash /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.run (code=exited, status=0/SUCCESS)
    Process: 1855 ExecStop=/usr/bin/docker stop ceph-c2d41ac4-baf5-11eb-865d-2dc838a337a3-mon.node03 (code=exited, status=1/FAILURE)
    Process: 1861 ExecStopPost=/bin/bash /var/lib/ceph/c2d41ac4-baf5-11eb-865d-2dc838a337a3/mon.node03/unit.poststop (code=exited, status=0/SUCCESS)
   Main PID: 1176 (code=exited, status=0/SUCCESS)
The only fix I could find was to redeploy the mon with:
ceph orch daemon rm mon.node03 --force
ceph orch daemon add mon node03
However, even though it works after the redeploy, an issue like this
does not give me much confidence about putting it into a production
environment. I could reproduce it with 2 different mons, so it is not
just a one-off.
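For what it's worth, before redeploying it may be worth checking whether
the rebooted mon really was dropped from the monmap, which the "removed
from monmap, suicide" line in the log suggests:

```shell
# does the cluster's monmap still contain the rebooted mon?
ceph mon dump
# who is currently in quorum?
ceph quorum_status --format json-pretty
```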
My setup is based on Ubuntu 20.04 and docker instead of podman:
root@node01:~# docker -v
Docker version 20.10.6, build 370c289
Do you know a workaround for this issue, or is it a known bug? I
noticed some other complaints about the same behaviour in Octopus,
where the solution at the time was to delete the /var/lib/ceph/mon
folder.
Thanks.
Hello,
I am running a Nautilus cluster with 5 OSD nodes / 90 disks that is
used exclusively for S3. My disks are identical, but utilization
ranges from 9% to 82%, and I am starting to get backfill_toofull
errors even though I have used only 150 TB out of 650 TB.
- Other than manually reweighting OSDs in CRUSH, is there any other
option for me?
- What would cause this uneven distribution? Is there some
documentation on how to track down what's going on?
output of "ceph osd df" is at https://pastebin.com/17HWFR12
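Besides manual reweighting, one option is the built-in balancer module;
in upmap mode it usually evens out the PG count per OSD quite well. A
sketch of enabling it (upmap requires that all clients speak luminous or
newer, so check your clients before setting this):

```shell
# upmap needs luminous+ clients cluster-wide
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
# watch the effect
ceph balancer status
ceph osd df tree
```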
Thank you!
Hello, Ceph users,
what is the difference between "rbd cp" and "rbd deep cp"?
What I need to do is make a copy of an RBD volume that one of our users
inadvertently resized to a much larger size than intended, shrink the
copied image to the expected size, verify that everything is OK, and
then delete the original image. Would this work with rbd cp?
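As far as I know, "rbd cp" copies only the current image data (no
snapshots), while "rbd deep cp" also preserves the image's snapshots.
If you don't need the snapshots, the plan would look roughly like this
(pool, image names and the target size are made up):

```shell
# flat copy of the current image contents
rbd cp mypool/bigimage mypool/bigimage-copy
# shrink the copy back to the intended size (dangerous on the wrong image!)
rbd resize --size 10T --allow-shrink mypool/bigimage-copy
# verify the data on the copy, then remove the oversized original
rbd rm mypool/bigimage
```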
Thanks,
-Yenya
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall
I found a warning in the Red Hat documentation regarding OSDs and RBD:
Ceph Block Devices must be deployed on separate nodes from the Ceph Monitor and OSD nodes. Running kernel clients and kernel server daemons on the same node can lead to kernel deadlocks. [1]
It was hard to tell from the documentation whether this is still true, since some Rook documentation [2] mentions a solution. Is having an RBD client on the same node as the OSDs still a no-no, or is it OK under the conditions listed in the Rook docs (i.e. patched kernel + no XFS)? TIA!
--chuck
[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/i…
[2] https://rook.io/docs/rook/v1.6/ceph-common-issues.html#a-worker-node-using-…
A DocuBetter meeting is scheduled for later this week at 11AM AEST
Thursday, which is 6PM PDT Wednesday and 9PM EDT Wednesday (and 1AM UTC).
It's a pretty inconveniently scheduled meeting. Last month I sent an email
reporting that this monthly meeting, on account of its inconvenient time,
is not much attended. That email provoked good, helpful correspondence, so
I'm doing it again. Unless I get responses to this email thread that
request that I hold the meeting this week, I'm not going to hold it.
This email is a sincere request for documentation complaints. If anything
about the documentation irritates you, now's the time to tell me. If
anything about the documentation is incomplete or incorrect, I'm the guy
you should tell. You don't have to attend the DocuBetter Meetings to get
changes into the documentation, you just have to ask me and explain clearly
what needs to be changed.
The documentation initiatives that were underway last month remain underway
sedulously, if not apace. I am cleaning the syntax of the entire
documentation suite (a multi-month project that is as tedious as it
sounds... I am currently in the middle of the cephadm documentation). I am
also planning to consolidate the Intro Guide and part of the Developer
Guide, because there is some overlap in those sections. I've also begun
indexing the Ceph-related videos on YouTube. This means that I watch the
videos with a pencil in my hand, and I write down short summaries of what's
being discussed along with the number of minutes and seconds that have
passed since the video began. Over the past two weeks I have found myself
with command of the context and understanding necessary to appreciate
Sage's one-hour-and-twenty-seven-minute marathon "Intro to Ceph" talk from
June 2019 (https://www.youtube.com/watch?v=PmLPbrf-x9g). I'm not sure that
a video six minutes longer than "Toy Story" can be considered an intro, but
it is an impressive rhetorical performance by Sage, who speaks
authoritatively and concisely on every subject he touches as he provides a
tour of all the major components of Ceph.
Anyway: If someone responds to this email and says that their lives would
be improved if I hold the DocuBetter meeting, then I will hold the meeting.
However, if by twelve hours before the time of the meeting no one has
responded, there will be no DocuBetter meeting. Remember: even if no one
wants to have the meeting this week, this does not mean that you can't get
changes into the docs. Write to me anytime with your complaints. No
complaint is too crude, no irritation will be summarily dismissed.
That's it.
Here are the links relevant to DocuBetter Meetings:
Meeting: https://bluejeans.com/908675367
Etherpad: https://pad.ceph.com/p/Ceph_Documentation
Hi,
I've successfully updated my Luminous lab environment to Nautilus, so next week I'll give the prod env a try, but two things came up in the upgrade notes:
1. Upmap: I've never used this before; I don't know how I missed it, because it is quite a cool feature. Ceph says to leave the cluster in upmap mode. My question: just as the pg-autoscaler module is better used in 'warn' mode than in 'on' mode, is upmap mode considered, in Ceph terms, "safe" to use?
2. Regarding assimilate-conf: it's not clear to me what it actually does. The docs say:
"This is also a good time to fully transition any config options in ceph.conf into the cluster's configuration database. On each host, you can use the following command to import any option into the monitors with ceph config assimilate-conf -i /etc/ceph/ceph.conf".
What does this mean? If I have different configs on different servers and I run it on each server, will it merge all the configs together? What can I do with this feature?
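As I understand it, assimilate-conf parses the local ceph.conf and
stores every recognized option in the monitors' central configuration
database, so running it on each host does merge all the hosts' options
into that one database; the -o flag writes out whatever could not be
imported and should stay in the local file. A sketch using a throwaway
file instead of the real /etc/ceph/ceph.conf (the ceph calls are guarded
so the snippet is harmless on a machine without a cluster):

```shell
# throwaway sample config, so the real /etc/ceph/ceph.conf is untouched
cat > /tmp/ceph.conf.sample <<'EOF'
[osd]
osd_max_backfills = 2
EOF
# import recognized options into the mon config DB; options that cannot
# be imported are written back out to the -o file
if command -v ceph >/dev/null; then
    ceph config assimilate-conf -i /tmp/ceph.conf.sample -o /tmp/ceph.conf.residual
    ceph config dump
fi
```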
Thank you