If I understand the documentation for the placements in "ceph orch
apply" correctly, I can place the daemons by number or on specific
host. But what I want is:
"Start 3 mgr services, and one of it should be started on node ceph01."
How I can achieve this?
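For reference, these are the two documented forms I mean (the extra hostnames are just examples):
# ceph orch apply mgr --placement=3
# ceph orch apply mgr --placement="ceph01 ceph02 ceph03"
but neither seems to let me say "3 in total, and one of them must be on ceph01".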
Thanks!
Hi all,
I'm a new Ceph user trying to install my first cluster.
I'm trying to install Pacific, but what I get is Octopus.
What's wrong here?
I've done:
# curl --silent --remote-name --location
https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
# chmod +x cephadm
# ./cephadm add-repo --release pacific
# ./cephadm install
# cephadm install ceph-common
# ceph -v
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus
(stable)
# cat /etc/apt/sources.list.d/ceph.list
deb https://download.ceph.com/debian-pacific/ focal main
??
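I guess the next step would be to check which repository apt actually picks the packages from, e.g.:
# apt update
# apt-cache policy ceph-common cephadm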
Kind regards,
Jana
On Mon, Jun 21, 2021 at 8:13 PM opengers <zijian1012(a)gmail.com> wrote:
>
> Thanks for the answer. I'm still a bit confused by the documentation's explanation of "MDS_SLOW_REQUEST", which reads as follows:
> ------
> MDS_SLOW_REQUEST
>
> Message
> “N slow requests are blocked”
>
> Description
> One or more client requests have not been completed promptly, indicating that the MDS is either running very slowly, or that the RADOS cluster is not acknowledging journal writes promptly, or that there is a bug. Use the ops admin socket command to list outstanding metadata operations. This message appears if any client requests have taken longer than mds_op_complaint_time (default 30s).
>
> FROM: https://docs.ceph.com/en/latest/cephfs/health-messages/
> ------
>
> "or that the RADOS cluster is not acknowledging journal writes promptly", from this sentence, it seems that "MDS_SLOW_REQUEST" also contains OSD operations by the MDS?
Yes. If you have slow metadata IO warnings you will likely also have
slow request warnings.
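To list the outstanding operations that health message refers to, the admin socket command looks roughly like this (the mds name is a placeholder):
# ceph daemon mds.<name> ops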
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hello,
This is a follow-up to the thread "RBD migration between 2 EC pools : very slow".
I'm running Octopus 15.2.13.
RBD migration seems really fragile.
I started a migration to change the data pool (from an EC 3+2 to an EC 8+2):
- rbd migration prepare
- rbd migration execute
=> 4% after 6h, and no progress 12h later
- rbd migration abort
=> does not return
After that, the state of the migration on the destination image is "unknown".
rbd info on the source image (in the trash) and on the destination image shows
migrating in the features.
Debugging another abort command shows that it tries to take/put a lock, and
one is already held, due to either the cancelled execute or the subsequent
abort.
rbd lock delete does not work; I get a strange message about a read-only
filesystem...
So, I'm stuck.
My production on the source image is stopped for now, because I use krbd and I
must terminate the migration (either commit or abort) before being able to map
and mount it again.
What I think I'm starting to understand is that abort, like commit, is meant
to be used only once the migration execution has finished? Am I wrong?
If so, how can we stop an ongoing migration, and also, how do we recover the
source image?
Do I need to:
- delete the destination image
- restore the source image from the trash
What about the migrating feature that prevents krbd from mapping the image?
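For reference, the recovery path I have in mind for the source image (pool name and image id are placeholders):
# rbd trash ls <pool>
# rbd trash restore -p <pool> <image-id>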
Hello!
We are in the process of expanding our Ceph cluster (by both adding OSD
hosts and replacing smaller-sized HDDs on our existing hosts). So far we
have gone host by host, removing the old OSDs, swapping the physical
HDDs, and re-adding them. This process has gone smoothly, aside from one
issue: upon any action taken on the cluster (adding new OSDs, replacing
old ones, etc.), PGs get stuck "activating", which causes around 3.5% of
PGs to go inactive, stopping IO.
Here is the current output of ceph -s:
  cluster:
    id:     e8ffe2eb-f8fc-4110-a4bc-1715e878fb7b
    health: HEALTH_WARN
            Reduced data availability: 166 pgs inactive
            Degraded data redundancy: 137153907/3658405707 objects degraded (3.749%), 930 pgs degraded, 928 pgs undersized
            10 pgs not deep-scrubbed in time
            33709 slow ops, oldest one blocked for 35956 sec, daemons [osd.103,osd.104,osd.105,osd.106,osd.107,osd.109,osd.111,osd.112,osd.113,osd.114]... have slow ops.

  services:
    mon: 3 daemons, quorum lb3,lb2,lb1 (age 8w)
    mgr: lb1(active, since 6w), standbys: lb3, lb2
    osd: 117 osds: 117 up (since 15m), 117 in (since 10h); 2033 remapped pgs
    rgw: 3 daemons active (lb1.rgw0, lb2.rgw0, lb3.rgw0)

  task status:

  data:
    pools:   8 pools, 5793 pgs
    objects: 609.74M objects, 169 TiB
    usage:   308 TiB used, 430 TiB / 738 TiB avail
    pgs:     2.866% pgs not active
             137153907/3658405707 objects degraded (3.749%)
             262215404/3658405707 objects misplaced (7.167%)
             3754 active+clean
              963 active+remapped+backfill_wait
              892 active+undersized+degraded+remapped+backfill_wait
              136 activating+remapped
               27 activating+undersized+degraded+remapped
                8 active+undersized+degraded+remapped+backfilling
                6 active+clean+scrubbing+deep
                3 activating+degraded+remapped
                3 active+remapped+backfilling
                1 active+undersized+remapped+backfill_wait

  io:
    client:   94 KiB/s rd, 94 op/s rd, 0 op/s wr
    recovery: 112 MiB/s, 372 objects/s

  progress:
    Rebalancing after osd.20 marked in (10h)
      [............................] (remaining: 11d)
    Rebalancing after osd.41 marked in (10h)
      [=...........................] (remaining: 8d)
    Rebalancing after osd.30 marked in (10h)
      [=...........................] (remaining: 9d)
    Rebalancing after osd.1 marked in (10h)
      [=======.....................] (remaining: 2h)
    Rebalancing after osd.10 marked in (10h)
      [............................] (remaining: 12d)
    Rebalancing after osd.50 marked in (10h)
      [............................] (remaining: 2w)
    Rebalancing after osd.71 marked out (10h)
      [==..........................] (remaining: 5d)
What you may find interesting are the "slow ops" warnings. This is where
our inactive PGs become stuck. Once the cluster gets into this state,
I'm usually able to recover IO by restarting the OSDs with slow ops.
However, what's extremely strange is that this workaround only works
about 12 hours after the last OSD addition. Restarting the slow-ops OSDs
before roughly 12 hours have passed results in the slow ops returning
immediately.
Our first thought was hardware issues; however, we ruled this out after
the slow ops warnings appeared on brand-new HDDs and OSD hosts.
Monitoring the IO saturation of the OSDs reporting slow ops shows actual
usage nowhere near saturation, and no hardware issues are present on the
drives themselves.
Looking at the journalctl logs of one of the affected OSDs above, we see
the following repeated multiple times:
osd.103 56934 get_health_metrics reporting 2 slow ops, oldest is
osd_op(client.467952.0:1520304537 8.6fbs0 8.1e6826fb (undecoded)
ondisk+retry+write+known_if_redirected e56923
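For reference, the individual blocked ops can also be dumped via the admin socket on the OSD's host (the osd id here is just an example):
# ceph daemon osd.103 dump_ops_in_flight
# ceph daemon osd.103 dump_historic_ops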
So far my procedure for the disk swaps has been as follows (the flag
commands are sketched below):
1. Set noout, norebalance, and norecover on the cluster.
2. Use ceph-ansible to remove the old disk OSD IDs.
3. Swap physical HDDs, re-add with ceph-ansible.
4. Unset noout, norebalance, and norecover.
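For completeness, steps 1 and 4 are just the standard flag commands:
# ceph osd set noout
# ceph osd set norebalance
# ceph osd set norecover
and the matching "ceph osd unset" commands afterwards.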
I should note this issue appears even with simple OSD additions (not
removals), as we added 2 brand new hosts to the cluster and saw the same
issue.
I've been trying to think of any possible cause of this issue. I should
mention our cluster is messy at the moment hardware-wise (we have a mix
of 7 TB, 4 TB, and 10 TB HDDs; we're moving to all 10 TB HDDs, but the
swap process has been taking a while). One warning I've noticed
during the old disk removals is about too many PGs per OSD; however,
this warning clears once the new OSDs are added, which is to be
expected, I assume.
If anyone would be willing to provide any hints of where to look, it
would be much appreciated!
Thanks for your time.
--
Justin Goetz
Systems Engineer, TeraSwitch Inc.
jgoetz(a)teraswitch.com
412-945-7045 (NOC) | 412-459-7945 (Direct)
Sorry for the very naive question:
I know how to set/check the rgw quota for a user (using radosgw-admin)
But how can a radosgw user check the quota assigned to his/her
account, using the S3 and/or Swift interface?
I don't get this information using "swift stat", and I can't find an
s3cmd quota-related command...
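For context, on the admin side the quota shows up in the user info dump (the uid is a placeholder):
# radosgw-admin user info --uid=<uid>
What I'm looking for is the equivalent from the user's own side, via S3 or Swift.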
Thanks, Massimo
From what kernel / ceph version is krbd usage on an OSD node problematic?
Currently I am running Nautilus 14.2.11 and an el7 3.10 kernel without any issues.
I can remember using a CephFS mount without any issues as well, until a specific Luminous update surprised me. So it would be nice to know when to expect this.
> -----Original Message-----
> Sent: Wednesday, 23 June 2021 11:25
> Subject: *****SPAM***** [ceph-users] Re: Can not mount rbd device
> anymore
>
> On Wed, Jun 23, 2021 at 9:59 AM Matthias Ferdinand
> wrote:
> >
> > On Tue, Jun 22, 2021 at 02:36:00PM +0200, Ml Ml wrote:
> > > Hello List,
> > >
> > > all of a sudden I can not mount a specific rbd device anymore:
> > >
> > > root@proxmox-backup:~# rbd map backup-proxmox/cluster5 -k
> > > /etc/ceph/ceph.client.admin.keyring
> > > /dev/rbd0
> > >
> > > root@proxmox-backup:~# mount /dev/rbd0 /mnt/backup-cluster5/
> > > (just never times out)
> >
> >
> > Hi,
> >
> > there used to be some kernel lock issues when the kernel rbd client
> > tried to access an OSD on the same machine. Not sure if these issues
> > still exist (but I would guess so) and if you use your proxmox cluster
> > in a hyperconverged manner (nodes providing VMs and storage service at
> > the same time) you may just have been lucky that it had worked before.
> >
> > Instead of the kernel client mount you can try to export the volume as
> > an NBD device (https://docs.ceph.com/en/latest/man/8/rbd-nbd/) and
> > mounting that. rbd-nbd runs in userspace and should not have that
> > locking problem.
>
> rbd-nbd is also susceptible to locking up in such setups, likely more
> so than krbd. Don't forget that it also has a kernel component and
> there are actually more opportunities for things to go sideways/lock up
> because there is an extra daemon involved allocating some additional
> memory for each I/O request.
>
> Thanks,
>
> Ilya
Hello Cephers,
On a capacity-oriented Ceph cluster (13 nodes, 130 OSDs on 8 TB HDDs), I'm
migrating a 40 TB image from a 3+2 EC pool to an 8+2 one.
The use case is Veeam backup on XFS filesystems, mounted via krbd.
Backups are running, and I can see 200 MB/s of throughput.
But my migration (rbd migration prepare / execute) has been stalled at 4% for 6h now.
When the backups are not running, I see a mere 20 MB/s of throughput,
most likely my migration.
At that rate I would need about a month to migrate 40 TB (40 TB at ~20 MB/s is roughly 23 days)!
As I use a krbd client, I cannot remap the rbd image straight after the rbd
migration prepare. So the filesystem is not usable until the migration is
completed. Not really workable for me...
Does anyone have a clue, either on how to speed up the rbd migration, or on
another method to move/copy an image between 2 pools with minimal downtime?
I thought of rbd export-diff | rbd import-diff, one pass while mounted, and a
final one while unmapped, before switching...
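Roughly something like this (pool and image names are placeholders; the destination image would have to exist first, and a snapshot-based incremental pass would follow):
# rbd export-diff <src-pool>/<image> - | rbd import-diff - <dst-pool>/<image>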
But that forces me to rename my image, because even if I use another data
pool, the metadata pool stays the same.
Can you see another method ?
--
Gilles
Hi all,
newbie question:
The documentation seems to suggest that with ceph-volume, one OSD is created for each HDD (cf. the 4-HDD example in
https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/)
This seems odd: what if a server has a large number of disks? I was going to try CephFS on ~10 servers with 70 HDDs each. That would mean each system
has to deal with 70 OSDs, on 70 LVs?
Really no aggregation of the disks?
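For reference, the per-device pattern I understand the docs to describe looks like this (the device name is just an example):
# ceph-volume lvm create --data /dev/sdb
i.e. one such OSD per HDD, which would mean 70 of them per server here.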
Regards,
Thomas
--
--------------------------------------------------------------------
Thomas Roth
Department: IT
GSI Helmholtzzentrum für Schwerionenforschung GmbH
www.gsi.de