Hi all,
My cephfs MDS is reporting damaged metadata following the addition (and
remapping) of 12 new OSDs.
`ceph tell mds.database-0 damage ls` reports ~85 files damaged, all of type
"backtrace", which is very concerning.
`ceph tell mds.database-0 scrub start / recursive repair` seems to have no
effect on the damage. What does this sort of damage mean? Is there anything
I can do to recover these files?
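For reference, these are the commands involved; I believe the scrub options are meant to be given comma-separated, so I plan to retry in that form too, though I may be wrong that it makes any difference:
ceph tell mds.database-0 damage ls
ceph tell mds.database-0 scrub start / recursive,repair
# once a damage entry is confirmed repaired, I understand it can be cleared by id
ceph tell mds.database-0 damage rm <damage-id>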
ceph status reports:
cluster:
id: 692905c0-f271-4cd8-9e43-1c32ef8abd13
health: HEALTH_ERR
1 MDSs report damaged metadata
630 pgs not deep-scrubbed in time
630 pgs not scrubbed in time
services:
mon: 3 daemons, quorum database-0,file-server,webhost (age 37m)
mgr: webhost(active, since 3d), standbys: file-server, database-0
mds: cephfs:1 {0=database-0=up:active} 2 up:standby
osd: 48 osds: 48 up (since 56m), 48 in (since 13d); 10 remapped pgs
task status:
scrub status:
mds.database-0: idle
data:
pools: 7 pools, 633 pgs
objects: 60.82M objects, 231 TiB
usage: 336 TiB used, 246 TiB / 582 TiB avail
pgs: 623 active+clean
6 active+remapped+backfilling
4 active+remapped+backfill_wait
Thanks for the help.
Best,
Ricardo
As a sort of follow-up to my previous post: our Nautilus (14.2.16 on Ubuntu 18.04) cluster had some sort of event that caused many of the machines to have memory errors. The aftermath is that some OSDs initially had (and continue to have) this error, https://tracker.ceph.com/issues/48827, while others won't start for various reasons.
The OSDs that *will* start are badly behind the current epoch for the most part.
It sounds very similar to this:
https://blog.noc.grnet.gr/2016/10/18/surviving-a-ceph-cluster-outage-the-ha…
We are having trouble getting things back online.
I think the path forward is to:
- set noup/nodown/noout/nobackfill and wait for the OSDs that run to come up; we were making good progress yesterday until some of the OSDs crashed with OOM errors. We are again moving forward but understandably nervous.
- export the PGs from questionable OSDs and then rebuild the OSDs; import the PGs if necessary (very likely). Repeat until we are up. (A rough sketch of these steps is below.)
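For what it's worth, the concrete steps we are using / planning look roughly like this (OSD ids, pgids and paths are placeholders):
ceph osd set noup
ceph osd set nodown
ceph osd set noout
ceph osd set nobackfill
ceph osd set norebalance
ceph osd set pause
# with the questionable OSD stopped, export a PG to somewhere safe
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
    --op export --pgid <pgid> --file /backup/<pgid>.export
# later, import it into a rebuilt OSD if it turns out to be needed
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<newid> \
    --op import --file /backup/<pgid>.export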
Any suggestions for increasing speed? We are using noup/nobackfill/norebalance/pause but the epoch catchup is taking a very long time. Any tips for keeping the epoch from moving forward or speeding up the OSDs catching up? How can we estimate how long it should take?
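To make that question concrete, this is how we have been gauging the gap (assuming 'ceph daemon osd.<id> status' is the right thing to look at):
# current cluster osdmap epoch
ceph osd dump | head -1
# what a given running OSD has caught up to - compare newest_map to the epoch above
ceph daemon osd.<id> status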
Thank you for any ideas or assistance anyone can provide.
Will
Hi all,
I am running a cluster managed by orchestrator/cephadm. I installed a new
host for OSDs yesterday; the OSD daemons were automatically created
using drivegroups service specs
(https://docs.ceph.com/en/latest/cephadm/drivegroups/#drivegroups) and
they started with a 15.2.9 image, instead of the 15.2.8 that all the
other daemons in the cluster are running.
I did not yet run ceph orch upgrade to 15.2.9.
Is there a way to lock the version of OSDs/daemons created by
orchestrator/cephadm?
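For context, the closest thing I have found so far (untested, and I am not sure it is the supported way) is pinning the default container image in the cluster config so that newly created daemons use it:
# assumption: cephadm honours this for OSDs created from the drivegroup spec
ceph config set global container_image docker.io/ceph/ceph:v15.2.8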
Thanks!
Kenneth
Hi All,
My ceph-mgr keeps stopping (for some unknown reason) after about an hour
or so (though it has run for up to 2-3 hours before stopping). Up till now
I've simply restarted it with 'ceph-mgr -i ceph01'.
Is this normal behaviour, or if it isn't, what should I be looking for
in the logs?
I was thinking of writing a quick cron script (with 'ceph-mgr -i
ceph01') to run on the hour every hour to restart it, but figured that
there had to be a better way - especially if ceph-mgr is crashing
rather than this being a "feature". Any ideas/advice?
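For reference, what I was going to try instead of the cron hack (assuming the CentOS packages ship the usual ceph-mgr@ systemd unit - corrections welcome):
# see why the last instance stopped (if it was started by hand, the log is
# wherever log_file points, e.g. /var/log/ceph/ceph-mgr.ceph01.log)
journalctl -u ceph-mgr@ceph01 --since "-6 hours"
# run the mgr under systemd so it is supervised and restarted automatically
systemctl enable --now ceph-mgr@ceph01
systemctl status ceph-mgr@ceph01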
Thanks in advance
Dulux-Oz
Hi Everyone,
Let me apologise upfront:
If this isn't the correct List to post to
If this has been answered already (& I've missed it in my searching)
If this has ended up double posted
If I've in any way given (or about to give) offence to anyone
I really need some help.
I'm trying to get a simple single-host Pilot/Test Cluster up and running. I'm using CentOS 8 (fully updated) and Ceph Octopus (latest version from the Ceph repo). I have both ceph-mon and ceph-mgr working/running (although ceph-mgr keeps stopping/crashing after about 1-3 hours or so - but that's another issue), and my first osd (and only osd at this point) *appears* to be working. However, when I issue the command 'systemctl start ceph-osd@0' the ceph-osd daemon won't spin up, and so 'ceph -s' reports 'osd: 1 osds: 0 up, 0 in'.
I've gone through the relevant logs but I can't seem to find the issue.
I'm doing this as a Manual Install because I want to actually *learn* what's going on during the install/etc. I know I can use cephadm (in a production environment), but as I said, I'm trying to learn how everything "fits together".
I've read and re-read the official Ceph Documentation and followed the following steps/commands to get Ceph installed and running:
Ran the following commands:
su -
useradd -d /home/ceph -m ceph -p <password>
mkdir /home/ceph/.ssh
Added a public SSH Key to /home/ceph/.ssh/authorized_keys.
Ran the following commands:
chmod 600 /home/ceph/.ssh/*
chown ceph:ceph -R /home/ceph/.ssh
Added the ceph.repo details to /etc/yum.repos.d/ceph.repo (as per the Ceph Documentation).
Ran the following command:
dnf -y install qemu-kvm qemu-guest-agent libvirt gdisk ceph
Created the /etc/ceph/ceph.conf file (see listing below).
Ran the following commands:
ceph-authtool --create-keyring /etc/ceph/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
ceph-authtool --create-keyring /var/lib/ceph/bootstrap-osd/keyring --gen-key -n client.bootstrap-osd --cap mon 'profile bootstrap-osd' --cap mgr 'allow r'
ceph-authtool /etc/ceph/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
ceph-authtool /etc/ceph/ceph.mon.keyring --import-keyring /var/lib/ceph/bootstrap-osd/keyring
chown -R ceph:ceph /etc/ceph/
chown -R ceph:ceph /var/lib/ceph/
monmaptool --create --add ceph01 192.168.0.10 --fsid 98e84f97-031f-4958-bd54-22305f6bc738 /etc/ceph/monmap
mkdir /var/lib/ceph/mon/ceph-ceph01
chown -R ceph:ceph /var/lib/ceph
sudo -u ceph ceph-mon --mkfs -i ceph01 --monmap /etc/ceph/monmap --keyring /etc/ceph/ceph.mon.keyring
firewall-cmd --add-service=http --permanent
firewall-cmd --add-service=ceph --permanent
firewall-cmd --add-service=ceph-mon --permanent
firewall-cmd --reload
chmod -R 750 /var/lib/ceph/
systemctl start ceph-mon@ceph01
ceph mon enable-msgr2
mkdir /var/lib/ceph/mgr/ceph-ceph01
chown ceph:ceph /var/lib/ceph/mgr/ceph-ceph01
ceph auth get-or-create mgr.ceph01 mon 'allow profile mgr' mds 'allow *' osd 'allow *' -o /var/lib/ceph/mgr/ceph-ceph01/keyring
ceph-mgr -i ceph01
Partitioned 3 HDDs (sdb, sdc, sdd) with GPT (via fdisk).
Ran the following commands:
mkfs.xfs /dev/sdb1
mkfs.xfs /dev/sdc1
mkfs.xfs /dev/sdd1
mkdir -p /var/lib/ceph/osd/ceph-{0,1,2}
chown -R ceph:ceph /var/lib/ceph/osd
mount /dev/sdb1 /var/lib/ceph/osd/ceph-0
mount /dev/sdc1 /var/lib/ceph/osd/ceph-1
mount /dev/sdd1 /var/lib/ceph/osd/ceph-2
So, at this point everything is working, although 'ceph -s' does give a Health Warning about not having the required number of osds (as per the /etc/ceph/ceph.conf file).
Here is what I did to create and (fail to) run my first osd (osd.0):
Ran the following commands:
sudo -u ceph ceph osd new $(uuidgen)
sudo -u ceph ceph auth get-or-create osd.0 osd 'allow *' mon 'allow profile osd' mgr 'allow profile osd' -o /var/lib/ceph/osd/ceph-0/keyring
sudo -u ceph ceph-osd -i 0 --mkfs
ceph osd crush add 0 2 host=ceph01
systemctl start ceph-osd@0
The osd shows up when I issue the command 'ceph osd ls'.
The key shows up when I issue the command 'ceph auth ls'.
But as I said above, when I issue the command 'ceph -s' it shows 'osd: 1 osds: 0 up, 0 in'.
And when I look at the systemctl status for ceph-osd@0 it simply said it failed with 'exit code'.
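For comparison, this is (roughly) the sequence I understand the manual-deployment docs to describe for an OSD - notably the OSD's uuid and cephx secret are generated up front and passed to both 'ceph osd new' and 'ceph-osd --mkfs', which I did not do above. I am not sure whether that mismatch is my actual problem:
UUID=$(uuidgen)
OSD_SECRET=$(ceph-authtool --gen-print-key)
# register the OSD with the cluster and capture the id it is assigned
ID=$(echo "{\"cephx_secret\": \"$OSD_SECRET\"}" | \
    ceph osd new $UUID -i - -n client.bootstrap-osd -k /var/lib/ceph/bootstrap-osd/keyring)
mkdir -p /var/lib/ceph/osd/ceph-$ID
# (mount the data filesystem at /var/lib/ceph/osd/ceph-$ID here, as above)
ceph-authtool --create-keyring /var/lib/ceph/osd/ceph-$ID/keyring \
    --name osd.$ID --add-key $OSD_SECRET
# the uuid here must match the one given to 'ceph osd new'
ceph-osd -i $ID --mkfs --osd-uuid $UUID
chown -R ceph:ceph /var/lib/ceph/osd/ceph-$ID
systemctl enable --now ceph-osd@$ID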
The /etc/ceph/ceph.conf listing:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
fsid = 98e84f97-031f-4958-bd54-22305f6bc738
mon_host = ceph01
public_network = 192.168.0.0/24
[mgr]
mgr_initial_modules = dashboard alerts balancer restful status
[mgr.ceph01]
log_file = /var/log/ceph/ceph-mgr.ceph01.log
[mon]
mon_initial_members = ceph01
mon_data_size_warn = 8589934592
mon_allow_pool_delete = true
[mon.ceph01]
host = ceph01
mon_addr = 192.168.0.10
log_file = /var/log/ceph/ceph-mon.ceph01.log
[osd]
allow_ec_overwrites = true
osd_crush_chooseleaf_type = 1
osd_journal_size = 10240
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
osd_pool_default_size = 3
osd_scrub_auto_repair = true
osd_scrub_begin_hour = 3
osd_scrub_end_hour = 11
pg_autoscale_mode = on
[osd.0]
host = ceph01
log_file = /var/log/ceph/ceph-osd.0.log
[osd.1]
host = ceph01
log_file = /var/log/ceph/ceph-osd.1.log
[osd.2]
host = ceph01
log_file = /var/log/ceph/ceph-osd.2.log
So, could someone please point out to me where I'm going wrong - I know it's got to be something super-simple, but this has been driving me mad for over a week now.
Thanks in advance
Dulux-Oz
osdc/Journaler is the one used for the MDS (CephFS) journal; journal/Journaler is the journaling library used by RBD.
On Thu, Feb 25, 2021 at 8:26 AM 조규진 <bori19960(a)snu.ac.kr> wrote:
>
> Hi, John.
>
> Thanks for your kind reply!
>
> While i'm checking the code that you recommend to check and other .cc files about journal, I find that there is two Journaler class.
> One is at "src/osdc/Journaler.h" and the other one is at "src/journal/Journaler.h".
> If you don't mind, could you tell me which one is for MDS journal? and the differences between them?
>
> Thanks.
> kyujin
>
> On Thu, Feb 25, 2021 at 1:15 AM, John Spray <jcspray(a)gmail.com> wrote:
>>
>> On Wed, Feb 24, 2021 at 9:10 AM 조규진 <bori19960(a)snu.ac.kr> wrote:
>> >
>> > Hi.
>> >
>> > I'm a newbie in CephFS and I have some questions about how per-MDS journals
>> > work.
>> > In Sage's paper (osdi '06), I read that each MDSs has its own journal and
>> > it lazily flushes metadata modifications on OSD cluster.
>> > What I'm wondering is that some directory operations like rename work with
>> > multiple metadata and It may work on two or more MDSs and their journals,
>> > so I think it needs some mechanisms to construct a transaction that works
>> > on multiple journals like some distributed transaction mechanisms.
>> >
>> > Could anybody explains how per-MDS journals work in such directory
>> > operations? or recommends some references about it?
>>
>> Your intuition is correct: these transactions span multiple MDS journals.
>>
>> The code for this stuff is somewhat long, in src/mds/Server.cc, but
>> here are a couple of pointers if you're interested in untangling it:
>> - Server::handle_client_rename is the entry point
>> - The MDS which handles the client request sends MMDSPeerRequest
>> messages to peers in rename_prepare_witness, and waits for
>> acknowledgements before writing EUpdate events to its journal
>> - The peer(s) write EPeerUpdate(OP_PREPARE) events to their journals
>> during prepare, and EPeerUpdate(OP_COMMIT) after the first MDS has
>> completed.
>>
>> John
>>
>>
>>
>> >
>> > Thanks.
>> > kyujin.
Hello everyone!
I'm trying to calculate the theoretical usable storage of a ceph cluster with erasure coded pools.
I have 8 nodes and the profile for all data pools will be k=6 m=2.
If every node has 6 x 1TB wouldn't the calculation be like this:
RAW capacity: 8Nodes x 6Disks x 1TB = 48TB
Loss to m=2: 48TB / 8Nodes x 2m = 12TB
EC capacity: 48TB - 12TB = 36TB
At the moment I have one cluster with 8 nodes and different disks than the sample (but every node has the same amount of disks and the same sized disks).
The output of ceph df detail is:
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 109 TiB 103 TiB 5.8 TiB 5.9 TiB 5.41
TOTAL 109 TiB 103 TiB 5.8 TiB 5.9 TiB 5.41
--- POOLS ---
POOL ID PGS STORED OBJECTS %USED MAX AVAIL
device_health_metrics 1 1 51 MiB 48 0 30 TiB
rep_data_fs 2 32 14 KiB 3.41k 0 30 TiB
rep_meta_fs 3 32 227 MiB 1.72k 0 30 TiB
ec_bkp1 4 32 4.2 TiB 1.10M 6.11 67 TiB
So ec_bkp1 uses 4.2 TiB and there are 67 TiB of free usable storage.
This means the total EC usable storage would be 71.2 TiB.
But calculating with the 109 TiB RAW storage, shouldn't it be 81.75 TiB?
Are the 10TiB just some overhead (that would be much overhead) or is the calculation not correct?
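My rough sanity check so far (assuming MAX AVAIL is derived from AVAIL rather than SIZE, is projected from the fullest OSD, and keeps the full-ratio headroom back - which is my understanding, but please correct me):
theoretical ceiling:  109 TiB x 6/8  ≈ 81.75 TiB
from AVAIL instead:   103 TiB x 6/8  ≈ 77.25 TiB
reported:             67 TiB MAX AVAIL + 4.2 TiB stored ≈ 71.2 TiB
If that understanding is right, the remaining gap would mostly be uneven PG distribution plus the reserved headroom rather than fixed overhead - but I would like confirmation.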
And what if I want to expand the cluster in the first sample above by three nodes with 6 x 2TB each, which means not the same sized disks as the others.
Will the calculation with the same EC profile still be the same?
RAW capacity: 8Nodes x 6Disks x 1TB + 3Nodes x 6Disks x 2TB = 84TB
Loss to m=2: 84TB / 11Nodes x 2m = 15.27TB
EC capacity: 84TB - 15.27TB = 68.72TB
Thanks in advance,
Simon
Hi
we've been running our Ceph cluster for nearly 2 years now (Nautilus)
and recently, due to a temporary situation the cluster is at 80% full.
We are only using CephFS on the cluster.
Normally, I realize we should be adding OSD nodes, but this is a
temporary situation, and I expect the cluster to go to <60% full quite soon.
Anyway, we are noticing some really problematic slowdowns. There are
some things that could be related but we are unsure...
- Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
but are not using more than 2GB; this looks either very inefficient, or
wrong ;-)
"ceph config dump |grep mds":
mds    basic     mds_cache_memory_limit           107374182400
mds    advanced  mds_max_scrub_ops_in_progress    10
Perhaps we require more or different settings to properly use the MDS
memory?
- On all our OSD nodes, the memory line is red in "atop". Though no swap
is in use, it seems the memory on the OSD nodes is taking quite a
beating; is this normal, or can we tweak settings to make it less stressed?
This is the first time we are having performance issues like this, I
think. I'd like to learn some commands to help me analyse this...
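For concreteness, these are the commands I have started looking at (assuming they are even the right tools - pointers to better ones very welcome):
ceph health detail
ceph osd df tree                                  # per-OSD fullness / imbalance
ceph osd perf                                     # per-OSD commit/apply latency
ceph fs status
ceph daemon mds.<name> cache status               # is the cache limit actually being used?
ceph daemon mds.<name> config get mds_cache_memory_limit
ceph daemon osd.<id> dump_historic_ops            # slowest recent ops on one OSD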
I hope this will ring a bell with someone...
Cheers
/Simon
Hello.
I'm trying to list the number of buckets that users have for monitoring
purposes, but I need to list and count the number of buckets per user. Is
it possible to get this information somewhere else?
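To make the question concrete, this is what I have so far with radosgw-admin (the jq counting is just one way to do it; I am hoping there is something less scripted):
radosgw-admin user list
radosgw-admin bucket list --uid=<user>
# count them - the output is a JSON array of bucket names
radosgw-admin bucket list --uid=<user> | jq length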
Thanks, Marcelo