Hi,
Which Ceph release are you using? You mention ceph-disk, so your OSDs
are not LVM-based, I assume?
I've seen these messages a lot when testing in my virtual lab
environment, although I believe it's not the cluster's fsid but the
OSD's fsid that appears in the error message (the OSDs have their own
fsid, too; take a look at /var/lib/ceph/osd/ceph-<ID>/fsid). When I did
several re-installs of the whole cluster I had to make sure to wipe the
disks properly, but sometimes only a reboot did the trick. Of course,
that is not an option in your situation.
If your OSDs are systemd units, check for orphaned units that need to
be disabled before restarting the correct ones. Did you re-deploy some
of those disks?
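For reference, this is roughly how I check both things (a sketch; osd.3
is just an example ID):

  ceph fsid                               # the cluster's fsid
  cat /var/lib/ceph/osd/ceph-3/fsid       # the OSD's own fsid

  systemctl list-units 'ceph-osd@*'       # look for leftover OSD units
  systemctl disable ceph-osd@3.service    # only for units that should no longer exist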
Regards,
Eugen
Quoting Seth Duncan <Seth.Duncan2(a)bd.com>:
> I had 5 of 10 OSDs fail on one of my nodes; after a reboot the other 5
> OSDs failed to start.
>
> I have tried running ceph-disk activate-all and get back an error
> message about the cluster fsid not matching the one in /etc/ceph/ceph.conf.
>
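> For reference, the command I ran and a quick way to compare the cluster
> fsid recorded on the OSDs against ceph.conf (paths are the defaults;
> the comparison is just an illustration):
>
> $ sudo ceph-disk activate-all
> $ grep fsid /etc/ceph/ceph.conf
> $ cat /var/lib/ceph/osd/ceph-*/ceph_fsid
>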
> Has anyone experienced an issue such as this?
>
Hello all,
Can the NFS-Ganesha RADOS recovery for a multi-headed active/active
setup work with NFS 3, or does it require NFS 4/4.1 specifics?
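For context, the setup I mean is ganesha's rados_cluster recovery
backend, configured roughly like this (a sketch based on the ganesha
docs; pool, namespace and nodeid are placeholders):

RADOS_KV {
    pool = "nfs-ganesha";
    namespace = "grace";
    nodeid = "ganesha-a";
}

NFSv4 {
    RecoveryBackend = rados_cluster;
}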
Thanks for any help /Maged
The systemd automount (via the x-systemd.* options) mounts cephfs
successfully with both the kernel and fuse clients.
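In case a classic autofs map is preferred, something along these lines
also works here (a sketch using the kernel client; mount point, monitor
address and secret file are placeholders from my tests):

# /etc/auto.master
/-    /etc/auto.cephfs

# /etc/auto.cephfs (one line per entry)
/mnt/cephfs  -fstype=ceph,name=admin,secretfile=/etc/ceph/admin.secret,noatime  mon1.example.com:6789:/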
On Mon, Jun 15, 2020 at 6:44 PM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
>
>
> Thanks for these; I was missing the x-systemd.* entries. I assume these
> are necessary so booting does not 'hang' trying to mount these? I
> thought _netdev was for this and sufficient?
>
>
>
>
>
> -----Original Message-----
> To: Derrick Lin
> Cc: ceph-users
> Subject: [ceph-users] Re: mount cephfs with autofs
>
> Hi,
>
> With CentOS 7.8 you can use the systemd autofs options in /etc/fstab.
> Here are two examples from our clusters, first with fuse and second
> with kernel:
>
> none /cephfs fuse.ceph ceph.id=admin,ceph.conf=/etc/ceph/dwight.conf,ceph.client_mountpoint=/,x-systemd.device-timeout=30,x-systemd.mount-timeout=30,noatime,_netdev,noauto,x-systemd.automount,x-systemd.idle-timeout=30,ro 0 2
>
> cephflax.cern.ch:6789:/ /cephfs2 ceph name=admin,secretfile=/etc/ceph/flax.admin.secret,x-systemd.device-timeout=30,x-systemd.mount-timeout=30,noatime,_netdev,noauto,x-systemd.automount,x-systemd.idle-timeout=30,ro 0 2
>
> Cheers, Dan
>
> On Mon, Jun 15, 2020 at 9:27 AM Derrick Lin <klin938(a)gmail.com> wrote:
> >
> > Hi guys,
> >
> > I can mount my cephfs via the mount command and access it without any
> > problem.
> >
> > Now I want to integrate it with autofs, which is used on our cluster.
> >
> > It seems this is not a popular approach and I found only this link:
> >
> > https://drupal.star.bnl.gov/STAR/blog/mpoat/how-mount-cephfs
> >
> > I followed the link but could not get it to work. I am wondering if
> > this is possible at all.
> >
> > We are using CentOS 7.8 and the ceph cluster is running nautilus
> > 14.2.9.
> >
> > Regards,
> > Derrick
Yes, it's faster, but I'd like to continue managing the cluster with
Ansible. Is that possible?
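For concreteness, the change I have in mind is only extending the
inventory, roughly like this (hostnames are placeholders), and then
re-running ansible-playbook site.yml:

[mons]
node1
node2
node3

[osds]
node1
node2
node3

[rgws]
node1
node2
node3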
On Mon, Jun 15, 2020 at 12:02 PM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
>
> Just do a manual install; that is faster.
>
>
>
> -----Original Message-----
> To: ceph-users
> Subject: [ceph-users] Fwd: Re-run ansible to add monitor and RGWs
>
> Any ideas on this?
>
> ---------- Forwarded message ---------
> From: Khodayar Doustar <doustar(a)rayanexon.ir>
> Date: Sun, Jun 14, 2020 at 6:07 PM
> Subject: Re-run ansible to add monitor and RGWs
> To: ceph-users <ceph-users(a)ceph.io>
>
>
> Hi,
>
>
> I installed my ceph cluster with ceph-ansible a few months ago. At that
> time I added just one monitor and one rgw.
>
> So I have 3 nodes, of which one is a monitor and rgw and the other two
> are OSD-only.
>
> Now I want to add the other two nodes as monitor and rgw.
>
> Can I just modify the ansible host file and re-run the site.yml?
>
> I've made some modifications to storage classes, added some OSDs, and
> uploaded a lot of data since then. Is it safe to re-run the ansible
> site.yml playbook?
>
> I don't want to end with a fresh new cluster! :D
>
>
> Thanks a lot,
>
> Khodayar
Dear people on this mailing list,
I've got the "problem" that our MAX AVAIL value increases by about
5-10 TB when I reboot a whole OSD node. After the reboot the value
goes back to normal.
I would love to know WHY.
Under normal circumstances I would ignore this behavior, but since I am
very new to ceph I would like to understand why things like this happen.
From what I have read, this value is calculated from the most-filled OSD.
I set noout and norebalance while the node is offline and unset both
flags after the reboot.
We are currently on nautilus.
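For what it's worth, this is how I watch the value (plain commands,
nothing cluster-specific assumed):

ceph df            # per-pool MAX AVAIL
ceph osd df tree   # per-OSD utilisation; the fullest OSD should drive MAX AVAIL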
Cheers and thanks in advance
Boris
Hi, I am seeing an issue on one of our older ceph clusters (mimic
13.2.1) in an erasure-coded pool on bluestore OSDs: 1 inconsistent pg
and 1 scrub error. It should be noted that we have an ongoing rebalance
of misplaced data that predates this issue. The misplaced data came from
flapping OSDs caused by OSD_NEARFULL/OSD_TOOFULL warnings/errors, which
we corrected by removing some user data through ceph's rgw/s3 api
interface (the users' "s3 objects" were deleted via the s3 api).
If anyone has any suggestions or guidance for dealing with this, it
would be very much appreciated. I've included all the relevant and
helpful information I can think of below; if there is any additional
information that would be helpful, please let me know.
$ sudo ceph -s
  cluster:
    id:     6fa7ec72-79fb-4f45-8b9f-ea5cdc7ab18d
    health: HEALTH_ERR
            248317/437145405 objects misplaced (0.057%)
            1 scrub errors
            Possible data damage: 1 pg inconsistent

  services:
    mon: 3 daemons, quorum HW-CEPHM-AT01,HW-CEPHM-AT02,HW-CEPHM-AT03
    mgr: HW-CEPHM-AT02(active)
    osd: 109 osds: 107 up, 106 in; 2 remapped pgs
    rgw: 3 daemons active

  data:
    pools:   10 pools, 1380 pgs
    objects: 54.70 M objects, 68 TiB
    usage:   116 TiB used, 169 TiB / 285 TiB avail
    pgs:     248317/437145405 objects misplaced (0.057%)
             1374 active+clean
             3    active+clean+scrubbing+deep
             2    active+remapped+backfilling
             1    active+clean+inconsistent

  io:
    client:   28 KiB/s rd, 306 KiB/s wr, 26 op/s rd, 30 op/s wr
    recovery: 6.2 MiB/s, 4 objects/s
$ sudo ceph health detail
HEALTH_ERR 247241/437143405 objects misplaced (0.057%); 1 scrub errors; Possible data damage: 1 pg inconsistent
OBJECT_MISPLACED 247241/437143405 objects misplaced (0.057%)
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 7.1 is active+clean+inconsistent, acting [2,57,51,15,20,28,9,39]
Examination of osd logs shows the error is in osd.2
zgrep -Hn 'ERR' ceph-osd.2.log-20200614.gz
ceph-osd.2.log-20200614.gz:1292:2020-06-14 03:31:06.572 7f94591a9700 -1 log_channel(cluster) log [ERR] : 7.1s0 deep-scrub stat mismatch, got 213029/213030 objects, 0/0 clones, 213029/213030 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 292308615921/292308670959 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
ceph-osd.2.log-20200614.gz:1293:2020-06-14 03:31:06.572 7f94591a9700 -1 log_channel(cluster) log [ERR] : 7.1 deep-scrub 1 errors
All other OSDs appear to be clean of errors
The pg in question (7.1) has been instructed to repair/scrub/deep-scrub,
but I do not see any indication in its logs that it has done a scrub or
repair (it does log a deep-scrub, which comes back OK), and listing
inconsistent objects seems to indicate no issues:
$ sudo rados list-inconsistent-pg default.rgw.buckets.data
["7.1"]
$ sudo ceph pg repair 7.1
instructing pg 7.1s0 on osd.2 to repair
$ sudo ceph pg scrub 7.1
instructing pg 7.1s0 on osd.2 to scrub
$ sudo ceph pg deep-scrub 7.1
instructing pg 7.1s0 on osd.2 to deep-scrub
grep -HnEi 'scrub|repair|deep-scrub' ceph-osd.2.log
ceph-osd.2.log:118:2020-06-14 07:28:10.139 7f94599aa700 0 log_channel(cluster) log [DBG] : 7.91 deep-scrub starts
ceph-osd.2.log:177:2020-06-14 08:39:11.404 7f94599aa700 0 log_channel(cluster) log [DBG] : 7.91 deep-scrub ok
ceph-osd.2.log:322:2020-06-14 12:17:31.405 7f94579a6700 0 log_channel(cluster) log [DBG] : 13.135 deep-scrub starts
ceph-osd.2.log:323:2020-06-14 12:17:32.744 7f94579a6700 0 log_channel(cluster) log [DBG] : 13.135 deep-scrub ok
ceph-osd.2.log:387:2020-06-14 13:40:35.941 7f94591a9700 0 log_channel(cluster) log [DBG] : 7.d8 deep-scrub starts
ceph-osd.2.log:441:2020-06-14 14:49:06.111 7f94591a9700 0 log_channel(cluster) log [DBG] : 7.d8 deep-scrub ok
Only the last deep-scrub was manually triggered
$ sudo rados list-inconsistent-obj 7.1 --format=json-pretty
{
    "epoch": 30869,
    "inconsistents": []
}
$ sudo rados list-inconsistent-obj 7.1s0 --format=json-pretty
{
    "epoch": 30869,
    "inconsistents": []
}
I'm not sure why no inconsistencies (an empty set) are reported in the
output above.
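For completeness, the additional checks I plan to run next (commands
from the docs; I have not yet confirmed they reveal more here):

$ sudo ceph pg 7.1 query
$ sudo rados list-inconsistent-snapset 7.1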
Chris Shultz
Global Systems Architect
1 Stiles Road
Suite 202
Salem, NH 03079
United States
cshultz(a)korewireless.com
(m) 774.270.2679
korewireless.com
Hi,
Please see below.
On Sat, 3 Feb 2018, Sage Weil wrote:
> On Sat, 3 Feb 2018, Wido den Hollander wrote:
>> Hi,
>>
>> I just wanted to inform people about the fact that Monitor databases can grow
>> quite big when you have a large cluster which is performing a very long
>> rebalance.
>>
>> I'm posting this on ceph-users and ceph-large as it applies to both, but
>> you'll see this sooner on a cluster with a lot of OSDs.
>>
>> Some information:
>>
>> - Version: Luminous 12.2.2
>> - Number of OSDs: 2175
>> - Data used: ~2PB
>>
>> We are in the middle of migrating from FileStore to BlueStore and this is
>> causing a lot of PGs to backfill at the moment:
>>
>> 33488 active+clean
>> 4802 active+undersized+degraded+remapped+backfill_wait
>> 1670 active+remapped+backfill_wait
>> 263 active+undersized+degraded+remapped+backfilling
>> 250 active+recovery_wait+degraded
>> 54 active+recovery_wait+degraded+remapped
>> 27 active+remapped+backfilling
>> 13 active+recovery_wait+undersized+degraded+remapped
>> 2 active+recovering+degraded
>>
>> This has been running for a few days now and it has caused this warning:
>>
>> MON_DISK_BIG mons
>> srv-zmb03-05,srv-zmb04-05,srv-zmb05-05,srv-zmb06-05,srv-zmb07-05 are using a
>> lot of disk space
>> mon.srv-zmb03-05 is 31666 MB >= mon_data_size_warn (15360 MB)
>> mon.srv-zmb04-05 is 31670 MB >= mon_data_size_warn (15360 MB)
>> mon.srv-zmb05-05 is 31670 MB >= mon_data_size_warn (15360 MB)
>> mon.srv-zmb06-05 is 31897 MB >= mon_data_size_warn (15360 MB)
>> mon.srv-zmb07-05 is 31891 MB >= mon_data_size_warn (15360 MB)
>>
>> This is to be expected, as MONs do not trim their store if one or more
>> PGs are not active+clean.
>>
>> In this case we expected this and the MONs are each running on a 1TB Intel
>> DC-series SSD to make sure we do not run out of space before the backfill
>> finishes.
>>
>> The cluster is spread out over racks and in CRUSH we replicate over racks.
>> Rack by rack we are wiping/destroying the OSDs and bringing them back as
>> BlueStore OSDs and letting the backfill handle everything.
>>
>> In between we wait for the cluster to become HEALTH_OK (all PGs active+clean)
>> so that the Monitors can trim their database before we start with the next
>> rack.
>>
>> I just want to warn and inform people about this. Under normal circumstances a
>> MON database isn't that big, but if you have a very long period of
>> backfills/recoveries and also have a large number of OSDs you'll see the DB
>> grow quite big.
>>
>> This has improved significantly going to Jewel and Luminous, but it is still
>> something to watch out for.
>>
>> Make sure your MONs have enough free space to handle this!
>
> Yes!
>
> Just a side note that Joao has an elegant fix for this that allows the mon
> to trim most of the space-consuming full osdmaps. It's still work in
> progress but is likely to get backported to luminous.
>
> sage
Hi Sage,
Has this issue ever been sorted out? I added a batch of new nodes to our
Nautilus (14.2.9) cluster a couple of days ago, and the mon db is growing
at about 50 GB per day.
Cluster state:
  osd: 1515 osds: 1494 up (since 2d), 1492 in (since 2d); 8740 remapped pgs

  data:
    pools:   15 pools, 17048 pgs
    objects: 483.21M objects, 1.3 PiB
    usage:   1.9 PiB used, 12 PiB / 14 PiB avail
    pgs:     0.012% pgs not active
             1612355425/4675115461 objects misplaced (34.488%)
             8305 active+clean
             4372 active+remapped+backfill_wait+backfill_toofull
             4348 active+remapped+backfill_wait
             19   active+remapped+backfilling
             2    active+clean+remapped
             2    peering
Health state:
SLOW_OPS 63640 slow ops, oldest one blocked for 1402 sec, daemons
[osd.477,osd.571,osd.589,osd.707,osd.786,mon.mon01,mon.mon02,mon.mon03,mon.mon04,mon.mon05]
have slow ops.
MON_DISK_BIG mons mon01,mon02,mon03,mon04,mon05 are using a lot of disk space
    mon.mon02 is 126 GiB >= mon_data_size_warn (15 GiB)
    mon.mon03 is 126 GiB >= mon_data_size_warn (15 GiB)
    mon.mon04 is 126 GiB >= mon_data_size_warn (15 GiB)
    mon.mon05 is 127 GiB >= mon_data_size_warn (15 GiB)
    mon.mon01 is 127 GiB >= mon_data_size_warn (15 GiB)
How large can this grow? If it continues to grow at this rate our SSDs
will not be able to ride it out.
Is the only way to deal with this to stop the whole cluster, put larger
SSD drives in the monitors and then let it continue?
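For reference, what I'm considering before resorting to that (my
understanding from the docs, untested here): compacting the mon stores
online, although as far as I understand the store cannot really shrink
until the PGs are clean again and the old osdmaps get trimmed:

ceph tell mon.mon01 compact             # online compaction, one mon at a time
du -sh /var/lib/ceph/mon/*/store.db     # watch the store size on each mon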
Milan
--
Milan Kupcevic
Senior Cyberinfrastructure Engineer at Project NESE
Harvard University
FAS Research Computing
Hi,
I am looking for a software suite to deploy Ceph storage nodes and
gateway servers (SMB & NFS), plus a dashboard showing overall cluster
status, individual node health, disk identification and maintenance
activity, and network utilization; a simple, user-manageable dashboard.
Please suggest any paid or community-based products you have been using
or would recommend to others.
regards
Amudhan