Hi,
Is anybody running a cluster with mixed operating systems?
Due to the CentOS 8 change I might try to add Ubuntu OSD nodes to our CentOS cluster and slowly decommission the CentOS nodes, but I'm not sure whether this is possible.
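For context, I assume a mixed cluster would show both distros in the OSD metadata, e.g.:
```
# OSD metadata records each daemon's host distro; with mixed nodes both
# distros should appear here:
ceph osd metadata | grep -E '"distro|"hostname'
```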
Thank you
The docs for permissions are super vague. What does each flag do?
What does 'x' permit?
What's the difference between class-write and write?
And the last question: can we limit a user to reading/writing only
existing objects in the pool?
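For concreteness, this is the kind of cap string I'm asking about (the client and pool names are made up):
```
# Made-up names (client.appuser, mypool); shows where the flags in
# question go in a cap string:
ceph auth get-or-create client.appuser \
    mon 'allow r' \
    osd 'allow class-read class-write pool=mypool'
# Inspect the caps a user currently has:
ceph auth get client.appuser
```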
Thanks!
Hey all,
We will be having a Ceph science/research/big cluster call on Wednesday
January 27th. If anyone wants to discuss something specific they can add
it to the pad linked below. If you have questions or comments you can
contact me.
This is an informal open call of community members mostly from
hpc/htc/research environments where we discuss whatever is on our minds
regarding ceph: updates, outages, features, maintenance, etc. There is
no set presenter, but I do attempt to keep the conversation lively.
https://pad.ceph.com/p/Ceph_Science_User_Group_20210127
We try to keep it to an hour or less.
Ceph calendar event details:
January 27th, 2021
15:00 UTC
4pm Central European
9am Central US
Description: Main pad for discussions:
https://pad.ceph.com/p/Ceph_Science_User_Group_Index
Meetings will be recorded and posted to the Ceph Youtube channel.
To join the meeting on a computer or mobile phone:
https://bluejeans.com/908675367?src=calendarLink
To join from a Red Hat Deskphone or Softphone, dial: 84336.
Connecting directly from a room system?
1.) Dial: 199.48.152.152 or bjn.vc
2.) Enter Meeting ID: 908675367
Just want to dial in on your phone?
1.) Dial one of the following numbers: 408-915-6466 (US)
See all numbers: https://www.redhat.com/en/conference-numbers
2.) Enter Meeting ID: 908675367
3.) Press #
Want to test your video connection? https://bluejeans.com/111
Kevin
--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS
Space Science & Engineering Center
University of Wisconsin-Madison
Hi,
We have bucket sync enabled, and it seems to be inconsistent ☹
This is the master zone sync status on that specific bucket:
          realm 5fd28798-9195-44ac-b48d-ef3e95caee48 (realm)
      zonegroup 31a5ea05-c87a-436d-9ca0-ccfcbad481e3 (data)
           zone 9213182a-14ba-48ad-bde9-289a1c0c0de8 (hkg)
  metadata sync no sync (zone is master)
      data sync source: 61c9d940-fde4-4bed-9389-edc8d7741817 (sin)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: f20ddd64-924b-4f78-8d2d-dd6c65f98ba9 (ash)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 126 shards
                        behind shards: [0,1,2,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
                        oldest incremental change not applied: 2021-01-25T11:32:57.726042+0700 [62]
                        104 shards are recovering
                        recovering shards: [0,2,3,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20,21,22,24,25,26,27,28,29,31,32,33,36,37,38,39,40,42,43,44,45,47,50,51,52,53,54,55,57,58,61,63,65,66,67,68,69,70,71,72,73,74,75,76,78,80,81,82,83,84,85,87,88,90,92,93,95,96,97,98,99,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,123,124,125,126,127]
This is the secondary zone where the data has been uploaded:
          realm 5fd28798-9195-44ac-b48d-ef3e95caee48 (realm)
      zonegroup 31a5ea05-c87a-436d-9ca0-ccfcbad481e3 (data)
           zone f20ddd64-924b-4f78-8d2d-dd6c65f98ba9 (ash)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 61c9d940-fde4-4bed-9389-edc8d7741817 (sin)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 9213182a-14ba-48ad-bde9-289a1c0c0de8 (hkg)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 125 shards
                        behind shards: [0,1,2,3,4,5,6,8,9,10,11,12,13,14,15,16,17,18,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
                        oldest incremental change not applied: 2021-01-25T11:29:32.450031+0700 [61]
                        126 shards are recovering
                        recovering shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,104,105,106,107,108,109,110,111,112,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
The pipes are already there:
"id": "seo-2",
"data_flow": {
"symmetrical": [
{
"id": "seo-2-flow",
"zones": [
"9213182a-14ba-48ad-bde9-289a1c0c0de8",
"f20ddd64-924b-4f78-8d2d-dd6c65f98ba9"
]
}
]
},
"pipes": [
{
"id": "seo-2-hkg-ash-pipe",
"source": {
"bucket": "seo..prerender",
"zones": [
"9213182a-14ba-48ad-bde9-289a1c0c0de8"
]
},
"dest": {
"bucket": "seo..prerender",
"zones": [
"f20ddd64-924b-4f78-8d2d-dd6c65f98ba9"
]
},
"params": {
"source": {
"filter": {
"tags": []
}
},
"dest": {},
"priority": 0,
"mode": "system",
"user": ""
}
},
{
"id": "seo-2-ash-hkg-pipe",
"source": {
"bucket": "seo..prerender",
"zones": [
"f20ddd64-924b-4f78-8d2d-dd6c65f98ba9"
]
},
"dest": {
"bucket": "seo..prerender",
"zones": [
"9213182a-14ba-48ad-bde9-289a1c0c0de8"
]
},
"params": {
"source": {
"filter": {
"tags": []
}
},
"dest": {},
"priority": 0,
"mode": "system",
"user": ""
}
}
],
"status": "enabled"
}
Any ideas on how to troubleshoot this?
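So far I have only looked at the zone-level status above; I assume the per-bucket view and the sync error log are the next things to check, something like:
```
# Per-bucket sync status against each source zone:
radosgw-admin bucket sync status --bucket=seo..prerender
# Errors recorded by the data-sync machinery:
radosgw-admin sync error list
```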
Thank you
I'm trying to set up a new ceph cluster with cephadm on a SUSE SES trial
that has Ceph 15.2.8
Each OSD node has 18 rotational SAS disks, 4 NVMe 2TB SSDs for DB, and 2
NVME2 200GB Optane SSDs for WAL.
These servers will eventually have 24 rotational SAS disks that they will
inherit from existing storage servers. So I don't want all the space used
on the DB and WAL SSDs.
Based on the comment "(db_slots is actually to be favoured here, but
it's not implemented yet)" on this page in the docs:
https://docs.ceph.com/en/latest/cephadm/drivegroups/#the-advanced-case
I suspect these parameters are not yet implemented, even though they are
documented under "ADDITIONAL OPTIONS".
My osd_spec.yml:
service_type: osd
service_id: three_tier_osd
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
  model: 'ST14000NM0288'
db_devices:
  rotational: 0
  model: 'INTEL SSDPE2KX020T8'
  limit: 6
wal_devices:
  model: 'INTEL SSDPEL1K200GA'
  limit: 12
db_slots: 6
wal_slots: 12
All available space is consumed on my DB and WAL SSDs with only 18 OSDs,
leaving no room to add additional spindles.
Is this still work in progress, or a bug I should report? It is possibly
related to https://github.com/rook/rook/issues/5026. At a minimum, this
appears to be a documentation bug.
How can I work around this?
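One direction I'm considering (untested; it assumes the block_db_size option from the same docs page is honored in 15.2.8 and that the --dry-run preview is available):
```
# Sketch only: size the DB partitions explicitly instead of relying on
# db_slots (the 300G value here is made up):
cat > osd_spec_sized.yml <<'EOF'
service_type: osd
service_id: three_tier_osd
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
block_db_size: 300G
EOF
# Preview what cephadm would create, without touching any disks:
ceph orch apply osd -i osd_spec_sized.yml --dry-run
```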
-Chip
I have been trying to create two virtual test clusters to learn about the
RGW multisite setting. So far, I have set up two small Nautilus
(v.14.2.16) clusters, designated one of them as the "master zone site" and
followed every step outlined in the doc (
https://docs.ceph.com/en/nautilus/radosgw/multisite/), including creating a
system user, updating the period, and restarting the rgw daemon. (For the
sake of simplicity, there is only one RGW daemon running on each site.)
Once I installed the RGW daemon on the secondary zone site, I tried pulling
the realm from the master zone cluster, but ended up with this:
```
$ radosgw-admin realm pull --url=http://<master zone gateway>:80
--access-key=<system_access_key> --secret=<system_secret_key>
request failed: (13) Permission denied
If the realm has been changed on the master zone, the master zone's gateway
may need to be restarted to recognize this user.
```
I tried adding the --rgw-realm=<realm set up in the primary site>, but the
result was the same. I restarted the rgw daemon on both sides -- that did
not help, either.
As far as I could tell, the output of all of the following on the master
zone side seems correct -- the realm, zonegroup, and zone I created are the
only ones, and they are set as the defaults.
```
radosgw-admin zone/zonegroup/realm list
radosgw-admin zone/zonegroup/realm get
```
On the "master zone" side, the rgw log shows
```
2021-01-22 13:34:48.404 7fb9ca89e700 1 ====== starting new request
req=0x7fb9ca897740 =====
2021-01-22 13:34:48.428 7fb9ca89e700 1 ====== req done req=0x7fb9ca897740
op status=0 http_status=403 latency=0.0240002s ======
2021-01-22 13:34:48.428 7fb9ca89e700 1 civetweb: 0x559d6509a000:
10.33.30.55 - - [22/Jan/2021:13:34:48 -0500] "GET /admin/realm HTTP/1.1"
403 318 - -
```
I am using Ubuntu 18.04, Ceph v.14.2.16, deployed using `ceph-deploy`.
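For completeness, these are the checks I plan to run next on the master zone side (the uid below is the one suggested in the multisite doc; yours may differ):
```
# Verify the system user exists on the master and its keys match the ones
# passed to `realm pull`:
radosgw-admin user info --uid=synchronization-user
# Verify which realm/period the master gateway is actually serving:
radosgw-admin period get
# After any realm change on the master, commit the period (and restart rgw):
radosgw-admin period update --commit
```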
Mami Hayashida
Research Computing Associate
Univ. of Kentucky ITS Research Computing Infrastructure
Hi Dan,
it is possible that the payload reduction also solved, or at least reduced, a really bad problem that looks related (beware, it's a long one): https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/FBGIJZNFG44… . Since reducing the payload size I still observe large peaks in the MON network activity, but the cluster no longer goes down like it did before. During these peaks, I see warnings like these:
2021-01-22 12:00:00.000102 [WRN] overall HEALTH_WARN 1 pools nearfull
2021-01-22 11:04:09.156796 [INF] Health check cleared: SLOW_OPS (was: 5 slow ops, oldest one blocked for 75 sec, mon.ceph-02 has slow ops)
2021-01-22 11:04:07.994416 [WRN] Health check update: 5 slow ops, oldest one blocked for 75 sec, mon.ceph-02 has slow ops (SLOW_OPS)
2021-01-22 11:04:01.469498 [WRN] Health check failed: 124 slow ops, oldest one blocked for 82 sec, daemons [mon.ceph-02,mon.ceph-03] have slow ops. (SLOW_OPS)
2021-01-22 11:00:00.000104 [WRN] overall HEALTH_WARN 1 pools nearfull
2021-01-22 10:36:44.576663 [INF] Health check cleared: SLOW_OPS (was: 25 slow ops, oldest one blocked for 42 sec, daemons [mon.ceph-02,mon.ceph-03] have slow ops.)
2021-01-22 10:36:38.543763 [WRN] Health check failed: 18 slow ops, oldest one blocked for 38 sec, daemons [mon.ceph-02,mon.ceph-03] have slow ops. (SLOW_OPS)
So, at least stuff is working.
I now lean towards the hypothesis that these outages were caused by some synchronisation process between MONs that became less problematic after reducing the payload size. I might be able to reduce my insanely long beacon time-outs again, but before doing so: do you know of any other communication parameters, similar to mon_sync_max_payload_size, that might be relevant to MON-[MON, MGR, OSD] communication?
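(For reference, this is how I have been listing candidate settings on a running MON; the socket name matches the MON's id:)
```
# Dump sync- and paxos-related settings via the MON's admin socket:
ceph daemon mon.ceph-02 config show | grep -E 'mon_sync|paxos'
```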
In general, I have the impression that, because of little bugs like these, the recommendation for production clusters should be raised to at least 5 MONs, so that one can afford to have 2 MONs temporarily out of quorum. I will upgrade our cluster to 5 MONs as soon as I can.
Thanks for your help and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Dan van der Ster <dan(a)vanderster.com>
Sent: 06 January 2021 20:53:14
To: Frank Schilder
Subject: Re: [ceph-users] Re: Storage down due to MON sync very slow
Yeah I was going to say -- ignore all of the rsync advice in that
thread, it is unnecessary.
Setting a small mon sync payload works like magic :)
-- dan
On Wed, Jan 6, 2021 at 8:49 PM Frank Schilder <frans(a)dtu.dk> wrote:
>
> OK, sorry for all my questions.
>
> Setting mon_sync_max_payload_size=4096 actually makes the MON sync in no time! Thank you so much :)
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Frank Schilder
> Sent: 06 January 2021 20:40:26
> To: Dan van der Ster
> Subject: Re: [ceph-users] Re: Storage down due to MON sync very slow
>
> OK, thanks a lot! I will try it now. Hope the cluster remains responsive.
>
> I'm wondering about this approach someone brought up in your thread:
>
> Eventually I stopped one MON, tarballed its database and used that to
> bring back the MON which was upgraded to 13.2.8
>
> That worked without any hiccups. The MON joined again within a few seconds.
>
> Stopping one MON for a copy would be a much shorter storage outage than the sync I'm doing. I guess it's the entire mon data directory that gets copied. I always wondered whether it contains data tied to a specific MON. If not, the copy approach could speed things up a lot. What do you think?
>
> Thanks again and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dan(a)vanderster.com>
> Sent: 06 January 2021 20:36:15
> To: Frank Schilder
> Subject: Re: [ceph-users] Re: Storage down due to MON sync very slow
>
> We have used mon_sync_max_payload_size 4096 on our largest most
> important prod cluster since that thread.
> The PR from Sage makes something like that the default anyway. (the PR
> counts keys rather than bytes, but the effect is the same).
>
> mon_sync_max_payload_size 4096 should not impact the speed of syncing
> -- it simply breaks the sync into smaller more manageable pieces.
> (Without this, if you have lots of keys in the mon db, in our case
> caused by lots of rbd snapshots, then syncing will never ever
> complete).
>
> -- dan
>
> On Wed, Jan 6, 2021 at 8:32 PM Frank Schilder <frans(a)dtu.dk> wrote:
> >
> > Hi Dan,
> >
> > thanks for that. Will it slow down or accelerate the syncing (I will read your post after this e-mail), or will it just allow I/O to continue and sync more in the background? The current value is
> >
> > mon_sync_max_payload_size 1048576
> >
> > Related to that, would building a MON store from the OSDs following https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#… provide a head start? I'm not sure whether this procedure works on an active cluster.
> >
> > Will study your thread now ...
> >
> > Thanks again and best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Dan van der Ster <dan(a)vanderster.com>
> > Sent: 06 January 2021 20:26:46
> > To: Frank Schilder
> > Subject: Re: [ceph-users] Re: Storage down due to MON sync very slow
> >
> > (obviously just put that config in the ceph.conf on the mons if mimic
> > doesn't have ceph config... I don't quite remember.)
> >
> > -- dan
> >
> > On Wed, Jan 6, 2021 at 8:25 PM Dan van der Ster <dan(a)vanderster.com> wrote:
> > >
> > > This sounds a lot like an old thread of mine:
> > > https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/M5ZKF7PTEO2…
> > >
> > > See the discussion about mon_sync_max_payload_size, and the PR that
> > > fixed this at some point in nautilus.
> > >
> > > Our workaround was:
> > >
> > > ceph config set mon mon_sync_max_payload_size 4096
> > >
> > > Hope that helps,
> > >
> > > Dan
> > >
> > >
> > > On Wed, Jan 6, 2021 at 8:18 PM Frank Schilder <frans(a)dtu.dk> wrote:
> > > >
> > > > Dear Dan,
> > > >
> > > > thanks for your fast response.
> > > >
> > > > Version: mimic 13.2.10.
> > > >
> > > > Here is the mon_status of the "new" MON during syncing:
> > > >
> > > > [root@ceph-01 ~]# ceph daemon mon.ceph-01 mon_status
> > > > {
> > > > "name": "ceph-01",
> > > > "rank": 0,
> > > > "state": "synchronizing",
> > > > "election_epoch": 0,
> > > > "quorum": [],
> > > > "features": {
> > > > "required_con": "144115188346404864",
> > > > "required_mon": [
> > > > "kraken",
> > > > "luminous",
> > > > "mimic",
> > > > "osdmap-prune"
> > > > ],
> > > > "quorum_con": "0",
> > > > "quorum_mon": []
> > > > },
> > > > "outside_quorum": [
> > > > "ceph-01"
> > > > ],
> > > > "extra_probe_peers": [],
> > > > "sync_provider": [],
> > > > "sync": {
> > > > "sync_provider": "mon.2 192.168.32.67:6789/0",
> > > > "sync_cookie": 33302773774,
> > > > "sync_start_version": 38355711
> > > > },
> > > > "monmap": {
> > > > "epoch": 3,
> > > > "fsid": "e4ece518-f2cb-4708-b00f-b6bf511e91d9",
> > > > "modified": "2019-03-14 23:08:34.717223",
> > > > "created": "2019-03-14 22:18:15.088212",
> > > > "features": {
> > > > "persistent": [
> > > > "kraken",
> > > > "luminous",
> > > > "mimic",
> > > > "osdmap-prune"
> > > > ],
> > > > "optional": []
> > > > },
> > > > "mons": [
> > > > {
> > > > "rank": 0,
> > > > "name": "ceph-01",
> > > > "addr": "192.168.32.65:6789/0",
> > > > "public_addr": "192.168.32.65:6789/0"
> > > > },
> > > > {
> > > > "rank": 1,
> > > > "name": "ceph-02",
> > > > "addr": "192.168.32.66:6789/0",
> > > > "public_addr": "192.168.32.66:6789/0"
> > > > },
> > > > {
> > > > "rank": 2,
> > > > "name": "ceph-03",
> > > > "addr": "192.168.32.67:6789/0",
> > > > "public_addr": "192.168.32.67:6789/0"
> > > > }
> > > > ]
> > > > },
> > > > "feature_map": {
> > > > "mon": [
> > > > {
> > > > "features": "0x3ffddff8ffacfffb",
> > > > "release": "luminous",
> > > > "num": 1
> > > > }
> > > > ],
> > > > "mds": [
> > > > {
> > > > "features": "0x3ffddff8ffacfffb",
> > > > "release": "luminous",
> > > > "num": 2
> > > > }
> > > > ],
> > > > "client": [
> > > > {
> > > > "features": "0x2f018fb86aa42ada",
> > > > "release": "luminous",
> > > > "num": 1
> > > > },
> > > > {
> > > > "features": "0x3ffddff8eeacfffb",
> > > > "release": "luminous",
> > > > "num": 1
> > > > },
> > > > {
> > > > "features": "0x3ffddff8ffacfffb",
> > > > "release": "luminous",
> > > > "num": 17
> > > > }
> > > > ]
> > > > }
> > > > }
> > > >
> > > > I'm a bit surprised that the other 2 MONs don't remain in quorum until this MON has caught up. Is there any way to monitor the syncing progress? Right now I need to interrupt regularly to allow some I/O, but I have no clue how long I need to wait.
> > > >
> > > > Thanks for your help!
> > > >
> > > > Best regards,
> > > > =================
> > > > Frank Schilder
> > > > AIT Risø Campus
> > > > Bygning 109, rum S14
> > > >
> > > > ________________________________________
> > > > From: Dan van der Ster <dan(a)vanderster.com>
> > > > Sent: 06 January 2021 20:16:44
> > > > To: Frank Schilder
> > > > Cc: Ceph Users
> > > > Subject: Re: [ceph-users] Re: Storage down due to MON sync very slow
> > > >
> > > > Which version of Ceph are you running?
> > > >
> > > > .. dan
> > > >
> > > >
> > > > On Wed, Jan 6, 2021, 8:14 PM Frank Schilder <frans(a)dtu.dk> wrote:
> > > > In the output of the MON I see slow ops warnings:
> > > >
> > > > debug 2021-01-06 20:12:48.854 7f1a3d29f700 -1 mon.ceph-01@0(synchronizing) e3 get_health_metrics reporting 20 slow ops, oldest is log(1 entries from seq 1 at 2021-01-06 20:00:12.014861)
> > > >
> > > > There appears to be no progress on this operation, it is stuck.
> > > >
> > > > Best regards,
> > > > =================
> > > > Frank Schilder
> > > > AIT Risø Campus
> > > > Bygning 109, rum S14
> > > >
> > > > ________________________________________
> > > > From: Frank Schilder <frans(a)dtu.dk>
> > > > Sent: 06 January 2021 20:11:25
> > > > To: ceph-users(a)ceph.io
> > > > Subject: [ceph-users] Storage down due to MON sync very slow
> > > >
> > > > Dear all,
> > > >
> > > > I had to restart one out of 3 MONs on an empty MON DB dir. It is in state syncing right now, but I'm not sure if there is any progress. The cluster is completely unresponsive even though I have 2 healthy MONs. Is there any way to sync the DB directory faster and/or without downtime?
> > > >
> > > > Thanks a lot!
> > > >
> > > > Best regards,
> > > > =================
> > > > Frank Schilder
> > > > AIT Risø Campus
> > > > Bygning 109, rum S14
Hello everyone,
I'm trying to add an OSD node to my current cluster. I created an LVM volume on this node to use for the OSD.
My current Ceph version is 14.2.6, running on RHEL 7.
However, I got an error when trying to activate the OSD, and I'm confused by the output. I tried to see what really happened, but I don't know how to pinpoint the issue.
I would much appreciate any help clearing up the confusion.
* Why did the "mon getmap" command send the result to /dev/stderr? I saw the monmap file having value and it looked like it got the monmap, is there any way to check if the monmap having problem?
* What caused "_read_fsid unparsable uuid"?
Sincerely,
Hai
[xxx@xxx.com@xxx430 ~]$ sudo ceph-volume --cluster aap-storage lvm create --data /dev/kubernetes/ceph-osd
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster aap-storage --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/aap-storage.keyring -i - osd new c84f74b3-9a3e-4cd9-a4ce-4c1819a56f90
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/aap-storage-5
Running command: /sbin/restorecon /var/lib/ceph/osd/aap-storage-5
Running command: /bin/chown -h ceph:ceph /dev/kubernetes/ceph-osd
Running command: /bin/chown -R ceph:ceph /dev/dm-9
Running command: /bin/ln -s /dev/kubernetes/ceph-osd /var/lib/ceph/osd/aap-storage-5/block
Running command: /bin/ceph --cluster aap-storage --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/aap-storage.keyring mon getmap -o /var/lib/ceph/osd/aap-storage-5/activate.monmap
stderr: got monmap epoch 8
Running command: /bin/ceph-authtool /var/lib/ceph/osd/aap-storage-5/keyring --create-keyring --name osd.5 --add-key AQB2RAhgnkcLIBAAdcdR5N4YKzSJmfoA6G6XvA==
stdout: creating /var/lib/ceph/osd/aap-storage-5/keyring
added entity osd.5 auth(key=AQB2RAhgnkcLIBAAdcdR5N4YKzSJmfoA6G6XvA==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/aap-storage-5/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/aap-storage-5/
Running command: /bin/ceph-osd --cluster aap-storage --osd-objectstore bluestore --mkfs -i 5 --monmap /var/lib/ceph/osd/aap-storage-5/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/aap-storage-5/ --osd-uuid c84f74b3-9a3e-4cd9-a4ce-4c1819a56f90 --setuser ceph --setgroup ceph
stderr: 2021-01-20 15:55:52.319 7f0970d3ba80 -1 bluestore(/var/lib/ceph/osd/aap-storage-5/) _read_fsid unparsable uuid
--> ceph-volume lvm prepare successful for: kubernetes/ceph-osd
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/aap-storage-5
Running command: /bin/ceph-bluestore-tool --cluster=aap-storage prime-osd-dir --dev /dev/kubernetes/ceph-osd --path /var/lib/ceph/osd/aap-storage-5 --no-mon-config
Running command: /bin/ln -snf /dev/kubernetes/ceph-osd /var/lib/ceph/osd/aap-storage-5/block
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/aap-storage-5/block
Running command: /bin/chown -R ceph:ceph /dev/dm-9
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/aap-storage-5
Running command: /bin/systemctl enable ceph-volume@lvm-5-c84f74b3-9a3e-4cd9-a4ce-4c1819a56f90
stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-5-c84f74b3-9a3e-4cd9-a4ce-4c1819a56f90.service to /usr/lib/systemd/system/ceph-volume@.service.
Running command: /bin/systemctl enable --runtime ceph-osd@5
stderr: Created symlink from /run/systemd/system/ceph-osd.target.wants/ceph-osd@5.service to /usr/lib/systemd/system/ceph-osd@.service.
Running command: /bin/systemctl start ceph-osd@5
stderr: Job for ceph-osd@5.service failed because the control process exited with error code. See "systemctl status ceph-osd@5.service" and "journalctl -xe" for details.
--> Was unable to complete a new OSD, will rollback changes
Running command: /bin/ceph --cluster aap-storage --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/aap-storage.keyring osd purge-new osd.5 --yes-i-really-mean-it
stderr: purged osd.5
--> RuntimeError: command returned non-zero exit status: 1
So I broke the procedure into steps and ran them one by one. I saw the problem happen in BlueStore.
[xxx@xxx.com@xxx430 ~]$ sudo ceph-osd -c /etc/ceph/${CLUSTER_NAME}.conf -k /etc/ceph/${CLUSTER_NAME}.client.admin.keyring -i $ID --mkfs \
> --monmap /var/lib/ceph/osd/${CLUSTER_NAME}-$ID/activate.monmap --osd-uuid $UUID --no-mon-config \
> --osd-data /var/lib/ceph/osd/${CLUSTER_NAME}-$ID/ --setuser ceph --setgroup ceph
2021-01-21 16:59:14.907 7f724e882a80 -1 bluestore(/var/lib/ceph/osd/aap-storage-5//block) _read_bdev_label failed to open /var/lib/ceph/osd/aap-storage-5//block: (13) Permission denied
2021-01-21 16:59:14.908 7f724e882a80 -1 bluestore(/var/lib/ceph/osd/aap-storage-5//block) _read_bdev_label failed to open /var/lib/ceph/osd/aap-storage-5//block: (13) Permission denied
2021-01-21 16:59:14.908 7f724e882a80 -1 bluestore(/var/lib/ceph/osd/aap-storage-5/) _read_fsid unparsable uuid
2021-01-21 16:59:14.908 7f724e882a80 -1 bluestore(/var/lib/ceph/osd/aap-storage-5/) _setup_block_symlink_or_file failed to open block file: (13) Permission denied
2021-01-21 16:59:14.908 7f724e882a80 -1 bluestore(/var/lib/ceph/osd/aap-storage-5/) mkfs failed, (13) Permission denied
2021-01-21 16:59:14.908 7f724e882a80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (13) Permission denied
2021-01-21 16:59:14.908 7f724e882a80 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/aap-storage-5/: (13) Permission denied
Because of the permission problem, I avoided using the ceph user for the ceph service; however, the problem still persisted.
[xxx@xxx.com@xxx430 ~]$ sudo ceph-osd -c /etc/ceph/${CLUSTER_NAME}.conf -k /etc/ceph/${CLUSTER_NAME}.client.admin.keyring -i $ID --mkfs --monmap /var/lib/ceph/osd/${CLUSTER_NAME}-$ID/activate.monmap --osd-uuid $UUID --no-mon-config --osd-data /var/lib/ceph/osd/${CLUSTER_NAME}-$ID/
2021-01-21 17:00:08.583 7f3acc546a80 -1 bluestore(/var/lib/ceph/osd/aap-storage-5/) _read_fsid unparsable uuid
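In case it helps to narrow this down, a sketch of what I would check next (same paths as in the logs above):
```
# Is the block device behind the symlink readable/writable by ceph?
ls -lL /dev/kubernetes/ceph-osd /dev/dm-9
# Can the ceph user write inside the tmpfs OSD dir at all?
sudo -u ceph touch /var/lib/ceph/osd/aap-storage-5/permtest && echo writable
```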
Hi all,
During rejoin an MDS can sometimes go OOM if the openfiles table is too large.
The workaround has been described by ceph devs as "rados rm -p
cephfs_metadata mds0_openfiles.0".
On our cluster we have several such objects for rank 0:
mds0_openfiles.0 exists with size: 199978
mds0_openfiles.1 exists with size: 153650
mds0_openfiles.2 exists with size: 40987
mds0_openfiles.3 exists with size: 7746
mds0_openfiles.4 exists with size: 413
If we suffer such an OOM, do we need to rm *all* of those objects or
only the `.0` object?
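(For reference, the sizes above can be reproduced along these lines, assuming the default metadata pool name:)
```
# Enumerate and stat every rank-0 openfiles object in the metadata pool:
for obj in $(rados -p cephfs_metadata ls | grep '^mds0_openfiles'); do
    rados -p cephfs_metadata stat "$obj"
done
```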
Best Regards,
Dan
Hi,
What limits are there on the "reasonable size" of an rbd?
E.g. when I try to create a 1 PB rbd with default 4 MiB objects on my
octopus cluster:
$ rbd create --size 1P --data-pool rbd.ec rbd.meta/fs
2021-01-20T18:19:35.799+1100 7f47a99253c0 -1 librbd::image::CreateRequest: validate_layout: image size not compatible with object map
...which comes from:
== src/librbd/image/CreateRequest.cc
bool validate_layout(CephContext *cct, uint64_t size, file_layout_t &layout) {
  if (!librbd::ObjectMap<>::is_compatible(layout, size)) {
    lderr(cct) << "image size not compatible with object map" << dendl;
    return false;
  }

== src/librbd/ObjectMap.cc
template <typename I>
bool ObjectMap<I>::is_compatible(const file_layout_t& layout, uint64_t size) {
  uint64_t object_count = Striper::get_num_objects(layout, size);
  return (object_count <= cls::rbd::MAX_OBJECT_MAP_OBJECT_COUNT);
}

== src/cls/rbd/cls_rbd_types.h
static const uint32_t MAX_OBJECT_MAP_OBJECT_COUNT = 256000000;
For 4 MiB objects that object count equates to just over 976 TiB.
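(Sanity check of that figure:)
```
# 256,000,000 objects x 4 MiB each, expressed in TiB:
echo $(( 256000000 * 4 / 1024 / 1024 ))   # prints 976
```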
Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or is
it just "this is crazy large; if you're trying to go over this you're doing
something wrong, rethink your life..."?
Yes, I realise I can increase the size of the objects to get a larger rbd,
or drop the object-map support (and the fast-diff that goes along with
it).
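For example, something like this should get past the check (a sketch; I haven't run it):
```
# 8 MiB objects double the object-map ceiling to roughly 1.9 PiB:
rbd create --size 1P --object-size 8M --data-pool rbd.ec rbd.meta/fs
```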
I'm SO glad I found this limit now, rather than starting on a smaller rbd
and finding the limit when I tried to grow the rbd underneath a rapidly
filling filesystem.
What else should I know?
Background: I currently have nearly 0.5 PB on XFS (on lvm / raid6) and ZFS
that I'm looking to move over to ceph. XFS is a requirement, for the
reflinking (sadly not yet available in CephFS: https://tracker.ceph.com/issues/1680).
The recommendation for XFS is to start larger, on a thin-provisioned store
(hello rbd!), rather than start smaller and grow as needed - e.g. see the
thread surrounding:
https://www.spinics.net/lists/linux-xfs/msg20099.html
Rather than a single large rbd, should I be looking at multiple smaller
rbds linked together using lvm or somesuch? What are the tradeoffs?
And whilst we're here... for an rbd with the data on an erasure-coded
pool, how do you calculate the amount of rbd metadata required if/when the
rbd data is fully allocated?
Cheers,
Chris