Hi,
I've successfully updated my Luminous lab environment to Nautilus, so next week I'll give the prod env a try, but two things came up in the upgrade notes:
1. Upmap: I've never used this before and I don't know how I missed it, because it is quite a cool feature. The docs say to leave the balancer in upmap mode. My question: much like the pg_autoscaler module, which is not the best thing to use in 'on' mode and is better left in 'warn', is upmap mode considered, let's say in Ceph terms, "safe" to use?
2. Regarding this assimilate-conf: it's not clear to me what it actually does. The docs say:
"This is also a good time to fully transition any config options in ceph.conf into the cluster's configuration database. On each host, you can use the following command to import any option into the monitors with ceph config assimilate-conf -i /etc/ceph/ceph.conf".
What does this mean? If I have different configs on different servers and I run it on each of them, will it merge all the configs together? Or what else can I do with this feature?
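For reference, this is roughly what I plan to run, based on my reading of the Nautilus docs; it's only a sketch, not something I've run on prod yet, and the leftover file name is just something I picked:

ceph osd set-require-min-compat-client luminous   # upmap needs luminous or newer clients
ceph balancer mode upmap
ceph balancer on

ceph config assimilate-conf -i /etc/ceph/ceph.conf -o /etc/ceph/ceph.conf.leftover   # run on each host; -o keeps whatever the monitors did not accept
ceph config dump   # verify what ended up in the central config database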
Thank you
________________________________
Hi,
I have a Ceph cluster used for RGW and RBD. I found that all I/Os to RGW seemed to be blocked during dynamic resharding. Could you tell me whether this behavior is by design or not?
I attached a graph showing that I/O seemed to be blocked. The x-axis is time and the y-axis is the number of RADOS objects. Dynamic resharding ran between 16:22:30 and 16:31:30.
I read the official documentation about dynamic resharding, but it says nothing about blocking during resharding.
https://docs.ceph.com/en/octopus/radosgw/dynamicresharding/
In addition, I read the following Red Hat blog post.
https://www.redhat.com/ja/blog/ceph-rgw-dynamic-bucket-sharding-performance…
> You do not need to stop reading or writing objects to the bucket while resharding is happening.
This would mean that dynamic resharding is an online operation. However, it's not clear whether this feature blocks I/Os or not.
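In case it helps anyone reproduce this, I assume the reshard window can be observed from the RGW side with something like the following (BUCKETNAME is a placeholder for the affected bucket):

radosgw-admin reshard list
radosgw-admin reshard status --bucket=BUCKETNAME
radosgw-admin bucket stats --bucket=BUCKETNAME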
Thanks,
Satoru
Hi all,
We need to store millions of files using the S3 protocol in Ceph (version Nautilus), but we have projects where it isn't appropriate or possible to create a lot of S3 accounts. Is it better to have multiple S3 buckets or one bucket with sub-folders?
For example, the AWS service from Amazon allows you to create up to 100 buckets in each of your AWS cloud accounts. You can request more buckets, up to a maximum quota of 1,000, by submitting a service limit increase. There is no limit on the number of objects you can store in a bucket, but in Ceph we run into problems with listing and resharding when there are millions of objects in one bucket.
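For context, this is roughly how we check how full the shards of a large bucket are; I assume these are the right commands for Nautilus, and BUCKETNAME is a placeholder:

radosgw-admin bucket limit check   # shows objects per shard and fill_status per bucket
radosgw-admin bucket stats --bucket=BUCKETNAME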
Thank you
Michal
Hi,
We have a problem with a PG that was inconsistent; the PGs in our cluster have 3 copies.
It was not possible for us to repair this PG with "ceph pg repair" (the PG is on OSDs 14, 1 and 2), so we deleted the copy on osd 14 with the following command:
ceph-objectstore-tool --data-path /var/lib/ceph/osd.14/ --pgid 22.f --op
remove --force
This caused an automatic attempt to recreate the missing copy, entering the backfilling state, but doing so crashed OSDs 1 and 2 and dropped the IOPS to 0, freezing the cluster.
Is there any way to remove this entire PG, or recreate the missing copy, or ignore it completely? It is causing instability in the cluster.
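For completeness, this is what we have not tried yet: inspecting the inconsistency and exporting the remaining copies before touching anything else. As far as I understand, the export has to run with the OSD stopped, and the output file name is just an example:

rados list-inconsistent-obj 22.f --format=json-pretty
ceph-objectstore-tool --data-path /var/lib/ceph/osd.1/ --pgid 22.f --op export --file /root/pg-22.f-osd1.export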
Thank you, I await comments
--
Gabriel I. Medve
Hi all
An accidental power failure happened.
That resulted in CephFS going offline, and it cannot be mounted.
I have 3 MDS daemons, but the cluster complains "1 mds daemon damaged".
It seems a PG of cephfs_metadata is inconsistent. I tried to repair it, but it doesn't get repaired.
How do I repair the damaged MDS and bring CephFS back up/online?
Details are included below.
Many thanks in advance.
Sagara
# ceph -s
  cluster:
    id:     abc...
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            4 scrub errors
            Possible data damage: 1 pg inconsistent

  services:
    mon: 3 daemons, quorum a,b,c (age 107s)
    mgr: a(active, since 22m), standbys: b, c
    mds: cephfs:0/1 3 up:standby, 1 damaged
    osd: 3 osds: 3 up (since 96s), 3 in (since 96s)

  data:
    pools:   3 pools, 192 pgs
    objects: 281.05k objects, 327 GiB
    usage:   2.4 TiB used, 8.1 TiB / 11 TiB avail
    pgs:     191 active+clean
             1   active+clean+inconsistent
# ceph health detail
HEALTH_ERR 1 filesystem is degraded; 1 filesystem is offline; 1 mds daemon damaged; 4 scrub errors; Possible data damage: 1 pg inconsistent
FS_DEGRADED 1 filesystem is degraded
fs cephfs is degraded
MDS_ALL_DOWN 1 filesystem is offline
fs cephfs is offline because no MDS is active for it.
MDS_DAMAGE 1 mds daemon damaged
fs cephfs mds.0 is damaged
OSD_SCRUB_ERRORS 4 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 2.44 is active+clean+inconsistent, acting [0,2,1]
# ceph osd lspools
2 cephfs_metadata
3 cephfs_data
4 rbd
# ceph pg repair 2.44
# ceph -w
2021-05-22 01:48:04.775783 osd.0 [ERR] 2.44 shard 0 soid 2:22efaf6a:::200.00006048:head : candidate size 1540096 info size 1555896 mismatch
2021-05-22 01:48:04.775786 osd.0 [ERR] 2.44 shard 1 soid 2:22efaf6a:::200.00006048:head : candidate size 1540096 info size 1555896 mismatch
2021-05-22 01:48:04.775787 osd.0 [ERR] 2.44 shard 2 soid 2:22efaf6a:::200.00006048:head : candidate size 1441792 info size 1555896 mismatch
2021-05-22 01:48:04.775789 osd.0 [ERR] 2.44 soid 2:22efaf6a:::200.00006048:head : failed to pick suitable object info
2021-05-22 01:48:04.775849 osd.0 [ERR] repair 2.44 2:22efaf6a:::200.00006048:head : on disk size (1540096) does not match object info size (1555896) adjusted for ondisk to (1555896)
2021-05-22 01:48:04.787167 osd.0 [ERR] 2.44 repair 4 errors, 0 fixed
--- End of detail ---
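PS: I have not touched the MDS journal yet. As I understand the CephFS disaster-recovery docs, the first step would be to inspect and back it up, roughly like this (the backup file name is just a placeholder):

cephfs-journal-tool --rank cephfs:0 journal inspect
cephfs-journal-tool --rank cephfs:0 journal export /root/mds0-journal-backup.bin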
Oh these are filestore OSDs? I didn't expect that.
> 2021-05-20 23:56:52.223 2227200 0 mds.0.journaler.mdlog(ro)
> _finish_read got less than expected (1555896)
> 2021-05-20 23:56:52.223 2229180 0 mds.0.log _replay journaler got
> error -22, aborting
> 2021-05-20 23:56:52.223 2229180 -1 log_channel(cluster) log [ERR] :
> Error loading MDS rank 0: (22) Invalid argument
This confirms that the MDS is not starting because of the object size.
I assume there was some recovery going on when this happened? The OSD
uptime from your status was quite short.
I'm not really sure what to do next. I found an old thread [4] with almost the same error, only that the user ended up truncating the objects because the expected size was smaller than the object info; in your case it's the other way around. In any case I would back up all three copies of that object, including md5sums, before changing anything.
[4] https://www.spinics.net/lists/ceph-devel/msg16516.html
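A minimal sketch of what I mean by backing up, using the filestore paths from your mail; the backup directory is just a name I picked, and the find is there because the object may sit in a DIR_* subdirectory on some OSDs:

mkdir -p /root/pg2.44-backup
find /var/lib/ceph/osd/ceph-0/current/2.44_head -name '200.00006048__head_*' -exec cp -a {} /root/pg2.44-backup/ \;
md5sum /root/pg2.44-backup/*

(Repeat on the other hosts with ceph-1 and ceph-2.)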
Quoting Sagara Wijetunga <sagarawmw(a)yahoo.com>:
> Here are the physical file sizes of the "200.00006048*":
> OSD.0:-rw-r--r-- 1 ceph ceph 1540096 May 20 22:47
> /var/lib/ceph/osd/ceph-0/current/2.44_head/200.00006048__head_56F5F744__
>
> OSD.1:-rw-r--r-- 1 ceph ceph 1540096 May 20 22:47
> /var/lib/ceph/osd/ceph-1/current/2.44_head/DIR_4/DIR_4/200.00006048__head_56F5F744__2
>
> OSD2:-rw-r--r-- 1 ceph ceph 1441792 May 20 22:47
> /var/lib/ceph/osd/ceph-2/current/2.44_head/DIR_4/DIR_4/200.00006048__head_56F5F744__2
>
> Sagara
>
Hi Eugen, this is the output of "ceph mgr module ls":
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [
        "cephadm",
        "dashboard",
        "diskprediction_local",
        "iostat",
        "prometheus",
        "restful"
    ],
    "disabled_modules": [
        ...
    ]
}
As you can see, balancer/crash/… are in the always_on section. I checked this on all 3 monitor nodes with the same output.
Then, looking at disabled_modules, I saw many modules that could help us gather more information (logs) about our problem, like:
- alerts
- insights
- test_orchestrator
- and other…
But we are not sure whether we can safely enable some of them. Right now none of the Ceph logs we have are showing errors. Would enabling some of those modules help us get more useful logs?
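If it would help, I assume enabling one of them is just the following (taking alerts as an example); we have not run it yet:

ceph mgr module enable alerts
ceph mgr module ls   # confirm it moved to enabled_modules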
On the other hand, regarding the hanging commands you asked about: the containers that launch them keep running, since they are waiting for the command to finish. Here you can see the list of tests that are still running (hung):
af13bda77a1a 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph osd status" 23 hours ago Up 23 hours wizardly_leavitt
5b5c760454c7 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph telemetry stat…" 24 hours ago Up 24 hours intelligent_bardeen
a98e6061489d 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph service dump" 24 hours ago Up 24 hours romantic_mendel
66c943a032f8 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph service status" 24 hours ago Up 24 hours happy_shannon
7e18899dffc5 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph crash stat" 24 hours ago Up 24 hours xenodochial_germain
8268082e753b 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph crash ls" 24 hours ago Up 24 hours stoic_volhard
fc5c434a4e23 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph balancer status" 24 hours ago Up 24 hours epic_mendel
So those containers have to be removed manually.
As for the logs of these containers, nothing appears in the container output (docker logs xxxx); only when you kill it can you see the following (with --verbose):
[ceph: root@spsrc-mon-1 /]# ceph --verbose pg stat
….
validate_command: pg stat
better match: 2.5 > 0: pg stat
bestcmds_sorted:
[{'flags': 8,
'help': 'show placement group status.',
'module': 'pg',
'perm': 'r',
'sig': [argdesc(<class 'ceph_argparse.CephPrefix'>, req=True, name=prefix, n=1, numseen=0, prefix=pg),
argdesc(<class 'ceph_argparse.CephPrefix'>, req=True, name=prefix, n=1, numseen=0, prefix=stat)]}]
Submitting command: {'prefix': 'pg stat', 'target': ('mon-mgr', '')}
submit ['{"prefix": "pg stat", "target": ["mon-mgr", ""]}'] to mon-mgr
[hung forever …]
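One thing we are considering trying next, assuming a mgr failover is acceptable here, is failing the active mgr, since all of these commands hang on the mon-mgr target (the daemon name below is a placeholder; ours has a cephadm suffix):

ceph mgr fail spsrc-mon-1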
Kind regards,
Manu.
Hi,
I have a 12.2.13 cluster that I want to upgrade. However, there are a whole bunch of stray files/inodes(?) that I would like to have processed first, also because I get a lot of 'No space left on device' messages. I started a 'find . -ls' in the root of the CephFS filesystem, but that causes overload and takes a lot of time, while not necessarily reducing num_strays.
How do I force the MDSes to process those strays so that clients do not get 'incorrect' errors?
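For reference, this is how I am checking num_strays, via the admin socket on the MDS host (mds.a is a placeholder for the actual daemon name):

ceph daemon mds.a perf dump | grep -i stray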
--
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | info(a)tuxis.nl
I recently configured Prometheus to scrape the mgr /metrics endpoint and added Grafana dashboards.
All daemons are currently at 15.2.11.
I use Hashicorp consul to advertise the active mgr in DNS, and Prometheus
points at a single DNS target. (Is anyone else using this method, or just
statically pointing Prometheus at all potentially active managers?)
All was working fine initially, and it's *mostly* still working fine. For
the first couple of days, all went well, and then a few rate metrics
stopped meaningfully increasing — essentially pegged at zero, which is
implausible in a healthy cluster. Some cluster maintenance was occurring
such as outing and recreating some OSDs, so I have a baseline for
throughput and recovery.
Metric graphs that stopped functioning:
Throughput: ceph_osd_op_r_out_bytes, ceph_osd_op_w_in_bytes,
ceph_osd_op_rw_in_bytes
Recovery: ceph_osd_recovery_ops
I can see that Grafana output is using this method of converting the
counters to rates:
sum(irate(ceph_osd_recovery_ops{job="$job"}[$interval]))
The underlying counters appear to be sane, and reading the raw values from Prometheus is also valid, so I'm guessing a failure in either the irate or the sum function? By inspection in Grafana, the queries return correct timestamps with zero values, so that leaves "sum(irate)" as the likely source of the problem.
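For comparison, I am now testing the same panel with rate over a wider window instead of irate, to rule out irate itself (the 5m window is my own choice, not from the bundled dashboards):

sum(rate(ceph_osd_recovery_ops{job="$job"}[5m]))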
Does anyone have experience with this? I admit it is possibly tangential to Ceph itself, but as the Prometheus/Grafana integration is more or less supported, I thought I'd try here first among active mgr/Prometheus users.
--
Jeremy Austin
jhaustin(a)gmail.com
Hello
I have a weird problem on a 3-node cluster ("Nautilus 14.2.9").
When I simulate a power failure, the OSDs are not marked DOWN and the MDS no longer responds.
If I manually set the OSDs down, the MDS becomes active again.
BTW: only 2 nodes have OSDs; the third node is only for the MON.
I've set mon_osd_down_out_interval = 0.3 in ceph.conf in the global section and restarted all MONs, but when I check it with "ceph daemon mon.ID config show" I still see mon_osd_down_out_interval: "600". I don't get why it's still "600", and honestly I don't know whether it even has any effect on my problem.
Where should I check?
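In case it matters, I assume the centralized config database is the other way to set this in Nautilus, roughly like below (0.3 is simply the value I had put in ceph.conf):

ceph config set mon mon_osd_down_out_interval 0.3
ceph config dump | grep mon_osd_down_out_interval   # check what the cluster actually uses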