Hi all,
I have just created a Ceph cluster in order to use CephFS. When I create the
CephFS pools and the filesystem, I get the error below.
# ceph osd pool create cephfs_data 128
pool 'cephfs_data' created
# ceph osd pool create cephfs_metadata 128
pool 'cephfs_metadata' created
# ceph fs new cephfs cephfs_metadata cephfs_data
new fs with metadata pool 6 and data pool 5
# ceph -s
cluster:
id: 1c27def45-f0f9-494d-sfke-eb4323432fd
health: HEALTH_ERR
1 filesystem is offline
1 filesystem is online with fewer MDS than max_mds
services:
mon: 2 daemons, quorum ceph-mon01,ceph-mon02
mgr: ceph-adm01(active)
mds: cephfs-0/0/1 up
osd: 12 osds: 12 up, 12 in
data:
pools: 2 pools, 256 pgs
objects: 0 objects, 0 B
usage: 12 GiB used, 588 GiB / 600 GiB avail
pgs: 256 active+clean
But when I check max_mds for the filesystem, it says 1:
# ceph fs get cephfs | grep max_mds
max_mds 1
Does anyone know what I am missing here? Any input is much appreciated.
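I notice that ceph -s reports "mds: cephfs-0/0/1 up", so presumably no MDS daemon is running yet, which would explain the filesystem being reported as offline. For reference, a rough sketch of what I would check next; the orchestrator command assumes a cephadm-managed cluster and the host names are only illustrative:
# ceph mds stat                # how many MDS daemons are up / on standby
# ceph fs status cephfs        # per-filesystem view of ranks and standbys
# ceph orch apply mds cephfs --placement="2 ceph-mon01 ceph-mon02"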
Regards,
Ram
Ceph-explorer..
I have some questions for those who’ve experienced this issue.
1. It seems like those reporting this issue are seeing it strictly after upgrading to Octopus. From what version did each of these sites upgrade to Octopus? From Nautilus? Mimic? Luminous?
2. Does anyone have any lifecycle rules on a bucket experiencing this issue? If so, please describe.
3. Is anyone making copies of the affected objects (to same or to a different bucket) prior to seeing the issue? And if they are making copies, does the destination bucket have lifecycle rules? And if they are making copies, are those copies ever being removed?
4. Is anyone experiencing this issue willing to run their RGWs with 'debug_ms=1'? That would allow us to see a request from an RGW to either remove a tail object or decrement its reference counter (and when its counter reaches 0 it will be deleted).
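For reference on point 4, a hedged sketch of one way to enable that (the daemon name client.rgw.host1 is only illustrative; the option can equally be placed in that daemon's ceph.conf section followed by an RGW restart):
$ ceph config set client.rgw.host1 debug_ms 1
$ ceph config set client.rgw.host1 debug_ms 0    # revert once the relevant requests have been captured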
Thanks,
Eric
> On Nov 12, 2020, at 4:54 PM, huxiaoyu(a)horebdata.cn wrote:
>
> Looks like this is a very dangerous bug for data safety. I hope the bug will be quickly identified and fixed.
>
> best regards,
>
> Samuel
>
>
>
> huxiaoyu(a)horebdata.cn
>
> From: Janek Bevendorff
> Date: 2020-11-12 18:17
> To: huxiaoyu(a)horebdata.cn; EDH - Manuel Rios; Rafael Lopez
> CC: Robin H. Johnson; ceph-users
> Subject: Re: [ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk
> I have never seen this on Luminous. I recently upgraded to Octopus and the issue started occurring only a few weeks later.
>
> On 12/11/2020 16:37, huxiaoyu(a)horebdata.cn wrote:
> Which Ceph versions are affected by this RGW bug/issue? Luminous, Mimic, Octopus, or the latest?
>
> any idea?
>
> samuel
>
>
>
> huxiaoyu(a)horebdata.cn
>
> From: EDH - Manuel Rios
> Date: 2020-11-12 14:27
> To: Janek Bevendorff; Rafael Lopez
> CC: Robin H. Johnson; ceph-users
> Subject: [ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk
> This same error caused us to wipe a full cluster of 300 TB... it seems to be related to some RADOS index/database bug, not to S3.
>
> As Janek explained, this is a major issue, because the error happens silently and you can only detect it through S3, when you go to delete/purge an S3 bucket and get NoSuchKey. The error is not related to S3 logic.
>
> I hope this time the devs can take enough time to find and resolve the issue. The error happens with low EC profiles, and even with 3x replication in some cases.
>
> Regards
>
>
>
> -----Original Message-----
> From: Janek Bevendorff <janek.bevendorff(a)uni-weimar.de>
> Sent: Thursday, 12 November 2020 14:06
> To: Rafael Lopez <rafael.lopez(a)monash.edu>
> CC: Robin H. Johnson <robbat2(a)gentoo.org>; ceph-users <ceph-users(a)ceph.io>
> Subject: [ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk
>
> Here is a bug report concerning (probably) this exact issue:
> https://tracker.ceph.com/issues/47866
>
> I left a comment describing the situation and my (limited) experiences with it.
>
>
> On 11/11/2020 10:04, Janek Bevendorff wrote:
>>
>> Yeah, that seems to be it. There are 239 objects prefixed
>> .8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh in my dump. However, there are none
>> of the multiparts from the other file to be found and the head object
>> is 0 bytes.
>>
>> I checked another multipart object with an end pointer of 11.
>> Surprisingly, it had way more than 11 parts (39 to be precise), named
>> .1, .1_1, .1_2, .1_3, etc. I'm not sure how Ceph identifies those, but I
>> could find them in the dump at least.
>>
>> I have no idea why the objects disappeared. I ran a Spark job over all
>> buckets, read 1 byte of every object and recorded errors. Of the 78
>> buckets, two are missing objects. One bucket is missing one object,
>> the other 15. So, luckily, the incidence is still quite low, but the
>> problem seems to be expanding slowly.
>>
>>
>> On 10/11/2020 23:46, Rafael Lopez wrote:
>>> Hi Janek,
>>>
>>> What you said sounds right - an S3 single part obj won't have an S3
>>> multipart string as part of the prefix. S3 multipart string looks
>>> like "2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme".
>>>
>>> From memory, single part S3 objects that don't fit in a single rados
>>> object are assigned a random prefix that has nothing to do with
>>> the object name, and the rados tail/data objects (not the head
>>> object) have that prefix.
>>> As per your working example, the prefix for that would be
>>> '.8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh'. So there would be (239) "shadow"
>>> objects with names containing that prefix, and if you add up the
>>> sizes it should be the size of your S3 object.
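A hedged sketch of that size check (the pool name default.rgw.buckets.data and the bucket/object names are illustrative; radosgw-admin object stat prints the manifest, which contains the tail prefix):
$ radosgw-admin object stat --bucket=mybucket --object=path/to/object.warc.gz
$ rados -p default.rgw.buckets.data ls | grep 8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh | \
    while read obj; do rados -p default.rgw.buckets.data stat "$obj"; done   # sum up the reported sizes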
>>>
>>> You should look at working and non working examples of both single
>>> and multipart S3 objects, as they are probably all a bit different
>>> when you look in rados.
>>>
>>> I agree it is a serious issue, because once objects are no longer in
>>> rados, they cannot be recovered. If it were a case of a broken
>>> link or renamed rados objects, then we could work to
>>> recover...but as far as I can tell, it looks like stuff is just
>>> vanishing from rados. The only explanation I can think of is some
>>> (rgw or rados) background process is incorrectly doing something with
>>> these objects (eg. renaming/deleting). I had thought perhaps it was a
>>> bug with the rgw garbage collector..but that is pure speculation.
>>>
>>> Once you can articulate the problem, I'd recommend logging a bug
>>> tracker upstream.
>>>
>>>
>>> On Wed, 11 Nov 2020 at 06:33, Janek Bevendorff
>>> <janek.bevendorff(a)uni-weimar.de> wrote:
>>>
>>> Here's something else I noticed: when I stat objects that work
>>> via radosgw-admin, the stat info contains a "begin_iter" JSON
>>> object with RADOS key info like this
>>>
>>>
>>> "key": {
>>> "name":
>>> "29/items/WIDE-20110924034843-crawl420/WIDE-20110924065228-02544.warc.gz",
>>> "instance": "",
>>> "ns": ""
>>> }
>>>
>>>
>>> and then "end_iter" with key info like this:
>>>
>>>
>>> "key": {
>>> "name":
>>> ".8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh_239",
>>> "instance": "",
>>> "ns": "shadow"
>>> }
>>>
>>> However, when I check the broken 0-byte object, the "begin_iter"
>>> and "end_iter" keys look like this:
>>>
>>>
>>> "key": {
>>> "name":
>>> "29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.1",
>>> "instance": "",
>>> "ns": "multipart"
>>> }
>>>
>>> [...]
>>>
>>>
>>> "key": {
>>> "name":
>>> "29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.19",
>>> "instance": "",
>>> "ns": "multipart"
>>> }
>>>
>>> So, it's the full name plus a suffix and the namespace is
>>> multipart, not shadow (or empty). This in itself may just be an
>>> artefact of whether the object was uploaded in one go or as a
>>> multipart object, but the second difference is that I cannot find
>>> any of the multipart objects in my pool's object name dump. I
>>> can, however, find the shadow RADOS object of the intact S3 object.
>>>
>>>
>>>
>>>
>>> --
>>> *Rafael Lopez*
>>> Devops Systems Engineer
>>> Monash University eResearch Centre
>>>
>>> T: +61 3 9905 9118
>>> E: rafael.lopez(a)monash.edu
>>>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
Hi Community!!!
Are we logging IRC channels? I ask this because a lot of people only use
Slack, and the Slack we use doesn't have a subscription, so messages are
lost after 90 days (I believe).
I believe it's important to keep track of the technical knowledge we see
each day over IRC and Slack.
Cheers!
--
Alvaro Soto
*Note: My work hours may not be your work hours. Please do not feel the
need to respond during a time that is not convenient for you.*
----------------------------------------------------------
Great people talk about ideas,
ordinary people talk about things,
small people talk... about other people.
Hello.
We're rebuilding our OSD nodes.
Our previous cluster worked without any issues; this one is being stubborn.
I attempted to add one node back to the cluster and am seeing the error below
in our logs:
cephadm ['--image',
'registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160', 'pull']
2024-03-27 19:30:53,901 7f49792ed740 DEBUG /bin/podman: 4.6.1
2024-03-27 19:30:53,905 7f49792ed740 INFO Pulling container image
registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,045 7f49792ed740 DEBUG /bin/podman: Trying to pull
registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,266 7f49792ed740 DEBUG /bin/podman: Error:
initializing source
docker://registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160: reading
manifest 16.2.10-160 in registry.redhat.io/rhceph/rhceph-5-rhel8:
manifest unknown
2024-03-27 19:30:54,270 7f49792ed740 INFO Non-zero exit code 125 from
/bin/podman pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160
2024-03-27 19:30:54,270 7f49792ed740 INFO /bin/podman: stderr Trying
to pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160...
2024-03-27 19:30:54,270 7f49792ed740 INFO /bin/podman: stderr Error:
initializing source
docker://registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160: reading
manifest 16.2.10-160 in registry.redhat.io/rhceph/rhceph-5-rhel8:
manifest unknown
2024-03-27 19:30:54,270 7f49792ed740 ERROR ERROR: Failed command:
/bin/podman pull registry.redhat.io/rhceph/rhceph-5-rhel8:16.2.10-160
$ ceph versions
{
"mon": {
"ceph version 16.2.10-208.el8cp
(791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 1,
"ceph version 16.2.10-248.el8cp
(0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 2
},
"mgr": {
"ceph version 16.2.10-208.el8cp
(791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 1,
"ceph version 16.2.10-248.el8cp
(0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 2
},
"osd": {
"ceph version 16.2.10-160.el8cp
(6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160
},
"mds": {},
"rgw": {
"ceph version 16.2.10-208.el8cp
(791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 3
},
"overall": {
"ceph version 16.2.10-160.el8cp
(6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160,
"ceph version 16.2.10-208.el8cp
(791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 5,
"ceph version 16.2.10-248.el8cp
(0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 4
}
}
I don't understand why it's trying to pull 16.2.10-160, which doesn't exist in the registry. The container images available locally are:
registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8   5       93b3137e7a65   11 months ago   696 MB
registry.redhat.io/rhceph/rhceph-5-rhel8             5-416   838cea16e15c   11 months ago   1.02 GB
registry.redhat.io/openshift4/ose-prometheus         v4.6    ec2d358ca73c   17 months ago   397 MB
This happens using cephadm-ansible as well as with:
$ ceph orch ls --export --service_name xxx > xxx.yml
$ sudo ceph orch apply -i xxx.yml
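For reference, a hedged sketch of where the stale image reference might be pinned and one way to repoint it (the config option names are standard cephadm ones; the 5-416 tag is simply the one visible locally above):
$ ceph config dump | grep container_image     # look for a global or per-daemon pin on 16.2.10-160
$ ceph orch upgrade status                    # a stalled upgrade can also hold on to an old image
$ ceph config set global container_image registry.redhat.io/rhceph/rhceph-5-rhel8:5-416   # illustrative tag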
I tried
$ ceph orch daemon add osd host:/dev/sda
which surprisingly created a volume on host:/dev/sda and created an
OSD I can see in
$ ceph osd tree
but it did not get added to the host, I suspect because of the same Podman
error, and now I'm unable to remove it.
$ ceph orch osd rm
does not work, even with the --force flag.
After 10+ minutes I stopped the removal with
$ ceph orch osd rm stop
I'm considering running
$ ceph osd purge osd# --force
but I'm worried it may only make things worse.
ceph -s shows that OSD, but it is neither up nor in.
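For completeness, the commands I have been looking at before resorting to a purge (the OSD id 12 is only illustrative):
$ ceph orch osd rm status                    # any removals still queued by the orchestrator
$ ceph osd safe-to-destroy 12                # reports whether the OSD can be removed without data loss
$ ceph osd purge 12 --yes-i-really-mean-it   # last resort: removes the OSD from the CRUSH map and osdmap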
Thanks, and looking forward to any advice!
Hello Ceph Gurus!
I'm running Ceph Pacific.
If I run
ceph orch host ls --label osds
it shows all hosts with the label osds, and
ceph orch host ls --host-pattern host1
shows just host1.
Each works as expected on its own.
But when combining the two, the label flag seems to take over:
ceph orch host ls --label osds --host-pattern host1
6 hosts in cluster who had label osds whose hostname matched host1
This shows all hosts with the label osds instead of only host1.
So at first the flags seem to act like an OR instead of an AND.
ceph orch host ls --label osds --host-pattern foo
6 hosts in cluster who had label osds whose hostname matched foo
even though no host named "foo" exists.
ceph orch host ls --label bar --host-pattern host1
0 hosts in cluster who had label bar whose hostname matched host1
If the label and host combination were an OR, this should have returned host1:
there is no label bar, but host1 exists. So the command seems to simply disregard the host-pattern.
This started because the OSD deployment task had both a label and a host_pattern.
The cluster was attempting to deploy OSDs on all the servers with the
given label instead of the one host we needed,
which caused it to go into a warning state.
If I ran
ceph orch ls --export --service_name host1
it also showed both the label and the host_pattern:
unmanaged: false
placement:
  host_pattern:
  label:
The issue persisted until I removed the label tag.
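For anyone hitting the same thing, a hedged sketch of an OSD service spec with only a single placement criterion (the service name, pattern, and device selection are illustrative):
$ cat > osd_spec.yml <<'EOF'
service_type: osd
service_id: default_osds
placement:
  host_pattern: 'host1'   # keep either host_pattern or label, not both
spec:
  data_devices:
    all: true
EOF
$ ceph orch apply -i osd_spec.yml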
Thanks.
Hello,
I have a question regarding the default data pool of a CephFS filesystem.
According to the docs, it is recommended to use a fast, SSD-backed replicated
pool as the default data pool for CephFS. My question is: what are the space
requirements for storing the inode backtrace information?
Let's say I have an 85 TiB replicated SSD pool (hot data) and a 3 PiB EC
data pool (cold data).
Does it make sense to create a third pool as the default pool which only
holds the inode backtrace information (and if so, what would be a good size), or is
it OK to use the SSD pool as the default pool?
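For context, a sketch of the layout I have in mind; the pool names are illustrative, the pools are assumed to already exist, and the EC pool is attached as an additional data pool and assigned to directories via a layout attribute:
ceph fs new cephfs cephfs_metadata cephfs_default      # cephfs_default = small replicated default pool (backtraces)
ceph fs add_data_pool cephfs cephfs_ssd                # 85 TiB replicated SSD pool for hot data
ceph fs add_data_pool cephfs cephfs_ec_data            # 3 PiB EC pool for cold data
setfattr -n ceph.dir.layout.pool -v cephfs_ssd /mnt/cephfs/hot
setfattr -n ceph.dir.layout.pool -v cephfs_ec_data /mnt/cephfs/cold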
Thanks
Dietmar
I am dealing with a cluster that has terrible performance for partial reads from an erasure-coded pool. Warp and s3bench tests show acceptable performance, but when the application hits the data, performance plummets. Can anyone clear this up for me: when radosgw gets a partial (ranged) read, does it have to assemble all the RADOS objects that make up the S3 object before returning the range? With a replicated pool I am seeing 6 to 7 GiB/s of read throughput, but only 1 GiB/s from the erasure-coded pool, which leads me to believe that the replicated pool is returning just the RADOS objects needed for the partial S3 read while the erasure-coded pool is not.
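For reference, a sketch of a single ranged read that can be timed against buckets on both pools (awscli against the RGW endpoint; the bucket, key, endpoint, and range values are illustrative):
$ time aws --endpoint-url http://rgw.example.com:8080 s3api get-object \
      --bucket testbucket --key bigobject.bin \
      --range bytes=0-1048575 /tmp/part.bin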
Hi,
CephFS is provided as a shared file system service in the private cloud environment of our company, LINE. There are more than 5,000 client sessions, and session evictions occur several times a day. When a session eviction occurs, the message 'Cannot send after transport endpoint shutdown' or 'Permission denied' is displayed and file system access is no longer possible. Our users are very uncomfortable with this issue. In particular, there are no obvious problems such as network connectivity or CPU usage; when I access the machine and take a close look, nothing stands out. After an eviction, users have the inconvenience of having to umount/mount and run the application again. In a Kubernetes environment, recovery is a bit more complicated, which causes a lot of frustration.
I tested that, by setting the mds_session_blocklist_on_timeout and mds_session_blocklist_on_evict options to false and setting client_reconnect_stale to true on the client side, the file system can still be accessed even if an eviction occurs. There seemed to be no major problem accessing the file system, as the session remained attached.
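For clarity, a sketch of how those settings can be applied (using the centralized config is an assumption; they can equally be set in ceph.conf on the MDS and client sides):
$ ceph config set mds mds_session_blocklist_on_timeout false
$ ceph config set mds mds_session_blocklist_on_evict false
$ ceph config set client client_reconnect_stale true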
What I'm curious about is: if I turn on the above options, will there be any other side effects? For example, should I take some action if file integrity is broken or if there is an issue on the MDS side? I'm asking because there are no details regarding this in the official CephFS documentation.
Thank you
Yongseok
Hi,
ceph fs subvolume getpath cephfs cluster_A_subvolume cephfs_data_pool_ec21_subvolumegroup
/volumes/cephfs_data_pool_ec21_subvolumegroup/cluster_A_subvolume/0f90806d-0d70-4fe1-9e2b-f958056ef0c9
If the subvolume gets deleted, is it possible to recreate it with the same absolute path, so that YAML specs that use the volume path do not need to change?
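For reference, the recreate-and-check sequence in question would look like this (names taken from above; whether the trailing UUID component comes back the same is exactly the open question):
ceph fs subvolume create cephfs cluster_A_subvolume --group_name cephfs_data_pool_ec21_subvolumegroup
ceph fs subvolume getpath cephfs cluster_A_subvolume cephfs_data_pool_ec21_subvolumegroup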
Thank you,
Anantha