Hi,
Is there any configuration in Ceph that blocks or prevents space reclaim?
I tested on one pool which has only one image, 1.8 TiB used.
rbd $p du im/root
warning: fast-diff map is not enabled for root. operation may be slow.
NAME PROVISIONED USED
root 2.2 TiB 1.8 TiB
I already removed all snapshots, and now the pool has only this one image.
I ran both fstrim over the filesystem (XFS) and rbd sparsify im/root (I don't know exactly what it does, but its description mentions reclaiming space).
It still shows the pool using 6.9 TiB, which makes no sense, right? It should be at most 3.6 TiB (1.8 * 2) according to its replica count.
POOLS:
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL QUOTA OBJECTS QUOTA BYTES DIRTY USED COMPR UNDER COMPR
im 19 32 3.5 TiB 918.34k 6.9 TiB 4.80 69 TiB N/A 10 TiB 918.34k 0 B 0 B
I think some of our other pools have this issue too; we cleaned up a lot, but the space does not seem to be reclaimed.
I estimate more than 50 TiB should be reclaimable; the actual usage of this cluster is much less than the currently reported number.
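For reference, here is roughly what I have already checked besides fstrim and sparsify (pool and image names are from my setup; the trash check is just a guess on my part that something deleted might still be held there):
```
# anything lingering in the RBD trash for this pool?
rbd trash ls im

# list all images and their snapshots still in the pool
rbd ls -l im
rbd snap ls im/root

# per-pool stored vs. used, to compare with the numbers above
ceph df detail
```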
Thank you for your help.
Hi,
One of our 16.2.14 cluster OSDs crashed again because of the dreaded
https://tracker.ceph.com/issues/53906 bug. Usually an OSD, which crashed
because of this bug, restarts within seconds and continues normal
operation. This time it failed to restart and kept crashing:
"assert_condition": "abort",
"assert_file":
"/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.14/rpm/el8/BUILD/ceph-16.2.14/src/blk/kernel/KernelDevice.cc",
"assert_func": "void KernelDevice::_aio_thread()",
"assert_line": 604,
"assert_msg":
"/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.14/rpm/el8/BUILD/ceph-16.2.14/src/blk/kernel/KernelDevice.cc:
In function 'void KernelDevice::_aio_thread()' thread 7f08520e2700 time
2023-12-03T04:00:36.689614+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.14/rpm/el8/BUILD/ceph-16.2.14/src/blk/kernel/KernelDevice.cc:
604: ceph_abort_msg(\"Unexpected IO error. This may suggest HW issue.
Please check your dmesg!\")\n",
"assert_thread_name": "bstore_aio",
"backtrace": [
"/lib64/libpthread.so.0(+0x12cf0) [0x7f085e308cf0]",
"gsignal()",
"abort()",
"(ceph::__ceph_abort(char const*, int, char const*,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&)+0x1b6) [0x55f01d9494cb]",
"(KernelDevice::_aio_thread()+0x1285) [0x55f01e4b5c15]",
"(KernelDevice::AioCompletionThread::entry()+0x11)
[0x55f01e4c0ee1]",
"/lib64/libpthread.so.0(+0x81ca) [0x7f085e2fe1ca]",
"clone()"
],
There was nothing in dmesg though, and the block device looked healthy. I
took the OSD down, ran a long SMART test on its block drive, ran a read
test on the drive, and found no issues. I tried restarting the OSD again and
found in its debug log that it failed because of an
"2023-12-03T04:00:36.686+0000 7f08520e2700 -1 bdev(0x55f02a28a400
/var/lib/ceph/osd/ceph-56/block) _aio_thread got r=-1 ((1) Operation not
permitted)" error: https://pastebin.com/gDat6rfk
I remember hitting this previously:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/GYL72G3F4P…,
and this time a host reboot completely resolved the issue.
It would be good to understand what has triggered this condition and how it
can be resolved without rebooting the whole host. I would very much
appreciate any suggestions.
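For completeness, these are roughly the checks I ran before resorting to the reboot (the device path is an example; osd.56 is the OSD from the log above, and the restart commands assume either a plain systemd or a cephadm deployment):
```
# kernel log around the crash time (was clean in my case)
dmesg -T | grep -iE 'error|i/o|sd[a-z]'

# long SMART self-test plus a full read pass of the OSD's block device
smartctl -t long /dev/sdX
smartctl -a /dev/sdX
dd if=/dev/sdX of=/dev/null bs=1M status=progress

# restart only the OSD daemon instead of the whole host
systemctl restart ceph-osd@56          # plain systemd deployment
ceph orch daemon restart osd.56        # cephadm-managed deployment
```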
Best regards,
Zakhar
Dear Ceph users,
Our CephFS is not releasing/freeing up space after deleting hundreds of
terabytes of data.
By now, this has driven us into a "nearfull" OSD/pool situation and is thus
throttling IO.
We are on ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
quincy (stable).
Recently, we moved a bunch of data to a new pool with a better EC profile.
This was done by adding a new EC pool to the FS,
then assigning the FS root to the new EC pool via the directory layout xattr
(so all new data is written to the new pool),
and finally copying the old data to new folders.
I swapped the data as follows to retain the old directory structure.
I also made snapshots for validation purposes.
So basically:
cp -r mymount/mydata/ mymount/new/ # this creates copy on new pool
mkdir mymount/mydata/.snap/tovalidate
mkdir mymount/new/mydata/.snap/tovalidate
mv mymount/mydata/ mymount/old/
mv mymount/new/mydata mymount/
I could see the increase of data in the new pool as expected (ceph df).
I compared the snapshots with hashdeep to make sure the new data is alright.
Then I went ahead deleting the old data, basically:
rmdir mymount/old/mydata/.snap/* # this also included a bunch of other
older snapshots
rm -r mymount/old/mydata
At first we had a bunch of PGs in snaptrim/snaptrim_wait,
but they have been done for quite some time now.
Now, two weeks later, the size of the old pool still hasn't
really decreased.
I'm still waiting for around 500 TB to be released (and much more is
planned).
I honestly have no clue where to go from here.
From my point of view (i.e. the CephFS mount), the data is gone.
I also never hard/soft-linked it anywhere.
This doesn't seem to be a common issue;
at least I couldn't find anything related or resolved in the docs or
on the user list yet.
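For what it's worth, this is what I have been looking at so far; the MDS name is a placeholder, and the stray-counter check is only my guess that the deleted files might still be held as strays by the MDS:
```
# per-pool usage and object counts for the old and new pool
ceph df detail

# any snapshots left anywhere would keep the old objects referenced
ceph fs status
ls mymount/old/.snap mymount/.snap

# are the deleted files stuck in the MDS stray directories?
ceph tell mds.<mds-name> perf dump | grep -i stray
```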
If anybody has an idea how to resolve this, I would highly appreciate it.
Best Wishes,
Mathias
Hi all,
For about a week our CephFS has experienced issues with its MDS.
Currently the MDS is stuck in "up:rejoin".
Issues became apparent when simple commands like "mv foo bar/" hung.
I unmounted CephFS on the clients, evicted those remaining, and then issued
ceph config set mds.0 mds_wipe_sessions true
ceph config set mds.1 mds_wipe_sessions true
which allowed me to delete the hung requests.
I've lost the exact commands I used, but something like
rados -p cephfs_metadata ls | grep mds
rados rm -p cephfs_metadata mds0_openfiles.0
etc
This allowed the MDS to get to "up:rejoin", where it has been stuck ever since, which is now getting on five days.
# ceph mds stat
cephfs:1/1 {0=cephfs.ceph00.uvlkrw=up:rejoin} 2 up:standby
root@ceph00:/var/log/ceph/a614303a-5eb5-11ed-b492-011f01e12c9a# ceph -s
cluster:
id: a614303a-5eb5-11ed-b492-011f01e12c9a
health: HEALTH_WARN
1 filesystem is degraded
1 pgs not deep-scrubbed in time
2 pool(s) do not have an application enabled
1 daemons have recently crashed
services:
mon: 3 daemons, quorum ceph00,ceph01,ceph02 (age 57m)
mgr: ceph01.lvdgyr(active, since 2h), standbys: ceph00.gpwpgs
mds: 1/1 daemons up, 2 standby
osd: 91 osds: 90 up (since 78m), 90 in (since 112m)
data:
volumes: 0/1 healthy, 1 recovering
pools: 5 pools, 1539 pgs
objects: 138.83M objects, 485 TiB
usage: 971 TiB used, 348 TiB / 1.3 PiB avail
pgs: 1527 active+clean
12 active+clean+scrubbing+deep
io:
client: 3.1 MiB/s rd, 3.16k op/s rd, 0 op/s wr
# ceph --version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
I've tried failing the MDS so it switches, and rebooted a couple of times.
I've added more OSDs to the metadata pool and took one out, as I thought it might be a bad metadata OSD (the "recently crashed" daemon).
The error logs are full of
(prefix to all are:
Nov 27 14:02:44 ceph00 bash[2145]: debug 2023-11-27T14:02:44.619+0000 7f74e845e700 1 -- [v2:192.168.1.128:6800/2157301677,v1:192.168.1.128:6801/2157301677] --> [v2:192.168.1.133:6896/4289132926,v1:192.168.1.133:6897/4289132926]
)
crc :-1 s=READY pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).send_message enqueueing message m=0x559be00adc00 type=42 osd_op(mds.0.36244:8142873 3.ff 3:ff5b34d6:::1.00000000:head [getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8
crc :-1 s=READY pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).write_message sending message m=0x559be00adc00 seq=8142643 osd_op(mds.0.36244:8142873 3.ff 3:ff5b34d6:::1.00000000:head [getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8
crc :-1 s=THROTTLE_DONE pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_message got 154 + 0 + 30 byte message. envelope type=43 src osd.89 off 0
crc :-1 s=READ_MESSAGE_COMPLETE pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_message received message m=0x559be01f4480 seq=8142643 from=osd.89 type=43 osd_op_reply(8142873 1.00000000 [getxattr (30) out=30b] v0'0 uv560123 ondisk = 0) v8
osd_op_reply(8142873 1.00000000 [getxattr (30) out=30b] v0'0 uv560123 ondisk = 0) v8 ==== 154+0+30 (crc 0 0 0) 0x559be01f4480 con 0x559be00ad800
osd_op(unknown.0.36244:8142874 3.ff 3:ff5b34d6:::1.00000000:head [getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8 -- 0x559be2caec00 con 0x559be00ad800
These repeat multiple times a second (and are filling /var).
Prior to taking one of the cephfs_metadata OSDs offline, they came from communication between ceph00 and the node hosting the suspected bad OSD.
Now they are between ceph00 and the host of the replacement metadata OSD.
Does anyone have any suggestion on how to get the MDS to switch from "up:rejoin" to "up:active"?
Is there any way to debug this, to determine what the issue really is? I'm unable to interpret the debug log.
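In case it helps, this is how I have been poking at it so far (the MDS name is the one from "ceph mds stat" above; the debug levels are just values I picked, not a recommendation):
```
# overall FS state and what the active MDS reports about itself
ceph fs status
ceph tell mds.cephfs.ceph00.uvlkrw status

# temporarily raise MDS logging to see what rejoin is waiting on
ceph config set mds debug_mds 10
ceph config set mds debug_ms 1
# ...once captured, revert to the defaults
ceph config rm mds debug_mds
ceph config rm mds debug_ms
```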
Cheers,
Eric
________________________________________________________
Dr Eric Tittley
Research Computing Officer www.roe.ac.uk/~ert<http://www.roe.ac.uk/~ert>
Institute for Astronomy Royal Observatory, Edinburgh
Hi Zitat,
I'm confused - doesn't k4 m2 mean that you can lose any 2 out of the 6
OSDs?
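A quick sanity check I know of (the profile and pool names below are placeholders):
```
# k, m and crush-failure-domain for the EC profile
ceph osd erasure-code-profile get <profile-name>

# size should be k+m; min_size decides how many shards the pool
# can lose and still accept IO
ceph osd pool get <pool-name> size
ceph osd pool get <pool-name> min_size
```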
Cheers
Dulux-Oz
On 05/12/2023 20:02, ceph-users-request(a)ceph.io wrote:
> Today's Topics:
>
> 1. Re: EC Profiles & DR (David Rivera)
> 2. Re: EC Profiles & DR (duluxoz)
> 3. Re: EC Profiles & DR (Eugen Block)
>
Hi, experts,
We are using CephFS on 16.2.* with multiple active MDS daemons, and recently we have two nodes mounted with ceph-fuse due to their old OS.
One node runs a Python script with `glob.glob(path)`, and another client runs a `cp` operation on the same path.
Then we see some logs about `mds slow request`, and the logs complain "failed to authpin, subtree is being exported".
Then we need to restart the MDS.
Our question is: is there some kind of deadlock? How can we avoid this, and how can we fix it without restarting the MDS (which would affect other users)?
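One workaround we are considering (not sure it is the right fix) is pinning the busy directory tree to a single MDS rank, so its subtree stops being exported between the active MDS daemons; the mount point and path below are only examples:
```
# pin the contended directory (and everything under it) to MDS rank 0
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/shared/dataset

# a value of -1 removes the pin again
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/shared/dataset
```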
Thanks a ton!
xz
Hi,
I have an image with a snapshot and some changes after the snapshot.
```
$ rbd du backup/f0408e1e-06b6-437b-a2b5-70e3751d0a26
NAME PROVISIONED USED
f0408e1e-06b6-437b-a2b5-70e3751d0a26@snapshot-eb085877-7557-4620-9c01-c5587b857029 10 GiB 2.4 GiB
f0408e1e-06b6-437b-a2b5-70e3751d0a26 10 GiB 2.4 GiB
<TOTAL> 10 GiB 4.8 GiB
```
If there are no changes after the snapshot, the image line shows 0 used.
I did an export and import.
```
$ rbd export --export-format 2 backup/f0408e1e-06b6-437b-a2b5-70e3751d0a26 - | rbd import --export-format 2 - backup/test
Exporting image: 100% complete...done.
Importing image: 100% complete...done.
```
When checking the imported image, the image line shows 0 used.
```
$ rbd du backup/test
NAME PROVISIONED USED
test@snapshot-eb085877-7557-4620-9c01-c5587b857029 10 GiB 2.4 GiB
test 10 GiB 0 B
<TOTAL> 10 GiB 2.4 GiB
```
Any clues how that happened? I'd expect the same du as the source.
I tried another quick test. It works fine.
```
$ rbd create backup/test-src --size 10G
$ sudo rbd map backup/test-src
/dev/rbd0
$ echo "hello" | sudo tee /dev/rbd0
hello
$ rbd du backup/test-src
NAME PROVISIONED USED
test-src 10 GiB 4 MiB
$ rbd snap create backup/test-src@snap-1
Creating snap: 100% complete...done.
$ rbd du backup/test-src
NAME PROVISIONED USED
test-src@snap-1 10 GiB 4 MiB
test-src 10 GiB 0 B
<TOTAL> 10 GiB 4 MiB
$ echo "world" | sudo tee /dev/rbd0
world
$ rbd du backup/test-src
NAME PROVISIONED USED
test-src@snap-1 10 GiB 4 MiB
test-src 10 GiB 4 MiB
<TOTAL> 10 GiB 8 MiB
$ rbd export --export-format 2 backup/test-src - | rbd import --export-format 2 - backup/test-dst
Exporting image: 100% complete...done.
Importing image: 100% complete...done.
$ rbd du backup/test-dst
NAME PROVISIONED USED
test-dst@snap-1 10 GiB 4 MiB
test-dst 10 GiB 4 MiB
<TOTAL> 10 GiB 8 MiB
```
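What I plan to compare next, in case the 0 B is only the fast-diff estimate rather than the real allocation (the flags are my understanding of the rbd man page, so please correct me if they are wrong):
```
# exact allocation instead of the fast-diff/object-map estimate
rbd du --exact backup/test
rbd du --exact backup/f0408e1e-06b6-437b-a2b5-70e3751d0a26

# extents that actually differ between the snapshot and the image head
rbd diff --from-snap snapshot-eb085877-7557-4620-9c01-c5587b857029 backup/test
```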
Thanks!
Tony
Hello Users,
We're using libvirt with KVM, and the orchestrator is CloudStack. I already raised
the issue with CloudStack at
https://github.com/apache/cloudstack/issues/8211, but it appears to be in
libvirtd. I did the same on the libvirt ML at
https://lists.libvirt.org/archives/list/users@lists.libvirt.org/thread/SA2I…
but I'm now here looking for answers.
Below is our environment & issue description:
Ceph: v17.2.0
Pool: replicated
Number of block images in this pool: more than 1250
# virsh pool-info c15508c7-5c2c-317f-aa2e-29f307771415
Name: c15508c7-5c2c-317f-aa2e-29f307771415
UUID: c15508c7-5c2c-317f-aa2e-29f307771415
State: running
Persistent: no
Autostart: no
Capacity: 1.25 PiB
Allocation: 489.52 TiB
Available: 787.36 TiB
# kvm --version
QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.27)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers
# libvirtd --version
libvirtd (libvirt) 6.0.0
It appears that one of our CloudStack KVM clusters with 8 hosts is having
the issue. We run HCI on these 8 hosts and there are around 700+ VMs
running. But strangely enough, there are logs like the ones below on the hosts.
Oct 25 13:38:11 hv-01 libvirtd[9464]: failed to open the RBD image
'087bb114-448a-41d2-9f5d-6865b62eed15': No such file or directory
Oct 25 20:35:22 hv-01 libvirtd[9464]: failed to open the RBD image
'ccc1168a-5ffa-4b6d-a953-8e0ac788ebc5': No such file or directory
Oct 26 09:48:33 hv-01 libvirtd[9464]: failed to open the RBD image
'a3fe82f8-afc9-4604-b55e-91b676514a18': No such file or directory
We've got DNS servers with an `A` record resolving to all the
IPv4 addresses of the 5 monitors, and there have not been any issues with
DNS resolution. But the "failed to open the RBD image
'ccc1168a-5ffa-4b6d-a953-8e0ac788ebc5': No such file or directory" issue gets
weirder, because the VM making use of that RBD image, let's say
"087bb114-448a-41d2-9f5d-6865b62eed15", is running on an altogether
different host, such as "hv-06". On further inspection, that specific virtual
machine has been running on "hv-06" for more than 4 months or
so. Fortunately, the virtual machine has no issues and has been running
since then. There are absolutely no issues with any of the virtual machines
because of these warnings.
On the libvirt mailing list, one of the community members helped me
understand that libvirt only tries to get the info of the images and
doesn't open them for reading or writing. All hosts running libvirtd
do the same. We manually ran "virsh pool-refresh", which CloudStack
itself takes care of at regular intervals, and the warning messages still
appear. Please help me find the cause, and let me know if further
information is needed.
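On the Ceph side, this is what I am checking to see whether those images were deleted or moved to the trash while libvirt's pool cache still lists them (the pool name is a placeholder for our CloudStack primary-storage pool):
```
# does the image libvirt complains about still exist?
rbd info <pool>/ccc1168a-5ffa-4b6d-a953-8e0ac788ebc5

# or was it deleted and parked in the RBD trash?
rbd trash ls <pool>

# what libvirt itself currently has cached for the storage pool
virsh vol-list c15508c7-5c2c-317f-aa2e-29f307771415
```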
Thanks,
Jayanth Reddy
Hey All,
So I got busy and failed to get an email out with a couple of days'
notice for last week, so let's meet up this week! We will be having a
Ceph science/research/big cluster call on Wednesday, December 6th. If
anyone wants to discuss something specific, they can add it to the pad
linked below. If you have questions or comments, you can contact me.
This is an informal open call of community members, mostly from
HPC/HTC/research/big cluster environments (though anyone is welcome),
where we discuss whatever is on our minds regarding Ceph. Updates,
outages, features, maintenance, etc. There is no set presenter, but I do
attempt to keep the conversation lively.
NOTE: We have switched to Jitsi for the meeting. We are no longer using
the BlueJeans meeting links. The Ceph calendar event does not yet
reflect this and has the wrong day as well.
Pad URL:
https://pad.ceph.com/p/Ceph_Science_User_Group_20231206
Virtual event details:
December 6th, 2023
15:00 UTC
4pm Central European
9am Central US
Description: Main pad for discussions:
https://pad.ceph.com/p/Ceph_Science_User_Group_Index
Meetings will be recorded and posted to the Ceph Youtube channel.
To join the meeting on a computer or mobile phone:
https://meet.jit.si/ceph-science-wg
Kevin
--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS/TROPICS
Space Science & Engineering Center
University of Wisconsin-Madison