Hi!
I am trying to read only part of an object by passing a non-trivial offset and length to the read function `librados::IoCtxImpl::read(const object_t& oid, bufferlist& bl, size_t len, uint64_t off)` from `IoCtxImpl.cc`.
However, after connecting to an erasure-coded pool (e.g., 12+4), I try to read only the portion of the object stored on a randomly chosen OSD (i.e., 1/12 of the object), but the output of `vmstat -d` and `iostat` shows that the entire object was read: read operations appeared on all 12 data OSDs.
So I wonder whether librados supports true sub-object reads, and what I should do if I want to implement this.
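For context on why a small read can touch every OSD: in an EC pool the object is striped across the k data shards in stripe_unit-sized pieces, and (as far as I know) the primary fetches the full stripe(s) covering the requested range from all k shards in order to decode, even when the requested bytes live on a single shard. A sketch of the offset-to-shard arithmetic, assuming plain round-robin striping and a hypothetical 4 KiB stripe unit:

```python
def shard_for_offset(off, k, stripe_unit):
    """Map a logical byte offset in an EC object to (data shard index,
    offset within that shard), assuming plain round-robin striping.
    Note: knowing the shard does not make the read local -- decoding a
    stripe still involves the other shards."""
    stripe_width = k * stripe_unit
    stripe_no = off // stripe_width           # which full stripe
    within = off % stripe_width               # offset inside that stripe
    shard = within // stripe_unit             # 0 .. k-1
    shard_off = stripe_no * stripe_unit + within % stripe_unit
    return shard, shard_off

# e.g. with k=12 and a hypothetical 4 KiB stripe unit:
print(shard_for_offset(0, 12, 4096))          # (0, 0)
print(shard_for_offset(4096, 12, 4096))       # (1, 0)
print(shard_for_offset(12 * 4096, 12, 4096))  # (0, 4096)
```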
Thanks!
Hello list,
we have found that the active mgr process in our 3-node Ceph cluster
consumes a lot of memory. After startup, memory usage increases
steadily; after 6 days the process takes ~67GB:
~# ps -p 7371 -o rss,%mem,cmd
RSS %MEM CMD
71053880 26.9 /usr/bin/ceph-mgr -n mgr.hostname.nvwzhc -f --setuser ceph
--setgroup ceph --default-log-to-file=false
--default-log-to-journald=true --default-log-to-stderr=false
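(The RSS column from ps is reported in KiB, so the figure above does match the ~67 GB claim; a quick conversion sketch:)

```python
rss_kib = 71_053_880          # RSS column from `ps`, in KiB

rss_gib = rss_kib / 1024**2   # KiB -> GiB
print(f"{rss_gib:.1f} GiB")   # -> 67.8 GiB
```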
Cluster Specs:
- 3 nodes, 3-way replication
- each node has 256GB memory
- each node has 10x 7.68TB NVMe, each NVMe is split into 2 OSDs
- monitoring node is a separate VM (separate Hypervisor)
- iSCSI and NFS Gateway are located on 2 separate VMs (separate Hypervisor)
- main purpose is CephFS with currently ~15.42 million objects
Is this normal behaviour or might we have a misconfiguration somewhere?
What can we do to dig into this further?
~# ceph status
cluster:
id: f5129939-964b-11ed-bb6a-f7caa5af2f56
health: HEALTH_OK
services:
mon: 3 daemons, quorum host1,host2,host3 (age 6d)
mgr: host2.nvwzhc(active, since 6d), standbys: host1.wwczzn
mds: 1/1 daemons up, 1 standby, 1 hot standby
osd: 60 osds: 60 up (since 6d), 60 in (since 8w)
tcmu-runner: 2 portals active (2 hosts)
data:
volumes: 1/1 healthy
pools: 6 pools, 2161 pgs
objects: 15.42M objects, 45 TiB
usage: 135 TiB used, 75 TiB / 210 TiB avail
pgs: 2161 active+clean
Thanks and kind regards
Tobias Hachmer
Hello everyone,
Join us on May 24th at 17:00 UTC for a long overdue Ceph Tech Talk! This month, Yuval Lifshitz will give an RGW Lua Scripting Code Walkthrough.
https://ceph.io/en/community/tech-talks/
You can also see Yuval's previous presentation at Ceph Month 2021, From Open Source to Open Ended in Ceph with Lua.
https://www.youtube.com/watch?v=anQJugs27hE
If you want to give a technical presentation for Ceph Tech Talks, please contact me directly with a title and description. Thank you!
--
Mike Perez
Community Manager
Ceph Foundation
Hi,
I inherited a CephFS cluster. Even though I have years of experience in
systems management, I fail to fully grasp its logic.
From what I found on the web, the documentation is either too "high level"
or too detailed.
Do you know any good resources to get fully acquainted with Ceph
(specifically, CephFS)?
Or any live training I could attend?
Thanks in advance for your help,
Emmanuel
On 12/15/22 15:31, Stolte, Felix wrote:
> Hi Patrick,
>
> we used your script to repair the damaged objects on the weekend and it went smoothly. Thanks for your support.
>
> We adjusted your script to scan for damaged files on a daily basis; runtime is about 6 h. Until Thursday last week, we had exactly the same 17 files. On Thursday at 13:05 a snapshot was created, and our active MDS crashed once at that moment:
Are you willing to share this script? I would like to use it to scan our
CephFS before upgrading to 16.2.13. Do you run this script when the
filesystem is online / active?
Thanks,
Gr. Stefan
Hi,
can the CephFS "max_file_size" setting be changed at any point in the
lifetime of a filesystem?
Or is it harmful to existing data if it is changed after some time? Is
there anything to consider when changing it from, say, 1 TB (default)
to 4 TB?
We are running the latest Nautilus release, BTW.
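As I understand it, max_file_size is only a cap the MDS enforces on how large a file may grow; raising it does not rewrite or endanger existing data. The change itself is a single command (the filesystem name below is a placeholder, and the value is in bytes):

```
# Raise the CephFS file size cap to 4 TiB:
ceph fs set <fs_name> max_file_size 4398046511104
```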
Thanks in advance
Dietmar
We noticed extremely slow performance when remapping is necessary. We didn't do anything special other than assigning the correct device_class (ssd). Watching ceph status (with watch -n 1 -c ceph status), we see only around 17-25 objects recovering at a time.
How can we speed up recovery?
There is no client load yet, since we will only migrate to this cluster in the future; just an occasional rsync runs.
[ceph: root@pwsoel12998 /]# ceph status
cluster:
id: da3ca2e4-ee5b-11ed-8096-0050569e8c3b
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
services:
mon: 5 daemons, quorum pqsoel12997,pqsoel12996,pwsoel12994,pwsoel12998,prghygpl03 (age 3h)
mgr: pwsoel12998.ylvjcb(active, since 3h), standbys: pqsoel12997.gagpbt
mds: 4/4 daemons up, 2 standby
osd: 32 osds: 32 up (since 73m), 32 in (since 6d); 10 remapped pgs
flags noscrub,nodeep-scrub
data:
volumes: 2/2 healthy
pools: 5 pools, 193 pgs
objects: 13.97M objects, 853 GiB
usage: 3.5 TiB used, 12 TiB / 16 TiB avail
pgs: 755092/55882956 objects misplaced (1.351%)
183 active+clean
10 active+remapped+backfilling
io:
recovery: 2.3 MiB/s, 20 objects/s
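With no client I/O, the usual approach is to raise the per-OSD recovery throttles. The options below are real Ceph settings, but treat the values as illustrative starting points rather than tuned recommendations:

```
# Allow more concurrent backfills/recovery ops per OSD
# (defaults are deliberately conservative):
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 8
# On Quincy or later the mClock scheduler largely ignores the knobs
# above; switch its profile instead:
ceph config set osd osd_mclock_profile high_recovery_ops
```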
Dear All,
I'm using the metadata repair tools to repair a damaged MDS, following the document below. My storage holds about 276 TB of data, and cephfs-data-scan is using 32 workers. How long should scanning extents take? What about scanning inodes? It has been running for 6 hours and the metadata pool has shrunk by 1 GB. Is this normal? Thanks.
ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.3 PiB 454 TiB 835 TiB 835 TiB 64.79
ssd 7.1 TiB 6.2 TiB 878 GiB 878 GiB 12.15
TOTAL 1.3 PiB 460 TiB 836 TiB 836 TiB 64.50
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
cephfs_data 1 4096 276 TiB 222.96M 828 TiB 71.28 111 TiB
cephfs_metadata 2 32 101 GiB 5.83M 302 GiB 0.09 111 TiB
device_health_metrics 10 1 239 MiB 251 716 MiB 0 112 TiB
cephfs_ssd 11 32 117 GiB 1.49M 361 GiB 5.83 1.9 TiB
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/
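Scan time scales roughly with the object count in the data pool divided by the number of workers, so a back-of-envelope estimate can be made from the `ceph df` figures above. The per-worker rate below is purely hypothetical; measure your own from elapsed time and progress:

```python
# Rough ETA for cephfs-data-scan scan_extents -- a sketch, not a promise.
objects = 222_960_000        # cephfs_data OBJECTS from `ceph df`
workers = 32
rate_per_worker = 300        # objects/s per worker (hypothetical figure)

seconds = objects / (workers * rate_per_worker)
hours = seconds / 3600
print(f"estimated scan_extents time: {hours:.1f} h")  # ~6.5 h with these numbers
```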
Justin Li
Senior Technical Officer
School of Information Technology
Faculty of Science, Engineering and Built Environment
Request for assistance can be lodged to the SIT Technical Team using this form<https://deakinesmprod.service-now.com/esc?id=sc_cat_item&sys_id=7afa8fa5db6…>
Deakin University
Melbourne Burwood Campus, 221 Burwood Highway, Burwood, VIC 3125
+61 3 9246 8932
justin.li@deakin.edu.au
http://www.deakin.edu.au
Deakin University CRICOS Provider Code 00113B
Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone.
Deakin University does not warrant that this email and any attachments are error or virus free.
There is a test bucket whose index and metadata I have removed:
radosgw-admin bi purge --bucket abccc --yes-i-really-mean-it
radosgw-admin metadata rm bucket.instance:abccc:17a4ce99-009e-40f2-a2d2-2afc218ebd9b.425824299.4
Now the index and metadata are gone, but how can I clean up its data? Or is there any way to restore the index and metadata?
I have tried to fix the index, but it doesn't work:
radosgw-admin bucket check --check-objects --fix --bucket abccc
2023-05-24 16:46:41.896 7f32d5890a80 -1 ERROR: get_bucket_instance_from_oid failed: -2
2023-05-24 16:46:41.896 7f32d5890a80 0 could not get bucket info for bucket=abccc
Same result with bucket rm:
radosgw-admin bucket rm --bucket abccc --purge-objects
2023-05-24 16:47:49.439 7fafb7939a80 -1 ERROR: get_bucket_instance_from_oid failed: -2
2023-05-24 16:47:49.439 7fafb7939a80 0 could not get bucket info for bucket=abccc
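Both commands fail with -2 (ENOENT) because the bucket instance metadata that was removed is exactly what radosgw-admin needs to resolve the bucket. As far as I know, the data objects themselves are still in the RGW data pool, and their RADOS names begin with the bucket marker (the `17a4ce99-...` id from the metadata key), so they can be located in a `rados ls` listing by prefix. A minimal sketch of that filter (the sample object names are hypothetical):

```python
def bucket_data_objects(rados_names, marker):
    # RGW names a bucket's data objects "<marker>_<key>" (head objects)
    # and "<marker>__shadow_..." / "<marker>__multipart_..." (tail pieces),
    # so everything belonging to the bucket shares the marker prefix.
    prefix = marker + "_"
    return [n for n in rados_names if n.startswith(prefix)]

marker = "17a4ce99-009e-40f2-a2d2-2afc218ebd9b.425824299.4"
names = [                      # hypothetical `rados -p <data pool> ls` output
    marker + "_file1.bin",
    marker + "__shadow_file2.bin.1",
    "othermarker_file3.bin",
]
print(bucket_data_objects(names, marker))
```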
Hey all,
I'm facing a "minor" problem.
I do not always get results in the dashboard under Block->Images, in the
Images or Namespaces tabs. The little refresh button keeps spinning, and
sometimes it finally shows something only after several minutes. That is
odd, because from the shell I am not seeing any problems like that:
rbd namespace ls
as well as:
rbd ls --namespace someNS
...both return their results immediately.
Does anyone know what's causing this? I have not yet found anything in
the logs.
Best
Ken