Hello Team,
We have integrated a Ceph cluster as the storage backend for Kubernetes and are
provisioning volumes through rbd-provisioner. When we create volumes from YAML
files in Kubernetes (PV > PVC > mount into a pod), the PVCs on the Kubernetes
side show the meaningful names defined in the YAML files. In the Ceph cluster,
however, the rbd image is created with a dynamically generated UID in its name.
During troubleshooting this makes it tedious to find the exact rbd image.
Please find the provisioning logs in the snippet pasted below.
kubectl get pods,pv,pvc
NAME            READY   STATUS    RESTARTS   AGE
pod/sleepypod   1/1     Running   0          4m9s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
persistentvolume/pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5   1Gi        RWO            Delete           Bound    default/test-dyn-pvc   ceph-rbd                4m9s

NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/test-dyn-pvc   Bound    pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5   1Gi        RWO            ceph-rbd       4m11s
*rbd-provisioner logs*
I1121 10:59:15.009012 1 provision.go:132] successfully created rbd image "kubernetes-dynamic-pvc-f4eac482-0c4d-11ea-8d70-8a582e0eb4e2"
I1121 10:59:15.009092 1 controller.go:1087] provision "default/test-dyn-pvc" class "ceph-rbd": volume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5" provisioned
I1121 10:59:15.009138 1 controller.go:1101] provision "default/test-dyn-pvc" class "ceph-rbd": trying to save persistentvolume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5"
I1121 10:59:15.020418 1 controller.go:1108] provision "default/test-dyn-pvc" class "ceph-rbd": persistentvolume "pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5" saved
I1121 10:59:15.020476 1 controller.go:1149] provision "default/test-dyn-pvc" class "ceph-rbd": succeeded
I1121 10:59:15.020802 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"test-dyn-pvc", UID:"cd37d2d6-cecc-4a05-9736-c8d80abde7f5", APIVersion:"v1", ResourceVersion:"24545639", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5
*rbd image details on the Ceph cluster side*
rbd -p kube ls --long
NAME                                                          SIZE    PARENT   FMT   PROT   LOCK
kubernetes-dynamic-pvc-f4eac482-0c4d-11ea-8d70-8a582e0eb4e2   1 GiB            2
Is there a way to set up a proper naming convention for the rbd image as well
during the Kubernetes deployment itself?
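In the meantime, the mapping from a claim to its rbd image can at least be read
back from the PV object that rbd-provisioner creates. A small sketch using the
names from the output above (assuming these PVs carry the image name in
spec.rbd.image, as RBD-backed volumes normally do):

kubectl get pv pvc-cd37d2d6-cecc-4a05-9736-c8d80abde7f5 -o jsonpath='{.spec.rbd.image}'
# -> kubernetes-dynamic-pvc-f4eac482-0c4d-11ea-8d70-8a582e0eb4e2
# starting from the claim instead of the volume:
kubectl get pv $(kubectl get pvc test-dyn-pvc -o jsonpath='{.spec.volumeName}') -o jsonpath='{.spec.rbd.image}'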
Kubernetes version: v1.15.5
Ceph cluster version: 14.2.2 nautilus (stable)
*Best Regards,*
*Palanisamy*
Hi,
Running 14.2.6 on Debian buster (backports).
I have set up a CephFS with three data pools and one metadata pool:
myfs_data, myfs_data_hdd, myfs_data_ssd, and myfs_metadata.
Using ceph.dir.layout.pool, the data of all files is stored in either
myfs_data_hdd or myfs_data_ssd. This has also been verified by dumping the
ceph.file.layout.pool attribute of all files.
The filesystem has 1617949 files and 36042 directories.
There are, however, approximately as many objects in the first pool created
for the CephFS, myfs_data, as there are files. Their number also grows or
shrinks as files are created or deleted (so they cannot be leftovers from
earlier exercises). Note how the USED size of that pool is reported as 0 bytes,
correctly reflecting that no file data is stored in those objects.
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
myfs_data 0 B 1618229 0 4854687 0 0 0 2263590 129 GiB 23312479 124 GiB 0 B 0 B
myfs_data_hdd 831 GiB 136309 0 408927 0 0 0 106046 200 GiB 269084 277 GiB 0 B 0 B
myfs_data_ssd 43 GiB 1552412 0 4657236 0 0 0 181468 2.3 GiB 4661935 12 GiB 0 B 0 B
myfs_metadata 1.2 GiB 36096 0 108288 0 0 0 4828623 82 GiB 1355102 143 GiB 0 B 0 B
Is this expected?
I was assuming that in this scenario all objects, both their data and any
keys, would be either in the metadata pool or in the two pools where the
file data is stored.
Are these some additional metadata keys stored in the first data pool
created for the CephFS? That would not be so nice if the OSD selection rules
for that pool use worse disks than the ones for the data itself...
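To dig a little further, one of those zero-size objects in myfs_data could be
inspected directly with rados. A rough sketch (the object name below is only an
example of the inode-style naming, not one taken from my cluster):

rados -p myfs_data ls | head -5                       # pick one of the objects
rados -p myfs_data stat 10000000001.00000000          # size should be 0
rados -p myfs_data listxattr 10000000001.00000000     # any xattrs?
rados -p myfs_data listomapkeys 10000000001.00000000  # any omap keys?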
Btw: is there any tool to see the amount of key-value data associated with a
pool? 'ceph osd df' gives omap and meta per OSD, but not broken down per
pool.
Best regards,
Håkan
Hi everyone,
My Ceph version is 12.2.12. I want to set require-min-compat-client to
luminous, so I used the command:
# ceph osd set-require-min-compat-client luminous
but Ceph reported:
Error EPERM: cannot set require_min_compat_client to luminous: 4 connected
client(s) look like jewel (missing 0xa00000000200000); add
--yes-i-really-mean-it to do it anyway
[root@node-1 ~]# ceph features
{
    "mon": {
        "group": {
            "features": "0x3ffddff8eeacfffb",
            "release": "luminous",
            "num": 3
        }
    },
    "osd": {
        "group": {
            "features": "0x3ffddff8eeacfffb",
            "release": "luminous",
            "num": 15
        }
    },
    "client": {
        "group": {
            "features": "0x40106b84a842a52",
            "release": "jewel",
            "num": 4
        },
        "group": {
            "features": "0x3ffddff8eeacfffb",
            "release": "luminous",
            "num": 168
        }
    }
}
So I ran the command:
[root@node-1 gyt]# ceph osd set-require-min-compat-client luminous
--yes-i-really-mean-it
set require_min_compat_client to luminous
But now I want to set require-min-compat-client back to jewel, so I used the command:
[root@node-1 gyt]# ceph osd set-require-min-compat-client jewel
Error EPERM: osdmap current utilizes features that require luminous;
cannot set require_min_compat_client below that to jewel
What is the way to change it back from luminous to jewel?
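For reference, if anyone needs to see which of the connected clients are the
jewel ones before (or after) forcing the flag, the monitor admin socket should
be able to list its sessions with their feature bits. A sketch, run on a
monitor node (socket path assumed to be the default):

ceph daemon mon.node-1 sessions
# or with the socket path spelled out:
ceph --admin-daemon /var/run/ceph/ceph-mon.node-1.asok sessions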
Upgraded to 14.2.7; that doesn't appear to have affected the behavior. As requested:
~$ ceph tell mds.mds1 heap stats
2020-02-10 16:52:44.313 7fbda2cae700 0 client.59208005
ms_handle_reset on v2:x.x.x.x:6800/3372494505
2020-02-10 16:52:44.337 7fbda3cb0700 0 client.59249562
ms_handle_reset on v2:x.x.x.x:6800/3372494505
mds.mds1 tcmalloc heap stats:------------------------------------------------
MALLOC: 50000388656 (47684.1 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 174879528 ( 166.8 MiB) Bytes in central cache freelist
MALLOC: + 14511680 ( 13.8 MiB) Bytes in transfer cache freelist
MALLOC: + 14089320 ( 13.4 MiB) Bytes in thread cache freelists
MALLOC: + 90534048 ( 86.3 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 50294403232 (47964.5 MiB) Actual memory used (physical + swap)
MALLOC: + 50987008 ( 48.6 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 50345390240 (48013.1 MiB) Virtual address space used
MALLOC:
MALLOC: 260018 Spans in use
MALLOC: 20 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
~$ ceph tell mds.mds1 heap release
2020-02-10 16:52:47.205 7f037eff5700 0 client.59249625
ms_handle_reset on v2:x.x.x.x:6800/3372494505
2020-02-10 16:52:47.237 7f037fff7700 0 client.59249634
ms_handle_reset on v2:x.x.x.x:6800/3372494505
mds.mds1 releasing free RAM back to system.
The buffer_anon mempool over 15 minutes or so:
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
"items": 2045,
"bytes": 3069493686
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
"items": 2445,
"bytes": 3111162538
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
"items": 7850,
"bytes": 7658678767
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
"items": 12274,
"bytes": 11436728978
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
"items": 13747,
"bytes": 11539478519
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
"items": 14615,
"bytes": 13859676992
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
"items": 23267,
"bytes": 22290063830
}
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
"items": 44944,
"bytes": 40726959425
}
And one about a minute after the heap release showing continued growth:
~$ ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon
{
"items": 50694,
"bytes": 47343942094
}
This is on a single active MDS with 2 standbys. The workload scans about a
million files with about 20 parallel threads on two clients, opening and
reading each file if it exists.
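The samples above are just repeated manual invocations; a small loop like this
(only a sketch) makes the growth easier to chart over time:

while true; do
    printf '%s ' "$(date '+%H:%M:%S')"
    ceph daemon mds.mds1 dump_mempools | jq .mempool.by_pool.buffer_anon.bytes
    sleep 60
done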
On Wed, Jan 22, 2020 at 8:25 AM John Madden <jmadden.com(a)gmail.com> wrote:
>
> > Couldn't John confirm that this is the issue by checking the heap stats and triggering the release via
> >
> > ceph tell mds.mds1 heap stats
> > ceph tell mds.mds1 heap release
> >
> > (this would be much less disruptive than restarting the MDS)
>
> That was my first thought as well, but `release` doesn't appear to do
> anything in this case.
>
> John
Hi Folks
We are using Ceph as the storage backend on our 6-node Proxmox VM cluster. To monitor our systems we use Zabbix, and I would like to get some Ceph data into Zabbix so we get alarms when something goes wrong.
ceph-mgr has a module, "zabbix", which uses "zabbix_sender" to actively push data, but I cannot get the module working. It always responds with "failed to send data".
The network side seems to be fine:
root@vm-2:~# traceroute 192.168.15.253
traceroute to 192.168.15.253 (192.168.15.253), 30 hops max, 60 byte packets
1 192.168.15.253 (192.168.15.253) 0.411 ms 0.402 ms 0.393 ms
root@vm-2:~# nmap -p 10051 192.168.15.253
Starting Nmap 7.70 ( https://nmap.org ) at 2019-09-18 08:40 CEST
Nmap scan report for 192.168.15.253
Host is up (0.00026s latency).
PORT STATE SERVICE
10051/tcp open zabbix-trapper
MAC Address: BA:F5:30:EF:40:EF (Unknown)
Nmap done: 1 IP address (1 host up) scanned in 0.61 seconds
root@vm-2:~# ceph zabbix config-show
{"zabbix_port": 10051, "zabbix_host": "192.168.15.253", "identifier": "VM-2", "zabbix_sender": "/usr/bin/zabbix_sender", "interval": 60}
root@vm-2:~#
But if I try "ceph zabbix send" I get "failed to send data to zabbix", and this shows up in the system journal:
Sep 18 08:41:13 vm-2 ceph-mgr[54445]: 2019-09-18 08:41:13.272 7fe360fe4700 -1 mgr.server reply reply (1) Operation not permitted
The log of ceph-mgr on that machine states:
2019-09-18 08:42:18.188 7fe359fd6700 0 mgr[zabbix] Exception when sending: /usr/bin/zabbix_sender exited non-zero: zabbix_sender [3253392]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 44; total: 44; seconds spent: 0.000179"}]
2019-09-18 08:43:18.217 7fe359fd6700 0 mgr[zabbix] Exception when sending: /usr/bin/zabbix_sender exited non-zero: zabbix_sender [3253629]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 44; total: 44; seconds spent: 0.000321"}]
I'm guessing this could have something to do with user rights, but I have no idea where to start tracking this down.
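One test that might narrow it down is pushing a single value by hand with
zabbix_sender and looking at the full answer. A sketch (the item key below is
a placeholder and would have to exist for the host "VM-2" on the Zabbix
server):

/usr/bin/zabbix_sender -vv -z 192.168.15.253 -p 10051 -s "VM-2" -k some.test.key -o 1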
Maybe someone here has a hint?
If more information is needed, I will gladly provide it.
greetings
Ingo
We've run into a problem this afternoon on our test cluster, which is running Nautilus (14.2.2). It seems that any time PGs move on the cluster (from marking an OSD down, setting the primary-affinity to 0, or by using the balancer), a large number of the OSDs in the cluster peg the CPU cores they're running on for a while, which causes slow requests. From what I can tell, it appears to be related to slow peering caused by osd_pg_create() taking a long time.
This was seen on quite a few OSDs while waiting for peering to complete:
# ceph daemon osd.3 ops
{
"ops": [
{
"description": "osd_pg_create(e179061 287.7a:177739 287.9a:177739 287.e2:177739 287.e7:177739 287.f6:177739 287.187:177739 287.1aa:177739 287.216:177739 287.306:177739 287.3e6:177739)",
"initiated_at": "2019-08-27 14:34:46.556413",
"age": 318.25234538000001,
"duration": 318.25241895300002,
"type_data": {
"flag_point": "started",
"events": [
{
"time": "2019-08-27 14:34:46.556413",
"event": "initiated"
},
{
"time": "2019-08-27 14:34:46.556413",
"event": "header_read"
},
{
"time": "2019-08-27 14:34:46.556299",
"event": "throttled"
},
{
"time": "2019-08-27 14:34:46.556456",
"event": "all_read"
},
{
"time": "2019-08-27 14:35:12.456901",
"event": "dispatched"
},
{
"time": "2019-08-27 14:35:12.456903",
"event": "wait for new map"
},
{
"time": "2019-08-27 14:40:01.292346",
"event": "started"
}
]
}
},
...snip...
{
"description": "osd_pg_create(e179066 287.7a:177739 287.9a:177739 287.e2:177739 287.e7:177739 287.f6:177739 287.187:177739 287.1aa:177739 287.216:177739 287.306:177739 287.3e6:177739)",
"initiated_at": "2019-08-27 14:35:09.908567",
"age": 294.900191001,
"duration": 294.90068416899999,
"type_data": {
"flag_point": "delayed",
"events": [
{
"time": "2019-08-27 14:35:09.908567",
"event": "initiated"
},
{
"time": "2019-08-27 14:35:09.908567",
"event": "header_read"
},
{
"time": "2019-08-27 14:35:09.908520",
"event": "throttled"
},
{
"time": "2019-08-27 14:35:09.908617",
"event": "all_read"
},
{
"time": "2019-08-27 14:35:12.456921",
"event": "dispatched"
},
{
"time": "2019-08-27 14:35:12.456923",
"event": "wait for new map"
}
]
}
}
],
"num_ops": 6
}
That "wait for new map" message made us think something was getting hung up on the monitors, so we restarted them all without any luck.
I'll keep investigating, but so far my Google searches aren't pulling anything up, so I wanted to see if anyone else is running into this?
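For anyone who wants to check for the same symptom, a rough sketch of a
per-host check (ceph --admin-daemon only talks to local sockets, so this has
to run on each OSD host):

for sock in /var/run/ceph/ceph-osd.*.asok; do
    n=$(ceph --admin-daemon "$sock" ops | grep -c osd_pg_create)
    [ "$n" -gt 0 ] && echo "$sock: $n pending osd_pg_create ops"
done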
Thanks,
Bryan
Hi,
the current output of ceph -s reports a warning:
2 slow ops, oldest one blocked for 347335 sec, mon.ld5505 has slow ops
This time is increasing.
root@ld3955:~# ceph -s
cluster:
id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae
health: HEALTH_WARN
9 daemons have recently crashed
2 slow ops, oldest one blocked for 347335 sec, mon.ld5505
has slow ops
services:
mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 3d)
mgr: ld5507(active, since 8m), standbys: ld5506, ld5505
mds: cephfs:2 {0=ld5507=up:active,1=ld5505=up:active} 2
up:standby-replay 3 up:standby
osd: 442 osds: 442 up (since 8d), 442 in (since 9d)
data:
pools: 7 pools, 19628 pgs
objects: 65.78M objects, 251 TiB
usage: 753 TiB used, 779 TiB / 1.5 PiB avail
pgs: 19628 active+clean
io:
client: 427 KiB/s rd, 22 MiB/s wr, 851 op/s rd, 647 op/s wr
The details are as follows:
root@ld3955:~# ceph health detail
HEALTH_WARN 9 daemons have recently crashed; 2 slow ops, oldest one
blocked for 347755 sec, mon.ld5505 has slow ops
RECENT_CRASH 9 daemons have recently crashed
mds.ld4464 crashed on host ld4464 at 2020-02-09 07:33:59.131171Z
mds.ld5506 crashed on host ld5506 at 2020-02-09 07:42:52.036592Z
mds.ld4257 crashed on host ld4257 at 2020-02-09 07:47:44.369505Z
mds.ld4464 crashed on host ld4464 at 2020-02-09 06:10:24.515912Z
mds.ld5507 crashed on host ld5507 at 2020-02-09 07:13:22.400268Z
mds.ld4257 crashed on host ld4257 at 2020-02-09 06:48:34.742475Z
mds.ld5506 crashed on host ld5506 at 2020-02-09 06:10:24.680648Z
mds.ld4465 crashed on host ld4465 at 2020-02-09 06:52:33.204855Z
mds.ld5506 crashed on host ld5506 at 2020-02-06 07:59:37.089007Z
SLOW_OPS 2 slow ops, oldest one blocked for 347755 sec, mon.ld5505 has
slow ops
There are no errors on the services (mgr, mon, osd).
Can you please advise how to identify the root cause of these slow ops?
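A sketch of where I would start looking, assuming the monitor's admin socket
supports an 'ops' dump like the OSD one does (to be run on ld5505 itself,
default socket path):

ceph daemon mon.ld5505 ops
# or with the socket path spelled out:
ceph --admin-daemon /var/run/ceph/ceph-mon.ld5505.asok ops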
THX
I have a default CentOS 7 setup with Nautilus. I have been asked to install
kernel 5.5 to check a 'bug'. Where should I get this from? I read that the
elrepo kernel is not compiled with the same config as the RHEL one.
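In case elrepo turns out to be acceptable despite the different kernel config,
the usual route there would be something like the following (only a sketch;
repo URL and package names as published on elrepo.org):

yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install kernel-ml
grub2-set-default 0    # newest installed kernel is usually menu entry 0
reboot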