Say one is forced to move a production cluster (4 nodes) to a different
datacenter. What options do I have, other than just turning it off at
the old location and turning it on at the new location?
Maybe buy some extra nodes and move one node at a time?
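For the node-at-a-time idea, the per-node sequence I have in mind is roughly the sketch below (assuming a Luminous-or-later cluster so that "ceph osd purge" exists; the OSD id is just a placeholder):

  # after a replacement node has been added and backfilled at the new site,
  # drain one OSD at a time on the node that is about to be moved
  ceph osd crush reweight osd.12 0              # empty this OSD onto the rest of the cluster
  ceph -s                                       # wait until all PGs are active+clean again
  ceph osd out osd.12
  ceph osd purge osd.12 --yes-i-really-mean-it  # remove it from the CRUSH and OSD maps
  # then power the node down, ship it, and re-deploy it at the new datacenter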
We are looking to roll out an all-flash Ceph cluster as storage for our cloud solution. The OSDs will be on slightly slower Micron 5300 PROs, with WAL/DB on Micron 7300 MAX NVMe drives.
My main concern about whether Ceph can fit the bill is its snapshot capabilities.
For each RBD we would like the following snapshots:
8x 30-minute snapshots (covering the latest 4 hours)
With our current solution (HPE Nimble) we simply pause all write IO on the 10-minute mark for roughly 2 seconds and then take a snapshot of the entire Nimble volume. Each VM within the Nimble volume sits on a Linux logical volume, so it's easy for us to take one big snapshot and only get access to a specific client's data.
Are there any options for automating snapshot management/retention within Ceph besides some bash scripts? Is there any way to take snapshots of all RBDs within a pool at a given point in time?
Is anyone successfully running with this many snapshots? If anyone is running a similar setup, I would love to hear how you're doing it.
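For context, the kind of thing we would otherwise script ourselves is a cron job along these lines; this is only a sketch (the pool name and the "auto-" prefix are made up), and a per-image loop like this does not give crash consistency across images the way our one big Nimble volume snapshot does:

  #!/bin/bash
  # snapshot every RBD image in a pool, keep only the newest $KEEP "auto-" snapshots per image
  POOL=cloud-rbd        # placeholder pool name
  KEEP=8                # 8 x 30-minute snapshots = the latest 4 hours
  STAMP=$(date +%Y%m%d-%H%M)

  for IMG in $(rbd ls "$POOL"); do
      rbd snap create "$POOL/$IMG@auto-$STAMP"
      # snapshot names are timestamped, so a lexical sort is oldest-first;
      # drop everything except the newest $KEEP
      rbd snap ls "$POOL/$IMG" | awk '$2 ~ /^auto-/ {print $2}' | sort | head -n -"$KEEP" |
          while read -r SNAP; do
              rbd snap rm "$POOL/$IMG@$SNAP"
          done
  done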
Hello,
We have a medium-sized Ceph Luminous cluster that, up until now, has been the
RBD image backend solely for an OpenStack Newton cluster that's marked for
upgrade to Stein later this year.
Recently, however, we deployed a brand new Stein cluster, and I'm curious
whether pointing the new OpenStack cluster at the same Cinder/Glance/Nova
RBD pools on the Luminous cluster would be considered bad practice, or even
potentially dangerous.
One argument for doing it may be that multiple Cinder/Glance/Nova pools
serving disparate groups of clients would come at a PG cost to the cluster,
though the separation into multiple, distinct pools also has its advantages.
The UUIDs generated for RBD images in the pools by the OpenStack services
*should*, in theory, be unique and collision-free between the two OpenStack
clusters.
One other point I was curious about was RBD image feature sets; the Stein
Ceph clients will be running later versions of the Ceph libraries than the
Newton clients. If the two sets of clients were to share pools (with neither
set needing to share RBD images within the pools, only the pools themselves),
would having images with different feature lists in the same pool cause
problems?
--
*******************
Paul Browne
Research Computing Platforms
University Information Services
Roger Needham Building
JJ Thompson Avenue
University of Cambridge
Cambridge
United Kingdom
E-Mail: pfb29(a)cam.ac.uk
Tel: 0044-1223-746548
*******************
Hi Ceph Community (I'm new here :),
I'm learning Ceph in a virtual environment with Vagrant/VirtualBox (I understand
this is far from a real environment in several ways, mainly performance,
but I'm OK with that at this point :)
I have 3 nodes, and after a few *vagrant halt/up* cycles, when I do *ceph -s* I get
the following message:
[vagrant@ceph-node1 ~]$ sudo ceph -s
  cluster:
    id:     7f8cb5f0-1989-4ab1-8fb9-d5c08aa96658
    health: HEALTH_WARN
            Reduced data availability: 512 pgs inactive
            4 slow ops, oldest one blocked for 1576 sec, daemons [osd.6,osd.7,osd.8] have slow ops.
  services:
    mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 7m)
    mgr: ceph-node1(active, since 26m), standbys: ceph-node2, ceph-node3
    osd: 9 osds: 9 up (since 25m), 9 in (since 2d)
  data:
    pools:   1 pools, 512 pgs
    objects: 0 objects, 0 B
    usage:   9.1 GiB used, 162 GiB / 171 GiB avail
    pgs:     100.000% pgs unknown
             512 unknown
Here is the output of *ceph health detail*:
[vagrant@ceph-node1 ~]$ sudo ceph health detail
HEALTH_WARN Reduced data availability: 512 pgs inactive; 4 slow ops, oldest one blocked for 1810 sec, daemons [osd.6,osd.7,osd.8] have slow ops.
PG_AVAILABILITY Reduced data availability: 512 pgs inactive
    pg 2.1cd is stuck inactive for 1815.881027, current state unknown, last acting []
    pg 2.1ce is stuck inactive for 1815.881027, current state unknown, last acting []
    pg 2.1cf is stuck inactive for 1815.881027, current state unknown, last acting []
    [... pgs 2.1d0 through 2.1ff all report the same "stuck inactive, current state unknown, last acting []" message ...]
SLOW_OPS 4 slow ops, oldest one blocked for 1810 sec, daemons [osd.6,osd.7,osd.8] have slow ops.
Do you have any guidance on how to proceed with this? I'm trying to
understand why the cluster is HEALTH_WARN and what I need to do in order to
make it healthy again.
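So far the only next steps I can think of are along these lines; the mgr restart at the end is just a guess on my part, based on every PG showing as "unknown" (which I understand can mean the mgr has no PG stats), so please correct me if that's the wrong direction:

  ceph osd tree                                # check the OSDs are up and under the expected hosts
  ceph pg dump_stuck inactive                  # list the stuck PGs
  ceph pg 2.1cd query                          # ask one PG directly (may hang if no OSD reports it)
  sudo systemctl restart ceph-mgr@ceph-node1   # restart the active mgr on node1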
Thanks!
--
Ignacio Ocampo
All,
I've just spent a significant amount of time unsuccessfully chasing the
_read_fsid unparsable uuid error on Debian 10 / Nautilus 14.2.6. Since
this is a brand-new cluster, last night I gave up and moved back to
Debian 9 / Luminous 12.2.11. In both cases I'm using the packages from
Debian Backports, with ceph-ansible as my deployment tool.
Note that above I said 'the _read_fsid unparsable uuid' error. I've
searched around a bit and found some previously reported issues, but I
did not see any conclusive resolutions.
I would like to get to Nautilus as quickly as possible, so I'd gladly
provide additional information to help track down the cause of this
symptom. I can confirm that, looking at ceph-volume.log on the OSD
host, I see no difference between the ceph-volume lvm batch commands
generated by the ceph-ansible versions associated with these two Ceph
releases:
ceph-volume --cluster ceph lvm batch --bluestore --yes
--block-db-size 133358734540 /dev/sdc /dev/sdd /dev/sde /dev/sdf
/dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/nvme0n1
Note that I'm using --block-db-size to divide my NVMe into 12 segments
as I have 4 empty drive bays on my OSD servers that I may eventually be
able to fill.
My OSD hardware is:
Disk /dev/nvme0n1: 1.5 TiB, 1600321314816 bytes, 3125627568 sectors
Disk /dev/sdc: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdd: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sde: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdf: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdg: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdh: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdi: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdj: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
I'd send the output of ceph-volume inventory on Luminous, but I'm
getting -->: KeyError: 'human_readable_size'.
Please let me know if I can provide any further information.
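One thing I plan to try before the next Nautilus attempt, in case leftover LVM/partition metadata from the earlier runs is what trips up _read_fsid (just a guess on my part), is zapping the devices completely before re-running the batch command:

  # wipe any previous LVM volumes / partition data before re-running ceph-volume lvm batch
  for dev in /dev/sd{c..j} /dev/nvme0n1; do
      ceph-volume lvm zap --destroy "$dev"
  done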
Thanks.
-Dave
--
Dave Hall
Binghamton University
Yes, but we are offering our RBD volumes in another cloud product, which lets them migrate their volumes to OpenStack when they want.
On 29 Jan 2020, at 18:38, Matthew H <matthew.heler(a)hotmail.com> wrote:
You should have used separate pool name schemes for each OpenStack cluster.
________________________________
From: tdados(a)hotmail.com <tdados(a)hotmail.com>
Sent: Wednesday, January 29, 2020 12:29 PM
To: ceph-users(a)ceph.io <ceph-users(a)ceph.io>
Subject: [ceph-users] Re: Servicing multiple OpenStack clusters from the same Ceph cluster
Hello,
We have recently deployed exactly that and it's working fine. We deployed different keys for the different OpenStack clusters, of course, and they are using the same cinder/nova/glance pools.
The only risk is that a client from one OpenStack cluster creates a volume and the generated ID ends up being the same as that of an existing volume from the other OpenStack cluster. But that's a probability of something like 1 in 5 billion.
We took the risk.
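For what it's worth, the per-cluster keys on our side look roughly like the sketch below (the client and pool names are illustrative, following the usual OpenStack/RBD cap profiles):

  # one cinder key per OpenStack cluster, both pointed at the same shared pools
  ceph auth get-or-create client.cinder-newton mon 'profile rbd' \
      osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'
  ceph auth get-or-create client.cinder-stein mon 'profile rbd' \
      osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'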
Regards
Hello Igor,
I updated all servers to the latest 4.19.97 kernel, but this doesn't fix the
situation.
I can provide you with all those logs - any idea where to upload them / how
to send them to you?
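In the meantime we keep sampling the read-retry counter you suggested; for reference, the quick per-node check we run looks like this (assuming the default admin socket paths):

  # report the BlueStore read-retry counter for every OSD on this host
  for sock in /var/run/ceph/ceph-osd.*.asok; do
      echo -n "$sock: "
      ceph daemon "$sock" perf dump | grep -o '"bluestore_reads_with_retries": [0-9]*'
  done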
Greets,
Stefan
Am 20.01.20 um 13:12 schrieb Igor Fedotov:
> Hi Stefan,
>
> these lines are the result of a transaction dump performed on a failure
> during transaction submission (which is shown as
>
> "submit_transaction error: Corruption: block checksum mismatch code = 2")
>
> Most probably they are of no interest (checksum errors are unlikely to
> be caused by transaction content), and hence we need earlier stuff to
> learn what caused that checksum mismatch.
>
> It's hard to give any formal overview of what you should look for; from
> my troubleshooting experience, generally one may try to find:
>
> - some previous error/warning indications (e.g. allocation, disk access,
> etc)
>
> - prior OSD crashes (sometimes they might have different causes/stack
> traces/assertion messages)
>
> - any timeout or retry indications
>
> - any uncommon log patterns which aren't present during regular running
> but happen each time before the crash/failure.
>
> Anyway I think the inspection depth should be much(?) deeper than
> presumably it is (from what I can see from your log snippets).
>
> Ceph keeps the last 10000 log events at an increased log level and dumps
> them on crash, with a negative index starting at -9999 up to -1 as a prefix.
>
> -1> 2020-01-16 01:10:13.404090 7f3350a14700 -1 rocksdb:
>
>
> It would be great if you could share several log snippets for different
> crashes containing these last 10000 lines.
>
>
> Thanks,
>
> Igor
>
>
> On 1/19/2020 9:42 PM, Stefan Priebe - Profihost AG wrote:
>> Hello Igor,
>>
>> there's absolutely nothing in the logs before.
>>
>> What do those lines mean:
>> Put( Prefix = O key =
>> 0x7f8000000000000001cc45c881217262'd_data.4303206b8b4567.0000000000009632!='0xfffffffffffffffeffffffffffffffff6f00120000'x'
>>
>> Value size = 480)
>> Put( Prefix = O key =
>> 0x7f8000000000000001cc45c881217262'd_data.4303206b8b4567.0000000000009632!='0xfffffffffffffffeffffffffffffffff'o'
>>
>> Value size = 510)
>>
>> on the right side I always see 0xfffffffffffffffeffffffffffffffff on all
>> failed OSDs.
>>
>> greets,
>> Stefan
>> Am 19.01.20 um 14:07 schrieb Stefan Priebe - Profihost AG:
>>> Yes, except that this happens on 8 different clusters with different
>>> hw but same ceph version and same kernel version.
>>>
>>> Greets,
>>> Stefan
>>>
>>>> Am 19.01.2020 um 11:53 schrieb Igor Fedotov <ifedotov(a)suse.de>:
>>>>
>>>> So the intermediate summary is:
>>>>
>>>> Any OSD in the cluster can experience an interim RocksDB checksum
>>>> failure, which isn't present after an OSD restart.
>>>>
>>>> No HW issues observed, no persistent artifacts (except OSD log)
>>>> afterwards.
>>>>
>>>> And looks like the issue is rather specific to the cluster as no
>>>> similar reports from other users seem to be present.
>>>>
>>>>
>>>> Sorry, I'm out of ideas other than to collect all the failure logs and
>>>> try to find something common in them. Maybe this will shed some
>>>> light...
>>>>
>>>> BTW, from my experience it might make sense to inspect the OSD log prior
>>>> to the failure (any error messages and/or prior restarts, etc.); sometimes
>>>> this might provide some hints.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>>
>>>>> On 1/17/2020 2:30 PM, Stefan Priebe - Profihost AG wrote:
>>>>> HI Igor,
>>>>>
>>>>>> Am 17.01.20 um 12:10 schrieb Igor Fedotov:
>>>>>> hmmm..
>>>>>>
>>>>>> Just in case - suggest to check H/W errors with dmesg.
>>>>> this happens on around 80 nodes - I don't expect all of those to have
>>>>> unidentified HW errors. Also, all of them are monitored - no dmesg output
>>>>> contains any errors.
>>>>>
>>>>>> Also there are some (not very much though) chances this is another
>>>>>> incarnation of the following bug:
>>>>>> https://tracker.ceph.com/issues/22464
>>>>>> https://github.com/ceph/ceph/pull/24649
>>>>>>
>>>>>> The corresponding PR works around it for main device reads (user data
>>>>>> only!) but theoretically it might still happen
>>>>>>
>>>>>> either for DB device or DB data at main device.
>>>>>>
>>>>>> Can you observe any BlueFS spillovers? Is there any correlation
>>>>>> between failing OSDs and spillover presence, e.g. do failing OSDs always
>>>>>> have a spillover, while OSDs without spillovers never face the
>>>>>> issue...
>>>>>>
>>>>>> To validate this hypothesis one can try to monitor/check (e.g. once a
>>>>>> day for a week or something) "bluestore_reads_with_retries"
>>>>>> counter over
>>>>>> OSDs to learn if the issue is happening
>>>>>>
>>>>>> in the system. Non-zero values mean it's there for user data/main
>>>>>> device and hence is likely to happen for DB ones as well (which
>>>>>> doesn't
>>>>>> have any workaround yet).
>>>>> OK i checked bluestore_reads_with_retries on 360 osds but all of
>>>>> them say 0.
>>>>>
>>>>>
>>>>>> Additionally you might want to monitor memory usage, as the
>>>>>> above-mentioned PR denotes high memory pressure as a potential trigger
>>>>>> for these read errors. So if such pressure happens, the hypothesis
>>>>>> becomes more valid.
>>>>> we already do this heavily and have around 10GB of memory per OSD.
>>>>> Also
>>>>> none of those machines show any IO pressure at all.
>>>>>
>>>>> All hosts show a constant rate of around 38GB to 45GB mem available in
>>>>> /proc/meminfo.
>>>>>
>>>>> Stefan
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Igor
>>>>>>
>>>>>> PS. Everything above is rather speculation for now. The available
>>>>>> information is definitely not enough for extensively troubleshooting
>>>>>> cases which happen that rarely.
>>>>>>
>>>>>> You might want to start collecting failure-related information
>>>>>> (including but not limited to failure logs, perf counter dumps,
>>>>>> system
>>>>>> resource reports etc) for future analysis.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 1/16/2020 11:58 PM, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi Igor,
>>>>>>>
>>>>>>> answers inline.
>>>>>>>
>>>>>>> Am 16.01.20 um 21:34 schrieb Igor Fedotov:
>>>>>>>> you may want to run fsck against failing OSDs. Hopefully it will
>>>>>>>> shed
>>>>>>>> some light.
>>>>>>> fsck just says everything fine:
>>>>>>>
>>>>>>> # ceph-bluestore-tool --command fsck --path
>>>>>>> /var/lib/ceph/osd/ceph-27/
>>>>>>> fsck success
>>>>>>>
>>>>>>>
>>>>>>>> Also wondering if OSD is able to recover (startup and proceed
>>>>>>>> working)
>>>>>>>> after facing the issue?
>>>>>>> no recover needed. It just runs forever after restarting.
>>>>>>>
>>>>>>>> If so, do you have any that failed multiple times? Do you have logs
>>>>>>>> for these occurrences?
>>>>>>> Maybe, but there are most probably weeks or months between those
>>>>>>> failures - most probably the logs are already deleted.
>>>>>>>
>>>>>>>> Also please note that the patch you mentioned doesn't fix previous
>>>>>>>> issues (i.e. duplicate allocations); it only prevents new ones.
>>>>>>>>
>>>>>>>> But fsck should show them if any...
>>>>>>> None showed.
>>>>>>>
>>>>>>> Stefan
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Igor
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/16/2020 10:04 PM, Stefan Priebe - Profihost AG wrote:
>>>>>>>>> Hi Igor,
>>>>>>>>>
>>>>>>>>> ouch sorry. Here we go:
>>>>>>>>>
>>>>>>>>> -1> 2020-01-16 01:10:13.404090 7f3350a14700 -1 rocksdb:
>>>>>>>>> submit_transaction error: Corruption: block checksum mismatch
>>>>>>>>> code = 2
>>>>>>>>> Rocksdb transaction:
>>>>>>>>> Put( Prefix = M key =
>>>>>>>>> 0x0000000000000402'.OBJ_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 97)
>>>>>>>>> Put( Prefix = M key =
>>>>>>>>> 0x0000000000000402'.MAP_00000000000BB85C_0000000000000002.953BFD0A.bb85c.rbd%udata%e3e8eac6b8b4567%e0000000000001f2e..'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 93)
>>>>>>>>> Put( Prefix = M key =
>>>>>>>>> 0x0000000000000916'.0000823257.00000000000073922044' Value size
>>>>>>>>> = 196)
>>>>>>>>> Put( Prefix = M key =
>>>>>>>>> 0x0000000000000916'.0000823257.00000000000073922045' Value size
>>>>>>>>> = 184)
>>>>>>>>> Put( Prefix = M key = 0x0000000000000916'._info' Value size = 899)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00000000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 418)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00030000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 474)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0007c000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 392)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00090000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 317)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000a0000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 521)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f000f4000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 558)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00130000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 649)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00194000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 449)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f001cc000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 580)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00200000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 435)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00240000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 569)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00290000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 465)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f002e0000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 710)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f00300000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 599)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f0036c000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 372)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003a6000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 130)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003b4000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 540)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff6f003fc000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 47)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0x00000000000bb85cffffffffffffffff'o'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 1731)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00040000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 675)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff6f00080000'x'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 395)
>>>>>>>>> Put( Prefix = O key =
>>>>>>>>> 0x7f80000000000000029acdfb05217262'd_data.3e8eac6b8b4567.0000000000001f2e!='0xfffffffffffffffeffffffffffffffff'o'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Value size = 1328)
>>>>>>>>> Put( Prefix = X key = 0x0000000018a38deb Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x0000000018a38dea Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a035b Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a035c Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0355 Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0356 Value size = 17)
>>>>>>>>> Put( Prefix = X key = 0x000000001a54f6e4 Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000001b1c061e Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a038f Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0389 Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0358 Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a035f Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0357 Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0387 Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a038a Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0388 Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x00000000134c3fbe Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x00000000134c3fb5 Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a036e Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a036d Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x00000000134c3fb8 Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a0371 Value size = 14)
>>>>>>>>> Put( Prefix = X key = 0x000000000d7a036a Value size = 14)
>>>>>>>>> 0> 2020-01-16 01:10:13.413759 7f3350a14700 -1
>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>>>>>>> BlueStore::_kv_sync_thread()' thread 7f3350a14700 time 2020-01-16
>>>>>>>>> 01:10:13.404113
>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED
>>>>>>>>> assert(r == 0)
>>>>>>>>>
>>>>>>>>> ceph version 12.2.12-11-gd3eae83543
>>>>>>>>> (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int,
>>>>>>>>> char
>>>>>>>>> const*)+0x102) [0x55c9a712d232]
>>>>>>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55c9a6fb54b5]
>>>>>>>>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55c9a6ff608d]
>>>>>>>>> 4: (()+0x7494) [0x7f33615f9494]
>>>>>>>>> 5: (clone()+0x3f) [0x7f3360680acf]
>>>>>>>>>
>>>>>>>>> I already picked those:
>>>>>>>>> https://github.com/ceph/ceph/pull/28644
>>>>>>>>>
>>>>>>>>> Greets,
>>>>>>>>> Stefan
>>>>>>>>> Am 16.01.20 um 17:00 schrieb Igor Fedotov:
>>>>>>>>>> Hi Stefan,
>>>>>>>>>>
>>>>>>>>>> would you please share a log snippet prior to the assertions? Looks
>>>>>>>>>> like RocksDB is failing during transaction submission...
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Igor
>>>>>>>>>>
>>>>>>>>>> On 1/16/2020 11:56 AM, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> does anybody know a fix for this ASSERT / crash?
>>>>>>>>>>>
>>>>>>>>>>> 2020-01-16 02:02:31.316394 7f8c3f5ab700 -1
>>>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
>>>>>>>>>>> BlueStore::_kv_sync_thread()' thread 7f8c3f5ab700 time
>>>>>>>>>>> 2020-01-16
>>>>>>>>>>> 02:02:31.304993
>>>>>>>>>>> /build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r
>>>>>>>>>>> == 0)
>>>>>>>>>>>
>>>>>>>>>>> ceph version 12.2.12-11-gd3eae83543
>>>>>>>>>>> (d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*,
>>>>>>>>>>> int, char
>>>>>>>>>>> const*)+0x102) [0x55e6df9d9232]
>>>>>>>>>>> 2: (BlueStore::_kv_sync_thread()+0x24c5) [0x55e6df8614b5]
>>>>>>>>>>> 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55e6df8a208d]
>>>>>>>>>>> 4: (()+0x7494) [0x7f8c50190494]
>>>>>>>>>>> 5: (clone()+0x3f) [0x7f8c4f217acf]
>>>>>>>>>>>
>>>>>>>>>>> all bluestore OSDs are randomly crashing sometimes (once a
>>>>>>>>>>> week).
>>>>>>>>>>>
>>>>>>>>>>> Greets,
>>>>>>>>>>> Stefan
Hello,
I am currently evaluating Ceph for our needs and I have a question
about the 'object append' feature. I note that the rados core API
supports an 'append' operation, and the S3-compatible interface does
too.
My question is: does Ceph support concurrent append? I would like to
use Ceph as a temporary store, a "buffer" if you will, for incoming
data from a variety of sources. Each object would hold data for a
particular identifier. I'd like to know whether two or more different
clients can 'append' to the same object without the data overwriting
each other, with each 'append' added to the end of the object.
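To make the intended behaviour concrete, here is the kind of quick test I have in mind (using the rados CLI against a scratch pool; the pool and object names are made up):

  echo "record-from-source-A" > a.dat
  echo "record-from-source-B" > b.dat
  # two "clients" appending to the same object at the same time
  rados -p scratch append ingest-buffer-42 a.dat &
  rados -p scratch append ingest-buffer-42 b.dat &
  wait
  rados -p scratch stat ingest-buffer-42           # size should be the sum of both payloads
  rados -p scratch get ingest-buffer-42 out.dat && cat out.dat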
Performance-wise we'd likely be performing 15-20 thousand writes per
second, so we'd be building a pretty big cluster on very fast flash
disk. Data would only reside on the system for about an hour at most
before being read and deleted.
Cheers,
David Bell
Hi,
The command "ceph daemon mds.$mds perf dump" does not give the
collection with MDS specific data anymore. In Mimic I get the following
MDS specific collections:
- mds
- mds_cache
- mds_log
- mds_mem
- mds_server
- mds_sessions
But those are not available anymore in Nautilus (14.2.4). They are also
not listed in a "perf schema".
Where did these metrics go?
Thanks,
Stefan
--
| BIT BV https://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6 +31 318 648 688 / info(a)bit.nl