Over the weekend I had multiple OSD servers in my Octopus cluster
(15.2.4) crash and reboot at nearly the same time. The OSDs are part of
an erasure coded pool. At the time the cluster had been busy with a
long-running (~week) remapping of a large number of PGs after I
incrementally added more OSDs to the cluster. After bringing all of the
OSDs back up, I have 25 unfound objects and 75 degraded objects. There
are other problems reported, but I'm primarily concerned with these
unfound/degraded objects.
The pool with the missing objects is a cephfs pool. The files stored in
the pool are backed up on tape, so I can easily restore individual files
as needed (though I would not want to restore the entire filesystem).
I tried following the guide at
https://docs.ceph.com/docs/octopus/rados/troubleshooting/troubleshooting-pg….
I found a number of OSDs that are still 'not queried'. Restarting a
sampling of these OSDs changed the state from 'not queried' to 'already
probed', but that did not recover any of the unfound or degraded objects.
I have also tried 'ceph pg deep-scrub' on the affected PGs, but never
saw them get scrubbed. I also tried doing a 'ceph pg force-recovery' on
the affected PGs, but only one seems to have been tagged accordingly
(see ceph -s output below).
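For reference, this is roughly how I have been checking the probe state and the unfound objects (PG 7.1a below is just an example ID):

ceph health detail | grep unfound   # which PGs have unfound objects
ceph pg 7.1a query | less           # "might_have_unfound" lists per-OSD probe state ('not queried', 'already probed', ...)
ceph pg 7.1a list_unfound           # names of the unfound objects in that PG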
The guide also says "Sometimes it simply takes some time for the cluster
to query possible locations." I'm not sure how long "some time" might
take, but it hasn't changed after several hours.
My questions are:
* Is there a way to force the cluster to query the possible locations
sooner?
* Is it possible to identify the files in cephfs that are affected, so
that I could delete only the affected files and restore them from backup
tapes?
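For the second question, if the objects in the CephFS data pool follow the usual <inode-hex>.<block-index> naming, something along these lines might work for mapping an unfound object back to a file (the object name and mount point below are placeholders):

# object name taken from 'ceph pg <pgid> list_unfound', e.g. 10000000abc.00000000
ino_hex=10000000abc
find /cephfs -inum $((16#$ino_hex))   # locate the file with that inode in the mounted filesystem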
--Mike
ceph -s:

  cluster:
    id:     066f558c-6789-4a93-aaf1-5af1ba01a3ad
    health: HEALTH_ERR
            1 clients failing to respond to capability release
            1 MDSs report slow requests
            25/78520351 objects unfound (0.000%)
            2 nearfull osd(s)
            Reduced data availability: 1 pg inactive
            Possible data damage: 9 pgs recovery_unfound
            Degraded data redundancy: 75/626645098 objects degraded (0.000%), 9 pgs degraded
            1013 pgs not deep-scrubbed in time
            1013 pgs not scrubbed in time
            2 pool(s) nearfull
            1 daemons have recently crashed
            4 slow ops, oldest one blocked for 77939 sec, daemons [osd.0,osd.41] have slow ops.

  services:
    mon: 4 daemons, quorum ceph1,ceph2,ceph3,ceph4 (age 9d)
    mgr: ceph3(active, since 11d), standbys: ceph2, ceph4, ceph1
    mds: archive:1 {0=ceph4=up:active} 3 up:standby
    osd: 121 osds: 121 up (since 6m), 121 in (since 101m); 4 remapped pgs

  task status:
    scrub status:
      mds.ceph4: idle

  data:
    pools:   9 pools, 2433 pgs
    objects: 78.52M objects, 298 TiB
    usage:   412 TiB used, 545 TiB / 956 TiB avail
    pgs:     0.041% pgs unknown
             75/626645098 objects degraded (0.000%)
             135224/626645098 objects misplaced (0.022%)
             25/78520351 objects unfound (0.000%)
             2421 active+clean
             5    active+recovery_unfound+degraded
             3    active+recovery_unfound+degraded+remapped
             2    active+clean+scrubbing+deep
             1    unknown
             1    active+forced_recovery+recovery_unfound+degraded

  progress:
    PG autoscaler decreasing pool 7 PGs from 1024 to 512 (5d)
      [............................]
Hello,
Over the last week I have tried optimising the performance of our MDS
nodes for the large number of files and concurrent clients we have. It
turns out that despite various stability fixes in recent releases, the
default configuration still doesn't appear to be optimal for keeping the
cache size under control and avoiding intermittent I/O blocks.
Unfortunately, it is very hard to tweak the configuration to something
that works, because the necessary tuning parameters are largely
undocumented or only described in very technical terms in the source
code, making them quite unapproachable for administrators not familiar
with all the CephFS internals. I would therefore like to ask whether it
would be possible to document the "advanced" MDS settings more clearly:
what they do and in which direction they have to be tuned for more or
less aggressive cap recall, for instance (sometimes it is not clear
whether a threshold is a minimum or a maximum).
I am in the very (un)fortunate situation of having folders with several
hundred thousand direct sub-folders or files (and one extreme case with
almost 7 million dentries), which makes for a pretty good benchmark for
measuring cap growth while performing operations on them. For the time
being, I came up with this configuration, which seems to work for me
but is still far from optimal:
mds basic mds_cache_memory_limit 10737418240
mds advanced mds_cache_trim_threshold 131072
mds advanced mds_max_caps_per_client 500000
mds advanced mds_recall_max_caps 17408
mds advanced mds_recall_max_decay_rate 2.000000
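In case anyone wants to experiment with the same values, they should be settable at runtime via "ceph config set", e.g.:

ceph config set mds mds_cache_memory_limit    10737418240
ceph config set mds mds_cache_trim_threshold  131072
ceph config set mds mds_max_caps_per_client   500000
ceph config set mds mds_recall_max_caps       17408
ceph config set mds mds_recall_max_decay_rate 2.0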
The parameters I am least sure about---because I understand the least
how they actually work---are mds_cache_trim_threshold and
mds_recall_max_decay_rate. Despite reading the description in
src/common/options.cc, I understand only half of what they're doing and
I am also not quite sure in which direction to tune them for optimal
results.
Another point where I am struggling is the correct configuration of
mds_recall_max_caps. The default of 5K doesn't work too well for me, but
values above 20K also don't seem to be a good choice. While high values
result in fewer blocked ops and better performance without destabilising
the MDS, they also lead to slow but unbounded cache growth, which seems
counter-intuitive. 17K was the maximum I could go. Higher values work
for most use cases, but when listing very large folders with millions of
dentries, the MDS cache size slowly starts to exceed the limit after a
few hours, since the MDSs are failing to keep clients below
mds_max_caps_per_client despite not showing any "failing to respond to
cache pressure" warnings.
With the configuration above, I do not have cache size issues any more,
but it comes at the cost of performance and slow/blocked ops. A few
hints as to how I could optimise my settings for better client
performance would be much appreciated, as would additional
documentation for all the "advanced" MDS settings.
Thanks a lot
Janek
Good day, cephers!
We've recently upgraded our cluster from the 14.2.8 to the 14.2.10 release, also
performing a full system package upgrade (Ubuntu 18.04 LTS).
After that, performance dropped significantly, the main reason being that the
journal SSDs now have no merges, huge queues, and increased latency.
There are a few screenshots in the attachments. They are for an SSD journal that
hosts block.db/block.wal for 3 spinning OSDs, and it looks like this for
all our SSD block.db/wal devices across all nodes.
Any ideas what might cause this? Maybe I've missed something important in
the release notes?
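One thing I still want to rule out is that the kernel that came with the package upgrade changed the block-layer settings of those SSDs (just a guess on my part), e.g.:

cat /sys/block/sdX/queue/scheduler   # sdX = journal SSD; active scheduler shown in [brackets]
cat /sys/block/sdX/queue/nomerges    # 0 = merging enabled, 1/2 = merging partially/fully disabled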
Dear Cephers,
we are currently mounting CephFS with relatime, using the FUSE client (version 13.2.6):
ceph-fuse on /cephfs type fuse.ceph-fuse (rw,relatime,user_id=0,group_id=0,allow_other)
For the first time, I wanted to use atime to identify old unused data. My expectation with "relatime" was that the access time stamp would be updated less often, for example,
only if the last file access was >24 hours ago. However, that does not seem to be the case:
----------------------------------------------
$ stat /cephfs/grid/atlas/atlaslocalgroupdisk/rucio/group/phys-higgs/ed/cb/group.phys-higgs.17620861._000004.HSM_common.root
...
Access: 2019-04-10 15:50:04.975959159 +0200
Modify: 2019-04-10 15:50:05.651613843 +0200
Change: 2019-04-10 15:50:06.141006962 +0200
...
$ cat /cephfs/grid/atlas/atlaslocalgroupdisk/rucio/group/phys-higgs/ed/cb/group.phys-higgs.17620861._000004.HSM_common.root > /dev/null
$ sync
$ stat /cephfs/grid/atlas/atlaslocalgroupdisk/rucio/group/phys-higgs/ed/cb/group.phys-higgs.17620861._000004.HSM_common.root
...
Access: 2019-04-10 15:50:04.975959159 +0200
Modify: 2019-04-10 15:50:05.651613843 +0200
Change: 2019-04-10 15:50:06.141006962 +0200
...
----------------------------------------------
I also tried this via an nfs-ganesha mount, and via a ceph-fuse mount with admin caps,
but atime never changes.
Is atime really never updated with CephFS, or is this configurable?
Something as coarse as "update at maximum once per day only" would be perfectly fine for the use case.
Cheers,
Oliver
Hello,
We are planning to perform a small upgrade to our cluster and slowly start adding 12TB SATA HDD drives. We need to accommodate the additional SSD WAL/DB requirements as well. Currently we are considering the following:
HDD Drives - Seagate EXOS 12TB
SSD Drives for WAL/DB - Intel D3 S4510 960GB or Intel D3 S4610 960GB
Our cluster isn't hosting any IO-intensive DBs or IO-hungry VMs such as Exchange, MSSQL, etc.
From the documentation I've read, the recommended size for the DB is between 1% and 4% of the size of the OSD. Would a 2% figure be sufficient (so around 240GB of DB space for each 12TB OSD)?
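Just to spell out the arithmetic behind that figure (assuming one 960GB SSD is shared by several OSDs):

12 TB x 1% = 120 GB  ->  up to 8 x 12TB OSDs per 960GB SSD
12 TB x 2% = 240 GB  ->  up to 4 x 12TB OSDs per 960GB SSD
12 TB x 4% = 480 GB  ->  up to 2 x 12TB OSDs per 960GB SSD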
Also, from your experience, which is the better model for the SSD DB/WAL? Would the Intel S4510 be sufficient for our purpose, or would the S4610 be a much better choice? Are there any other cost-effective options to consider instead of the above models?
The same question for the HDDs. Are there any other drives we should consider instead of the Seagate EXOS series?
Thanks for your help and suggestions.
Andrei
Hi all,
on a mimic 13.2.8 cluster I observe a gradual increase of memory usage by OSD daemons, in particular, under heavy load. For our spinners I use osd_memory_target=2G. The daemons overrun the 2G in virt size rather quickly and grow to something like 4G virtual. The real memory consumption stays more or less around the 2G of the target. There are some overshoots, but these go down again during periods with less load.
What I observe now is that the actual memory consumption slowly grows and OSDs start using more than the 2G of virtual memory. I see this as slowly growing swap usage despite having more RAM available (swappiness=10). This indicates allocated but unused memory, or memory not accessed for a long time, usually a leak. Here are some heap stats:
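For reference, the stats below were collected via the admin socket, roughly like this:

ceph daemon osd.101 heap stats       # tcmalloc heap statistics
ceph daemon osd.101 dump_mempools    # Ceph-internal mempool accounting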
Before restart:
osd.101 tcmalloc heap stats:------------------------------------------------
MALLOC: 3438940768 ( 3279.6 MiB) Bytes in use by application
MALLOC: + 5611520 ( 5.4 MiB) Bytes in page heap freelist
MALLOC: + 257307352 ( 245.4 MiB) Bytes in central cache freelist
MALLOC: + 357376 ( 0.3 MiB) Bytes in transfer cache freelist
MALLOC: + 6727368 ( 6.4 MiB) Bytes in thread cache freelists
MALLOC: + 25559040 ( 24.4 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 3734503424 ( 3561.5 MiB) Actual memory used (physical + swap)
MALLOC: + 575946752 ( 549.3 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 4310450176 ( 4110.8 MiB) Virtual address space used
MALLOC:
MALLOC: 382884 Spans in use
MALLOC: 35 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
# ceph daemon osd.101 dump_mempools
{
    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_alloc": {
                "items": 4691828,
                "bytes": 37534624
            },
            "bluestore_cache_data": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_cache_onode": {
                "items": 51,
                "bytes": 28968
            },
            "bluestore_cache_other": {
                "items": 5761276,
                "bytes": 46292425
            },
            "bluestore_fsck": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_txc": {
                "items": 67,
                "bytes": 46096
            },
            "bluestore_writing_deferred": {
                "items": 208,
                "bytes": 26037057
            },
            "bluestore_writing": {
                "items": 52,
                "bytes": 6789398
            },
            "bluefs": {
                "items": 9478,
                "bytes": 183720
            },
            "buffer_anon": {
                "items": 291450,
                "bytes": 28093473
            },
            "buffer_meta": {
                "items": 546,
                "bytes": 34944
            },
            "osd": {
                "items": 98,
                "bytes": 1139152
            },
            "osd_mapbl": {
                "items": 78,
                "bytes": 8204276
            },
            "osd_pglog": {
                "items": 341944,
                "bytes": 120607952
            },
            "osdmap": {
                "items": 10687217,
                "bytes": 186830528
            },
            "osdmap_mapping": {
                "items": 0,
                "bytes": 0
            },
            "pgmap": {
                "items": 0,
                "bytes": 0
            },
            "mds_co": {
                "items": 0,
                "bytes": 0
            },
            "unittest_1": {
                "items": 0,
                "bytes": 0
            },
            "unittest_2": {
                "items": 0,
                "bytes": 0
            }
        },
        "total": {
            "items": 21784293,
            "bytes": 461822613
        }
    }
}
Right after restart + health_ok:
osd.101 tcmalloc heap stats:------------------------------------------------
MALLOC: 1173996280 ( 1119.6 MiB) Bytes in use by application
MALLOC: + 3727360 ( 3.6 MiB) Bytes in page heap freelist
MALLOC: + 25493688 ( 24.3 MiB) Bytes in central cache freelist
MALLOC: + 17101824 ( 16.3 MiB) Bytes in transfer cache freelist
MALLOC: + 20301904 ( 19.4 MiB) Bytes in thread cache freelists
MALLOC: + 5242880 ( 5.0 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 1245863936 ( 1188.1 MiB) Actual memory used (physical + swap)
MALLOC: + 20488192 ( 19.5 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 1266352128 ( 1207.7 MiB) Virtual address space used
MALLOC:
MALLOC: 54160 Spans in use
MALLOC: 33 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
Am I looking at a memory leak here or are these heap stats expected?
I don't mind the swap usage, it doesn't have an impact. I'm just wondering if I need to restart the OSDs regularly. The "leakage" above occurred within only 2 months.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi,
I've got a problem on an Octopus (15.2.3, Debian packages) install: the bucket's
S3 index shows a file:
s3cmd ls s3://upvid/255/38355 --recursive
2020-07-27 17:48 50584342
s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4
radosgw-admin bi list also shows it
{
    "type": "plain",
    "idx": "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
    "entry": {
        "name": "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4",
        "instance": "",
        "ver": {
            "pool": 11,
            "epoch": 853842
        },
        "locator": "",
        "exists": "true",
        "meta": {
            "category": 1,
            "size": 50584342,
            "mtime": "2020-07-27T17:48:27.203008Z",
            "etag": "2b31cc8ce8b1fb92a5f65034f2d12581-7",
            "storage_class": "",
            "owner": "filmweb-app",
            "owner_display_name": "filmweb app user",
            "content_type": "",
            "accounted_size": 50584342,
            "user_data": "",
            "appendable": "false"
        },
        "tag": "_3ubjaztglHXfZr05wZCFCPzebQf-ZFP",
        "flags": 0,
        "pending_map": [],
        "versioned_epoch": 0
    }
},
but trying to download it via curl (I've set permissions to public) only gets me
<?xml version="1.0"
encoding="UTF-8"?><Error><Code>NoSuchKey</Code><BucketName>upvid</BucketName><RequestId>tx0000000000000000e716d-005f1f14cb-e478a-pl-war1</RequestId><HostId>e478a-pl-war1-pl</HostId></Error>
(actually nonexistent files give Access Denied in the same context)
same with other tools:
$ s3cmd get s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4 /tmp
download: 's3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' -> '/tmp/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' [1 of 1]
ERROR: S3 error: 404 (NoSuchKey)
Cluster health is OK.
Any ideas what is happening here?
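In case it helps to narrow things down, the next checks I plan to run are roughly these (the data pool name below assumes the default naming):

radosgw-admin object stat --bucket=upvid --object=255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4
rados -p default.rgw.buckets.data ls | grep 255/38355   # are the head/tail RADOS objects still there?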
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
NOC: [+48] 22 380 10 20
E: admin(a)efigence.com
On 9/25/2020 6:07 PM, Saber(a)PlanetHoster.info wrote:
> Hi Igor,
>
> The only thing abnormal about this osdstore is that it was created by
> Mimic 13.2.8 and I can see that the OSDs size of this osdstore are not
> the same as the others in the cluster (while they should be exactly
> the same size).
>
> Can it be https://tracker.ceph.com/issues/39151 ?
Hmm, maybe... Did you change the H/W at some point for this OSD's node, as
happened in the ticket?
And it's still unclear to me if the issue is reproducible for you.
Could you please also run fsck (at first) and then repair for this OSD
and collect log(s).
Thanks,
Igor
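(For reference, the fsck and repair runs mentioned above would look roughly like this, with the OSD path adjusted as needed:)

CEPH_ARGS="--log-file fsck.log --debug-bluestore 20" ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-NN
CEPH_ARGS="--log-file repair.log --debug-bluestore 20" ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-NN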
>
> Thanks!
> Saber
> CTO @PlanetHoster
>
>> On Sep 25, 2020, at 5:46 AM, Igor Fedotov <ifedotov(a)suse.de
>> <mailto:ifedotov@suse.de>> wrote:
>>
>> Hi Saber,
>>
>> I don't think this is related. New assertion happens along the write
>> path while the original one occurred on allocator shutdown.
>>
>>
>> Unfortunately there are not much information to troubleshoot this...
>> Are you able to reproduce the case?
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 9/25/2020 4:21 AM, Saber(a)PlanetHoster.info wrote:
>>> Hi Igor,
>>>
>>> We had an osd crash a week after running Nautilus. I have attached
>>> the logs, is it related to the same bug?
>>>
>>>
>>>
>>>
>>> Thanks,
>>> Saber
>>> CTO @PlanetHoster
>>>
>>>> On Sep 14, 2020, at 10:22 AM, Igor Fedotov <ifedotov(a)suse.de
>>>> <mailto:ifedotov@suse.de>> wrote:
>>>>
>>>> Thanks!
>>>>
>>>> Now got the root cause. The fix is on its way...
>>>>
>>>> Meanwhile you might want to try to workaround the issue via setting
>>>> "bluestore_hybrid_alloc_mem_cap" to 0 or using different allocator,
>>>> e.g. avl for bluestore_allocator (and optionally for
>>>> bluefs_allocator too).
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>> Igor.
>>>>
>>>>
>>>>
>>>> On 9/14/2020 5:02 PM, Jean-Philippe Méthot wrote:
>>>>> Alright, here’s the full log file.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Jean-Philippe Méthot
>>>>> Senior Openstack system administrator
>>>>> Administrateur système Openstack sénior
>>>>> PlanetHoster inc.
>>>>> 4414-4416 Louis B Mayer
>>>>> Laval, QC, H7P 0G1, Canada
>>>>> TEL : +1.514.802.1644 - Poste : 2644
>>>>> FAX : +1.514.612.0678
>>>>> CA/US : 1.855.774.4678
>>>>> FR : 01 76 60 41 43
>>>>> UK : 0808 189 0423
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Le 14 sept. 2020 à 06:49, Igor Fedotov <ifedotov(a)suse.de
>>>>>> <mailto:ifedotov@suse.de>> a écrit :
>>>>>>
>>>>>> Well, I can see duplicate admin socket command
>>>>>> registration/de-registration (and the second de-registration
>>>>>> asserts) but don't understand how this could happen.
>>>>>>
>>>>>> Would you share the full log, please?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Igor
>>>>>>
>>>>>> On 9/11/2020 7:26 PM, Jean-Philippe Méthot wrote:
>>>>>>> Here’s the out file, as requested.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Jean-Philippe Méthot
>>>>>>> Senior Openstack system administrator
>>>>>>> Administrateur système Openstack sénior
>>>>>>> PlanetHoster inc.
>>>>>>> 4414-4416 Louis B Mayer
>>>>>>> Laval, QC, H7P 0G1, Canada
>>>>>>> TEL : +1.514.802.1644 - Poste : 2644
>>>>>>> FAX : +1.514.612.0678
>>>>>>> CA/US : 1.855.774.4678
>>>>>>> FR : 01 76 60 41 43
>>>>>>> UK : 0808 189 0423
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Le 11 sept. 2020 à 10:38, Igor Fedotov <ifedotov(a)suse.de
>>>>>>>> <mailto:ifedotov@suse.de>> a écrit :
>>>>>>>>
>>>>>>>> Could you please run:
>>>>>>>>
>>>>>>>> CEPH_ARGS="--log-file log --debug-asok 5" ceph-bluestore-tool
>>>>>>>> repair --path <...> ; cat log | grep asok > out
>>>>>>>>
>>>>>>>> and share 'out' file.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Igor
>>>>>>>>
>>>>>>>> On 9/11/2020 5:15 PM, Jean-Philippe Méthot wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We’re upgrading our cluster OSD node per OSD node to Nautilus
>>>>>>>>> from Mimic. From some release notes, it was recommended to run
>>>>>>>>> the following command to fix stats after an upgrade :
>>>>>>>>>
>>>>>>>>> ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0
>>>>>>>>>
>>>>>>>>> However, running that command gives us the following error
>>>>>>>>> message:
>>>>>>>>>
>>>>>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In
>>>>>>>>>> function 'virtual Allocator::SocketHook::~SocketHook()'
>>>>>>>>>> thread 7f1a6467eec0 time 2020-09-10 14:40:25.872353
>>>>>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53: FAILED ceph_assert(r == 0)
>>>>>>>>>> ceph version 14.2.11
>>>>>>>>>> (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int,
>>>>>>>>>> char const*)+0x14a) [0x7f1a5a823025]
>>>>>>>>>> 2: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>>>>>>>> 3: (()+0x3c7a4f) [0x55b33537ca4f]
>>>>>>>>>> 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>>>>>>>> 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>>>>>>>> 6: (BlueStore::_close_db_and_around(bool)+0x2f8)
>>>>>>>>>> [0x55b335274528]
>>>>>>>>>> 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>>>>>>>> [0x55b3352749a1]
>>>>>>>>>> 8: (main()+0x10b3) [0x55b335187493]
>>>>>>>>>> 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>>>>>>>> 10: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>>>>>>>> 2020-09-10 14:40:25.873 7f1a6467eec0 -1
>>>>>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In function 'virtual
>>>>>>>>>> Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0
>>>>>>>>>> time 2020-09-10 14:40:25.872353
>>>>>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53: FAILED ceph_assert(r == 0)
>>>>>>>>>>
>>>>>>>>>> ceph version 14.2.11
>>>>>>>>>> (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int,
>>>>>>>>>> char const*)+0x14a) [0x7f1a5a823025]
>>>>>>>>>> 2: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>>>>>>>> 3: (()+0x3c7a4f) [0x55b33537ca4f]
>>>>>>>>>> 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>>>>>>>> 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>>>>>>>> 6: (BlueStore::_close_db_and_around(bool)+0x2f8)
>>>>>>>>>> [0x55b335274528]
>>>>>>>>>> 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>>>>>>>> [0x55b3352749a1]
>>>>>>>>>> 8: (main()+0x10b3) [0x55b335187493]
>>>>>>>>>> 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>>>>>>>> 10: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>>>>>>>> *** Caught signal (Aborted) **
>>>>>>>>>> in thread 7f1a6467eec0 thread_name:ceph-bluestore-
>>>>>>>>>> ceph version 14.2.11
>>>>>>>>>> (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
>>>>>>>>>> 1: (()+0xf630) [0x7f1a58cf0630]
>>>>>>>>>> 2: (gsignal()+0x37) [0x7f1a574be387]
>>>>>>>>>> 3: (abort()+0x148) [0x7f1a574bfa78]
>>>>>>>>>> 4: (ceph::__ceph_assert_fail(char const*, char const*, int,
>>>>>>>>>> char const*)+0x199) [0x7f1a5a823074]
>>>>>>>>>> 5: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>>>>>>>> 6: (()+0x3c7a4f) [0x55b33537ca4f]
>>>>>>>>>> 7: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>>>>>>>> 8: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>>>>>>>> 9: (BlueStore::_close_db_and_around(bool)+0x2f8)
>>>>>>>>>> [0x55b335274528]
>>>>>>>>>> 10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>>>>>>>> [0x55b3352749a1]
>>>>>>>>>> 11: (main()+0x10b3) [0x55b335187493]
>>>>>>>>>> 12: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>>>>>>>> 13: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>>>>>>>> 2020-09-10 14:40:25.874 7f1a6467eec0 -1 *** Caught signal
>>>>>>>>>> (Aborted) **
>>>>>>>>>> in thread 7f1a6467eec0 thread_name:ceph-bluestore-
>>>>>>>>>>
>>>>>>>>>> ceph version 14.2.11
>>>>>>>>>> (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
>>>>>>>>>> 1: (()+0xf630) [0x7f1a58cf0630]
>>>>>>>>>> 2: (gsignal()+0x37) [0x7f1a574be387]
>>>>>>>>>> 3: (abort()+0x148) [0x7f1a574bfa78]
>>>>>>>>>> 4: (ceph::__ceph_assert_fail(char const*, char const*, int,
>>>>>>>>>> char const*)+0x199) [0x7f1a5a823074]
>>>>>>>>>> 5: (()+0x25c1ed) [0x7f1a5a8231ed]
>>>>>>>>>> 6: (()+0x3c7a4f) [0x55b33537ca4f]
>>>>>>>>>> 7: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
>>>>>>>>>> 8: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
>>>>>>>>>> 9: (BlueStore::_close_db_and_around(bool)+0x2f8)
>>>>>>>>>> [0x55b335274528]
>>>>>>>>>> 10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1)
>>>>>>>>>> [0x55b3352749a1]
>>>>>>>>>> 11: (main()+0x10b3) [0x55b335187493]
>>>>>>>>>> 12: (__libc_start_main()+0xf5) [0x7f1a574aa555]
>>>>>>>>>> 13: (()+0x1f9b5f) [0x55b3351aeb5f]
>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS
>>>>>>>>>> <executable>` is needed to interpret this.
>>>>>>>>>
>>>>>>>>> What could be the source of this error? I haven’t found much
>>>>>>>>> of anything about it online.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jean-Philippe Méthot
>>>>>>>>> Senior Openstack system administrator
>>>>>>>>> Administrateur système Openstack sénior
>>>>>>>>> PlanetHoster inc.
>>>>>>>>> 4414-4416 Louis B Mayer
>>>>>>>>> Laval, QC, H7P 0G1, Canada
>>>>>>>>> TEL : +1.514.802.1644 - Poste : 2644
>>>>>>>>> FAX : +1.514.612.0678
>>>>>>>>> CA/US : 1.855.774.4678
>>>>>>>>> FR : 01 76 60 41 43
>>>>>>>>> UK : 0808 189 0423
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>>>>>>>> <mailto:ceph-users@ceph.io>
>>>>>>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>>>>>>>> <mailto:ceph-users-leave@ceph.io>
>>>>>>>
>>>>>
>>>
>
Hi guys,
When I updated the pg_num of a pool, I found it did not work (no
rebalancing happened). Does anyone know the reason? Pool info:
pool 21 'openstack-volumes-rs' replicated size 3 min_size 2 crush_rule
21 object_hash rjenkins pg_num 1024 pgp_num 512 pgp_num_target 1024
autoscale_mode warn last_change 85103 lfor 82044/82044/82044 flags
hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
removed_snaps [1~1e6,1e8~300,4e9~18,502~3f,542~11,554~1a,56f~1d7]
pool 22 'openstack-vms-rs' replicated size 3 min_size 2 crush_rule 22
object_hash rjenkins pg_num 512 pgp_num 512 pg_num_target 256
pgp_num_target 256 autoscale_mode warn last_change 84769 lfor 0/0/55294
flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
The pgp_num_target is set, but pgp_num has not changed.
I had scaled out new OSDs and backfilling was still in progress before
setting the value; could that be the reason?
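My current guess is that the gradual pgp_num increase is being throttled while the backfill from the new OSDs is still in progress; I plan to check it with something like:

ceph osd pool get openstack-volumes-rs pgp_num    # the currently effective pgp_num
ceph config get mgr target_max_misplaced_ratio    # throttle for gradual pg_num/pgp_num changes
ceph status | grep misplaced                      # how much data is currently misplaced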