I created a cluster on Nautilus 14.2.0, then upgraded to 14.2.1, and finally to 14.2.3.
Now I am seeing this warning that I thought should only appear if the cluster was created pre-Nautilus.
Legacy BlueStore stats reporting detected on XX OSD
I can't seem to find any information about this happening on a cluster that was always on 14.2.
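If it helps anyone answering: as far as I can tell, the usual way to deal with the warning is to either convert each OSD to the new per-pool stats format or mute it, roughly like this (untested on my side, <id> is a placeholder), but that doesn't explain why a cluster that was never on pre-Nautilus is affected:
systemctl stop ceph-osd@<id>
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>   # converts the OSD to per-pool stats
systemctl start ceph-osd@<id>
ceph config set osd bluestore_warn_on_legacy_statfs false       # or just silence the warning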
Gavin
Hi,
I have successfully configured the Ceph dashboard following this
documentation: <https://docs.ceph.com/docs/master/mgr/dashboard>.
According to the documentation you can configure a URL prefix with this
command:
ceph config set mgr mgr/dashboard/url_prefix $PREFIX
However, when I try to access the dashboard at the URL
http://$IP:$PORT/$PREFIX/ I get an error:
{"status": "404 Not Found", "version": "8.9.1", "detail": "The path
'/dashboard' was not found.", "traceback": "Traceback (most recent call
last):\n File
\"/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py\", line 670,
in respond\n response.body = self.handler()\n File
\"/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py\", line 220,
in __call__\n self.body = self.oldhandler(*args, **kwargs)\n File
\"/usr/lib/python2.7/dist-packages/cherrypy/_cperror.py\", line 415, in
__call__\n raise self\nNotFound: (404, \"The path '/dashboard' was
not found.\")\n"}
A workaround for this error is to use this URL:
http://$IP:$PORT/#/$PREFIX/
Using the # ensures that the redirect to the active MGR service works.
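In case it is relevant: the setting can be double-checked and the dashboard module restarted after changing the prefix, e.g. like this (I am not sure whether a restart is strictly required):
ceph config get mgr mgr/dashboard/url_prefix
ceph mgr module disable dashboard
ceph mgr module enable dashboard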
Regards
Thomas
Hi,
I upgraded to Nautilus from Mimic a while ago and enabled the pg_autoscaler.
When pg_autoscaler was activated I got a HEALTH_WARN regarding:
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
Pools ['cephfs_data_reduced', 'cephfs_data', 'cephfs_metadata'] overcommit available storage by 1.460x due to target_size_bytes 0 on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
Pools ['cephfs_data_reduced', 'cephfs_data', 'cephfs_metadata'] overcommit available storage by 1.460x due to target_size_ratio 0.000 on pools []
Both target_size_bytes and target_size_ratio on all the pools are set to 0, so I started to wonder why this error message appears.
My autoscale-status looks like this:
POOL                 SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
cephfs_metadata      16708M               4.0   34465G        0.0019                1.0   8                   warn
cephfs_data_reduced  15506G               2.0   34465G        0.8998                1.0   375                 warn
cephfs_data          6451G                3.0   34465G        0.5616                1.0   250                 warn
So the combined ratio is 1.4633.
Isn't a combined ratio of 1.0 across all pools equal to full?
I also enabled the Dashboard and saw that the PG Status showed "645% clean" PGs.
This cluster was originally installed with Jewel, so could some legacy setting or similar be causing this?
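For completeness, this is roughly how I checked that nothing is set on the pools:
ceph osd pool autoscale-status
ceph osd pool ls detail   # target_size_bytes / target_size_ratio are 0 or unset on every pool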
When creating an RGW user with this command:
radosgw-admin user create --uid={username} --display-name="{display-name}" [--email={email}]
there are two flags that I can use, --system and --admin. What are these flags for?
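For example (uid and display-name below are made up), I could run either of these, but I don't know what they would actually change:
radosgw-admin user create --uid=dashboard --display-name="Dashboard user" --system
radosgw-admin user create --uid=operator --display-name="Operator user" --admin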
--
------------------------------------------------------------
Wahyu Muqsita Wardana
System Engineer
------------------------------------------------------------
Jl. Ampera Raya Nomor 22, Cilandak Timur
Jakarta Selatan 12560, Indonesia.
T. +62217182008 | M. +62 8227 3185 744
www.bukalapak.com
Hi!
We're looking to keep our RGW pools free of orphan objects, but from the
documentation and the mailing list it is not really clear how it works and
what it will do.
radosgw-admin orphans find --pool= --job-id=
loops over all objects in the cluster looking for leaked objects and adds
them to shards in the rgw log pool.
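For reference, the full job sequence as we understand it from the docs is roughly this (pool and job names below are just examples):
radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphans-scan-1
radosgw-admin orphans list-jobs
radosgw-admin orphans finish --job-id=orphans-scan-1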
In our case, after more than 72 hours of running it seems stuck, using 24 GB of RAM.
The console shows:
7fc6ca719700 0 run(): building index of all bucket indexes
7fc6ca719700 0 run(): building index of all linked objects
7fc6ca719700 0 building linked oids index: 0/64
7fc6ca719700 0 building linked oids index: 1/64
Checking the rgw log pool, it has generated 64 large omap objects.
Does anyone have experience with orphan objects?
We estimate roughly 80-100 TB of orphan objects in our cluster.
Regards
Manuel
Hi there,
Did anyone get the mgr diskprediction-local plugin working on CentOS?
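For reference, I am enabling it the documented way:
ceph mgr module enable diskprediction_local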
When I enable the plugin with v14.2.3 I get:
HEALTH_ERR 2 mgr modules have failed
MGR_MODULE_ERROR 2 mgr modules have failed
Module 'devicehealth' has failed: Failed to import _strptime
because the import lock is held by another thread.
Module 'diskprediction_local' has failed: No module named
sklearn.svm.classes
When the package is installed it pulls in several dependencies, but
apparently these are not enough?
Installing:
 ceph-mgr-diskprediction-local   noarch   2:14.2.3-0.el7       ceph-noarch   1.1 M
Installing for dependencies:
 atlas                           x86_64   3.10.1-12.el7        base          4.5 M
 blas                            x86_64   3.4.2-8.el7          base          399 k
 lapack                          x86_64   3.4.2-8.el7          base          5.4 M
 libgfortran                     x86_64   4.8.5-39.el7         cr            300 k
 libquadmath                     x86_64   4.8.5-39.el7         cr            190 k
 numpy                           x86_64   1:1.7.1-13.el7       base          2.8 M
 numpy-f2py                      x86_64   1:1.7.1-13.el7       base          206 k
 python-devel                    x86_64   2.7.5-86.el7         cr            398 k
 python-nose                     noarch   1.3.7-1.el7          base          276 k
 python-rpm-macros               noarch   3-32.el7             cr            8.8 k
 python-srpm-macros              noarch   3-32.el7             cr            8.4 k
 python2-rpm-macros              noarch   3-32.el7             cr            7.7 k
 scipy                           x86_64   0.12.1-6.el7         base          9.3 M
 suitesparse                     x86_64   4.0.2-10.el7         base          928 k
 tbb                             x86_64   4.1-9.20130314.el7   base          124 k
I've seen https://tracker.ceph.com/issues/38088 but didn't find the
sklearn package in any standard repo.
Thanks!
Dan
I have a CephFS 13.2.6 setup composed of 3 OSD nodes + 1 MDS + 1 monitor. All the nodes are running CentOS Linux release 7.6.1810 (Core).
When writing a lot of files (500MB) to a directory, the "ls" command on that directory is very slow from an external client (if I list the directory from the same client node that is writing the files, the operation returns immediately):
[cephuser@stor2demo ~]$ time ll -h /mnt/cephfs/dir1/dir2
real 2m40.246s
user 0m0.002s
sys 0m0.003s
The Ceph health output shows this information:
[cephuser@stor1demo ~]$ ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; 1/10269 objects misplaced (0.010%)
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
mdsstor1demo(mds.0): 5 slow metadata IOs are blocked > 30 secs, oldest blocked for 34 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
mdsstor1demo(mds.0): 2 slow requests are blocked > 30 secs
OBJECT_MISPLACED 1/10269 objects misplaced (0.010%)
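(I have not dug much further yet, but I assume the blocked requests could be inspected on the MDS host with something like
ceph daemon mds.stor1demo dump_ops_in_flight
if that is useful.)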
Why this behaviour?
Thanks.
On Wed, Sep 11, 2019 at 11:17:47AM +0100, Matthew Vernon wrote:
>Hi,
>
>We keep finding part-made OSDs (they appear not attached to any host,
>and down and out; but still counting towards the number of OSDs); we
>never saw this with ceph-disk. On investigation, this is because
>ceph-volume lvm create makes the OSD (ID and auth at least) too early in
>the process and is then unable to roll-back cleanly (because the
>bootstrap-osd credential isn't allowed to remove OSDs).
>
>As an example (very truncated):
>
>Running command: /usr/bin/ceph --cluster ceph --name
>client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
>-i - osd new 20cea174-4c1b-4330-ad33-505a03156c33
>Running command: vgcreate --force --yes
>ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
> stderr: Device /dev/sdbh not found (or ignored by filtering).
> Unable to add physical volume '/dev/sdbh' to volume group
>'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
>--> Was unable to complete a new OSD, will rollback changes
>--> OSD will be fully purged from the cluster, because the ID was generated
>Running command: ceph osd purge osd.828 --yes-i-really-mean-it
> stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find
>a keyring on
>/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
>(2) No such file or directory
> stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient:
>authenticate NOTE: no keyring found; disabled cephx authentication
>2019-09-10 15:07:53.397334 7fbca2caf700 0 librados: client.admin
>authentication error (95) Operation not supported
>
>This is annoying to have to clear up, and it seems to me could be
>avoided by either:
>
>i) ceph-volume should (attempt to) set up the LVM volumes &c before
>making the new OSD id
>or
>ii) allow the bootstrap-osd credential to purge OSDs
>
>i) seems like clearly the better answer...?
Agreed. Would you mind opening a bug report on
https://tracker.ceph.com/projects/ceph-volume.
I have found other situations where the roll-back is not working as it
should, though not with as much impact as this.
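FWIW, until the ordering is changed, the leftover IDs can at least be cleaned up manually from a node that has the admin keyring, e.g.:
ceph osd purge osd.828 --yes-i-really-mean-it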
>
>Regards,
>
>Matthew
>
--
Jan Fajerski
Senior Software Engineer Enterprise Storage
SUSE Software Solutions Germany GmbH
(HRB 247165, AG München)
Geschäftsführer: Felix Imendörffer
Hi,
sorry to disturb you with list admin stuff. I haven't received any new
ceph-users@ mail since August 28th (I was subscribed to the daily digest).
lists.ceph.com is now defunct, mails are bounced. I hope that is only
temporary, because most of my annotated Ceph bookmarks point to
lists.ceph.com...
From the website it looks like the list was moved to ceph.io. I don't
remember reading any announcement of this move (might be my fault).
I tried re-subscribing at lists.ceph.io, but it says I am already
subscribed. I tried logging in to check my preferences, but my old password
from lists.ceph.com does not work anymore. I created a new account, logged
in, and the subscription settings look OK to me.
Can you help me here? Maybe it is just the digests that do not work?
Please answer to me directly, as I am currently not receiving any list
messages.
Thank you
Matthias Ferdinand
Hi,
do you use hard links in your workload? The 'no space left on device'
message may also refer to too many stray files. Strays are either files
that are queued for deletion (e.g. via the purge queue) or files that have
been deleted but still have hard links pointing to the same content. Since
CephFS does not use an indirection layer between inodes and data, and the
data chunks are named after the inode ID, removing the original file will
leave stray entries because CephFS is not able to rename the underlying
RADOS objects.
There are 10 hidden directories for stray files, and given a maximum size
of 100,000 entries each, you can store only up to 1 million stray entries
in total. I don't know exactly how entries are distributed among the 10
directories, so the limit may be reached earlier for a single stray
directory. The performance counters contain some values for strays, so
they are easy to check. The daemonperf output also shows the current value.
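For example, on the active MDS something like this should print the current stray counters (<name> is the MDS daemon id):
ceph daemon mds.<name> perf dump | grep -i stray
num_strays is the interesting value here.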
The problem of the upper limit on directory entries was solved by
directory fragmentation, so you should check whether fragmentation is
allowed in your filesystem. You can also try to increase the upper
directory entry limit, but this might lead to other problems (too large
RADOS omap objects...).
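If fragmentation turns out to be disabled, or you want to raise the limit, these are roughly the relevant knobs (please double-check the option names for your release before changing anything):
ceph fs set <fsname> allow_dirfrags true                # older releases; newer ones have fragmentation enabled by default
ceph config set mds mds_bal_fragment_size_max 200000    # per-fragment entry limit, default 100000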
Regards,
Burkhard
--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810