Hi there,
Did anyone get the mgr diskprediction_local plugin working on CentOS?
When I enable the plugin with v14.2.3 I get:
HEALTH_ERR 2 mgr modules have failed
MGR_MODULE_ERROR 2 mgr modules have failed
Module 'devicehealth' has failed: Failed to import _strptime
because the import lock is held by another thread.
Module 'diskprediction_local' has failed: No module named
sklearn.svm.classes
When the package is installed it pulls in several dependencies, but
apparently these are not enough?
Installing:
 ceph-mgr-diskprediction-local  noarch  2:14.2.3-0.el7       ceph-noarch  1.1 M
Installing for dependencies:
 atlas                          x86_64  3.10.1-12.el7        base         4.5 M
 blas                           x86_64  3.4.2-8.el7          base         399 k
 lapack                         x86_64  3.4.2-8.el7          base         5.4 M
 libgfortran                    x86_64  4.8.5-39.el7         cr           300 k
 libquadmath                    x86_64  4.8.5-39.el7         cr           190 k
 numpy                          x86_64  1:1.7.1-13.el7       base         2.8 M
 numpy-f2py                     x86_64  1:1.7.1-13.el7       base         206 k
 python-devel                   x86_64  2.7.5-86.el7         cr           398 k
 python-nose                    noarch  1.3.7-1.el7          base         276 k
 python-rpm-macros              noarch  3-32.el7             cr           8.8 k
 python-srpm-macros             noarch  3-32.el7             cr           8.4 k
 python2-rpm-macros             noarch  3-32.el7             cr           7.7 k
 scipy                          x86_64  0.12.1-6.el7         base         9.3 M
 suitesparse                    x86_64  4.0.2-10.el7         base         928 k
 tbb                            x86_64  4.1-9.20130314.el7   base         124 k
I've seen https://tracker.ceph.com/issues/38088 but didn't find the
sklearn package in any standard repo.
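A quick way to see which of the plugin's imports are actually satisfiable on the box is a small check script (a sketch; the module names are taken from the error messages above, not from the plugin's source):

```python
import importlib

def can_import(name):
    """Return True if `name` imports cleanly, False on ImportError."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False

# Modules the failing mgr modules complained about, plus the deps
# the package pulled in:
for mod in ("sklearn.svm", "numpy", "scipy"):
    print(mod, "ok" if can_import(mod) else "MISSING")
```

If sklearn comes back MISSING, it may have to come from pip or EPEL rather than the base CentOS repos.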
Thanks!
Dan
I have a CephFS 13.2.6 setup composed of 3 OSD nodes + 1 MDS + 1 monitor. All the nodes run CentOS Linux release 7.6.1810 (Core).
When writing a large batch of files (500 MB) to a directory, the "ls" command on that directory is very slow from an external client (if I list the directory from the same client node that is writing the files, the operation returns immediately):
[cephuser@stor2demo ~]$ time ll -h /mnt/cephfs/dir1/dir2
real 2m40.246s
user 0m0.002s
sys 0m0.003s
The Ceph logs show this information:
[cephuser@stor1demo ~]$ ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; 1/10269 objects misplaced (0.010%)
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
mdsstor1demo(mds.0): 5 slow metadata IOs are blocked > 30 secs, oldest blocked for 34 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
mdsstor1demo(mds.0): 2 slow requests are blocked > 30 secs
OBJECT_MISPLACED 1/10269 objects misplaced (0.010%)
Why this behaviour?
Thanks.
On Wed, Sep 11, 2019 at 11:17:47AM +0100, Matthew Vernon wrote:
>Hi,
>
>We keep finding part-made OSDs (they appear not attached to any host,
>and down and out; but still counting towards the number of OSDs); we
>never saw this with ceph-disk. On investigation, this is because
>ceph-volume lvm create makes the OSD (ID and auth at least) too early in
>the process and is then unable to roll-back cleanly (because the
>bootstrap-osd credential isn't allowed to remove OSDs).
>
>As an example (very truncated):
>
>Running command: /usr/bin/ceph --cluster ceph --name
>client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
>-i - osd new 20cea174-4c1b-4330-ad33-505a03156c33
>Running command: vgcreate --force --yes
>ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
> stderr: Device /dev/sdbh not found (or ignored by filtering).
> Unable to add physical volume '/dev/sdbh' to volume group
>'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
>--> Was unable to complete a new OSD, will rollback changes
>--> OSD will be fully purged from the cluster, because the ID was generated
>Running command: ceph osd purge osd.828 --yes-i-really-mean-it
> stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find
>a keyring on
>/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
>(2) No such file or directory
> stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient:
>authenticate NOTE: no keyring found; disabled cephx authentication
>2019-09-10 15:07:53.397334 7fbca2caf700 0 librados: client.admin
>authentication error (95) Operation not supported
>
>This is annoying to have to clear up, and it seems to me could be
>avoided by either:
>
>i) ceph-volume should (attempt to) set up the LVM volumes &c before
>making the new OSD id
>or
>ii) allow the bootstrap-osd credential to purge OSDs
>
>i) seems like clearly the better answer...?
Agreed. Would you mind opening a bug report at
https://tracker.ceph.com/projects/ceph-volume?
I have found other situations where the roll-back does not work as it
should, though with less impact than in this case.
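The ordering in option (i) can be sketched abstractly; the callables below are hypothetical stand-ins for illustration, not ceph-volume's actual internals:

```python
def create_osd(prepare_lvm, allocate_osd_id):
    """Sketch of option (i): do the failure-prone device work first, so a
    failure never leaves a registered-but-unusable OSD id behind."""
    # Step 1: vgcreate/lvcreate etc. -- may fail on a bad device, but
    # nothing cluster-side needs rolling back yet.
    prepare_lvm()
    # Step 2: only now touch the cluster (`osd new`, auth entries).
    return allocate_osd_id()

# With a device that fails (as /dev/sdbh did above), no OSD id is ever
# allocated, so there is nothing for bootstrap-osd to purge:
allocated = []

def bad_lvm():
    raise RuntimeError("Device not found (or ignored by filtering)")

def alloc():
    allocated.append(828)
    return 828

try:
    create_osd(bad_lvm, alloc)
except RuntimeError:
    pass
print(allocated)  # stays empty
```

The point of the ordering is exactly what the failed run above shows: once `osd new` has run, rollback needs permissions the bootstrap-osd credential does not have.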
>
>Regards,
>
>Matthew
>
>_______________________________________________
>ceph-users mailing list
>ceph-users(a)lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Jan Fajerski
Senior Software Engineer Enterprise Storage
SUSE Software Solutions Germany GmbH
(HRB 247165, AG München)
Geschäftsführer: Felix Imendörffer
Hi,
sorry to disturb you with list admin stuff. I haven't received any new
ceph-users@ mail since August 28th (I was subscribed to the daily
digest). lists.ceph.com is now defunct and mails bounce. I hope that is
only temporary, because most of my annotated Ceph bookmarks point to
lists.ceph.com...
From the website it looks like the list was moved to ceph.io. I don't
remember reading any announcement of this move (might be my fault).
I tried re-subscribing at lists.ceph.io, but it says I am already
subscribed. I tried logging in to check my preferences, but my old
password from lists.ceph.com no longer works. I created a new account,
logged in, and to me the subscription settings look OK.
Can you help me here? Maybe it is just the digests that do not work?
Please reply to me directly, as I am currently not receiving any list
messages.
Thank you
Matthias Ferdinand
Hi,
do you use hard links in your workload? The 'no space left on device'
message may also refer to too many stray files. Strays are either files
queued for deletion (the purge queue) or files that have been deleted
while hard links still point to the same content. Since CephFS does not
use an indirection layer between inodes and data, and the data chunks
are named after the inode id, removing the original file leaves stray
entries behind, because CephFS cannot rename the underlying RADOS
objects.
There are 10 hidden directories for stray files, and given a maximum
size of 100,000 entries each, you can store only up to 1 million stray
entries in total. I don't know exactly how entries are distributed among
the 10 directories, so the limit may be reached earlier for a single
stray directory. The performance counters contain some values for
strays, so they are easy to check. The daemonperf output also shows the
current value.
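Checking this is mostly arithmetic on one counter. A sketch (the `mds_cache`/`num_strays` keys are what recent MDS perf dumps use, but verify against your version):

```python
def stray_usage(perf_dump, per_dir_limit=100_000, num_stray_dirs=10):
    """Read the stray counter out of an MDS perf dump and compare it to
    the theoretical capacity (10 stray dirs x per-dir entry limit)."""
    strays = perf_dump.get("mds_cache", {}).get("num_strays", 0)
    capacity = per_dir_limit * num_stray_dirs
    return strays, capacity, 100.0 * strays / capacity

# Example with made-up numbers; real input is the JSON printed by
# `ceph daemon mds.<name> perf dump`:
print(stray_usage({"mds_cache": {"num_strays": 250_000}}))
# -> (250000, 1000000, 25.0)
```

If the percentage is anywhere near 100, strays are a plausible source of the ENOSPC errors.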
The upper limit on directory entries was solved by directory
fragmentation, so you should check whether fragmentation is enabled on
your filesystem. You can also try to increase the upper directory entry
limit, but this might lead to other problems (too large RADOS omap
objects...).
Regards,
Burkhard
--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810
Picking up on what you wrote in your message of 29/08/2019...
> Another possibilty is to convert the MBR to GPT (sgdisk --mbrtogpt) and
> give the partition its UID (also sgdisk). Then it could be linked by
> its uuid.
and, in another email:
> And I forgot that you can also re-create the journal by itself. I can't
> recall the command ATM though.
Ahem, I stated that the journal disks are also the OS disks, and I'm
using old servers, so I think that converting to GPT would leave the
node unbootable...
But is the 'code' that identifies (and changes permissions on) the
journal device PVE-specific, or generic Ceph? I suppose the latter...
Also, I've done:
adduser ceph disk
and the partition devices are '660 root:disk': why do I still get
'permission denied'?
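One thing worth checking (an assumption on my part, not a diagnosis): group membership only applies to processes started after the change, so a daemon already running as user ceph keeps its old supplementary groups until it is restarted. A quick membership check:

```python
import grp
import pwd

def user_in_group(user, group):
    """True if `user` has `group` as its primary or supplementary group."""
    g = grp.getgrnam(group)
    return user in g.gr_mem or pwd.getpwnam(user).pw_gid == g.gr_gid

try:
    print(user_in_group("ceph", "disk"))
except KeyError:
    print("user or group not present on this machine")
```

Even if this prints True, a ceph-osd process started before the adduser call will still see 'permission denied' until restarted.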
> Or if you are not in need of filestore OSDs, re-create them as bluestore
> ones. AFAICS, Ceph has laid more focus on bluestore and it might be
> better to do a conversion sooner than later. (my opinion)
Not for now; bluestore migration needs a bit more
time/study/knowledge...
--
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797
Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(tax code 00307430132, category ONLUS or RICERCA SANITARIA)
I have 2 OpenStack environments that I want to integrate with an existing
Ceph cluster. I know technically it can be done, but has anyone tried this?
- Vlad
All;
We're trying to add a RADOSGW instance to our new production cluster, and it's not showing up in the dashboard or in ceph -s.
The cluster is running 14.2.2, and the RADOSGW node got 14.2.3.
systemctl status ceph-radosgw@rgw.s700037 returns: active (running).
ss -ntlp does NOT show port 80.
Here's the ceph.conf on the system:
[global]
fsid = effc5134-e0cc-4628-a079-d67b60071f90
mon initial members = s700034,s700035,s700036
mon host = [v1:10.0.80.10:6789/0,v2:10.0.80.10:3300/0],[v1:10.0.80.11:6789/0,v2:10.0.80.11:3300/0],[v1:10.0.80.12:6789/0,v2:10.0.80.12:3300/0]
public network = 10.0.80.0/24
cluster network = 10.0.88.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 8
osd pool default pgp num = 8
[client.rgw.s700037]
host = s700037.performair.local
rgw frontends = "civetweb port=80"
rgw dns name = radosgw.performair.local
Any thoughts on what I'm missing?
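For what it's worth, a quick way to re-check the `ss -ntlp` observation from another machine (a sketch; the hostname and port are taken from the config above):

```python
import socket

def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable
        return False

# Hostname and port from the [client.rgw.s700037] section above:
print(port_open("s700037.performair.local", 80))
```

If nothing is listening, the rgw log under /var/log/ceph/ usually shows why the civetweb frontend failed to start (e.g. permission to bind, or a pool/auth problem earlier in startup).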
I'm also seeing these in the manager's logs:
2019-09-10 15:49:43.946 7efe6eee1700 0 mgr[dashboard] [10/Sep/2019:15:49:43] ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", line 1837, in start
self.tick()
File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", line 1902, in tick
s, ssl_env = self.ssl_adapter.wrap(s)
File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/ssl_builtin.py", line 52, in wrap
keyfile=self.private_key, ssl_version=ssl.PROTOCOL_SSLv23)
File "/usr/lib64/python2.7/ssl.py", line 934, in wrap_socket
ciphers=ciphers)
File "/usr/lib64/python2.7/ssl.py", line 609, in __init__
self.do_handshake()
File "/usr/lib64/python2.7/ssl.py", line 831, in do_handshake
self._sslobj.do_handshake()
SSLError: [SSL: SSLV3_ALERT_CERTIFICATE_UNKNOWN] sslv3 alert certificate unknown (_ssl.c:618)
Thoughts on this?
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
DHilsbos(a)PerformAir.com
www.PerformAir.com