Hi there,
Did anyone get the mgr diskprediction_local plugin working on CentOS?
When I enable the plugin with v14.2.3 I get:
HEALTH_ERR 2 mgr modules have failed
MGR_MODULE_ERROR 2 mgr modules have failed
Module 'devicehealth' has failed: Failed to import _strptime
because the import lock is held by another thread.
Module 'diskprediction_local' has failed: No module named
sklearn.svm.classes
When the package is installed it pulls in several dependencies, but
apparently these are not enough?
Installing:
 ceph-mgr-diskprediction-local  noarch  2:14.2.3-0.el7       ceph-noarch  1.1 M
Installing for dependencies:
 atlas                          x86_64  3.10.1-12.el7        base         4.5 M
 blas                           x86_64  3.4.2-8.el7          base         399 k
 lapack                         x86_64  3.4.2-8.el7          base         5.4 M
 libgfortran                    x86_64  4.8.5-39.el7         cr           300 k
 libquadmath                    x86_64  4.8.5-39.el7         cr           190 k
 numpy                          x86_64  1:1.7.1-13.el7       base         2.8 M
 numpy-f2py                     x86_64  1:1.7.1-13.el7       base         206 k
 python-devel                   x86_64  2.7.5-86.el7         cr           398 k
 python-nose                    noarch  1.3.7-1.el7          base         276 k
 python-rpm-macros              noarch  3-32.el7             cr           8.8 k
 python-srpm-macros             noarch  3-32.el7             cr           8.4 k
 python2-rpm-macros             noarch  3-32.el7             cr           7.7 k
 scipy                          x86_64  0.12.1-6.el7         base         9.3 M
 suitesparse                    x86_64  4.0.2-10.el7         base         928 k
 tbb                            x86_64  4.1-9.20130314.el7   base         124 k
I've seen https://tracker.ceph.com/issues/38088 but didn't find the
sklearn package in any standard repo.
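A quick way to see which of the plugin's imports are actually satisfiable on the box is a small check script (a sketch; the module names are taken from the error messages above, not from the plugin's source):

```python
import importlib

def can_import(name):
    """Return True if `name` imports cleanly, False on ImportError."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False

# Modules the failing mgr modules complained about, plus the deps
# the package pulled in:
for mod in ("sklearn.svm", "numpy", "scipy"):
    print(mod, "ok" if can_import(mod) else "MISSING")
```

If sklearn comes back MISSING, it may have to come from pip or EPEL rather than the base CentOS repos.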
Thanks!
Dan
I have a CephFS 13.2.6 setup composed of 3 OSD nodes + 1 MDS + 1 monitor. All the nodes run CentOS Linux release 7.6.1810 (Core).
When writing a large batch of files (500 MB) to a directory, the "ls" command on that directory is very slow from an external client (if I list the directory from the same client node that is writing the files, the operation returns immediately):
[cephuser@stor2demo ~]$ time ll -h /mnt/cephfs/dir1/dir2
real 2m40.246s
user 0m0.002s
sys 0m0.003s
The Ceph logs show this information:
[cephuser@stor1demo ~]$ ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; 1/10269 objects misplaced (0.010%)
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
mdsstor1demo(mds.0): 5 slow metadata IOs are blocked > 30 secs, oldest blocked for 34 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
mdsstor1demo(mds.0): 2 slow requests are blocked > 30 secs
OBJECT_MISPLACED 1/10269 objects misplaced (0.010%)
Why this behaviour?
Thanks.
On Wed, Sep 11, 2019 at 11:17:47AM +0100, Matthew Vernon wrote:
>Hi,
>
>We keep finding part-made OSDs (they appear not attached to any host,
>and down and out; but still counting towards the number of OSDs); we
>never saw this with ceph-disk. On investigation, this is because
>ceph-volume lvm create makes the OSD (ID and auth at least) too early in
>the process and is then unable to roll-back cleanly (because the
>bootstrap-osd credential isn't allowed to remove OSDs).
>
>As an example (very truncated):
>
>Running command: /usr/bin/ceph --cluster ceph --name
>client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
>-i - osd new 20cea174-4c1b-4330-ad33-505a03156c33
>Running command: vgcreate --force --yes
>ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
> stderr: Device /dev/sdbh not found (or ignored by filtering).
> Unable to add physical volume '/dev/sdbh' to volume group
>'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
>--> Was unable to complete a new OSD, will rollback changes
>--> OSD will be fully purged from the cluster, because the ID was generated
>Running command: ceph osd purge osd.828 --yes-i-really-mean-it
> stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find
>a keyring on
>/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
>(2) No such file or directory
> stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient:
>authenticate NOTE: no keyring found; disabled cephx authentication
>2019-09-10 15:07:53.397334 7fbca2caf700 0 librados: client.admin
>authentication error (95) Operation not supported
>
>This is annoying to have to clear up, and it seems to me could be
>avoided by either:
>
>i) ceph-volume should (attempt to) set up the LVM volumes &c before
>making the new OSD id
>or
>ii) allow the bootstrap-osd credential to purge OSDs
>
>i) seems like clearly the better answer...?
Agreed. Would you mind opening a bug report at
https://tracker.ceph.com/projects/ceph-volume?
I have found other situations where the roll-back does not work as it
should, though with less impact than in this case.
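The ordering in option (i) can be sketched abstractly; the callables below are hypothetical stand-ins for illustration, not ceph-volume's actual internals:

```python
def create_osd(prepare_lvm, allocate_osd_id):
    """Sketch of option (i): do the failure-prone device work first, so a
    failure never leaves a registered-but-unusable OSD id behind."""
    # Step 1: vgcreate/lvcreate etc. -- may fail on a bad device, but
    # nothing cluster-side needs rolling back yet.
    prepare_lvm()
    # Step 2: only now touch the cluster (`osd new`, auth entries).
    return allocate_osd_id()

# With a device that fails (as /dev/sdbh did above), no OSD id is ever
# allocated, so there is nothing for bootstrap-osd to purge:
allocated = []

def bad_lvm():
    raise RuntimeError("Device not found (or ignored by filtering)")

def alloc():
    allocated.append(828)
    return 828

try:
    create_osd(bad_lvm, alloc)
except RuntimeError:
    pass
print(allocated)  # stays empty
```

The point of the ordering is exactly what the failed run above shows: once `osd new` has run, rollback needs permissions the bootstrap-osd credential does not have.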
>
>Regards,
>
>Matthew
>
>_______________________________________________
>ceph-users mailing list
>ceph-users(a)lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Jan Fajerski
Senior Software Engineer Enterprise Storage
SUSE Software Solutions Germany GmbH
(HRB 247165, AG München)
Geschäftsführer: Felix Imendörffer
Hi,
sorry to disturb you with list admin stuff. I haven't received any new
ceph-users@ mail since August 28th (I was subscribed to the daily
digest). lists.ceph.com is now defunct and mails bounce. I hope that is
only temporary, because most of my annotated Ceph bookmarks point to
lists.ceph.com...
From the website it looks like the list was moved to ceph.io. I don't
remember reading any announcement of this move (might be my fault).
I tried re-subscribing at lists.ceph.io, but it says I am already
subscribed. I tried logging in to check my preferences, but my old
password from lists.ceph.com no longer works. I created a new account,
logged in, and to me the subscription settings look OK.
Can you help me here? Maybe it is just the digests that do not work?
Please reply to me directly, as I am currently not receiving any list
messages.
Thank you
Matthias Ferdinand
Hi,
do you use hard links in your workload? The 'no space left on device'
message may also refer to too many stray files. Strays are either files
queued for deletion (the purge queue) or files that have been deleted
while hard links still point to the same content. Since CephFS does not
use an indirection layer between inodes and data, and the data chunks
are named after the inode id, removing the original file leaves stray
entries behind, because CephFS cannot rename the underlying RADOS
objects.
There are 10 hidden directories for stray files, and given a maximum
size of 100,000 entries each, you can store only up to 1 million stray
entries in total. I don't know exactly how entries are distributed among
the 10 directories, so the limit may be reached earlier for a single
stray directory. The performance counters contain some values for
strays, so they are easy to check. The daemonperf output also shows the
current value.
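Checking this is mostly arithmetic on one counter. A sketch (the `mds_cache`/`num_strays` keys are what recent MDS perf dumps use, but verify against your version):

```python
def stray_usage(perf_dump, per_dir_limit=100_000, num_stray_dirs=10):
    """Read the stray counter out of an MDS perf dump and compare it to
    the theoretical capacity (10 stray dirs x per-dir entry limit)."""
    strays = perf_dump.get("mds_cache", {}).get("num_strays", 0)
    capacity = per_dir_limit * num_stray_dirs
    return strays, capacity, 100.0 * strays / capacity

# Example with made-up numbers; real input is the JSON printed by
# `ceph daemon mds.<name> perf dump`:
print(stray_usage({"mds_cache": {"num_strays": 250_000}}))
# -> (250000, 1000000, 25.0)
```

If the percentage is anywhere near 100, strays are a plausible source of the ENOSPC errors.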
The upper limit on directory entries was solved by directory
fragmentation, so you should check whether fragmentation is enabled on
your filesystem. You can also try to increase the upper directory entry
limit, but this might lead to other problems (too large RADOS omap
objects...).
Regards,
Burkhard
--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810
Picking up on what you wrote in your message of 29/08/2019...
> Another possibilty is to convert the MBR to GPT (sgdisk --mbrtogpt) and
> give the partition its UID (also sgdisk). Then it could be linked by
> its uuid.
and, in another email:
> And I forgot that you can also re-create the journal by itself. I can't
> recall the command ATM though.
Ahem, I stated that the journal disks are also the OS disks, and I'm
using old servers, so I think that converting to GPT would leave the
node unbootable...
But is the 'code' that identifies (and changes permissions on) the
journal device PVE-specific, or generic Ceph? I suppose the latter...
Also, I've done:
adduser ceph disk
and the partition devices are '660 root:disk': why do I still get
'permission denied'?
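One thing worth checking (an assumption on my part, not a diagnosis): group membership only applies to processes started after the change, so a daemon already running as user ceph keeps its old supplementary groups until it is restarted. A quick membership check:

```python
import grp
import pwd

def user_in_group(user, group):
    """True if `user` has `group` as its primary or supplementary group."""
    g = grp.getgrnam(group)
    return user in g.gr_mem or pwd.getpwnam(user).pw_gid == g.gr_gid

try:
    print(user_in_group("ceph", "disk"))
except KeyError:
    print("user or group not present on this machine")
```

Even if this prints True, a ceph-osd process started before the adduser call will still see 'permission denied' until restarted.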
> Or if you are not in need of filestore OSDs, re-create them as bluestore
> ones. AFAICS, Ceph has laid more focus on bluestore and it might be
> better to do a conversion sooner than later. (my opinion)
Not for now; bluestore migration needs a bit more
time/study/knowledge...
--
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797
Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(tax code 00307430132, category ONLUS or RICERCA SANITARIA)
I have 2 OpenStack environments that I want to integrate with an existing
Ceph cluster. I know technically it can be done, but has anyone tried this?
- Vlad
All;
We're trying to add a RADOSGW instance to our new production cluster, and it's not showing up in the dashboard or in ceph -s.
The cluster is running 14.2.2, and the RADOSGW node got 14.2.3.
systemctl status ceph-radosgw@rgw.s700037 returns: active (running).
ss -ntlp does NOT show port 80.
Here's the ceph.conf on the system:
[global]
fsid = effc5134-e0cc-4628-a079-d67b60071f90
mon initial members = s700034,s700035,s700036
mon host = [v1:10.0.80.10:6789/0,v2:10.0.80.10:3300/0],[v1:10.0.80.11:6789/0,v2:10.0.80.11:3300/0],[v1:10.0.80.12:6789/0,v2:10.0.80.12:3300/0]
public network = 10.0.80.0/24
cluster network = 10.0.88.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 8
osd pool default pgp num = 8
[client.rgw.s700037]
host = s700037.performair.local
rgw frontends = "civetweb port=80"
rgw dns name = radosgw.performair.local
Any thoughts on what I'm missing?
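For what it's worth, a quick way to re-check the `ss -ntlp` observation from another machine (a sketch; the hostname and port are taken from the config above):

```python
import socket

def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable
        return False

# Hostname and port from the [client.rgw.s700037] section above:
print(port_open("s700037.performair.local", 80))
```

If nothing is listening, the rgw log under /var/log/ceph/ usually shows why the civetweb frontend failed to start (e.g. permission to bind, or a pool/auth problem earlier in startup).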
I'm also seeing these in the manager's logs:
2019-09-10 15:49:43.946 7efe6eee1700 0 mgr[dashboard] [10/Sep/2019:15:49:43] ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", line 1837, in start
self.tick()
File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", line 1902, in tick
s, ssl_env = self.ssl_adapter.wrap(s)
File "/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/ssl_builtin.py", line 52, in wrap
keyfile=self.private_key, ssl_version=ssl.PROTOCOL_SSLv23)
File "/usr/lib64/python2.7/ssl.py", line 934, in wrap_socket
ciphers=ciphers)
File "/usr/lib64/python2.7/ssl.py", line 609, in __init__
self.do_handshake()
File "/usr/lib64/python2.7/ssl.py", line 831, in do_handshake
self._sslobj.do_handshake()
SSLError: [SSL: SSLV3_ALERT_CERTIFICATE_UNKNOWN] sslv3 alert certificate unknown (_ssl.c:618)
Thoughts on this?
Thank you,
Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
DHilsbos(a)PerformAir.com
www.PerformAir.com