Hello,
I have a Nautilus (14.2.8) cluster and I'd like to give a user access to a pool via librados.
Here is what I have:
> # ceph osd pool ls detail | grep user1
> pool 5 'user1' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 108 flags hashpspool max_bytes 1099511627776 stripe_width 0 application user1
> # ceph auth get client.user1
> exported keyring for client.user1
> [client.user1]
> key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX==
> caps: [mon] allow r
> caps: [osd] allow rw pool=user1 namespace=user1
On the client
> $ cat ~/ceph.conf
> [global]
> mon host = [v2:10.90.36.16:3300,v1:10.90.36.16:6789],[v2:10.90.36.17:3300,v1:10.90.36.17:6789],[v2:10.90.36.18:3300,v1:10.90.36.18:6789]
> keyring = ~/user1.keyring
> $ cat ~/user1.keyring
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX==
> $ rados -c ~/ceph.conf -p pool ls
> 2020-04-02 12:44:59.900 7fd78aea3700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
> 2020-04-02 12:44:59.900 7fd789ea1700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
> 2020-04-02 12:44:59.900 7fd78a6a2700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
> failed to fetch mon config (--no-mon-config to skip)
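One thing I'm not sure about: whether a bare key is accepted in the keyring file at all, or whether it needs the usual INI-style section plus an explicit client name, along these lines (untested sketch on my side):

    $ cat ~/user1.keyring
    [client.user1]
            key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX==
    $ rados -c ~/ceph.conf --id user1 -p user1 ls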
Is there something I missed?
Thanks for your help.
Best regards,
--
Yoann Moulin
EPFL IC-IT
I already have the time logged, I do not need it a second time.
Mar 31 13:39:59 c01 ceph-mgr: 2020-03-31 13:39:59.518 7f554edc8700 0 log_channel(cluster) log [DBG] : pgmap v672065: 384 pgs: 384 active+clean;
Hi,
I'm trying to understand the "LARGE_OMAP_OBJECTS 1 large omap objects"
warning for our cephfs metadata pool.
It seems that pg 5.26 has a large omap object with > 200k keys
[WRN] : Large omap object found. Object: 5:654134d2:::mds0_openfiles.0:head
PG: 5.4b2c82a6 (5.26) Key count: 286083 Size (bytes): 14043228
I guess this object is related to the open files tracked by the MDS (the
mds0_openfiles.0 object). But what exactly does it tell me? Is the key
count the number of currently open files?
If yes, this does not match the sum of open files over all clients
obtained with lsof (which is less than 1000).
So how can I get rid of this? (Reboot the clients?)
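For what it's worth, I assume the key count can be re-checked directly with something like this (guessing the metadata pool name here):

    # rados -p cephfs_metadata listomapkeys mds0_openfiles.0 | wc -l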
Thanks for your help
Dietmar
--
_________________________________________
D i e t m a r R i e d e r, Mag.Dr.
Innsbruck Medical University
Biocenter - Institute of Bioinformatics
Innrain 80, 6020 Innsbruck
Email: dietmar.rieder(a)i-med.ac.at
Web: http://www.icbi.at
Hi,
I am currently building a 10-node Ceph cluster. Each OSD node has 2x 25 Gbit/s NICs, and I have 2 TOR switches (MLAG is not supported).
enp179s0f0 -> sw1
enp179s0f1 -> sw2
vlan 323 is used for ‘public network’
vlan 324 is used for ‘cluster network’
My desired configuration is to create two bond interfaces in active-backup mode:
bond0
- enp179s0f0.323 (active)
- enp179s0f1.323 (backup)
bond1
- enp179s0f0.324 (backup)
- enp179s0f1.324 (active)
This way, the public network will use switch1, and the cluster network will use switch2, under normal operation.
I am, however, having an issue implementing this configuration in Ubuntu 18.04 with netplan (see configuration at the end of this post).
When I reboot a node with the below netplan configuration, the bond interface is created, but the vlan interfaces are not added to the bond.
I see the following errors in the log:
systemd-networkd[1641]: enp179s0f0.323: Enslaving by 'bond0'
systemd-networkd[1641]: bond0: Enslaving link 'enp179s0f0.323'
systemd-networkd[1641]: enp179s0f1.323: Enslaving by 'bond0'
systemd-networkd[1641]: bond0: Enslaving link 'enp179s0f1.323'
systemd-networkd[1643]: enp179s0f1.323: Could not join netdev: Operation not permitted
systemd-networkd[1643]: enp179s0f1.323: Failed
systemd-networkd[1643]: enp179s0f0.323: Could not join netdev: Operation not permitted
systemd-networkd[1643]: enp179s0f0.323: Failed
If I manually run 'systemctl restart systemd-networkd' after boot has completed, the bond is successfully created with the VLAN interfaces.
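As a stopgap I could probably automate that restart with a oneshot unit along these lines (untested sketch, hypothetical unit name), but I'd rather understand the proper fix:

    # /etc/systemd/system/restart-networkd-once.service
    [Unit]
    Description=Restart systemd-networkd once after boot (bond/VLAN ordering workaround)
    After=systemd-networkd.service

    [Service]
    Type=oneshot
    ExecStart=/bin/systemctl restart systemd-networkd

    [Install]
    WantedBy=multi-user.target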
Does anybody have a similar configuration working specifically with netplan/networkd? Could you please share your configuration?
Netplan config that doesn’t work at boot time:
network:
  version: 2
  renderer: networkd
  ethernets:
    enp179s0f0: {}
    enp179s0f1: {}
  bonds:
    bond0:
      dhcp4: false
      dhcp6: false
      interfaces:
        - enp179s0f0.323
        - enp179s0f1.323
      parameters:
        mode: active-backup
        primary: enp179s0f0.323
        mii-monitor-interval: 1
      addresses: [insert address here]
    bond1:
      dhcp4: false
      dhcp6: false
      interfaces:
        - enp179s0f0.324
        - enp179s0f1.324
      parameters:
        mode: active-backup
        primary: enp179s0f1.324
        mii-monitor-interval: 1
      addresses: [insert address here]
  vlans:
    enp179s0f0.323:
      id: 323
      link: enp179s0f0
    enp179s0f1.323:
      id: 323
      link: enp179s0f1
    enp179s0f0.324:
      id: 324
      link: enp179s0f0
    enp179s0f1.324:
      id: 324
      link: enp179s0f1
Hi everyone,
I'm working on replacing an OSD node with a newer one. The new host has a new hostname and new disks (faster ones, but the same size as the old disks). My plan is:
- Reweight the OSD to zero to spread all existing data to the remaining nodes and keep data available
- Set the noout, norebalance, norecover and nobackfill flags, destroy the OSD, and join the new OSD with the same ID as the old one.
With the above approach the cluster remaps PGs across all nodes, and each object is moved twice before it reaches the new OSD (once for the reweight, once when the new node joins with the same ID); the command sequence is sketched below.
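For reference, the replace-in-place sequence looks roughly like this (a sketch; <id> and /dev/sdX are placeholders):

    ceph osd crush reweight osd.<id> 0          # drain the old OSD first
    ceph osd set noout && ceph osd set norebalance
    ceph osd destroy <id> --yes-i-really-mean-it
    # on the new host, reuse the old OSD ID:
    ceph-volume lvm create --osd-id <id> --data /dev/sdX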
I also tried the other way, only setting the flags and destroying the OSD, but the result is still the same (degraded objects from the destroyed OSD and misplaced objects after the new OSD joins).
Is there any way to replace an OSD node directly without remapping PGs across the whole cluster?
Many thanks!
Nghia.
Hi all,
When using the sendfile() function to write data to cephfs, the data doesn't end up being written.
From the client that writes the file it looks correct at first, but from all other ceph clients the size is 0 bytes. After re-mounting the filesystem, the data is lost.
I didn't see any errors; the data just doesn't get written, as if it were only cached in the cephfs client.
Writing just one extra byte at the end of the file (without sendfile) seems to trigger the actual write of all the data.
Could someone confirm whether they are also seeing this issue? I'm on ceph 13.2.8, using the kernel module for mounting on CentOS 7.
I've used this sendfile-example for the example below:
https://github.com/pijewski/sendfile-example/blob/master/sendfile.c
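The core of that example boils down to roughly this (my own minimal sketch, error handling omitted):

    /* copy src (argv[1]) to dst (argv[2]) via sendfile(2) */
    #include <fcntl.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int in  = open(argv[1], O_RDONLY);
        int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        struct stat st;
        off_t off = 0;

        fstat(in, &st);
        sendfile(out, in, &off, st.st_size); /* returns bytes copied */

        close(in);
        close(out); /* note: no fsync/flush before close */
        return 0;
    }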
Using a small 27 byte source file.
# ls -lh examples/
-rw-r--r-- 1 root c3-staff 27 Mar 24 18:04 src
# ./sendfile examples/src examples/dst 27
# ls -lh examples/
------x--- 1 root c3-staff 27 Mar 24 18:12 dst
-rw-r--r-- 1 root c3-staff 27 Mar 24 18:04 src
But the directory size still shows 27 bytes:
# ls -lhd examples
drwxr-sr-x 1 root c3-staff 27 Mar 24 18:15 examples
and on all other cephfs clients, the file is empty:
# ls -lh examples/
------x--- 1 root c3-staff 0 Mar 24 18:12 dst
-rw-r--r-- 1 root c3-staff 27 Mar 24 18:04 src
Is this a bug in cephfs, or should I not expect sendfile to work (as it is not POSIX compliant)? There are no errors reported from what I can see, and it is 100% reproducible.
Best regards, Mikael
Doh, I hope so!
On Wed, Apr 1, 2020 at 5:35 PM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
>
> April fools day!!!!!! :)
>
>
> -----Original Message-----
> Sent: 01 April 2020 17:28
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] [Octopus] Beware the on-disk conversion
>
> Hi,
>
> As the upgrade documentation says:
> > Note that the first time each OSD starts, it will do a format
> > conversion to improve the accounting for omap data. This may take a
> > few minutes to as much as a few hours (for an HDD with lots of omap
> > data). You can disable this automatic conversion with:
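> (I believe the command elided here is 'ceph config set osd
> bluestore_fsck_quick_fix_on_mount false', quoting the release notes
> from memory.)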
>
> What the documentation does not say is that this process takes a lot of
> memory.
>
> I am upgrading a rusty cluster from Nautilus; you can see the RAM
> consumption in the attachment.
>
> First, we have a 3 TB OSD conversion: it took ~15 min and 19 GB of memory.
>
> Then, we have a larger 6 TB OSD conversion: it took more than 2 hours and
> 35 GB of memory.
>
> Finally, we have the largest 10 TB OSD: only 1h15, but 52 GB of memory.
Dear all,
I have two observations regarding bluestore compression config:
1) ceph.conf settings seem to be ignored.
2) The SSD default values seem not to save space using compression.
To 1) We are running a mimic 13.2.8 cluster with OSDs deployed under mimic 13.2.2. Back then the interpretation of compression parameters was messed up; this has been fixed along the way from 13.2.2 to 13.2.8. To get compression to work properly under 13.2.2, I needed to include these settings in ceph.conf:
[osd]
bluestore compression mode = aggressive
bluestore compression min blob size hdd = 262144
and then also enable compression on all pools that should use it. These settings are still present in ceph.conf, but they seem to be ignored when the config database is populated on mon startup, or when querying config parameters:
# ceph config get osd.16 bluestore_compression_min_blob_size_hdd
131072
However:
# ceph tell osd.16 config get bluestore_compression_min_blob_size_hdd
262144
and:
# ceph config show osd.16
NAME VALUE SOURCE OVERRIDES IGNORES
bluestore_compression_min_blob_size_hdd 262144 file
This is really confusing. Is this intended? Which values will be used when deploying new OSDs?
In general, it would really be helpful if one could query daemon/parameter groups as in " ceph config get osd bluestore_compression_min_blob_size_hdd" to get a list right away.
To 2) In a long-long-ago discussion about how compression works, I was told that a blob of bluestore_compression_min_blob_size will be compressed and then distributed over a number of allocations of bluestore_min_alloc_size. The defaults for HDD and SSD are:
bluestore_compression_min_blob_size_hdd  131072
bluestore_min_alloc_size_hdd              65536
bluestore_compression_min_blob_size_ssd    8192
bluestore_min_alloc_size_ssd              16384
If this explanation of the compression method is correct, these defaults allow up to 50% savings on HDD but, erm, 0% on SSD: the uncompressed blob will use the same amount of space as the compressed one, since both require the same allocation size.
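Spelled out with the numbers above (my arithmetic, assuming that explanation still holds):

    HDD: 131072-byte blob, compresses to <= 65536 bytes
         -> 1 x 65536-byte allocation instead of 2 -> up to 50% saved
    SSD: 8192-byte blob, compressed or not
         -> still 1 x 16384-byte allocation -> 0% saved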
Did something change here? Are compressed blobs now co-located in allocations?
Thanks for your help,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14