Hello List,
first of all: yes, I made mistakes. Now I am trying to recover :-/
I had a healthy 3-node cluster which I wanted to shrink down to a single node.
My goal was to then reinstall a fresh 3-node cluster and start it with 2 nodes.
I was able to turn the 3-node cluster into a 2-node cluster while keeping it healthy.
Then the problems began.
I changed the pool to size=1 and min_size=1 (I know, I know, I will
never ever do that again!).
Health was okay up to that point. Then, all of a sudden, both nodes got
fenced... one node refused to boot, mons were missing, etc. To make a
long story short, here is where I am right now:
root@node03:~ # ceph -s
cluster b3be313f-d0ef-42d5-80c8-6b41380a47e3
health HEALTH_WARN
53 pgs stale
53 pgs stuck stale
monmap e4: 2 mons at {0=10.15.15.3:6789/0,1=10.15.15.2:6789/0}
election epoch 298, quorum 0,1 1,0
osdmap e6097: 14 osds: 9 up, 9 in
pgmap v93644673: 512 pgs, 1 pools, 1193 GB data, 304 kobjects
1088 GB used, 32277 GB / 33366 GB avail
459 active+clean
53 stale+active+clean
root@node03:~ # ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 32.56990 root default
-2 25.35992 host node03
0 3.57999 osd.0 up 1.00000 1.00000
5 3.62999 osd.5 up 1.00000 1.00000
6 3.62999 osd.6 up 1.00000 1.00000
7 3.62999 osd.7 up 1.00000 1.00000
8 3.62999 osd.8 up 1.00000 1.00000
19 3.62999 osd.19 up 1.00000 1.00000
20 3.62999 osd.20 up 1.00000 1.00000
-3 7.20998 host node02
3 3.62999 osd.3 up 1.00000 1.00000
4 3.57999 osd.4 up 1.00000 1.00000
1 0 osd.1 down 0 1.00000
9 0 osd.9 down 0 1.00000
10 0 osd.10 down 0 1.00000
17 0 osd.17 down 0 1.00000
18 0 osd.18 down 0 1.00000
My main mistakes seem to have been:
--------------------------------
ceph osd out osd.1
ceph auth del osd.1
systemctl stop ceph-osd@1
ceph osd rm 1
umount /var/lib/ceph/osd/ceph-1
ceph osd crush remove osd.1
As far as I can tell, Ceph is still waiting for and needs data from that
osd.1 (which I removed):
root@node03:~ # ceph health detail
HEALTH_WARN 53 pgs stale; 53 pgs stuck stale
pg 0.1a6 is stuck stale for 5086.552795, current state
stale+active+clean, last acting [1]
pg 0.142 is stuck stale for 5086.552784, current state
stale+active+clean, last acting [1]
pg 0.1e is stuck stale for 5086.552820, current state
stale+active+clean, last acting [1]
pg 0.e0 is stuck stale for 5086.552855, current state
stale+active+clean, last acting [1]
pg 0.1d is stuck stale for 5086.552822, current state
stale+active+clean, last acting [1]
pg 0.13c is stuck stale for 5086.552791, current state
stale+active+clean, last acting [1]
[...] SNIP [...]
pg 0.e9 is stuck stale for 5086.552955, current state
stale+active+clean, last acting [1]
pg 0.87 is stuck stale for 5086.552939, current state
stale+active+clean, last acting [1]
When I try to start osd.1 manually, I get:
--------------------------------------------
2020-02-10 18:48:26.107444 7f9ce31dd880 0 ceph version 0.94.10
(b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid
10210
2020-02-10 18:48:26.134417 7f9ce31dd880 0
filestore(/var/lib/ceph/osd/ceph-1) backend xfs (magic 0x58465342)
2020-02-10 18:48:26.184202 7f9ce31dd880 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is supported and appears to work
2020-02-10 18:48:26.184209 7f9ce31dd880 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2020-02-10 18:48:26.184526 7f9ce31dd880 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2020-02-10 18:48:26.184585 7f9ce31dd880 0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_feature: extsize
is disabled by conf
2020-02-10 18:48:26.309755 7f9ce31dd880 0
filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2020-02-10 18:48:26.633926 7f9ce31dd880 1 journal _open
/var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size
4096 bytes, directio = 1, aio = 1
2020-02-10 18:48:26.642185 7f9ce31dd880 1 journal _open
/var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size
4096 bytes, directio = 1, aio = 1
2020-02-10 18:48:26.664273 7f9ce31dd880 0 <cls>
cls/hello/cls_hello.cc:271: loading cls_hello
2020-02-10 18:48:26.732154 7f9ce31dd880 0 osd.1 6002 crush map has
features 1107558400, adjusting msgr requires for clients
2020-02-10 18:48:26.732163 7f9ce31dd880 0 osd.1 6002 crush map has
features 1107558400 was 8705, adjusting msgr requires for mons
2020-02-10 18:48:26.732167 7f9ce31dd880 0 osd.1 6002 crush map has
features 1107558400, adjusting msgr requires for osds
2020-02-10 18:48:26.732179 7f9ce31dd880 0 osd.1 6002 load_pgs
2020-02-10 18:48:31.939810 7f9ce31dd880 0 osd.1 6002 load_pgs opened 53 pgs
2020-02-10 18:48:31.940546 7f9ce31dd880 -1 osd.1 6002 log_to_monitors
{default=true}
2020-02-10 18:48:31.942471 7f9ce31dd880 1 journal close
/var/lib/ceph/osd/ceph-1/journal
2020-02-10 18:48:31.969205 7f9ce31dd880 -1 ** ERROR: osd
init failed: (1) Operation not permitted
It's mounted:
/dev/sdg1 3.7T 127G 3.6T 4% /var/lib/ceph/osd/ceph-1
Is there any way I can get osd.1 back in?
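What I was thinking of trying is to re-register the OSD by hand, roughly
following the manual add/remove-OSD steps in the docs (just a sketch; the
weight and host below are placeholders, and I assume this only helps if the
OSD actually gets its old ID 1 back):
ceph osd create $(cat /var/lib/ceph/osd/ceph-1/fsid)
ceph auth add osd.1 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-1/keyring
ceph osd crush add osd.1 3.63 host=<hostname>
systemctl start ceph-osd@1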
Thanks a lot,
mario
We have been using ceph-deploy in our existing cluster, running as a non-root user with sudo permissions. I've been working on getting an Octopus cluster working using cephadm. During bootstrap I ran into an "execnet.gateway_bootstrap.HostNotFound" issue. It turns out that the problem was caused by an sshd setting we use: "PermitRootLogin no". Since we do not allow direct root ssh login, is there a way to make cephadm use ssh as a non-root user with sudo permissions, like we did with ceph-deploy?
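For what it's worth, what I was hoping would work is something along these lines
(untested sketch; I'm not sure which Octopus release actually supports the
--ssh-user option, and the user name here is made up):
cephadm bootstrap --mon-ip <mon-ip> --ssh-user cephdeploy
ceph cephadm set-user cephdeploy
with the cluster's SSH key added to that user's authorized_keys and passwordless
sudo granted to it on every host.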
Hello,
CephFS operations are slow in our cluster. I see a low number of operations and low throughput in the pools, and low utilization of all other resources as well. I think it is MDS operations that are causing the issue. I increased mds_cache_memory_limit from 1 GB to 3 GB but am not seeing any improvement in user access times.
How do I monitor MDS operations, such as metadata operation latencies (including inode access and update times) and directory operation latencies?
We are using Ceph version 14.2.3.
I have increased mds_cache_memory_limit but am not sure how to check how much of it is being used and how effectively we are using it.
# ceph config get mds.0 mds_cache_memory_limit
3221225472
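The closest I have found so far for watching the MDS is its admin socket, though
I'm not sure these are the right counters to look at (sketch; run on the host
where the MDS is active, and the daemon name is a placeholder):
ceph daemon mds.<name> perf dump
ceph daemon mds.<name> dump_ops_in_flight
ceph daemon mds.<name> cache status
ceph daemonperf mds.<name>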
I also see this: we are managing PGs using the autoscaler, however I see a BIAS of 4.0 on one pool whereas all other pools have 1.0. I am not sure what this number is exactly and how it affects the cluster.
# ceph osd pool autoscale-status | egrep "cephfs|POOL"
POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
cephfs01-metadata 1775M 3.0 167.6T 0.0000 4.0 8 on
cephfs01-data0 739.5G 3.0 167.6T 0.0129 1.0 32 on
There is also one large omap object:
[root@knode25 /]# ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool 'cephfs01-metadata'
Search the cluster log for 'Large omap object found' for more details.
I recently had a similar one and was able to clear it by running a deep scrub. I am not sure why they keep forming or how to solve this for good.
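The only way I found to inspect the object itself was roughly this (sketch; the
object name has to come from the OSD or cluster log entry that reported it):
grep -i 'large omap object' /var/log/ceph/ceph-osd.*.log
rados -p cephfs01-metadata listomapkeys <object-name> | wc -l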
Thanks,
Uday.
Hi!
I've been running CephFS for a while now and ever since setting it up, I've seen unexpectedly large write i/o on the CephFS metadata pool.
The filesystem is otherwise stable and I'm seeing no usage issues.
I'm in a read-intensive environment from the clients' perspective, and throughput for the metadata pool is consistently larger than that of the data pool.
For example:
# ceph osd pool stats
pool cephfs_data id 1
client io 7.6 MiB/s rd, 19 KiB/s wr, 404 op/s rd, 1 op/s wr
pool cephfs_metadata id 2
client io 338 KiB/s rd, 43 MiB/s wr, 84 op/s rd, 26 op/s wr
I realise, of course, that this is a momentary display of statistics, but I see this unbalanced r/w activity consistently when monitoring it live.
I would like some insight into what may be causing this large imbalance in r/w, especially since I'm in a read-intensive (web hosting) environment.
Some of it may be expected when considering details of my environment and CephFS implementation specifics, so please ask away if more details are needed.
With my experience using NFS, I would start by looking at client io stats, like `nfsstat` and tuning e.g. mount options, but I haven't been able to find such statistics for CephFS clients.
Is there anything of the sort for CephFS? Are similar stats obtainable in some other way?
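The closest I've been able to find so far are per-client counters rather than an
nfsstat-style summary (sketch; assuming ceph-fuse clients with an admin socket,
or kernel clients with debugfs mounted):
ceph daemon /var/run/ceph/ceph-client.<id>.asok perf dump
cat /sys/kernel/debug/ceph/<fsid>.client<id>/mdsc
cat /sys/kernel/debug/ceph/<fsid>.client<id>/osdc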
This might be a somewhat broad question and shallow description, so yeah, let me know if there's anything you would like more details on.
Thanks a lot,
Samy
I want to auto-mount a Ceph block device at boot time. Because I use rbd-mirror, I can only map the block device using the nbd type. I tried using /etc/ceph/rbdmap and /etc/fstab with _netdev, but I find it can't be mounted at boot time. The image is just mapped via nbd as /dev/nbd0, and I have to mount it manually with the command "mount /dev/nbd0 /mnt". Has anyone solved this problem?
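What I'm considering as a workaround is a small systemd unit instead of fstab
(a rough, untested sketch; the pool/image name and mount point are placeholders,
and it assumes the image ends up on /dev/nbd0, which is not guaranteed):
[Unit]
Description=Map and mount RBD image via rbd-nbd
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/rbd-nbd map rbd/myimage
ExecStart=/usr/bin/mount /dev/nbd0 /mnt
ExecStop=/usr/bin/umount /mnt
ExecStop=/usr/bin/rbd-nbd unmap /dev/nbd0

[Install]
WantedBy=multi-user.target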
Presently I have about 1.2B objects (400M with 3 replicas) and I'm finding that PG scrubbing and deep scrubbing are not completing. There is only one client accessing the data, a Samba server. I found large disparities in PG distribution and drive utilization. I enabled pg_autoscaler and found that it was reducing the number of PGs per OSD from 116 to 104.5 at this time, but it wasn't helping equalize space consumption. I then found the balancer and enabled that, and it is in the process of evening things out. As we were also having MDS crashes even after increasing MDS memory, I tried enabling multi-MDS with two ranks, as sketched below.
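For reference, what I enabled was roughly this (from memory, so treat it as a
sketch; 'cephfs' stands in for my filesystem name):
ceph mgr module enable pg_autoscaler
ceph balancer mode upmap
ceph balancer on
ceph fs set cephfs max_mds 2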
I currently have only 1 spare and would like to potentially enable the mds component on the fourth node (no mon present) but am having some difficulty.
Is mon a requirement?
I tried ceph-deploy mds create node4 but am getting errors. I tried manually creating the /var/lib/ceph/mds/node4 directory and running the command to create the keyring, but still no joy.
What am I missing?
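For reference, the manual steps I attempted looked roughly like this (sketch;
'node4' is the hostname, the caps are my guess from the docs, and note the
'ceph-' prefix on the directory name, which I may have missed):
mkdir -p /var/lib/ceph/mds/ceph-node4
ceph auth get-or-create mds.node4 mon 'allow profile mds' osd 'allow rwx' mds 'allow *' -o /var/lib/ceph/mds/ceph-node4/keyring
chown -R ceph:ceph /var/lib/ceph/mds/ceph-node4
systemctl start ceph-mds@node4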
Thanks,
I made a bug report here: https://tracker.ceph.com/issues/44023
I updated from 14.2.6 yesterday, and after the update my MDS daemons would
not start. I looked at the logs and initially seemed to have an auth error.
Setting the keyring location manually in ceph.conf fixed that, but I now
get an error where my MDS daemons go through the reconnect, replay, and rejoin
states and then crash.
Any suggestions on what I can do to troubleshoot? The bug tracker post has
some logs attached. Rolling back did not fix things.
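The only additional step I can think of is raising the MDS debug level before it
crashes and attaching that log to the tracker (sketch):
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
systemctl restart ceph-mds@<id>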
Thank you.
-Michael
Hi.
before I descend into what happened and why it happened: I'm talking about a
test-cluster so I don't really care about the data in this case.
We've recently started upgrading from luminous to nautilus, and for us that
means we're retiring ceph-disk in favour of ceph-volume with lvm and
dmcrypt.
Our setup is in containers and we've got DBs separated from Data.
When testing our upgrade-path we discovered that running the host on
ubuntu-xenial and the containers on centos-7.7 leads to lvm inside the
containers not using lvmetad because it's too old. That in turn means that
not running `vgscan --cache` on the host before adding an LV to a VG
essentially zeros the metadata for all LVs in that VG.
That happened on two out of three hosts for a bunch of OSDs and those OSDs
are gone. I have no way of getting them back, they've been overwritten
multiple times trying to figure out what went wrong.
So now I have a cluster that's got 16 pgs in 'incomplete', 14 of them with 0
objects, 2 with about 150 objects each.
I have found a couple of howtos that tell me to use ceph-objectstore-tool to
find the pgs on the active osds and I've given that a try, but
ceph-objectstore-tool always tells me it can't find the pg I am looking for.
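For reference, what I ran looked roughly like this (sketch; the OSD id and pgid
are placeholders, and the OSD has to be stopped first):
systemctl stop ceph-osd@12
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op list-pgs
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --pgid 2.1a --op export --file /tmp/pg.2.1a.export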
Can I tell ceph to re-init the pgs? Do I have to delete the pools and
recreate them?
There's no data I can't get back in there, I just don't feel like
scrapping and redeploying the whole cluster.
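What I'm leaning towards, since the data really is expendable, is forcibly
recreating the empty PGs (a sketch based on the docs; as I understand it, this
throws away whatever those PGs contained):
ceph osd lost <osd-id> --yes-i-really-mean-it
ceph osd force-create-pg <pgid> --yes-i-really-mean-it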
--
Cheers,
Hardy