Hi,
we have a cluster running Ceph Luminous 12.2.12, RADOS Gateway only
(S3).
The data pool is placed on SAS HDDs (1430 of them) and the rest of the
pools are placed on SSDs (72 of them). There are 72 hosts with the OSD role
(3 rows, 2 racks per row, and 12 hosts per rack). BlueStore, of course.
The question is: how many PGs do we need for default.rgw.meta? Any ideas?
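For reference, the usual starting point is the classic PG formula, sized
against the OSDs that back the pool; a rough sketch for the 72 SSD OSDs,
assuming 3x replication (the result is the aggregate across all pools on
those OSDs, so a lightly used metadata pool normally gets a small share):

# total_pgs ~= (num_osds * 100) / replica_size, rounded to a power of 2
echo $(( 72 * 100 / 3 ))    # prints 2400; nearest power of 2 is 2048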
Example statistics from this cluster:
pool default.rgw.buckets.data id 15
189946/9416348469 objects misplaced (0.002%)
recovery io 13.5MiB/s, 83objects/s
client io 531MiB/s rd, 51.4MiB/s wr, 13.24kop/s rd, 4.84kop/s wr
pool .rgw.root id 16
nothing is going on
pool default.rgw.control id 17
nothing is going on
pool default.rgw.meta id 18
client io 47.0MiB/s rd, 0B/s wr, 57.95kop/s rd, 450op/s wr
pool default.rgw.log id 19
nothing is going on
pool default.rgw.buckets.index id 20
client io 3.12MiB/s rd, 0B/s wr, 3.19kop/s rd, 1.92kop/s wr
Regards,
Jarek
Hi Steve,
I was just about to follow your steps[0] with the ceph-objectstore-tool
(I do not want to remove more snapshots), but I have this error:
pg 17.36 is active+clean+inconsistent, acting [7,29,12]
2019-09-02 14:17:34.175139 7f9b3f061700 -1 log_channel(cluster) log [ERR] : deep-scrub 17.36 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:head : expected clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4 1 missing
I removed the snapshot with snapshot id 4 and did a pg repair, without any
result.
I am trying to understand this command of yours:
ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-229/ --pgid 2.9a6 '{"oid":"rb.0.2479b45.238e1f29","snapid":-2,"hash":2320771494,"max":0,"pool":2,"namespace":"","max":0}'
I think you are getting this info from the --op list, no? And grepping for
the "rbd_data.1f114174b0dc51.0000000000000974" occurrence? I have these
entries on osd.29:
["17.36",{"oid":"rbd_data.1f114174b0dc51.0000000000000974","key":"","sna
pid":63,"hash":1357874486,"max":0,"pool":17,"namespace":"","max":0}]
["17.36",{"oid":"rbd_data.1f114174b0dc51.0000000000000974","key":"","sna
pid":-2,"hash":1357874486,"max":0,"pool":17,"namespace":"","max":0}]
So I guess snapids of -2 are bad? I have actually noticed quite a few
-2 listings in this --op list output, and I do not understand why there are
so many while the cluster is healthy except for this pg 17.36.
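For what it's worth, a sketch of how such listings are typically produced
(the OSD should be stopped first; paths and IDs taken from this thread):

systemctl stop ceph-osd@29
ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-29 \
    --pgid 17.36 --op list | grep rbd_data.1f114174b0dc51.0000000000000974
# Note: snapid -2 is CEPH_NOSNAP, i.e. the head object itself, so one -2
# entry per object is normal; only stale clone entries are suspect.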
[0]
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg47212.html
-----Original Message-----
From: Steve Anthony [mailto:sma310@lehigh.edu]
Sent: vrijdag 16 november 2018 17:44
To: ceph-users(a)lists.ceph.com
Subject: Re: [ceph-users] pg 17.36 is active+clean+inconsistent head
expected clone 1 missing?
Looks similar to a problem I had after several OSDs crashed while
trimming snapshots. In my case, the primary OSD thought the snapshot was
gone, but some of the replicas still had it, so scrubbing flagged it.
First I purged all snapshots and then ran ceph pg repair on the
problematic placement groups. The first time I encountered this, that
action was sufficient to repair the problem. The second time however, I
ended up having to manually remove the snapshot objects.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027431.html
Once I had done that, repairing the placement group fixed the issue.
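(Roughly, the sequence amounted to the following; the pool/image name is a
placeholder:)

rbd snap purge rbd/someimage    # drop all snapshots of the affected image
ceph pg repair 17.36            # then repair the inconsistent PG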
-Steve
On 11/16/2018 04:00 AM, Marc Roos wrote:
>
>
> I am not sure that is going to work, because I have had this error for
> quite some time, from before I added the 4th node. And on the 3-node
> cluster it was:
>
> osdmap e18970 pg 17.36 (17.36) -> up [9,0,12] acting [9,0,12]
>
> If I understand correctly what you intend to do (moving the data
> around), this was sort of accomplished by adding the 4th node.
>
>
>
> -----Original Message-----
> From: Frank Yu [mailto:flyxiaoyu@gmail.com]
> Sent: vrijdag 16 november 2018 3:51
> To: Marc Roos
> Cc: ceph-users
> Subject: Re: [ceph-users] pg 17.36 is active+clean+inconsistent head
> expected clone 1 missing?
>
> Try to restart osd.29, then use pg repair. If this doesn't work, or the
> error appears again after a while, scan the HDD used for osd.29; there
> may be bad sectors on the disk, in which case just replace it with a
> new one.
>
>
>
> On Thu, Nov 15, 2018 at 5:00 PM Marc Roos <M.Roos(a)f1-outsourcing.eu>
> wrote:
>
>
>
> Forgot, these are bluestore osds
>
>
>
> -----Original Message-----
> From: Marc Roos
> Sent: donderdag 15 november 2018 9:59
> To: ceph-users
> Subject: [ceph-users] pg 17.36 is active+clean+inconsistent head
> expected clone 1 missing?
>
>
>
> I thought I would give it another try, asking again here since there is
> another current thread. I have been having this error for a year or so.
>
> This I of course already tried:
> ceph pg deep-scrub 17.36
> ceph pg repair 17.36
>
>
> [@c01 ~]# rados list-inconsistent-obj 17.36
> {"epoch":24363,"inconsistents":[]}
>
>
> [@c01 ~]# ceph pg map 17.36
> osdmap e24380 pg 17.36 (17.36) -> up [29,12,6] acting [29,12,6]
>
>
> [@c04 ceph]# zgrep ERR ceph-osd.29.log*gz
> ceph-osd.29.log-20181114.gz:2018-11-13 14:19:55.766604 7f25a05b1700 -1
> log_channel(cluster) log [ERR] : deep-scrub 17.36
> 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:head expected
> clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4 1 missing
> ceph-osd.29.log-20181114.gz:2018-11-13 14:24:55.943454 7f25a05b1700 -1
> log_channel(cluster) log [ERR] : 17.36 deep-scrub 1 errors
>
>
--
Steve Anthony
LTS HPC Senior Analyst
Lehigh University
sma310(a)lehigh.edu
Hi,
I have come across this issue on one of our Ceph storage nodes (in a
production OpenStack).
Now I would like to redeploy Ceph and start the OSD on this node again
with the same version.
I will try to upgrade to a newer version after increasing the capacity of
the cluster storage and bringing the Ceph health status back to "OK".
Has anyone used kolla-ansible to deploy or add a new node to an existing
Ceph storage cluster? Please let me know the steps.
What I have tried:
1. Ran the Ceph playbook from kolla-ansible: $ kolla-ansible
bootstrap-servers -i <path/multinode> --limit nodeXX.maas
Output: success
2. Unable to get through the "prechecks" as the Docker daemon is not
running.
3. Could not start Docker. I uninstalled Docker, but when reinstalling it
via docker-install/site.yml it would neither install nor start, because it
is looking for dependencies of the OSD device (disk).
4. Trying to uninstall Docker completely from the physical node, install
it from scratch, and re-set up Ceph on this node.
Has anyone tried to re-set up a Ceph OSD on an existing Ceph storage
cluster?
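For reference, the usual kolla-ansible sequence for (re)adding a node looks
roughly like this; the inventory path and node name are placeholders:

kolla-ansible bootstrap-servers -i <path/multinode> --limit nodeXX.maas
kolla-ansible prechecks -i <path/multinode> --limit nodeXX.maas
kolla-ansible deploy -i <path/multinode> --limit nodeXX.maas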
Thanks,
Reddi Prasad YENDLURI
Cloud Specialist
M +65 8345 9599 | D +65 6220 9908
Office: 51B Circular Road Singapore 049406
I have an EC RBD pool that I want to add compression to.
I have the metadata pool and the data pool; do I need to enable compression on both for it to function correctly, or only on one pool?
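(For reference, BlueStore compression is a per-pool setting; a sketch of
enabling it, with a placeholder pool name:)

ceph osd pool set ecpool compression_algorithm snappy
ceph osd pool set ecpool compression_mode aggressive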
Thanks
Hi,
my ceph cluster is recovering / rebalancing PGs since some OSDs are
marked out.
Question:
Which network is used for this task by default?
Is there any configuration for the network to be used in
/etc/ceph/ceph.conf?
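(For reference: replication/recovery traffic between OSDs uses the cluster
network when one is configured, and falls back to the public network
otherwise. A sketch, with example subnets:)

[global]
public network = 10.0.1.0/24     # client and monitor traffic
cluster network = 10.0.2.0/24    # replication, recovery, rebalancing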
THX
Hey,
What is the current status of Kinetic KV support in Ceph?
I'm asking because:
https://www.crn.com.au/news/seagate-quietly-bins-open-storage-project-519345 ..
and the fact that kinetic-cpp-client hasn't been updated in four years and
only compiles against OpenSSL 1.0.2, which will become EOL by the end of
2019.
Or am I totally wrong? ^
Thank you in advance for your reply,
/Johan
Hi folks,
Originally our osd tree looked like this:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2073.15186 root default
-14 176.63100 rack s01-rack
-19 176.63100 host s01
<snip osds>
-15 171.29900 rack s02-rack
-20 171.29900 host s02
<snip osds>
etc. You get the idea. It is a legacy layout, since we have been upgrading
this cluster since probably Firefly and started with far less hardware.
The crush rule was set up like this originally:
step take default
step chooseleaf firstn 0 type rack
which we have modified to
step take default
step chooseleaf firstn 0 type host
taking advantage of chooseleaf's behavior (i.e. searching in depth instead
of just a single level).
Now we thought we could get rid of the rack buckets simply by moving the
host buckets to the root using "ceph osd crush move s01 root=default";
however, this resulted in a bunch of data movement.
Swapping the IDs manually in the crushmap seems to work (verified via
crushtool's --compare), e.g. changing the ID of s01 to s01-rack's and vice
versa, including all shadow trees.
Looking around I saw that there is a swap-bucket command, but it swaps
only the bucket contents, not the IDs, so it would result in data movement.
Other than manually editing the crushmap, is there a better way to achieve
this? Is this approach optimal?
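(For the record, the manual edit-and-verify cycle looks roughly like this;
file names are arbitrary:)

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt              # decompile; swap IDs by hand
crushtool -c crush.txt -o crush-new.bin          # recompile the edited map
crushtool -i crush-new.bin --compare crush.bin   # verify mappings are unchanged
ceph osd setcrushmap -i crush-new.bin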
Cheers,
Zoltan
Hi,
I recently upgraded my cluster from 12.2 to 14.2 and I'm having some
trouble getting the mgr dashboards for grafana working.
I setup Prometheus and Grafana per
https://docs.ceph.com/docs/nautilus/mgr/prometheus/#mgr-prometheus
However, for the osd disk performance statistics graphs on the host details
dashboard I'm getting the following error:
"found duplicate series for the match group {device="dm-5",
instance=":9100"} on the right hand-side of the operation:
[{name="ceph_disk_occupation", ceph_daemon="osd.13", db_device="/dev/dm-8",
device="dm-5", instance=":9100", job="ceph"}, {name="ceph_disk_occupation",
ceph_daemon="osd.15", db_device="/dev/dm-10", device="dm-5",
instance=":9100", job="ceph"}];many-to-many matching not allowed: matching
labels must be unique on one side"
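(For context, those dashboard panels join node_exporter metrics against
ceph_disk_occupation on the (instance, device) label pair, roughly like the
sketch below, which is why that pair must be unique on the ceph side; the
metric name and query are illustrative, not the exact dashboard query:)

rate(node_disk_io_time_seconds_total[5m])
  * on (instance, device) group_left (ceph_daemon)
  ceph_disk_occupation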
This also happens on the following graphs:
Host Overview/AVG Disk Utilization
Host Details/OSD Disk Performance Statistics/*
Also the following graphs show no data points:
OSD Details/Physical Device Performance/*
prometheus version: 2.12.0
node exporter: 0.15.2
grafana version: 6.3.3
Note that my OSDs all have separate data and RocksDB devices. I have also
upgraded all the OSDs to Nautilus via ceph-bluestore-tool repair.
Any idea what's needed to fix this?
Thanks
Below are the Prometheus config files.
prometheus.yml
global:
  scrape_interval: 5s
  evaluation_interval: 5s
scrape_configs:
  - job_name: 'node'
    file_sd_configs:
      - files:
          - node_targets.yml
  - job_name: 'ceph'
    honor_labels: true
    file_sd_configs:
      - files:
          - ceph_targets.yml
----
node_targets.yml:
[
  {
    "targets": [ "nas-osd-01:9100" ],
    "labels": {
      "instance": "nas-osd-01"
    }
  },
  {
    "targets": [ "nas-osd-02:9100" ],
    "labels": {
      "instance": "nas-osd-02"
    }
  },
  {
    "targets": [ "nas-osd-02:9100" ],
    "labels": {
      "instance": "nas-osd-03"
    }
  }
]
---
ceph_targets.yml:
[
  {
    "targets": [ "nas-osd-01:9283" ],
    "labels": {}
  },
  {
    "targets": [ "nas-osd-02:9283" ],
    "labels": {}
  },
  {
    "targets": [ "nas-osd-03:9283" ],
    "labels": {}
  }
]