Hello experts,
I have accidentally created a situation where the only monitor in a cluster has been moved to a new node without its /var/lib/ceph contents. Not realizing what I had done, I decommissioned the original node, but I still have the contents of its /var/lib/ceph.
Can I shut down the monitor running on the new node, copy monitor data from the original node to the new node and restart the monitor? Or is there information in the monitor database that is tied to the original node? If that’s the case, I suspect I need to somehow recommission the original node.
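Concretely, the procedure I'm picturing is something like this (node and mon names here are hypothetical, assuming a systemd-managed mon and that the preserved store is intact):

  systemctl stop ceph-mon@newnode
  mv /var/lib/ceph/mon/ceph-newnode /var/lib/ceph/mon/ceph-newnode.bak
  rsync -a /backup/oldnode/var/lib/ceph/mon/ceph-oldnode/ /var/lib/ceph/mon/ceph-newnode/
  chown -R ceph:ceph /var/lib/ceph/mon/ceph-newnode
  systemctl start ceph-mon@newnode

My worry is whether the monmap inside that store still points at the old node's name and address, which is really what my question boils down to.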
Thanks for any feedback on this situation!
Brian
Hello!
Today, I started the morning with a WARNING STATUS on our Ceph cluster.
# ceph health detail
HEALTH_WARN Too many repaired reads on 1 OSDs
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 1 OSDs
osd.67 had 399911 reads repaired
I made "ceph osd out 67" and PGs where migrated to another OSDs.
I stopped the osd.67 daemon, inspected the logs, etc...
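For the inspection, I mainly looked at the kernel log and the SMART data of the drive behind osd.67, roughly like this (the device name is just an example):
  dmesg -T | grep -i error
  smartctl -a /dev/sdX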
Then I started the daemon and ran "ceph osd in 67".
The OSD started backfilling some PGs and no other errors appeared for the rest of the day, but the warning status still remains.
Can I clear it? Should I remove the OSD and start with a new one?
Thanks in advance for your time!
Javier.-
On Fri, Oct 9, 2020 at 3:12 PM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
>
> >1. The pg log contains 3000 entries by default (on nautilus). These
> >3000 entries can legitimately consume gigabytes of ram for some
> >use-cases. (I haven't determined exactly which ops triggered this
> >today).
>
> How can I check how much ram my pg_logs are using?
ceph daemon osd.x dump_mempools | jq .mempool.by_pool.osd_pglog
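Or, to check all OSDs on a host in one go (assuming the default admin socket paths), something like:

  for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo -n "$sock: "
    ceph daemon "$sock" dump_mempools | jq .mempool.by_pool.osd_pglog.bytes
  done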
>
>
>
> -----Original Message-----
> Cc: ceph-users
> Subject: [ceph-users] Re: another osd_pglog memory usage incident
>
> On 09.10.20 13:55, Dan van der Ster wrote:
> [...]
> > I also noticed a possible relationship with scrubbing -- One week ago
> > we increased to osd_max_scrubs=5 to clear out a scrubbing backlog; I
> > wonder if the increased read/write ratio somehow led to an exploding
> > buffer_anon. Do things stabilize on your side if you temporarily
> > disable scrubbing?
>
> During the worst periods, we had disabled scrubbing. When we re-enabled,
> we had our write-job to mitigate the problems. And currently, scrub load
> is low. So I cannot tell, but it is very plausible.
>
> Cheers
> Harry
Hello everybody,
We have two Ceph object clusters replicating over a very long-distance WAN link. Our version of Ceph is 14.2.10.
Currently, replication speed seems to be capped at around 70 MiB/s even though there is a 10Gb WAN link between the two clusters.
The clusters themselves don't seem to suffer from any performance issue.
The replication traffic leverages HAProxy VIPs, which means there's a single endpoint (the HAProxy VIP) in the multisite replication configuration.
So, my questions are:
- Is it possible to improve replication speed by adding more endpoints in the multisite replication configuration? The issue we are facing is that the secondary cluster is way behind the master cluster because of the relatively slow speed.
- Is there anything else I can do to optimize replication speed?
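For the first question, what I have in mind is roughly the following on the secondary zone (zone name and endpoints are just illustrative), listing the RGWs directly instead of the single HAProxy VIP:

  radosgw-admin zone modify --rgw-zone=secondary --endpoints="http://rgw1.example.com:8080,http://rgw2.example.com:8080"
  radosgw-admin period update --commit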
Thanks for your comments!
Nicolas
Hi all,
This morning some osds in our S3 cluster started going OOM; after
restarting them I noticed that the osd_pglog is using >1.5GB per osd.
(This is on an osd with osd_memory_target = 2GB, hosting 112 PGs, all
PGs active+clean).
After reading through this list and trying a few things, I'd like to
share the following observations for your feedback:
1. The pg log contains 3000 entries by default (on nautilus). These
3000 entries can legitimately consume gigabytes of ram for some
use-cases. (I haven't determined exactly which ops triggered this
today).
2. The pg log length is decided by the primary osd -- setting
osd_max_pg_log_entries/osd_min_pg_log_entries on one single OSD does
not have a big effect (because most of the PGs are primaried somewhere
else). You need to set it on all the osds for it to be applied to all
PGs.
3. We eventually set osd_max_pg_log_entries = 500 everywhere (a rough
command sketch follows after this list). This decreased the osd_pglog
mempool from more than 1.5GB on our largest osds to less than 500MB.
4. The osd_pglog mempool is not accounted for in the osd_memory_target
(in nautilus).
5. I have opened a feature request to limit the pg_log length by
memory size (https://tracker.ceph.com/issues/47775). This way we could
allocate a fraction of memory to the pg log and it would shorten the
pglog length (budget) accordingly.
6. Would it be feasible to add an osd option to 'trim pg log at boot'?
This way we could avoid the cumbersome ceph-objectstore-tool
trim-pg-log in cases of disaster (osds going OOM at boot).
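For reference, the command sketch mentioned in point 3 is roughly the following (we pushed it through the central config; propagation to running osds may still need a restart or injectargs):

  ceph config set osd osd_max_pg_log_entries 500
  ceph tell 'osd.*' injectargs '--osd_max_pg_log_entries=500'

and the cumbersome offline trim from point 6 looks roughly like this, run with the osd stopped (id and pgid are placeholders):

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --pgid <pgid> --op trim-pg-log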
For those that had pglog memory usage incidents -- does this match
your experience?
Thanks!
Dan
Hi,
Most of it is described here: https://tracker.ceph.com/issues/22928
Buckets created under Jewel don't always have the *placement_rule* set
in their bucket metadata and this causes Nautilus RGWs to not serve
requests for them.
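(That metadata can be dumped with something along the lines of: radosgw-admin metadata get bucket.instance:pbx:ams02.446941181.1)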
Snippet from the metadata:
{
    "key": "bucket.instance:pbx:ams02.446941181.1",
    "ver": {
        "tag": "86lc3iVtQpPiJYkh95YCTnhu",
        "ver": 2
    },
    "mtime": "2020-10-09 09:12:04.744423Z",
    "data": {
        "bucket_info": {
            "bucket": {
                "name": "pbx",
                "marker": "ams02.241978.4",
                "bucket_id": "ams02.446941181.1",
                "tenant": "",
                "explicit_placement": {
                    "data_pool": ".rgw.buckets",
                    "data_extra_pool": "",
                    "index_pool": ".rgw.buckets"
                }
            },
            "creation_time": "2014-02-16 12:32:15.000000Z",
            "owner": "vdvm",
            "flags": 0,
            "zonegroup": "eu",
            "placement_rule": "",
Notice that *placement_rule* is empty and that this bucket has
*explicit_placement* set.
There is no way to update the bucket.instance metadata as far as I know,
otherwise I could have set a placement rule for the bucket.
Earlier on the ML this has been discussed:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ULKK5RU2VXL…
People there compiled a manually patched version of RGW, something I'd rather stay
away from.
Has anybody seen this and if so: Have you found a solution?
The commit that breaks these buckets is this one:
https://github.com/ceph/ceph/commit/2a8e8a98d8c56cc374ec671846a20e2b0484bc75
14.2.0 was the first release with that code in there.
So there are two things I'm thinking about, and I don't know which one is best:
- Update RGW and modify the if-statement added by commit 2a8e8a
- Enhance 'bucket check --fix' to update the placement_rule if none is
set for a bucket
Any hints or suggestions?
Wido
Hello,
I have a Ceph cluster running 14.2.11. I am running benchmark tests with
FIO concurrently on ~2000 volumes of 10G each. During the initial
warm-up, FIO creates a 10G file on each volume before it runs the actual
read/write I/O operations. During this time, I see the Ceph
cluster reporting about 35GiB/s write throughput for a while, but after
some time I start seeing "long heartbeat" and "slow ops" warnings, and within a
few minutes the throughput drops to ~1GB/s and stays there until all FIO runs
complete.
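For reference, each FIO job is roughly of this shape; the parameters shown here are illustrative rather than the exact ones I use:

  fio --name=vol0001 --filename=/mnt/vol0001/testfile --size=10g --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --iodepth=32 --runtime=600 --time_based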
The cluster has 5 monitor nodes and 10 data nodes, each with 10x3.2TB NVMe
drives. I have set up 3 OSDs per NVMe drive, so there are 300 OSDs in total.
Each server has a 200GB uplink, and there is no apparent network bottleneck, as
the network is set up to support over 1Tbps of bandwidth. I don't see any CPU
or memory issues on the servers either.
There is a single manager instance running on one of the mons.
The pool is configured with a replication factor of 3 and min_size of 2. I tried
pg_num values of 8192 and 16384 and saw the issue with both settings.
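(With 3x replication across 300 OSDs, that works out to roughly 8192 * 3 / 300 ≈ 82 and 16384 * 3 / 300 ≈ 164 PGs per OSD.)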
Could you please suggest if this is a known issue or if I can tune any
parameters?
Long heartbeat ping times on back interface seen, longest is 1202.120 msec
Long heartbeat ping times on front interface seen, longest is 1535.191 msec
35 slow ops, oldest one blocked for 122 sec, daemons [osd.135,osd.14,osd.141,osd.143,osd.149,osd.15,osd.151,osd.153,osd.157,osd.162]... have slow ops.
Regards,
Shridhar
We had built some RPMs locally for ceph-fuse, but AFAIR luminous needs
systemd, so the server RPMs would be difficult.
-- dan
On Thu, Oct 8, 2020 at 11:12 AM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
>
>
> Nobody ever used luminous on el6?
Wondering if anyone knows of, or has put together, a way to wipe an Octopus install? I've looked for documentation on the process, but if it exists, I haven't found it yet. I'm going through some test installs, working through the ins and outs of cephadm and containers, and would love an easy way to tear things down and start over.
In previous releases managed through ceph-deploy, there were three very convenient commands that nuked the world. I am looking for something as complete for Octopus.
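For reference, the ceph-deploy trio I mean is roughly:

  ceph-deploy purge <host ...>
  ceph-deploy purgedata <host ...>
  ceph-deploy forgetkeys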
Thanks,
Sam Liston (sam.liston(a)utah.edu)
==========================================
Center for High Performance Computing - Univ. of Utah
155 S. 1452 E. Rm 405
Salt Lake City, Utah 84112 (801)232-6932
==========================================