On Fri, Oct 11, 2019 at 5:57 PM Rui Chang (Arm Technology China) wrote:
> Hi, Kefu
> I am new to Ceph. Currently I changed some code and did a build. After that I can “make install” the result on my build machine, which I use as a single node.
> But how can I install both the dependent packages and my Ceph package on other nodes? How can I do the packaging easily and install it on other nodes?
I encourage you to re-read
https://github.com/ceph/ceph/blob/master/README.md . I don't think
this problem is specific to Ceph. What do you do when you want to "do
a packaging easily and install on other nodes" when it comes to other
software? I'd just package it for whatever distro(s) the other nodes run.
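For example, on Ubuntu a rough sketch of that workflow (run from the top
of your source tree; paths and options are illustrative) would be:

  cd ceph                                 # your tree, with debian/ and your changes
  ./install-deps.sh                       # pull in the build dependencies
  dpkg-buildpackage -us -uc -j$(nproc)    # produces ../*.deb
  # copy the resulting .deb files to the other nodes, then on each node:
  sudo apt install ./*.deb                # apt resolves the runtime dependencies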
> Will dpkg-buildpackage build my changes and dependent packages? Or should I use make-debs.sh?
dpkg-buildpackage builds the tree with the debian/ directory. If the tree
contains your change, dpkg-buildpackage will build it. But I don't
think dpkg-buildpackage builds the dependent packages; the
package management system is supposed to fulfill the runtime
dependencies. make-debs.sh is a very thin wrapper around dpkg-buildpackage.
Actually, I'd recommend you read the documentation and then do a little
trial and error. All of this is well documented, IMHO.
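If you prefer make-debs.sh, a minimal sketch (the exact output layout may
differ between releases):

  ./make-debs.sh /tmp/release   # builds the .debs under /tmp/release/<codename>/...
  # copy the resulting .debs (or serve them as a local apt repo) to the other
  # nodes and install them there with apt/dpkg as above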
> I am running on Ubuntu. Thanks a lot in advance.
The URL below says: "Switching from a standalone deployment to a multi-site
replicated deployment is not supported."
On Thu, Oct 3, 2019 at 3:28 PM M Ranga Swami Reddy <swamireddy(a)gmail.com> wrote:
> I am using 2 Ceph clusters in different DCs (about 500 km apart), running Ceph
> 12.2.11.
> Now I want to set up RGW multisite using the above 2 Ceph clusters.
> Is it possible? If yes, please share a good document describing how to do it.
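For reference, a fresh multisite bootstrap (as opposed to converting an
existing standalone deployment) usually looks roughly like the following,
per the upstream rgw multisite documentation; the realm/zone names,
endpoints, and keys below are placeholders:

  # on the first (master) cluster
  # (a --system rgw user holding these keys is created on the master; see the docs)
  radosgw-admin realm create --rgw-realm=myrealm --default
  radosgw-admin zonegroup create --rgw-zonegroup=mygroup \
      --endpoints=http://rgw1.dc1:8080 --master --default
  radosgw-admin zone create --rgw-zonegroup=mygroup --rgw-zone=dc1 \
      --endpoints=http://rgw1.dc1:8080 --master --default \
      --access-key=SYSTEM_KEY --secret=SYSTEM_SECRET
  radosgw-admin period update --commit

  # on the second cluster
  radosgw-admin realm pull --url=http://rgw1.dc1:8080 \
      --access-key=SYSTEM_KEY --secret=SYSTEM_SECRET
  radosgw-admin zone create --rgw-zonegroup=mygroup --rgw-zone=dc2 \
      --endpoints=http://rgw1.dc2:8080 \
      --access-key=SYSTEM_KEY --secret=SYSTEM_SECRET
  radosgw-admin period update --commit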
I'm trying to read the source code of BlueStore. My question is: why is it
sufficient to only flush the log in BlueRocksDirectory::Fsync?
Shouldn't it flush the file data first? Is it because RocksDB always
flushes file data before doing the directory fsync? Thanks :-)
On Wed, 9 Oct 2019, Aaron Johnson wrote:
> Hi all
> I have a smallish test cluster (14 servers, 84 OSDs) running 14.2.4.
> Monthly OS patching and reboots that go along with it have resulted in
> the cluster getting very unwell.
> Many of the servers in the cluster are OOM-killing the ceph-osd
> processes when they try to start (6 OSDs per server, running on
> Filestore). strace shows the ceph-osd processes are spending hours
> reading through the 220k osdmap files after being started.
Is the process size growing during this time? There should be a cap to
the size of the OSDMap cache; perhaps there is a regression there.
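If a regression in that cap is suspected, the relevant knob to inspect (option
name per the Nautilus-era config reference; verify it on your version) would
be something like:

  ceph config get osd osd_map_cache_size     # how many maps each OSD may cache
  # and, if you need to experiment:
  ceph config set osd osd_map_cache_size 50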
One common thing to do here is 'ceph osd set noup' and restart the OSD,
and then monitor the OSD's progress catching up on maps with 'ceph daemon
osd.NN status' (compare the epoch to what you get from 'ceph osd dump |
head'). This will take a while if you are really 220k maps (!!!) behind,
but the memory usage during that period should be relatively constant.
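Concretely (NN is a placeholder for the OSD id, and the systemd unit name may
differ on your setup), that sequence looks like:

  ceph osd set noup                  # restarted OSDs won't be marked up yet
  systemctl restart ceph-osd@NN      # restart one OSD on its host
  ceph daemon osd.NN status          # run on that OSD's host; note "newest_map"
  ceph osd dump | head               # current cluster epoch to compare against
  # once every OSD's newest_map has reached the cluster epoch:
  ceph osd unset noup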
> This behavior started after we recently made it about 72% full to see
> how things behaved. We also upgraded it to Nautilus 14.2.2 at about the
> same time.
> I’ve tried starting just one OSD per server at a time in hopes of
> avoiding the OOM killer. Also tried setting noin, rebooting the whole
> cluster, waiting a day, then marking each of the OSDs in manually. The
> end result is the same either way. About 60% of PGs are still down, 30%
> are peering, and the rest are in worse shape.
Usually in instances like this in the past, getting all OSDs to catch up
on maps and then unsetting 'noup' will let them all come up and peer at
the same time. But usually what has happened is many of the OSDs are not
caught up and it's not immediately obvious, so PGs don't peer. So setting
noup and waiting for all osds to be caught up (as per 'ceph daemon osd.NNN
status') first generally helps.
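Something like the following, run on each OSD host, makes that comparison
easier ('ceph daemon' only reaches admin sockets on the local host, and the
"newest_map" field name is as of Nautilus; adjust if your output differs):

  cluster_epoch=$(ceph osd dump | head -1 | awk '{print $2}')
  for sock in /var/run/ceph/ceph-osd.*.asok; do
      id=$(basename "$sock" | sed 's/ceph-osd\.\(.*\)\.asok/\1/')
      echo "osd.$id: $(ceph daemon osd.$id status | grep newest_map) (cluster epoch $cluster_epoch)"
  done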
But none of that explains why you're seeing OOM, so I'm curious what you
see with memory usage while OSDs are catching up...
> Anyone out there have suggestions about how I should go about getting
> this cluster healthy again? Any ideas appreciated.
> - Aaron