Hi,
I am trying to copy the contents of our storage server into a CephFS,
but am experiencing stability issues with my MDSs. The CephFS sits on
top of an erasure-coded data pool; the cluster has 5 MONs, 5 MDSs, and
max_mds set to two. The cluster runs Nautilus, the client runs Mimic and
uses the kernel module to mount the FS.
The index of filenames to copy is about 23 GB, and I am using 16 parallel
rsync processes over a 10G link to copy the files over to Ceph. This
works perfectly for a while, but then the MDSs start reporting oversized
caches (between 20 and 50 GB, sometimes more) and an inode count between
1 and 4 million. The inode count in particular seems quite high to me:
each rsync job has 25k files to work with, so even if all 16 processes
opened all of their files at the same time, I should not exceed 400k.
Even if I double that number to account for the client's page cache, I
should get nowhere near that many inodes (a sync flush takes about 1 second).
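For reference, this is roughly how I am reading those numbers (the MDS
name below is a placeholder for one of my daemons):

  ceph fs status                               # per-rank inode and client counts
  ceph daemon mds.<name> cache status          # reported cache size vs. the configured limit
  ceph daemon mds.<name> perf dump mds_mem     # inodes/dentries currently held in cache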
Then after a few hours, my MDSs start failing with messages like this:
-21> 2019-07-22 14:00:05.877 7f67eacec700 1 heartbeat_map
is_healthy 'MDSRank' had timed out after 15
-20> 2019-07-22 14:00:05.877 7f67eacec700 0 mds.beacon.XXX Skipping
beacon heartbeat to monitors (last acked 24.0042s ago); MDS internal
heartbeat is not healthy!
The standby nodes try to take over, but take forever to become active
and eventually fail as well.
During my research I found this related thread:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015959.html,
and I have tried everything suggested there, from increasing and lowering
my cache size to changing the number of journal segments, etc. I also
played around with the number of active MDSs: two appears to work best,
one cannot keep up with the load, and three seems to be the worst choice of all.
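For completeness, these are the knobs I have been varying (the values are
just examples of settings I tried, not recommendations):

  [mds]
  mds_cache_memory_limit = 17179869184    # 16 GiB; I also tried both higher and lower values
  mds_log_max_segments = 256              # likewise tried smaller and larger values

  # number of active MDS ranks (the "two" mentioned above)
  ceph fs set <fs_name> max_mds 2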
Do you have any ideas how I can improve the stability of my MDS daemons
so they handle this load properly? A single 10G link is a toy compared to
the request rate we would like to put on the cluster, yet it is already
buckling under 16 rsync processes.
Thanks
Hi there,
Sorry for asking a question that may be very basic and has been asked
many times before, but a lot of Googling has not given me a satisfying answer.
The question is about the RBD cache in write-back mode with KVM/libvirt. If we
enable it, the local KVM host's RAM is used as a cache for the VM's write
requests, and the KVM host immediately tells the VM's OS that the data has
been written to disk (while it is actually not on the OSDs yet). How can
this be safe against power failure?
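For reference, this is the kind of setup I mean (a minimal sketch; the
pool, image, and monitor address are made up):

  # client-side ceph.conf on the KVM host
  [client]
  rbd cache = true
  rbd cache writethrough until flush = true

  <!-- libvirt disk definition with the cache mode set to writeback -->
  <disk type='network' device='disk'>
    <driver name='qemu' type='raw' cache='writeback'/>
    <source protocol='rbd' name='vms/vm-disk-1'>
      <host name='10.0.0.1' port='6789'/>
    </source>
    <target dev='vda' bus='virtio'/>
  </disk>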
Is my understanding correct? If not, please correct me. This is very
important for me. Thank you very much in advance.
Best regards.
Muhammad Junaid
Hi guys:
I had set up a Ceph cluster and mapped and mounted an RBD image on one machine. I then deleted the cluster and reinstalled it following the manual,
but I still have RBD devices mapped and mounted on my machine, and I cannot access the mount point.
Here is my detailed info. I want to remove all of the old RBD devices; what should I do?
node1 $> rbd device list
id pool namespace image snap device
0 rbd foo - /dev/rbd0
1 kube kubernetes-dynamic-pvc-1cc43c5b-ade1-11e9-9a92-863e3c12afd1 - /dev/rbd1
node1 $> df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/rootvg-lv_root 10995712 2528388 8467324 23% /
devtmpfs 2012656 0 2012656 0% /dev
tmpfs 2023588 0 2023588 0% /dev/shm
tmpfs 2023588 207340 1816248 11% /run
tmpfs 2023588 0 2023588 0% /sys/fs/cgroup
/dev/sda1 520868 116936 403932 23% /boot
/dev/mapper/rootvg-lv_var 5232640 3226816 2005824 62% /var
/dev/mapper/rootvg-lv_tmp 5232640 33060 5199580 1% /tmp
/dev/rbd0 3997376 16392 3754888 1% /mnt
tmpfs 404720 0 404720 0% /run/user/1001
node1 $> rbd trash list
rbd: error opening default pool 'rbd'
Ensure that the default pool has been created or specify an alternate pool name.
node1 $> rbd info rbd/foo
rbd: error opening default pool 'rbd'
Ensure that the default pool has been created or specify an alternate pool name.
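In case it helps to say what I was planning to try, it is roughly this
(based on the rbd man page; I am not sure it is the right approach):

node1 $> umount -l /mnt                         # lazy unmount, since the old cluster is gone
node1 $> rbd device unmap -o force /dev/rbd0    # force-unmap the stale mappings
node1 $> rbd device unmap -o force /dev/rbd1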
Hi All,
Just a reminder, there's only a few days left to submit talks for this
most excellent conference; the CFP is open until Sunday 28 July Anywhere
on Earth.
(I've submitted a Data Storage miniconf day, fingers crossed...)
Regards,
Tim
On 6/26/19 2:09 PM, Tim Serong wrote:
> Here we go again! As usual the conference theme is intended to
> inspire, not to restrict; talks on any topic in the world of free and
> open source software, hardware, etc. are most welcome, and Ceph talks
> definitely fit.
>
> I've added this to https://pad.ceph.com/p/cfp-coordination as well.
>
> -------- Forwarded Message --------
> Subject: [lca-announce] linux.conf.au 2020 - Call for Sessions and
> Miniconfs now open!
> Date: Tue, 25 Jun 2019 21:19:43 +1000
> From: linux.conf.au Announcements <lca-announce(a)lists.linux.org.au>
> Reply-To: lca-announce(a)lists.linux.org.au
> To: lca-announce(a)lists.linux.org.au
>
>
> The linux.conf.au 2020 organising team is excited to announce that the
> linux.conf.au 2020 Call for Sessions and Call for Miniconfs are now open!
> These will stay open from now until Sunday 28 July Anywhere on Earth
> (AoE) (https://en.wikipedia.org/wiki/Anywhere_on_Earth).
>
> Our theme for linux.conf.au 2020 is "Who's Watching", focusing on
> security, privacy and ethics.
> As big data and IoT-connected devices become more pervasive, it's no
> surprise that we're more concerned about privacy and security than ever
> before.
> We've set our sights on how open source could play a role in maximising
> security and protecting our privacy in times of uncertainty.
> With the concept of privacy continuing to blur, open source could be the
> solution to give us '2020 vision'.
>
> Call for Sessions
>
> Would you like to talk in the main conference of linux.conf.au 2020?
> The main conference runs from Wednesday to Friday, with multiple streams
> catering for a wide range of interest areas.
> We welcome you to submit a session
> (https://linux.conf.au/programme/sessions/) proposal for either a talk
> or tutorial now.
>
> Call for Miniconfs
>
> Miniconfs are dedicated day-long streams focusing on single topics,
> creating a more immersive experience for delegates than a session.
> Miniconfs are run on the first two days of the conference before the
> main conference commences on Wednesday.
> If you would like to organise a miniconf
> (https://linux.conf.au/programme/miniconfs/) at linux.conf.au, we want
> to hear from you.
>
> Have we got you interested?
>
> You can find out how to submit your session or miniconf proposals at
> https://linux.conf.au/programme/proposals/.
> If you have any other questions you can contact us via email at
> contact(a)lca2020.linux.org.au.
>
> We are looking forward to reading your submissions.
>
> linux.conf.au 2020 Organising Team
>
>
> ---
> Read this online at
> https://lca2020.linux.org.au/news/call-for-sessions-miniconfs-now-open/
> _______________________________________________
> lca-announce mailing list
> lca-announce(a)lists.linux.org.au
> http://lists.linux.org.au/mailman/listinfo/lca-announce
>
>
Hello everybody!
I tested the performance of a Ceph cluster using different osd_op_num_shards and osd_op_num_threads_per_shard configurations, and I found that using multiple threads (with a single shard) gives the same performance improvement as using multiple shards. However, the OSD configuration documentation (http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/…) says that using a lower shard number may have deleterious effects. I would like to know what those deleterious effects are.
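To be concrete, the two kinds of configurations I compared look roughly
like this (the exact numbers are just illustrative):

  # configuration A: multiple shards, one thread per shard
  osd_op_num_shards = 8
  osd_op_num_threads_per_shard = 1

  # configuration B: a single shard with multiple threads
  osd_op_num_shards = 1
  osd_op_num_threads_per_shard = 8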
Thanks for the assistance!
Fibird
chaoyanglius(a)gmail.com
Hi,
I have an existing Luminous installation: 3 nodes, each with 8x 4 TB HDDs and
1x 200 GB SSD that was previously used as a journal device. On the default
Luminous installation via ceph-deploy, I forgot to prepare the OSDs with the
WAL and DB on the separate SSD. The environment is running in production and I
want to reconfigure it to use the SSD as the WAL device, or maybe for the DB as
well, but since it is in production I am hesitant to do it because it
may cause problems along the way. What should I do to
reconfigure it without downtime, or at least with minimal downtime?
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 101.86957 root default
-3 29.10559 host ceph01
0 hdd 3.63820 osd.0 up 1.00000 1.00000
1 hdd 3.63820 osd.1 up 1.00000 1.00000
2 hdd 3.63820 osd.2 up 1.00000 1.00000
3 hdd 3.63820 osd.3 up 1.00000 1.00000
4 hdd 3.63820 osd.4 up 1.00000 1.00000
5 hdd 3.63820 osd.5 up 1.00000 1.00000
6 hdd 3.63820 osd.6 up 1.00000 1.00000
7 hdd 3.63820 osd.7 up 1.00000 1.00000
-5 29.10559 host ceph02
8 hdd 3.63820 osd.8 up 1.00000 1.00000
9 hdd 3.63820 osd.9 up 1.00000 1.00000
10 hdd 3.63820 osd.10 up 1.00000 1.00000
11 hdd 3.63820 osd.11 up 1.00000 1.00000
12 hdd 3.63820 osd.12 up 1.00000 1.00000
13 hdd 3.63820 osd.13 up 1.00000 1.00000
14 hdd 3.63820 osd.14 up 1.00000 1.00000
15 hdd 3.63820 osd.15 up 1.00000 1.00000
-7 29.10559 host ceph03
16 hdd 3.63820 osd.16 up 1.00000 1.00000
17 hdd 3.63820 osd.17 up 1.00000 1.00000
18 hdd 3.63820 osd.18 up 1.00000 1.00000
19 hdd 3.63820 osd.19 up 1.00000 1.00000
20 hdd 3.63820 osd.20 up 1.00000 1.00000
21 hdd 3.63820 osd.21 up 1.00000 1.00000
22 hdd 3.63820 osd.22 up 1.00000 1.00000
23 hdd 3.63820 osd.23 up 1.00000 1.00000
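The approach I am considering, one OSD at a time (a rough sketch based on the
ceph-volume docs; the device names below are only examples), is:

  # drain and remove a single OSD, e.g. osd.0 on /dev/sdb
  ceph osd out 0
  # ...wait for recovery to finish (all PGs active+clean)...
  systemctl stop ceph-osd@0
  ceph osd purge 0 --yes-i-really-mean-it
  ceph-volume lvm zap /dev/sdb --destroy

  # recreate it with block.db on a partition of the SSD
  ceph-volume lvm create --data /dev/sdb --block.db /dev/sdk1

Does rebuilding the OSDs one at a time like this sound reasonable?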
- Vlad
Hi!
I am trying to set up a Ceph cluster on existing Ubuntu boxes (I'll purchase
them from a VSP).
How can I create a cluster from them and use it with an existing folder like
"/cephfs" on my Ubuntu box?
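In case it clarifies what I am after, the rough sequence I have pieced together
from the quick-start docs is something like this (hostnames and the disk are
placeholders, and I am not sure it is right):

  ceph-deploy new node1 node2 node3
  ceph-deploy install node1 node2 node3
  ceph-deploy mon create-initial
  ceph-deploy mgr create node1
  ceph-deploy osd create --data /dev/sdb node1
  ceph-deploy mds create node1
  # create the CephFS pools and filesystem
  ceph osd pool create cephfs_data 64
  ceph osd pool create cephfs_metadata 64
  ceph fs new cephfs cephfs_metadata cephfs_data
  # then mount the filesystem on the existing /cephfs folder
  mount -t ceph node1:6789:/ /cephfs -o name=admin,secret=<key>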
Thanks,
Konstantin
Hi team,
I am pretty new to the distributed storage world and am learning Ceph day by day. I run an OpenStack hypervisor with Ceph as the backend block device.
There is a need to back up the instances running inside OpenStack. I have explored the option of creating COW incremental snapshots of the RBD objects in Ceph and reverting to them whenever something breaks.
The snapshots created this way are stored in the Ceph cluster itself, which makes it a single point of failure.
Now I need to know whether it is possible to explicitly specify a location where a snapshot should be saved, i.e. storing the snapshot alone on a separate server with its own internal disks, to eliminate the single point of failure, and reverting from it onto the Ceph cluster.
Something like the command below, to store only the layered image snapshot outside of Ceph on another server and consume less storage?
$ rbd snap add (pool name)/(image name)@(snap name) user@10.x.x.x:/data
I know that we can use the command below to capture the current RBD image state and export it completely as an independent image (not as a layered snapshot), which will take up the full space of the base RBD object.
$ rbd -p (pool name) export (rbd image name)@(snap name) (local file name)
Once that is done and the image is stored outside, can a snap revert be triggered on the Ceph cluster from this full image?
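To make the question more concrete, the closest thing I have found so far is exporting a snapshot (or the diff between two snapshots) to a plain file on another server, roughly like this (pool, image, snapshot, and host names are just examples):

$ rbd export pool1/image1@snap1 - | ssh user@10.x.x.x 'cat > /data/image1-snap1.img'
$ rbd export-diff --from-snap snap1 pool1/image1@snap2 - | ssh user@10.x.x.x 'cat > /data/image1-snap1-snap2.diff'

# restoring (after copying the files back to a client that can reach the cluster):
# import the full export, recreate the base snapshot, then replay the diff
$ rbd import /data/image1-snap1.img pool1/image1-restored
$ rbd snap create pool1/image1-restored@snap1
$ rbd import-diff /data/image1-snap1-snap2.diff pool1/image1-restored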
While I agree that Ceph, as a distributed storage system, is not primarily designed as a backup solution, I just need to know what is possible with RBD image layering.
Any comments on the above questions are greatly appreciated.
Regards,
Prasanna,
Ceph-user.