Has anyone added the 'conf.d' modules (and, in the CentOS/RHEL/Fedora
world, done the SELinux work) so that initramfs/dracut can 'direct kernel
boot' CephFS as a guest image's root file system? It took the NFS folks
some work to support being the root filesystem.
Harry
Dear all,
Is it possible to configure and run iSCSI when deploying Ceph with
ceph-ansible on Ubuntu 18.04? Please let me know, and if possible point
me to helpful links on the topic.
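For context, ceph-ansible ships an iSCSI gateway role that is enabled by
placing the gateway nodes in an iscsigws inventory group and running the
main playbook. A minimal sketch, assuming the iscsigws group name of
recent ceph-ansible releases and hypothetical hostnames:

# inventory sketch: gateway nodes go into the iscsigws group
cat >> hosts <<'EOF'
[iscsigws]
ceph-gw1
ceph-gw2
EOF
ansible-playbook -i hosts site.yml --limit iscsigws

Whether the underlying ceph-iscsi packages are available for Ubuntu 18.04
is a separate question worth checking in the ceph-ansible release notes.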
Best regards,
Michel
Hi,
during an OS upgrade from Ubuntu 18.04 to 20.04 we seem to have
triggered a bcache bug on three OSD hosts. These hosts back a
6+2 EC pool used by CephFS, so a number of PGs are affected by the
bug. We were able to restart two of the three hosts (and will run some
extra scrubs on all PGs), but at least 7 PGs have unfound objects now.
I'm currently trying to find out which files are affected to restore
them from backup or inform the users about data corruption for files in
dedicated scratch directories.
This works quite well for most of the files. The 'parent' xattr attached
to a file's first object contains the complete path of the file within
the filesystem, so locating the files should be easy with a little help
from ceph-dencoder. But there are some files that do not have the
'parent' xattr:
for pg in $(ceph health detail | grep 'active+recovery_unfound' | cut -d' ' -f 6); do
  echo $pg
  for obj in $(ceph pg $pg list_unfound | jq -r '.objects | .[] | .oid.oid' | cut -c1-11); do
    rados -p bcf_fs_data_rep getxattr $obj.00000000 parent > $obj.parent
  done
done
gives
77.1
error getting xattr bcf_fs_data_rep/1002a143927.00000000/parent: (2) No
such file or directory
.....
'bcf_fs_data_rep' is the first data pool of the filesystem, which should
contain the xattr data. But for a number of objects (58 of 928) the
above command is not able to retrieve the information.
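(For the objects where the fetch does succeed, the saved backtrace can be
turned back into a path with ceph-dencoder, along these lines:

ceph-dencoder type inode_backtrace_t import $obj.parent decode dump_json

which prints the inode number and its ancestor dentries as JSON.)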
Questions:
1. If the 'parent' xattr is not available in the first data pool (and
not in the metadata pool either), what might be the state of these
objects? Can they be deleted using 'mark_unfound_lost delete'?
2. The list_unfound command only prints 256 objects; how can this limit
be lifted, since some PGs have more unfound objects?
Regards,
Burkhard
Hi,
Am I doing something wrong, or is the 21 update missing for buster?
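(To check what the repository actually offers, something like this on the
buster host, assuming APT is pointed at download.ceph.com:

grep -r ceph /etc/apt/sources.list /etc/apt/sources.list.d/
apt-get update && apt-cache policy ceph-common

should show the available versions next to the installed one.)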
Thank you
What’s the proper way to track down where this error is coming from? Thanks.
6/7/21 12:40:00 AM
[WRN]
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
6/7/21 12:40:00 AM
[WRN]
Health detail: HEALTH_WARN 1 failed cephadm daemon(s)
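A way to localize it with standard cephadm commands (the daemon name below
is a placeholder to fill in from the output):

ceph health detail                # names the failed daemon and its host
ceph orch ps | grep -v running    # lists any daemon not in state 'running'
cephadm logs --name <daemon.id>   # run on that host to see its journald log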
Hello Samuel. Thanks for the answer.
Yes, the Intel S4510 series is a good choice, but it's expensive.
I have 21 servers and the data distribution is quite good.
At power loss I don't think I'll lose data. All the VMs use the same
image and the rest is cookie.
In this case I'm not sure I should spend extra money on PLP.
Actually I like the Samsung 870 EVO. It's cheap and I think 300 TBW will
be enough for 5-10 years.
Do you know any better SSD in the same price range as the 870 EVO?
Samsung 870 EVO (500GB) = 5 years or 300 TBW - $64.99
Samsung 860 PRO (512GB) = 5 years or 600 TBW - $99
>
> I would recommend Intel S4510 series, which has power loss protection (PLP).
>
> If you do not care about PLP, lower-cost Samsung 870EVO and Crucial MX500 should also be OK (with separate DB/WAL on enterprise SSD with PLP)
>
> Samuel
>
> ________________________________
> huxiaoyu(a)horebdata.cn
>
>
> From: by morphin
> Date: 2021-05-30 02:48
> To: Anthony D'Atri
> CC: Ceph Users
> Subject: [ceph-users] Re: SSD recommendations for RBD and VM's
> Hello Anthony.
>
> I use QEMU and I don't need capacity.
> I have 1000 VMs and usually they're clones of the same RBD image. The
> image is 30GB.
> Right now I have 7TB of stored data; rep x3 = 20TB of data. It's mostly
> read-intensive. Usage is stable and does not grow.
> So I need I/O more than capacity. That's why I'm looking at 256-512GB SSDs.
> I think right now 480-512GB is the sweet spot for $/GB. So 60 pcs of
> 512GB will be enough. Actually 120 pcs of 256GB would be better, but the
> price goes up.
> I have Dell R720-740 servers and I use SATA Intel DC S3700 for journals.
> I have 40 pcs of 100GB; I'm going to make them OSDs as well.
> 7 years in and the DC S3700 still rocks. Not even one of them has died.
> The SSD must be low price & high TBW lifespan. The rest is not important.
>
>
> On Sunday, 30 May 2021 at 02:26, Anthony D'Atri
> <anthony.datri(a)gmail.com> wrote:
> >
> > The choice depends on scale, your choice of chassis / form factor, budget, workload and needs.
> >
> > The sizes you list seem awfully small. Tell us more about your use case. OpenStack? Proxmox? QEMU? VMware? Converged? Dedicated?
> > —aad
> >
> >
> > > On May 29, 2021, at 2:10 PM, by morphin <morphinwithyou(a)gmail.com> wrote:
> > >
> > > Hello.
> > >
> > > I have a virtualization env and I'm looking for new SSDs to replace the HDDs.
> > > What are the best performance/price SSDs on the market right now?
> > > I'm looking at 1TB, 512GB, 480GB, 256GB, 240GB.
> > >
> > > Is there an SSD recommendation list for Ceph?
Hello.
I'm using RBD disks for ZFS, and on top of ZFS I serve NFS.
I see 10-50MB/s at best. When I use the RBD disk in a VM environment it
can reach 1GB/s RW, so there is nothing wrong with RBD or the cluster.
I'm also confident about ZFS+NFS performance when ZFS sits on a real
device. But when they work together I can't get even 10% of the
performance. I think something is wrong; it shouldn't be that bad.
Ceph version: Nautilus 14.2.16
RBD Data = EC Pool
RBD Metadata = SSD replicated
zpool create testpool /dev/rbd0
echo '/testpool *(rw,sync,no_root_squash)' >> /etc/exports
exportfs -a
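(One thing worth ruling out, as an assumption on my part rather than
something established in this thread: NFS issues sync writes, which ZFS
honors on every commit, and on an EC-backed RBD that can be very slow. A
quick diagnostic check, unsafe for real data:

zfs set sync=disabled testpool   # diagnostic only, do not leave this set
# re-run the NFS benchmark, then restore the default:
zfs set sync=standard testpool

If throughput jumps, sync write latency is the bottleneck.)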
Is there anyone who has tried it before?
Hi,
I managed to build a Ceph cluster with the help of the cephadm tool. It works like a charm.
I have a problem that I'm still not able to fix:
the zabbix-sender executable is not included in the ceph-mgr container image pulled and started by podman, because of this choice:
https://github.com/ceph/ceph-container/issues/1651
I'm a total newbie with container technologies, but I managed to install zabbix-sender manually inside the container with podman:
podman ps -a                               # to find the container ID
podman exec -ti <container_id> /bin/bash   # to get a shell inside it
Then I installed the repo via rpm and ran dnf install zabbix-sender.
Two considerations:
1) https://github.com/ceph/ceph-container/issues/1651 -> This answer still leaves me confused. It makes little sense to provide a Zabbix module if you then have to install the executable yourself, while the container is overwritten every time the physical host reboots, with minimal info on how to avoid this behavior.
I know this is open source and you have to know the environment well before doing anything; I know devs want as little trouble as possible, but this is about user experience. If the choice is between the final user or the developers having an annoying problem, IMHO, I prefer to have a happy user. End of consideration.
I think including the zabbix-sender executable in the container wouldn't hurt anybody.
2) Since I'm a total noob with containers, podman, docker, etc., do you have any info on how to "fix" this behavior and avoid the mgr container being overwritten every time I reboot the host? Please forgive me, I'm totally a newbie with containers.
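(A sketch of the usual workaround, under the assumption that running a
derived image is acceptable; the base tag and the Zabbix repo RPM URL
below are placeholders to adapt:

cat > Containerfile <<'EOF'
FROM quay.io/ceph/ceph:v15              # adjust to the tag you actually run
# <zabbix-release-rpm-url> stands in for the Zabbix repo release RPM
RUN rpm -Uvh <zabbix-release-rpm-url> && dnf install -y zabbix-sender
EOF
podman build -t localhost/ceph-zabbix:custom .
ceph config set global container_image localhost/ceph-zabbix:custom
ceph orch daemon redeploy mgr.<host>.<id>

Because the package is baked into the image, it survives daemon redeploys
and host reboots.)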
Thank you in advance for the support.
Best,
Roberto
Kirecom.net
Hello,
I need to upgrade the OS our Ceph cluster is running on in order to support new versions of Ceph.
Has anyone devised a model for how to handle this?
Do you just do the following (a rough sketch of the drain step in 4 follows the list)?
1. Install some new nodes with the new OS
2. Install the old version of Ceph on the new nodes
3. Add those nodes/OSDs to the cluster
4. Remove the old nodes
5. Upgrade Ceph on the new nodes
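(For step 4, a sketch of draining one old host, assuming systemd-managed
non-cephadm OSDs and a hypothetical $oldhost variable:

for id in $(ceph osd ls-tree $oldhost); do ceph osd out $id; done
# wait until 'ceph -s' shows recovery has finished, then:
for id in $(ceph osd ls-tree $oldhost); do
    systemctl stop ceph-osd@$id              # run on the old host itself
    ceph osd purge $id --yes-i-really-mean-it
done
ceph osd crush rm $oldhost
)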
Is there a specific OS for which the Ceph project has promised longer support across future versions? I would like to touch the OS only every 3-4 years if possible.
Thanks,
-Drew
I'm trying to figure out a CRUSH rule that will spread data across my cluster as widely as possible, but with no more than 2 chunks per host.
If I use the default rule with an osd failure domain like this:
step take default
step choose indep 0 type osd
step emit
I get clustering of 3-4 chunks on some of the hosts:
# for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
> echo $pg
> for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
> ceph osd find $osd | jq -r '.host'
> done | sort | uniq -c | sort -n -k1
> done
8.0
1 harrahs
3 paris
4 aladdin
8.1
1 aladdin
1 excalibur
2 mandalaybay
4 paris
8.2
1 harrahs
2 aladdin
2 mirage
3 paris
...
However, if I change the rule to use:
step take default
step choose indep 0 type host
step chooseleaf indep 2 type osd
step emit
I get the data spread across 4 hosts with 2 chunks per host:
# for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
> echo $pg
> for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do
> ceph osd find $osd | jq -r '.host'
> done | sort | uniq -c | sort -n -k1
> done
8.0
2 aladdin
2 harrahs
2 mandalaybay
2 paris
8.1
2 aladdin
2 harrahs
2 mandalaybay
2 paris
8.2
2 harrahs
2 mandalaybay
2 mirage
2 paris
...
Is it possible to get the data to spread out over more hosts? I plan to expand the cluster in the near future and would like to see more hosts get 1 chunk instead of 2.
Also, before you recommend adding two more hosts and switching to a host-based failure domain: the cluster runs on a variety of hardware with 2-6 drives per host and drives from 4TB to 12TB in size (it's part of my home lab).
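(In case it helps anyone experimenting along the same lines: candidate
rules can be dry-run offline with crushtool before touching the pool. The
rule id 1 below is an assumption; substitute the id of the edited rule:

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit the rule in crushmap.txt, then recompile and simulate 6+2 placements
crushtool -c crushmap.txt -o crushmap.new
crushtool -i crushmap.new --test --rule 1 --num-rep 8 --show-mappings
)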
Thanks,
Bryan