Hi everyone,
We currently have 6 OSDs with 8TB HDDs split across 3 hosts.
The main use case is KVM images.
To improve performance we plan to put the block.db and WAL onto NVMe SSDs.
The plan is to put 2x 1TB NVMe drives in each host.
One option I thought of was to RAID 1 them for better redundancy; I don't know how high the risk is of corrupting the block.db when a single SSD fails.
Or should I just use one for WAL+block.db and use the other one as fast storage?
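For reference, if we go that route, I assume each OSD would be (re)created with something like the following (device names are just placeholders for our setup):

# ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

As far as I understand, the WAL is placed on the block.db device automatically when no separate --block.wal is given, so one partition per OSD should be enough.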
Thank you all very much!
Christian
ceph version: 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable)
I set up a Ceph cluster and I'm uploading objects through RGW at about 60 objects/s. I added some lifecycle rules to my buckets so that my disks will not fill up.
However, after I set "debug_rgw" to 5 and ran `radosgw-admin lc process`, I found from the logs that deletion is not parallel (objects in only one bucket are deleted at a time) and that it is much slower than my upload rate (60 objects/s), so disk usage keeps growing.
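For reference, the only knobs I have found so far are the number of lc shards and the lc work window; my understanding (the values here are just examples, not tested) is that they go in ceph.conf like this:

rgw_lc_max_objs = 128
rgw_lifecycle_work_time = 00:00-23:59

But I am not sure whether raising the shard count actually makes deletion run in parallel across buckets.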
Could you please tell me what I should do to speed up lifecycle
processing? Thanks in advance!
Hi,
My turn.
We suddenly have a big outage which is similar/identical to
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036519.html
Some of the osds are runnable, but most crash when they start -- a crc error in OSDMap::decode.
I'm able to extract an osd map from a good osd and it decodes well
with osdmaptool:
# ceph-objectstore-tool --op get-osdmap --data-path /var/lib/ceph/osd/ceph-680/ --file osd.680.map
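(By "decodes well" I mean that, e.g.,

# osdmaptool --print osd.680.map | head

prints the epoch and pool info without complaining.)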
But when I try on one of the bad osds I get:
# ceph-objectstore-tool --op get-osdmap --data-path /var/lib/ceph/osd/ceph-666/ --file osd.666.map
terminate called after throwing an instance of 'ceph::buffer::malformed_input'
what(): buffer::malformed_input: bad crc, actual 822724616 != expected 2334082500
*** Caught signal (Aborted) **
in thread 7f600aa42d00 thread_name:ceph-objectstor
ceph version 13.2.7 (71bd687b6e8b9424dd5e5974ed542595d8977416) mimic (stable)
1: (()+0xf5f0) [0x7f5ffefc45f0]
2: (gsignal()+0x37) [0x7f5ffdbae337]
3: (abort()+0x148) [0x7f5ffdbafa28]
4: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f5ffe4be7d5]
5: (()+0x5e746) [0x7f5ffe4bc746]
6: (()+0x5e773) [0x7f5ffe4bc773]
7: (()+0x5e993) [0x7f5ffe4bc993]
8: (OSDMap::decode(ceph::buffer::list::iterator&)+0x160e) [0x7f6000f4168e]
9: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f6000f42e31]
10: (get_osdmap(ObjectStore*, unsigned int, OSDMap&, ceph::buffer::list&)+0x1d0) [0x55d30a489190]
11: (main()+0x5340) [0x55d30a3aae70]
12: (__libc_start_main()+0xf5) [0x7f5ffdb9a505]
13: (()+0x3a0f40) [0x55d30a483f40]
Aborted (core dumped)
I think I want to inject the osdmap, but can't:
# ceph-objectstore-tool --op set-osdmap --data-path /var/lib/ceph/osd/ceph-666/ --file osd.680.map
osdmap (#-1:b65b78ab:::osdmap.2983572:0#) does not exist.
How do I do this?
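My guess is that set-osdmap only overwrites an existing map object, and that creating a missing one needs --force, i.e. something like:

# ceph-objectstore-tool --op set-osdmap --force --data-path /var/lib/ceph/osd/ceph-666/ --file osd.680.map

but I don't want to --force anything on these osds without confirmation.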
Thanks for any help!
dan
Hi all,
I have enabled the prometheus module on my Ceph cluster; the version is ceph version 14.2.5 (ad5bd132e1492173c85fda2cc863152730b16a92) nautilus (stable).
When I first enabled the module, I could get exported content from it.
But after I restarted all of the cluster's nodes, the module no longer exports any content: curl http://host/metrics returns a blank response.
The tcp/9283 port is listening, and I have disabled and re-enabled the module, but still no content is exported.
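Concretely, what I ran was something like:

# ceph mgr module disable prometheus
# ceph mgr module enable prometheus
# ceph mgr services

ceph mgr services still lists the prometheus endpoint and the port is listening, but /metrics stays empty. Would failing over the active mgr (ceph mgr fail <name>) be worth a try?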
黄明友
Manager, IT Infrastructure Department
V.Photos Cloud Photography
Mobile: +86 13540630430
Customer service: 400 - 806 - 5775
Email: hmy(a)v.photos
Website: www.v.photos
Shanghai: Bund SOHO3Q, Building F, 2/F, 88 Zhongshan East 2nd Road, Huangpu District
Beijing: SOHO3Q, 1/F, South Gate 2, Guanghua Road SOHO Phase II, 9 Guanghua Road, Chaoyang District
Guangzhou: 3Wcoffice, Tianyu Garden Phase II, 136 Linhe Middle Road, Tianhe District
Shenzhen: 1/F, 102 Wanggu Innovation Street, Building A, Wanggu Technology Building Phase II, Shekou, Nanshan District
Chengdu: 7/F, World Trade Plaza, Jianshe Road, Chenghua District
Hello Caspar,
did you ever find an answer to this?
My guess is that with "ceph pg repair" the copy from the primary osd
will overwrite the 2nd and 3rd copies - in case it is readable... but
what happens when it is not readable? :thinking:
It would be nice to know if there is a way to tell Ceph to repair a PG
with the copy from osd X.
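In the meantime, what I would do before running any repair is to look
at what scrub actually found, e.g.:

# rados list-inconsistent-obj <pgid> --format=json-pretty

As far as I know this shows the errors per shard, so you can at least
see which osd holds the bad copy.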
best regards,
Mehmet
On 04.12.2019 at 13:47, Caspar Smit wrote:
> Hi all,
>
> I tried to dig in the mailinglist archives but couldn't find a clear
> answer to the following situation:
>
> Ceph encountered a scrub error resulting in HEALTH_ERR.
> Two PGs are active+clean+inconsistent. When investigating the PGs I see
> a "read_error" on the primary OSD. Both PGs are replicated PGs with 3
> copies.
>
> I'm on Luminous 12.2.5 on this installation. Is it safe to just run
> "ceph pg repair" on those PGs, or will that overwrite the two good
> copies with the bad one from the primary?
> If the latter is true, what is the correct way to resolve this?
>
> Kind regards,
> Caspar Smit