Another update:
we now took the more destructive route and removed the cephfs pools
(luckily we only had test data in the filesystem).
Our hope was that during the startup process the OSD would delete the
no-longer-needed PGs, but this is NOT the case.
So we still have the same issue; the only difference is that the PG
does not belong to a pool anymore.
-360> 2019-08-07 14:52:32.655 7fb14db8de00 5 osd.44 pg_epoch: 196586
pg[23.f8s0(unlocked)] enter Initial
-360> 2019-08-07 14:52:32.659 7fb14db8de00 -1
/build/ceph-13.2.6/src/osd/ECUtil.h: In function
'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread
7fb14db8de00 time 2019-08-07 14:52:32.660169
/build/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width %
stripe_size == 0)
We can now take one route and try to delete the PG by hand in the OSD
(bluestore); how can this be done? Or we try to upgrade to Nautilus and
hope for the best.
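If we go the manual route, the tool we would look at is ceph-objectstore-tool; a rough sketch of what we think the removal would look like (untested on our side, OSD stopped first, the OSD and PG ids below are just taken from the log above):

  # stop the affected OSD so the object store is not in use
  systemctl stop ceph-osd@44

  # list PGs present on that OSD to confirm the orphaned one is really there
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-44 --op list-pgs

  # remove the orphaned PG shard (id as shown in the startup log)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-44 --pgid 23.f8s0 --op remove --force

  # start the OSD again
  systemctl start ceph-osd@44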
Any help or hints are welcome.
have a nice one
Ansgar
Am Mi., 7. Aug. 2019 um 11:32 Uhr schrieb Ansgar Jazdzewski
<a.jazdzewski(a)googlemail.com>:
>
> Hi,
>
> as a follow-up:
> * a full log of one OSD failing to start https://pastebin.com/T8UQ2rZ6
> * our EC-pool creation in the first place https://pastebin.com/20cC06Jn
> * ceph osd dump and ceph osd erasure-code-profile get cephfs
> https://pastebin.com/TRLPaWcH
>
> as we dig more into it, it looks like a bug in the CephFS or
> erasure-coding part of Ceph.
>
> Ansgar
>
>
> Am Di., 6. Aug. 2019 um 14:50 Uhr schrieb Ansgar Jazdzewski
> <a.jazdzewski(a)googlemail.com>:
> >
> > hi folks,
> >
> > we had to move one of our clusters, so we had to reboot all servers; now
> > we see an error on all OSDs with the EC pool.
> >
> > Are we missing some options? Will an upgrade to 13.2.6 help?
> >
> >
> > Thanks,
> > Ansgar
> >
> > 2019-08-06 12:10:16.265 7fb337b83200 -1
> > /build/ceph-13.2.4/src/osd/ECUtil.h: In function
> > 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread
> > 7fb337b83200 time 2019-08-06 12:10:16.263025
> > /build/ceph-13.2.4/src/osd/ECUtil.h: 34: FAILED assert(stripe_width %
> > stripe_size == 0)
> >
> > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fb32eeb83c2]
> >  2: (()+0x2e5587) [0x7fb32eeb8587]
> >  3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&,
> >     boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*,
> >     CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x4de) [0xa4cbbe]
> >  4: (PGBackend::build_pg_backend(pg_pool_t const&,
> >     std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> >     std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> >     std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >,
> >     std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const,
> >     std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&,
> >     PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&,
> >     ObjectStore*, CephContext*)+0x2f9) [0x9474e9]
> >  5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&,
> >     std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> >     std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
> >     std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >,
> >     std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const,
> >     std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&,
> >     spg_t)+0x138) [0x8f96e8]
> >  6: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x11d3) [0x753553]
> >  7: (OSD::load_pgs()+0x4a9) [0x758339]
> >  8: (OSD::init()+0xcd3) [0x7619c3]
> >  9: (main()+0x3678) [0x64d6a8]
> >  10: (__libc_start_main()+0xf0) [0x7fb32ca68830]
> >  11: (_start()+0x29) [0x717389]
> > NOTE: a copy of the executable, or objdump -rdS <executable>
> > is needed to interpret this.
> I can add RAM, and is there a way to increase rocksdb caching? Can I
> increase bluestore_cache_size_hdd to a higher value to cache rocksdb?
In recent releases it's governed by the osd_memory_target parameter. In
previous releases it's bluestore_cache_size_hdd. Check release notes to
know for sure.
> We have planned to add some SSDs; how many OSDs' rocksdbs can we
> put per SSD? And I guess if one SSD is down then all related OSDs
> have to be re-installed.
Yes. At least you'd better not put all 24 block.db's on a single SSD :)
4-8 HDDs per SSD is usually fine. Also check db_used_bytes in `ceph
daemon osd.0 perf dump` (replace 0 with actual OSD numbers) to figure
out how much space your DBs use. If it's below 30 GB you're lucky, because
in that case the DBs will fit on 30 GB SSD partitions.
https://yourcmc.ru/wiki/Ceph_performance#About_block.db_sizing
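For reference, both can be checked through the admin socket (option names depend on your release, and osd.0 is only an example):

  ceph daemon osd.0 config get osd_memory_target
  ceph daemon osd.0 perf dump | grep -E 'db_used_bytes|db_total_bytes'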
--
Vitaliy Filippov
On Wed, Aug 7, 2019 at 9:30 AM Robert LeBlanc <robert(a)leblancnet.us> wrote:
>> # ceph osd crush rule dump replicated_racks_nvme
>> {
>> "rule_id": 0,
>> "rule_name": "replicated_racks_nvme",
>> "ruleset": 0,
>> "type": 1,
>> "min_size": 1,
>> "max_size": 10,
>> "steps": [
>> {
>> "op": "take",
>> "item": -44,
>> "item_name": "default~nvme" <------------
>> },
>> {
>> "op": "chooseleaf_firstn",
>> "num": 0,
>> "type": "rack"
>> },
>> {
>> "op": "emit"
>> }
>> ]
>> }
>
>
> Yes, our HDD cluster is much like this, but not Luminous, so we created a separate root with SSD OSDs for the metadata and set up a CRUSH rule for the metadata pool to be mapped to SSD. I understand that the CRUSH rule should have a `step take default class ssd`, which I don't see in your rule, unless the `~` in the item_name means device class.
~ is the internal implementation of device classes. Internally it's
still using separate roots, that's how it stays compatible with older
clients that don't know about device classes.
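If you want a rule that references a device class explicitly, it can be created like this (rule, root, failure domain and pool names below are only examples):

  ceph osd crush rule create-replicated replicated_racks_nvme default rack nvme
  ceph osd pool set cephfs_metadata crush_rule replicated_racks_nvme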
And since it wasn't mentioned here yet: consider upgrading to Nautilus
to benefit from the new and improved accounting for metadata space.
You'll be able to see how much space is used for metadata and quotas
should work properly for metadata usage.
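For example, on Nautilus something like the following shows the per-pool usage breakdown, including the metadata pool (exact output columns vary by minor release):

  ceph df detail
  ceph fs status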
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
>
> Thanks
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
> _______________________________________________
> ceph-users mailing list
> ceph-users(a)lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Hi all,
we run a Ceph Luminous 12.2.12 cluster, 7 OSD servers with 12x4TB disks each.
Recently we redeployed the OSDs of one of them using the bluestore backend;
however, after this, we're facing out-of-memory errors (invoked oom-killer)
and the OS kills one of the ceph-osd processes.
The OSD is restarted automatically and is back online after one minute.
We're running Ubuntu 16.04, kernel 4.15.0-55-generic.
The server has 32GB of RAM and a 4GB swap partition.
All the disks are HDDs, no SSD disks.
Bluestore settings are the defaults:
"osd_memory_target": "4294967296"
"osd_memory_cache_min": "134217728"
"bluestore_cache_size": "0"
"bluestore_cache_size_hdd": "1073741824"
"bluestore_cache_autotune": "true"
As stated in the documentation, bluestore assigns by default 4GB of
RAM per OSD (1GB of RAM per 1TB).
So in this case 12 x 4GB = 48GB of RAM would be needed. Am I right?
Are these the minimum requirements for bluestore?
In case adding more RAM is not an option, can any of
osd_memory_target, osd_memory_cache_min, or bluestore_cache_size_hdd
be decreased to fit our server specs?
Would this have any impact on performance?
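To make the question concrete, the change we are considering would be something like this (2 GiB per OSD is only an example value, not a recommendation):

  # ceph.conf on the OSD hosts, [osd] section
  osd_memory_target = 2147483648

  # try to apply at runtime; depending on the release an OSD restart may still be needed
  ceph tell osd.* injectargs '--osd_memory_target 2147483648'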
Thanks
Jaime
--
Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | jaime(a)tchpc.tcd.ie
Tel: +353-1-896-3725
Hi All,
ceph mgr module disable balancer
Error EINVAL: module 'balancer' cannot be disabled (always-on)
What's the way to restart the balancer? Restart the MGR service?
I want to suggest to the balancer developers to set up a ceph-balancer.log for this
module, to get more information about what it is doing.
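For now what I would try, unless there is a better way (the mgr daemon name below is just an example):

  # pause/resume the balancer logic without disabling the always-on module
  ceph balancer off
  ceph balancer on

  # or restart the active mgr daemon
  systemctl restart ceph-mgr@mon01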
Regards
Manuel
On CentOS 7, the option "secretfile" requires installation of ceph-fuse.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: ceph-users <ceph-users-bounces(a)lists.ceph.com> on behalf of Yan, Zheng <ukernel(a)gmail.com>
Sent: 07 August 2019 10:10:19
To: DHilsbos(a)performair.com
Cc: ceph-users
Subject: Re: [ceph-users] Error Mounting CephFS
On Wed, Aug 7, 2019 at 3:46 PM <DHilsbos(a)performair.com> wrote:
>
> All;
>
> I have a server running CentOS 7.6 (1810) that I want to set up with CephFS (full disclosure, I'm going to be running Samba on the CephFS). I can mount the CephFS fine when I use the option secret=, but when I switch to secretfile=, I get an error "No such process." I installed ceph-common.
>
> Is there a service that I'm not aware I should be starting?
> Do I need to install another package?
>
mount.ceph is missing. Check if it exists and is located in $PATH.
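For example (package name for CentOS 7; monitor address, user name and paths below are placeholders):

  # the mount helper ships with ceph-common on CentOS 7
  which mount.ceph || yum install -y ceph-common

  # then a secretfile-based mount looks like
  mount -t ceph mon1.example.com:6789:/ /mnt/cephfs -o name=samba,secretfile=/etc/ceph/client.samba.secret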
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> DHilsbos(a)PerformAir.com
> www.PerformAir.com
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users(a)lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users(a)lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Including new ceph-users list.
----- Forwarded message from Mike Perez <miperez(a)redhat.com> -----
Date: Fri, 2 Aug 2019 10:08:20 -0700
From: Mike Perez <miperez(a)redhat.com>
To: Kevin Hrpcek <kevin.hrpcek(a)ssec.wisc.edu>
CC: "ceph-users(a)lists.ceph.com" <ceph-users(a)lists.ceph.com>
Subject: Re: [ceph-users] Ceph Scientific Computing User Group
We have scheduled the next meeting on the community calendar for August
28 at 14:30 UTC. Each meeting will then take place on the last
Wednesday of each month.
Here's the pad to collect agenda/notes:
[1]https://pad.ceph.com/p/Ceph_Science_User_Group_Index
--
Mike Perez (thingee)
On Tue, Jul 23, 2019 at 10:40 AM Kevin Hrpcek
<[2]kevin.hrpcek(a)ssec.wisc.edu> wrote:
Update
We're going to hold off until August for this so we can promote it on
the Ceph twitter with more notice. Sorry for the inconvenience if you
were planning on the meeting tomorrow. Keep a watch on the list,
twitter, or ceph calendar for updates.
Kevin
On 7/5/19 11:15 PM, Kevin Hrpcek wrote:
We've had some positive feedback and will be moving forward with
this user group. The first virtual user group meeting is planned for
July 24th at 4:30pm central European time/10:30am American eastern
time. We will keep it to an hour in length. The plan is to use the
ceph bluejeans video conferencing and it will be put on the ceph
community calendar. I will send out links when it is closer to the
24th.
The goal of this user group is to promote conversations and sharing
ideas for how ceph is used in the scientific/hpc/htc
communities. Please be willing to discuss your use cases, cluster
configs, problems you've had, shortcomings in ceph, etc... Not
everyone pays attention to the ceph lists so feel free to share the
meeting information with others you know that may be interested in
joining in.
Contact me if you have questions, comments, suggestions, or want to
volunteer a topic for meetings. I will be brainstorming some
conversation starters but it would also be interesting to have
people give a deep dive into their use of ceph and what they have
built around it to support the science being done at their facility.
Kevin
On 6/17/19 10:43 AM, Kevin Hrpcek wrote:
Hey all,
At cephalocon some of us who work in scientific computing got
together for a BoF and had a good conversation. There was some
interest in finding a way to continue the conversation focused on
ceph in scientific computing and htc/hpc environments. We are
considering putting together monthly video conference user group
meeting to facilitate sharing thoughts and ideas for this part of
the ceph community. At cephalocon we mostly had teams present from
the EU so I'm interested in hearing how much community interest
there is in a ceph+science/HPC/HTC user group meeting. It will be
impossible to pick a time that works well for everyone but initially
we considered something later in the work day for EU countries.
Reply to me if you're interested and please include your timezone.
Kevin
_______________________________________________
ceph-users mailing list
[3]ceph-users(a)lists.ceph.com
[4]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
[5]ceph-users(a)lists.ceph.com
[6]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
[7]ceph-users(a)lists.ceph.com
[8]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
References
1. https://pad.ceph.com/p/Ceph_Science_User_Group_Index
2. mailto:kevin.hrpcek@ssec.wisc.edu
3. mailto:ceph-users@lists.ceph.com
4. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
5. mailto:ceph-users@lists.ceph.com
6. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
7. mailto:ceph-users@lists.ceph.com
8. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users(a)lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
----- End forwarded message -----
--
Jan Fajerski
Engineer Enterprise Storage
SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Hi, All,
When deploying a development cluster, there are three types of OSD objectstore backend: filestore, bluestore and kstore.
But there is no "--kstore" option when using the "ceph-deploy osd" command to deploy a real ceph cluster.
Can kstore be used as the OSD objectstore backend when deploying a real ceph cluster? If it can, how?
Thanks a lot
R.R.Yuan