July 2020 - ceph-users - lists.ceph.io

by Abhimnyu Dhobale

Thanks For your reply, Please find the below output and suggest. [root@vpsapohmcs01 ~]# rados -p vpsacephcl01 list-inconsistent-obj 1.3c9 --format=json-pretty { "epoch": 845, "inconsistents": [ { "object": { "name": "rbd_data.515c96b8b4567.000000000000c377", "nspace": "", "locator": "", "snap": "head", "version": 21101 }, "errors": [], "union_shard_errors": [ "read_error" ], "selected_object_info": { "oid": { "oid": "rbd_data.515c96b8b4567.000000000000c377", "key": "", "snapid": -2, "hash": 867656649, "max": 0, "pool": 1, "namespace": "" }, "version": "853'21101", "prior_version": "853'21100", "last_reqid": "client.2317742.0:24909022", "user_version": 21101, "size": 4194304, "mtime": "2020-07-16 21:02:20.564245", "local_mtime": "2020-07-16 21:02:20.572003", "lost": 0, "flags": [ "dirty", "omap_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0xffffffff", "omap_digest": "0xffffffff", "expected_object_size": 4194304, "expected_write_size": 4194304, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 5, "primary": false, "errors": [], "size": 4194304, "omap_digest": "0xffffffff", "data_digest": "0x8ebd7de4" }, { "osd": 19, "primary": true, "errors": [], "size": 4194304, "omap_digest": "0xffffffff", "data_digest": "0x8ebd7de4" }, { "osd": 24, "primary": false, "errors": [ "read_error" ], "size": 4194304 } ] } ] } On Tue, Jul 14, 2020 at 6:40 PM Eric Smith <Eric.Smith(a)vecima.com> wrote: > If you run (Substitute your pool name for <pool>): > > rados -p <pool> list-inconsistent-obj 1.574 --format=json-pretty > > You should get some detailed information about which piece of data > actually has the error and you can determine what to do with it from there. > > -----Original Message----- > From: Abhimnyu Dhobale <adhobale8(a)gmail.com> > Sent: Tuesday, July 14, 2020 5:13 AM > To: ceph-users(a)ceph.io > Subject: [ceph-users] 1 pg inconsistent > > Good Day, > > Ceph is showing below error frequently. every time after pg repair it is > resolved. > > [root@vpsapohmcs01 ~]# ceph health detail HEALTH_ERR 1 scrub errors; > Possible data damage: 1 pg inconsistent OSD_SCRUB_ERRORS 1 scrub errors > PG_DAMAGED Possible data damage: 1 pg inconsistent pg 1.574 is > active+clean+inconsistent, acting [19,25,2] > > [root@vpsapohmcs02 ~]# cat /var/log/ceph/ceph-osd.19.log | grep error > 2020-07-12 11:42:11.824 7f864e0b2700 -1 log_channel(cluster) log [ERR] : > 1.574 shard 25 soid > 1:2ea0a7a3:::rbd_data.515c96b8b4567.0000000000007a7c:head : candidate had > a read error > 2020-07-12 11:42:15.035 7f86520ba700 -1 log_channel(cluster) log [ERR] : > 1.574 deep-scrub 1 errors > > [root@vpsapohmcs01 ~]# ceph --version > ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic > (stable) > > Request you to please suggest. > > -- > Thanks & Regards > Abhimnyu Dhobale > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an > email to ceph-users-leave(a)ceph.io > -- Thanks & Regards Abhimnyu Dhobale 8149311170

3 years, 11 months

2
1
0 0

Repo sync

by Szabo, Istvan (Agoda)

Hi, Earlier I've synced the repos the following way: rsync -avSH rsync://hk.ceph.com/rpm-luminous/el7 . Today I've tried to sync from us-west and hk, but I got an error: rsync -avSH rsync://us-west.ceph.com/rpm-nautilus/el7/noarch . @ERROR: Unknown module 'rpm-nautilus' rsync error: error starting client-server protocol (code 5) at main.c(1503) [receiver=3.0.6] rsync -avSH rsync://us-west.ceph.com/rpm-nautilus/el7/noarch . @ERROR: Unknown module 'rpm-nautilus' rsync error: error starting client-server protocol (code 5) at main.c(1503) [receiver=3.0.6] rsync -avSH rsync://download.ceph.com/rpm-nautilus/el7/noarch . @ERROR: Unknown module 'rpm-nautilus' rsync error: error starting client-server protocol (code 5) at main.c(1503) [receiver=3.0.6] Also the HK repos doesn't have the newest nautilus, just 14.2.9. Any issue? Thank you ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

3 years, 11 months

1
1
0 0

Technician’s help to solve canon printer offline error

by michaels1990johnson＠gmail.com

When you give a right command to your canon printer for printing the documents, your canon printer is showing offline status. Your canon printer is displaying canon printer offline error. If you’re receiving this offline error again and again, you can get connected with our live printer technicians to get full technical guidance for solving canon printer offline error. As a technically recognized third party support guide, we recognize the exact reasons of this error and try to use the effective troubleshooting solutions to resolve this offline problem. Our technician’s help is available round the clock to provide instant support for any issue. https://www.canonprintersupportpro.us/blog/fix-canon-printer-offline-issue/

3 years, 11 months

1
0
0 0

Technician’s help to solve canon printer offline error

by michaels1990johnson＠gmail.com

When you give a right command to your canon printer for printing the documents, your canon printer is showing offline status. Your canon printer is displaying canon printer offline error. If you’re receiving this offline error again and again, you can get connected with our live printer technicians to get full technical guidance for solving canon printer offline error. As a technically recognized third party support guide, we recognize the exact reasons of this error and try to use the effective troubleshooting solutions to resolve this offline problem. Our technician’s help is available round the clock to provide instant support for any issue. https://www.canonprintersupportpro.us/blog/fix-canon-printer-offline-issue/

3 years, 11 months

1
0
0 0

Re: Cephfs multiple active-active MDS stability and optimization

by Dan van der Ster

Hi, We are using 10 active MDS's with v12.2.12 -- so it is "stable" but we have several measures and lots of experience to make it like that. If I were starting a new cluster now, I would use the latest nautilus or octopus and test the hell out of it before going into prod. Don't start with mimic now, it's end-of-line. First, are you really sure you need multi-active MDS? We only use it where the metadata workload clearly exceeds the abilities of a single active MDS. Evidence of this would be a high, flat-lined CPU usage on the active mds, or better would be to track the "hcr" or "handle_client_request" metric with your monitoring or locally on an MDS with "ceph daemonperf mds.`hostname -s`". A single MDS can normally achieve a few thousand hcr/second at best. Otherwise, here are some relatively advanced things to try to validate the setup... understanding and succeeding in these things should help with your nerves: - Start the cluster, run some workloads, try increasing and decreasing max_mds on the fly and make sure this is working well - is the metadata balancing working well with your common workloads? run your test workloads for hours or days and check that the RSS of each MDS is not growing unexpectedly - does mds balancing make sense for your workload, or are there some places where pinning to subdirs to a rank is worthwhile? - with fully active mds, fully loaded metadata caches, test the failover to standby several times. Try "nice" failovers (e.g. systemctl stop ceph-mds.target on an active) as well as "not-so-nice" failovers (e.g. killall -9 ceph-mds) - Try the cephfs scrub features. Maybe even intentionally corrupt a file or direntry object then check if cephfs scrub behaves as expected Hope that helps! Dan On Thu, Jul 16, 2020 at 3:01 PM huxiaoyu(a)horebdata.cn <huxiaoyu(a)horebdata.cn> wrote: > > Dear Cepher， > > I am planning a cephfs cluster with ca. 100 OSD nodes, each of which has 12 disks, and 2 NVMe (for db wal and cephfs metadata pool). Fpr performance and scalability reasons, i would like to try multi MDS working ative-active. From what i learned in the past, i am not sure about the following questions. > > 1 Which Ceph version should i run? I had a good experience with Luminous 12.2.13, and not familiar yet with Mimic and Nautilus. Is Lumious 12.2.13 stable enouth to run multiple active-active MDS servers for CephFS? > > 2 If i had to go Mimic or Nautilus for CephFS, which one is perferable? > > 3 I did has some experience with Ceph RBD, but not CephFS, So my question is, what should i pay attention to whening running CephFS? I am somehow nervous...... > > best regards, > > Samuel > > > > > > huxiaoyu(a)horebdata.cn > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

3 years, 11 months

1
0
0 0

CEPH performance issues running as Spark storage layer

by arabtura＠gmail.com

Hi All, We’re using CEPH cluster (Nautulis 14.2.10) as a S3 object storage layer for Spark 3 with Yarn running in distributed environment. The issue we see however is slow performance when running even simple spark query on data stored on large number of objects, for example 50.000 objects. We’re aware of slow object listing in S3, but should that really kill the performance while using spark for reading\analyzing\writing the data on S3? Running the same query on the same dataset content but stored in 100 files is multiple times faster. Bucket and bucket index we use for spark are stored on OSDs with SSDs (we’re having 150 of them) , we’re using 12 RGW instances, each limited to 32 concurrent connection by custom app to prevent RGW queue blowing up). When running spark queries, RGW queues rises – depending on number of spark executors – to around 30 per instance providing the number of executors per RGW instance is also in similar. We don’t see any other bottlenecks on infra side than RGW queues. We applied various tuning options for CEPH regarding RGW\OSD\Bluestore performance (ie: objecter_inflight_op_bytes, objecter_inflight_ops, rgw_bucket_index_max_aio, rgw_cache_lru_size ) but the spark works still really slow in above mentioned scenario. The other problem we observed is that when using 4 RGW instead of 12 we see performance degradation only in about 40-50%. We would expect that RGW scaling will behave more efficiently. Does anyone using CEPH in similar way as a storage layer for Spark? Do you observer similar behavior and maybe have some workarounds\solutions for slowness when working with high number of objects?

3 years, 11 months

1
0
0 0

Cephfs multiple active-active MDS stability and optimization

by huxiaoyu＠horebdata.cn

Dear Cepher， I am planning a cephfs cluster with ca. 100 OSD nodes, each of which has 12 disks, and 2 NVMe (for db wal and cephfs metadata pool). Fpr performance and scalability reasons, i would like to try multi MDS working ative-active. From what i learned in the past, i am not sure about the following questions. 1 Which Ceph version should i run? I had a good experience with Luminous 12.2.13, and not familiar yet with Mimic and Nautilus. Is Lumious 12.2.13 stable enouth to run multiple active-active MDS servers for CephFS? 2 If i had to go Mimic or Nautilus for CephFS, which one is perferable? 3 I did has some experience with Ceph RBD, but not CephFS, So my question is, what should i pay attention to whening running CephFS? I am somehow nervous...... best regards, Samuel huxiaoyu(a)horebdata.cn

3 years, 11 months

1
0
0 0

AdminSocket occurs segment fault with samba vfs ceph plugin

by 380562518＠qq.com

Ceph Version: 14.2.5 Samba Version:4.10.4 OS: Centos 7.6.1810 Procedure: 1，Setup a ceph cluster and create a file system. 2，Setup a samba share with samba vfs ceph plugin. 3，Use 'ceph daemon /var/run/ceph/ceph-client.admin.*****.asok help' to get debug command. 4，It will occurs a segment fault when exec the 3rd command. The call trace list belows: #0 0x00007f35425cb387 in raise () from /lib64/libc.so.6 #1 0x00007f35425cca78 in abort () from /lib64/libc.so.6 #2 0x00007f3545a8fff3 in dump_core () at ../source3/lib/dumpcore.c:338 #3 0x00007f3545a8109b in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:839 #4 0x00007f354627297f in smb_panic (why=why@entry=0x7f35462bb55b "internal error") at ../lib/util/fault.c:174 #5 0x00007f3546272bb6 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:88 #6 sig_fault (sig=<optimized out>) at ../lib/util/fault.c:99 #7 <signal handler called> #8 0x00007f35226d026f in acquire (this=0x564a0a014050) at /usr/src/debug/ceph-14.2.5-1.0.8/build/boost/include/boost/smart_ptr/detail/shared_count.hpp:426 #9 acquire_object_id (this=0x7f350fffe220) at /usr/src/debug/ceph-14.2.5-1.0.8/build/boost/include/boost/spirit/home/classic/core/non_terminal/impl/object_with_id.ipp:157 #10 object_with_id (this=0x7f350fffe220) at /usr/src/debug/ceph-14.2.5-1.0.8/build/boost/include/boost/spirit/home/classic/core/non_terminal/impl/object_with_id.ipp:79 #11 grammar (this=0x7f350fffe220) at /usr/src/debug/ceph-14.2.5-1.0.8/build/boost/include/boost/spirit/home/classic/core/non_terminal/grammar.hpp:51 #12 Json_grammer (semantic_actions=..., this=0x7f350fffe220) at /usr/src/debug/ceph-14.2.5-1.0.8/src/json_spirit/json_spirit_reader_template.h:401 #13 json_spirit::read_range_or_throw<__gnu_cxx::__normal_iterator<char const*, std::string>, json_spirit::Value_impl<json_spirit::Config_map<std::string> > > (begin=123 '{', end=0 '\000', value=...) at /usr/src/debug/ceph-14.2.5-1.0.8/src/json_spirit/json_spirit_reader_template.h:585 #14 0x00007f35226d0c3c in json_spirit::read_range<__gnu_cxx::__normal_iterator<char const*, std::string>, json_spirit::Value_impl<json_spirit::Config_map<std::string> > > ( begin=123 '{', end=..., value=...) at /usr/src/debug/ceph-14.2.5-1.0.8/src/json_spirit/json_spirit_reader_template.h:607 #15 0x00007f35226bb07d in read_string<std::basic_string<char>, json_spirit::Value_impl<json_spirit::Config_map<std::basic_string<char> > > > (value=..., s=<error reading variable: Cannot access memory at address 0x7869666572702263>) at /usr/src/debug/ceph-14.2.5-1.0.8/src/json_spirit/json_spirit_reader.cpp:78 #16 json_spirit::read (s="{\"prefix\": \"get_command_descriptions\"}", value=...) at /usr/src/debug/ceph-14.2.5-1.0.8/src/json_spirit/json_spirit_reader.cpp:78 #17 0x00007f35222953a7 in cmdmap_from_json(std::vector<std::string, std::allocator<std::string> >, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > >*, std::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >&) () at /usr/src/debug/ceph-14.2.5-1.0.8/src/common/cmdparse.cc:294 #18 0x00007f3522255492 in AdminSocket::execute_command(std::string const&, ceph::buffer::v14_2_0::list&) () at /opt/rh/devtoolset-8/root/usr/include/c++/8/new:169 #19 0x00007f35222567ef in AdminSocket::do_accept() () at /usr/src/debug/ceph-14.2.5-1.0.8/src/common/admin_socket.cc:341 #20 0x00007f3522258cc8 in AdminSocket::entry (this=0x564a09f3a950) at /usr/src/debug/ceph-14.2.5-1.0.8/src/common/admin_socket.cc:241 #21 0x00007f352270f68f in execute_native_thread_routine () from /usr/lib64/ceph/libceph-common.so.0 #22 0x00007f35466d1ea5 in start_thread () from /lib64/libpthread.so.0 #23 0x00007f35426938dd in clone () from /lib64/libc.so.6

3 years, 11 months

2
2
0 0

client - monitor communication.

by Budai Laszlo

Hello everybody, I'm trying to figure out how often the ceph client is contacting the monitors for updating its own information about the cluster map. Can anyone point me to a document describing this client <-> monitor communication? Thank you, Laszlo

3 years, 11 months

4
6
0 0

Monitor IPs

by Will Payne

I need to change the network my monitors are on. It seems this is not a trivial thing to do. Are there any up-to-date instructions for doing so on a cephadm-deployed cluster? I’ve found some steps in older versions of the docs but not sure if these are still correct - they mention using the ceph-mon command which I don’t have. Will

3 years, 11 months

5
4
0 0

2024

2023

2022

2021

2020

2019

ceph-users July 2020