May 2021 - ceph-users - lists.ceph.io

Re: RGW failed to start after upgrade to pacific

by 胡玮文

Hi Casey, I still see this error after upgrade to 16.2.3 > 在 2021年4月7日，00:54，Casey Bodley <cbodley(a)redhat.com> 写道： > > thanks for the details. this is a regression from changes to the > datalog storage for multisite - this -5 error is coming from the new > 'fifo' backend. as a workaround, you can set the new > 'rgw_data_log_backing' config variable back to 'omap' > > Adam has fixes already merged to the pacific branch; be aware that the > first pacific point release will change the name of > 'rgw_data_log_backing' to 'rgw_default_data_log_backing' and default > back to 'fifo' > >> On Tue, Apr 6, 2021 at 2:37 AM Martin Verges <martin.verges(a)croit.io> wrote: >> Hello, >> we see same problems. Deleting all the pools and redeploy rgw solved it on >> that test cluster, however that is no solution for production ;) >> systemd[1]: Started Ceph rados gateway. >> radosgw[7171]: 2021-04-04T14:37:51.508+0000 7fc6641efc00 0 deferred set >> uid:gid to 167:167 (ceph:ceph) >> radosgw[7171]: failed to chown /dev/null: (30) Read-only file system >> radosgw[7171]: 2021-04-04T14:37:51.508+0000 7fc6641efc00 0 ceph version >> 16.2.0-31-g5922b2b9c1 (5922b2b9c17f0877f84b0b3f2557ab72a628cbfe) pacific >> (stable), process radosgw, pid 7171 >> radosgw[7171]: 2021-04-04T14:37:51.508+0000 7fc6641efc00 0 framework: >> beast >> radosgw[7171]: 2021-04-04T14:37:51.508+0000 7fc6641efc00 0 framework conf >> key: ssl_port, val: 443 >> radosgw[7171]: 2021-04-04T14:37:51.508+0000 7fc6641efc00 0 framework conf >> key: port, val: 80 >> radosgw[7171]: 2021-04-04T14:37:51.508+0000 7fc6641efc00 0 framework conf >> key: ssl_certificate, val: /etc/ceph/rgwcert.pem >> radosgw[7171]: 2021-04-04T14:37:51.508+0000 7fc6641efc00 1 radosgw_Main >> not setting numa affinity >> radosgw[7171]: 2021-04-04T14:37:51.680+0000 7fc6641efc00 -1 static int >> rgw::cls::fifo::FIFO::create(librados::v14_2_0::IoCtx, >> std::__cxx11::string, std::unique_ptr<rgw::cls::fifo::FIFO>*, >> optional_yield, std::optional<rados::cls::fifo::objv>, >> std::optional<std::basic_string_view<char> >, bool, uint64_t, uint64_t):925 >> create_meta failed: r=-5 >> radosgw[7171]: 2021-04-04T14:37:51.680+0000 7fc6641efc00 -1 static int >> rgw::cls::fifo::FIFO::create(librados::v14_2_0::IoCtx, >> std::__cxx11::string, std::unique_ptr<rgw::cls::fifo::FIFO>*, >> optional_yield, std::optional<rados::cls::fifo::objv>, >> std::optional<std::basic_string_view<char> >, bool, uint64_t, uint64_t):925 >> create_meta failed: r=-5 >> radosgw[7171]: 2021-04-04T14:37:51.680+0000 7fc6641efc00 -1 int >> RGWDataChangesLog::start(const RGWZone*, const RGWZoneParams&, RGWSI_Cls*, >> librados::v14_2_0::Rados*): Error when starting backend: Input/output error >> radosgw[7171]: 2021-04-04T14:37:51.680+0000 7fc6641efc00 0 ERROR: failed >> to start datalog_rados service ((5) Input/output error >> radosgw[7171]: 2021-04-04T14:37:51.680+0000 7fc6641efc00 -1 int >> RGWDataChangesLog::start(const RGWZone*, const RGWZoneParams&, RGWSI_Cls*, >> librados::v14_2_0::Rados*): Error when starting backend: Input/output error >> radosgw[7171]: 2021-04-04T14:37:51.680+0000 7fc6641efc00 0 ERROR: failed >> to init services (ret=(5) Input/output error) >> radosgw[7171]: 2021-04-04T14:37:51.700+0000 7fc6641efc00 -1 Couldn't init >> storage provider (RADOS) >> radosgw[7171]: 2021-04-04T14:37:51.700+0000 7fc6641efc00 -1 Couldn't init >> storage provider (RADOS) >> systemd[1]: ceph-radosgw(a)rgw.new-croit-host-C0DE01.service: Main process >> exited, code=exited, status=5/NOTINSTALLED >> systemd[1]: ceph-radosgw(a)rgw.new-croit-host-C0DE01.service: Unit entered >> failed state. >> systemd[1]: ceph-radosgw(a)rgw.new-croit-host-C0DE01.service: Failed with >> result 'exit-code'. >> -- >> Martin Verges >> Managing director >> Mobile: +49 174 9335695 >> E-Mail: martin.verges(a)croit.io >> Chat: https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Ft.me%2FM… >> croit GmbH, Freseniusstr. 31h, 81247 Munich >> CEO: Martin Verges - VAT-ID: DE310638492 >> Com. register: Amtsgericht Munich HRB 231263 >> Web: https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcroit.io… >> YouTube: https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgoo.gl%2… >> -- >> Martin Verges >> Managing director >> Mobile: +49 174 9335695 >> E-Mail: martin.verges(a)croit.io >> Chat: https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Ft.me%2FM… >> croit GmbH, Freseniusstr. 31h, 81247 Munich >> CEO: Martin Verges - VAT-ID: DE310638492 >> Com. register: Amtsgericht Munich HRB 231263 >> Web: https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcroit.io… >> YouTube: https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgoo.gl%2… >> On Mon, 5 Apr 2021 at 19:59, Robert Sander <r.sander(a)heinlein-support.de> >> wrote: >>> Hi, >>>> Am 04.04.21 um 15:22 schrieb 胡玮文: >>>> bash[9823]: debug 2021-04-04T13:01:04.995+0000 7ff80f172440 -1 static >>> int rgw::cls::fifo::FIFO::create(librados::v14_2_0::IoCtx, >>> std::__cxx11::string, std::unique_ptr<rgw::cls::fifo::FIFO>*, >>> optional_yield, std::optional<rados::cls::fifo::objv>, >>> std::optional<std::basic_string_view<char> >, bool, uint64_t, uint64_t):925 >>> create_meta failed: r=-5 >>>> bash[9823]: debug 2021-04-04T13:01:04.995+0000 7ff80f172440 -1 int >>> RGWDataChangesLog::start(const RGWZone*, const RGWZoneParams&, RGWSI_Cls*, >>> librados::v14_2_0::Rados*): Error when starting backend: Input/output error >>>> bash[9823]: debug 2021-04-04T13:01:04.995+0000 7ff80f172440 0 ERROR: >>> failed to start datalog_rados service ((5) Input/output error >>>> bash[9823]: debug 2021-04-04T13:01:04.995+0000 7ff80f172440 0 ERROR: >>> failed to init services (ret=(5) Input/output error) >>> I see the same issues on an upgraded cluster. >>> Regards >>> -- >>> Robert Sander >>> Heinlein Consulting GmbH >>> Schwedter Str. 8/9b, 10119 Berlin >>> https://apac01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.heinl… >>> Tel: 030 / 405051-43 >>> Fax: 030 / 405051-19 >>> Zwangsangaben lt. §35a GmbHG: >>> HRB 93818 B / Amtsgericht Berlin-Charlottenburg, >>> Geschäftsführer: Peer Heinlein -- Sitz: Berlin >>> _______________________________________________ >>> ceph-users mailing list -- ceph-users(a)ceph.io >>> To unsubscribe send an email to ceph-users-leave(a)ceph.io >> _______________________________________________ >> ceph-users mailing list -- ceph-users(a)ceph.io >> To unsubscribe send an email to ceph-users-leave(a)ceph.io > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

2 years, 11 months

1
0
0 0

Re: Building ceph clusters with 8TB SSD drives?

by Erik Lindahl

The main reason for SSDs is typically to improve IOPS for small writes, but for that usage most (all) consumer SSDs we have tested perform badly in Ceph. The reason for this is that Ceph requires SYNC writes, and since consumer SSDs (and now even some cheap datacenter ones) don't have capacitors for power-loss-protection they cannot use the volatile caches that give them (semi-fake) good performance on desktops. If that sounds bad, you should be even more careful of you shop around until you find a cheap drive that performs well - because there have historically been consumer drives that lie and acknowledge a sync even if the data is just in volatile memory rather than safe :-) Samsung PM883 is likely one of the cheapest drives that you can still fully trust - at least if your application is not highly write-intensive. Now, having said that, we have had pretty good experience with a way to partly cheat around these limitations: since we have large servers with mixed HDDs we also have 2-3 NVMe samsung PM983 M.2 drives per server on PCIe cards for the DB/wal. It seems to work remarkably well to do this for consumer SSDs to, I.e. let each 4TB el cheapo SATA SSD (we used Samsung 860) use a ~100GB db/wal partition on an NVMe drive. This gives very nice low latencies in rados benchmarks, although they are still ~50% higher than with proper enterprise SSDs. Caveats: - Think about balancing IOPS. If you have 10 SSD OSDs share a single NVMe WAL device you will likely be limited by the NVMe. - if the NVMe drive dies, all the corresponding OSDs die. - This might work for read-intensive applications, but if you try it for write-intensive applications you will wear out the consumer SSDs (check their write endurance). - You will still see latency/bandwidth go up/down and periodically throttle for consumer SSDs. In comparison, even the relatively cheap pm883 "just works" at constant high bandwidth close to the bus limit, and the latency is a constant low fraction of a millisecond in ceph. In summary, while somewhat possible, I don't think it's worth the hassle/risk/complex setup with consumer drives, but if I absolutely had to i would at least avoid the absolutely cheapest QVO models - and if you don't put the WAL on a better device I predict you'll regret it once you start doing benchmarks in RADOS. Cheers, Erik

2 years, 11 months

1
0
0 0

x-amz-request-id logging with beast + rgw (ceph 15.2.10/containerized)?

by David Orman

Hi, Is there any way to log the x-amz-request-id along with the request in the rgw logs? We're using beast and don't see an option in the configuration documentation to add headers to the request lines. We use centralized logging and would like to be able to search all layers of the request path (edge, lbs, ceph, etc) with a x-amz-request-id. Right now, all we see is this: debug 2021-04-01T15:55:31.105+0000 7f54e599b700 1 beast: 0x7f5604c806b0: x.x.x.x - - [2021-04-01T15:55:31.105455+0000] "PUT /path/object HTTP/1.1" 200 556 - "aws-sdk-go/1.36.15 (go1.15.3; linux; amd64)" - We've also tried this: ceph config set global rgw_enable_ops_log true ceph config set global rgw_ops_log_socket_path /tmp/testlog After doing this, inside the rgw container, we can socat - UNIX-CONNECT:/tmp/testlog and see the log entries being recorded that we want, but there has to be a better way to do this, where the logs are emitted like the request logs above by beast, so that we can handle it using journald. If there's an alternative that would accomplish the same thing, we're very open to suggestions. Thank you, David

2 years, 11 months

3
6
0 0

Upgrade and lost osds Operation not permitted

by Behzad Khoshbakhti

Hi all, As I have upgrade my Ceph cluster from 15.2.10 to 16.2.0, during the manual upgrade using the precompiled packages, the OSDs was down with the following messages: root@osd03:/var/lib/ceph/osd/ceph-2# ceph-volume lvm activate --all --> Activating OSD ID 2 FSID 2d3ffc61-e430-4b89-bcd4-105b2df26352 Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2 Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-9d37674b-a269-4239-aa9e-66a3c74df76c/osd-block-2d3ffc61-e430-4b89-bcd4-105b2df26352 --path /var/lib/ceph/osd/ceph-2 --no-mon-config Running command: /usr/bin/ln -snf /dev/ceph-9d37674b-a269-4239-aa9e-66a3c74df76c/osd-block-2d3ffc61-e430-4b89-bcd4-105b2df26352 /var/lib/ceph/osd/ceph-2/block Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1 Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2 Running command: /usr/bin/systemctl enable ceph-volume@lvm-2-2d3ffc61-e430-4b89-bcd4-105b2df26352 Running command: /usr/bin/systemctl enable --runtime ceph-osd@2 Running command: /usr/bin/systemctl start ceph-osd@2 --> ceph-volume lvm activate successful for osd ID: 2 Content of /var/log/ceph/ceph-osd.2.log 2021-04-04T14:54:56.625+0430 7f4afbac0f00 0 set uid:gid to 64045:64045 (ceph:ceph) 2021-04-04T14:54:56.625+0430 7f4afbac0f00 0 ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable), process ceph-osd, pid 5484 2021-04-04T14:54:56.625+0430 7f4afbac0f00 0 pidfile_write: ignore empty --pid-file 2021-04-04T14:54:56.625+0430 7f4afbac0f00 -1* bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-2/block: (1) Operation not permitted* 2021-04-04T14:54:56.625+0430 7f4afbac0f00 -1 *** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or directory* root@osd03:/var/lib/ceph/osd/ceph-2# systemctl status ceph-osd@2 â— ceph-osd(a)2.service - Ceph object storage daemon osd.2 Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: enabled) Active: failed (Result: exit-code) since Sun 2021-04-04 14:55:06 +0430; 50s ago Process: 5471 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 2 (code=exited, status=0/SUCCESS) Process: 5484 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 2 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE) Main PID: 5484 (code=exited, status=1/FAILURE) Apr 04 14:55:06 osd03 systemd[1]: ceph-osd(a)2.service: Scheduled restart job, restart counter is at 3. Apr 04 14:55:06 osd03 systemd[1]: Stopped Ceph object storage daemon osd.2. Apr 04 14:55:06 osd03 systemd[1]: ceph-osd(a)2.service: Start request repeated too quickly. Apr 04 14:55:06 osd03 systemd[1]: ceph-osd(a)2.service: Failed with result 'exit-code'. Apr 04 14:55:06 osd03 systemd[1]: Failed to start Ceph object storage daemon osd.2. root@osd03:/var/lib/ceph/osd/ceph-2# root@osd03:~# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT fd0 2:0 1 4K 0 disk loop0 7:0 0 55.5M 1 loop /snap/core18/1988 loop1 7:1 0 69.9M 1 loop /snap/lxd/19188 loop2 7:2 0 55.5M 1 loop /snap/core18/1997 loop3 7:3 0 70.4M 1 loop /snap/lxd/19647 loop4 7:4 0 32.3M 1 loop /snap/snapd/11402 loop5 7:5 0 32.3M 1 loop /snap/snapd/11107 sda 8:0 0 80G 0 disk ├─sda1 8:1 0 1M 0 part ├─sda2 8:2 0 1G 0 part /boot └─sda3 8:3 0 79G 0 part └─ubuntu--vg-ubuntu--lv 253:0 0 69.5G 0 lvm / sdb 8:16 0 16G 0 disk └─sdb1 8:17 0 16G 0 part └─ceph--9d37674b--a269--4239--aa9e--66a3c74df76c-osd--block--2d3ffc61--e430--4 b89--bcd4--105b2df26352 253:1 0 16G 0 lvm root@osd03:~# root@osd03:/var/lib/ceph/osd/ceph-2# mount | grep -i ceph tmpfs on /var/lib/ceph/osd/ceph-2 type tmpfs (rw,relatime) root@osd03:/var/lib/ceph/osd/ceph-2# any help is much appreciated -- Regards Behzad Khoshbakhti Computer Network Engineer (CCIE #58887)

2 years, 11 months

6
14
0 0

Natutilus - not unmapping

by Joe Comeau

Nautilus cluster is not unmapping ceph 14.2.16 ceph report |grep "osdmap_.*_committed" report 1175349142 "osdmap_first_committed": 285562, "osdmap_last_committed": 304247, we've set osd_map_cache_size = 20000 but its is slowly growing to that difference as well OSD map first committed is not changing for some strange reason Cluster has been around and upgraded since either firefly or jewel I have seen a few other with this problem to no solution to it Any suggestions ? Thanks Joe

2 years, 11 months

2
2
0 0

How to trim RGW sync errors

by by morphin

Hello. I was have multisite RGW (14.2.16 nautilus) setup and some of the bucket couldn't finish bucket sync due to overfill buckets, There was different needs and the sync started purpose of migration. I made the secondary zone the master and removed the old master zone from zonegroup. Now I still have sync errors and sync error trim do not work. radosgw-admin --id radosgw.srv1 sync error list | grep name | wc -l 32000 Thats a lot of errors. Sync error trim does nothing. When I run period update commit I saw sync status field has a lot of records as below. radosgw-admin --id radosgw.srv1 period update --commit { "id": "e5d30f8f", "epoch": 7, "predecessor_uuid": "1d0b7132", "sync_status": [ "1_1611733356.499643_1448979853.1", "1_1611225916.734727_865381974.1", "1_1611648125.876993_1659659292.1", "1_1608194415.061001_737663090.1", "1_1605880458.143435_1259922694.1", "1_1611225999.087089_1887995199.1", "1_1586035175.626619_488028.1", "", "", "1_1611057887.910246_973493243.1", "1_1612180963.822684_807349060.1", "", "", "1_1612180818.328001_807344892.1", "1_1611058156.662721_1887884194.1", "1_1611057588.159455_1887883796.1", "1_1611647015.874625_1129837262.1", "1_1586035175.602419_753756.1", "", "1_1606215091.912960_988474411.1", "", "1_1600418137.932356_1027064325.1", "1_1609926537.036681_832230841.1", "", "", "1_1611057624.857485_1658280806.1", "1_1600419671.553723_365405366.1", "", "1_1611057662.014628_859134308.1", "1_1611057665.933662_843443436.1", "1_1605879154.805811_700811071.1", "1_1602509494.904964_696294030.1", "", "1_1611057618.891024_1150752303.1", "1_1611440831.055432_1458827253.1", "1_1611451128.857514_806931659.1", "", "1_1611057597.877068_1785564634.1", "1_1611057860.565465_1785564826.1", "1_1585821684.950844_61616.1", "", "", "", "1_1601647994.988107_511440126.1", "", "1_1608194424.578834_777512349.1", "1_1605879126.845904_958578574.1", "", "1_1590061636.162223_183644368.1", "1_1609834839.884870_1076396513.1", "", "1_1612430017.546386_612493167.1", "1_1605879158.230856_1635059634.1", "", "1_1612420115.322098_1468865033.1", "1_1611057731.182423_817020944.1", "1_1611225026.887795_806142997.1", "1_1612188490.428048_1152864210.1", "1_1612187913.914410_861646554.1", "1_1609393942.952120_574675578.1", "1_1611733086.223927_861322773.1", "1_1605880394.928467_759903023.1", "1_1600418082.175862_556536400.1", "1_1605879150.320951_1210709666.1" ], "period_map": { "id": "e5d30f8f", "zonegroups": [ { "id": "667afef", "name": "xy", "api_name": "xy", "is_master": "true", "endpoints": [ "http://dns:80" ], "hostnames": [], "hostnames_s3website": [], "master_zone": "fe8ee939", "zones": [ { "id": "fe8ee939", "name": "prod", "endpoints": [ "http://dns:80" ], "log_meta": "false", "log_data": "false", "bucket_index_max_shards": 101, "read_only": "false", "tier_type": "", "sync_from_all": "false", "sync_from": [], "redirect_zone": "" } ], "placement_targets": [ { "name": "default-placement", "tags": [], "storage_classes": [ "STANDARD" ] } ], "default_placement": "default-placement", "realm_id": "234837df" } ], "short_zone_ids": [ { "key": "fe8ee939", "val": 2970845644 } ] }, "master_zonegroup": "667afefc", "master_zone": "fe8ee939", "period_config": { "bucket_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 }, "user_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 } }, "realm_id": "234837df", "realm_name": "rep", "realm_epoch": 3 } I need to clean these errors before re-add the secondary zone to zonegroup. Do you have any opinion? If I delete old periods what will happen?

2 years, 11 months

1
0
0 0

[v15.2.11] radosgw / RGW crash at start, Segmentation Fault

by Gilles Mocellin

Hello, Since I upgrade to Ceph Octopus v15.2.11, on Ubuntu 18.04, Radosgw crash straight at start. On Two clusters, one Lab, and some test on a production cluster, shows the same crash for radosgw. As I don't find any similar bug in the Tracker, neither in this mailing list... Am I alone ? The logs are : May 07 14:04:59 fidcl-mrs4-sto-sds-07 systemd[1]: Started Ceph rados gateway. May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: *** Caught signal (Segmentation fault) ** May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: in thread 7ff556655140 thread_name:radosgw May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable) May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 1: (()+0x3f040) [0x7ff554e19040] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 2: (std::locale::operator=(std::locale const&)+0x28) [0x55aadc0ade88] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 3: (std::ios_base::imbue(std::locale const&)+0x2e) [0x55aadc14ec3e] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 4: (std::basic_ios<char, std::char_traits<char> >::imbue(std::locale const&)+0x44) [0x55aadc0f5c54] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 5: (std::basic_ostream<char, std::char_traits<char> >& boost::asio::ip::operator<< <char, std::char_trait May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 6: (()+0x40ce0a) [0x7ff555847e0a] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 7: (radosgw_Main(int, char const**)+0x3430) [0x7ff5559c8430] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 8: (__libc_start_main()+0xe7) [0x7ff554dfbbf7] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 9: (_start()+0x2a) [0x55aadc0a836a] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 2021-05-07T14:05:00.309+0200 7ff556655140 -1 *** Caught signal (Segmentation fault) ** May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: in thread 7ff556655140 thread_name:radosgw May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable) May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 1: (()+0x3f040) [0x7ff554e19040] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 2: (std::locale::operator=(std::locale const&)+0x28) [0x55aadc0ade88] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 3: (std::ios_base::imbue(std::locale const&)+0x2e) [0x55aadc14ec3e] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 4: (std::basic_ios<char, std::char_traits<char> >::imbue(std::locale const&)+0x44) [0x55aadc0f5c54] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 5: (std::basic_ostream<char, std::char_traits<char> >& boost::asio::ip::operator<< <char, std::char_trait May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 6: (()+0x40ce0a) [0x7ff555847e0a] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 7: (radosgw_Main(int, char const**)+0x3430) [0x7ff5559c8430] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 8: (__libc_start_main()+0xe7) [0x7ff554dfbbf7] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 9: (_start()+0x2a) [0x55aadc0a836a] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 0> 2021-05-07T14:05:00.309+0200 7ff556655140 -1 *** Caught signal (Segmentation fault) ** May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: in thread 7ff556655140 thread_name:radosgw May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable) May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 1: (()+0x3f040) [0x7ff554e19040] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 2: (std::locale::operator=(std::locale const&)+0x28) [0x55aadc0ade88] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 3: (std::ios_base::imbue(std::locale const&)+0x2e) [0x55aadc14ec3e] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 4: (std::basic_ios<char, std::char_traits<char> >::imbue(std::locale const&)+0x44) [0x55aadc0f5c54] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 5: (std::basic_ostream<char, std::char_traits<char> >& boost::asio::ip::operator<< <char, std::char_trait May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 6: (()+0x40ce0a) [0x7ff555847e0a] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 7: (radosgw_Main(int, char const**)+0x3430) [0x7ff5559c8430] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 8: (__libc_start_main()+0xe7) [0x7ff554dfbbf7] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 9: (_start()+0x2a) [0x55aadc0a836a] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: -3> 2021-05-07T14:05:00.309+0200 7ff556655140 -1 *** Caught signal (Segmentation fault) ** May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: in thread 7ff556655140 thread_name:radosgw May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable) May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 1: (()+0x3f040) [0x7ff554e19040] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 2: (std::locale::operator=(std::locale const&)+0x28) [0x55aadc0ade88] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 3: (std::ios_base::imbue(std::locale const&)+0x2e) [0x55aadc14ec3e] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 4: (std::basic_ios<char, std::char_traits<char> >::imbue(std::locale const&)+0x44) [0x55aadc0f5c54] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 5: (std::basic_ostream<char, std::char_traits<char> >& boost::asio::ip::operator<< <char, std::char_trait May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 6: (()+0x40ce0a) [0x7ff555847e0a] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 7: (radosgw_Main(int, char const**)+0x3430) [0x7ff5559c8430] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 8: (__libc_start_main()+0xe7) [0x7ff554dfbbf7] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: 9: (_start()+0x2a) [0x55aadc0a836a] May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. May 07 14:05:00 fidcl-mrs4-sto-sds-07 systemd[1]: ceph-radosgw(a)rgw.fidcl-mrs4-sto-sds-07.rgw0.service: Main process exited, code=killed, status=11/SEGV May 07 14:05:00 fidcl-mrs4-sto-sds-07 systemd[1]: ceph-radosgw(a)rgw.fidcl-mrs4-sto-sds-07.rgw0.service: Failed with result 'signal'. May 07 14:05:00 fidcl-mrs4-sto-sds-07 systemd[1]: ceph-radosgw(a)rgw.fidcl-mrs4-sto-sds-07.rgw0.service: Service hold-off time over, scheduling restart. May 07 14:05:00 fidcl-mrs4-sto-sds-07 systemd[1]: ceph-radosgw(a)rgw.fidcl-mrs4-sto-sds-07.rgw0.service: Scheduled restart job, restart counter is at 653. May 07 14:05:00 fidcl-mrs4-sto-sds-07 systemd[1]: Stopped Ceph rados gateway. May 07 14:05:00 fidcl-mrs4-sto-sds-07 systemd[1]: ceph-radosgw(a)rgw.fidcl-mrs4-sto-sds-07.rgw0.service: Start request repeated too quickly. May 07 14:05:00 fidcl-mrs4-sto-sds-07 systemd[1]: ceph-radosgw(a)rgw.fidcl-mrs4-sto-sds-07.rgw0.service: Failed with result 'signal'. May 07 14:05:00 fidcl-mrs4-sto-sds-07 systemd[1]: Failed to start Ceph rados gateway.

2 years, 11 months

2
2
0 0

Monitor gets removed from monmap when host down

by contact＠maelgui.fr

Hi everyone, I'm new to ceph, and I'm currently doing some tests with cephadm and a few virtual machines. I've deployed ceph with cephadm on 5 VMs, each one have a 10GB virtual disk attached, and everything is working perfectly. (So 5 osd and 5 monitor in my cluster) However when I turn down a node, wait a few minutes and turn it up again, I was expecting the services to be running again automatically, but this is not happening... The monitor service does not restart. It looks like the monmap gets changed when a node is offline, causing the monitor service to refuse to restart. I need to remove the monitor service with `ceph orch daemon rm <name>` so that a new service is automatically deploy again on this node. Is that the expected behaviour ? Best regards, Maël

2 years, 11 months

1
0
0 0

orch upgrade mgr starts too slow and is terminated?

by Kai Börnert

Hi all, upon updating to 16.2.2 via cephadm the upgrade is being stuck on the first mgr Looking into this via docker logs I see that it is still loading modules when it is apparently terminated and restarted in a loop When pausing the update, the mgr succeeds to start with the new version, however when resuming the update, it seems to try to update it again even tho it already has the new version, leading to the exact same loop. Is there some setting or workaround to increase the time before it is attempted to be redeployed, or can this behavior be caused by something else? Greetings, Kai

2 years, 11 months

3
5
0 0

Re: fixing future rctimes

by David Rivera

Hi guys, Did anyone ever figure out how to fix rctime? I had a directory that was robocopied from a windows host that contained files with modified times in the future. Now the directory tree up to the root will not update rctime. Thanks, David

2 years, 11 months

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users May 2021