ceph -s reports an error in the ceph-mgr process.
Looking at the log file, I see:
Jun 30 16:07:09 al111 bash: debug 2021-06-30T14:07:09.939+0000 7f2a31d64700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.al111: 'NoneType' object has no attribute 'get'
Jun 30 16:07:09 al111 bash: debug 2021-06-30T14:07:09.939+0000 7f2a31d64700 -1 devicehealth.serve:
Jun 30 16:07:09 al111 bash: debug 2021-06-30T14:07:09.939+0000 7f2a31d64700 -1 Traceback (most recent call last):
Jun 30 16:07:09 al111 bash: File "/usr/share/ceph/mgr/devicehealth/module.py", line 330, in serve
Jun 30 16:07:09 al111 bash: self.scrape_all()
Jun 30 16:07:09 al111 bash: File "/usr/share/ceph/mgr/devicehealth/module.py", line 390, in scrape_all
Jun 30 16:07:09 al111 bash: self.put_device_metrics(ioctx, device, data)
Jun 30 16:07:09 al111 bash: File "/usr/share/ceph/mgr/devicehealth/module.py", line 477, in put_device_metrics
Jun 30 16:07:09 al111 bash: wear_level = get_ata_wear_level(data)
Jun 30 16:07:09 al111 bash: File "/usr/share/ceph/mgr/devicehealth/module.py", line 33, in get_ata_wear_level
Jun 30 16:07:09 al111 bash: if page.get("number") != 7:
Jun 30 16:07:09 al111 bash: AttributeError: 'NoneType' object has no attribute 'get'
This is a containerized Ceph 16.4.2 cluster.
What is happening here?
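The traceback shows `get_ata_wear_level()` calling `.get()` on a `None` page, i.e. the smartctl JSON for the device is missing or contains null entries where the module expects dicts. A defensive sketch of the failing helper (my own reconstruction for illustration, not the upstream code or the official fix):

```python
def get_ata_wear_level(data):
    """Return the SSD wear-level value from smartctl JSON output.

    Defensive reconstruction: tolerate missing or null sections instead of
    assuming every statistics page is a dict (the traceback shows `page`
    being None when `.get("number")` is called on it).
    """
    stats = data.get("ata_device_statistics") or {}
    for page in stats.get("pages") or []:
        # Page 7 holds the solid-state device statistics
        if not isinstance(page, dict) or page.get("number") != 7:
            continue
        for entry in page.get("table") or []:
            if isinstance(entry, dict) and "Percentage Used" in str(entry.get("name", "")):
                return entry.get("value")
    return None

# Input shaped like the failure case: a null statistics section
print(get_ata_wear_level({"ata_device_statistics": None}))  # prints: None
```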
Heinlein Support GmbH
If I set rgw_bucket_default_quota_max_objects to 1000000 in a running cluster with `ceph config set` (mgr or mon? I'm not sure which section makes it global), will it overwrite the quotas on the special buckets where I've already set higher values, or leave them untouched?
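For reference, the default is an RGW option while per-bucket overrides live in the user/bucket metadata, which can be inspected separately; a hedged sketch (the user and bucket names are placeholders):

```shell
# Set the cluster-wide default for the rgw daemons (the option belongs to
# the client.rgw config section, not mon or mgr):
ceph config set client.rgw rgw_bucket_default_quota_max_objects 1000000

# Inspect the quota actually recorded for an existing user or bucket:
radosgw-admin user info --uid=someuser        # output includes "bucket_quota"
radosgw-admin bucket stats --bucket=somebucket
```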
I know that this might be a dumb question, but I'm asking for future
reference: if *bluestore_min_alloc_size* is set to *64KB*, will a *70KB*
object be stored as *128KB* or *70KB*?
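Assuming space is allocated in whole min_alloc_size units (i.e. sizes round up to the next unit), the arithmetic looks like this:

```python
def allocated_size(object_size: int, min_alloc: int = 64 * 1024) -> int:
    """Round an object's size up to a whole number of allocation units."""
    units = -(-object_size // min_alloc)  # ceiling division
    return units * min_alloc

KIB = 1024
# A 70 KiB object needs two 64 KiB units under this assumption:
print(allocated_size(70 * KIB) // KIB)  # prints: 128
```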
> I’ve been running a staging Ceph environment on CentOS 7/Nautilus for quite a while now. Because of many good reasons that you can probably guess, I am currently trying to move this staging environment to Octopus on Ubuntu 20.04.2.
What made you decide to choose Ubuntu and not Rocky Linux or CentOS 8 Stream?
It may be Bug 50556 <https://tracker.ceph.com/issues/50556>. I am having
this problem, although I don't think the characters in the bucket name
are the trigger in my case.
Backport 51001 <https://tracker.ceph.com/issues/51001> has just been
updated, so it looks as though the fix will be in 16.2.5.
At a glance your symptoms sound similar but I'm not sure if the crash
info is the same.
On 29/06/2021 22:35, Chu, Vincent wrote:
> Hi, I'm running into an issue with RadosGW where multipart uploads crash, but only on buckets with a hyphen, period or underscore in the bucket name and with a bucket policy applied. We've tested this in pacific 16.2.3 and pacific 16.2.4.
> Anyone run into this before?
> ubuntu@ubuntu:~/ubuntu$ aws --endpoint http://placeholder.com:7480 s3 cp ubuntu.iso s3://bucket.test
> upload failed: ./ubuntu.iso to s3://bucket.test/ubuntu.iso Connection was closed before we received a valid response from endpoint URL: "http://placeholder.com:7480/bucket.test/ubuntu.iso?uploads".
> Here is the crash log.
> -12> 2021-06-29T20:44:10.940+0000 7fae1f4ec700 1 ====== starting new request req=0x7fadf8998620 =====
> -11> 2021-06-29T20:44:10.940+0000 7fae1f4ec700 2 req 2403 0.000000000s initializing for trans_id = tx000000000000000000963-0060db861a-17e77ee-default
> -10> 2021-06-29T20:44:10.940+0000 7fae1f4ec700 2 req 2403 0.000000000s getting op 4
> -9> 2021-06-29T20:44:10.940+0000 7fae1f4ec700 2 req 2403 0.000000000s s3:init_multipart verifying requester
> -8> 2021-06-29T20:44:10.948+0000 7fae1f4ec700 2 req 2403 0.008000608s s3:init_multipart normalizing buckets and tenants
> -7> 2021-06-29T20:44:10.948+0000 7fae1f4ec700 2 req 2403 0.008000608s s3:init_multipart init permissions
> -6> 2021-06-29T20:44:10.954+0000 7faedf66c700 0 Supplied principal is discarded: arn:aws:iam::default:user
> -5> 2021-06-29T20:44:10.954+0000 7faedf66c700 2 req 2403 0.014001064s s3:init_multipart recalculating target
> -4> 2021-06-29T20:44:10.954+0000 7faedf66c700 2 req 2403 0.014001064s s3:init_multipart reading permissions
> -3> 2021-06-29T20:44:10.954+0000 7faedf66c700 2 req 2403 0.014001064s s3:init_multipart init op
> -2> 2021-06-29T20:44:10.954+0000 7faedf66c700 2 req 2403 0.014001064s s3:init_multipart verifying op mask
> -1> 2021-06-29T20:44:10.955+0000 7faedf66c700 2 req 2403 0.015001140s s3:init_multipart verifying op permissions
> 0> 2021-06-29T20:44:10.964+0000 7faedf66c700 -1 *** Caught signal (Segmentation fault) **
> in thread 7faedf66c700 thread_name:radosgw
> ceph version 16.2.3 (381b476cb3900f9a92eb95d03b4850b953cfd79a) pacific (stable)
> 1: /lib64/libpthread.so.0(+0x12b20) [0x7faf2dd05b20]
> 2: (rgw_bucket::rgw_bucket(rgw_bucket const&)+0x23) [0x7faf38b4d083]
> 3: (rgw::sal::RGWObject::get_obj() const+0x20) [0x7faf38b7bcf0]
> 4: (RGWInitMultipart::verify_permission(optional_yield)+0x6c) [0x7faf38e6608c]
> 5: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, optional_yield, bool)+0x86a) [0x7faf38b2db1a]
> 6: (process_request(rgw::sal::RGWRadosStore*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*, int*)+0x26dd) [0x7faf38b3232d]
> 7: /lib64/libradosgw.so.2(+0x4a1c0b) [0x7faf38a83c0b]
> 8: /lib64/libradosgw.so.2(+0x4a36a4) [0x7faf38a856a4]
> 9: /lib64/libradosgw.so.2(+0x4a390e) [0x7faf38a8590e]
> 10: make_fcontext()
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> Vincent Chu
> A-4: Advanced Research in Cyber Systems
> Los Alamos National Laboratory
I have set up a multisite deployment. Data (pools, buckets, users) from the
master zone is synchronized to the secondary zone.
On the master:
radosgw-admin sync status
realm 2194d6d2-0df4-400c-be8b-71dc74405ec2 (multisite-realm)
zonegroup b41c6159-16e5-456f-a4e1-fb3dd280158f (multisite-zg)
zone 6843d5b5-5c6f-4ea3-a85b-12d8c3d58af8 (zone1)
metadata sync no sync (zone is master)
2021-06-29T21:18:23.981+0300 7f2755878040 0 data sync zone:ca179040 ERROR:
failed to fetch datalog info
data sync source: ca179040-683f-4d9f-b8cc-00c7872dda35 (zone2) failed to
retrieve sync info: (13) Permission denied
radosgw-admin sync --rgw-realm=multisite-realm status
realm 2194d6d2-0df4-400c-be8b-71dc74405ec2 (multisite-realm)
zonegroup b41c6159-16e5-456f-a4e1-fb3dd280158f (multisite-zg)
zone ca179040-683f-4d9f-b8cc-00c7872dda35 (zone2)
metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
data sync source: 6843d5b5-5c6f-4ea3-a85b-12d8c3d58af8 (zone1)
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
The access key and secret key are the same in both zones.
Ceph version: Octopus 15.2.13.
Any ideas =)
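"(13) Permission denied" on data sync between zones usually points at the system-user credentials the zones use to talk to each other. A few checks worth running (the sync-user uid is a placeholder):

```shell
# Do the zone records carry the same system_key on both sides?
radosgw-admin zone get --rgw-zone=zone1
radosgw-admin zone get --rgw-zone=zone2

# Does the system user exist on both sides with matching keys?
radosgw-admin user info --uid=sync-user

# If a zone's keys are wrong, fix them and commit the period:
radosgw-admin zone modify --rgw-zone=zone2 --access-key=<key> --secret=<secret>
radosgw-admin period update --commit
```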
I’ve been running a staging Ceph environment on CentOS 7/Nautilus for quite a while now. Because of many good reasons that you can probably guess, I am currently trying to move this staging environment to Octopus on Ubuntu 20.04.2.
Since I’m trying to keep the data, but don’t mind downtime at all, my plan was to reinstall one server at a time, removing them from the cluster and adding them back, since Octopus and Nautilus should still be compatible with each other. I started with one monitor and now I’m stuck in a weird state. Essentially:
- Monitors see each other and are establishing connections to each other.
- Monitor clocks are synchronized.
- The monmap was injected into the reinstalled monitor.
- The new monitor is recognized in ceph -s but stuck out of the quorum.
Is there something that could prevent a new monitor from establishing quorum if the monmap is the same, the clock is synchronized and it can contact other monitors on the network?
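A few things worth checking from the stuck monitor's host (the daemon id is a placeholder); for example, a Nautilus-era monmap may advertise only v1 addresses while the rebuilt Octopus mon binds differently:

```shell
# Ask the stuck daemon directly for its view of the world:
ceph daemon mon.<id> mon_status      # state, outside_quorum, monmap epoch

# Compare the addresses the quorum advertises (v1 vs v2) with what the
# new mon actually binds:
ceph mon dump
ss -tlnp | grep ceph-mon             # ports 3300 (msgr v2) and 6789 (v1)
```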
Senior Openstack system administrator
Hi, I recently upgraded to Pacific, and I am now getting an error when connecting from my Windows 10 machine.
The error is handle_auth_bad_method; I tried a few combinations of cephx,none on the monitors, but I keep getting the same error.
The same config (with paths updated) and keyring works on my WSL instance running an old Luminous client (I can't seem to get a newer client to install).
Do you have any suggestions on where to look?
PS C:\Program Files\Ceph\bin> .\ceph-dokan.exe --id rob -l Q
2021-05-14T12:19:58.172Eastern Daylight Time 5 -1 monclient(hunting): handle_auth_bad_method server allowed_methods  but i only support 
failed to fetch mon config (--no-mon-config to skip)
PS C:\Program Files\Ceph\bin> cat c:/ProgramData/ceph/ceph.client.rob.keyring
key = <REDACTED>
caps mon = "allow rwx"
caps osd = "allow rwx"
PS C:\Program Files\Ceph\bin> cat C:\ProgramData\Ceph\ceph.conf
# minimal ceph.conf
log to stderr = true
; Uncomment the following in order to use the Windows Event Log
log to syslog = true
run dir = C:/ProgramData/ceph/out
crash dir = C:/ProgramData/ceph/out
; Use the following to change the cephfs client log level
debug client = 2
fsid = <redacted>
mon_host = [<redacted>]
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
keyring = c:/ProgramData/ceph/ceph.client.rob.keyring
log file = C:/ProgramData/ceph/out/$name.$pid.log
admin socket = C:/ProgramData/ceph/out/$name.$pid.asok
I'm currently trying to set up the OSDs in a fresh cluster using cephadm.
The underlying devices are NVMe, and I'm trying to provision 2 OSDs per device
with the following spec:
Since `ceph orch apply` just creates the service but not the daemons when `unmanaged: True` is set, is there a way to enforce the `osds_per_device` setting when using `ceph orch daemon add osd`?
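As far as I know, `ceph orch daemon add osd` takes explicit host:device arguments and does not honor `osds_per_device`; the usual route is to apply a managed drivegroup spec and let cephadm create the daemons. A hypothetical spec of that shape (service_id and placement are made up):

```yaml
service_type: osd
service_id: nvme_split
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 0
  osds_per_device: 2
```

Applied with `ceph orch apply osd -i osd_spec.yml` and with `unmanaged` dropped, cephadm should then create both OSDs per NVMe device itself.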
Thanks in advance