Oops... sent this to the "wrong" list previously (lists.ceph.com).
Let's try the proper one this time :-/
Not sure if this is an actual bug or I'm doing something else wrong, but in Octopus, I have this on the master node:
# ceph --version
ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
# tail -3 /etc/ceph/ceph.conf
[client]
rbd cache = false
rbd cache writethrough until flush = false
But I have restarted ALL nodes, and yet rbd cache is still on:
# ceph --admin-daemon `find /var/run/ceph -name 'ceph-mon*'` config show |grep rbd_cache
"rbd_cache": "true",
What else am I supposed to do???
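(For reference, a couple of other places the value can be checked, since the mon admin socket reports the mon daemon's own view of the option. The client socket path and --name below are placeholders and assume an admin socket is enabled for the client:)

# a librbd client's effective value can be read from its own admin socket, if configured
ceph --admin-daemon /var/run/ceph/ceph-client.admin.asok config show | grep rbd_cache
# or ask ceph-conf what a client would parse out of ceph.conf
ceph-conf -c /etc/ceph/ceph.conf --name client.admin --lookup rbd_cache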
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
Hi all,
the rados_stat() function has a TODO in the comments:
* TODO: when are these set, and by whom? can they be out of date?
Can anyone help with this? How reliably is the pmtime updated? Is there a minimum update interval?
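(For context, the rados CLI exposes the same mtime field; the pool and object names below are placeholders:)

# 'rados stat' prints the object mtime, i.e. the value rados_stat() returns in pmtime
rados -p mypool stat myobject
# output is roughly: mypool/myobject mtime 2020-12-16 13:19:25.000000, size 4194304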
Thank you,
Peter
Hello.
Has anyone started using namespaces in real production for
multi-tenancy?
How good is it at isolating tenants from each other? Can they see each
other's presence, quotas, etc.?
Is it safe to give access via cephx to (possibly hostile to each other)
users to the same pool with restrictions 'user per namespace'?
How badly can one user affect others? Quotas restrict space overuse, but
what about IO and omap overuse?
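(For reference, the kind of per-namespace restriction meant here looks roughly like the following; the user, pool, and namespace names are placeholders, and 'profile rbd' assumes RBD-style access:)

# grant a tenant read/write access only within its own namespace of a shared pool
ceph auth get-or-create client.tenant-a \
    mon 'profile rbd' \
    osd 'profile rbd pool=mypool namespace=tenant-a'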
Hi all,
in various Rook-operated Ceph clusters I have seen OSDs going into a CrashLoop due to:
debug 2020-12-16T13:19:25.500+0000 7fc4c3f13f40 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1608124765507105, "job": 1, "event": "recovery_started", "log_files": [1400, 1402]}
debug 2020-12-16T13:19:25.500+0000 7fc4c3f13f40 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #1400 mode 0
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 1 bluefs _allocate failed to allocate 0x43ce43d on bdev 1, free 0x2e50000; fallback to bdev 2
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 1 bluefs _allocate unable to allocate 0x43ce43d on bdev 2, free 0xffffffffffffffff; fallback to slow device expander
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1 bluestore(/var/lib/ceph/osd/ceph-1) allocate_bluefs_freespace failed to allocate on 0x3d1b0000 min_size 0x43d0000 > allocated total 0x300000 bluefs_shared_alloc_size 0x10000 allocated 0x300000 available 0x b019c8000
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1 bluefs _allocate failed to expand slow device to fit +0x43ce43d
debug 2020-12-16T13:19:27.724+0000 7fc4c3f13f40 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x43ce43d
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.6/rpm/el8/BUILD/ceph-15.2.6/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7fc4c3f13f40 time 2020-12-16T13:19:27.731533+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.6/rpm/el8/BUILD/ceph-15.2.6/src/os/bluestore/BlueFS.cc: 2721: ceph_abort_msg("bluefs enospc")
The OSD is not really full:
# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1
inferring bluefs devices from bluestore path
1 : device size 0x18ffc00000 : own 0x[bffe10000~fffe0000] = 0xfffe0000 : using 0xfd190000(4.0 GiB) : bluestore has 0x82b360000(33 GiB) available
Expanding DB/WAL...
Expanding the underlying block device by just 1 GiB, followed by "ceph-bluestore-tool bluefs-bdev-expand" and "ceph-bluestore-tool repair", resolves the situation. In general, larger OSDs seem to reduce the likelihood of this issue.
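A rough sketch of that workaround (the OSD path matches the one above; growing the block device itself is environment specific):

# 1. grow the underlying block device / LV first (environment specific), then:
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1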
Ceph version is v15.2.6.
Is this a known bug?
Ceph report and logs are attached.
Thanks for your help
Stephan
Hi all,
I have an issue with my three monitors: they keep getting "e3
handle_auth_request failed to assign global_id" errors, and subsequently
commands like 'ceph status' just hang. Any ideas on what the errors mean?
many thanks
Darrin
Hi Cephers,
I'm using VSCode remote development with a Docker server. It worked OK,
but it fails to start the debugger after /root is mounted by ceph-fuse. The
log shows that the binary passes the access X_OK check but cannot
actually be executed. See:
```
strace_log: access("/root/.vscode-server/extensions/ms-vscode.cpptools-1.1.3/debugAdapters/OpenDebugAD7", X_OK) = 0
root@develop:~# ls -alh .vscode-server/extensions/ms-vscode.cpptools-1.1.3/debugAdapters/OpenDebugAD7
-rw-r--r-- 1 root root 978 Dec 10 13:06 .vscode-server/extensions/ms-vscode.cpptools-1.1.3/debugAdapters/OpenDebugAD7
```
I also tested the access syscall on ext4, xfs, and even the cephfs kernel
client; all of them return -EACCES, which is expected (the extension
will then explicitly call chmod +x).
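For anyone who wants to poke at the same check, a minimal sketch (python3 assumed to be available; the path is the one from the log above):

```
# os.access() goes through the same access()/faccessat() permission check the strace shows
python3 -c 'import os, sys; print(os.access(sys.argv[1], os.X_OK))' \
    /root/.vscode-server/extensions/ms-vscode.cpptools-1.1.3/debugAdapters/OpenDebugAD7
```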
After some digging in the code, I found it is probably caused by
https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L5549-L5550.
So here are two questions:
1. Is this a bug, or is there some concern I missed?
2. It works again with fuse_default_permissions=true; are there any
drawbacks if this option is set?
We're happy to announce the fourth bugfix release in the Octopus series.
In addition to a security fix in RGW, this release brings a range of fixes
across all components. We recommend that all Octopus users upgrade to this
release. For detailed release notes with links and changelog, please
refer to the official blog entry at https://ceph.io/releases/v15-2-4-octopus-released
Notable Changes
---------------
* CVE-2020-10753: rgw: sanitize newlines in s3 CORSConfiguration's ExposeHeader
(William Bowling, Adam Mohammed, Casey Bodley)
* Cephadm: There were a lot of small usability improvements and bug fixes:
* Grafana when deployed by Cephadm now binds to all network interfaces.
* `cephadm check-host` now prints all detected problems at once.
* Cephadm now calls `ceph dashboard set-grafana-api-ssl-verify false`
when generating an SSL certificate for Grafana.
* The Alertmanager is now correctly pointed to the Ceph Dashboard
* `cephadm adopt` now supports adopting an Alertmanager
* `ceph orch ps` now supports filtering by service name
* `ceph orch host ls` now marks hosts as offline, if they are not
accessible.
* Cephadm can now deploy NFS Ganesha services. For example, to deploy NFS with
a service id of mynfs, that will use the RADOS pool nfs-ganesha and namespace
nfs-ns::
ceph orch apply nfs mynfs nfs-ganesha nfs-ns
* Cephadm: `ceph orch ls --export` now returns all service specifications in
yaml representation that is consumable by `ceph orch apply`. In addition,
the commands `orch ps` and `orch ls` now support `--format yaml` and
`--format json-pretty`.
* Cephadm: `ceph orch apply osd` supports a `--preview` flag that prints a preview of
the OSD specification before deploying OSDs. This makes it possible to
verify that the specification is correct, before applying it.
* RGW: The `radosgw-admin` sub-commands dealing with orphans --
`radosgw-admin orphans find`, `radosgw-admin orphans finish`, and
`radosgw-admin orphans list-jobs` -- have been deprecated. They have
not been actively maintained and they store intermediate results on
the cluster, which could fill a nearly-full cluster. They have been
replaced by a tool, currently considered experimental,
`rgw-orphan-list`.
* RBD: The name of the rbd pool object that is used to store
rbd trash purge schedule is changed from "rbd_trash_trash_purge_schedule"
to "rbd_trash_purge_schedule". Users that have already started using
`rbd trash purge schedule` functionality and have per pool or namespace
schedules configured should copy "rbd_trash_trash_purge_schedule"
object to "rbd_trash_purge_schedule" before the upgrade and remove
"rbd_trash_purge_schedule" using the following commands in every RBD
pool and namespace where a trash purge schedule was previously
configured::
rados -p <pool-name> [-N namespace] cp rbd_trash_trash_purge_schedule rbd_trash_purge_schedule
rados -p <pool-name> [-N namespace] rm rbd_trash_trash_purge_schedule
or use any other convenient way to restore the schedule after the
upgrade.
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-15.2.4.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 7447c15c6ff58d7fce91843b705a268a1917325c
--
David Galloway
Systems Administrator, RDU
Ceph Engineering
IRC: dgalloway
Dear All,
We have a 38-node HP Apollo cluster with 24 3.7 TB spinning disks and 2 NVMe
devices for journal per node. This is one of our 13 clusters and was upgraded from
Luminous to Nautilus (14.2.11). When one of our OpenStack customers, who uses
Elasticsearch to offer Logging as a Service to their end users,
reported IO latency issues, our SME rebooted two nodes that he felt were
leaking memory. The reboot didn't help; rather, it worsened the situation, and
he went ahead and recycled the entire cluster one node at a time to fix the
slow ops reported by the OSDs. This caused a huge issue, and the MONs were not
able to withstand the spam and started crashing.
1) We audited the network (inspecting the ToR switches, iperf, MTR) and nothing
indicated any issue, but the OSD logs kept complaining about
BADAUTHORIZER:
2020-12-13 15:32:31.607 7fea5e3a2700 0 --1- 10.146.126.200:0/464096978 >>
v1:10.146.127.122:6809/1803700 conn(0x7fea3c1ba990 0x7fea3c1bf600 :-1
s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2
connect got BADAUTHORIZER
2020-12-13 15:32:31.607 7fea5e3a2700 0 --1- 10.146.126.200:0/464096978 >>
v1:10.146.127.122:6809/1803700 conn(0x7fea3c1c1e20 0x7fea3c1bcdf0 :-1
s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2
connect got BADAUTHORIZER
2020-12-13 15:32:31.607 7fea5e3a2700 0 --1- 10.146.126.200:0/464096978 >>
v1:10.146.127.122:6809/1803700 conn(0x7fea3c1ba990 0x7fea3c1bf600 :-1
s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2
connect got BADAUTHORIZER
2) We made sure there is no clock skew; we use timesyncd. After taking out a
couple of OSDs that were flagged as slow in the ceph health output, the
situation didn't improve. After 3 days of troubleshooting we upgraded the
MONs to 14.2.15, and the situation seems to have improved a little, but it is
still reporting 61308 slow ops, which we really struggled to trace to bad
OSDs, as taking out a couple of them didn't help. One of the MONs (mon 2) failed
to join the cluster; it is always compacting and was never able to join
(see the sizes below). I suspect that could be because the key-value store
information on mons 1 and 3 is not in sync with mon 2. At times we had
to stop and start monitors to compact them, to get a better response from the
Ceph MONs (sometimes keeping only a single MON running).
root@pistoremon-as-c01:~# du -sh /var/lib/ceph/mon
391G /var/lib/ceph/mon
root@pistoremon-as-c03:~# du -sh /var/lib/ceph/mon
337G /var/lib/ceph/mon
root@pistoremon-as-c02:~# du -sh /var/lib/ceph/mon
13G /var/lib/ceph/mon
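(For reference, a sketch of how such a compaction can be triggered, using one of the mons above as the example target:)

# trigger an online compaction of the monitor's store.db
ceph tell mon.pistoremon-as-c01 compact
# or set mon_compact_on_start = true under [mon] to compact at the next restart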
cluster:
id: bac20301-d458-4828-9dd9-a8406acf5d0f
health: HEALTH_WARN
noout,noscrub,nodeep-scrub flag(s) set
1 pools have many more objects per pg than average
10969 pgs not deep-scrubbed in time
46 daemons have recently crashed
61308 slow ops, oldest one blocked for 2572 sec, daemons
[mon.pistoremon-as-c01,mon.pistoremon-as-c03] have slow ops.
mons pistoremon-as-c01,pistoremon-as-c03 are using a lot of
disk space
1/3 mons down, quorum pistoremon-as-c01,pistoremon-as-c03
services:
mon: 3 daemons, quorum pistoremon-as-c01,pistoremon-as-c03 (age 52m),
out of quorum: pistoremon-as-c02
mgr: pistoremon-as-c01(active, since 2h), standbys: pistoremon-as-c03,
pistoremon-as-c02
osd: 911 osds: 888 up (since 68m), 888 in
flags noout,noscrub,nodeep-scrub
rgw: 2 daemons active (pistorergw-as-c01, pistorergw-as-c02)
task status:
data:
pools: 17 pools, 32968 pgs
objects: 62.98M objects, 243 TiB
usage: 748 TiB used, 2.4 PiB / 3.2 PiB avail
pgs: 32968 active+clean
io:
client: 56 MiB/s rd, 95 MiB/s wr, 1.78k op/s rd, 4.27k op/s wr
3) When looking through ceph.log on the mon with tailf, I was seeing a lot
of different timestamps reported in the logs on MON1, which is the master.
I am confused about why the live log reports various timestamps:
stat,write 2166784~4096] snapc 0=[] ondisk+write+known_if_redirected
e951384) initiated 2020-12-13 06:16:58.873964 currently delayed
2020-12-13 06:39:37.169504 osd.1224 (osd.1224) 325855 : cluster [WRN] slow
request osd_op(client.461445583.0:8881223 1.16aa
1:55684db0:::rbd_data.9ede65fc7af15.0000000000000000:head [stat,write
3547136~4096] snapc 0=[] ondisk+write+known_if_redirected e951384)
initiated 2020-12-13 06:16:59.082012 currently delayed
2020-12-13 06:39:37.169510 osd.1224 (osd.1224) 325856 : cluster [WRN] slow
request osd_op(client.461445583.0:8881191 1.16aa
1:55684db0:::rbd_data.9ede65fc7af15.0000000000000000:head [stat,write
2314240~4096] snapc 0=[] ondisk+write+known_if_redirected e951384)
initiated 2020-12-13 06:16:58.874031 currently delayed
2020-12-13 06:39:37.169513 osd.1224 (osd.1224) 325857 : cluster [WRN] slow
request osd_op(client.461445583.0:8881224 1.16aa
1:55684db0:::rbd_data.9ede65fc7af15.0000000000000000:head [stat,write
3571712~8192] snapc 0=[] ondisk+write+known_if_redirected e951384)
initiated 2020-12-13 06:16:59.082094 currently delayed
^C
root@pistoremon-as-c03:~# date
Tue Dec 15 20:12:02 UTC 2020
root@pistoremon-as-c03:~# timedatectl
Local time: Tue 2020-12-15 20:12:04 UTC
Universal time: Tue 2020-12-15 20:12:04 UTC
RTC time: Tue 2020-12-15 20:12:04
Time zone: Etc/UTC (UTC, +0000)
Network time on: yes
NTP synchronized: yes
RTC in local TZ: no
4) We keep seeing the error below and assume that, because of the size of the MON
DB combined with the high workload, the clients are not able to fetch the mon map?
2020-12-13 13:16:56.197 7f946cddc700 -1 monclient: get_monmap_and_config
failed to get config
2020-12-13 13:17:00.613 7f946cddc700 -1 monclient: get_monmap_and_config
failed to get config
We are getting all sorts of errors; the primary concern is how to get the
slow ops down and improve the IO latency. The MONs use SSDs for their
storage, with the NOOP scheduler and nr_requests set to 128.
Please provide some guidance on all 4 issues reported above. Thank you for
your valuable time.
--
Regards,
Suresh