Is something broken here? (From ceph-users)
---------- Forwarded message ---------
From: Konstantin Shalygin <k0ste(a)k0ste.ru>
Date: Mon, Mar 11, 2024 at 6:46 AM
Subject: [ceph-users] Telemetry endpoint down?
To: ceph-users <ceph-users(a)ceph.io>
Hi, it seems the telemetry endpoint has been down for some days. We have
connection errors from multiple places:
1:ERROR Mar 10 00:46:10.653 [564383]: opensock: Could not establish a connection to telemetry.ceph.com:443
2:ERROR Mar 10 01:48:20.061 [564383]: opensock: Could not establish a connection to telemetry.ceph.com:443
3:ERROR Mar 10 02:50:29.473 [564383]: opensock: Could not establish a connection to telemetry.ceph.com:443
4:ERROR Mar 10 03:52:38.877 [564383]: opensock: Could not establish a connection to telemetry.ceph.com:443
5:ERROR Mar 10 04:54:48.285 [564383]: opensock: Could not establish a connection to telemetry.ceph.com:443
6:ERROR Mar 10 05:56:57.693 [564383]: opensock: Could not establish a connection to telemetry.ceph.com:443
7:ERROR Mar 10 06:59:07.105 [564383]: opensock: Could not establish a connection to telemetry.ceph.com:443
8:ERROR Mar 10 08:01:16.509 [564383]: opensock: Could not establish a connection to telemetry.ceph.com:443
9:ERROR Mar 10 09:03:25.917 [564383]: opensock: Could not establish a connection to telemetry.ceph.com:443
Thanks,
k
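A quick, purely illustrative way to check reachability from an affected
host (this only exercises the TCP/TLS connection, not the telemetry API
itself):

$ curl -sS --connect-timeout 10 -o /dev/null -w '%{http_code}\n' https://telemetry.ceph.com/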
Locked nodes without a description:
$ teuthology-lock --list --status up --locked true -m smithi --all |
    jq 'map(select(.locked_since < "2024-03-01")) |
        map(select(.description == null)) |
        map(select(.locked_by != "jenkins@jenkins"))' |
    grep locked_by | sort | uniq -c
1 "locked_by": "akraitma@aklap",
4 "locked_by": "amaredia@teuthology",
1 "locked_by": "kkeithle@teuthology",
1 "locked_by": "yuvalif@teuthology",
4 "locked_by": "zack@teuthology",
If you need them for ongoing work, please add a description so the
nodes are filtered out of future queries.
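For example (assuming teuthology-lock's --update/--desc options; the node
name here is made up):

$ teuthology-lock --update --desc "wip-foo debugging, still in use" smithi999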
If you need a development playground, please consider:
https://wiki.sepia.ceph.com/doku.php?id=devplayground#developer_playgrounds
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
I will be taking down jenkins.ceph.com for maintenance this evening. I'm
setting the 'no new builds' flag now so that the build queue can drain.
If it's drained by 6PM PST (about 2.5 hours from now), I will start then,
and I'll notify the list when it's back up.
We're experiencing network problems to the community lab (sepia) again
today. There's been another cable cut. Teams from our upstream
internet provider are working on the issue; however, some inbound and
outbound connections are affected and just don't work.
We're at the mercy of the provider's infrastructure. I'll let this list
know when the problem gets fully resolved.
Hey, a lab network switch was fixed last week, and the 26 Smithi testnodes
connected to that switch were recovered.
I found the failed switch by enabling monitoring on the testnodes' IPMI ports.
We will keep extending monitoring to have an even better view of the lab.
--
Adam Kraitman
Systems Administrator
Ceph Engineering
IRC: akraitma
Short summary: end your branch name with "-debug". See:
https://github.com/ceph/ceph-build/pull/2167
and the integration branch helper script change:
https://github.com/ceph/ceph/pull/53855
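For example (the branch name and the "ci" remote pointing at ceph-ci.git
are illustrative assumptions, not a required workflow):

$ git checkout -b wip-myfeature-testing-20240311-debug
$ git push ci wip-myfeature-testing-20240311-debug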
The benefit of doing this is that mutex debugging will be enabled, many
compiler checks will be enabled, and some optimizations will be
disabled (potentially making some debugging easier). One known
drawback is that execution may be slower.
See also:
https://github.com/ceph/ceph-build/pull/2167#issuecomment-1751033910
There are build failures for CentOS 8 for which I will make tickets soon.
See also:
https://shaman.ceph.com/builds/ceph/wip-batrick-testing-20231006.014828-deb…
If this is shown not to create a lot of fallout in QA suite testing,
this may be turned on by default without the "-debug" suffix on branch
names. I encourage QA testers to give this a try so any issues can be
shaken out.
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
At the infrastructure meeting today, we decided on a course of action
for migrating the existing /home directory to CephFS. This is being
done for a few reasons:
- Alleviate load on the root file system device (which is also hosted
on the LRC via iSCSI)
- Avoid the disk-space-full scenarios we've regularly hit
- Improve recoverability in the event of teuthology corruption/catastrophe
- Be generally much faster
- Possibly serve as a home file system on other sepia resources
To effect this:
- The new "home" CephFS file system is mounted at /cephfs/home
- User's home /home/$USER has been or will be (again) rsync'd to
/cephfs/home/$USER
- User's account "home" (/etc/passwd) is being updated to /cephfs/home/$USER
- User's old home /home/$USER will be archived to /home/.archive/$USER
- A symlink will be placed in /home/$USER pointing to
/cephfs/home/$USER for compatibility with existing
(mis-)configurations.
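Purely as an illustration (the exact commands the admins run are not
specified here; the rsync flags and the use of usermod are assumptions),
the per-user sequence looks roughly like:

  # final sync of the old home onto CephFS (flags are an assumption)
  rsync -aHAX /home/$USER/ /cephfs/home/$USER/
  # point the account's home (/etc/passwd) at the CephFS copy
  usermod -d /cephfs/home/$USER $USER
  # archive the old home and leave a compatibility symlink behind
  mv /home/$USER /home/.archive/$USER
  ln -s /cephfs/home/$USER /home/$USER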
The main reason for not simply updating /home is to allow
administrators continued access to teuthology in the event of a
Ceph(FS) outage.
Most home directories have already been rsync'd as of 2 weeks ago. A
final rsync will be performed prior to each user's terminal migration.
In order to update a user's home directory, the user must be logged
out. Generally no action need be taken but I may kindly ask you to log
out of teuthology if necessary.
Thanks to Laura Flores, Venky Shankar, Yuri Weinstein, and Leonid Usov
for volunteering as guinea pigs for my early testing. They have
already been migrated. The rest of the users will be migrated
incrementally over the next few days.
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D