I updated teuthology yesterday and since then have seen a lot of the
following errors:
...src/teuthology/virtualenv/local/lib/python2.7/site-packages/paramiko/ecdsakey.py:164:
CryptographyDeprecationWarning: Support for unsafe construction of
public numbers from encoded data will be removed in a future version.
Please use EllipticCurvePublicKey.from_encoded_point
self.ecdsa_curve.curve_class(), pointinfo
2019-07-31 01:45:18,976.976 ERROR:paramiko.transport:Exception: Error
reading SSH protocol banner
2019-07-31 01:45:18,976.976 ERROR:paramiko.transport:Traceback (most
recent call last):
2019-07-31 01:45:18,976.976 ERROR:paramiko.transport: File
"/home/bhubbard/src/teuthology/virtualenv/local/lib/python2.7/site-packages/paramiko/transport.py",
line 1966, in run
2019-07-31 01:45:18,976.976 ERROR:paramiko.transport: self._check_banner()
2019-07-31 01:45:18,977.977 ERROR:paramiko.transport: File
"/home/bhubbard/src/teuthology/virtualenv/local/lib/python2.7/site-packages/paramiko/transport.py",
line 2143, in _check_banner
2019-07-31 01:45:18,977.977 ERROR:paramiko.transport: "Error
reading SSH protocol banner" + str(e)
2019-07-31 01:45:18,977.977 ERROR:paramiko.transport:SSHException:
Error reading SSH protocol banner
Sometimes these are fatal and sometimes not. Wondering if anyone else
has seen them?
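In case it is useful to anyone else hitting this, below is a minimal
sketch of the first workaround I would try: give paramiko more time to
read the banner by passing banner_timeout when connecting (the host,
user, and timeout values are placeholders, not real lab settings):

    # Sketch only: open the connection with a larger banner timeout.
    # paramiko's SSHClient.connect() accepts both timeout and
    # banner_timeout; the values below are arbitrary placeholders.
    import paramiko

    def connect_with_longer_banner_timeout(host, user):
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username=user,
                       timeout=60,          # TCP connect timeout
                       banner_timeout=60)   # wait longer for the SSH banner
        return client

This doesn't explain why the banner read started failing after the
update, but it would at least tell us whether the remote end is just
slow to respond.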
--
Cheers,
Brad
David G,
I've been looking over the logs and "ceph pg dump pgs" on the LRC, and
things look good to me. If you see anything not working, file a
tracker, or contact me if you have any questions.
There is one thing that you should be aware of. There are still
filestore objectstores for some of the OSDs. The auto_repair feature is
not supported for filestore, so when they deep-scrub they won't repair.
With auto_repair enabled in this mixed cluster the LRC will auto_repair
if the primary OSD for a PG is bluestore even if some replicas are
filestore. So I would convert the remaining filestore OSDs to
bluestore. If you are paranoid, you should disable auto_repair until the
conversion is completed.
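If it helps with the conversion, here is a rough sketch of how to list
which OSDs are still on filestore (this assumes "ceph osd metadata"
with no id returns a JSON array whose entries carry "id" and
"osd_objectstore"; adjust if your output differs):

    # Rough sketch: print the ids of OSDs still backed by filestore.
    import json
    import subprocess

    def filestore_osd_ids():
        out = subprocess.check_output(
            ['ceph', 'osd', 'metadata', '--format=json'])
        return sorted(m['id'] for m in json.loads(out)
                      if m.get('osd_objectstore') == 'filestore')

    if __name__ == '__main__':
        print(filestore_osd_ids())

Turning osd_scrub_auto_repair back off in the meantime, and re-enabling
it once the last filestore OSD is gone, would be the cautious option.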
David Z
On 7/2/19 3:11 PM, David Zafman wrote:
>
> I don't see that now in ceph status. A pg's deep scrub would have to
> be over 5 days overdue for that warning to occur.
>
> David
>
> On 7/2/19 2:29 PM, David Galloway wrote:
>> This build is installed now.
>>
>> It looks like "1 pgs not scrubbed in time" is back.
>>
>> On 6/28/19 12:27 PM, David Zafman wrote:
>>> David,
>>>
>>> I have new scrub handling code built for Nautilus. Could we
>>> install this on the LRC to see how well it works in a more realistic
>>> environment?
>>>
>>> https://shaman.ceph.com/builds/ceph/wip-zafman-testing-nautilus/31ff31f2c8d…
>>>
>>>
>>>
>>> Thanks
>>>
>>> David Zafman
>>>
Hi everyone,
I'd really like to push this through and get something working this week.
After poking around it seems clear that there aren't any other registries
we should be using for our temp/test builds (unless we just spam
dockerhub, but that seems unwise). So, let's just get over ourselves and
run our own registry.
1- Where to put it? I'm assuming this should go in the same place that
chacra is putting our other temp builds. This is in RDU, right? What
machine(s) should we use? If we use the same retention policy as the
debs/rpms then this will be an incremental increase in the storage needed.
2- What registry software to use? We don't need any fancy features
whatsoever--just the ability to push, pull, and delete images. So,
whatever is easiest to set up.
3- Jenkins integration. I think we need to have a child job linked to the
centos build to do the ceph-container build and then push. Similarly,
whatever it is that removes the old packages from the repos needs to also
delete the image (a rough sketch of the registry calls this needs is below).
4- Chacra/shaman integration? Should the container build show up in
shaman/chacra as well? Is there extra work needed to do that?
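To make item 3 a bit more concrete, here is a rough sketch of the two
registry-side calls a retention job would need, using the standard
Docker Registry HTTP API v2 (the registry URL is a placeholder, and
deletes only work if the registry has deletion enabled):

    # Sketch of the v2 registry API calls a cleanup job would need.
    import requests

    REGISTRY = 'http://registry.example.local:5000'  # placeholder address
    MANIFEST_V2 = 'application/vnd.docker.distribution.manifest.v2+json'

    def list_repositories():
        r = requests.get(REGISTRY + '/v2/_catalog')
        return r.json().get('repositories', [])

    def delete_image(repo, tag):
        # Deletion goes by digest, so fetch the manifest digest first.
        r = requests.get('%s/v2/%s/manifests/%s' % (REGISTRY, repo, tag),
                         headers={'Accept': MANIFEST_V2})
        digest = r.headers['Docker-Content-Digest']
        requests.delete('%s/v2/%s/manifests/%s' % (REGISTRY, repo, digest))

Whatever software we pick, as long as it speaks this API the Jenkins
cleanup side stays simple.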
Thanks!
sage
Hi Alfredo,
On 30.04.19 at 15:00, Alfredo Deza wrote:
> On Tue, Apr 30, 2019 at 8:52 AM Sebastian Wagner
> <sebastian.wagner(a)suse.com> wrote:
>>
>> All,
>>
>> I've been working on exercising some Rook orchestrator commands in an
>> automated fashion (like deploying Ceph services). The concept itself
>> works pretty well and now I'd like to integrate this into Sepia.
>>
>> A part of this endeavor was to set up an empty Kubernetes cluster using
>> local VMs and Terraform. As Sepia already runs a k8s cluster, it might
>> make sense to just use this existing cluster, instead of creating a new
>> cluster for every test run. One downside of re-using existing clusters
>> is: only one test run can access a given cluster at a time, which
>> eliminates some possible parallelism.
>>
>> There is another bummer: As far as I know, we're not building Ceph
>> container images for Ceph PRs and https://hub.docker.com/r/ceph/ceph
>> only contains stable Nautilus images. Testing Ceph images automatically
>> after they're released to the public isn't going to fly.
>>
>> Are there any plans to build Ceph container images in Shaman or from
>> within Jenkins Jobs?
>
> This has been discussed in the past, but it is a tremendous effort
> which has many moving pieces.
Indeed, it is.
On the other hand, if we want to make containers first-class citizens, is
not building them really a viable option?
> One of them is where to store the
> container images - I don't think it is OK to push
> to hub.docker.com since we build about 400 repositories per day.
Actually this would be a perfect use case for a private registry.
>
>>
>> Or asked in a different way: Are there any automatically built Octopus
>> container images?
>
> There isn't anything for any release at the moment.
Thanks for the clarification.
>>
>> Best,
>> Sebastian
>>
>>
>>
>>
>>
>>
>> --
>> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
>> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG Nürnberg)
>
--
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG Nürnberg)
On 30.04.19 at 15:34, Alfredo Deza wrote:
> On Tue, Apr 30, 2019 at 9:27 AM Sebastian Wagner
> <sebastian.wagner(a)suse.com> wrote:
>>
>> Hi Alfredo,
>>
>> On 30.04.19 at 15:00, Alfredo Deza wrote:
>>> On Tue, Apr 30, 2019 at 8:52 AM Sebastian Wagner
>>> <sebastian.wagner(a)suse.com> wrote:
>>>>
>>>> All,
>>>>
>>>> I've been working on exercising some Rook orchestrator commands in an
>>>> automated fashion (like deploying Ceph services). The concept itself
>>>> works pretty well and now I'd like to integrate this into Sepia.
>>>>
>>>> A part of this endeavor was to set up an empty Kubernetes cluster using
>>>> local VMs and Terraform. As Sepia already runs a k8s cluster, it might
>>>> make sense to just use this existing cluster, instead of creating a new
>>>> cluster for every test run. One downside of re-using existing clusters
>>>> is: only one test run can access a given cluster at a time, which
>>>> eliminates some possible parallelism.
>>>>
>>>> There is another bummer: As far as I know, we're not building Ceph
>>>> container images for Ceph PRs and https://hub.docker.com/r/ceph/ceph
>>>> only contains stable Nautilus images. Testing Ceph images automatically
>>>> after they're released to the public isn't going to fly.
>>>>
>>>> Are there any plans to build Ceph container images in Shaman or from
>>>> within Jenkins Jobs?
>>>
>>> This has been discussed in the past, but it is a tremendous effort
>>> which has many moving pieces.
>>
>> Indeed, it is.
>>
>> On the other hand, if we want to make containers first-class citizens, is
>> not building them really a viable option?
>
> I agree with you here, we should be building containers, regardless of
> how many repositories we produce a day
>
>>
>>> One of them is where to store the
>>> container images - I don't think it is OK to push
>>> to hub.docker.com since we build about 400 repositories per day.
>>
>> Actually this would be a perfect use case for a private registry.
>
> I agree again here. Would love to see if there was a
> community-based effort for a registry so we could push images. As it
> stands right now, our very small team can't possibly
> take on running/maintaining another service, much less provide for the
> tremendous amount of infrastructure needed.
Off the top of my head, I can think of two alternatives that don't
require any new services:
1. Maybe Docker Hub could build a nightly image from the latest master
(https://shaman.ceph.com/api/repos/ceph/master/latest/)
with a static Dockerfile, using the setup described at
https://docs.docker.com/docker-hub/builds/
This wouldn't give us tests for PRs, though.
2. Or alternatively, Jenkins could build images locally without pushing
them anywhere. (I'm not a big fan of this, as it would require a
temporary private container registry while executing the test.)
>
>
>>
>>>
>>>>
>>>> Or asked in a different way: Are there any automatically built Octopus
>>>> container images?
>>>
>>> There isn't anything for any release at the moment.
>>
>> Thanks for the clarification.
>>
>>>>
>>>> Best,
>>>> Sebastian
--
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG Nürnberg)
All,
I've been working on exercising some Rook orchestrator commands in an
automated fashion (like deploying Ceph services). The concept itself
works pretty well and now I'd like to integrate this into Sepia.
A part of this endeavor was to set up an empty Kubernetes cluster using
local VMs and Terraform. As Sepia already runs a k8s cluster, it might
make sense to just use this existing cluster, instead of creating a new
cluster for every test run. One downside of re-using existing clusters
is: only one test run can access a given cluster at a time, which
eliminates some possible parallelism.
There is another bummer: As far as I know, we're not building Ceph
container images for Ceph PRs and https://hub.docker.com/r/ceph/ceph
only contains stable Nautilus images. Testing Ceph images automatically
after they're released to the public isn't going to fly.
Are there any plans to build Ceph container images in Shaman or from
within Jenkins Jobs?
Or asked in a different way: Are there any automatically built Octopus
container images?
Best,
Sebastian
--
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG Nürnberg)
Jos just messaged me about this issue on IRC. Seems maybe it's not user
error?
Kefu, do you know what's going on here? I only ask you specifically
because of this PR: https://github.com/ceph/ceph/pull/23411
-------- Forwarded Message --------
Subject: Re: Teuthology failures
Date: Tue, 9 Apr 2019 23:50:12 +0530
From: Sidharth Anupkrishnan <sanupkri(a)redhat.com>
To: Patrick Donnelly <pdonnell(a)redhat.com>
CC: David Galloway <dgallowa(a)redhat.com>, Kefu Chai <kchai(a)redhat.com>
Sorry, I attached the wrong logs. The latest logs (against master)
are: http://pulpito.ceph.com/sidharthanup-2019-04-09_16:33:46-multimds-master-di…
On Tue, Apr 9, 2019 at 10:44 PM Sidharth Anupkrishnan
<sanupkri(a)redhat.com> wrote:
I tried testing against master also; it still fails. Here are the
logs: http://pulpito.ceph.com/sidharthanup-2019-04-09_12:36:20-multimds-nautilus-…
The teuthology-suite command used was:
teuthology-suite --machine-type smithi --distro rhel -D 7.6 --flavor
default --email sanupkri(a)redhat.com -p
9 --suite multimds --ceph master -n 5 --ceph-repo
https://github.com/ceph/ceph.git --suite-branch
wip-dir-pin-attribute-fail --suite-repo
https://github.com/sidharthanup/ceph.git --filter test_exports --limit 2
The same QA runs against nautilus were working yesterday. I only
started seeing this error today.
On Tue, Apr 9, 2019 at 9:46 PM Patrick Donnelly <pdonnell(a)redhat.com> wrote:
A cursory look suggests the problem is just that Sidharth is testing a
master QA suite against Nautilus.
@Sidharth Anupkrishnan instead do:
teuthology-suite ... --ceph-repo https://github.com/ceph/ceph.git --ceph master -n 5
-n 5 allows teuthology-suite to look back 5 merges to find the most
recent version of master that has finished building. You can also use
-S <sha1> to pick a recent merge commit that has completed building.
On Tue, Apr 9, 2019 at 6:55 AM David Galloway
<dgallowa(a)redhat.com> wrote:
>
> Hey Sidharth,
>
> I looked at the repo that was built for the version of Ceph you're
> testing there and it looks like the package name is actually
> python36-cephfs.
>
> See
> https://2.chacra.ceph.com/r/ceph/nautilus/c09e90d1847fc4ffdd7384c9adf7f60c1…
>
> So I went and looked at the code in teuthology and I *think* the
> package lists it uses to install packages is provided in the ceph.git
> repo somewhere.
>
> I did find this which adds python3-cephfs:
> https://github.com/ceph/ceph/pull/23411
>
> But you'll see here, teuthology renames it to 'python34-cephfs':
> https://github.com/ceph/teuthology/blob/master/teuthology/task/install/rpm.…
>
> SO... I'm not sure where exactly to fix this. If we're not supposed
> to be building a python36-cephfs package, I guess that'd get fixed in
> the spec file?
>
> If we are supposed to be building a package called python36-cephfs,
> then teuthology needs to be patched.
>
> I've CC'ed Kefu and Patrick so they can take a look and suggest a
> resolution.
>
> On 4/9/19 8:28 AM, Sidharth Anupkrishnan wrote:
> > Hey David!
> >
> > I've run into some errors while testing:
> > http://pulpito.ceph.com/sidharthanup-2019-04-09_11:17:44-multimds-nautilus-…
> > Seems the smithi machines cannot install python34-cephfs. Any idea why?
> >
> > Regards,
> > Sidharth Anupkrishnan
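To make the rename being discussed a bit more concrete, the logic is
roughly of this shape (a hypothetical illustration only; the real
mapping lives in teuthology/task/install/rpm.py and may well differ):

    # Hypothetical illustration, not the actual teuthology code: a
    # hard-coded 'python34' suffix breaks once the repo ships
    # python36-cephfs instead.
    PY3_SUFFIX = '34'   # assumed fixed suffix, the source of the mismatch

    def rpm_package_name(pkg):
        # e.g. 'python3-cephfs' -> 'python34-cephfs'
        if pkg.startswith('python3-'):
            return 'python' + PY3_SUFFIX + '-' + pkg[len('python3-'):]
        return pkg

Either the suffix needs to follow the distro's actual python3 version,
or the spec file needs to keep producing the name teuthology expects.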
-- Patrick Donnelly
pdonnell@senta02 ~/ceph$ git fetch --all
Fetching origin
Fetching upstream
From https://github.com/ceph/ceph
x [deleted] (none) -> upstream/pull/26516/merge
error: RPC failed; curl 18 transfer closed with outstanding read data remaining
fatal: The remote end hung up unexpectedly
error: Could not fetch upstream
It's been oddly failing like this for the last two days. Could this be
a firewall/IDS issue?
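In the meantime, one thing worth trying is to retry and fetch only the
refs that are actually needed rather than --all (which also pulls the
pull/*/merge refs, as in the output above, so the transfer that keeps
getting cut off is much larger than it needs to be). A minimal sketch,
with arbitrary remote/branch names and retry counts:

    # Sketch: retry a narrower fetch instead of 'git fetch --all'.
    import subprocess
    import time

    def fetch_with_retries(remote='upstream', branch='master', attempts=3):
        for attempt in range(1, attempts + 1):
            if subprocess.call(['git', 'fetch', remote, branch]) == 0:
                return True
            time.sleep(5 * attempt)  # back off before retrying
        return False

    if __name__ == '__main__':
        fetch_with_retries()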
--
Patrick Donnelly