I am bumping this email in the hope of getting more eyes on it.
We are continuing to have this problem. Unfortunately, the cluster is only lightly used until we go to full production, so we do not have the level of traffic that would generate many statistics.
We updated from 14.2.10 to 14.2.16 on Feb 1, 2021, and this seems to correlate with when the errors started appearing.
Our current plan is to roll the cluster back to 14.2.10 and rerun the test that triggers the issue.
I noticed another thread about latencies from a user who also recently updated to 14.2.16; I'm not sure whether it is related to my issue.
Any suggestions you may have are very welcome.
Cheers,
--
Mike Cave
On 2021-02-11, 8:37 AM, "Mike Cave" <mcave(a)uvic.ca> wrote:
So, as the subject states, I have an issue with buckets returning a 404 error when they are listed immediately after being created; deleting a bucket immediately after creation fails in the same way.
The behaviour is intermittent.
If I leave the bucket in place for a few minutes, it behaves normally. I'm thinking this is a metadata issue or something along those lines, but I'm out of my depth now.
To the best of our knowledge the cluster has not changed in any way since the same tests were run in December with no errors.
We are running Ceph 14.2.16 on all parts of the cluster.
I am using the python-swift client for the connection on a CentOS 7 machine.
I can replicate the results from the mons or from an external client.
I'm willing to share my test script as well if you would like to see how I'm generating the error.
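In the meantime, a stripped-down sketch of the test (assuming the swift CLI from python-swiftclient, with ST_AUTH/ST_USER/ST_KEY pointing at our RGW endpoint; the real script differs):

    # create a container, then immediately list and delete it
    swift post 404test      # create
    swift list 404test      # intermittently returns 404 right after creation
    swift delete 404test    # intermittently fails with 404 as well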
Here is an excerpt of the logs (debug level 20) in case I missed something in my interpretation:
14:23:17.069 7faba00df700 1 ====== starting new request req=0x55fb7a138700 =====
14:23:17.069 7faba00df700 2 req 148 0.000s initializing for trans_id = tx000000000000000000094-0060245cd5-2b8949-default
14:23:17.069 7faba00df700 10 rgw api priority: s3=8 s3website=7
14:23:17.069 7faba00df700 10 host=<NameRemoved>
14:23:17.069 7faba00df700 20 subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
14:23:17.069 7faba00df700 -1 res_query() failed
14:23:17.069 7faba00df700 20 final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/swift/v1/404test
14:23:17.069 7faba00df700 10 ver=v1 first=404test req=
14:23:17.069 7faba00df700 10 handler=28RGWHandler_REST_Bucket_SWIFT
14:23:17.069 7faba00df700 2 req 148 0.000s getting op 2
14:23:17.069 7faba00df700 10 req 148 0.000s swift:delete_bucket scheduling with dmclock client=3 cost=1
14:23:17.069 7faba00df700 10 op=30RGWDeleteBucket_ObjStore_SWIFT
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying requester
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::TempURLEngine
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::TempURLEngine denied with reason=-13
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::SignedTokenEngine
14:23:17.069 7faba00df700 10 req 148 0.000s swift:delete_bucket swift_user=xmcc:swift
14:23:17.069 7faba00df700 20 build_token token=0a000000786d63633a73776966748960ea4653df708a55ae2560e58acf01
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::SignedTokenEngine granted access
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket normalizing buckets and tenants
14:23:17.069 7faba00df700 10 s->object=<NULL> s->bucket=404test
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket init permissions
14:23:17.069 7faba00df700 20 get_system_obj_state: rctx=0x55fb7a137770 obj=default.rgw.meta:root:404test state=0x55fb7a060ac0 s->prefetch_data=0
14:23:17.069 7faba00df700 10 cache get: name=default.rgw.meta+root+404test : hit (negative entry)
14:23:17.069 7faba00df700 20 get_system_obj_state: rctx=0x55fb7a137130 obj=default.rgw.meta:users.uid:xmcc state=0x55fb7a060f40 s->prefetch_data=0
14:23:17.069 7faba00df700 10 cache get: name=default.rgw.meta+users.uid+xmcc : hit (requested=0x6, cached=0x17)
14:23:17.069 7faba00df700 20 get_system_obj_state: s->obj_tag was set empty
14:23:17.069 7faba00df700 20 Read xattr: user.rgw.idtag
14:23:17.069 7faba00df700 20 get_system_obj_state: rctx=0x55fb7a137130 obj=default.rgw.meta:users.uid:xmcc state=0x55fb7a060f40 s->prefetch_data=0
14:23:17.069 7faba00df700 10 cache get: name=default.rgw.meta+users.uid+xmcc : hit (requested=0x6, cached=0x17)
14:23:17.069 7faba00df700 20 get_system_obj_state: s->obj_tag was set empty
14:23:17.069 7faba00df700 20 Read xattr: user.rgw.idtag
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket recalculating target
14:23:17.069 7faba00df700 10 Starting retarget
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket reading permissions
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket init op
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying op mask
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket required_mask= 4 user.op_mask=7
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying op permissions
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket -- Getting permissions begin with perm_mask=50
14:23:17.069 7faba00df700 5 req 148 0.000s swift:delete_bucket Searching permissions for identity=rgw::auth::ThirdPartyAccountApplier() -> rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=xmcc, acct_name=xmcc, subuser=swift, perm_mask=15, is_admin=0) mask=50
14:23:17.069 7faba00df700 5 Searching permissions for uid=xmcc
14:23:17.069 7faba00df700 5 Found permission: 15
14:23:17.069 7faba00df700 5 Searching permissions for group=1 mask=50
14:23:17.069 7faba00df700 5 Permissions for group not found
14:23:17.069 7faba00df700 5 Searching permissions for group=2 mask=50
14:23:17.069 7faba00df700 5 Permissions for group not found
14:23:17.069 7faba00df700 5 req 148 0.000s swift:delete_bucket -- Getting permissions done for identity=rgw::auth::ThirdPartyAccountApplier() -> rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=xmcc, acct_name=xmcc, subuser=swift, perm_mask=15, is_admin=0), owner=xmcc, perm=2
14:23:17.069 7faba00df700 10 req 148 0.000s swift:delete_bucket identity=rgw::auth::ThirdPartyAccountApplier() -> rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=xmcc, acct_name=xmcc, subuser=swift, perm_mask=15, is_admin=0) requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying op params
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket pre-executing
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket executing
14:23:17.069 7faba00df700 0 req 148 0.000s swift:delete_bucket ERROR: bucket 404test not found
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket completing
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket op status=-2002
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket http status=404
14:23:17.069 7faba00df700 1 ====== req done req=0x55fb7a138700 op status=-2002 http_status=404 latency=0s ======
--
Mike Cave
I acknowledge and respect the Lekwungen-speaking Peoples on whose traditional territories the university stands and the Songhees, Esquimalt and WSANEC peoples whose historical relationships with the land continue to this day.
Dear cephers,
I was doing some maintenance yesterday involving shutdown/power-up cycles of Ceph servers. With the last server I ran into a problem. The server runs an MDS and a couple of OSDs. After reboot, the MDS joined the MDS cluster without problems, but the OSDs didn't come up. This was 1 out of 12 servers, and I had no such problems with the other 11. I also observed that "ceph status" was responding very slowly.
Upon further inspection, I found out that 2 of my 3 MONs (the leader and a peon) were running at 100% CPU. Client I/O was continuing, probably because the last cluster map remained valid. On our node performance monitoring I could see that the 2 busy MONs were showing extraordinary network activity.
This state lasted for over one hour. After the MONs settled down, the OSDs finally joined as well and everything went back to normal.
The other instance where I have seen similar behaviour was when I restarted a MON on an empty disk and the re-sync was extremely slow due to too large a value for mon_sync_max_payload_size. This time, I'm pretty sure it was MON-client communication; see below.
Are there any settings similar to mon_sync_max_payload_size that could influence responsiveness of MONs in a similar way?
Why do I suspect it is MON-client communication? In our monitoring, I do not see the huge number of packets sent by the MONs arriving at any other Ceph daemon. They seem to be distributed over client nodes, but since we have a large number of client nodes (>550), the traffic is buried in the background network noise. A second clue is that I have had such extended lock-ups before and, whenever I checked, I only observed them when the leader held a large share of the client sessions.
For example, yesterday the client session count per MON was:
ceph-01: 1339 (leader)
ceph-02: 189 (peon)
ceph-03: 839 (peon)
I usually restart the leader when such a critical distribution occurs. As long as the leader has the fewest client sessions, I never observe this problem.
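For reference, this is roughly how I check the session distribution and apply the workarounds mentioned above (a sketch; the grep pattern is an approximation of mimic's sessions output, and 64 KiB is just an example value for the payload size):

    # on each MON host: count client sessions via the admin socket
    ceph daemon mon.ceph-01 sessions | grep -c 'client\.'

    # lower the sync payload size (bytes) before re-syncing an empty MON
    ceph config set mon mon_sync_max_payload_size 65536

    # if the leader holds the most sessions, restart it to force a new election
    systemctl restart ceph-mon@ceph-01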
Ceph version is 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable).
Thanks for any clues!
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hi everyone,
Ceph will be present at DevConf.CZ, February 18-20 in a joint booth
with the Rook Community!
https://www.devconf.cz
If you're interested in more information about being present at the
booth to provide expertise/content/presentations to our audience,
please let me know privately.
--
Mike Perez
Hi there,
we are in the process of growing our Nautilus Ceph cluster. Currently we have 6 nodes: 3 nodes with 2x5.5TB and 6x11TB disks plus 8x186GB SSDs, and 3 nodes with 6x5.5TB and 6x7.5TB disks, all with dual-link 10GbE NICs.
The SSDs are used for the CephFS metadata pool; the hard drives are used for the CephFS data pool. All OSD journals are kept on the drives themselves. The replication level is 3 for both the data and metadata pools.
The new servers have 12x12TB disks and one 1.5TB NVMe drive. We expect to get another 3 similar nodes in the near future.
My question is: what is the most sensible thing to do with the NVMe drives? I would like to increase the replication level of the metadata pool, so my idea was to split each NVMe into, say, 4 partitions and add them to the metadata pool.
Given the size of the drives and the metadata pool usage (~35GB), that seems overkill. Would it make sense to partition the drives further and put the OSD journals on the NVMes as well?
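To make both options concrete, this is roughly what I have in mind (a sketch only; device names are placeholders, and it assumes BlueStore, i.e. a block.db rather than FileStore journals):

    # option 1: small NVMe partitions as extra OSDs for the metadata pool
    ceph-volume lvm create --data /dev/nvme0n1p1    # repeat for p2..p4

    # option 2: partition further and put the HDD OSDs' DB (and WAL) on the NVMe
    ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1p5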
Regards
magnus
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
For the past several months I have been building a sizable Ceph cluster that will grow to 10PB, with between 20 and 40 OSD servers, this year.
A few weeks ago I was informed that SUSE is shutting down SES and will no
longer be selling it. We haven't licensed our proof of concept cluster
that is currently at 14 OSD nodes, but it looks like SUSE is not going to
be the answer here.
I'm seeking recommendations for consulting help on this project since SUSE
has let me down.
I have Ceph installed and operating; however, I've been struggling with getting the pools configured properly for CephFS and am getting very poor performance. The OSD servers have TLC NVMe for the DB and Optane NVMe for the WAL, so I should be seeing decent performance from the current cluster.
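For reference, a minimal sketch of the kind of baseline CephFS pool setup I mean (pool names and PG counts are placeholders, not our actual values):

    # create the data and metadata pools, then the filesystem itself
    ceph osd pool create cephfs_data 2048
    ceph osd pool create cephfs_metadata 128
    ceph fs new cephfs cephfs_metadata cephfs_data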
I'm not opposed to completely switching OS distributions. Ceph on SUSE was
our first SUSE installation. Almost everything else we run is on CentOS,
but that may change thanks to IBM cannibalizing CentOS.
Please reach out to me if you can recommend someone to sell us consulting
hours and/or a support contract.
-Chip Schweiss
chip.schweiss(a)wustl.edu
Washington University School of Medicine
I was wondering if someone could post a config for haproxy. Is there anything specific to configure, like binding clients to a specific backend server, client timeouts, or security settings specific to rgw?
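To have something concrete to critique, here is a minimal, untested sketch of the kind of config I mean (hostnames, ports, and timeouts are placeholders; it assumes two RGWs on the default civetweb port 7480):

    frontend rgw
        bind *:80
        mode http
        option forwardfor             # pass the client IP; rgw can read it via rgw_remote_addr_param
        timeout client 30s
        default_backend rgw_servers

    backend rgw_servers
        mode http
        balance roundrobin
        timeout connect 5s
        timeout server 30s
        option httpchk GET /          # rgw answers an anonymous GET / with 200
        server rgw1 rgw1.example.com:7480 check
        server rgw2 rgw2.example.com:7480 check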
Hi,
Looking at the Octopus upgrade instructions, I see "the first time each
OSD starts, it will do a format conversion to improve the accounting for
“omap” data. This may take a few minutes to as much as a few hours (for
an HDD with lots of omap data)." and that I can disable this by setting
bluestore_fsck_quick_fix_on_mount to false.
A couple of questions about this:
i) what are the consequences of turning off this "quick fix"? Is it
possible to have it run in the background or similar?
ii) is there any way to narrow down the time estimate? Our production
cluster has 3060 OSDs on HDD (with block.db on NVMe), and obviously 3000
lots of "a few hours" is an awful lot of time...
I'll be doing some testing on our test cluster (by putting 10M objects
into an S3 bucket before trying the upgrade), but it'd be useful to have
some idea of how this is likely to work at scale...
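The load generator I have in mind is nothing fancier than this (a sketch; it assumes s3cmd is already configured against the test cluster, and in practice we would parallelise it):

    # fill a single bucket with many tiny objects to grow its index omap
    touch /tmp/empty
    s3cmd mb s3://omap-test
    for i in $(seq 1 10000000); do s3cmd put /tmp/empty "s3://omap-test/obj-$i"; done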
Thanks,
Matthew
--
The Wellcome Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
Hello everyone,
We just installed a Ceph cluster, version luminous (12.2.11), on servers running Debian buster (10.8) using ceph-deploy, and we are trying to upgrade it to mimic but can't find a way to do it.
We tried ceph-deploy install --release mimic mon1 mon2 mon3 (after having modified /etc/apt/sources.list.d/ceph.list), but this does nothing because the packages are reported to be up to date.
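For completeness, the manual route we are considering instead (a sketch; it assumes the mimic line is already in /etc/apt/sources.list.d/ceph.list and follows the usual order of restarting mons before OSDs):

    apt-get update
    apt-get install --only-upgrade ceph ceph-mon ceph-osd ceph-mds radosgw
    systemctl restart ceph-mon.target    # on the mon hosts first
    systemctl restart ceph-osd.target    # then on the OSD hosts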
Could someone help us, please?
Best regards
Hi,
(sorry if this gets posted twice. I forgot a subject in the first mail)
We experienced an outage this morning on a jewel cluster with 1559 OSDs.
It appeared that a switch uplink in one rack misbehaved, and once we shut that
interface down, ceph health recovered quickly. I have some questions, though, on
OSD behaviour that I hope someone can answer.
1 - In a lot of OSD logs I saw that neighbours reported the OSD down
(while the process was still running and obviously logging). Then after a
while the log shows
* Got signal Interrupt
* prepare_to_stop starting shutdown
and the OSD process stops.
Why does the OSD process stop? Is it instructed to do so by the monitor
because neighbours reported it down and Ceph wants to avoid flapping?
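For reference, the knobs that look related, in case someone can confirm (a sketch; osd.123 is a placeholder, and it is an assumption on my part that these settings are involved):

    # how often an OSD tolerates being wrongly marked down before shutting itself down
    ceph daemon osd.123 config show | grep osd_max_markdown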
2 - The OSDs reported a lot of
* heartbeat_check: no reply from #ip:#port
When I telnet to that ip and port I get a connection just fine. Is there a
way to run a heartbeat check from the command line so that we can try to
capture the traffic and determine why it fails?
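Roughly what I have in mind for a capture (a sketch; the address and port are placeholders to be taken from ceph osd dump, and since heartbeats are Ceph messenger traffic over TCP, a successful telnet connect alone proves little):

    # look up the suspect OSD's heartbeat (hb_back/hb_front) addresses
    ceph osd dump | grep '^osd.123 '

    # capture on a reporting peer to see whether heartbeat pings go unanswered
    tcpdump -ni eth0 host 10.0.1.17 and port 6805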
Thanks
Marcel