Hi,
Probably a basic/stupid question, but I'm asking anyway. Through lack of knowledge and experience at the time we set up our pools, the pool that holds the majority of our data was created with a pg_num/pgp_num of 64. As the amount of data has grown, this has started causing issues with the balance of data across OSDs. I want to increase the PG count to at least 512, or maybe 1024 - and obviously I want to do this incrementally. However, rather than going from 64 to 128, then 256, etc., I'm considering much smaller increments over a longer period of time, so that the majority of the data movement will hopefully happen during the quieter time of day. So I may start by going in increments of 4 until I get to 128, then go in jumps of 8, and so on.
My question is: will I end up with the same net result going in increments of 4 until I hit 128 as I would by going straight to 128 in one hit? That is, once I reach 128, would I have the exact same level of data balance across PGs as if I had gone straight to 128? Are there any drawbacks to going up in small increments over a long period of time? I know I'll have uneven PG sizes until I get to a power of 2, but that should be OK as long as the end result is the desired result. I suspect I may end up moving a greater amount of data overall doing it this way, but given that my goal is to reduce the amount of intensive data movement during higher-traffic times, that's not a huge concern in the grand scheme of things.
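To be concrete, each step I have in mind would just be a pair of pool settings, waiting for the cluster to settle between steps (the pool name 'data' below is only an example):
ceph osd pool set data pg_num 68
ceph osd pool set data pgp_num 68
# wait for backfill/rebalance to finish, then repeat with 72, 76, ...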
Thanks in advance,
Mark
Hi everyone!
I'm facing a weird issue with one of my Ceph clusters:
OS: CentOS - 8.2.2004 (Core)
CEPH: Nautilus 14.2.11 - stable
RBD using erasure code profile (k=3, m=2)
When I format one of my RBD images (client side), I get the following
kernel messages multiple times, with different sector IDs:
[2417011.790154] blk_update_request: I/O error, dev rbd23, sector 164743869184 op 0x3:(DISCARD) flags 0x4000 phys_seg 1 prio class 0
[2417011.791404] rbd: rbd23: discard at objno 20110336 2490368~1703936 result -1
At first I thought about a faulty disk, BUT the monitoring system is not
showing anything faulty, so I decided to run manual tests on all my OSDs
and look at disk health using smartctl etc.
None of them is marked unhealthy; they don't show any faulty-sector,
read, or write error counters, and the wear level is at 99%.
So, the only particularity of this image is that it is an 80 TB image, but
that shouldn't be an issue, as we already use images of that size on
another pool.
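For reference, a sketch of the kind of checks that seem relevant here (pool/image names below are placeholders, not the real ones):
rbd info mypool/myimage        # image features, object size, data pool
ceph osd pool ls detail        # the EC data pool should carry the ec_overwrites flag
ceph health detail
dmesg | grep rbd23             # full list of the discard errors on the client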
If anyone has a clue how I could sort this out, I'd be more than happy
^^
Kind regards!
I'm trying to use ceph-volume to do various things.
It works fine locally, for things like
ceph-volume lvm zap
But when I want it to do OSD level things, it is unhappy.
To use a trivial example, it wants to do things like
/usr/bin/ceph --cluster ceph --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
but then dies saying,
[errno 13] RADOS permission denied (error connecting to the cluster)
and if I directly run that long command myself, it indeed dies.
(which is not too surprising, since /var/lib/ceph/bootstrap-osd/ceph.keyring does not exist)
However, if I just run from the same command prompt,
/usr/bin/ceph osd tree -f json
it works fine.
How can I get ceph-volume to just use the credentials that are already working elsewhere?
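(I assume the "proper" fix would be something along these lines, exporting the bootstrap-osd key into the expected path, assuming client.bootstrap-osd already exists in the cluster - but I'd like to confirm before doing it:
mkdir -p /var/lib/ceph/bootstrap-osd
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
)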
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
Friends,
Any help or suggestions on this missing-data issue?
Thanks,
-Vikas
From: Vikas Rana <vrana(a)vtiersys.com>
Sent: Tuesday, February 16, 2021 12:20 PM
To: 'ceph-users(a)ceph.io' <ceph-users(a)ceph.io>
Subject: Data Missing with RBD-Mirror
Hi Friends,
We have a very weird issue with rbd-mirror replication. As per the command
output, we are in sync, but the OSD usage on the DR side doesn't match the
Prod side.
On Prod we are using close to 52TB, but on the DR side we have only 22TB.
We took a snapshot on Prod, mounted it on the DR side, and compared the data,
and we found a lot of missing data. Please see the output below.
Please help us resolve this issue or point us in the right direction.
Thanks,
-Vikas
DR# rbd --cluster cephdr mirror pool status cifs --verbose
health: OK
images: 1 total
1 replaying
research_data:
global_id: 69656449-61b8-446e-8b1e-6cf9bd57d94a
state: up+replaying
description: replaying, master_position=[object_number=390133, tag_tid=4,
entry_tid=447832541], mirror_position=[object_number=390133, tag_tid=4,
entry_tid=447832541], entries_behind_master=0
last_update: 2021-01-29 15:10:13
DR# ceph osd pool ls detail
pool 5 'cifs' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins
pg_num 128 pgp_num 128 last_change 1294 flags hashpspool stripe_width 0
application rbd
removed_snaps [1~5]
PROD# ceph df detail
POOLS:
NAME   ID   QUOTA OBJECTS   QUOTA BYTES   USED      %USED   MAX AVAIL   OBJECTS   DIRTY   READ     WRITE    RAW USED
cifs   17   N/A             N/A           26.0TiB   30.10   60.4TiB     6860550   6.86M   873MiB   509MiB   52.1TiB
DR# ceph df detail
POOLS:
NAME   ID   QUOTA OBJECTS   QUOTA BYTES   USED      %USED   MAX AVAIL   OBJECTS   DIRTY   READ      WRITE    RAW USED
cifs   5    N/A             N/A           11.4TiB   15.78   60.9TiB     3043260   3.04M   2.65MiB   431MiB   22.8TiB
PROD#:/vol/research_data# du -sh *
11T Flab1
346G KLab
1.5T More
4.4T ReLabs
4.0T WLab
DR#:/vol/research_data# du -sh *
2.6T Flab1
14G KLab
52K More
8.0K RLabs
202M WLab
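A hedged cross-check we can also run, using the image name from the status output above (rbd du reports provisioned vs. actual usage per image, rather than pool-level df numbers):
PROD# rbd du cifs/research_data
DR# rbd --cluster cephdr du cifs/research_data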
I'm coming back to trying mixed SSD+spinning disks after maybe a year.
It was my vague recollection that if you told Ceph "go auto-configure all the disks", it would automatically carve up the SSDs into the appropriate number of LVM segments and use them as WAL devices for each HDD-based OSD on the system.
Was I wrong?
Because when I tried to bring up a brand new cluster (Octopus, cephadm bootstrapped), with multiple nodes and multiple disks per node...
it seemed to bring up the SSDs as just another set of OSDs.
It clearly recognized them as SSDs: the output of "ceph orch device ls" showed them as ssd vs hdd for the others.
It just... didn't use them as I expected.
?
Maybe I was thinking of ceph ansible.
Is there not a nice way to do this with the new cephadm based "ceph orch"?
I would rather not have to write JSON files or whatever by hand, when a computer should be perfectly capable of auto-generating this stuff itself.
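My current guess is that this needs an explicit OSD service spec rather than happening automatically - something along these lines (the service_id and the rotational-based filters are my assumptions, not something I've verified):
cat > osd_spec.yml <<'EOF'
service_type: osd
service_id: hdd_with_ssd_db
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
EOF
ceph orch apply osd -i osd_spec.yml
If I understand correctly, the rotational filter is what tells cephadm which devices are data devices and which are db/wal devices.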
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
At one point in the life cycle of my test Ceph cluster, I used the
--all-available-devices
flag of ceph orch, which will always attempt to bring up any newly autodetected disks.
I now see in the docs,
"If you want to avoid this behavior (disable automatic creation of OSD on available devices), use the unmanaged parameter:"
But... I believe I did run it with the --unmanaged flag afterwards.
Unfortunately, the original behaviour still seems to persist, and it keeps auto-creating OSDs.
How can I get it to stop?
I also see mention that,
"When the parameter all-available-devices or a DriveGroup specification is used, a cephadm service is created"
However, using "ceph orch ps", I don't see any relevantly named service.
Where else should I be looking?
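My working assumption is that the spec shows up as a service under "ceph orch ls" rather than "ceph orch ps", and that it can be flagged unmanaged like this (unverified on my side):
ceph orch ls osd                 # list OSD services, including the all-available-devices spec
ceph orch apply osd --all-available-devices --unmanaged=true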
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com
What is the best way to move an RBD image to a different pool? I want to move some 'old' images (some have snapshots) to a backup pool. For some, there is also a difference in device class.
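One approach I'm aware of is live image migration (available since Nautilus), roughly like this, with placeholder pool/image names ('rbd deep cp' would be an alternative that also copies snapshots):
rbd migration prepare rbd/old-image backup/old-image
rbd migration execute backup/old-image
rbd migration commit backup/old-image
But I'm not sure how well either option handles the snapshots and the device-class difference, hence the question.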
I am bumping this email to hopefully get some more eyes on it.
We are continuing to have this problem. Unfortunately, the cluster is currently very lightly used until we go into full production, so we do not have the level of traffic that would generate a lot of statistics.
We did update to 14.2.16 from 14.2.10 on Feb 1, 2021 and this seems to correlate with when the errors started popping up.
Our current plan is to roll back the version to 14.2.10 again and rerun the test that causes the issue.
I noted there was another email thread regarding latencies from a user who also updated to 14.2.16 recently, and I'm not sure whether that is related to my issue.
Any suggestions you may have are very welcomed.
Cheers,
--
Mike Cave
On 2021-02-11, 8:37 AM, "Mike Cave" <mcave(a)uvic.ca> wrote:
So, as the subject states, I have an issue with buckets returning a 404 error when they are listed immediately after being created; as well, a bucket fails to be deleted if you try to delete it immediately after creation.
The behaviour is intermittent.
If I leave the bucket in place for a few minutes, the bucket behaves normally. I’m thinking this is a metadata issue or something along those lines but I’m out of my depth now.
To the best of our knowledge the cluster has not changed in any way since the same tests were run in December with no errors.
We are running Ceph 14.2.16 on all parts of the cluster.
I am using the python-swift client for the connection on a CentOS7 machine.
I can replicate the results from the mons or from an external client as well.
I’m willing to share my test script as well if you would like to see how I’m generating the error.
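The gist of the test is simply this (swift CLI shown for brevity; the container name and auth settings are placeholders):
swift post 404test        # create the container
swift list 404test        # intermittently returns 404 right after creation
swift delete 404test      # likewise intermittently fails with 404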
Here is a piece of the logs in case I missed something in the interpretation (log level at 20):
14:23:17.069 7faba00df700 1 ====== starting new request req=0x55fb7a138700 =====
14:23:17.069 7faba00df700 2 req 148 0.000s initializing for trans_id = tx000000000000000000094-0060245cd5-2b8949-default
14:23:17.069 7faba00df700 10 rgw api priority: s3=8 s3website=7
14:23:17.069 7faba00df700 10 host=<NameRemoved>
14:23:17.069 7faba00df700 20 subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
14:23:17.069 7faba00df700 -1 res_query() failed
14:23:17.069 7faba00df700 20 final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/swift/v1/404test
14:23:17.069 7faba00df700 10 ver=v1 first=404test req=
14:23:17.069 7faba00df700 10 handler=28RGWHandler_REST_Bucket_SWIFT
14:23:17.069 7faba00df700 2 req 148 0.000s getting op 2
14:23:17.069 7faba00df700 10 req 148 0.000s swift:delete_bucket scheduling with dmclock client=3 cost=1
14:23:17.069 7faba00df700 10 op=30RGWDeleteBucket_ObjStore_SWIFT
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying requester
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::TempURLEngine
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::TempURLEngine denied with reason=-13
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::SignedTokenEngine
14:23:17.069 7faba00df700 10 req 148 0.000s swift:delete_bucket swift_user=xmcc:swift
14:23:17.069 7faba00df700 20 build_token token=0a000000786d63633a73776966748960ea4653df708a55ae2560e58acf01
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket rgw::auth::swift::SignedTokenEngine granted access
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket normalizing buckets and tenants
14:23:17.069 7faba00df700 10 s->object=<NULL> s->bucket=404test
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket init permissions
14:23:17.069 7faba00df700 20 get_system_obj_state: rctx=0x55fb7a137770 obj=default.rgw.meta:root:404test state=0x55fb7a060ac0 s->prefetch_data=0
14:23:17.069 7faba00df700 10 cache get: name=default.rgw.meta+root+404test : hit (negative entry)
14:23:17.069 7faba00df700 20 get_system_obj_state: rctx=0x55fb7a137130 obj=default.rgw.meta:users.uid:xmcc state=0x55fb7a060f40 s->prefetch_data=0
14:23:17.069 7faba00df700 10 cache get: name=default.rgw.meta+users.uid+xmcc : hit (requested=0x6, cached=0x17)
14:23:17.069 7faba00df700 20 get_system_obj_state: s->obj_tag was set empty
14:23:17.069 7faba00df700 20 Read xattr: user.rgw.idtag
14:23:17.069 7faba00df700 20 get_system_obj_state: rctx=0x55fb7a137130 obj=default.rgw.meta:users.uid:xmcc state=0x55fb7a060f40 s->prefetch_data=0
14:23:17.069 7faba00df700 10 cache get: name=default.rgw.meta+users.uid+xmcc : hit (requested=0x6, cached=0x17)
14:23:17.069 7faba00df700 20 get_system_obj_state: s->obj_tag was set empty
14:23:17.069 7faba00df700 20 Read xattr: user.rgw.idtag
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket recalculating target
14:23:17.069 7faba00df700 10 Starting retarget
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket reading permissions
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket init op
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying op mask
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket required_mask= 4 user.op_mask=7
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying op permissions
14:23:17.069 7faba00df700 20 req 148 0.000s swift:delete_bucket -- Getting permissions begin with perm_mask=50
14:23:17.069 7faba00df700 5 req 148 0.000s swift:delete_bucket Searching permissions for identity=rgw::auth::ThirdPartyAccountApplier() -> rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=xmcc, acct_name=xmcc, subuser=swift, perm_mask=15, is_admin=0) mask=50
14:23:17.069 7faba00df700 5 Searching permissions for uid=xmcc
14:23:17.069 7faba00df700 5 Found permission: 15
14:23:17.069 7faba00df700 5 Searching permissions for group=1 mask=50
14:23:17.069 7faba00df700 5 Permissions for group not found
14:23:17.069 7faba00df700 5 Searching permissions for group=2 mask=50
14:23:17.069 7faba00df700 5 Permissions for group not found
14:23:17.069 7faba00df700 5 req 148 0.000s swift:delete_bucket -- Getting permissions done for identity=rgw::auth::ThirdPartyAccountApplier() -> rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=xmcc, acct_name=xmcc, subuser=swift, perm_mask=15, is_admin=0), owner=xmcc, perm=2
14:23:17.069 7faba00df700 10 req 148 0.000s swift:delete_bucket identity=rgw::auth::ThirdPartyAccountApplier() -> rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=xmcc, acct_name=xmcc, subuser=swift, perm_mask=15, is_admin=0) requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket verifying op params
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket pre-executing
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket executing
14:23:17.069 7faba00df700 0 req 148 0.000s swift:delete_bucket ERROR: bucket 404test not found
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket completing
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket op status=-2002
14:23:17.069 7faba00df700 2 req 148 0.000s swift:delete_bucket http status=404
14:23:17.069 7faba00df700 1 ====== req done req=0x55fb7a138700 op status=-2002 http_status=404 latency=0s ======
--
Mike Cave
I acknowledge and respect the Lekwungen-speaking Peoples on whose traditional territories the university stands and the Songhees, Esquimalt and WSANEC peoples whose historical relationships with the land continue to this day.
Dear cephers,
I was doing some maintenance yesterday involving shutdown/power-up cycles of Ceph servers. With the last server I ran into a problem. The server runs an MDS and a couple of OSDs. After reboot, the MDS joined the MDS cluster without problems, but the OSDs didn't come up. This was 1 out of 12 servers, and I had no such problems with the other 11. I also observed that "ceph status" was responding very slowly.
Upon further inspection, I found out that 2 of my 3 MONs (the leader and a peon) were running at 100% CPU. Client I/O was continuing, probably because the last cluster map remained valid. On our node performance monitoring I could see that the 2 busy MONs were showing extraordinary network activity.
This state lasted for over one hour. After the MONs settled down, the OSDs finally joined as well and everything went back to normal.
The other instance where I have seen similar behaviour was when I restarted a MON on an empty disk and the re-sync was extremely slow due to a too-large value of mon_sync_max_payload_size. This time, I'm pretty sure it was MON-client communication; see below.
Are there any settings similar to mon_sync_max_payload_size that could influence responsiveness of MONs in a similar way?
Why do I suspect it is MON-client communication? In our monitoring, I do not see the huge number of packets sent by the MONs arriving at any other Ceph daemon. They seem to be distributed over client nodes, but since we have a large number of client nodes (>550), this is hidden in the background network traffic. A second clue is that I have had such extended lock-ups before and, whenever I checked, I only observed them when the leader had a large share of the client sessions.
For example, yesterday the client session count per MON was:
ceph-01: 1339 (leader)
ceph-02: 189 (peon)
ceph-03: 839 (peon)
I usually restart the leader when such a critical distribution occurs. As long as the leader has the fewest client sessions, I never observe this problem.
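For completeness, a sketch of the kind of commands involved (the mon name and the payload value below are only examples, not recommendations):
ceph daemon mon.ceph-01 sessions | grep -c client     # run on the MON host; rough count of client sessions
ceph config set mon mon_sync_max_payload_size 4096    # the setting mentioned above, in bytes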
Ceph version is 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable).
Thanks for any clues!
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14