Hey all.
I was wondering whether Ceph Octopus is capable of automating/managing snapshot creation and retention, and then replication? I've seen some notes about it, but can't seem to find anything solid.
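For context, what I'm hoping for is something along the lines of the snapshot-based RBD mirroring that I believe landed in Octopus, with its built-in scheduling. Roughly like this, assuming a second cluster is already configured as a peer (pool/image names and the interval are just examples, and I may have the commands slightly wrong):

rbd mirror pool enable mypool image
rbd mirror image enable mypool/myimage snapshot
rbd mirror snapshot schedule add --pool mypool 30m
rbd mirror snapshot schedule ls --pool mypool --recursive

But I haven't found anything equivalent for retention (e.g. "keep the last N snapshots") or for plain, non-mirrored snapshots.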
Open to suggestions as well. Appreciate any input!
Hi,
My cluster is in a warning state as it is rebalancing after I added a
bunch of disks (no issue there!).
However, there are a few things I just cannot understand, and I hope
someone can help me; I'm running out of places to look for answers. If you
can answer any of the questions below (even just one), it will be greatly appreciated.
Environment: Ceph Nautilus 14.2.11, 282 OSDs, mainly erasure coding.
1. The Ceph dashboard shows more PGs for an OSD than I can extract from the pg
dump information. I assume the value in the dashboard comes from
Prometheus, though I'm not sure. Querying Prometheus through Grafana also
gives me the same figures as the ones I see in the dashboard (which are
incorrect). I can't find out how this value is calculated... All the
information I can find on calculating the PGs stored on an OSD is
derived from pg dump :-( Help!
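For reference, this is how I'm counting PGs per OSD myself (osd.12 is just an example, and I'm assuming the Nautilus JSON layout of pg dump, so the jq paths may need adjusting):

ceph pg dump --format json 2>/dev/null \
  | jq '[.pg_map.pg_stats[] | select(.acting | index(12))] | length'
ceph pg dump --format json 2>/dev/null \
  | jq '[.pg_map.pg_stats[] | select(.up | index(12))] | length'
ceph osd df tree | grep 'osd\.12$'    # PGS column, for comparison

The dashboard number doesn't match either of these.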
2. As the cluster is rebalancing and there is a huge gap between the acting
and the up OSDs for a bunch of PGs, some disks occasionally respond slowly
because of the backfilling and are from time to time marked as down.
(I know I can work around this by setting "nodown" temporarily.) If the
disk that goes down is in the acting set of a specific PG (but not in the
up set for that PG), then the PG is marked as degraded and the system
will try to rebuild the missing data towards an OSD in the up set. This I
understand... What I don't understand is that when the disk is restarted
and thus marked "up" again (or when I simply wait), that OSD is not added
back to the acting set... Restarting other OSDs (and thus causing
more peering again) does result in the disk being added back to the
acting set... I don't understand why this happens.
3. In addition to question 2: if an OSD which was in an acting set goes down
and is removed from the acting set (replaced with -1), which process will
remove the now-obsolete data from that OSD once it is back up?
Which process cleans up the obsolete copy in the end? Does scrubbing
take this into account? I assume this might be related to my
first question too.
4. I'm data mining the pg dump output... If I get an answer to all the
previous questions, this one will probably be answered automatically: when I
look at all the PGs for which a specific OSD is in the acting set, take the
num_bytes of each of those PGs, and calculate the size that should be stored
on that OSD (taking the erasure coding into account), I get a difference
between the disk space that should be used and what is effectively used.
E.g. for one disk the system says 11.3 TiB is being used, while calculating it
from pg dump gives roughly 10 TiB. I know a delta can occur due to the block
size etc., but that doesn't seem to explain it, as the actual usage is too
high.
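Concretely, the calculation I'm doing looks roughly like this (osd.12 and k=8 are just examples; each EC shard should be about num_bytes/k, and I'm again assuming the Nautilus pg dump JSON layout):

ceph pg dump --format json 2>/dev/null \
  | jq '[.pg_map.pg_stats[] | select(.acting | index(12)) | .stat_sum.num_bytes] | add / 8 / 1099511627776'
ceph osd df tree | grep 'osd\.12$'

The first command gives the expected TiB on that OSD; I compare it against the DATA column of osd df tree.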
I've tried to search for the processes that clean up OSDs, garbage
collection, etc., but I can't find good information. You can find tons of
information about garbage collection in combination with RGW, but not
for the RADOS mechanism itself... I really can't find any clue about how the
PGs on a disk are removed after it goes down and is no longer used in the
acting/up set of those PGs...
One more question, which I should probably post to the dev list: which
IDE is recommended for developing on the Ceph project? I'm working on a
Mac... I don't know if there are any recommendations.
I really hope I can get some help on these questions.
Many thanks!
Regards,
Kristof
Benji is independent of Ceph. It uses Ceph snapshots to do its backups, but it has nothing to do with managing Ceph snapshots.
I am simply looking for the ability to manage Ceph snapshots. For example: take a snapshot every 30 minutes and keep the 8 most recent 30-minute snapshots.
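To make it concrete, the behaviour I'm after is roughly what this sketch does for a single RBD image (pool/image names are examples; this is only to illustrate the retention policy, not something I'm actually running):

POOL=rbd IMAGE=vm-disk-1 KEEP=8
rbd snap create ${POOL}/${IMAGE}@auto-$(date +%Y%m%d-%H%M)
rbd snap ls ${POOL}/${IMAGE} --format json | jq -r '.[].name' \
  | grep '^auto-' | sort | head -n -${KEEP} \
  | xargs -r -I{} rbd snap rm ${POOL}/${IMAGE}@{}

What I'm hoping is that Octopus (or some existing tool) can already do this scheduling and pruning for me, instead of me cobbling it together with cron.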
Hi,
What is the correct procedure to switch a running cluster deployed with cephadm to a custom container image from a private registry that requires a login?
I would have thought something like the following would be the right way, but the second command fails with an authentication error.
$ cephadm registry-login --registry-url myregistry.com/ceph/ --registry-username 'robot$myuser' --registry-password 'mysecret'
$ ceph orch upgrade start --image myregistry.com/ceph/ceph:v15.2.5
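As a sanity check: would it also be valid to just make sure each host can pull the image directly and then point the upgrade at it? Something like the following (just my guess, not verified):

podman login myregistry.com -u 'robot$myuser' -p 'mysecret'
podman pull myregistry.com/ceph/ceph:v15.2.5
ceph orch upgrade start --image myregistry.com/ceph/ceph:v15.2.5
ceph orch upgrade status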
Thanks,
Liam
Hi,
I'm experimenting with Ceph on a (small) test cluster. I'm using version 15.2.5
deployed with cephadm.
I was trying to do some "disaster" testing, such as wiping a disk to
simulate a hardware failure, then destroying the OSD and recreating it, all of
which I managed to do successfully.
However, a few hours after this test the orchestrator failed for no
apparent reason. I tried to disable and re-enable cephadm, but with no luck:
# ceph orch ls
Error ENOENT: No orchestrator configured (try `ceph orch set backend`)
# ceph orch set backend cephadm
Error ENOENT: Module not found
What could have happened? Is there some way to re-enable cephadm?
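Is something along these lines the right way to get it back? (Just guessing here, I haven't run these yet.)

ceph mgr module ls | grep cephadm     # is the module still listed/enabled?
ceph mgr module enable cephadm
ceph orch set backend cephadm
ceph crash ls                         # in case the cephadm mgr module crashed
ceph log last cephadm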
Thanks,
Marco
Hi,
I have created a test Ceph cluster with Ceph Octopus using cephadm.
The cluster's total raw disk capacity is 262 TB, but it is only allowing me to
use 132 TB.
I have not set a quota on any of the pools. What could be the issue?
Output from ceph -s:
  cluster:
    id:     f8bc7682-0d11-11eb-a332-0cc47a5ec98a
    health: HEALTH_WARN
            clock skew detected on mon.strg-node3, mon.strg-node2
            2 backfillfull osd(s)
            4 pool(s) backfillfull
            1 pools have too few placement groups

  services:
    mon: 3 daemons, quorum strg-node1,strg-node3,strg-node2 (age 7m)
    mgr: strg-node3.jtacbn(active, since 7m), standbys: strg-node1.gtlvyv
    mds: cephfs-strg:1 {0=cephfs-strg.strg-node1.lhmeea=up:active} 1 up:standby
    osd: 48 osds: 48 up (since 7m), 48 in (since 5d)

  task status:
    scrub status:
        mds.cephfs-strg.strg-node1.lhmeea: idle

  data:
    pools:   4 pools, 289 pgs
    objects: 17.29M objects, 66 TiB
    usage:   132 TiB used, 130 TiB / 262 TiB avail
    pgs:     288 active+clean
             1   active+clean+scrubbing+deep
The mounted volume shows:
node1:/    67T   66T  910G  99% /mnt/cephfs
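Is this just the pools' replication factor / EC overhead being reflected in the usable space? I assume the following would show it, but I'm not sure what to look for:

ceph df detail
ceph osd pool ls detail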
The LOG files are only 9.9M:
------------------------------------------------------------------------------
root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle/store.db# ls -lh *.log
-rw------- 1 ceph ceph 9.9M Oct 26 10:57 1443554.log
-------------------------------------------------------------------------------
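Would it be safe to compact the mon store manually in the meantime? I was thinking of something like this (not tried yet):

ceph tell mon.fond-beagle compact
ceph config set mon mon_compact_on_start true    # and then restart the mon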
On 2020-10-26 10:55, Anthony D'Atri wrote:
> See if you have big LOG* files inside misleading you
>
>> On Oct 26, 2020, at 7:13 AM, Ing. Luis Felipe Domínguez Vega
>> <luis.dominguez(a)desoft.cu> wrote:
>>
>> How can I free up the Ceph monitor's store?:
>>
>> ------------------------------------------------------------------------
>> root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle# du -h -d1
>> 542G ./store.db
>> 542G .
>> ------------------------------------------------------------------------
Hello,
(BTW, Nautilus 14.2.7 on Debian non-container.)
We're about to purchase more OSD nodes for our cluster, but I have a
couple of questions about hardware choices. Our original nodes have 8 x
12TB SAS drives and a 1.6TB Samsung NVMe card for WAL, DB, etc.
We chose the NVMe card for performance, since it has an 8-lane PCIe
interface. However, we're currently seeing BlueFS spillovers.
The Tyan chassis we are considering has the option of 4 x U.2 NVMe bays,
each with 4 PCIe lanes (plus 8 SAS bays). It has occurred to me that
I could stripe 4 x 1TB NVMe drives together to get much more space for
WAL/DB and a net 16 PCIe lanes of performance.
Any thoughts on this approach?
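To be concrete, I imagine the provisioning would look something like the following. Device names and the DB size are just examples, and this is only my reading of the LVM/ceph-volume docs, not something I've tested:

vgcreate nvme_vg /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
lvcreate -i 4 -L 240G -n db-sdb nvme_vg
ceph-volume lvm create --bluestore --data /dev/sdb --block.db nvme_vg/db-sdb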
Also, any thoughts/recommendations on 12TB OSD drives? For
price/capacity this is a good size for us, but I'm wondering whether my
BlueFS spillovers result from using drives that are too big. I also
thought I'd seen some comments about cutting large drives
into multiple OSDs; is that really an option?
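If it is, I assume it's done with something like the following, but that's just my guess from the ceph-volume docs:

ceph-volume lvm batch --report --osds-per-device 2 /dev/sdb /dev/sdc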
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall(a)binghamton.edu
Hi,
I have a cluster with 182 OSDs, which has been expanded to 282 OSDs.
Some disks were near full.
The new disks have been added with an initial weight of 0.
The original plan was to increase this slowly towards their full weight
using the gentle reweight script. However, this is going way too slowly, and
I'm now also running into "backfill_toofull".
Can I just set all the new OSDs to their full weight at once, or will that
cause a lot of problems?
I know that a lot of PGs will have to be moved, but increasing the
weights slowly will take a year at the current pace. I'm already playing
with the max backfill settings to speed things up, but every time I increase
the weight it takes a long time again...
I can accept that there will be a performance decrease.
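Concretely, what I have in mind is something like this (the OSD ids assume the new disks got ids 182-281, and the weight and throttle values are just examples):

ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
ceph osd set norebalance      # let all the reweights land before data starts moving
for i in $(seq 182 281); do ceph osd crush reweight osd.$i 9.09569; done
ceph osd unset norebalance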
Looking forward to your comments!
Regards,
Kristof