Sorry to post this to the list, but does this lists.ceph.io password
reset work for anyone?
https://lists.ceph.io/accounts/password/reset/
For my accounts, which are receiving list mail, I get "The e-mail address
is not assigned to any user account".
Best Regards, Dan
Hi Dominic,
I just created a feature ticket in the Ceph tracker to keep track of
this issue.
Here's the ticket: https://tracker.ceph.com/issues/41537
Cheers,
Ricardo Dias
On 17/07/19 20:06, DHilsbos(a)performair.com wrote:
> All;
>
> I'm trying to firm up my understanding of how Ceph works, and of its ease-of-management tools and capabilities.
>
> I stumbled upon this: http://docs.ceph.com/docs/nautilus/rados/configuration/mon-lookup-dns/
>
> It got me wondering; how do you convey protocol version 2 capabilities in this format?
>
> The examples all list port 6789, which is the port for protocol version 1. Would I add SRV records for port 3300? How does the client distinguish v1 from v2 in this case?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International, Inc.
> DHilsbos(a)PerformAir.com
> www.PerformAir.com
>
>
--
Ricardo Dias
Senior Software Engineer - Storage Team
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284
(AG Nürnberg)
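For reference on the SRV question quoted above, a sketch of what records
advertising both messenger ports might look like (hostnames and TTLs are
placeholders, and whether the client tells v1 from v2 purely by the port
number should be double-checked against the msgr2 / mon-lookup-dns docs):

    _ceph-mon._tcp.example.com. 3600 IN SRV 10 20 6789 mon1.example.com.
    _ceph-mon._tcp.example.com. 3600 IN SRV 10 20 3300 mon1.example.com.
    mon1.example.com.           3600 IN A   192.0.2.11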
Hi,
We use all-SSD disks as Ceph's backend storage.
Considering the cost factor, can we set up the cluster to keep only two
replicas of each object?
thanks & regards
Wesley
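In case it helps, a minimal sketch of the commands involved (the pool name
is a placeholder). Keep in mind that size=2 protects data far less well than
size=3, and min_size=1 in particular risks data loss after a single failure:

    ceph osd pool set mypool size 2
    ceph osd pool set mypool min_size 2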
It seems that with Linux kernel 4.16.10, krbd clients are seen as Jewel
rather than Luminous. Can someone tell me which kernel version will be seen
as Luminous, as I want to enable the upmap balancer?
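A sketch of how to check what release the cluster thinks each client speaks,
and how to require luminous clients before turning the upmap balancer on
(commands as in current releases; verify against your version):

    ceph features                                     # shows the release each connected client reports
    ceph osd set-require-min-compat-client luminous   # refuses if pre-luminous clients are still connected
    ceph balancer mode upmap
    ceph balancer on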
Hi everybody,
I'm new to Ceph and I have a question related to active+remapped+backfilling PGs and misplaced objects.
Recently I copied more than 10 million objects to a new cluster with 3 nodes and 6 OSDs. During this migration one of my OSDs became full and the health check went to ERR. I don't know why, but Ceph started to write every object to only one OSD (can I change this behaviour?), and after it filled up I tried to reweight it by utilization and increased the PG count for one pool.
The cluster became accessible again with a warning status and recovery started. I checked the cluster status for two days and found that I always had 1 PG in active+remapped+backfilling and more than 5% misplaced objects. I thought the recovery process would take a few more days, so I left the cluster to do the recovery in the background; by now I have more active+remapped+backfill_wait PGs and more misplaced objects (about 10%).
The questions are: what should I do? Wait for recovery to finish? Can I speed up this process?
These servers are in a production environment; am I in trouble or not?
Kind Regards
Thanks
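In case it's useful, a sketch of the usual knobs for speeding up backfill
(the values are illustrative only, and raising them steals I/O from clients):

    ceph tell 'osd.*' injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4'
    ceph -s    # watch the misplaced percentage trend down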
Hi everyone,
there are a couple of bug reports about this in Redmine but only one
(unanswered) mailing list message[1] that I could find. So I figured I'd
raise the issue here again and copy the original reporters of the bugs
(they are BCC'd, because in case they are no longer subscribed it
wouldn't be appropriate to share their email addresses with the list).
This is about https://tracker.ceph.com/issues/40029, and
https://tracker.ceph.com/issues/39978 (the latter of which was recently
closed as a duplicate of the former).
In short, it appears that at least in luminous and mimic (I haven't
tried nautilus yet), it's possible to crash a mon when attempting to add
a new OSD as it's trying to inject itself into the crush map under its
host bucket, when that host bucket does not exist yet.
What's worse is that when the OSD's "ceph osd new" process has thus
crashed the leader mon, a new leader is elected and in case the "ceph
osd new" process is still running on the OSD node, it will promptly
connect to that mon, and kill it too. This then continues until
sufficiently many mons have died for quorum to be lost.
The recovery steps appear to involve
- killing the "ceph osd new" process,
- restarting mons until you regain quorum,
- and then running "ceph osd purge" to drop the problematic OSD entry
from the crushmap and osdmap.
The issue can apparently be worked around by adding the host buckets to
the crushmap manually before adding the new OSDs, but surely this isn't
intended to be a prerequisite, at least not to the point of mons
crashing otherwise?
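For the record, the manual workaround boils down to something like this
(hostname and root are placeholders), plus "ceph osd purge" if a
half-created OSD has already taken a mon down:

    ceph osd crush add-bucket node1 host
    ceph osd crush move node1 root=default
    # and, if needed, to drop the problematic OSD entry afterwards:
    ceph osd purge <osd-id> --yes-i-really-mean-it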
Also, I am guessing that this is some weird corner case rooted in an
unusual combination of contributing factors, because otherwise more people
would presumably have been bitten by this problem.
Anyone able to share their thoughts on this one? Have more people run
into this?
Cheers,
Florian
[1]
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034880.html
— interestingly I could find this message in the pipermail archive but
none in the one that my MUA keeps for me. So perhaps that message wasn't
delivered to all subscribers, which might be why it has gone unanswered.
Hi all!
I use Ceph as the OpenStack VM disk backend. I have a VM running PostgreSQL.
I found that the disk on the VM running PostgreSQL is very busy and slow,
but the Ceph cluster is very healthy and shows no slow requests.
Even while the VM disk is very busy, the Ceph cluster looks almost idle.
My Ceph version is 12.2.8 and the VM disk uses an ext4 file system.
The PostgreSQL VM's disk is very busy; see below:
avg-cpu: %user %nice %system %iowait %steal %idle
0.53 0.00 1.6 16.55 0.00 81.31
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vdb 0.00 7425.00 2.0 65.5 40.00 63904.00 940.54 134.27 66966.54 12553.40 69042.94 14.71 100.05
The Ceph cluster itself is very idle:
osd commit_latency(ms) apply_latency(ms)
39 0 1
38 0 2
37 0 0
36 0 1
35 0 0
34 0 0
33 0 0
32 0 0
31 0 0
30 0 1
29 0 1
28 0 0
27 0 1
26 0 1
25 0 1
24 0 1
23 0 0
22 0 0
9 0 0
8 0 0
7 0 0
6 0 1
5 0 1
4 0 7
0 0 1
1 0 3
2 0 2
3 0 1
10 0 1
11 0 0
13 0 0
14 0 0
15 0 1
16 0 0
17 0 1
18 0 0
19 0 0
20 0 0
21 0 0
Can anybody tell me why?
Thanks in advance!
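One way to narrow down whether the bottleneck is in the guest or in the
cluster is to benchmark the RBD layer directly from a client node, outside
the VM (a sketch; pool and image names are placeholders, and rbd bench
writes to the image, so use a scratch image):

    rbd bench --io-type write --io-size 4K --io-threads 16 rbd/scratchimage
    rados bench -p rbd 30 write --no-cleanup    # raw cluster write throughput, for comparison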
On Fri, Aug 23, 2019 at 7:38 AM Ajitha Robert <ajitharobert01(a)gmail.com> wrote:
>
> Sir,
>
> I have a running DR setup with Ceph, but when I did the same for another two sites (it's actually a direct L2 connectivity link between the sites) I am getting a repeated error:
>
> rbd::mirror::InstanceWatcher: C_NotifyInstanceRequestfinish: resending after timeout
That is just indicative that you are having issues talking to your
local cluster. Assuming you only have a single rbd-mirror daemon
running, it seems like it cannot even send a message through the OSD
to itself within 5 seconds. Perhaps your cluster is too slow to
respond?
>
> It's coming continuously, so nothing is getting replicated to the other site. Is direct L2 connectivity a concern? Does rbd-mirror expect an L3 link between the two sites?
>
> On Wed, Jul 24, 2019 at 12:42 AM Ajitha Robert <ajitharobert01(a)gmail.com> wrote:
>>
>> Thanks for your reply.
>>
>> Regarding RBD mirroring, can you please check the logs for RBD image creation? The second one [2] started syncing but made no further progress.
>>
>> 1) Log for manual RBD image creation
>>
>> http://paste.openstack.org/show/754766/
>>
>>
>> 2) Log for a 16 GB volume created from Cinder; the volume status in Cinder is available
>>
>> http://paste.openstack.org/show/754767/
>>
>>
>> 3) Log for a 100 GB volume created from Cinder; the volume status in Cinder is error
>>
>> http://paste.openstack.org/show/754769/
>>
>>
>> On Tue, Jul 23, 2019 at 1:13 AM Jason Dillaman <jdillama(a)redhat.com> wrote:
>>>
>>> On Mon, Jul 22, 2019 at 3:26 PM Ajitha Robert <ajitharobert01(a)gmail.com> wrote:
>>> >
>>> > Thanks for your reply
>>> >
>>> > 1) In scenario 1, I didn't attempt to delete the Cinder volume. Please find the Cinder volume log.
>>> > http://paste.openstack.org/show/754731/
>>>
>>> It might be better to ping Cinder folks about that one. It doesn't
>>> really make sense to me from a quick glance.
>>>
>>> >
>>> > 2) In scenario 2 I will try with debug. But I have a test setup with one OSD in the primary and one OSD in the secondary, and the distance between the two Ceph clusters is 300 km.
>>> >
>>> >
>>> > 3) I have disabled Ceph authentication entirely for everything, including the rbd-mirror daemon. Also, I deployed the Ceph cluster using ceph-ansible. Could either of these cause any issue for the entire setup?
>>>
>>> Not to my knowledge.
>>>
>>> > 4) The image which was in syncing mode showed read-only status on the secondary.
>>>
>>> Mirrored images are either primary or non-primary. It is the expected
>>> (documented) behaviour that non-primary images are read-only.
>>>
>>> > 5) In a presentation I found that the journaling feature causes poor I/O performance and that we can skip the journaling process for mirroring. Is that possible, by enabling mirroring on the entire Cinder pool (pool mode) instead of per-image mirror mode, so that we can skip the replication_enabled=true spec in the Cinder volume type?
>>>
>>> Journaling is required for RBD mirroring.
>>>
>>> >
>>> >
>>> >
>>> > On Mon, Jul 22, 2019 at 11:13 PM Jason Dillaman <jdillama(a)redhat.com> wrote:
>>> >>
>>> >> On Mon, Jul 22, 2019 at 10:49 AM Ajitha Robert <ajitharobert01(a)gmail.com> wrote:
>>> >> >
>>> >> > There is no error in the rbd-mirror log except a connection timeout that appeared once.
>>> >> > Scenario 1:
>>> >> > When I create a bootable volume of 100 GB from a Glance image, the image gets downloaded and the Cinder volume log throws "volume is busy deleting volume that has snapshot". The image was enabled with exclusive-lock, journaling, layering, object-map, fast-diff and deep-flatten.
>>> >> > The Cinder volume ends up in an error state, and the RBD image is created on the primary but not on the secondary.
>>> >>
>>> >> Any chance you know where in Cinder that error is being thrown? A
>>> >> quick grep of the code doesn't reveal that error message. If the image
>>> >> is being synced to the secondary site when you attempt to delete it,
>>> >> it's possible you could hit this issue. Providing debug log messages
>>> >> from librbd on the Cinder controller might also be helpful for this.
>>> >>
>>> >> > Scenario 2:
>>> >> > But when I create a 50 GB volume with another Glance image, the volume gets created, and in the backend I can see the RBD images on both the primary and the secondary.
>>> >> >
>>> >> > From "rbd mirror image status" I found that the secondary cluster starts copying, but syncing got stuck at around 14%. It stays at 14% with no progress at all. Should I set any parameters for this, like a timeout?
>>> >> >
>>> >> > I manually ran "rbd --cluster primary object-map check <object-name>". No results came back for the objects and the command just hung. That's why I got worried about the "failed to map object key" log. I couldn't even rebuild the object map.
>>> >>
>>> >> It sounds like one or more of your primary OSDs are not reachable from
>>> >> the secondary site. If you run w/ "debug rbd-mirror = 20" and "debug
>>> >> rbd = 20", you should be able to see the last object it attempted to
>>> >> copy. From that, you could use "ceph osd map" to figure out the
>>> >> primary OSD for that object.
>>> >>
>>> >> > The image which was in syncing mode showed read-only status on the secondary.
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Mon, 22 Jul 2019, 17:36 Jason Dillaman, <jdillama(a)redhat.com> wrote:
>>> >> >>
>>> >> >> On Sun, Jul 21, 2019 at 8:25 PM Ajitha Robert <ajitharobert01(a)gmail.com> wrote:
>>> >> >> >
>>> >> >> > I have an RBD mirroring setup with primary and secondary clusters as peers, and a pool with image-mode mirroring enabled. In this pool I created an RBD image with journaling enabled.
>>> >> >> >
>>> >> >> > But whenever I enable mirroring on the image, I get errors in osd.log. I couldn't trace them down; please guide me to solve this error.
>>> >> >> >
>>> >> >> > I think it initially worked fine, but after a Ceph process restart these errors started appearing.
>>> >> >> >
>>> >> >> >
>>> >> >> > Secondary.osd.0.log
>>> >> >> >
>>> >> >> > 2019-07-22 05:36:17.371771 7ffbaa0e9700 0 <cls> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:61: failed to get omap key: client_a5c76849-ba16-480a-a96b-ebfdb7f6ac65
>>> >> >> > 2019-07-22 05:36:17.388552 7ffbaa0e9700 0 <cls> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set earlier than minimum: 0 < 1
>>> >> >> > 2019-07-22 05:36:17.413102 7ffbaa0e9700 0 <cls> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:61: failed to get omap key: order
>>> >> >> > 2019-07-22 05:36:23.341490 7ffbab8ec700 0 <cls> /build/ceph-12.2.12/src/cls/rbd/cls_rbd.cc:4125: error retrieving image id for global id '9e36b9f8-238e-4a54-a055-19b19447855e': (2) No such file or directory
>>> >> >> >
>>> >> >> >
>>> >> >> > primary-osd.0.log
>>> >> >> >
>>> >> >> > 2019-07-22 05:16:49.287769 7fae12db1700 0 log_channel(cluster) log [DBG] : 1.b deep-scrub ok
>>> >> >> > 2019-07-22 05:16:54.078698 7fae125b0700 0 log_channel(cluster) log [DBG] : 1.1b scrub starts
>>> >> >> > 2019-07-22 05:16:54.293839 7fae125b0700 0 log_channel(cluster) log [DBG] : 1.1b scrub ok
>>> >> >> > 2019-07-22 05:17:04.055277 7fae12db1700 0 <cls> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set earlier than minimum: 0 < 1
>>> >> >> >
>>> >> >> > 2019-07-22 05:33:21.540986 7fae135b2700 0 <cls> /build/ceph-12.2.12/src/cls/journal/cls_journal.cc:472: active object set earlier than minimum: 0 < 1
>>> >> >> > 2019-07-22 05:35:27.447820 7fae12db1700 0 <cls> /build/ceph-12.2.12/src/cls/rbd/cls_rbd.cc:4125: error retrieving image id for global id '8a61f694-f650-4ba1-b768-c5e7629ad2e0': (2) No such file or directory
>>> >> >>
>>> >> >> Those don't look like errors, but the log level should probably be
>>> >> >> reduced for those OSD cls methods. If you look at your rbd-mirror
>>> >> >> daemon log, do you see any errors? That would be the important place
>>> >> >> to look.
>>> >> >>
>>> >> >> >
>>> >> >> > --
>>> >> >> > Regards,
>>> >> >> > Ajitha R
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Jason
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Jason
>>> >
>>> >
>>> >
>>> > --
>>> > Regards,
>>> > Ajitha R
>>>
>>>
>>>
>>> --
>>> Jason
>>
>>
>>
>> --
>> Regards,
>> Ajitha R
>
>
>
> --
> Regards,
> Ajitha R
--
Jason
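For anyone following the thread, a sketch of the debugging steps discussed
above (cluster, pool and object names are placeholders):

    # in the ceph.conf used by the rbd-mirror daemon (or via injectargs):
    #   debug rbd-mirror = 20
    #   debug rbd = 20
    # then map the last object the mirror attempted to copy back to its primary OSD:
    ceph --cluster primary osd map mypool rbd_data.1234abcdef.0000000000000000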
Hi everyone,
apologies in advance; this will be long. It's also been through a bunch
of edits and rewrites, so I don't know how well I'm expressing myself at
this stage — please holler if anything is unclear and I'll be happy to
try to clarify.
I am currently in the process of investigating the behavior of OpenStack
Nova instances when being snapshotted and suspended, in conjunction with
qemu-guest-agent (qemu-ga). I realize that RBD-backed Nova/libvirt
instances are expected to behave differently from file-backed ones, but
I think I might have reason to believe that the RBD-backed ones are
indeed behaving incorrectly, and I'd like to verify that.
So first up, for comparison, let's recap how a Nova/libvirt/KVM instance
behaves when it is *not* backed by RBD (such as, it's using a qcow2 file
that is on a Nova compute node in /var/lib/nova/instances), is booted
from an image with the hw_qemu_guest_agent=yes meta property set, and
runs qemu-guest-agent within the guest:
- User issues "nova suspend" or "openstack server suspend".
- If nova-compute on the compute node decides that the instance has
qemu-guest-agent running (which is the case if it's qemu or kvm, and its
image has hw_qemu_guest_agent=yes), it sends a guest-sync command over
the guest agent VirtIO serial port. This command registers in the
qemu-ga log file in the guest.
- nova-compute on the compute node sends a libvirt managed-save command.
- Nova reports the instance as suspended.
- User issues "nova resume" or "openstack server resume".
- nova-compute on the compute node sends a libvirt start command.
- Again, if nova-compute on the compute node knows that the instance has
qemu-guest-agent running, it sends another command over the serial port,
namely guest-set-time. This, too, registers in the guest's qemu-ga log.
- Nova reports the instance as active (running normally) again.
Now, when I instead use a Nova environment that is fully RBD-backed, I
see exactly the same behavior as described above. So I know that in
principle, nova-compute/qemu-ga communication works in both an
RBD-backed and a non-RBD-backed environment.
However, things appear to get very different when it comes to snapshots.
Again, starting with a file-backed environment:
- User issues "nova image-create" or "openstack server image create".
- If nova-compute on the compute node decides that the instance can be
quiesced (which is the case if it's qemu or kvm, and its image has
hw_qemu_guest_agent=yes), then it sends a "guest-fsfreeze-freeze"
command over the guest agent VirtIO serial port.
- The guest agent inside the guest loops over all mounted filesystems,
and issues the FIFREEZE ioctl (which maps to the kernel freeze_super()
function). This can be seen in the qemu-ga log file in the guest, and it
is also verifiable by using ftrace on the qemu-ga PID and checking for
the freeze_super() function call.
- nova-compute then takes a live snapshot of the instance.
- Once complete, the guest gets a "guest-fsfreeze-thaw" command, and
again I can see this in the qemu-ga log, and with ftrace.
And now with RBD:
- User issues "nova image-create" or "openstack server image create".
- The guest-fsfreeze-freeze agent command never happens.
Now I can see the info message from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
in my nova-compute log, which confirms that we're attempting a live
snapshot.
I also do *not* see the warning from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…,
so it looks like the direct_snapshot() call from
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
succeeds. This is defined in
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
and it uses RBD functionality only. Importantly, it never interacts with
qemu-ga, so it appears to not worry at all about freezing the filesystem.
(Which does seem to contradict
https://docs.ceph.com/docs/master/rbd/rbd-openstack/?highlight=uuid#image-p…,
by the way, so that may be a documentation bug.)
Now here's another interesting part. Were the direct snapshot to fail,
if I read
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
and
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…
correctly, the fallback behavior would be as follows: The domain would
next be "suspended" (note, again this is Nova suspend, which maps to
libvirt managed-save per
https://opendev.org/openstack/nova/src/commit/7bf75976016aae5d458eca9f6ddac…),
then snapshotted using a libvirt call and resumed again post-snapshot.
In which case there would be a guest-sync call on suspend.
And it's this part that has me a bit worried. If an RBD backed instance,
on a successful snapshot, never freezes its filesystem *and* never does
any kind of sync, either, doesn't that mean that such an instance can't
be made to produce consistent snapshots? (Particularly in the case of
write-back caching, which is recommended and normally safe for
RBD/virtio devices.) Or is there some magic within the Qemu RBD storage
driver that I am unaware of, that makes any such contortions unnecessary?
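For what it's worth, one manual way to get a flushed, consistent snapshot
today would be to drive the agent directly around the Nova snapshot (a
sketch; the libvirt domain and server names are placeholders, and keeping
the filesystem frozen for the whole image upload may not be practical):

    virsh qemu-agent-command instance-0000abcd '{"execute": "guest-fsfreeze-freeze"}'
    openstack server image create --name db-snap myinstance
    virsh qemu-agent-command instance-0000abcd '{"execute": "guest-fsfreeze-thaw"}'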
Thanks in advance for your insights!
Cheers,
Florian
It's certainly possible. It makes things a little more complex though. Some
questions you may want to consider during the design..
- Is the customer aware this won't preserve any data on the LUNs they are
hoping to reuse?
- Is the plan to eventually replace the SAN with JBOD, in the same systems?
If so, you may want to make your LUNs look like the eventual drive size and
count.
- Is the plan to use a few systems with SAN and add standalone systems
later? Then you need to calculate expected speeds and divide between
failure domains.
- Is the plan to use a couple of hosts with SAN to save money, and have the
rest be traditional Ceph storage? If so consider putting the SAN hosts all
in one failure domain.
- Depending on the SAN you may consider aligning your failure domains to
different arrays, switches, or even array directors.
- Remember to take the host's network speed into consideration when
calculating how many LUNs to put on each host (see the rough arithmetic below).
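As a rough worked example (all numbers are illustrative): a host with
2 x 10 GbE can move on the order of 2 GB/s, so if each LUN sustains roughly
250 MB/s, about 8 LUNs will saturate the host's network before the SAN does;
more LUNs per host past that point mostly concentrates failure-domain risk
rather than adding throughput.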
Hope that helps.
-Brett
On Thu, Aug 22, 2019, 4:14 AM Mohsen Mottaghi <mohsenmottaghi(a)outlook.com>
wrote:
> Hi
>
>
> Yesterday one of our customers came to us with a strange request. He asked
> us to use SAN as the Ceph storage space, adding the SAN arrays he currently
> has to the cluster to reduce further disk purchase costs.
>
>
> Does anybody know whether we can do this or not? And if it is possible, how
> should we start to architect this strange Ceph? Is it a good idea or not?
>
>
>
> Thanks for your help.
>
> Mohsen Mottaghi