hey Gal and Eric,
in today's standup, we discussed the version of our apache arrow
submodule. it's currently pinned at 6.0.1, which was tagged in nov.
2021. the centos9 builds are using the system package
libarrow-devel-9.0.0. arrow's upstream recently tagged an 11.0.0
release.
as far as i know, there still aren't any system packages for ubuntu,
so we're likely to be stuck with the submodule for quite a while. how
do you guys want to handle these updates? is it worth trying to update
before the reef release?
Details of this release are summarized here:
https://tracker.ceph.com/issues/63443#note-1
Seeking approvals/reviews for:
smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLED failures)
rados - Neha, Radek, Travis, Ernesto, Adam King
rgw - Casey
fs - Venky
orch - Adam King
rbd - Ilya
krbd - Ilya
upgrade/quincy-x (reef) - Laura PTL
powercycle - Brad
perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLED failures)
Please reply to this email with approval and/or trackers of known
issues/PRs to address them.
TIA
YuriW
Hi,
I have an image with a snapshot and some changes after snapshot.
```
$ rbd du backup/f0408e1e-06b6-437b-a2b5-70e3751d0a26
NAME PROVISIONED USED
f0408e1e-06b6-437b-a2b5-70e3751d0a26@snapshot-eb085877-7557-4620-9c01-c5587b857029 10 GiB 2.4 GiB
f0408e1e-06b6-437b-a2b5-70e3751d0a26 10 GiB 2.4 GiB
<TOTAL> 10 GiB 4.8 GiB
```
If there are no changes after the snapshot, the image line will show 0 B used.
I did export and import.
```
$ rbd export --export-format 2 backup/f0408e1e-06b6-437b-a2b5-70e3751d0a26 - | rbd import --export-format 2 - backup/test
Exporting image: 100% complete...done.
Importing image: 100% complete...done.
```
When I check the imported image, the image line shows 0 B used.
```
$ rbd du backup/test
NAME PROVISIONED USED
test@snapshot-eb085877-7557-4620-9c01-c5587b857029 10 GiB 2.4 GiB
test 10 GiB 0 B
<TOTAL> 10 GiB 2.4 GiB
```
Any clues as to how that happened? I'd expect the same du output as the source.
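For context on how I read the USED column, here is a toy model in plain Python. This is my assumption about the fast-diff accounting, not librbd code: each snapshot row counts the objects dirtied since the previous snapshot, and the head (image) row counts the objects dirtied since the latest snapshot, so a 0 B head row would mean nothing was written after the snapshot. The numbers match my second quick test below.

```python
# Toy model of `rbd du` USED accounting (an assumption, not librbd code):
# USED for a row = number of objects dirtied in that row's epoch,
# times the object size.

OBJECT_SIZE = 4 * 2**20  # default rbd object size, 4 MiB

def used_bytes(dirty_objects, object_size=OBJECT_SIZE):
    """USED for one du row, given the set of object indexes dirtied
    in that epoch (before the snapshot, or after it for the head row)."""
    return len(dirty_objects) * object_size

# Mirrors the test-src example below: object 0 written before snap-1,
# then rewritten after it.
rows = {
    "test-src@snap-1": used_bytes({0}),  # 4 MiB
    "test-src":        used_bytes({0}),  # 4 MiB
}
total = sum(rows.values())               # 8 MiB, matching <TOTAL>
```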
I tried another quick test. It works fine.
```
$ rbd create backup/test-src --size 10G
$ sudo rbd map backup/test-src
/dev/rbd0
$ echo "hello" | sudo tee /dev/rbd0
hello
$ rbd du backup/test-src
NAME PROVISIONED USED
test-src 10 GiB 4 MiB
$ rbd snap create backup/test-src@snap-1
Creating snap: 100% complete...done.
$ rbd du backup/test-src
NAME PROVISIONED USED
test-src@snap-1 10 GiB 4 MiB
test-src 10 GiB 0 B
<TOTAL> 10 GiB 4 MiB
$ echo "world" | sudo tee /dev/rbd0
world
$ rbd du backup/test-src
NAME PROVISIONED USED
test-src@snap-1 10 GiB 4 MiB
test-src 10 GiB 4 MiB
<TOTAL> 10 GiB 8 MiB
$ rbd export --export-format 2 backup/test-src - | rbd import --export-format 2 - backup/test-dst
Exporting image: 100% complete...done.
Importing image: 100% complete...done.
$ rbd du backup/test-dst
NAME PROVISIONED USED
test-dst@snap-1 10 GiB 4 MiB
test-dst 10 GiB 4 MiB
<TOTAL> 10 GiB 8 MiB
```
Thanks!
Tony
Hello,
I am trying to evaluate the performance of diff_iterate. This will affect our upcoming decision on how to achieve "copy a local rbd image to a remote ceph pool".
Our current proposal is to first send a table containing the snapshot info and all the offsets and lengths obtained from image.diff_iterate to the remote site. The remote site will then request each extent from the local site.
Based on what we have tested, we know that diff_iterate is quite fast at returning all changed extents. However, we would like to know more about how it works inside.
Assume that an image has snap1 (oldest), snap2 and snap3 (latest). If we do a diff_iterate between snap3 and snap1, what does it do underneath?
I suspect that for each snapshot there is some sort of table that records all the extents for that snapshot, and that diff_iterate basically compares the tables of two different snapshots. Am I correct?
Please share more insight on this topic to help me get a deeper understanding of diff_iterate and how a snapshot is created.
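To make the question concrete, here is how I currently picture the "compare per-snapshot extent tables" idea in plain Python. This is purely my guess about the internals, not librbd code; the table layout and function names are mine: if each snapshot recorded the extents written since the previous snapshot, a diff from snap1 to snap3 would be the union of the records of every epoch after snap1 up to snap3, coalesced into (offset, length) runs.

```python
# Hypothetical sketch of diff-by-table-comparison (my assumption about
# how diff_iterate could work, not librbd's actual implementation).

def merge_extents(extents):
    """Coalesce overlapping/adjacent (offset, length) pairs."""
    out = []
    for off, length in sorted(extents):
        if out and off <= out[-1][0] + out[-1][1]:
            prev_off, prev_len = out[-1]
            out[-1] = (prev_off, max(prev_off + prev_len, off + length) - prev_off)
        else:
            out.append((off, length))
    return out

def diff(snapshot_tables, from_snap, to_snap):
    """snapshot_tables: ordered mapping snap_name -> extents written
    since the previous snapshot. Returns extents changed between
    from_snap (older) and to_snap (newer)."""
    names = list(snapshot_tables)
    lo, hi = names.index(from_snap), names.index(to_snap)
    changed = [e for name in names[lo + 1:hi + 1] for e in snapshot_tables[name]]
    return merge_extents(changed)

tables = {
    "snap1": [(0, 4096)],               # written before snap1
    "snap2": [(0, 4096), (8192, 4096)], # written between snap1 and snap2
    "snap3": [(4096, 8192)],            # written between snap2 and snap3
}
print(diff(tables, "snap1", "snap3"))   # the three runs coalesce into one
```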
Thanks,
Jon Liu
Hello,
- gibba nodes are used inefficiently
- used a lot closer to the end of the major release cycle (or for
specific projects, e.g. mclock), but largely idle in the middle of
the release cycle
- a considerable waste of hardware resources if used only to exercise
upgrading to some (currently reef) backport releases
- proposal to release gibba nodes for teuthology (Patrick)
- for special-purpose suites where jobs require more nodes and/or
more time than usual (e.g. running for 10h with 6-8 nodes)?
- run tests for different components on the same cluster
concurrently, this is lacking today except for a few bits in
upgrade suites
- ... or even just existing suites (Casey)
- need Neha to weigh in as gibba cluster caretaker
- 18.2.1 blockers
- MDS crashing on old kernel clients
- https://github.com/ceph/ceph/pull/54677 is a temporary stop-gap
change in smoke and powercycle suites needed for reproducing
- increases the number of jobs in reef (scheduling with --subset
would defeat the purpose of the change)
- needs ack from core
- https://github.com/ceph/ceph/pull/54407 is the fix
- Venky to test with amended smoke suite, merge and hand off to
Yuri for LRC upgrade
- discussion on test suite changes would be held separately
- https://tracker.ceph.com/issues/63618 (next item)
- potential data corruption in bluestore (!!!)
- can occur under heavy fragmentation if db is co-located with the
main device or after bluefs spillover to the main device, when the
main device is configured with 64k alloc size
- affects OSDs that were upgraded without redeploying from octopus
and earlier releases
- a crash on ceph_assert(available >= allocated) during OSD startup
is an indicator
- more likely than actual data corruption? (Igor)
- Laura to check telemetry for instances of this assert
- assumed to be caused by https://github.com/ceph/ceph/pull/48854
which shipped in 18.2.0 and was backported to 16.2.14 and 17.2.6,
meaning that all release streams are vulnerable
- tracked in https://tracker.ceph.com/issues/63618 (hit on 17.2.7)
- https://tracker.ceph.com/issues/62282 was hit by Adam on 17.2.6,
Igor believes the root cause to be the same
- for now, this is a blocker for 16.2.15 and 18.2.1
- might necessitate hot fixes (also for quincy)
- regression for RHEL tests on main ("nothing provides lua-devel")
- https://tracker.ceph.com/issues/63672
- 42 pacific PRs left to be triaged
- https://github.com/ceph/ceph/pulls?q=is%3Aopen+is%3Apr+milestone%3Apacific
- move to v16.2.15 milestone or close PR and reject backport
Thanks,
Ilya
Hi,
The context is RBD on BlueStore. I did check "extent" on the wiki.
I see "extent" mentioned when talking about snapshots and export/import.
For example, when creating a snapshot, we mark extents. When
there is a write to a marked extent, we make a copy.
I also know that user data on a block device maps to objects.
How are "extent" and "object" related?
Can I say an extent is a set of contiguous objects (with default stripe settings)?
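To make the question concrete, here is how I currently picture the mapping in plain Python (my assumption, for default striping with stripe_count=1 and stripe_unit equal to the 4 MiB object size): an extent is a byte range within the image, and it maps onto one or more possibly partial objects, rather than being a set of whole objects.

```python
# Sketch (my assumption, default striping only): map a byte extent in
# the image to the RADOS objects it touches.

OBJECT_SIZE = 4 * 2**20  # default rbd object size, 4 MiB

def extent_to_objects(offset, length, object_size=OBJECT_SIZE):
    """Return (object_index, in_object_offset, in_object_length) pieces
    for a byte extent, assuming stripe_count=1, stripe_unit=object_size."""
    pieces = []
    end = offset + length
    while offset < end:
        idx = offset // object_size          # which object this byte lands in
        in_off = offset % object_size        # offset inside that object
        take = min(object_size - in_off, end - offset)
        pieces.append((idx, in_off, take))
        offset += take
    return pieces

# a 6 MiB extent starting at 3 MiB touches objects 0, 1 and 2,
# the first and last only partially
print(extent_to_objects(3 * 2**20, 6 * 2**20))
```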
Thanks!
Tony
Monitor.cc has a function, *monmap->contains()*, which allows us to check
whether a mon with a given id exists. There is no equivalent in
*MDSMonitor.cc*. Any idea how to check whether an mds with a specific id
exists?
Trying to fix: https://tracker.ceph.com/issues/42597
++Adding dev(a)ceph.io
Thanks & Regards
Arihant Jain
On Mon, 27 Nov, 2023, 7:48 am AJ_ sunny, <jains8550(a)gmail.com> wrote:
> Hi team,
>
> After making the above changes I am still seeing the issue where the machine
> continuously shuts down.
>
> In the nova-compute logs I am only getting this footprint:
>
> Logs:-
> 2023-10-16 08:48:10.971 7 WARNING nova.compute.manager
> [req-c7b731db-2b61-400e-917f-8645c9984696 f226d81a45dd46488fb2e19515848
> 316d215042914de190f5f9e1c8466bf0 default default] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event
> network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for instance with
> vm_state active and task_state None.
>
> 2023-10-21 22:42:44.589 7 INFO nova.compute.manager [-]
> [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3]
> VM Stopped (Lifecycle Event)
>
> 2023-10-21 22:42:44.683 7 INFO nova.compute.manager
> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state
> the DB power_state (1) does not match the vm_power_state from the
> hypervisor (4). Updating power_state in the DB to match the hypervisor.
>
> 2023-10-21 22:42:44.811 7 WARNING nova.compute.manager
> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself.
> Calling the stop API. Current vm_state: active, current task_state: None,
> original DB power_state: 1, current VM power_state: 4
>
> 2023-10-21 22:42:44.977 7 INFO nova.compute.manager
> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -]
> [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance is already
> powered off in the hypervisor when stop is called.
>
>
> In this architecture we are using Ceph as the backend storage for
> Nova, Glance & Cinder.
> When the machine goes down on its own and I try to start it, it goes into
> an error state, i.e. the VM console shows I/O ERROR during boot, so first
> we need to rebuild the volume from the Ceph side and then start the machine:
> rbd object-map rebuild <volume-id>
> openstack server start <server-id>
>
> So this issue is showing two faces, one from the Ceph side and another in
> the nova-compute log.
> Can someone please help me fix this issue ASAP?
>
> Thanks & Regards
> Arihant Jain
>
> On Tue, 24 Oct, 2023, 4:56 pm , <smooney(a)redhat.com> wrote:
>
>> On Tue, 2023-10-24 at 10:11 +0530, AJ_ sunny wrote:
>> > Hi team,
>> >
>> > The VM is not being shut off by the owner from inside; it automatically
>> > shuts down, i.e. a libvirt lifecycle stop event is triggering.
>> > In my nova.conf configuration I am using ram_allocation_ratio = 1.5,
>> > and previously I tried setting sync_power_state_interval = -1 in
>> > nova.conf, but I am still facing the same problem.
>> > OOM might be causing this issue.
>> > Can you please give me some idea to fix this issue if OOM is the cause?
>> the general answer is swap.
>>
>> nova should always be deployed with swap even if you do not have
>> overcommit enabled.
>> there are a few reasons for this, the first being that python allocates
>> memory differently if
>> any swap is available; even 1G is enough to have it not try to commit all
>> memory. so
>> when swap is available the nova/neutron agents will use much less resident
>> memory even
>> without using any of the swap space.
>>
>> we have some docs about this downstream
>>
>> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17…
>>
>> if you are being ultra conservative we recommend allocating (ram *
>> allocation ratio) in swap, so in your case allocate
>> 1.5 times your ram as swap. we would expect the actual usage of swap to
>> be a small fraction of that, however, so we
>> also provide a formula:
>>
>> overcommit_ratio = NovaRAMAllocationRatio - 1
>> Minimum swap size (MB) = (total_RAM * overcommit_ratio) +
>> RHEL_min_swap
>> Recommended swap size (MB) = total_RAM * (overcommit_ratio +
>> percentage_of_RAM_to_use_for_swap)
>>
>> so say your host had 64G of ram with an allocation ratio of 1.5 and a min
>> swap percentage of 25%,
>> the conservative swap recommendation would be
>>
>> (64*(0.5+0.25)) + distro_min_swap
>> (64*0.75) + 4G = 52G of recommended swap
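For anyone who wants to plug in their own numbers, the quoted formula and worked example reduce to the arithmetic below. This is just a sketch; the function and variable names are mine, and the values come from the example above (64G RAM, allocation ratio 1.5, 25% of RAM for swap, 4G distro minimum swap). Note the worked example adds the distro minimum on top of the "recommended" formula.

```python
# The quoted swap sizing rules as plain arithmetic (names are mine).

def swap_sizes_gb(total_ram_gb, ram_allocation_ratio,
                  pct_of_ram_for_swap, distro_min_swap_gb):
    overcommit_ratio = ram_allocation_ratio - 1  # e.g. 1.5 -> 0.5
    minimum = total_ram_gb * overcommit_ratio + distro_min_swap_gb
    # as in the worked example, the distro minimum is added on top
    recommended = (total_ram_gb * (overcommit_ratio + pct_of_ram_for_swap)
                   + distro_min_swap_gb)
    return minimum, recommended

minimum, recommended = swap_sizes_gb(64, 1.5, 0.25, 4)
# minimum = 64*0.5 + 4 = 36G, recommended = 64*0.75 + 4 = 52G
```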
>>
>> if you're wondering why we add a min swap percentage and distro min swap,
>> it's basically to account for the
>> qemu and host OS memory overhead as well as the memory used by the
>> nova/neutron agents and libvirt/ovs.
>>
>>
>> if you're not using memory overcommit, my general recommendation is: if you
>> have less than 64G of ram allocate 16G; if you
>> have more than 256G of ram allocate 64G and you should be fine. when you
>> do use memory overcommit you must
>> have at least enough swap to account for the qemu overhead of all
>> instances + the overcommitted memory.
>>
>>
>> the other common cause of OOM errors is if you are using numa affinity
>> and the guests don't request
>> hw:mem_page_size=<something>. without setting a mem_page_size request we
>> don't do numa-aware memory placement. the kernel
>> OOM system works
>> on a per numa node basis. numa affinity does not support memory
>> overcommit either, so that is likely not your issue;
>> i just said i would mention it to cover all bases.
>>
>> regards
>> sean
>>
>>
>>
>> >
>> >
>> > Thanks & Regards
>> > Arihant Jain
>> >
>> > On Mon, 23 Oct, 2023, 11:29 pm , <smooney(a)redhat.com> wrote:
>> >
>> > > On Mon, 2023-10-23 at 13:19 -0400, Jonathan Proulx wrote:
>> > > >
>> > > > I've seen similar log traces with overcommitted memory when the
>> > > > hypervisor runs out of physical memory and OOM killer gets the VM
>> > > > process.
>> > > >
>> > > > This is an unusual configuration (I think) but if the VM owner claims
>> > > > they didn't power down the VM internally you might look at the local
>> > > > hypervisor logs to see if the VM process crashed or was killed for
>> > > > some other reason.
>> > > yep, OOM events are one common cause of this.
>> > >
>> > > nova is basically just saying "hey, you said this vm should be active
>> > > but it's not, i'm going to update the db to reflect
>> > > reality." you can turn that off with
>> > >
>> > >
>> https://docs.openstack.org/nova/latest/configuration/config.html#workaround…
>> > > or
>> > >
>> > >
>> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.sy…
>> > > either disable the sync by setting the interval to -1
>> > > or disable handling the virt lifecycle events.
>> > >
>> > > i would recommend the sync_power_state_interval approach, but again if
>> > > vms are stopping
>> > > and you don't know why, you likely should discover why rather than just
>> > > turning off the update of the nova db to reflect
>> > > the actual state.
>> > >
>> > > >
>> > > > -Jon
>> > > >
>> > > > On Mon, Oct 23, 2023 at 02:02:26PM +0100, smooney(a)redhat.com wrote:
>> > > > :On Mon, 2023-10-23 at 17:45 +0530, AJ_ sunny wrote:
>> > > > :> Hi team,
>> > > > :>
>> > > > :> I am using openstack kolla ansible on wallaby version and
>> currently I
>> > > am
>> > > > :> facing issue with virtual machine, vm is shutoff by itself and
>> and
>> > > from log
>> > > > :> it seems libvirt lifecycle stop event triggering again and again
>> > > > :>
>> > > > :> Logs:-
>> > > > :> 2023-10-16 08:48:10.971 7 WARNING nova.compute.manager
>> > > > :> [req-c7b731db-2b61-400e-917f-8645c9984696
>> f226d81a45dd46488fb2e19515
>> > > 848
>> > > > :> 316d215042914de190f5f9e1c8466bf0 default default] [instance:
>> > > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event
>> > > > :> network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for
>> instance
>> > > with
>> > > > :> vm_state active and task_state None. 2023-10-21 22:42:44.589 7
>> INFO
>> > > > :> nova.compute.manager [-] [instance:
>> > > 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3]
>> > > > :> VM Stopped (Lifecycle Event)
>> > > > :>
>> > > > :> 2023-10-21 22:42:44.683 7 INFO nova.compute.manager
>> > > > :> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance: 4b04d3f1-
>> > > > :> fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state
>> the DB
>> > > > :> power_state (1) does not match the vm_power_state from the
>> > > hypervisor (4).
>> > > > :> Updating power_state in the DB to match the hypervisor.
>> > > > :>
>> > > > :> 2023-10-21 22:42:44.811 7 WARNING nova.compute.manager
>> > > > :> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance:
>> 4b04d3f
>> > > > :> 1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself.
>> Calling
>> > > the
>> > > > :> stop API. Current vm_state: active, current task_state : None,
>> > > original DB
>> > > > :> power_state: 1, current VM power_state: 4 2023-10-21
>> 22:42:44.977 7
>> > > INFO
>> > > > :> nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -]
>> > > > :> [instance: 4b04d3f1-1
>> > > > :>
>> > > > :> fbd-4b63-b693-a0ef316ecff3] Instance is already powered off in
>> the
>> > > > :> hypervisor when stop is called.
>> > > > :
>> > > > :that sounds like the guest os shut down the vm,
>> > > > :i.e. something in the guest ran sudo poweroff.
>> > > > :then nova detected the vm was stopped by the user and updated its db
>> > > > :to match that.
>> > > > :
>> > > > :that is the expected behavior when you have the power sync enabled.
>> > > > :it is enabled by default.
>> > > > :>
>> > > > :>
>> > > > :> Thanks & Regards
>> > > > :> Arihant Jain
>> > > > :> +91 8299719369
>> > > > :
>> > > >
>> > >
>> > >
>>
>>
Hi,
src-image is 1GB (provisioned size). I did the following 3 tests.
1. rbd export src-image - | rbd import - dst-image
2. rbd export --export-format 2 src-image - | rbd import --export-format 2 - dst-image
3. rbd export --export-format 2 src-image - | rbd import - dst-image
With #1 and #2, the dst-image size (rbd info) is the same as src-image's, which is expected.
With #3, the dst-image size (rbd info) is close to the used size (rbd du), not the provisioned
size of src-image. I'm not sure the image is actually usable when writing into it.
The question is: is #3 not supposed to be used at all?
I checked the docs and didn't see anything like "--export-format 2 has to be used for
importing an image that was exported with the --export-format 2 option".
Any comments?
Thanks!
Tony