Hello
I have ~300TB of data in default.rgw.buckets.data k2m2 pool and I would
like to move it to a new k5m2 pool.
I found instructions using cache tiering[1], but they come with a vague
scary warning, and it looks like EC-EC may not even be possible [2] (is
it still the case?).
Can anybody recommend a safe procedure to copy an EC pool's data to
another pool with a more efficient erasure coding? Perhaps there is a
tool out there that could do it?
A few days of downtime would be tolerable, if it will simplify things.
Also, I have enough free space to temporarily store the k2m2 data in a
replicated pool (if EC-EC tiering is not possible, but EC-replicated and
replicated-EC tiering is possible).
Is there a tool or some efficient way to verify that the content of two
pools is the same?
Thanks,
Vlad
[1] https://ceph.io/geen-categorie/ceph-pool-migration/
[2]
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016109.ht…
Hi all,
A while back, I indicated we had an issue with our cluster filling up too
fast. After checking everything, we've concluded this was because we had a
lot of small files and the allocation size on the bluestore was too high
(64kb).
We are now recreating the OSD's (2 disk at the same time) but, this will
take a very long time as we're dealing with 130 OSDs.
The current process we're following is removing 2 osd's and recreating them.
We're using erasure coding (6 + 3).
Has anyone some advice on how we can move forward with this? We've already
increased some parameters to speed up recovery, but even then, it would
still cost us too much time.
If we could recreate them faster, that would be great... Or adapt the
allocation on the fly?
Any suggestions are welcome...
Thank you,
Kristof.
Hi,
we're seeing active mds failover to standbys every few weeks causing few minutes of cephfs downtime. It's not crashing, all the log says is.
2020-02-25 08:30:53.313 7f9a457ae700 1 mds.m2-1045557 Updating MDS map to version 10132 from mon.1
2020-02-25 08:30:53.313 7f9a457ae700 1 mds.m2-1045557 Map has assigned me to become a standby
Anyone got any idea why that is? It's not so nice that ceph does it by itself.
Thanks
Hey all, we're excited to be returning properly to SCaLE in
Pasadena[1] this year (March 5-8) with a Thursday Birds-of-a-Feather
session[2] and a booth in the expo hall. Please come by if you're
attending the conference or are in the area to get face time with
other area users and Ceph developers. :)
Also, I got drafted into organizing this so if you'd be willing to
help man the booth in exchange for an Expo pass, shoot me an email! I
think I've got 3 spots left.
-Greg
[1]: https://www.socallinuxexpo.org/scale/18x
[2]: https://www.socallinuxexpo.org/scale/18x/presentations/ceph-storage
On our test cluster after upgrading to 14.2.5 I'm having problems with the mons pegging a CPU core while moving data around. I'm currently converting the OSDs from FileStore to BlueStore by marking the OSDs out in multiple nodes, destroying the OSDs, and then recreating them with ceph-volume lvm batch. This seems too get the ceph-mon process into a state where it pegs a CPU core on one of the mons:
1764450 ceph 20 0 4802412 2.1g 16980 S 100.0 28.1 4:54.72 ceph-mon
Has anyone else run into this with 14.2.5 yet? I didn't see this problem while the cluster was running 14.2.4.
Thanks,
Bryan
Dear all,
we recently added two additional RGWs to our CEPH cluster (version
14.2.7). They work flawlessly, however they do not show up in 'ceph
status':
[cephmon1] /root # ceph -s | grep -A 6 services
services:
mon: 3 daemons, quorum cephmon1,cephmon2,cephmon3 (age 14h)
mgr: cephmon1(active, since 14h), standbys: cephmon2, cephmon3
mds: cephfs:1 {0=cephmon1=up:active} 2 up:standby
osd: 168 osds: 168 up (since 2w), 168 in (since 6w)
rgw: 1 daemon active (ceph-s3)
As you can see, only the first, old RGW (ceph-s3) is listed. Is there
any place where the RGWs need to get "announced"? Any idea, how to
debug this?
Thanks,
Andreas
--
| Andreas Haupt | E-Mail: andreas.haupt(a)desy.de
| DESY Zeuthen | WWW: http://www-zeuthen.desy.de/~ahaupt
| Platanenallee 6 | Phone: +49/33762/7-7359
| D-15738 Zeuthen | Fax: +49/33762/7-7216
ceph version 12.2.13 luminous (stable)
My whole ceph cluster went to kind of read only state. Ceph status showed that client reads is 0 op/s for whole cluster. There was normal amount of writes going on.
I checked health and it said:
# ceph health detail
HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg peering
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg peering
pg 26.13b is stuck peering for 25523.506788, current state peering, last acting [2,0,33]
All osds showed to be up and all monitors are good. All pools are 3/2 (size/min) and space usage ~30%.
I fixed this by restarting forst osd.2 (nothing happened) and then restarted osd.0. After that everyting went back to normal.
So what can cause "stuck peering" and how can i prevent this event from happening again?
I have a 3 node ceph cluster for my house that I have been using for a few
years now without issue. Each node is a MON, MGR, and MDS, and has 2-3 OSDs
on them. It has, however been slow. I decided to finally move the bluestore
DBs to SSDs. I did one OSD as a test case to make sure everything was going
to go OK. I deleted the OSD, then created a new OSD using the ceph-deploy
tool and pointed the DB at a LVM partition on a SSD.
Everything went OK, and recovery started. Later in the day I noticed that
my MDS daemon is damaged (PGs are still recovering).
I've tried the cephfs-journal-tool --rank=cephfs:all journal export
backup.bin command, but it gave me:
2020-02-23 17:50:03.589 7f7d8b225740 -1 Missing object 200.00c30b6d
2020-02-23 17:50:07.919 7f7d8b225740 -1 Bad entry start ptr
(0x30c2dbb92003) at 0x30c2d3a125ea
(both lines have several repeats)
and will not complete.
Looking at the log file of the mds that was active at the time shows:
2020-02-23 17:13:09.091 7fad40029700 0 mds.0.journaler.mdlog(ro)
_finish_read got error -2
2020-02-23 17:13:09.091 7fad40029700 0 mds.0.journaler.mdlog(ro)
_finish_read got error -2
2020-02-23 17:13:09.091 7fad40029700 0 mds.0.journaler.mdlog(ro)
_finish_read got error -2
2020-02-23 17:13:09.091 7fad3e826700 0 mds.0.log _replay journaler got
error -2, aborting
2020-02-23 17:13:09.091 7fad3e826700 -1 log_channel(cluster) log [ERR] :
missing journal object
One other thing that happened about the same time, I noticed I was having
memory pressure on all the nodes with only 200MB of free ram. I've tweaked
the bluestore osd_memory_target to try to help that not happen again. Even
so, I'm a bit confused how that could cause catastrophic failure, as I had
2 other MDSes on standby.
Any help would be appreciated.
Hi,
I am looking for a good way of migrating/realocating ceph cluster. It
has about 2PB net, mainly RBD, but object storage is also used. The new
location is far away about 1,500 kilometers. Of course I have to
minimize the downtime of the cluster :)
Right now I see following scenarios:
1. Build identical cluster. Freeze the source. Copy everything with
cppool/rbd mirror. Relocate servers and the power on them.
2. Run cluster mirroring over network.
3. Using cache tier.
I'm starting research about this migration, so probably some of the
solutions above are unfeasible.
Maybe somebody has experience about migration between DC?
Any new ideas? Thougths?
Every comment will be helpful
--
Regards,
Rafał Wądołowski
Hi Troy,
Looks like we hit the same today -- Sage posted some observations
here: https://tracker.ceph.com/issues/39525#note-6
Did it happen again in your cluster?
Cheers, Dan
On Tue, Aug 20, 2019 at 2:18 AM Troy Ablan <tablan(a)gmail.com> wrote:
>
> While I'm still unsure how this happened, this is what was done to solve
> this.
>
> Started OSD in foreground with debug 10, watched for the most recent
> osdmap epoch mentioned before abort(). For example, if it mentioned
> that it just tried to load 80896 and then crashed
>
> # ceph osd getmap -o osdmap.80896 80896
> # ceph-objectstore-tool --op set-osdmap --data-path
> /var/lib/ceph/osd/ceph-77/ --file osdmap.80896
>
> Then I restarted the osd in foreground/debug, and repeated for the next
> osdmap epoch until it got past the first few seconds. This process
> worked for all but two OSDs. For the ones that succeeded I'd ^C and
> then start the osd via systemd
>
> For the remaining two, it would try loading the incremental map and then
> crash. I had presence of mind to make dd images of every OSD before
> starting this process, so I reverted these two to the state before
> injecting the osdmaps.
>
> I then injected the last 15 or so epochs of the osdmap in sequential
> order before starting the osd, with success.
>
> This leads me to believe that the step-wise injection didn't work
> because the osd had more subtle corruption that it got past, but it was
> confused when it requested the next incremental delta.
>
> Thanks again to Brad/badone for the guidance!
>
> Tracker issue updated.
>
> Here's the closing IRC dialogue re this issue (UTC-0700)
>
> 2019-08-19 16:27:42 < MooingLemur> badone: I appreciate you reaching out
> yesterday, you've helped a ton, twice now :) I'm still concerned
> because I don't know how this happened. I'll feel better once
> everything's active+clean, but it's all at least active.
>
> 2019-08-19 16:30:28 < badone> MooingLemur: I had a quick discussion with
> Josh earlier and he shares my opinion this is likely somehow related to
> these drives or perhaps controllers, or at least specific to these machines
>
> 2019-08-19 16:31:04 < badone> however, there is a possibility you are
> seeing some extremely rare race that no one up to this point has seen before
>
> 2019-08-19 16:31:20 < badone> that is less likely though
>
> 2019-08-19 16:32:50 < badone> the osd read the osdmap over the wire
> successfully but wrote it out to disk in a format that it could not then
> read back in (unlikely) or...
>
> 2019-08-19 16:33:21 < badone> the map "changed" after it had been
> written to disk
>
> 2019-08-19 16:33:46 < badone> the second is considered most likely by us
> but I recognise you may not share that opinion
> _______________________________________________
> ceph-users mailing list
> ceph-users(a)lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com