Reviving this old thread.
I still think this is something we should consider as users still
experience problems:
* Impossible to 'pin' to a version: a user installs 14.2.0 and 4 months
later, when they add other nodes, the version has moved to 14.2.2 (see the
example below)
* Impossible to use a version other than the latest (e.g. if someone
doesn't need the release from Monday, but wants the one from 6 months
ago), similar to the above
* When a release is underway, the repository breaks because syncing
packages takes hours. The operation is not atomic.
* It is not currently possible to "remove" a bad release; in the past this
has meant cutting a new release as soon as possible, which can take days
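(As an example of what I mean by pinning: on the RPM side something like
'yum install ceph-14.2.0' works today, but the apt equivalent, e.g.
'apt install ceph=14.2.0-1bionic', does not, because of the way our
reprepro-built DEB repositories are constructed. The exact Debian version
string above is only an illustration.)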
The latest issue (my fault!) was cutting a release and getting the packages
out without communicating with the release manager, which meant users
noticed the new version *as soon as it was up*, instead of a process that
doesn't touch the 'latest' url until the announcement goes out.
If you have been affected by any of these issues (or others I didn't
come up with), please let us know in this thread so that we can find
some common ground and try to improve the process.
Thanks!
On Tue, Jul 24, 2018 at 10:38 AM Alfredo Deza <adeza(a)redhat.com> wrote:
>
> Hi all,
>
> After the 12.2.6 release went out, we've been thinking about better ways
> to remove a version from our repositories to prevent users from
> upgrading/installing a known bad release.
>
> The way our repos are structured today means every single version of
> the release is included in the repository. That is, for Luminous,
> every 12.x.x version of the binaries is in the same repo. This is true
> for both RPM and DEB repositories.
>
> However, the DEB repos don't allow pinning to a given version because
> our tooling (namely reprepro) doesn't construct the repositories in a
> way that this is allowed. For RPM repos this is fine, and version
> pinning works.
>
> To remove a bad version we have two proposals (and would like to hear
> ideas on other possibilities), one that would involve symlinks and the
> other one which purges the known bad version from our repos.
>
> *Symlinking*
> When releasing we would have a "previous" and "latest" symlink that
> would get updated as versions move forward. It would require
> separation of versions at the URL level (all versions would no longer
> be available in one repo).
>
> The URL structure would then look like:
>
> debian/luminous/12.2.3/
> debian/luminous/previous/ (points to 12.2.5)
> debian/luminous/latest/ (points to 12.2.7)
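>
> (A client pinned to a specific point release would then presumably point
> apt at the versioned path directly, e.g. a sources.list line like
> "deb https://download.ceph.com/debian/luminous/12.2.3/ xenial main",
> with the codename here only as an illustration.)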
>
> Caveats: the url structure would change from debian-luminous/ (in order
> to prevent breakage for existing users), and the versions would be split.
> For RPMs it would mean a regression for anyone used to pinning: for
> example, pinning to 12.2.2 wouldn't be possible using the same url.
>
> Pros: Faster release times, less need to move packages around, and
> easier to remove a bad version
>
>
> *Single version removal*
> Our tooling would need to go and remove the known bad version from the
> repository, which would require rebuilding the repository so that the
> metadata is updated to reflect the change in the binaries.
>
> Caveats: a time-intensive process, almost like cutting a new release,
> which takes about a day (and sometimes longer). It is also error prone,
> since the process wouldn't be a routine one (a one-off, only run when a
> version needs to be removed).
>
> Pros: all urls for download.ceph.com and its structure are kept the same.
Hi,
I need your advice about the following setup.
Currently, we have a Ceph Nautilus cluster used by OpenStack Cinder, with
a single 10 Gbps NIC on the OSD hosts.
We will upgrade the cluster by adding 7 new hosts dedicated to
Nova/Glance and we would like to add a cluster network to isolate
replication and recovery traffic.
For now, it's not possible to add a second NIC and FC, so we are thinking
about enabling DELL NPAR [1], which allows splitting a single physical
NIC into 2 logical NICs (1 for the public network and 1 for the cluster
network). We can set max and min bandwidth and enable dynamic bandwidth
balancing for NPAR so that Ceph gets the appropriate bandwidth when it
needs it (the default allocation is 66% for the cluster network and 34%
for the public network).
Any experience with this kind of configuration? Do you see any
disadvantages to doing this?
And one more question: if we put this in production, is adding the
cluster network value to ceph.conf and restarting each OSD enough for
Ceph?
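(What I have in mind is just something like the following in ceph.conf,
with the subnets below only as placeholders:)
[global]
    public_network  = 10.0.0.0/24
    cluster_network = 192.168.0.0/24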
Best,
Adrien
[1]
https://www.dell.com/support/article/fr/fr/frbsdt1/how12596/how-npar-works?…
Hi all,
I'm happy to announce that next Oct 16th we will have Ceph Day Argentina
in Buenos Aires. The event will be held at the Museo de Informatica de
Argentina, so apart from hearing about the latest features from core
developers, real use cases from our users, and usage experiences from
customers and partners, you will be able to enjoy and contribute to this
fantastic museum, which holds a great collection of vintage hardware.
If you are a local and/or you are in the area, we really hope you can join
us!
The CFP is open at
https://forms.zohopublic.com/thingee/form/CephDayArgentina2019/formperma/yf…
The Ceph Day Buenos Aires site is available at
https://ceph.io/cephdays/ceph-day-argentina-2019/
Cheers,
Victoria
Hi everyone,
We are running Nautilus 14.2.2 with 6 nodes and a total of 44 OSDs, all of
which are 2 TB spinning disks.
# ceph osd count-metadata osd_objectstore
"bluestore": 44
# ceph osd pool get one size
size: 3
# ceph df
RAW STORAGE:
    CLASS     SIZE       AVAIL      USED       RAW USED     %RAW USED
    hdd       80 TiB     33 TiB     47 TiB     47 TiB           58.26
    TOTAL     80 TiB     33 TiB     47 TiB     47 TiB           58.26
POOLS:
    POOL      ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    one        2     15 TiB        4.06M     47 TiB      68.48       7.1 TiB
    bench      5     250 MiB          67     250 MiB         0        21 TiB
Why are the pool stats showing incorrect values for %USED and MAX AVAIL?
They should be much bigger.
The first 24 OSDs were created on the Jewel release and the
osd_objectstore was 'filestore'.
While we were on the Mimic release, we added 20 more 'bluestore' OSDs, and
the first 24 were destroyed and recreated as 'bluestore'.
After the upgrade from the Mimic release, all the OSDs were updated with
ceph-bluestore-tool repair.
The incorrect values appeared after the upgrade from 14.2.1 to 14.2.2.
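If it helps, I can also post the per-OSD view, e.g. the output of the
following (just the commands I assume are relevant here, output omitted):
# ceph osd df tree
# ceph df detail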
Any help will be appreciated :)
BR,
NAlexandrov
Hi,
I'm facing several issues with my Ceph cluster (2x MDS, 6x OSD nodes).
Here I would like to focus on the issue with PGs in backfill_toofull.
I assume this is related to the fact that the data distribution on my
OSDs is not balanced.
This is the current ceph status:
root@ld3955:~# ceph -s
  cluster:
    id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_ERR
            1 MDSs report slow metadata IOs
            78 nearfull osd(s)
            1 pool(s) nearfull
            Reduced data availability: 2 pgs inactive, 2 pgs peering
            Degraded data redundancy: 304136/153251211 objects degraded (0.198%), 57 pgs degraded, 57 pgs undersized
            Degraded data redundancy (low space): 265 pgs backfill_toofull
            3 pools have too many placement groups
            74 slow requests are blocked > 32 sec
            80 stuck requests are blocked > 4096 sec

  services:
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 98m)
    mgr: ld5505(active, since 3d), standbys: ld5506, ld5507
    mds: pve_cephfs:1 {0=ld3976=up:active} 1 up:standby
    osd: 368 osds: 368 up, 367 in; 302 remapped pgs

  data:
    pools:   5 pools, 8868 pgs
    objects: 51.08M objects, 195 TiB
    usage:   590 TiB used, 563 TiB / 1.1 PiB avail
    pgs:     0.023% pgs not active
             304136/153251211 objects degraded (0.198%)
             1672190/153251211 objects misplaced (1.091%)
             8564 active+clean
             196  active+remapped+backfill_toofull
             57   active+undersized+degraded+remapped+backfill_toofull
             35   active+remapped+backfill_wait
             12   active+remapped+backfill_wait+backfill_toofull
             2    active+remapped+backfilling
             2    peering

  io:
    recovery: 18 MiB/s, 4 objects/s
Currently I'm using 6 OSD nodes.
Node A
48x 1.6TB HDD
Node B
48x 1.6TB HDD
Node C
48x 1.6TB HDD
Node D
48x 1.6TB HDD
Node E
48x 7.2TB HDD
Node F
48x 7.2TB HDD
Question:
Is it advisable to distribute the drives equally over all nodes?
If yes, how should this be executed without disrupting Ceph?
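For reference, before physically moving any drives, these are the kinds of
commands I am looking at to check and adjust the current distribution
(just a sketch; the reweight threshold below is only an example):
# ceph osd df tree
# ceph balancer status
# ceph osd test-reweight-by-utilization 110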
Regards
Thomas
Hello,
I'm running Ceph 14.2.3 on six hosts with four OSDs each. I recently
upgraded this from four hosts.
The cluster is running fine, but I get this in my logs:
Sep 11 11:02:41 ceph1 ceph-mon[1333]: 2019-09-11 11:02:41.953 7f26023a6700
-1 verify_upmap number of buckets 5 exceeds desired 4
Sep 11 11:02:41 ceph1 ceph-mon[1333]: 2019-09-11 11:02:41.953 7f26023a6700
-1 verify_upmap number of buckets 5 exceeds desired 4
Sep 11 11:02:41 ceph1 ceph-mon[1333]: 2019-09-11 11:02:41.953 7f26023a6700
-1 verify_upmap number of buckets 5 exceeds desired 4
It looks like the balancer is not doing any work.
Here are some infos about the cluster:
ceph1 ~ # ceph osd crush rule ls
replicated_rule
cephfs_ec
ceph1 ~ # ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
ceph1 ~ # ceph osd crush rule dump cephfs_ec
{
    "rule_id": 1,
    "rule_name": "cephfs_ec",
    "ruleset": 1,
    "type": 3,
    "min_size": 8,
    "max_size": 8,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "choose_indep",
            "num": 4,
            "type": "host"
        },
        {
            "op": "choose_indep",
            "num": 2,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}
ceph1 ~ # ceph osd erasure-code-profile ls
default
isa_62
ceph1 ~ # ceph osd erasure-code-profile get default
k=2
m=1
plugin=jerasure
technique=reed_sol_van
ceph1 ~ # ceph osd erasure-code-profile get isa_62
crush-device-class=
crush-failure-domain=osd
crush-root=default
k=6
m=2
plugin=isa
technique=reed_sol_van
The idea with four hosts was that the EC profile should take two OSDs on
each host for the eight buckets.
Now with six hosts I guess two hosts will have two buckets on two OSDs and
four hosts will each have one bucket for a piece of data.
Any idea how to resolve this?
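In case it matters, I assume the mechanics of changing the crush rule
would go roughly like this (just a sketch of the workflow; I'm not sure
yet what the rule should actually look like for six hosts):
ceph1 ~ # ceph osd getcrushmap -o crush.bin
ceph1 ~ # crushtool -d crush.bin -o crush.txt
(edit the cephfs_ec rule in crush.txt)
ceph1 ~ # crushtool -c crush.txt -o crush.new
ceph1 ~ # ceph osd setcrushmap -i crush.new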
Regards
Eric
Hi everyone,
I'm configuring an iSCSI gateway on Ceph Mimic (13.2.6) using the Ceph
manual:
https://docs.ceph.com/docs/mimic/rbd/iscsi-target-cli/
But I'm stuck on this problem. The manual says:
"Set the client’s CHAP username to myiscsiusername and password to
myiscsipassword:
> /iscsi-target...at:rh7-client> auth chap=myiscsiusername/myiscsipassword"
But I receive this response:
/iscsi-target...at:rh7-client> auth chap=myiscsitest/myiscsitestpasswd
Unexpected keyword parameter 'chap'.
The available options are:
/iscsi-target...at:rh7-client> auth ?
To set authentication, specify username=<user> password=<password>
[mutual_username]=<user> [mutual_password]=<password>
But if I configure it as it asks:
auth username=myiscsitest password=myiscsitestpasswd
Failed to update the client's auth: Invalid password
I tried with a highly complex password, but the problem persists.
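For reference, this is the kind of command I am trying (the password below
is only an example; I suspect there are length/character restrictions on
CHAP passwords, perhaps something like 12-16 alphanumeric characters, but
I could not find that documented):
/iscsi-target...at:rh7-client> auth username=myiscsiuser password=myiscsipass12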
My questions:
- What is the correct way to configure authentication?
- How can I contribute an update to the documentation? A bug report was
opened for the broken ceph-iscsi-gw installation information, but it was
closed without the documentation being updated:
https://github.com/ceph/ceph-ansible/issues/2707
Regards
Gesiel Bernardeds
Hi All,
I have a question about "orphaned" objects in default.rgw.buckets.data pool.
A few days ago I ran "radosgw-admin orphans find ..."
[dc-1 root@mon-1 tmp]$ radosgw-admin orphans list-jobs
[
"orphans-find-1"
]
Today I checked the result. I listed the orphaned objects with this command:
$# for i in `rados -p default.rgw.log ls |grep
orphan.scan.orphans-find-1.rados`; do rados -p default.rgw.log listomapkeys
$i; done > orphaned_objects.txt
There are a lot of xxx.__shadow_.yyy objects.
Is it possible to check whether these __shadow_ objects are orphaned (and
can be removed) or belong to a valid object?
How can I check if a shadow object is still in use, and which object it
belongs to?
Some of them are very old.
For example:
default.rgw.buckets.data/23a033d8-1146-2345-9f94-81383220c334.3130618.2__shadow_.-GOuvdWROljudUgTgq6u6wRR-lHoxU0_1
mtime 2017-08-16 04:01:49.000000, size 4194304
default.rgw.buckets.data/23a033d8-1146-2345-9f94-81383220c334.3130618.2__shadow_.-GOuvdWROljudUgTgq6u6wRR-lHoxU0_2
mtime 2017-08-16 04:01:49.000000, size 4194304
default.rgw.buckets.data/23a033d8-1146-2345-9f94-81383220c334.3130618.2__shadow_.-GOuvdWROljudUgTgq6u6wRR-lHoxU0_4
mtime 2017-08-16 04:01:49.000000, size 4194304
default.rgw.buckets.data/23a033d8-1146-2345-9f94-81383220c334.3130618.2__shadow_.-GOuvdWROljudUgTgq6u6wRR-lHoxU0_5
mtime 2017-08-16 04:01:49.000000, size 4194304
default.rgw.buckets.data/23a033d8-1146-2345-9f94-81383220c334.3130618.2__shadow_.-GOuvdWROljudUgTgq6u6wRR-lHoxU0_6
mtime 2017-08-16 04:01:49.000000, size 4194304
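What I was thinking of doing (not sure it is the right approach) is to
stat a known S3 object and compare its manifest prefix with the shadow
object names, e.g. something like the following, with the bucket and key
made up:
$# radosgw-admin object stat --bucket=mybucket --object=some/key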
Best regards,
PO
Hi,
I am using Ceph Mimic in a small test setup with the configuration below.
OS: Ubuntu 18.04
1 node running (mon, mds, mgr) + 4-core CPU, 4 GB RAM and 1 Gb LAN
3 nodes each having 2 OSDs, disks are 2 TB + 2-core CPU, 4 GB RAM and 1 Gb
LAN
1 node acting as CephFS client + 2-core CPU, 4 GB RAM and 1 Gb LAN
I configured cephfs_metadata_pool (3 replicas) and cephfs_data_pool as
erasure 2+1.
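For reference, the pools were created roughly like this (commands from
memory, so treat them as a sketch; PG counts as mentioned further below):
ceph osd pool create cephfs_metadata 8 8 replicated
ceph osd erasure-code-profile set ec21 k=2 m=1 crush-failure-domain=host
ceph osd pool create cephfs_data 16 16 erasure ec21
ceph osd pool set cephfs_data allow_ec_overwrites true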
When running a script that creates many folders, Ceph started throwing
late IO errors due to the high metadata workload.
Once the folder creation completed, PGs became degraded. I am waiting for
the PGs to finish recovery, but my OSDs keep crashing due to OOM and
restarting after some time.
Now my question is: I can wait for recovery to complete, but how do I stop
the OOM and OSD crashes? Basically I want to know how to control memory
usage during recovery and make it stable.
I have also set very low PG counts: 8 for the metadata pool and 16 for the
data pool.
I have already set "mon osd memory target" to 1 GB and I have raised
max-backfill from 1 to 8.
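For clarity, these are the settings I mean, in the [osd] section of
ceph.conf (I believe these are the actual option names; values as I set
them):
[osd]
    osd_memory_target = 1073741824
    osd_max_backfills = 8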
Attached is the kern.log message from one of the nodes, and a snippet of
the error message is included in this mail.
---------error msg snippet ----------
-bash: fork: Cannot allocate memory
Sep 18 19:01:57 test-node1 kernel: [341246.765644] msgr-worker-0 invoked
oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null),
order=0, oom_score_adj=0
Sep 18 19:02:00 test-node1 kernel: [341246.765645] msgr-worker-0 cpuset=/
mems_allowed=0
Sep 18 19:02:00 test-node1 kernel: [341246.765650] CPU: 1 PID: 1737 Comm:
msgr-worker-0 Not tainted 4.15.0-45-generic #48-Ubuntu
Sep 18 19:02:02 test-node1 kernel: [341246.765833] Out of memory: Kill
process 1727 (ceph-osd) score 489 or sacrifice child
Sep 18 19:02:03 test-node1 kernel: [341246.765919] Killed process 1727
(ceph-osd) total-vm:3483844kB, anon-rss:1992708kB, file-rss:0kB,
shmem-rss:0kB
Sep 18 19:02:03 test-node1 kernel: [341246.899395] oom_reaper: reaped
process 1727 (ceph-osd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Sep 18 22:09:57 test-node1 kernel: [352529.433155] perf: interrupt took too
long (4965 > 4938), lowering kernel.perf_event_max_sample_rate to 40250
regards
Amudhan