Hi,
I am trying to set up a new cluster with cephadm using a docker backend.
The initial bootstrap did not finish cleanly; it errored out waiting for the mon IP. I used the command:
cephadm bootstrap --mon-ip 192.168.0.1
with 192.168.0.1 being the IP address of this first host.
I tried the command again, but it failed because the new Ceph daemons were actually already running, so it could not bind to the ports.
After a bit of searching I was able to use "sudo cephadm shell --" commands to change the username and password for the dashboard and log in to it.
I then used cephadm to add a new host with "sudo cephadm shell -- ceph orch host add host2".
Now, both in the dashboard inventory and in "ceph orch device ls", only devices on host2 are listed, not those on host1.
In the Cluster/Hosts section of the dashboard host1 has its root volume drive listed in devices, and host2 has the root volume drive and drive for the OSD listed.
I successfully added an OSD on a drive on host2; trying the same command adjusted for host1, I get the following in the log:
Dec 23 08:55:47 localhost systemd[1]: var-lib-docker-overlay2-91e9dffa86c333353dd6b445021c852d7ce8da6237d0d4d95909d68ef3d4fe23\x2dinit-merged.mount: Succeeded.
Dec 23 08:55:47 localhost systemd[24638]: var-lib-docker-overlay2-91e9dffa86c333353dd6b445021c852d7ce8da6237d0d4d95909d68ef3d4fe23\x2dinit-merged.mount: Succeeded.
Dec 23 08:55:47 localhost containerd[1470]: time="2020-12-23T08:55:47.369773808Z" level=info msg="shim containerd-shim started" address=/containerd-shim/80f876072532ebebdfef341a5c793654e27766f2d1708991a6f25599b24b6557.sock debug=false pid=28597
Dec 23 08:55:47 localhost bash[8745]: debug 2020-12-23T08:55:47.517+0000 ffff73d7a200 1 mon.host1(a)0(leader).osd e12 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 71303168 kv_alloc: 876609536
Dec 23 08:55:47 localhost containerd[1470]: time="2020-12-23T08:55:47.621748606Z" level=info msg="shim reaped" id=69a786e4a61605c1e6eca5a6e0e5ed0900635a214b0f1c96a4f26ea7911a12ff
Dec 23 08:55:47 localhost dockerd[2930]: time="2020-12-23T08:55:47.631479207Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Dec 23 08:55:47 localhost systemd[24638]: var-lib-docker-overlay2-91e9dffa86c333353dd6b445021c852d7ce8da6237d0d4d95909d68ef3d4fe23-merged.mount: Succeeded.
Dec 23 08:55:47 localhost systemd[1]: var-lib-docker-overlay2-91e9dffa86c333353dd6b445021c852d7ce8da6237d0d4d95909d68ef3d4fe23-merged.mount: Succeeded.
Dec 23 08:55:47 localhost systemd[24638]: var-lib-docker-overlay2-64bb135bc0cdab187566992dc9870068dee1430062e1a2b484381c19e03da895\x2dinit-merged.mount: Succeeded.
Dec 23 08:55:47 localhost systemd[1]: var-lib-docker-overlay2-64bb135bc0cdab187566992dc9870068dee1430062e1a2b484381c19e03da895\x2dinit-merged.mount: Succeeded.
Dec 23 08:55:47 localhost containerd[1470]: time="2020-12-23T08:55:47.972437378Z" level=info msg="shim containerd-shim started" address=/containerd-shim/4a61d63e1f46722ffa7a950c31145d167c5c69087d003e5928a6aa3a4831f031.sock debug=false pid=28659
Dec 23 08:55:48 localhost bash[8745]: cluster 2020-12-23T08:55:46.892633+0000 mgr.host1.kkssvi (mgr.24098) 24278 : cluster [DBG] pgmap v24212: 1 pgs: 1 undersized+peered; 0 B data, 112 KiB used, 931 GiB / 932 GiB avail
Dec 23 08:55:48 localhost bash[8756]: debug 2020-12-23T08:55:48.889+0000 ffff93573700 0 log_channel(cluster) log [DBG] : pgmap v24213: 1 pgs: 1 undersized+peered; 0 B data, 112 KiB used, 931 GiB / 932 GiB avail
Dec 23 08:55:49 localhost bash[8756]: debug 2020-12-23T08:55:49.085+0000 ffff9056f700 0 log_channel(audit) log [DBG] : from='client.24206 -' entity='client.admin' cmd=[{"prefix": "orch daemon add osd", "svc_arg": "host1:/dev/nvme0n1", "target": ["mon-mgr", ""]}]: dispatch
Dec 23 08:55:49 localhost bash[8745]: debug 2020-12-23T08:55:49.085+0000 ffff71575200 0 mon.host1@0(leader) e2 handle_command mon_command({"prefix": "osd tree", "states": ["destroyed"], "format": "json"} v 0) v1
Dec 23 08:55:49 localhost bash[8745]: debug 2020-12-23T08:55:49.085+0000 ffff71575200 0 log_channel(audit) log [DBG] : from='mgr.24098 192.168.0.1:0/2486989775' entity='mgr.host1.kkssvi' cmd=[{"prefix": "osd tree", "states": ["destroyed"], "format": "json"}]: dispatch
Dec 23 08:55:49 localhost bash[8756]: debug 2020-12-23T08:55:49.089+0000 ffff8ed6d700 0 log_channel(cephadm) log [INF] : Found osd claims -> {}
Dec 23 08:55:49 localhost bash[8756]: debug 2020-12-23T08:55:49.089+0000 ffff8ed6d700 0 log_channel(cephadm) log [INF] : Found osd claims for drivegroup None -> {}
Dec 23 08:55:49 localhost containerd[1470]: time="2020-12-23T08:55:49.331868093Z" level=info msg="shim reaped" id=780a38dd49fce4a823c4c3d834abdd1cc17bbe0c0aa4f2dd7caeddf8dce1708e
Dec 23 08:55:49 localhost dockerd[2930]: time="2020-12-23T08:55:49.341765820Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Dec 23 08:55:49 localhost systemd[24638]: var-lib-docker-overlay2-64bb135bc0cdab187566992dc9870068dee1430062e1a2b484381c19e03da895-merged.mount: Succeeded.
Dec 23 08:55:49 localhost systemd[1]: var-lib-docker-overlay2-64bb135bc0cdab187566992dc9870068dee1430062e1a2b484381c19e03da895-merged.mount: Succeeded.
Dec 23 08:55:49 localhost bash[8745]: audit 2020-12-23T08:55:49.091014+0000 mon.host1 (mon.0) 1093 : audit [DBG] from='mgr.24098 192.168.0.1:0/2486989775' entity='mgr.host1.kkssvi' cmd=[{"prefix": "osd tree", "states": ["destroyed"], "format": "json"}]: dispatch
Dec 23 08:55:50 localhost bash[8745]: cluster 2020-12-23T08:55:48.893433+0000 mgr.host1.kkssvi (mgr.24098) 24279 : cluster [DBG] pgmap v24213: 1 pgs: 1 undersized+peered; 0 B data, 112 KiB used, 931 GiB / 932 GiB avail
Dec 23 08:55:50 localhost bash[8745]: audit 2020-12-23T08:55:49.087597+0000 mgr.host1.kkssvi (mgr.24098) 24280 : audit [DBG] from='client.24206 -' entity='client.admin' cmd=[{"prefix": "orch daemon add osd", "svc_arg": "host1:/dev/nvme0n1", "target": ["mon-mgr", ""]}]: dispatch
Dec 23 08:55:50 localhost bash[8745]: cephadm 2020-12-23T08:55:49.093552+0000 mgr.host1.kkssvi (mgr.24098) 24281 : cephadm [INF] Found osd claims -> {}
Dec 23 08:55:50 localhost bash[8745]: cephadm 2020-12-23T08:55:49.093933+0000 mgr.host1.kkssvi (mgr.24098) 24282 : cephadm [INF] Found osd claims for drivegroup None -> {}
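For reference, the command dispatched in the audit entries above is of the form below (assuming the same "sudo cephadm shell --" prefix as the earlier commands; the host2 variant that worked would be the same, with host2 and its device path):

sudo cephadm shell -- ceph orch daemon add osd host1:/dev/nvme0n1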
The other problem is that logging is set to debug for both hosts. I tried "sudo cephadm shell -- ceph daemon mon.host1 config set mon_cluster_log_file_level info", which reports success, but logging remains at debug level.
If I try the same command with mon.host2 I get
INFO:cephadm:Inferring fsid ae111111-1111-1111-1111-f1111a11111a
INFO:cephadm:Inferring config /var/lib/ceph/ae147088-4486-11eb-9044-f1337a55707a/mon.host1/config
INFO:cephadm:Using recent ceph image ceph/ceph:v15
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
Which looks like it is trying to use the config for host1 on host2?
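(Side note, as an assumption rather than a verified fix: the same log level can also be set cluster-wide via the mon config database rather than the per-daemon admin socket, along the lines of:

sudo cephadm shell -- ceph config set mon mon_cluster_log_file_level info
sudo cephadm shell -- ceph config get mon mon_cluster_log_file_level)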
Thanks,
Duncan
Dear ceph folks,
rbd_cache can be set up as a read/write cache for librbd, and is widely used with OpenStack Cinder. Does krbd have a similar cache control mechanism or not? I am using krbd for iSCSI and NFS backend storage, and wonder whether a cache setting exists for krbd.
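For context, the librbd cache mentioned above is driven by client-side ceph.conf options; a minimal sketch (values are only illustrative, not recommendations) looks like:

[client]
rbd cache = true
rbd cache writethrough until flush = true
rbd cache size = 33554432
rbd cache max dirty = 25165824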
thanks in advance,
Samuel
huxiaoyu(a)horebdata.cn
Hi,
Using Ceph Octopus installed with cephadm here. Version running currently
is 15.2.6. There are 3 machines running the cluster. Machine names are
introduced in /etc/hosts in long (FQDN) & short forms, but the ceph hostnames of
the servers are in short form (not sure if this affects anything). The rbd side
is working nicely, tested with a Linux client.
I am trying to get the object gateway to be visible in the dashboard, but I get an error
when selecting "Object Gateway -> Daemons".
Error:
RGW REST API failed request with status code 403
(b'{"Code":"AccessDenied","RequestId":"tx000000000000000000040-005fe20384-8ecbc'
b'-ou","HostId":"8ecbc-ou-default"}')
What am I doing wrong here?
Thanks a lot,
-Mika
---- Procedure what I have done ----
1) ceph orch apply rgw default ou --placement="1 ceph1"
2) radosgw-admin user create --uid=test --display-name=test
--access-key=test --secret-key=test
3) radosgw-admin period update --rgw-realm=default --commit
4) aws configure --profile=default
aws configure --profile=default
AWS Access Key ID [None]: test
AWS Secret Access Key [None]: test
Default region name [None]: default
Default output format [None]: json
5) aws s3 mb s3://test1 --endpoint-url http://ceph1
make_bucket: test1
5.1) radosgw-admin bucket list
[
"test1"
]
6) ceph dashboard --help | grep reset-rgw | awk '{print $2}' | xargs -n 1 ceph dashboard
Option RGW_API_ACCESS_KEY reset to default value ""
Option RGW_API_ADMIN_RESOURCE reset to default value "admin"
Option RGW_API_HOST reset to default value ""
Option RGW_API_PORT reset to default value "80"
Option RGW_API_SCHEME reset to default value "http"
Option RGW_API_SECRET_KEY reset to default value ""
Option RGW_API_SSL_VERIFY reset to default value "True"
Option RGW_API_USER_ID reset to default value ""
7) ceph dashboard set-rgw-api-user-id "test"
Option RGW_API_USER_ID updated
8) ceph dashboard set-rgw-api-access-key test
Option RGW_API_ACCESS_KEY updated
9) ceph dashboard set-rgw-api-secret-key test
Option RGW_API_SECRET_KEY updated
10) ceph mgr module disable dashboard
11) ceph mgr module enable dashboard
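(One hedged guess, not a confirmed diagnosis: the user handed to the dashboard usually needs the RGW system flag. Checking and setting it would look roughly like:

radosgw-admin user info --uid=test        # look for "system": "true"
radosgw-admin user modify --uid=test --system

followed by the dashboard module disable/enable from steps 10 and 11.)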
Hello
the mgr module diskprediction_local fails under Ubuntu 20.04 focal with
python3-sklearn version 0.22.2.
Ceph version is 15.2.3.
When the module is enabled I get the following error:
File "/usr/share/ceph/mgr/diskprediction_local/module.py", line 112, in
serve
self.predict_all_devices()
File "/usr/share/ceph/mgr/diskprediction_local/module.py", line 279, in
predict_all_devices
result = self._predict_life_expentancy(devInfo['devid'])
File "/usr/share/ceph/mgr/diskprediction_local/module.py", line 222, in
_predict_life_expentancy
predicted_result = obj_predictor.predict(predict_datas)
File "/usr/share/ceph/mgr/diskprediction_local/predictor.py", line 457,
in predict
pred = clf.predict(ordered_data)
File "/usr/lib/python3/dist-packages/sklearn/svm/_base.py", line 585, in
predict
if self.break_ties and self.decision_function_shape == 'ovo':
AttributeError: 'SVC' object has no attribute 'break_ties'
Best Regards
Eric
Hello,
I had some faulty power cables on some OSDs in one server, which caused lots of IO issues with disks appearing/disappearing. This has been corrected now; 2 of the 10 OSDs are working, however 8 are failing to start due to what looks to be a corrupt DB.
When running a ceph-bluestore-tool fsck I get the following output:
rocksdb: [db/db_impl_open.cc:516] db.wal/002221.log: dropping 1302 bytes; Corruption: missing start of fragmented record(2)
2020-12-22T16:21:52.715+0100 7f7b6a1500c0 4 rocksdb: [db/db_impl.cc:389] Shutdown: canceling all background work
2020-12-22T16:21:52.715+0100 7f7b6a1500c0 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
2020-12-22T16:21:52.715+0100 7f7b6a1500c0 -1 rocksdb: Corruption: missing start of fragmented record(2)
2020-12-22T16:21:52.715+0100 7f7b6a1500c0 -1 bluestore(/var/lib/ceph/b1db6b36-0c4c-4bce-9cda-18834be0632d/osd.28) opendb erroring opening db:
Trying to start the OSD leads to:
ceph_abort_msg("Bad table magic number: expected 9863518390377041911, found 9372993859750765257 in db/002442.sst")
It looks like the last write to these OSDs never fully completed. Sadly, as I was adding this new node to move from OSD to host redundancy (EC pool), I currently have 20% of PGs down. Is there anything I can do to remove the last entry in the DB or somehow clean up the RocksDB to get these OSDs at least started? I understand I may end up with some corrupted files.
Thanks
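(For reference, the fsck quoted above would have been run along these lines; the path is taken from the bluestore log line and will differ per OSD:

ceph-bluestore-tool fsck --path /var/lib/ceph/b1db6b36-0c4c-4bce-9cda-18834be0632d/osd.28)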
I could use some input from more experienced folks…
First time seeing this behavior. I've been running ceph in production
(replicated) since 2016 or earlier.
This, however, is a small 3-node cluster for testing EC. Crush map rules
should sustain the loss of an entire node.
Here's the EC rule:
rule cephfs425 {
        id 6
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 40
        step set_choose_tries 400
        step take default
        step choose indep 3 type host
        step choose indep 2 type osd
        step emit
}
I had actual hardware failure on one node. Interestingly, this appears to
have resulted in data loss. OSDs began to crash in a cascade on other nodes
(i.e., nodes with no known hardware failure). Not a low RAM problem.
I could use some pointers about how to get the down PGs back up — I *think*
there are enough EC shards, even disregarding the OSDs that crash on start.
nautilus 14.2.15
ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 54.75960 root default
-10 16.81067 host sumia
1 hdd 5.57719 osd.1 up 1.00000 1.00000
5 hdd 5.58469 osd.5 up 1.00000 1.00000
6 hdd 5.64879 osd.6 up 1.00000 1.00000
-7 16.73048 host sumib
0 hdd 5.57899 osd.0 up 1.00000 1.00000
2 hdd 5.56549 osd.2 up 1.00000 1.00000
3 hdd 5.58600 osd.3 up 1.00000 1.00000
-3 21.21844 host tower1
4 hdd 3.71680 osd.4 up 0 1.00000
7 hdd 1.84799 osd.7 up 1.00000 1.00000
8 hdd 3.71680 osd.8 up 1.00000 1.00000
9 hdd 1.84929 osd.9 up 1.00000 1.00000
10 hdd 2.72899 osd.10 up 1.00000 1.00000
11 hdd 3.71989 osd.11 down 0 1.00000
12 hdd 3.63869 osd.12 down 0 1.00000
cluster:
id: d0b4c175-02ba-4a64-8040-eb163002cba6
health: HEALTH_ERR
1 MDSs report slow requests
4/4239345 objects unfound (0.000%)
Too many repaired reads on 3 OSDs
Reduced data availability: 7 pgs inactive, 7 pgs down
Possible data damage: 4 pgs recovery_unfound
Degraded data redundancy: 95807/24738783 objects degraded
(0.387%), 4 pgs degraded, 3 pgs undersized
7 pgs not deep-scrubbed in time
7 pgs not scrubbed in time
services:
mon: 3 daemons, quorum sumib,tower1,sumia (age 4d)
mgr: sumib(active, since 7d), standbys: sumia, tower1
mds: cephfs:1 {0=sumib=up:active} 2 up:standby
osd: 13 osds: 11 up (since 3d), 10 in (since 4d); 3 remapped pgs
data:
pools: 5 pools, 256 pgs
objects: 4.24M objects, 15 TiB
usage: 24 TiB used, 24 TiB / 47 TiB avail
pgs: 2.734% pgs not active
95807/24738783 objects degraded (0.387%)
47910/24738783 objects misplaced (0.194%)
4/4239345 objects unfound (0.000%)
245 active+clean
7 down
3 active+recovery_unfound+undersized+degraded+remapped
1 active+recovery_unfound+degraded+repair
progress:
Rebalancing after osd.12 marked out
[============================..]
Rebalancing after osd.4 marked out
[=============================.]
A snippet from an example down PG:
"up": [
3,
2,
5,
1,
8,
9
],
"acting": [
3,
2,
5,
1,
8,
9
],
<snip>
],
"blocked": "peering is blocked due to down osds",
"down_osds_we_would_probe": [
11,
12
],
"peering_blocked_by": [
{
"osd": 11,
"current_lost_at": 0,
"comment": "starting or marking this osd lost may let
us proceed"
},
{
"osd": 12,
"current_lost_at": 0,
"comment": "starting or marking this osd lost may let
us proceed"
}
]
},
{
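(The snippet above is the sort of output a PG query returns; assuming PG 15.10 from the down list further below, it would come from something like:

ceph pg 15.10 query

with the relevant parts under "recovery_state", including "peering_blocked_by".)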
Oddly, these OSDs possibly did NOT experience hardware failure. However,
they won't start -- see pastebin for ceph-osd.11.log
https://pastebin.com/6U6sQJuJ
HEALTH_ERR 1 MDSs report slow requests; 4/4239345 objects unfound (0.000%);
Too many repaired reads on 3 OSDs; Reduced data availability
: 7 pgs inactive, 7 pgs down; Possible data damage: 4 pgs recovery_unfound;
Degraded data redundancy: 95807/24738783 objects degraded (0
.387%), 4 pgs degraded, 3 pgs undersized; 7 pgs not deep-scrubbed in time;
7 pgs not scrubbed in time
MDS_SLOW_REQUEST 1 MDSs report slow requests
mdssumib(mds.0): 42 slow requests are blocked > 30 secs
OBJECT_UNFOUND 4/4239345 objects unfound (0.000%)
pg 19.5 has 1 unfound objects
pg 15.2f has 1 unfound objects
pg 15.41 has 1 unfound objects
pg 15.58 has 1 unfound objects
OSD_TOO_MANY_REPAIRS Too many repaired reads on 3 OSDs
osd.9 had 9664 reads repaired
osd.7 had 9665 reads repaired
osd.4 had 12 reads repaired
PG_AVAILABILITY Reduced data availability: 7 pgs inactive, 7 pgs down
pg 15.10 is down, acting [3,2,5,1,8,9]
pg 15.1e is down, acting [5,1,9,8,2,3]
pg 15.40 is down, acting [7,10,1,5,3,2]
pg 15.4a is down, acting [0,3,5,6,9,10]
pg 15.6a is down, acting [3,2,6,1,10,8]
pg 15.71 is down, acting [3,2,1,6,8,10]
pg 15.76 is down, acting [2,0,6,5,10,9]
PG_DAMAGED Possible data damage: 4 pgs recovery_unfound
pg 15.2f is active+recovery_unfound+undersized+degraded+remapped,
acting [5,1,0,3,2147483647,7], 1 unfound
pg 15.41 is active+recovery_unfound+undersized+degraded+remapped,
acting [5,1,0,3,2147483647,2147483647], 1 unfound
pg 15.58 is active+recovery_unfound+undersized+degraded+remapped,
acting [10,2147483647,2,3,1,5], 1 unfound
pg 19.5 is active+recovery_unfound+degraded+repair, acting
[3,2,5,1,8,10], 1 unfound
PG_DEGRADED Degraded data redundancy: 95807/24738783 objects degraded
(0.387%), 4 pgs degraded, 3 pgs undersized
pg 15.2f is stuck undersized for 635305.932075, current state
active+recovery_unfound+undersized+degraded+remapped, last acting
[5,1,0,3,2147483647,7]
pg 15.41 is stuck undersized for 364298.836902, current state
active+recovery_unfound+undersized+degraded+remapped, last acting
[5,1,0,3,2147483647,2147483647]
pg 15.58 is stuck undersized for 384461.110229, current state
active+recovery_unfound+undersized+degraded+remapped, last acting
[10,2147483647,2,3,1,5]
pg 19.5 is active+recovery_unfound+degraded+repair, acting
[3,2,5,1,8,10], 1 unfound
PG_NOT_DEEP_SCRUBBED 7 pgs not deep-scrubbed in time
pg 15.76 not deep-scrubbed since 2020-10-21 14:30:03.935228
pg 15.71 not deep-scrubbed since 2020-10-21 12:20:46.235792
pg 15.6a not deep-scrubbed since 2020-10-21 07:52:33.914083
pg 15.10 not deep-scrubbed since 2020-10-22 03:24:40.465367
pg 15.1e not deep-scrubbed since 2020-10-22 10:37:36.169959
pg 15.40 not deep-scrubbed since 2020-10-23 05:33:35.208748
pg 15.4a not deep-scrubbed since 2020-10-22 05:14:06.981035
PG_NOT_SCRUBBED 7 pgs not scrubbed in time
pg 15.76 not scrubbed since 2020-10-24 08:12:40.090831
pg 15.71 not scrubbed since 2020-10-25 05:22:40.573572
pg 15.6a not scrubbed since 2020-10-24 15:03:09.189964
pg 15.10 not scrubbed since 2020-10-24 16:25:08.826981
pg 15.1e not scrubbed since 2020-10-24 16:05:03.080127
pg 15.40 not scrubbed since 2020-10-24 11:58:04.290488
pg 15.4a not scrubbed since 2020-10-24 11:32:44.573551
--
Jeremy Austin
jhaustin(a)gmail.com
Hello all,
wrt: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/7IMIWCKIHXN…
Yesterday we hit a problem with osd_pglog memory, similar to the thread above.
We have a 56-node object storage (S3+SWIFT) cluster with 25 OSD disks per node. We run 8+3 EC for the data pool (metadata is on a replicated NVMe pool).
The cluster has been running fine, and (as relevant to the post) the memory usage has been stable at 100 GB / node. We've had the default pg_log of 3000. The user traffic doesn't seem to have been exceptional lately.
Last Thursday we updated the OSDs from 14.2.8 -> 14.2.13. On Friday the memory usage on OSD nodes started to grow. On each node it grew steadily about 30 GB/day, until the servers started OOM killing OSD processes.
After a lot of debugging we found that the pg_logs were huge. Each OSD process pg_log had grown to ~22 GB, which we naturally didn't have memory for, and then the cluster was in an unstable situation. This is significantly more than the 1.5 GB in the post above. We do have ~20k PGs, which may directly affect the size.
We've reduced the pg_log to 500, and started offline trimming it where we can, and also just waited. The pg_log size dropped to ~1.2 GB on at least some nodes, but we're still recovering, and still have a lot of OSDs down and out.
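For anyone following along, the runtime setting and the offline trim mentioned above are roughly as follows (option names as in Nautilus; the OSD path and pgid are placeholders, and the OSD must be stopped for the offline trim):

ceph config set osd osd_max_pg_log_entries 500
ceph config set osd osd_min_pg_log_entries 500
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-123 --pgid 11.2f --op trim-pg-log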
We're unsure if version 14.2.13 triggered this, or if the osd restarts triggered this (or something unrelated we don't see).
This mail is mostly to figure out if there are good guesses as to why the pg_log size per OSD process exploded. Any technical (and moral) support is appreciated. Also, since we're currently not sure whether 14.2.13 triggered this, this is also to put a data point out there for other debuggers.
Cheers,
Kalle Happonen
I have a VM on an OSD node (which can reach the host and other nodes via the
macvtap interface used by both host and guest). I just did a simple
bonnie++ test and everything seems to be fine. Yesterday, however, the
dovecot process apparently caused problems (I am only using cephfs for an
archive namespace; the inbox is on rbd SSD, and the fs metadata is also on SSD).
How can I recover from such a lock-up? If I have a similar situation with
an nfs-ganesha mount, I have the option to do a umount -l, and clients
recover quickly without any issues.
Having to reset the VM is not really an option. What is the best way to
resolve this?
Ceph cluster: 14.2.11 (the vm has 14.2.16)
There is nothing special in my ceph.conf, just these two entries in the mds section:
mds bal fragment size max = 120000
# maybe for nfs-ganesha problems?
# http://docs.ceph.com/docs/master/cephfs/eviction/
#mds_session_blacklist_on_timeout = false
#mds_session_blacklist_on_evict = false
mds_cache_memory_limit = 17179860387
All running:
CentOS Linux release 7.9.2009 (Core)
Linux mail04 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC
2020 x86_64 x86_64 x86_64 GNU/Linux
I was having horrible problems getting my test ceph cluster reinitialized.
All kinds of annoying things were happening, including getting differing output from
ceph orch device ls
vs
ceph device ls
Being new-ish to ceph, I was going nuts, wondering what kind of init options I was missing.
Turns out, nothing I was doing was wrong, per se.
I had ended up with differing container versions.
Even after doing "cephadm rm-cluster", the old versions were sticking.
The really annoying thing is, the difference was tiny.
15.2.8 on the master.
15.2.5 on other nodes.
but device-related things were failing, with
2020-12-21 09:37:09,134 INFO /bin/podman:stderr ceph-volume inventory: error: unrecognized arguments: --filter-for-batch
in /var/log/ceph/cephadm.log
Sighhh...
To save anyone else some research, the end-user fix is:
run
ceph orch upgrade start --ceph-version 15.2.8
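For the record, this kind of version skew shows up with:

ceph versions    # daemon counts per ceph version
ceph orch ps     # per-daemon container image and version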
--
Philip Brown| Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310| Fax 714.918.1325
pbrown(a)medata.com| www.medata.com