Hello,
I'm trying to install nautilus on stretch following the directions here https://docs.ceph.com/docs/master/install/get-packages/ . However, it seems the stretch repo only includes ceph-deploy. Are the rest of the packages missing on purpose or have I missed something obvious?
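For reference, this is what I added, following the deb https://download.ceph.com/debian-{release}/ {codename} main pattern from that page:
echo deb https://download.ceph.com/debian-nautilus/ stretch main | sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt update
apt-cache madison ceph ceph-deploy   # only ceph-deploy resolves to download.ceph.com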
Thanks
hi,all:
I'm using the AWS S3 Java SDK. When I create a new bucket with the hostname "s3.my-self.mydomain.com", I get an auth error.
But when I use the hostname "s3.us-east-1.mydomain.com", it works. Why?
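My guess (untested) is that the v1 SDK tries to derive the SigV4 signing region from the hostname, so "s3.us-east-1.…" parses to a valid region name while "s3.my-self.…" does not, producing a signature mismatch. A minimal sketch that pins the endpoint and signing region explicitly (credentials setup omitted; the bucket name and region are placeholders):

import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class MakeBucket {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                // Pin the signing region instead of letting the SDK guess it
                // from the hostname; use whatever region name your RGW/zonegroup expects.
                .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                        "http://s3.my-self.mydomain.com", "us-east-1"))
                // Path-style access avoids bucket-name DNS tricks on custom endpoints.
                .withPathStyleAccessEnabled(true)
                .build();
        s3.createBucket("mybucket");
    }
}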
Huang Mingyou
IT Infrastructure Manager
V.Photos Cloud Photography
Mobile: +86 13540630430
Customer service: 400 - 806 - 5775
Email: hmy(a)v.photos
Website: www.v.photos
Shanghai: 2F, Building F, Bund SOHO 3Q, 88 Zhongshan East 2nd Road, Huangpu District
Beijing: 1F, SOHO 3Q, South Gate 2 of Guanghua Road SOHO II, 9 Guanghua Road, Chaoyang District
Guangzhou: 3Wcoffice, Tianyu Garden Phase II, 136 Linhe Middle Road, Tianhe District
Shenzhen: 1F, Wanggu Shuangchuang Street, Room 102, Block A, Wanggu Technology Building Phase II, Shekou, Nanshan District
Chengdu: 7F, Shimao Plaza, Jianshe Road, Chenghua District
Hello,
I have an old ceph 0.94.10 cluster that had 10 storage nodes with one extra
management node used for running commands on the cluster. Over time we'd
had some hardware failures on some of the storage nodes, so we're down to
6, with ceph-mon running on the management server and 4 of the storage
nodes. We deployed a ceph.conf change and restarted the ceph-mon and
ceph-osd services, but the cluster went down on us. We found all the
ceph-mons stuck in the electing state. I can't get any response from
any ceph commands, but I found I can contact the daemons directly and get
this information (hostnames removed for privacy reasons):
root@<mgmt1>:~# ceph daemon mon.<mgmt1> mon_status
{
    "name": "<mgmt1>",
    "rank": 0,
    "state": "electing",
    "election_epoch": 4327,
    "quorum": [],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 10,
        "fsid": "69611c75-200f-4861-8709-8a0adc64a1c9",
        "modified": "2019-08-23 08:20:57.620147",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "<mgmt1>",
                "addr": "[fdc4:8570:e14c:132d::15]:6789\/0"
            },
            {
                "rank": 1,
                "name": "<mon1>",
                "addr": "[fdc4:8570:e14c:132d::16]:6789\/0"
            },
            {
                "rank": 2,
                "name": "<mon2>",
                "addr": "[fdc4:8570:e14c:132d::28]:6789\/0"
            },
            {
                "rank": 3,
                "name": "<mon3>",
                "addr": "[fdc4:8570:e14c:132d::29]:6789\/0"
            },
            {
                "rank": 4,
                "name": "<mon4>",
                "addr": "[fdc4:8570:e14c:132d::151]:6789\/0"
            }
        ]
    }
}
Is there any way to force the cluster back into a quorum even if it's just
one mon running to start it up? I've tried exporting the mgmt's monmap and
injecting it into the other nodes, but it didn't make any difference.
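For reference, the export/inject sequence I used was along these lines (mons stopped while extracting/injecting; names and paths are placeholders):

ceph-mon -i <mgmt1> --extract-monmap /tmp/monmap    # on the management node
monmaptool --print /tmp/monmap
ceph-mon -i <mon1> --inject-monmap /tmp/monmap      # repeated on each mon

# Presumably, forcing a single-mon quorum would mean stripping the other
# entries from the map before injecting it back into mgmt1 only, e.g.:
monmaptool /tmp/monmap --rm <mon1> --rm <mon2> --rm <mon3> --rm <mon4>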
Thanks!
Hi,
On Thu, 29 Aug 2019 at 22:32, fengyd <fengyd81(a)gmail.com> wrote:
> Hi,
>
> The issue is still there?
>
Yes, it still is.
> I ran into an IO performance issue recently and found that the max fd
> count for Qemu/KVM was not big enough; the fds for Qemu/KVM were
> exhausted. The issue was solved after increasing the max fd count.
>
> How do I check and increase the max fd count for qemu? Can you tell me how?
Regards,
Gesiel
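(A rough sketch of one way to check and raise it, assuming a libvirt-managed qemu; the binary name, pid and values are illustrative:)

# limit and current fd usage of a running qemu process
# (the binary may be qemu-kvm or qemu-system-x86_64 depending on distro)
pid=$(pidof -s qemu-kvm)
grep 'open files' /proc/$pid/limits
ls /proc/$pid/fd | wc -l
# for VMs started by libvirt, raise the cap in /etc/libvirt/qemu.conf:
#   max_files = 32768
# then restart libvirtd so newly started VMs pick it up
systemctl restart libvirtd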
>
> On Wed, 21 Aug 2019 at 20:53, Gesiel Galvão Bernardes <
> gesiel.bernardes(a)gmail.com> wrote:
>
>> Hi Eliza,
>>
>> On Wed, 21 Aug 2019 at 09:30, Eliza <eli(a)chinabuckets.com>
>> wrote:
>>
>>> Hi
>>>
>>> on 2019/8/21 20:25, Gesiel Galvão Bernardes wrote:
>>> > I'm using Qemu/KVM (OpenNebula) with Ceph/RBD for running VMs, and I'm
>>> > having problems with slowness in applications that often are not
>>> > consuming much CPU or RAM. This problem affects mostly Windows.
>>> > Apparently the problem is that the application loads many small files
>>> > (e.g. DLLs) and these files take a long time to load, causing the slowness.
>>>
>>> Did you check/test your network connection?
>>> Do you have a fast network setup?
>>
>>
>> I have a bond of two 10Gb interfaces, which is lightly used.
>>
>>>
>>>
>> regards.
"ceph osd down" will mark an OSD down once, but not shut it down. Hence, it will continue to send heartbeats and request to be marked up again after a couple of seconds. To keep it down, there are 2 ways:
- either set "ceph osd set noup",
- or actually shut the OSD down.
The first option will allow the OSD to keep running so you can talk to the daemon while it is marked "down". Be aware that the OSD will be marked "out" after a while; you might need to mark it "in" manually when you are done with maintenance.
I believe with nautilus it is possible to set the noup flag on a specific OSD, which is much safer.
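From memory, the syntax is along these lines (untested; it may differ between minor releases):

ceph osd set noup                  # cluster-wide flag
ceph osd set-group noup osd.48     # nautilus: flag a single OSD
ceph osd unset-group noup osd.48
ceph osd unset noup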
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: ceph-users <ceph-users-bounces(a)lists.ceph.com> on behalf of solarflow99 <solarflow99(a)gmail.com>
Sent: 03 September 2019 19:40:59
To: Ceph Users
Subject: [ceph-users] forcing an osd down
I've noticed this happen before; this time I can't get it to stay down at all, it just keeps coming back up:
# ceph osd down osd.48
marked down osd.48.
# ceph osd tree |grep osd.48
48 3.64000 osd.48 down 0 1.00000
# ceph osd tree |grep osd.48
48 3.64000 osd.48 up 0 1.00000
 health HEALTH_WARN
        2 pgs backfilling
        1 pgs degraded
        2 pgs stuck unclean
        recovery 18/164089686 objects degraded (0.000%)
        recovery 1467405/164089686 objects misplaced (0.894%)
 monmap e1: 3 mons at {0=192.168.4.10:6789/0,1=192.168.4.11:6789/0,2=192.168.4.12:6789/0}
        election epoch 210, quorum 0,1,2 0,1,2
 mdsmap e166: 1/1/1 up {0=0=up:active}, 2 up:standby
 osdmap e25733: 45 osds: 45 up, 44 in; 2 remapped pgs
Hi,
I encountered a problem with blocked MDS operations and a client becoming unresponsive. I dumped the MDS cache, ops, blocked ops, and some further log information here:
https://files.dtu.dk/u/peQSOY1kEja35BI5/2010-09-03-mds-blocked-ops?l
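(The dumps were produced roughly like this, via the MDS admin socket:)

ceph daemon mds.<name> dump cache /tmp/mds-cache.txt
ceph daemon mds.<name> ops
ceph daemon mds.<name> dump_blocked_ops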
A user of our HPC system was running a job that creates a somewhat stressful MDS load. This workload tends to lead to MDS warnings like "slow metadata ops" and "client does not respond to caps release", which usually disappear without intervention after a while.
He cancelled the job, and one operation from one of the clients remained stuck in the MDS. We had a health warning about 1 blocked metadata operation and one client failing to respond to caps release. I should mention that we execute "echo 3 > /proc/sys/vm/drop_caches" in the epilogue script run after every job, which usually cleans up all unused caps without problems. So, by the time I was looking at the number of client caps, these were already down to below 100 for the client in question due to the epilogue script. It looks like there might be a race condition between the drop_caches and MDS requests.
In addition, backfill was going on while this happened. All PGs were active (plus various other recovery states). All storage was r/w-accessible.
On the client side, this was in the logs:
Sep 3 09:15:57 sn110 kernel: INFO: task kworker/0:1:79782 blocked for more than 120 seconds.
Sep 3 09:15:57 sn110 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 3 09:15:57 sn110 kernel: kworker/0:1 D ffff995cf4614100 0 79782 2 0x00000000
Sep 3 09:15:57 sn110 kernel: Workqueue: ceph-pg-invalid ceph_invalidate_work [ceph]
Sep 3 09:15:57 sn110 kernel: Call Trace:
[... see link above ...]
I did not see slow ops on any of the OSDs. All other information in the link above.
We had to reboot the client to resolve this problem. It seems like the MDS does not clean up blocked requests in certain situations where it ought to be possible. I hope the cache and ops dumps help pinpoint the reason.
Best regards,
Frank
Hi
We have a couple of RHEL 7.6 (3.10.0-957.21.3.el7.x86_64) clients that
have a number of uninterruptible threads and I'm wondering if we're
looking at the issue fixed by
https://www.spinics.net/lists/ceph-devel/msg45467.html (the fix hasn't
made it into RHEL 7.7 3.10.0-1062).
Stack traces of the hung threads are at http://p.ip.fi/9pQA
There are a number of entries listed in
/sys/kernel/debug/ceph/*/{osdc,mdsc} at http://p.ip.fi/VVzx
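For the record, those were collected with something like:

for f in /sys/kernel/debug/ceph/*/osdc /sys/kernel/debug/ceph/*/mdsc; do echo "== $f"; cat "$f"; done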
Unfortunately, the issue isn't consistently reproducible.
Cheers
Toby
--
Toby Darling, Scientific Computing (2N249)
MRC Laboratory of Molecular Biology
Francis Crick Avenue
Cambridge Biomedical Campus
Cambridge CB2 0QH
Phone 01223 267070
Is there no ceph wiki page with examples of manual repairs with the
ceph-objectstore-tool (e.g. for cases where pg repair and pg scrub don't work)?
I have been having this issue for quite some time.
2019-09-02 14:17:34.175139 7f9b3f061700 -1 log_channel(cluster) log
[ERR] : deep-scrub 17.36
17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:head : expected
clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4 1 missing
I tried to resolve it according to this procedure [0], but now I am getting this message:
ceph-objectstore-tool --dry-run --type bluestore \
    --data-path /var/lib/ceph/osd/ceph-29 --pgid 17.36 \
    '{"oid":"rbd_data.1f114174b0dc51.0000000000000974","key":"","snapid":-2,"hash":1357874486,"pool":17,"namespace":"","max":0}' \
    remove
Snapshots are present, use removeall to delete everything
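(For reference, the JSON object spec above can be obtained, with the OSD stopped, from a listing along the lines of the following; snapid -2 denotes the head object:)

ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-29 \
    --pgid 17.36 --op list rbd_data.1f114174b0dc51.0000000000000974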
I am not sure about this removeall, and I do not want to start deleting
snapshots just hoping it will amount to something. Besides, if only a ~4MB
block is damaged, do you really need to purge 40GB snapshots? I would
rather have a 40GB snapshot missing 4MB than no snapshot at all.
[0]
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg47218.html
PS. Is there a record of who has had the longest unhealthy cluster
state? Because I would not like it to be me ;)