Hi!
The Ceph documentation requires "tcmu-runner-1.4.0 or newer package", but I cannot find this package for CentOS.
Does anyone know where to download this package?
WBR,
Fyodor.
Hi,
I'm currently working on upgrading my existing monitors within my cluster. During the first deployment of this production cluster I made some choices that in hindsight were not the best. But it worked, I learned, and now I wish to remediate my earlier bad choices.
The cluster consists of three monitors that are currently in quorum, and I wish to upgrade each of them by fully removing them from the cluster and rejoining them after a complete reinstall of the OS (new hostname, new IP). To maintain quorum during this I want to temporarily add a monitor, but this does not go as planned: the monitor joins with `ceph-deploy mon add mon4` but never leaves the probing state (see log below).
I have verified all networking and firewall settings and don't notice any connection errors, nor do I see any weird hostnames or IP addresses in the existing monmap on any of the hosts. I have also manually confirmed that all the keys in the cluster are the same, so I don't suspect an authentication error.
I hope someone can offer guidance.
Thanks.
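As background for the temporary fourth monitor: quorum is a strict majority of the monmap, which a few lines of Python can illustrate (this is just the arithmetic, not a Ceph API):

```python
# Quorum requires a strict majority of the monitors in the monmap.
def quorum_size(n_mons):
    """Smallest number of monitors that forms a majority."""
    return n_mons // 2 + 1

# Growing the monmap from 3 to 4 raises the quorum size from 2 to 3,
# so while mon4 is probing/joining, 3 of the 4 mons must be reachable.
for n in (3, 4, 5):
    print(f"{n} mons -> quorum of {quorum_size(n)}")
```

This is why a stuck fourth monitor is worth resolving before removing any of the original three.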
Log from mon4 (/var/log/ceph/ceph-mon.mon4.log):
2019-10-16 11:21:51.960 7fc709c73a00 0 mon.mon4 does not exist in monmap, will attempt to join an existing cluster
2019-10-16 11:21:51.962 7fc709c73a00 0 using public_addr 10.200.1.104:0/0 -> 10.200.1.104:6789/0
2019-10-16 11:21:51.963 7fc709c73a00 0 starting mon.mon4 rank -1 at public addr 10.200.1.104:6789/0 at bind addr 10.200.1.104:6789/0 mon_data /var/lib/ceph/mon/ceph-mon4 fsid aaf1547b-8944-4f48-b354-93659202c6fe
2019-10-16 11:21:51.964 7fc709c73a00 0 starting mon.mon4 rank -1 at 10.200.1.104:6789/0 mon_data /var/lib/ceph/mon/ceph-mon4 fsid aaf1547b-8944-4f48-b354-93659202c6fe
2019-10-16 11:21:51.965 7fc709c73a00 1 mon.mon4@-1(probing) e0 preinit fsid aaf1547b-8944-4f48-b354-93659202c6fe
2019-10-16 11:21:51.965 7fc709c73a00 1 mon.mon4@-1(probing) e0 initial_members mon1,mon2,mon3,mon4, filtering seed monmap
2019-10-16 11:21:51.965 7fc709c73a00 1 mon.mon4@-1(probing).mds e0 Unable to load 'last_metadata'
2019-10-16 11:21:51.967 7fc709c73a00 0 mon.mon4@-1(probing) e0 my rank is now 3 (was -1)
2019-10-16 11:21:54.054 7fc6f934b700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
2019-10-16 11:21:54.054 7fc6f934b700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished
2019-10-16 11:21:54.300 7fc6f934b700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
2019-10-16 11:21:54.300 7fc6f934b700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished
2019-10-16 11:22:26.967 7fc6f5ad5700 -1 mon.mon4@3(probing) e0 get_health_metrics reporting 4 slow ops, oldest is log(1 entries from seq 1 at 2019-10-16 11:21:54.055387)
2019-10-16 11:22:31.967 7fc6f5ad5700 -1 mon.mon4@3(probing) e0 get_health_metrics reporting 4 slow ops, oldest is log(1 entries from seq 1 at 2019-10-16 11:21:54.055387)
2019-10-16 11:22:36.967 7fc6f5ad5700 -1 mon.mon4@3(probing) e0 get_health_metrics reporting 4 slow ops, oldest is log(1 entries from seq 1 at 2019-10-16 11:21:54.055387)
2019-10-16 11:22:37.478 7fc6f934b700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
2019-10-16 11:22:37.478 7fc6f934b700 0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd=mon_status args=[]: finished
2019-10-16 11:22:41.968 7fc6f5ad5700 -1 mon.mon4@3(probing) e0 get_health_metrics reporting 4 slow ops, oldest is log(1 entries from seq 1 at 2019-10-16 11:21:54.055387)
2019-10-16 11:22:46.968 7fc6f5ad5700 -1 mon.mon4@3(probing) e0 get_health_metrics reporting 4 slow ops, oldest is log(1 entries from seq 1 at 2019-10-16 11:21:54.055387)
We set up a new Nautilus cluster and only have RGW on it. While we had
a job doing 200k IOPS of really small objects, I noticed that HAProxy
was kicking out RGW backends because they were taking more than 2
seconds to return. We GET a large ~4GB file each minute and use that
as a health check to determine if the system is taking too long to
service requests. It seems that other IO is being blocked by this
large transfer. This seems to be the case with both civetweb and
beast. But I'm double checking beast at the moment because I'm not
100% sure we were using it at the start.
Any ideas how to mitigate this? It seems that IOs are scheduled
on a thread, and if they are unlucky enough to be scheduled behind a
big IO they are just stuck; in that case HAProxy can kick out the
backend before the IO is returned, and it has to re-request it.
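To make the queueing effect concrete, here is a toy FIFO model of a single worker thread; the ~25 Gb/s usable bandwidth figure is only an assumption:

```python
# Toy FIFO model of head-of-line blocking: one worker serves requests
# in order; service time is size / bandwidth (bandwidth is a guess).
def wait_times(sizes_bytes, bandwidth=2.5e9):  # ~25 Gb/s usable
    t, waits = 0.0, []
    for size in sizes_bytes:
        waits.append(t)        # time this request spent queued
        t += size / bandwidth  # time to serve this request
    return waits

# Two 4 GiB GETs queued ahead of a tiny health-check request:
queue = [4 * 2**30, 4 * 2**30, 4096]
print(round(wait_times(queue)[2], 2))  # 3.44 s, past a 2 s timeout
```

Under this model the health check misses the 2-second deadline purely from queueing, matching the behavior described above.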
Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
Hello,
I was wondering what users' experiences have been with Ceph over RDMA:
- How did you set it up?
- What documentation did you use to set it up?
- What known issues did you hit when using it?
- Do you still use it?
Kind regards
Gabryel Mason-Williams
Hi Arash,
If the number of objects in a bucket is very large, on the order of
millions, a paginated listing approach works better.
There are also certain RGW configs that control how big an RGW response
can be in terms of the number of objects (by default I believe this is 1000).
The code for paginated listing (the snippet can be modified):

    try:
        buckethandle = s3_conn_src.get_bucket(bucket_name)
        marker = ''  # start before the first key
        while True:
            keys = buckethandle.get_all_keys(max_keys=1000, marker=marker)
            for k in keys:
                # do operation on keys (which are the objects)
                print(k.name)
                # update marker
                marker = k.name
            if keys.is_truncated is False:
                print("Breaking")
                break
    except Exception as e:
        print(e)
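The marker-based pagination above can also be sketched independently of any S3 client; `fetch_page` below is a hypothetical stand-in for a listing call such as `get_all_keys`:

```python
def list_all_keys(fetch_page, page_size=1000):
    """Collect every key via marker-based pagination.

    fetch_page(marker, max_keys) is a stand-in for an S3 listing call;
    it returns (keys_after_marker, is_truncated).
    """
    keys, marker = [], ""
    while True:
        page, truncated = fetch_page(marker, page_size)
        keys.extend(page)
        if not truncated:
            return keys
        marker = page[-1]  # resume after the last key returned

# Fake backend standing in for a bucket with 2,500 objects:
objects = [f"obj-{i:05d}" for i in range(2500)]

def fake_fetch(marker, max_keys):
    after = [name for name in objects if name > marker]
    return after[:max_keys], len(after) > max_keys

print(len(list_all_keys(fake_fetch)))  # 2500
```

The same loop shape works with boto's `marker` or boto3's `ContinuationToken`; only the page-fetch call changes.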
Thanks
Romit Misra
On Thu, Oct 17, 2019 at 4:18 PM <ceph-users-request(a)ceph.io> wrote:
> Send ceph-users mailing list submissions to
> ceph-users(a)ceph.io
>
> To subscribe or unsubscribe via email, send a message with subject or
> body 'help' to
> ceph-users-request(a)ceph.io
>
> You can reach the person managing the list at
> ceph-users-owner(a)ceph.io
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of ceph-users digest..."
>
> Today's Topics:
>
> 1. RadosGW cant list objects when there are too many of them
> (Arash Shams)
> 2. Re: Recovering from a Failed Disk (replication 1) (Burkhard Linke)
> 3. Re: RGW blocking on large objects (Paul Emmerich)
> 4. Re: RadosGW cant list objects when there are too many of them
> (Paul Emmerich)
> 5. Re: Recovering from a Failed Disk (replication 1) (Frank Schilder)
>
>
> ----------------------------------------------------------------------
>
> Date: Thu, 17 Oct 2019 07:19:12 +0000
> From: Arash Shams <ara4sh(a)hotmail.com>
> Subject: [ceph-users] RadosGW cant list objects when there are too
> many of them
> To: "ceph-users(a)ceph.io" <ceph-users(a)ceph.io>
>
> Dear All
>
> I have a bucket with 5 million objects and I can't list objects with
> radosgw-admin bucket list --bucket=bucket | jq .[].name
> or list files using boto3:
>
> s3 = boto3.client('s3',
>                   endpoint_url=credentials['endpoint_url'],
>                   aws_access_key_id=credentials['access_key'],
>                   aws_secret_access_key=credentials['secret_key'])
>
> response = s3.list_objects_v2(Bucket=bucket_name)
> for item in response['Contents']:
>     print(item['Key'])
>
> What is the solution? How can I get a list of my objects?
>
>
>
>
>
> ------------------------------
>
> Date: Thu, 17 Oct 2019 10:18:11 +0200
> From: Burkhard Linke <Burkhard.Linke(a)computational.bio.uni-giessen.de>
> Subject: [ceph-users] Re: Recovering from a Failed Disk (replication
> 1)
> To: ceph-users(a)ceph.io
>
> Hi,
>
>
> On 10/17/19 5:56 AM, Ashley Merrick wrote:
> > I think you're better off doing the dd method; you can export and import
> > a PG at a time (ceph-objectstore-tool).
> >
> > But if the disk is failing, dd is probably your best method.
>
>
> In case of hardware problems or broken sectors, I would recommend
> 'dd_rescue' instead of dd. It can handle broken sectors, automatic
> retries, skipping etc.
>
>
> You will definitely need a second disk to rescue to.
>
>
> Regards,
>
> Burkhard
>
>
>
>
> ------------------------------
>
> Date: Thu, 17 Oct 2019 11:50:37 +0200
> From: Paul Emmerich <paul.emmerich(a)croit.io>
> Subject: [ceph-users] Re: RGW blocking on large objects
> To: Robert LeBlanc <robert(a)leblancnet.us>
> Cc: ceph-users <ceph-users(a)ceph.io>
>
> On Thu, Oct 17, 2019 at 12:17 AM Robert LeBlanc <robert(a)leblancnet.us>
> wrote:
> >
> > On Wed, Oct 16, 2019 at 2:50 PM Paul Emmerich <paul.emmerich(a)croit.io>
> wrote:
> > >
> > > On Wed, Oct 16, 2019 at 11:23 PM Robert LeBlanc <robert(a)leblancnet.us>
> wrote:
> > > >
> > > > On Tue, Oct 15, 2019 at 8:05 AM Robert LeBlanc <robert(a)leblancnet.us>
> wrote:
> > > > >
> > > > > On Mon, Oct 14, 2019 at 2:58 PM Paul Emmerich <
> paul.emmerich(a)croit.io> wrote:
> > > > > >
> > > > > > Could the 4 GB GET limit saturate the connection from rgw to
> Ceph?
> > > > > > Simple to test: just rate-limit the health check GET
> > > > >
> > > > > I don't think so, we have dual 25Gbp in a LAG, so Ceph to RGW has
> > > > > multiple paths, but we aren't balancing on port yet, so RGW to
> HAProxy
> > > > > is probably limited to one link.
> > > > >
> > > > > > Did you increase "objecter inflight ops" and "objecter inflight
> op bytes"?
> > > > > > You absolutely should adjust these settings for large RGW setups,
> > > > > > defaults of 1024 and 100 MB are way too low for many RGW setups,
> we
> > > > > > default to 8192 and 800MB
> > > >
> > > > On Nautilus the defaults already seem to be:
> > > > objecter_inflight_op_bytes 104857600 (default, = 100 MiB)
> > > > objecter_inflight_ops 24576 (default)
> > >
> > > not sure where you got this from, but the default is still 1024 even
> > > in master:
> https://github.com/ceph/ceph/blob/4774808cb2923f65f6919fe8be5f98917075cdd7/…
> >
> > Looks like it is overridden in
> >
> https://github.com/ceph/ceph/blob/4774808cb2923f65f6919fe8be5f98917075cdd7/…
>
> you are right, this is new in Nautilus. Last time I had to play around
> with these settings was indeed on a Mimic deployment.
>
> > I'm just not
> > understanding how your suggestions would help, the problem doesn't
> > seem to be on the RADOS side (which it appears your tweaks target),
> > but on the HTTP side as an HTTP health check takes a long time to come
> > back when a big transfer is going on.
>
> I was guessing a bottleneck on the RADOS side because you mentioned
> that you tried both civetweb and beast; it's somewhat unlikely to run
> into the exact same problem with both.
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> ------------------------------
>
> Date: Thu, 17 Oct 2019 12:00:20 +0200
> From: Paul Emmerich <paul.emmerich(a)croit.io>
> Subject: [ceph-users] Re: RadosGW cant list objects when there are too
> many of them
> To: Arash Shams <ara4sh(a)hotmail.com>
> Cc: "ceph-users(a)ceph.io" <ceph-users(a)ceph.io>
>
> Listing large buckets is slow due to S3 ordering requirements; it's
> approximately O(n^2).
> However, I wouldn't consider 5M a large bucket; it should map to
> only ~50 shards, which should still perform reasonably. How fast are
> your metadata OSDs?
>
> Try --allow-unordered in radosgw-admin to get an unordered result
> which is only O(n) as you'd expect.
>
> For boto3: I'm not sure if v2 object listing is available yet (I think
> it has only been merged into master but has not yet made it into a
> release?). It doesn't support unordered listing but there has been
> some work to implement it there, not sure about the current state.
>
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Thu, Oct 17, 2019 at 9:19 AM Arash Shams <ara4sh(a)hotmail.com> wrote:
> >
> > Dear All
> >
> > I have a bucket with 5 million Objects and I cant list objects with
> > radosgw-admin bucket list --bucket=bucket | jq .[].name
> > or listing files using boto3
> >
> > s3 = boto3.client('s3',
> > endpoint_url=credentials['endpoint_url'],
> > aws_access_key_id=credentials['access_key'],
> > aws_secret_access_key=credentials['secret_key'])
> >
> > response = s3.list_objects_v2(Bucket=bucket_name)
> > for item in response['Contents']:
> > print(item['Key'])
> >
> > what is the solution ? how can I find list of my objects ?
> >
> >
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users(a)ceph.io
> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
> ------------------------------
>
> Date: Thu, 17 Oct 2019 10:46:53 +0000
> From: Frank Schilder <frans(a)dtu.dk>
> Subject: [ceph-users] Re: Recovering from a Failed Disk (replication
> 1)
> To: vladimir franciz blando <vladimir.blando(a)gmail.com>,
> "ceph-users(a)ceph.io" <ceph-users(a)ceph.io>
>
> You probably need to attempt a physical data rescue. Data access will be
> lost until done.
>
> First thing is shut down the OSD to avoid any further damage to the disk.
> Second thing is to try ddrescue, repair data on a copy if possible and
> then create a clone on a new disk from the copy.
> If this doesn't help and you really need that last bit of data, you might
> need support from one of those companies that restore disk data with
> electron microscopy.
>
> I successfully transferred OSDs between disks using ddrescue.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: vladimir franciz blando <vladimir.blando(a)gmail.com>
> Sent: 17 October 2019 05:29:13
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] Recovering from a Failed Disk (replication 1)
>
> Hi,
>
> I have a not-ideal setup on one of my clusters: 3 Ceph nodes, but using
> replication 1 on all pools (don't ask me why replication 1, it's a long
> story).
>
> So it has come to this: a disk keeps crashing, possibly a
> hardware failure, and I need to recover from that.
>
> What's my best option to recover the data from the failed disk and
> transfer it to the other healthy disks?
>
> This cluster is using Firefly
>
> - Vlad
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
>
> ------------------------------
>
> End of ceph-users Digest, Vol 81, Issue 39
> ******************************************
>
Hi,
I have a not-ideal setup on one of my clusters: 3 Ceph nodes, but using
replication 1 on all pools (don't ask me why replication 1, it's a long
story).
So it has come to this: a disk keeps crashing, possibly a hardware
failure, and I need to recover from that.
What's my best option to recover the data from the failed disk and
transfer it to the other healthy disks?
This cluster is using Firefly.
- Vlad
Dear All
I have a bucket with 5 million objects and I can't list objects with
radosgw-admin bucket list --bucket=bucket | jq .[].name
or list files using boto3:
s3 = boto3.client('s3',
endpoint_url=credentials['endpoint_url'],
aws_access_key_id=credentials['access_key'],
aws_secret_access_key=credentials['secret_key'])
response = s3.list_objects_v2(Bucket=bucket_name)
for item in response['Contents']:
print(item['Key'])
What is the solution? How can I get a list of my objects?
Good day
I'm currently administering a Ceph cluster that consists of HDDs and
SSDs. The rule for cephfs_data (EC) writes to both of these device
classes (HDD+SSD). I would like to change it so that cephfs_metadata
(non-EC) writes to SSD and cephfs_data (erasure-coded, "EC") writes to
HDD, since we're experiencing high disk latency.
1) The first option that comes to mind would be to migrate each pool to a
new rule, but this would mean moving a tonne of data around. (How is disk
space accounted for here: if I use 600 TB in an EC pool, do I need another
600 TB to move it over, or does the existing pool shrink as the new pool
grows while moving?)
2) I would like to know if the alternative is possible,
i.e. delete the SSDs from the default host bucket (leaving everything else
as it is) and move the metadata pool to the SSD-based crush rule.
However, I'm not sure if this is possible, as it would mean deleting a leaf
from a bucket in our default root. Which raises the question: when you add
a new SSD OSD, where does it end up?
crush map - http://pastefile.fr/6f37e7e594a61d0edd9dc947349c756b
ceph osd pool ls detail -
http://pastefile.fr/0f215e1252ec58c144d9abfe1688adc8
osd tree - http://pastefile.fr/2acdd377a2db021b6af2996929b85082
If anyone has any input it would be greatly appreciated.
Regards
--
Jeremi-Ernst Avenant, Mr.
Cloud Infrastructure Specialist
Inter-University Institute for Data Intensive Astronomy
5th Floor, Department of Physics and Astronomy,
University of Cape Town
Tel: 021 959 4137
Web: www.idia.ac.za
E-mail (IDIA): jeremi(a)idia.ac.za
Rondebosch, Cape Town, 7600
I've been searching around trying to learn about this, but it doesn't
seem to be an index sharding problem, so I'm not sure how to approach
it, and I'm still new to RGW.
This is what is in the cluster logs:
Large omap object found. Object:
7:f2aa93d5:::.dir.b614530a-c47b-4109-a817-f90c1165bd50.204287.1:head
Key count: 3626157 Size (bytes): 912540654
We primarily have one bucket, and it currently has 64 shards, so the
auto-sharding seems to be working right. The '.dir' component of the
object name seems to indicate that it's not an index. Please help me
understand what is going on here and what the best course of action
would be.
Thank you,
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
I have inherited a cluster where about 30% of the OSDs in a pool are 7200
RPM SAS. The other 70% are 7200 RPM SATA.
Should I look into creating 2 pools, or will this likely not be a huge deal?