PerlMonks  

[Ceph::RADOS] Help Debugging Inline C

by three18ti (Scribe)
on Nov 07, 2013 at 00:44 UTC (#1061509=perlquestion)
three18ti has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I'm using Ceph for storage in my virtualization cluster and I would like to be able to control it programmatically from a Perl script. I looked at using SWIG or h2xs to generate a wrapper, but it seems I require more intimate knowledge of the source library than I currently possess to make effective use of those tools.

After a bit of googling, I did come across a thread where a Ceph module, Ceph::RADOS, was started.

This module seems to work great except for the list_pools function which causes a segfault.

The relevant code from the module is here:

sub list_pools {
    my $self = shift;
    my @pools = list_pools_c($self->{conn});
    return \@pools;
}

__C__

void list_pools_c (rados_t clu) {
    int buf_sz = rados_pool_list(clu,NULL,0);
    char buf[buf_sz];
    int r = rados_pool_list(clu,buf,buf_sz);
    if (r != buf_sz) {
        printf("buffer size mismatch: got %d the first time, but %d "
               "the second.\n", buf_sz, r);
    }
    Inline_Stack_Vars;
    Inline_Stack_Reset;
    const char *b = buf;
    while(1) {
        if(b[0] == '\0') {
            Inline_Stack_Done;
            break;
        }
        Inline_Stack_Push(sv_2mortal(newSVpv(b,0)));
        b += strlen(b) +1;
    }
}

We can observe this behavior when running strace on the testrados2.pl script that comes with attachment 2 (I added a print "Testing list_pools\n" before the actual call to $c->list_pools to make debugging easier):

write(1, "Testing list_pools\n", 19Testing list_pools
) = 19
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7fffd5dd5000} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

Running this code through gdb I get a possibly more helpful error (gdb; set args testrados2.pl; run):

Testing list_pools

Program received signal SIGSEGV, Segmentation fault.
__memset_sse2 () at ../sysdeps/x86_64/multiarch/../memset.S:913
913 ../sysdeps/x86_64/multiarch/../memset.S: No such file or directory.

But googling that error has not provided me any assistance.

I think this error is coming from the C function "list_pools_c" because Perl doesn't (that I've encountered) emit stack traces under "normal" circumstances.

I was hoping someone could help me debug the error as I'm not really sure where to go from here. I've asked a similar question on the ceph-users mailing list, but I've only gotten crickets.

Also, any words of wisdom on extending C libraries would be greatly appreciated. I've read perlxstut and related documentation. My C is a bit rusty and I'm attempting to extend someone else's library, so I get the feeling that this project may take a bit of work.

Thanks Monks!

Edits: to fix links to Ceph mailing list and Ceph::RADOS package

Re: [Ceph::RADOS] Help Debugging Inline C
by taint (Chaplain) on Nov 07, 2013 at 02:52 UTC
    Greetings,
    While I know almost nothing of the application you're working with, I do have a couple of thoughts on coercing more information out of your crashes/segfaults.
    It would be helpful to know what system you're running on.
    If I were forced to guess, I'd guess some version of Linux. That said, do you have the actual core file to examine?

    --Chris

    #!/usr/bin/perl -Tw
    use perl::always;
    my $perl_version = (5.12.5);
    print $perl_version;

      Hello Chris

      Thanks for your response.

      Here's the output of uname on this machine (Ubuntu 13.10, x86_64, perl 5.14.2):

      root@kitt:~/Ceph-RADOS-0.01# uname -a
      Linux kitt 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
      root@kitt:~/Ceph-RADOS-0.01# perl -v
      This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
      root@kitt:~/Ceph-RADOS-0.01# cat /etc/*release*
      DISTRIB_ID=Ubuntu
      DISTRIB_RELEASE=13.10
      DISTRIB_CODENAME=saucy
      DISTRIB_DESCRIPTION="Ubuntu 13.10"
      NAME="Ubuntu"
      VERSION="13.10, Saucy Salamander"
      ID=ubuntu
      ID_LIKE=debian
      PRETTY_NAME="Ubuntu 13.10"
      VERSION_ID="13.10"

      I'm open to any ideas that help me suss out this problem. Thanks!

      ----------------------------------

      <signature>
      root@kitt:~# perl -Mperl::always\ 9999 -e 1
      Can't locate perl/always.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.14.2 /usr/local/share/perl/5.14.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.14 /usr/share/perl/5.14 /usr/local/lib/site_perl .).
      BEGIN failed--compilation aborted.
      Just sayin' ;)
      </signature>
Re: [Ceph::RADOS] Help Debugging Inline C
by syphilis (Canon) on Nov 07, 2013 at 04:04 UTC
    int buf_sz = rados_pool_list(clu,NULL,0);
    char buf[buf_sz];

    With some compilers, that won't assign buf_sz chars to buf. I would expect that, on such compilers, that piece of code would cause an error - and the module would not compile.
    However, if it didn't throw a compile-time exception, then you would most likely experience a runtime segfault.

    It's safer, IMO, to assign the memory dynamically - something like (UNTESTED):
    use Inline C => Config =>
        BUILD_NOISY => 1;

    __C__

    void list_pools_c (rados_t clu) {
        Inline_Stack_Vars;
        int buf_sz = rados_pool_list(clu,NULL,0);
        char *b;
        int r;
        Newx(b, buf_sz, char);
        r = rados_pool_list(clu,b,buf_sz);
        if (r != buf_sz) {
            printf("buffer size mismatch: got %d the first time, but %d "
                   "the second.\n", buf_sz, r);
        }
        Inline_Stack_Reset;
        while(1) {
            if(b[0] == '\0') {
                Inline_Stack_Done;
                Safefree(b);
                break;
            }
            Inline_Stack_Push(sv_2mortal(newSVpv(b,0)));
            b += strlen(b) +1;
        }
    }
    I've moved the Inline_Stack_Vars to the top as is my usual practice - though I don't think that matters here.
    And I've also added the BUILD_NOISY option so that any warnings from the compilation of the C code can be seen. (For all I know, the module might already have turned BUILD_NOISY on, in which case my reiteration of it can be removed.)
    Not sure if any of that will help - if it doesn't you could try inserting printf("Got to A\n"); statements into the C code in order to locate the segfault more precisely.

    Cheers,
    Rob

      Hey Rob,

      Well... I'm not getting a segfault anymore! :)

      Now I get "Out of memory"...

      root@kitt:~/Ceph-RADOS-0.01# perl testrados2.pl
      connected
      Kbytes: 4880394048
      Kbytes used: 1327705520
      Kbytes avail: 3552688528
      Objects: 164563
      error(17) in create_pool:File exists
      create pool this_test_pool success
      Testing list_pools
      Out of memory!

      I'm not sure how to enable build_noisy, but this is what my use Inline::C declaration looks like:

      use Inline C => 'DATA',
          VERSION => '0.01',
          NAME => 'Ceph::RADOS',
          LIBS => '-L/usr/lib -lrados',
          INC => '-I/usr/include/rados',
          TYPEMAPS => 'lib/Ceph/types',
          BUILD_NOISY => 1;

      I think that's what you were getting at.

      Anyway, this is a giant stack trace, so I'm not sure if a) any of it's relevant and b) how to share it effectively. (I guess I can just paste it in a code block...)

      Also, the gdb session is relatively useless:

      Starting program: /usr/bin/perl testrados2.pl
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      [New Thread 0x7ffff3901700 (LWP 9280)]
      [New Thread 0x7ffff298e700 (LWP 9281)]
      [New Thread 0x7ffff218d700 (LWP 9282)]
      [New Thread 0x7ffff198c700 (LWP 9283)]
      [New Thread 0x7ffff118b700 (LWP 9284)]
      [New Thread 0x7ffff098a700 (LWP 9285)]
      [New Thread 0x7ffff7fe7700 (LWP 9286)]
      [New Thread 0x7ffff0189700 (LWP 9289)]
      [New Thread 0x7fffe3fff700 (LWP 9290)]
      [New Thread 0x7fffe37fe700 (LWP 9291)]
      connected
      Kbytes: 4880394048
      Kbytes used: 1327705520
      Kbytes avail: 3552688528
      Objects: 164563
      error(17) in create_pool:File exists
      create pool this_test_pool success
      Testing list_pools
      Out of memory!
      [Thread 0x7ffff7fe7700 (LWP 9286) exited]
      [Thread 0x7fffe37fe700 (LWP 9291) exited]
      [Thread 0x7fffe3fff700 (LWP 9290) exited]
      [Thread 0x7ffff0189700 (LWP 9289) exited]
      [Thread 0x7ffff098a700 (LWP 9285) exited]
      [Thread 0x7ffff118b700 (LWP 9284) exited]
      [Thread 0x7ffff198c700 (LWP 9283) exited]
      [Thread 0x7ffff218d700 (LWP 9282) exited]
      [Thread 0x7ffff298e700 (LWP 9281) exited]
      [Thread 0x7ffff3901700 (LWP 9280) exited]
      [Inferior 1 (process 9276) exited with code 01]

      Thanks for your help! At least I'm getting a different error. We're making progress! :)

        Greetings,
        Thanks for the additional info.

        While still groping in the dark a bit...
        BUILD_NOISY || VERBOSE || DEBUG = 1 || TRUE;
        Just thinking out loud.

        "Out of memory!"
        Any chance you can flush some, or all of this data to disk, and use it there? That would at least prevent memory exhaustion.
        Hmm...

        connected
        Kbytes: 4880394048
        Kbytes used: 1327705520
        Kbytes avail: 3552688528
        Objects: 164563
        error(17) in create_pool:File exists
        create pool this_test_pool success
        Testing list_pools
        Out of memory!

        error(17) in create_pool:File exists

        Is it possible that File is/has been created too early? This might mean that the buffers/pools are filling up, causing the memory exhaustion, as they are expecting File, which can't be operated on because it can't be re-created.
        Best guess given the data I have available.

        HTH

        --Chris

        #!/usr/bin/perl -Tw
        use perl::always;
        my $perl_version = (5.12.5);
        print $perl_version;
Re: [Ceph::RADOS] Help Debugging Inline C
by oiskuu (Friar) on Nov 07, 2013 at 19:01 UTC
    First, continue with your posted C. Using Newx (a malloc) will just leak the memory.
    Second: put a debugging fprintf(stderr, "rados_pool_list()=%d\n", buf_sz); after the first rados_pool_list() call.

    What does it print? A reasonable positive size? (I assume your C compiler is C99 capable; if not, you can rewrite the char buf[buf_sz]; as char *buf = alloca(buf_sz);).

    Now, if rados_pool_list() returned a negative (error) value, try passing it "" instead of NULL.
    A quick glance at librados.cc makes me think there may be a bug there.

    Oooomh, a quick update: another thing you should do is replace

      int buf_sz;
    
    with a
      size_t buf_sz;
    
    This, unfortunately, will not improve the situation, since rados_pool_list() is prototyped as int, where it really ought to be ssize_t rados_pool_list(). File a ticket with the project?
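    The advice above (check the first return value before trusting it as a size) can be sketched roughly as follows. This is only an illustration: fake_pool_list is a hypothetical stand-in that mimics an int-returning "report the size, then fill the buffer" call, it is not librados, and fetch_pool_list and simulate_error are made-up names. Plain malloc/free are used so the sketch compiles without Perl headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a size-query style API (NOT librados).
 * Returns the number of bytes needed, or a negative errno-style value. */
static int fake_pool_list(int simulate_error, char *buf, size_t len)
{
    static const char names[] = "rbd\0data\0";  /* 10 bytes incl. final NUL */
    if (simulate_error)
        return -EINVAL;
    if (buf != NULL && len >= sizeof names)
        memcpy(buf, names, sizeof names);
    return (int)sizeof names;
}

/* Returns a malloc'd copy of the list (caller frees), or NULL on error.
 * The size is validated while it is still a signed int, before it is
 * ever used as an allocation length. */
static char *fetch_pool_list(int simulate_error, int *out_len)
{
    assert(out_len != NULL);

    int need = fake_pool_list(simulate_error, NULL, 0);
    if (need <= 0)
        return NULL;                 /* error or empty: nothing to allocate */

    char *buf = malloc((size_t)need);
    if (buf == NULL)
        return NULL;

    int got = fake_pool_list(simulate_error, buf, (size_t)need);
    if (got != need) {               /* size changed between calls, or error */
        free(buf);
        return NULL;
    }
    *out_len = got;
    return buf;
}
```

    In the Inline C function the same check would sit right after the first rados_pool_list() call, before buf_sz is used to size anything.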

      Using Newx (a malloc) will just leak the memory

      How would that happen ? Surely the allocated memory is being freed (Safefree(b)) before the function returns ?

      Cheers,
      Rob
        You are trying to free b which points somewhere into the allocated buffer. I hope you do make use of malloc debuggers :-)
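        A minimal sketch of that point, using plain malloc/free rather than Newx/Safefree so that it compiles on its own. The buffer layout (names separated by NUL, ended by an empty string) follows the code posted in this thread; count_names and its parameters are hypothetical names used only for illustration. The walk advances a separate cursor, so the pointer handed to free() is still the one the allocator returned:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies a NUL-separated name list into a heap
 * buffer, counts the names, and frees the buffer.
 * Returns the number of names, or -1 if allocation fails. */
static int count_names(const char *packed, size_t len)
{
    assert(packed != NULL && len >= 1);

    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memcpy(buf, packed, len);
    buf[len - 1] = '\0';            /* guarantee strlen() stays in bounds */

    const char *end = buf + len;
    const char *cur = buf;          /* the cursor moves; buf stays put */
    int count = 0;
    while (cur < end && *cur != '\0') {
        count++;
        cur += strlen(cur) + 1;     /* skip the name and its NUL */
    }

    free(buf);                      /* the original pointer, not cur */
    return count;
}
```

        With Newx the equivalent would be to keep the pointer returned by Newx in one variable, iterate with a second one, and pass the first to Safefree.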

        Also, if rados_pool_list() returned negative, you'd be allocating 4GB chunks or something equally bogus (not sure what casts and/or checks are involved).
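        To illustrate why a negative return turns into a huge allocation request: converting a negative int to an unsigned size type wraps around. The helper name as_alloc_size is hypothetical, and whether Newx ends up seeing exactly this value depends on casts inside Perl that were not examined here. With a 64-bit size_t the result is far larger than 4GB; a 4GB figure would correspond to a 32-bit unsigned type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts an int return value to the size_t an allocator would see.
 * The conversion is well-defined in C: the value wraps modulo
 * SIZE_MAX + 1, so -1 becomes SIZE_MAX. */
static size_t as_alloc_size(int ret)
{
    return (size_t)ret;
}
```

        A request of that size cannot be satisfied, which would be consistent with the "Out of memory!" message seen earlier in the thread.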

        Like I said, the rados code seems to be in flux. Take a look at librados.cc. Looks messy to me.

      Awesome! Thanks for this.

      The first thing I did was add the debugging string, which returned a negative number. Thanks for pointing out where in the original code the negative number was coming from.

      Ultimately, replacing the NULL with "" did the trick and I no longer get a segfault!

      Any thoughts on moving Inline_Stack_Vars to the beginning of the sub definition?

      That sounds like something that would be worth filing a bug report about. What is your reasoning behind the prototyping?

      Thanks for your help. If you have any more, general advice on writing a Perl wrapper around a C API I'd love to hear it. This is my first foray into trying to extend a C library so I've got a bit of learning to do :)

      Thanks again for your help!

        Regarding rados library and the rados_pool_list() — seeing that code instantly reminded me of snprintf.

        The snprintf/vsnprintf functions return the number of chars that would have been written given a large enough buffer. The length argument is size_t, whereas the return type is int... (Historical stdio matter?) SUSv2 and C99 are in disagreement concerning a snprintf call with len==0, with C99 allowing buf==NULL in that case.

        The ssize_t type allows a -1 error return where a size_t would otherwise be suitable. Like e.g. read() or pwrite.
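        For comparison, the C99 snprintf sizing idiom mentioned above looks roughly like this. The function name format_pool_name and the "pool-%d" format are made up for the example; the pattern itself (query with NULL and 0, check for a negative result, then allocate and format) is standard C99 behaviour.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats "pool-<id>" into a freshly malloc'd string (caller frees).
 * Returns NULL on formatting or allocation failure. */
static char *format_pool_name(int id)
{
    /* C99: with a NULL buffer and size 0, snprintf only reports the
     * length that would have been written (excluding the NUL). */
    int need = snprintf(NULL, 0, "pool-%d", id);
    if (need < 0)
        return NULL;                      /* int return doubles as error */

    char *s = malloc((size_t)need + 1);   /* +1 for the terminating NUL */
    if (s == NULL)
        return NULL;

    int wrote = snprintf(s, (size_t)need + 1, "pool-%d", id);
    assert(wrote == need);
    return s;
}
```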

        Hi, I am looking forward to the "perl library" for RADOS. Could you share the code with me? Thanks.
Re: [Ceph::RADOS] Help Debugging Inline C
by mlsorensen (Initiate) on May 21, 2014 at 05:52 UTC
    Hi guys. I just did a search and noticed people were actually using the proof of concept code I dumped. I have committed it into a github repo: https://github.com/mlsorensen/perl-Ceph-Rados And fixed the list pools as mentioned. It still needs some work, but github is a better place to collaborate if people are improving it.
