PerlMonks  

Mail::Box fails miserably when trying to open 30_000 messages maildir

by monsieur_champs (Curate)
on Jun 26, 2005 at 02:52 UTC
monsieur_champs has asked for the wisdom of the Perl Monks concerning the following question:

Beginnings

I'm working for a client who asked me to build something he can use to inspect a catch-all mailbox on his ISP's Linux box. My client is an organized and clever half-techie who understands little about Perl and a lot about Linux. He is using the Debian stable distribution, with Perl 5.6.1 and a bunch of libraries that I asked him to install for my use.

Mail::Box

I chose Mail::Box because of its stability, its powerful control, and its broad range of supported formats.

During development, everything went fine: Mail::Box::Manager uses Mail::Box::Maildir and Mail::Box::Message to create an ideal world where everything works as expected. Messages go back and forth, and I can see and handle all the requirements.

Production Hell

Things change a lot in production (I have little access to production, so please take it easy! This is a business requirement from my client). The same program that works in my development environment, and performs quite well there, fails miserably when facing the 30_000 (yes, that's four zeros on the right-hand side) messages in a single maildir folder. My main problem is that I don't get a formal failure: Mail::Box's open() just returns, telling everybody that there are no messages in this maildir folder (?!?!). I'm really confused about this error and can't figure out a good way to tell whether I'm missing something really important or just need a good afternoon of debugging Perl internals.

I wrote the following code to try to expose the failure. I hope I've set all error reporting to the maximum noise level. Comments and related cases are welcome. My client will run the tests in one or two days, and I will have more information to complement this post then.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;
use Mail::Box::Manager;

my $options = {};
GetOptions(
    'mail-folder=s'  => \$options->{folder},
    'dump-subject=s' => \$options->{dumpfile},
);

pod2usage(
    -message => "$0: syntax error: pay attention!\n\n",
    -exitval => 1,
    -verbose => 1,          # Give "Synopsis" and "Arguments"
    -output  => \*STDERR,   # Pod::Usage's option name is -output
) unless(
    ( $options->{folder} and -d $options->{folder} )
    # or
    # ( $options->{dumpfile} and -f $options->{dumpfile} )
);

my $manager = Mail::Box::Manager->new;
my $folder;
eval {
    $folder = $manager->open(
        folder => $options->{folder},
        create => 0,
        access => 'r',
        type   => 'maildir',
        expand => 'LAZY',
        log    => 'DEBUG',   # adds a lot of noise
        trace  => 'DEBUG',   # adds a lot of noise
    );
};
die "Error opening maildir [$options->{folder}]: '$@'\n\n" if $@;
# open() can also return undef without dying, so check for that too.
die "open() returned no folder for [$options->{folder}]\n\n"
    unless defined $folder;

print qq{Folder [} . $folder->name . qq{] opened with [}
    . scalar @$folder . qq{] messages.\n\n};

if( $options->{dumpfile} ){
    open my $dump, '>', $options->{dumpfile}
        or pod2usage(
            -message => qq{Can't create dumpfile: $!\n\n},
            -exitval => 2,
            -verbose => 1,
            -output  => \*STDERR,
        );
    print {$dump} $_->subject(), "\n" foreach @$folder;
    close $dump
        or die qq{Can't close(!?!) dumpfile $options->{dumpfile}\n\n};
} # fi

eval { $folder->close };
die "Error closing maildir [" . $folder->name . "]: '$@'.\n\n" if $@;

__END__

=pod

=head1 NAME

mail-box-test - Simple test to see if Mail::Box::Maildir is working correctly.

=head1 SYNOPSIS

  perl mail-box-test --mail-folder='/path/to/mail/dir/' [--dump-subject=/path/to/subjects.txt]

=head1 ARGUMENTS

=over 4

=item --mail-folder <FOLDER>

Points to the maildir you want to use for testing. No maildir means test
failure, so please choose a maildir to test.

=item --dump-subject <DUMPFILE>

Force dumping of the subject lines of a maildir to a specified file on disk.

=back

=head1 DESCRIPTION

This script tests Mail::Box resource usage under Linux for a client.
I'm facing a funny problem when trying to open maildirs with more than
21,000 messages in them: Mail::Box tells me it's opening the maildir
correctly, but no messages are found inside it.

=head1 AUTHOR

Luis Campos de Carvalho, a.k.a. Monsieur Champs.
mailto: monsieur_champs [at] yahoo [dot] com [dot] br

=cut
```

Re: Mail::Box fails miserably when trying to open 30_000 messages maildir
by fglock (Vicar) on Jun 26, 2005 at 05:53 UTC

    "30000" smells like "signed 16-bit overflow". This problem may even be in an underlying system library (because Mail::Box doesn't use XS).

    I'd start by investigating which mail box format the production environment uses.

      Production uses maildir only.

      Sorry, fglock, but I can't see the point. Why does this smell like an overflow? The file system is out of the question: 30_000 files is a lot, but not really a problem...

        Well, you know what happens when you presume, right? You make a pre of su and me.

        There are some numbers in computing which are boundaries and can cause problems (sort of like that whole Y2K issue).

        Near 30,000 is the number 32,768, which is 2**15. Now, you'd think to yourself, but wouldn't there be problems at 2**16, which is a nice round number in computer terms?

        Well, no, because an integer of (x) bits, if it's signed, ranges from -(2**(x-1)) to 2**(x-1)-1. So, for a 16-bit number, it goes from -32768 to 32767. If the module in question uses XS (compiled C code), it's possible that it was compiled with a 16-bit signed number in there, which will have problems if you try to deal with numbers greater than 32,767.

        If the number is 30,000 or less, this probably isn't the issue. If it's over 32,767, this could be a problem.

        From looking at the docs for Mail::Box, however, it looks to be pure perl, so I don't think this is the issue in this case.
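As an aside, the wrap-around described above is easy to see from pure Perl. The sketch below pushes a few values through pack/unpack with the 's' template (a signed 16-bit integer, like a C short); the helper name as_int16 is made up for this illustration. It only demonstrates the arithmetic and says nothing about where, or whether, Mail::Box or a system library actually truncates a count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: squeeze a number through a signed 16-bit integer
# ('s' in pack/unpack), the way a C 'short' would store it.
sub as_int16 {
    my ($n) = @_;
    return unpack 's', pack 's', $n;
}

for my $n (30_000, 32_767, 32_768, 40_000) {
    printf "%6d stored as int16 -> %6d\n", $n, as_int16($n);
}
```

On a two's-complement machine, 30000 and 32767 should come back unchanged, while 32768 comes back as -32768 and 40000 as -25536.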

        Still, it could be reasonable to write a simple test script that opens that directory and just counts the entries:
```perl
opendir DIR, "that-dir" or die "Can't opendir that-dir: $!\n";
my @files = readdir DIR;    # or whatever method Mail::Box uses
print "count is ", scalar(@files), "\n";
```
        Just to check whether Perl's readdir is out of the question on your current build (with the given libc and so on).
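One caveat with counting the top-level directory: a maildir keeps its messages in the new/ and cur/ subdirectories (plus tmp/ for deliveries in progress), so readdir on the folder itself mostly shows those subdirectories plus '.' and '..'. A maildir-aware variant of the same check might look like this; the script layout and the count_maildir name are assumptions for this sketch, not anything Mail::Box provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the message files in a maildir's new/ and
# cur/ subdirectories, skipping '.', '..' and other dotfiles.
# Returns (new_count, cur_count).
sub count_maildir {
    my ($maildir) = @_;
    my @counts;
    for my $sub (qw(new cur)) {
        my $dir = "$maildir/$sub";
        opendir my $dh, $dir or die "Can't opendir $dir: $!\n";
        push @counts, scalar grep { !/^\./ && -f "$dir/$_" } readdir $dh;
        closedir $dh;
    }
    return @counts;
}

if (@ARGV) {
    my ($new, $cur) = count_maildir($ARGV[0]);
    printf "new: %d  cur: %d  total: %d\n", $new, $cur, $new + $cur;
}
```

Run it with the maildir path as its only argument and compare the total with the message count Mail::Box reports for the same folder.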
Re: Mail::Box fails miserably when trying to open 30_000 messages maildir
by xdg (Monsignor) on Jun 26, 2005 at 14:58 UTC

    I strongly suggest joining the Mail::Box mailing list (see perl.overmeer.net) and posting your issue to the list -- the module author, Mark Overmeer, and many knowledgeable users are pretty quick to respond.

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: Mail::Box fails miserably when trying to open 30_000 messages maildir
by TedPride (Priest) on Jun 26, 2005 at 20:45 UTC
    Why not just write a small routine that checks whether the folder holds more than x emails and, if so, moves the excess to a series of secondary folders, each of which can then be loaded without problems? I assume your objective is to filter the catch-all email and move the good messages to a different place, so temporarily splitting the catch-all into several locations probably won't cause trouble.

    This would be classified as the "easy" way out. I know programmers love to do things the hard way.
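A rough sketch of that workaround, assuming the catch-all is a single maildir whose messages sit in cur/ and that the batch folders can live next to it on the same filesystem (so rename() is a cheap move). The split_maildir helper and the .partN naming scheme are hypothetical, chosen only for illustration; messages in new/ are left untouched. Try it on a copy of the mailbox first.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(mkpath);

# Hypothetical helper: leave the first $max messages of $src/cur in place
# and move the rest, $max at a time, into new maildirs "$src.part1",
# "$src.part2", ... Returns the number of part folders created.
# Assumes everything is on one filesystem, so rename() moves the file.
sub split_maildir {
    my ($src, $max) = @_;
    opendir my $dh, "$src/cur" or die "Can't opendir $src/cur: $!\n";
    # Maildir file names usually start with a timestamp, so a plain
    # string sort is roughly oldest-first.
    my @msgs = sort { $a cmp $b } grep { !/^\./ } readdir $dh;
    closedir $dh;

    splice @msgs, 0, $max;    # these stay in the original folder
    my $part = 0;
    while (my @batch = splice @msgs, 0, $max) {
        my $dst = "$src.part" . ++$part;
        mkpath([ "$dst/cur", "$dst/new", "$dst/tmp" ]);
        for my $file (@batch) {
            rename "$src/cur/$file", "$dst/cur/$file"
                or die "Can't move $file to $dst: $!\n";
        }
    }
    return $part;
}

if (@ARGV == 2) {
    my $parts = split_maildir(@ARGV);
    print "created $parts part folder(s)\n";
}
```

For example, passing /path/to/Maildir and 10000 would keep the first 10,000 messages (in file-name order) in place and move the rest into /path/to/Maildir.part1, .part2, and so on, each of which can then be opened separately with Mail::Box.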

      That's a nice workaround, but solving the problem the hard way will probably keep others from getting bitten in the future.

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.
Re: Mail::Box fails miserably when trying to open 30_000 messages maildir
by BrowserUk (Pope) on Jun 27, 2005 at 12:07 UTC

    An off-the-wall guess that I have no way to verify: could it be that you are running out of memory? A quick browse of Mail::Box and its associated modules leads me to believe that they form a fairly deeply nested hierarchy of modules, with each level adding another hash or two for each item. Large volumes of nested hashes, even when each individual leaf hash is quite small, can rapidly consume a lot of memory.

    On my system, Perl sometimes dies silently when it runs out of space.

    Maybe you could monitor the program's memory usage while running it on this large directory?
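On Linux (which the production box runs), one low-tech way to do that from inside the script is to read the process's resident set size from /proc before and after the call to open(). This is a Linux-specific sketch; rss_kb is a made-up helper, and it simply returns undef where /proc/<pid>/status or its VmRSS line is not available.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, Linux-specific helper: return this process's resident set
# size in kB from the VmRSS line of /proc/<pid>/status, or undef.
sub rss_kb {
    open my $fh, '<', "/proc/$$/status" or return undef;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return undef;
}

# Wrap the expensive call and compare the two readings.
my $before = rss_kb();
# ... $folder = $manager->open(...) would go here ...
my $after = rss_kb();
printf "RSS before: %s kB, after: %s kB\n",
    map { defined($_) ? $_ : 'n/a' } $before, $after;
```

A large jump between the two numbers, or a process that dies between them, would support the memory theory; watching the process with top or ps from another terminal gives similar information from outside.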


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      In my personal experience, version 5.6.1 was unable to get more than 1 GB of memory, whereas 5.8.x versions behaved better in this respect (but were much slower at getting memory, BTW), so I would agree with your reasonable point.

      Good point. But my test script still prints all the planned output, even the part after the "count messages" point. I suppose an out-of-memory Perl shouldn't be able to print anything...

      I'm waiting for the test script results, so I can tell you more details then. If you think the posted script is not enough to expose the problem, please tell me, and I will try to write a more precise test. Patches welcome, too.

        Hmm. Probably not memory then.

        I took a quick scan of the code in Mail::Box::Manager and noticed something that might be relevant. In the code for M::B::M::open(), I see this:

        return if $require_failed{$class};

        and scanning back to see where $require_failed is being set, I see this:

```perl
unless($folder_type)
{   # Try to autodetect foldertype.
    foreach (@{$self->{MBM_folder_types}})
    {   next unless $_;
        (my $abbrev, $class, @defaults) = @$_;
        next if $require_failed{$class};

        eval "require $class";
        if($@)
        {   $require_failed{$class}++;
            next;
        }

        if($class->foundIn($name, @defaults, %args))
        {   $folder_type = $abbrev;
            last;
        }
    }
}
```

        I may be misinterpreting the code, but it looks to me like, when it attempts to autodetect the folder type and fails to require the module for one of the candidate types, it sets the flag to indicate the failure and skips on without logging an error. Then later, it checks the flag and, if it is set, fails silently by returning undef.

        Could it be misdetecting the folder type and failing silently as a result?
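One way to test that theory on the production machine, without reading further into the Manager internals, is to perform the same kind of eval'd require yourself and print whatever error would otherwise be swallowed. This sketch assumes the relevant classes are Mail::Box::Maildir, Mail::Box::Mbox and Mail::Box::MH (all part of the Mail::Box distribution); load_errors is a hypothetical helper name, and other class names can be passed on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: load each class the same way the quoted Manager
# code does (a string-eval'd require) and collect the errors that code
# would otherwise swallow. Returns a hashref of class => error message.
sub load_errors {
    my %err;
    for my $class (@_) {
        eval "require $class";
        $err{$class} = $@ if $@;
    }
    return \%err;
}

my @classes = @ARGV ? @ARGV
            : qw(Mail::Box::Maildir Mail::Box::Mbox Mail::Box::MH);
my $errors = load_errors(@classes);
for my $class (@classes) {
    if (exists $errors->{$class}) { print "FAILED $class: $errors->{$class}" }
    else                          { print "ok     $class\n" }
}
```

If Mail::Box::Maildir (or one of its dependencies) fails to load under the production Perl 5.6.1, the error text printed here should point at the missing piece.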



Node Type: perlquestion [id://469989]
Approved by tlm
Front-paged by tye