PerlMonks  

maximum number of lines for negative lookahead assertion (?!)

by lupey (Monk)
on May 03, 2005 at 20:41 UTC ( [id://453725]=perlquestion )

lupey has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm using the negative lookahead assertion in a regular expression to parse tokens of a text file that start with a '>'. Some of the tokens can be very long: 500, 1000, or even 2000 lines. It seems that the negative lookahead assertion fails on tokens that span too many lines. For example, if you run the following code on Mac OS X (1 GB RAM) with Perl 5.6.0 or on Cygwin/Windows 2000 (512 MB RAM) with Perl 5.8.6, the program crashes at line 858 and line 1547, respectively. Is this a bug, or is it a limit of the negative lookahead assertion?
    #!/usr/bin/perl
    use strict;

    # initially, create a scalar that is 1 line long
    my $line = ">"."\n" x 1;
    my $incr = 1;

    while (1) { # run until it crashes
        while ($line =~ /^(>.*)\n(^(?!>).*\n)+/gm) {
            print "Number of lines: ", length($line)-1,"\n";
        }

        # add another line
        $line .= "\n" x $incr;
    }

Replies are listed 'Best First'.
Re: maximum number of lines for negative lookahead assertion (?!)
by hubb0r (Pilgrim) on May 04, 2005 at 04:16 UTC
    Just ran it on my laptop:

    p3-450 256MB ram running ubuntu linux with perl: This is perl, v5.8.4 built for i386-linux-thread-multi
    and it segfaulted at line 10482:
    Number of lines: 10480
    Number of lines: 10481
    Number of lines: 10482
    Segmentation fault

    Also ran on one of my desktops: Athlon XP 2400+ w/ 768 MB RAM running FC3 with perl: This is perl, v5.8.5 built for i386-linux-thread-multi
    and it segfaulted at 10079:
    Number of lines: 10077
    Number of lines: 10078
    Number of lines: 10079
    Segmentation fault

    And on one of my servers: Dual Athlon MP 2.1 GHz w/ 2 GB RAM running FC3 with perl: This is perl, v5.8.3 built for i386-linux-thread-multi
    and it segfaulted at 15120:
    Number of lines: 15118
    Number of lines: 15119
    Number of lines: 15120
    Segmentation fault

    Note: Multiple runs on different machines produced the same results on each machine (I would hope so!) but I'm wondering what would produce such wildly different results on somewhat similar architectures?
Re: maximum number of lines for negative lookahead assertion (?!)
by ikegami (Patriarch) on May 03, 2005 at 21:29 UTC

    I ran your program up to 5000 without a problem (at which point I killed it).

    Win2k, 512MB of RAM, ActivePerl v5.6.1.

    I even made sure no line numbers were skipped:

    >perl 453725.pl > !
    ^C
    >perl -e "printf """Number of lines: %s\n""", $_ for 2..5000" > @
    >fc ! @
    Comparing files ! and @
    FC: no differences encountered
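The same "no line numbers were skipped" check can be sketched in Perl itself, for systems without `fc`. This is only an illustration: `first_gap` is a hypothetical helper name, and the sample lines are generated in place rather than read from a real run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first expected count that is missing from the output lines,
# or undef if every "Number of lines: N" line follows its predecessor by 1.
# Lines that do not look like program output are ignored.
sub first_gap {
    my ($start, @lines) = @_;
    my $expected = $start;
    for my $l (@lines) {
        next unless $l =~ /^Number of lines: (\d+)$/;
        return $expected if $1 != $expected;
        $expected++;
    }
    return;    # no gap found
}

# Hypothetical captured output; a real check would read the lines from a file.
my @ok  = map { "Number of lines: $_" } 2 .. 10;
my @bad = grep { !/: 7$/ } @ok;    # same output with the line for 7 removed

print defined(first_gap(2, @ok)) ? "gap found\n" : "no gap\n";
print "first missing count: ", first_gap(2, @bad), "\n";
```

The start value is 2 because the program's first successful match happens once the string holds the '>' line plus one more line.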
      Ha, I'm a little more patient :-). I ran it past 10,000 and then killed it; it was going well and used about 10 MB of memory. WinXP/Cygwin, 512 MB, ActivePerl 5.8.3.
Re: maximum number of lines for negative lookahead assertion (?!)
by johnnywang (Priest) on May 03, 2005 at 22:00 UTC
    Actually, can someone (OP?) comment on the regex? I'm a little shaky on how it's matching and on what the inner "while" does. Thanks.

      It simply matches a line starting with '>', followed by one or more lines that do not start with '>'. If the input had multiple lines starting with '>' -- it only has one in this program -- it would do the body of the inner while for every one of them.

      Keep in mind '^' means start of line, not start of input, when /m is used.

      It sounds like someone is trying to parse FASTA files.
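To make the explanation above concrete, here is a minimal sketch that runs the same regex over a small input with two '>' records. The sample text and the names `$text` and `@records` are made up for illustration; only the pattern comes from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical FASTA-like input: two records, each a '>' header line
# followed by one or more lines that do not start with '>'.
my $text = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n";

my @records;
while ($text =~ /^(>.*)\n(^(?!>).*\n)+/gm) {
    my ($header, $last_line) = ($1, $2);

    # Recover the whole match from the offset arrays @- and @+.
    my $whole = substr($text, $-[0], $+[0] - $-[0]);

    # Everything after the header line and its newline is the body.
    my $body       = substr($whole, length($header) + 1);
    my $body_lines = ($body =~ tr/\n//);

    push @records, {
        header     => $header,       # $1: the '>' line, without its newline
        last_line  => $last_line,    # $2: only the final repetition of the group
        body_lines => $body_lines,
    };
}

for my $r (@records) {
    (my $last = $r->{last_line}) =~ s/\n\z//;
    print "$r->{header}: $r->{body_lines} body line(s), \$2 held '$last'\n";
}
```

With /g in a `while` condition, each pass resumes where the previous match ended, so the loop body runs once per '>' record. The match never modifies the string itself; the captures land in `$1` and `$2`, and a repeated group such as `(...)+` keeps only its last repetition.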

        What does the variable $line catch? There are two sets of parentheses besides the non-capturing one. From the output it's the number of lines; is that what the /g does? Then why is the while needed?
Re: maximum number of lines for negative lookahead assertion (?!)
by grinder (Bishop) on May 04, 2005 at 07:24 UTC

    I'm not really surprised that 5.6.0 comes up in the context of a weird crash. More unsettling is the fact that 5.8.6 also displays the behaviour.

    I tried running your code on a FreeBSD 4.10 system, and pushed your program out to 30000+ lines without crashing, using 5.005_03, 5.6.2, 5.8.6 and the latest bleadperl.

    I note with interest that 5.005_03 is the fastest, 5.6.2 is a bit slower, 5.8.6 slower still, and bleadperl slowest of all. This is probably due to the fact that I have DEBUGGING defined, or perhaps demerphq's recent trie additions.

    I don't suppose that your Cygwin or Mac builds of Perl do something extra, like running in utf-8 by default?

    - another intruder with the mooring in the heart of the Perl

      How do I find out if my Cygwin or Mac Perl runs utf-8 by default?

      Thanks,

      Paul
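One hedged way to look into this is to print the settings that usually influence Perl's Unicode behaviour: the locale environment variables, `PERL_UNICODE`, and the `${^UNICODE}` variable (which reflects the `-C` switch and was only added during the 5.8.x series, so older perls leave it undefined). The sketch below only reports these values; `utf8_settings` is a hypothetical helper name, and nothing here shows that UTF-8 is actually behind the crash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the settings that commonly influence Perl's UTF-8 behaviour.
sub utf8_settings {
    my %s;

    # Locale and Perl-specific environment variables.
    for my $var (qw(LC_ALL LC_CTYPE LANG PERL_UNICODE)) {
        $s{$var} = defined $ENV{$var} ? $ENV{$var} : '(unset)';
    }

    # ${^UNICODE} mirrors the -C command-line switch / PERL_UNICODE;
    # it is undefined on perls that predate it.
    $s{UNICODE_FLAGS} = defined ${^UNICODE} ? ${^UNICODE} : '(not available)';

    # The running perl's version number, for reference.
    $s{PERL_VERSION} = $];

    return %s;
}

my %settings = utf8_settings();
printf "%-14s %s\n", $_, $settings{$_} for sort keys %settings;
```

A locale value ending in something like `UTF-8`, or a non-zero `UNICODE_FLAGS`, would be a hint worth following up on, not a diagnosis.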
Re: maximum number of lines for negative lookahead assertion (?!)
by wazoox (Prior) on May 04, 2005 at 10:14 UTC
    I've run your script on an FC3 machine and it crashed at 8864; however on a Mandrake 9.0 machine running perl 5.8.0, it's still running and it reached 25000 so far. My best guess is something's going wrong with UTF-8...
    update: Enough done, I killed it at 55264.
Re: maximum number of lines for negative lookahead assertion (?!)
by lupey (Monk) on May 04, 2005 at 11:30 UTC
    Thank you everybody for your replies and testing my code.

    I tried it on a 333 MHz Celeron (PII) processor running ActiveState Perl 5.6.1 and it ran fine until I stopped it at 15000. Now, after reading your replies, I wonder how far it would have gone if I had let it go.

    I've tried the code on a UNIX timeshare (SunOS 5.8 Generic_117350-04 sun4u sparc SUNW,Ultra-4) running Perl 5.8.0
    Number of lines: 8801
    Segmentation Fault

    The program dumped core on the UNIX machine but not on my PCs or Macs. Can I use the core in any way to determine what the problem is?

    Thank you,

    Paul
Re: maximum number of lines for negative lookahead assertion (?!)
by dynamo (Chaplain) on May 03, 2005 at 20:51 UTC
    It can't be crashing on line 858 and 1547; your program is only 17 lines long.

    But, I know that what you really meant is that the negative lookahead assertion is crashing on line 858 and 1547 of the _input_ file.

    So, what's on these lines?

    Can you use different (shorter) input with most of the stuff outside of the offending lines cut out, to isolate the problem?

    What exactly is the error message it gives you when it crashes?

      Did you look at his program? It builds the input file that causes the crash.
        I missed that detail.. oops.

        So anyway, I ran the script on my machine.. and no crashes happened.

        Stopped it at about 11,000.

        MacOS 10.3.9, perl 5.8.4, Camelbones Developer pkg installed (which changes the perl config a bit).
