PerlMonks  

Re: regex on gigabyte string

by BrowserUk (Pope)
on Jan 26, 2013 at 18:27 UTC (#1015522)


in reply to regex on gigabyte string

Whilst loading strings > 4GB is no problem on a 64-bit Perl (assuming you have the memory), unfortunately, there are still many places in the core where such huge strings are simply not supported.

Two examples:

  1. substr doesn't accept offsets > 2GB
  2. Regexes don't operate on strings > 2GB.

It's a pain in the lower lumbar region, but probably won't change any time soon.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^2: regex on gigabyte string
by focusonz (Initiate) on Jan 26, 2013 at 19:34 UTC

    Whoa back!

    I am using a construct like if( substr($bigstring, $begtagidx, 5) eq "<c r=" ), where $begtagidx runs out to 4 billion, and have not seen a problem.

    But the data verification process is not yet terminated so I will have to get back to you cloistered people on this.

    Thanks for the pearls of scripture!
      Where $begtagidx is out to 4 billion and have not seen problem.

      Okay. It seems that limitation has been lifted with 5.16 (I still use 5.10.1 as my primary Perl, where the limit still applies):

      say $];;
      5.016001

      $s = 'fred'; $s x= 1024**3;;
      print substr( $s, -4 );;
      fred

      But the 2GB limit on regex still persists in 5.16:

      [19:51:25.70] C:\test>\perl64-16\bin\perl \perl64\bin\p1.pl
      [0] Perl> say $];;
      5.016001

      [0] Perl> $s = 'fred'; $s x= 1024**3;;
      [0] Perl> ++$n while $s =~ /fred/g; say $n;;
      Use of uninitialized value $n in say at (eval 9) line 1, <STDIN> line 3.

      [0] Perl> $s = 'fr'; $s x= 1024**3;;
      [0] Perl> ++$n while $s =~ /fr/g; say $n;;
      Use of uninitialized value $n in say at (eval 11) line 1, <STDIN> line 5.

      [0] Perl> $s = 'fr'; $s x= 1020**3;;
      [0] Perl> ++$n while $s =~ /fr/g; say $n;;
      1061208000

      [0] Perl>
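One possible way around the regex limit is to run the match over windows that each stay well under 2GB. What follows is only a rough sketch under stated assumptions, not something tested on a multi-gigabyte string. The names count_matches_chunked, $width and $chunk are made up for illustration. It assumes a byte string, a pattern whose every match is exactly $width bytes long (a literal such as /fred/, with no anchors or lookbehind), and a perl where substr accepts offsets beyond 2GB (which, per the above, seems to mean 5.16 or later).

```perl
use strict;
use warnings;

# Count non-overlapping matches of a fixed-width pattern by scanning the
# string in windows of at most $chunk bytes, so the regex engine never has
# to work on the whole string at once.
# ASSUMPTIONS (hypothetical helper, not from the thread): every match is
# exactly $width bytes; the string holds bytes, not wide characters.
sub count_matches_chunked {
    my ($str_ref, $re, $width, $chunk) = @_;
    die "chunk must be at least the match width\n" if $chunk < $width;

    my $len   = length $$str_ref;
    my $pos   = 0;    # offset in the big string where the next window starts
    my $count = 0;

    while ($pos < $len) {
        my $window   = substr($$str_ref, $pos, $chunk);  # copy, at most $chunk bytes
        my $last_end = 0;                                # end of last match, window-relative

        while ($window =~ /$re/g) {
            ++$count;
            $last_end = pos($window);
        }

        my $win_len = length $window;
        last if $pos + $win_len >= $len;                 # that was the final window

        # Resume after the last match, or early enough to catch a match that
        # straddles the window boundary, whichever is later.
        my $resume = $win_len - ($width - 1);
        $resume = $last_end if $last_end > $resume;
        $pos += $resume;
    }
    return $count;
}

# Small-scale usage: the same call shape would apply to a huge string,
# with $chunk set to something like 1024**3.
my $s = 'fred' x 1000;
print count_matches_chunked(\$s, qr/fred/, 4, 10), "\n";
```

Passing the string by reference avoids copying it into the sub; each window is still a copy, so $chunk is a trade-off between memory use and the number of passes.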

