PerlMonks  

Re: regex on gigabyte string

by BrowserUk (Pope)
on Jan 26, 2013 at 18:27 UTC (#1015522=note)


in reply to regex on gigabyte string

Whilst loading strings > 4GB is no problem on a 64-bit Perl (assuming you have the memory), unfortunately, there are still many places in the core where such huge strings are simply not supported.

Two examples:

  1. substr doesn't accept offsets > 2GB
  2. Regexes don't operate on strings > 2GB.

It's a pain in the lower lumbar region, but it probably won't change any time soon.
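One possible workaround is to keep the regex engine away from the huge string and run it only over windows smaller than 2GB. This is a sketch under stated assumptions, not something from the thread. `count_literal_chunked` is a hypothetical helper. It handles only a literal pattern, because for a literal a match inside a window is always a match in the full string. It also relies on `substr` accepting large offsets, which later in this thread seems to hold on 5.16 but not on 5.10.1.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of the literal
# string $lit in $$sref (the same count a //g loop would give), while only
# ever running the regex engine on windows of at most $window bytes.
# Assumption: substr() itself copes with offsets beyond 2GB on your perl.
sub count_literal_chunked {
    my ($sref, $lit, $window) = @_;
    my $L = length $lit;
    die "need a non-empty literal and a window at least as long as it\n"
        unless $L > 0 && $window >= $L;

    my $len   = length $$sref;
    my $count = 0;
    my $start = 0;    # absolute offset where the next window begins

    while ($start + $L <= $len) {
        my $chunk = substr($$sref, $start, $window);    # copy of one window
        my $clen  = length $chunk;

        # A match starting at or after $limit might be cut off by the end of
        # this window, so it is left for the next window (unless this window
        # already reaches the end of the string).
        my $limit = ($start + $clen >= $len) ? $clen : $clen - $L + 1;

        my $end = 0;    # end offset (within $chunk) of the last counted match
        while ($chunk =~ /\Q$lit\E/g) {
            last if $-[0] >= $limit;    # $-[0]: start offset of this match
            $count++;
            $end = $+[0];               # $+[0]: end offset of this match
        }

        # Resume after the last counted match if it straddled $limit,
        # otherwise at $limit itself (nothing starts in between).
        $start += $end > $limit ? $end : $limit;
    }
    return $count;
}
```

For example, `count_literal_chunked(\$bigstring, '<c r=', 1 << 30)` would count the `<c r=` tags using 1GB windows. Passing the string by reference avoids copying the whole thing, but each window is still copied, so peak memory grows by roughly one window.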


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^2: regex on gigabyte string
by focusonz (Initiate) on Jan 26, 2013 at 19:34 UTC

    Whoa back!

    I am using a construct like if( substr($bigstring, $begtagidx, 5) eq "<c r=" ), where $begtagidx goes out to 4 billion, and have not seen a problem.

    But the data verification process has not finished yet, so I will have to get back to you cloistered people on this.

    Thanks for the pearls of scripture!
    Re^3: regex on gigabyte string
    by BrowserUk (Pope)

      "Where $begtagidx is out to 4 billion and have not seen problem."

      Okay. It seems that the substr limitation has been lifted in 5.16 (I still use 5.10.1 as my primary Perl, where it does still apply):

      say $];;
      5.016001
      $s = 'fred'; $s x= 1024**3;;
      print substr( $s, -4 );;
      fred

      But the 2GB limit on regex still persists in 5.16:

      [19:51:25.70] C:\test>\perl64-16\bin\perl \perl64\bin\p1.pl
      [0] Perl> say $];;
      5.016001
      [0] Perl> $s = 'fred'; $s x= 1024**3;;
      [0] Perl> ++$n while $s =~ /fred/g; say $n;;
      Use of uninitialized value $n in say at (eval 9) line 1, <STDIN> line 3.
      [0] Perl> $s = 'fr'; $s x= 1024**3;;
      [0] Perl> ++$n while $s =~ /fr/g; say $n;;
      Use of uninitialized value $n in say at (eval 11) line 1, <STDIN> line 5.
      [0] Perl> $s = 'fr'; $s x= 1020**3;;
      [0] Perl> ++$n while $s =~ /fr/g; say $n;;
      1061208000
      [0] Perl>
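The failure in that session is silent. The match loop never succeeds, so `$n` stays undefined and the only symptom is a warning. Going by the numbers shown, `'fr' x 1020**3` is 2 * 1,061,208,000 = 2,122,416,000 bytes and works. `'fr' x 1024**3` is 2 * 1,073,741,824 = 2,147,483,648 bytes, exactly 2**31, and finds nothing. A defensive sketch is to check the length before matching. The cut-off of 2**31 - 1 is an assumption inferred from those two data points, and `safe_match_count` is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumed limit: the largest string length the regex engine handled on the
# perls discussed here (inferred from the session above, not documented).
use constant REGEX_MAX_LEN => 2**31 - 1;

# Hypothetical helper: count //g matches, but die loudly instead of silently
# finding nothing when the string is longer than the assumed limit.
sub safe_match_count {
    my ($sref, $re) = @_;
    my $len = length $$sref;
    die "string of $len bytes exceeds assumed regex limit of "
        . REGEX_MAX_LEN . " bytes\n"
        if $len > REGEX_MAX_LEN;
    my $n = () = $$sref =~ /$re/g;    # count matches in list context
    return $n;
}
```

Dying is a deliberate choice here. A silent zero count looks like valid data, which is exactly the trap shown in the session.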
