
Re^2: Multi-line Regex Performance

by pboin (Deacon)
on Nov 01, 2005 at 15:52 UTC (#504616)


in reply to Re: Multi-line Regex Performance
in thread Multi-line Regex Performance

The key for checking whether one of the multi-line records is a keeper or not is at position 19 for a length of ten on the first line (segment '01' in our parlance).

There could be hash characters in some of the fields -- many of them are freeform and take addresses, comments, etc. There will not be any in the first position though, other than the ones that denote new records. Thanks sauoq.
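Because hash characters may turn up inside freeform fields but never at the start of a continuation line, an anchored match should be enough to spot record boundaries. A minimal sketch, assuming the '##' marker used in the reply below; the sample lines and the helper name is_record_start are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real record layout is not shown in this thread.
my @lines = (
    "##01 record header line",
    "02 comment field: see apt #4 ## rear entrance",
    "03 another segment",
    "##01 next record header",
);

# Anchored with ^, so a '##' inside a freeform field is not
# mistaken for the start of a new record.
sub is_record_start {
    my ($line) = @_;
    return $line =~ /^##/ ? 1 : 0;
}

my @starts = grep { is_record_start($_) } @lines;
print scalar(@starts), " record starts found\n";
```

An unanchored /##/ would also match the second sample line, which is exactly the case the anchor guards against.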

Re^3: Multi-line Regex Performance
by sauoq (Abbot) on Nov 01, 2005 at 16:27 UTC
    The key for checking whether one of the multi-line records is a keeper or not is at position 19 for a length of ten on the first line

    This doesn't really tell us anything that the substr() hadn't already told us. The thing is, we can't reliably count whitespace in the data you provided.

    But anyway, since what you want is always on that first line, it's probably a lot easier (and more efficient than running a regular expression across whole multi-line records) to just read the data line by line, ignoring lines that don't match /^##/ and doing what you want with the ones that do. This would have the added benefit of not keeping 300+MB in RAM.

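    while (<>) {
        next unless /^##/;            # skip everything except record header lines
        my $key = substr $_, 19, 10;  # 10-character key at offset 19 (substr counts from 0)
        do_stuff_with($key);          # placeholder for the real processing
    }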
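If whole records need to be kept or dropped, rather than just the key extracted, the loop above could be extended along these lines. This is only a sketch under assumptions: the %wanted set, the filter_records name, the callback interface, and the sample data are all hypothetical, and offset 19 is taken at face value from the substr call.

```perl
use strict;
use warnings;

# Hypothetical set of keys to keep; the thread does not say how keepers are chosen.
my %wanted = map { $_ => 1 } ('KEY0000001');

# Stream a filehandle line by line. The keep/skip decision is made once per
# record, from the header line, and then applied to the lines that follow.
sub filter_records {
    my ($fh, $wanted, $emit) = @_;
    my $keeping = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^##/) {
            # Guard against short header lines before calling substr.
            my $key = length($line) >= 29 ? substr($line, 19, 10) : '';
            $keeping = exists $wanted->{$key};
        }
        $emit->($line) if $keeping;
    }
    return;
}

# Made-up sample data: 19 characters before a 10-character key.
my $sample = join '',
    "##01" . ('x' x 15) . "KEY0000001 rest\n",
    "02 freeform ## comment\n",
    "##01" . ('x' x 15) . "KEY0000002 rest\n",
    "02 skipped segment\n";

open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my @kept;
filter_records($fh, \%wanted, sub { push @kept, $_[0] });
close $fh;
print @kept;
```

For a real 300+MB file, the callback would print each line straight away instead of pushing it onto an array, so only one line is held in memory at a time.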

    -sauoq
    "My two cents aren't worth a dime.";
    
