PerlMonks  

Re: How best to strip text from a file?

by hperange (Beadle)
on Nov 02, 2012 at 03:36 UTC (#1001910)


in reply to How best to strip text from a file?

I think the regex you are looking for, in this particular case, is:

/^\s+Order ID:([a-zA-Z0-9-]+)\s+fiscal cycle:(\d+)/

See perlrequick and perlre.
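
Here is a minimal sketch of how that pattern might be applied to a single line. The sample line is made up for illustration (the real file may well be laid out differently), and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input line; the real data may be laid out differently.
my $line = "   Order ID:AB-1234  fiscal cycle:2012";

my ($order_id, $cycle);
if ($line =~ /^\s+Order ID:([a-zA-Z0-9-]+)\s+fiscal cycle:(\d+)/) {
    # $1 and $2 hold the two captured groups after a successful match.
    ($order_id, $cycle) = ($1, $2);
    print "order=$order_id cycle=$cycle\n";
}
else {
    warn "line did not match\n";
}
```

Note that the leading `^\s+` requires at least one whitespace character before "Order ID:", so a line that starts in column one would not match.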

For the original problem, I think you should write one routine that reads a record into a buffer, and a separate routine that parses a single record. You can then plug different routines into your parsing "framework" to handle different structures.

Some pseudocode:
    my $in_rec = 0;
    my ($head_re, $tail_re) = (qr/Start of record/, qr/End of record/);
    my @record;
    while (<>) {
        chomp;
        if ($in_rec) {
            push @record, $_;           # keep the line, tail marker included
            if (/$tail_re/) {
                $in_rec = 0;
                parse_record(@record);  # hand the complete record to the parser
            }
        }
        elsif (/$head_re/) {
            $in_rec = 1;
            @record = ($_);             # start a new record with the head line
        }
    }

I hope this makes sense. Bear in mind that this is only pseudocode meant to demonstrate the logic I would go for, not actual parsing.
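
For what it's worth, the reader/parser split described above could be fleshed out roughly like this. It is only a sketch under assumptions: the marker patterns are the placeholders from the pseudocode, the sample input and the "key:value" layout are invented, and `read_records` and this particular `parse_record` are hypothetical helpers, not anything from the original poster's code.

```perl
use strict;
use warnings;

# Marker patterns are placeholders taken from the pseudocode above.
my ($head_re, $tail_re) = (qr/Start of record/, qr/End of record/);

# Read records from a filehandle and pass each complete one to a callback.
sub read_records {
    my ($fh, $handler) = @_;
    my $in_rec = 0;
    my @record;
    while (my $line = <$fh>) {
        chomp $line;
        if (!$in_rec) {
            next unless $line =~ $head_re;   # skip text between records
            $in_rec = 1;
            @record = ();
        }
        push @record, $line;
        if ($line =~ $tail_re) {
            $in_rec = 0;
            $handler->(@record);
        }
    }
    return;
}

# Hypothetical parser: collect "key:value" pairs from one record.
sub parse_record {
    my @lines = @_;
    my %field;
    for my $line (@lines) {
        $field{$1} = $2 if $line =~ /^\s*([^:]+):\s*(.*)$/;
    }
    return \%field;
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = <<'END';
noise
Start of record
Order ID:AB-1234
fiscal cycle:2012
End of record
more noise
Start of record
Order ID:CD-5678
fiscal cycle:2013
End of record
END

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my @parsed;
read_records($fh, sub { push @parsed, parse_record(@_) });
close $fh;

printf "%s (cycle %s)\n", $_->{'Order ID'}, $_->{'fiscal cycle'} for @parsed;
```

Passing the parser in as a callback keeps the reading loop unchanged when a different record structure needs a different parsing routine.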


Re^2: How best to strip text from a file?
by bobdabuilda (Sexton) on Nov 07, 2012 at 02:47 UTC

    Thanks for that! I did actually grab a copy of your pseudocode in passing when I noticed your quick reply, so I could start mulling it over, to see how best to fit it in with what I'm doing.

    I think that, combined with the code below, should let me sort something out. Thanks again :)
