"be consistent"

Re: How best to strip text from a file?

by hperange (Beadle)
on Nov 02, 2012 at 03:36 UTC

in reply to How best to strip text from a file?

I think the regex you are looking for in this particular case is:
/^\s+Order ID:([a-zA-Z0-9-]+)\s+fiscal cycle:(\d+)/

See perlrequick and perlre for the details.
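To show how the captures come out, here is a minimal sketch that applies the regex to one line. The sample line is made up (I am assuming leading whitespace and no space after each colon, which is what the pattern requires), so adjust it to match your real data.

```perl
use strict;
use warnings;

# Hypothetical sample line; the real file layout may differ.
my $line = "    Order ID:PO-1234   fiscal cycle:2012";

my ($order_id, $cycle);
if ($line =~ /^\s+Order ID:([a-zA-Z0-9-]+)\s+fiscal cycle:(\d+)/) {
    # $1 is the order ID, $2 is the fiscal cycle.
    ($order_id, $cycle) = ($1, $2);
    print "order=$order_id cycle=$cycle\n";
}
else {
    print "no match\n";
}
```

Note that the pattern insists on at least one whitespace character at the start of the line, so a line that begins directly with "Order ID:" will not match.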

To approach the original problem, I think you should develop a routine which will read a record into a buffer, and have a separate routine which will handle the parsing of one record. You can then use different routines within your parsing "framework" to handle the parsing of different structures.

Some pseudocode:
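my $in_rec = 0;
my ($head_re, $tail_re) = (qr/Start of record/, qr/End of record/);
my @record;

while (my $line = <>) {
    chomp $line;
    if ($in_rec) {
        # Buffer the line first, so the tail line is part of the record.
        push @record, $line;
        if ($line =~ $tail_re) {
            $in_rec = 0;
            parse_record(@record);    # hand the complete record to the parser
        }
    }
    elsif ($line =~ $head_re) {
        # A new record starts: reset the buffer and keep the head line.
        $in_rec = 1;
        @record = ($line);
    }
}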
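The loop above leaves parse_record undefined. One possible shape for it, as a sketch only: it walks the buffered lines and pulls out the fields it recognises. The hash keys and the sample record are my own invention, and a real record would need a pattern for each line type.

```perl
use strict;
use warnings;

# Pattern for the one line type discussed in this thread.
my $order_re = qr/^\s+Order ID:([a-zA-Z0-9-]+)\s+fiscal cycle:(\d+)/;

# Hypothetical parser: takes the lines of one record, returns a hash ref.
sub parse_record {
    my @lines = @_;
    my %rec;
    for my $line (@lines) {
        if ($line =~ $order_re) {
            @rec{qw(order_id fiscal_cycle)} = ($1, $2);
        }
        # Patterns for other line types would be added here.
    }
    return \%rec;
}

# Made-up record, delimited the same way as in the reading loop.
my $rec = parse_record(
    'Start of record',
    '    Order ID:PO-1234   fiscal cycle:2012',
    'End of record',
);
print "$rec->{order_id} / $rec->{fiscal_cycle}\n";
```

Keeping the reading and the parsing apart like this means you can swap in a different parse routine for a different record structure without touching the loop.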

I hope this makes sense. Bear in mind that this is only pseudocode, meant to demonstrate the logic I would go for, not the actual parsing.

Re^2: How best to strip text from a file?
by bobdabuilda (Beadle) on Nov 07, 2012 at 02:47 UTC

    Thanks for that! I did actually grab a copy of your pseudocode in passing when I noticed your quick reply, so I could start mulling it over, to see how best to fit it in with what I'm doing.

    I think that with that, combined with the code below, I should be able to sort something out. Thanks again :)

