PerlMonks  

Re: Problem with a regex?

by Jim (Curate)
on Jul 15, 2011 at 17:11 UTC


in reply to Problem with a regex?

The ^L is the FORM FEED control character. It's used to separate pages ("records") of the report.

You can probably split on the FORM FEED character rather than on the text of the report header. Better yet, don't slurp the entire "large report" into memory, but instead process each report page one at a time by setting $/ ($INPUT_RECORD_SEPARATOR) to the FORM FEED character "\f".

#!/usr/bin/perl

use strict;
use warnings;
use autodie qw( open close );
use English qw( -no_match_vars );

# Report pages are separated by FORM FEED control characters
local $INPUT_RECORD_SEPARATOR = "\f";

open my $report, '<', 'QISC001';

while (my $page = <$report>) {
    # Parse and transform each report page here...
}

close $report;

exit 0;
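To try the record-separator idea without the real report, here is a minimal sketch that reads from an in-memory filehandle. The three-page sample text and the read_pages subroutine are made up for illustration; only $/ (the short name for $INPUT_RECORD_SEPARATOR), chomp, and open on a scalar reference are standard Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical three-page report; real pages would hold report text
my $report_text = "page one\n\fpage two\n\fpage three\n";

# Read FORM FEED separated records from a string (hypothetical helper)
sub read_pages {
    my ($text) = @_;

    # Limit the separator change to this subroutine
    local $/ = "\f";

    # Opening a reference to a scalar gives an in-memory filehandle
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

    my @pages;
    while (my $page = <$fh>) {
        chomp $page;    # chomp removes a trailing $/, here the FORM FEED
        push @pages, $page;
    }
    close $fh;

    return @pages;
}

my @pages = read_pages($report_text);
printf "%d pages\n", scalar @pages;
```

Because chomp strips whatever $/ currently holds, each page comes back without its trailing FORM FEED while its newlines are left alone. Skip the chomp if you want to keep the page breaks in the output.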

Jim

UPDATE: You mentioned you're splitting the report into separate "stores." I presume this means you're carving the report into individual files, one per page. This script is untested, but it illustrates some general ideas you might find useful.

#!/usr/bin/perl

use strict;
use warnings;
use autodie qw( open close );
use English qw( -no_match_vars );

@ARGV == 1 or die "Usage: perl $PROGRAM_NAME <report file>\n";

# Report pages are separated by FORM FEED control characters
local $INPUT_RECORD_SEPARATOR = "\f";

my $report_file = shift @ARGV;

open my $report_fh, '<', $report_file;

while (my $page = <$report_fh>) {
    # Skip any page whose header does not match (for example, a
    # trailing empty page), so undefined values are never used below
    my ($page_number, $store_number, $post_date) = $page =~ m{
        PAGE:\s+(\d+)
        .+?
        STORE:\s+(\d+)
        .+?
        POST\s+DATE:\s+(\d\d/\d\d/\d\d\d\d)
    }msx or next;

    # For example, 07/14/2011 => 20110714
    $post_date =~ s{(\d\d)/(\d\d)/(\d\d\d\d)}{$3$1$2};

    # For example, 20110714-001-001.rpt
    my $page_file = sprintf "%s-%03d-%03d.rpt", $post_date, $store_number, $page_number;

    open my $page_fh, '>', $page_file;
    print {$page_fh} $page;
    close $page_fh;
}

close $report_fh;

exit 0;
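Since the script above is untested, it may help to check the header regex and the file-name formatting on their own before pointing it at a real report. The sketch below does that against a made-up page header. The header layout and the page_file_name subroutine are assumptions for illustration, so adjust the sample to match what your report actually prints.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page header; the real report layout may differ
my $page = <<'END_PAGE';
REPORT QISC001                         PAGE:    7
STORE:   42
POST DATE: 07/14/2011
END_PAGE

# Build the per-page file name, or return nothing if the header is missing
sub page_file_name {
    my ($text) = @_;

    my ($page_number, $store_number, $post_date) = $text =~ m{
        PAGE:\s+(\d+)
        .+?
        STORE:\s+(\d+)
        .+?
        POST\s+DATE:\s+(\d\d/\d\d/\d\d\d\d)
    }msx or return;

    # MM/DD/YYYY => YYYYMMDD
    $post_date =~ s{(\d\d)/(\d\d)/(\d\d\d\d)}{$3$1$2};

    return sprintf '%s-%03d-%03d.rpt', $post_date, $store_number, $page_number;
}

my $name = page_file_name($page);
print defined $name ? "$name\n" : "no header found\n";
```

For the sample header this prints 20110714-042-007.rpt. A page with no recognizable header yields undef rather than a warning-laden file name, which is the case worth checking against your real data.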

Replies are listed 'Best First'.
Re^2: Problem with a regex?
by TStanley (Canon) on Jul 15, 2011 at 18:22 UTC
    This did the trick. Thanks for your help.

    TStanley
    --------
    People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell
