Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Keeping the first occurence of a pattern, and removing the other occurences

by TStanley (Canon)
on Aug 04, 2011 at 15:42 UTC ( [id://918572]=perlquestion: print w/replies, xml ) Need Help??

TStanley has asked for the wisdom of the Perl Monks concerning the following question:

I have the following report:
^LSTORE 001 =============== DEPT: PRODUCE EXTEND + MRKDWN REASON EXT. MRKDWN ITEM DESCRIPTION SIZE QTY WGT RETAIL RETAIL + RETAIL CD DESCRIPTION LOSS VENDOR 0300008 28OZ FRT PLATTER/DIP 00028OZ 1 0.0 8.99 8.99 + 8.99 01 DAMAGED/UNSALEABLE 0.00 102827 0080948 EXPRESS FANCY GREENS 00007OZ 6 0.0 2.99 17.94 + 17.94 01 DAMAGED/UNSALEABLE 0.00 103128 0321855 CLAMSHL HYDRO BOSTON 00COUNT 1 0.0 1.99 1.99 + 1.99 01 DAMAGED/UNSALEABLE 0.00 104040 0058309 12OZ MONTEREY MROOM 00012OZ 1 0.0 2.29 2.29 + 2.29 01 DAMAGED/UNSALEABLE 0.00 105524 0058309 12OZ MONTEREY MROOM 00012OZ 1 0.0 2.29 2.29 + 2.29 01 DAMAGED/UNSALEABLE 0.00 105524 0084448 10OZ SPINACH PACK 12 00010OZ 1 0.0 1.69 1.69 + 1.69 01 DAMAGED/UNSALEABLE 0.00 107505 REASON CODE TOTAL: + 35.19 0.00 DEPT TOTALS: + 35.19 0.00 ^LSTORE 002 =============== DEPT: PRODUCE 0084508 2LB STRAWBERRIES 00002LB 20 0.0 3.69 73.80 + 73.80 01 DAMAGED/UNSALEABLE 0.00 101224 DEPT: PRODUCE EXTEND + MRKDWN REASON EXT. MRKDWN ITEM DESCRIPTION SIZE QTY WGT RETAIL RETAIL + RETAIL CD DESCRIPTION LOSS VENDOR
What I am trying to get rid of, is the second or more occurences of the department name. The first instance after the store name is fine, but the others are not. My thought is to split the file into records based on the form feed character, then work on each record. So the initial code would be:
#!/usr/bin/perl -w use strict; open my $IN,"<","ISC001" or die "Can not open ISC001: $!\n"; open my $OUT,">","ISC-OUT2" or die "Can not open ISC-OUT2: $!\n"; $/="^L"; while(<$IN>){ ...... } close $IN; close $OUT;
One thing consistent through the file is that the occurences of the department name that I need to remove occur before the header lines, so I'm guessing a regex similar to:
$_ =~s|DEPT:\s+PRODUCE\n{2,}(\s{63}EXTEND\s+MRKDWN\s+REASON\s+EXT. MR +KDWN\n)|$1|g;
would do what I need. Am I heading in the right direction with this guess, or am I going in the wrong direction?

TStanley
--------
People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell

Replies are listed 'Best First'.
Re: Keeping the first occurence of a pattern, and removing the other occurences
by thenaz (Beadle) on Aug 04, 2011 at 16:09 UTC

    Use a look-behind assertion to delete all occurrences that don't follow a store name heading.

    s/(?<!===\n)DEPT:\s+PRODUCE\n+/\n/g;

    Of course, I am assuming the input has UNIX-style line endings (newline character only). If not, you may need to replace "\n" with "\r\n" where appropriate.

Re: Keeping the first occurence of a pattern, and removing the other occurences
by johngg (Canon) on Aug 04, 2011 at 16:55 UTC

    Bear in mind that the "^L" that you are seeing is probably a text representation of a form feed control character so you might need to change your input record separator to something like "\cL", IIRC.

    Cheers,

    JohnGG

Re: Keeping the first occurence of a pattern, and removing the other occurences
by Marshall (Canon) on Aug 05, 2011 at 04:11 UTC
    I wouldn't split anything or make things more complicated than necessary. keep a %seen hash and start it over at the top of each page or STORE. If there are other "only want to see it once per store" things (like maybe column headers), just add a line similar to the DEPT line code.
    my %seen; while (<DATA>) { %seen = () if (/^\s*STORE/); #start over for each STORE #just /\f/ might be fine too next if $seen{$_}; $seen{$_}++ if /^DEPT/; #no more DEPT lines print; }

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://918572]
Approved by Perlbotics
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (3)
As of 2024-04-19 22:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found