I have the following report:
^LSTORE 001
===============
DEPT: PRODUCE
                                                               EXTEND  MRKDWN  REASON                 EXT. MRKDWN
ITEM     DESCRIPTION                SIZE     QTY  WGT  RETAIL  RETAIL  RETAIL  CD DESCRIPTION         LOSS         VENDOR
0300008  28OZ FRT PLATTER/DIP       00028OZ    1  0.0    8.99    8.99    8.99  01 DAMAGED/UNSALEABLE  0.00         102827
0080948  EXPRESS FANCY GREENS       00007OZ    6  0.0    2.99   17.94   17.94  01 DAMAGED/UNSALEABLE  0.00         103128
0321855  CLAMSHL HYDRO BOSTON       00COUNT    1  0.0    1.99    1.99    1.99  01 DAMAGED/UNSALEABLE  0.00         104040
0058309  12OZ MONTEREY MROOM        00012OZ    1  0.0    2.29    2.29    2.29  01 DAMAGED/UNSALEABLE  0.00         105524
0058309  12OZ MONTEREY MROOM        00012OZ    1  0.0    2.29    2.29    2.29  01 DAMAGED/UNSALEABLE  0.00         105524
0084448  10OZ SPINACH PACK 12       00010OZ    1  0.0    1.69    1.69    1.69  01 DAMAGED/UNSALEABLE  0.00         107505
REASON CODE TOTAL:                                                      35.19                         0.00
DEPT TOTALS:                                                            35.19                         0.00
^LSTORE 002
===============
DEPT: PRODUCE
0084508  2LB STRAWBERRIES           00002LB   20  0.0    3.69   73.80   73.80  01 DAMAGED/UNSALEABLE  0.00         101224
DEPT: PRODUCE
                                                               EXTEND  MRKDWN  REASON                 EXT. MRKDWN
ITEM     DESCRIPTION                SIZE     QTY  WGT  RETAIL  RETAIL  RETAIL  CD DESCRIPTION         LOSS         VENDOR
What I am trying to get rid of is the second and any later occurrences of the department name. The first instance after the store name is fine, but the others are not. My thought is to split the file into records on the form feed character, then work on each record. So the initial code would be:
#!/usr/bin/perl -w
use strict;

open my $IN,  "<", "ISC001"   or die "Can not open ISC001: $!\n";
open my $OUT, ">", "ISC-OUT2" or die "Can not open ISC-OUT2: $!\n";

$/ = "\f";    # form feed, displayed as ^L in the report
while (<$IN>) {
    ......
}

close $IN;
close $OUT;
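A minimal sketch of the record-splitting idea, to check how the form feed separator behaves. It reads from an in-memory string rather than ISC001; the sample text and the `read_records` helper are hypothetical and not part of the real script. Because each page starts with a form feed, the first read returns an empty record, and every other record carries the separator at its end until it is chomped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for ISC001: two pages, each led by a form feed.
my $sample = "\fSTORE 001\nDEPT: PRODUCE\n\fSTORE 002\nDEPT: PRODUCE\n";

# Split $text into records on the form feed character.
sub read_records {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory file: $!\n";
    local $/ = "\f";                   # record separator: form feed
    my @records;
    while (my $record = <$in>) {
        chomp $record;                 # strips the trailing "\f", if present
        next unless length $record;    # skip the empty record before the first form feed
        push @records, $record;
    }
    close $in;
    return @records;
}

my @records = read_records($sample);
print scalar(@records), " records\n";
```

When writing the records back out, the form feed removed by `chomp` would have to be printed again in front of each record to keep the page breaks.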
One thing consistent throughout the file is that the occurrences of the department name that I need to remove appear before the header lines, so I'm guessing a regex similar to:
$_ =~ s|DEPT:\s+PRODUCE\n{2,}(\s{63}EXTEND\s+MRKDWN\s+REASON\s+EXT. MRKDWN\n)|$1|g;
would do what I need. Am I heading in the right direction with this guess, or am I going in the wrong direction?
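One possible alternative to a single substitution, sketched under assumptions: walk each record line by line, keep the first `DEPT:` line for a given department name, and drop later repeats. The `strip_repeated_dept` helper and the sample record are hypothetical, and the sketch assumes every department line begins at the start of a line with `DEPT:`. It does not depend on the exact spacing of the header lines, and it leaves any blank lines around a dropped `DEPT:` line in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep the first "DEPT: <name>" line in a record and drop later repeats
# of the same name. Assumes department lines start with "DEPT:".
sub strip_repeated_dept {
    my ($record) = @_;
    my %seen;
    my @kept;
    for my $line (split /^/m, $record) {    # split into lines, newlines kept
        if ($line =~ /^DEPT:\s+(.+?)\s*$/) {
            next if $seen{$1}++;            # true from the second occurrence on
        }
        push @kept, $line;
    }
    return join '', @kept;
}

# Hypothetical record, loosely modelled on the STORE 002 page of the report.
my $record = join '',
    "STORE 002\n",
    "DEPT: PRODUCE\n",
    "0084508 2LB STRAWBERRIES\n",
    "DEPT: PRODUCE\n",
    "ITEM DESCRIPTION\n";

print strip_repeated_dept($record);
```

Inside the `while` loop this would be applied to `$_` for each form-feed-delimited record before printing it to the output file.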

TStanley
--------
People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell

In reply to Keeping the first occurence of a pattern, and removing the other occurences by TStanley
