PerlMonks  


Alright! Getting somewhere. Worked out that part of the issue was the location of the WriteExcel block that writes the Order details. I moved it inside the "if (/Distribution--/) {" block, and that sorted the issue out nicely. Makes sense now that I look back on it: once that Distribution line matches, all of the Order header fields have been processed, whereas before that, the first few times it hit the WriteExcel stuff, the header was only partially processed.
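To illustrate why the placement matters, here is a minimal sketch, not the real script: it assumes a hypothetical cut-down record and collects the output in a made-up @emitted array instead of calling WriteExcel. The header is only emitted once the first Distribution-- line has been seen, i.e. after every header field has had a chance to match.

```perl
use strict;
use warnings;

# Hypothetical, cut-down record: header lines first, then one distribution.
my $record = <<'END';
Order ID:PO-9999 fiscal cycle:21112 Vendor ID:VEND99
  Title:Item title here.
  Distribution--
  holding code:CODEINFO1 copies:1
END

my ( %header, @emitted );

# Read the record string like a file, one line at a time
open my $sh, '<', \$record or die "Unable to open record string: $!";
while ( my $line = <$sh> ) {
    $header{orderID}  = $1 if !defined $header{orderID}  and $line =~ /Order ID:(\S+)/;
    $header{vendorID} = $1 if !defined $header{vendorID} and $line =~ /Vendor ID:(\S+)/;
    $header{title}    = $1 if !defined $header{title}    and $line =~ /Title:(.+)/;

    # Emit only once the first distribution marker shows the header is complete
    if ( $line =~ /Distribution--/ ) {
        push @emitted, join '|', @header{qw(orderID vendorID title)};
        last;
    }
}
close $sh;

print "$_\n" for @emitted;
```

Emitting any earlier (say, right after the Order ID line) would write a row with the title still undefined, which is the partially processed behaviour described above.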

So, now it's trimming the Order list down nicely. TOO nicely. On looking back over the data, I noticed that each order could have more than one title, so I was skipping data that should be kept.

So, added in another check on the title, and that's not behaving as expected... of course...

With this latest version and the example data provided, the output I am expecting is 2 order entries for PO-9999 and 1 for PO-1111. 2 entries for the first because it has 3 records, but 2 of those 3 are duplicates (on the 3rd one I added a "2" onto the title to make it different). However, it's only printing out one order for each... and I can't work out why...
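To pin down the expected result, here is a small sketch of the de-duplication rule on its own. It is an assumption-laden stand-in, not the script's method: it uses a %seen hash keyed on order ID and title rather than comparing against the previous record, and the @orders list is hypothetical data that just mirrors the order IDs and titles in the sample.

```perl
use strict;
use warnings;

# Hypothetical (order ID, title) pairs mirroring the sample data
my @orders = (
    [ 'PO-9999', 'Item title here.'   ],
    [ 'PO-9999', 'Item title here.'   ],
    [ 'PO-9999', 'Item title here 2.' ],
    [ 'PO-1111', 'Item title here.'   ],
);

my ( %seen, %kept );
for my $order (@orders) {
    my ( $id, $title ) = @$order;

    # Skip only when this exact order/title pair has been seen before
    next if $seen{$id}{$title}++;

    $kept{$id}++;
}

print "$_: $kept{$_}\n" for sort keys %kept;
```

This keeps 2 entries for PO-9999 and 1 for PO-1111, which is the count the real script should be producing.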

Any suggestions on how to track the issue down please?

use strict;
use warnings;
use Data::Dumper;
use Spreadsheet::WriteExcel;

# Place a filename into $recordsFile to read Orders from that file
# else the Orders below __DATA__ will be used for demo purposes
#my $recordsFile = 'finished_report_sample.txt';
my $recordsFile = '';
my ( @records, @orders );
my $recSeparator = 'Order ID:';

# Orders will initially be array elements 1 .. n in @orders; element 0 is initially the first page header
{
    # Set the record separator
    local $/ = $recSeparator;

    # If there's a file name, try to read from that file
    if ($recordsFile) {
        open my $fh, '<', $recordsFile or die $!;
        @records = <$fh>;
        close $fh;
    } # End If
    else {
        @records = <DATA>;
    } # End Else
} # End preparatory loop

# Remove the first page header
shift @records;

# Add Order ID: back into each record for later matching
$_ = "$recSeparator$_" for @records;

########## Added for writing to Excel
# Open a new xls file then create a sheet
my $workbook = Spreadsheet::WriteExcel->new('distlist.xls');
my $worksheet= $workbook->add_worksheet();

# Write headings
$worksheet->write(0,0,'Fiscal Year');
$worksheet->write(0,1,'Vendor');
$worksheet->write(0,2,'PO Number');
$worksheet->write(0,3,'Orderline');
$worksheet->write(0,4,'Title');
$worksheet->write(0,5,'ISBN/ISSN');
$worksheet->write(0,6,'# copies for Title');
$worksheet->write(0,7,'Distribution');
$worksheet->write(0,8,'Date Received');
$worksheet->write(0,9,'Date Loaded');
$worksheet->write(0,10,'Number of Copies');

# Initialise spreadsheet counters
my $row=1;
my $column=0;

# Set this up ready for checking for duplicate orders
my $previousOrder="";
my $previousTitle="";

# Iterate through each record (Order)
for my $record (@records) {
    my %hash;

    # For testing WriteExcel
    # $row+=1;

    # Treat the record string like a file, opening it for reading
    open my $sh, '<', \$record or die "Unable to open record string: $!";

    # Read the string like a file, one line at a time now
    while (<$sh>) {
        $hash{orderID}        = $1 if !defined $hash{orderID}        and /Order ID:(\S+)/;
        $hash{fiscalCycle}    = $1 if !defined $hash{fiscalCycle}    and /cycle:(\d+)/;
        $hash{vendorID}       = $1 if !defined $hash{vendorID}       and /Vendor ID:(\S+)/;
        $hash{requisitionNum} = $1 if !defined $hash{requisitionNum} and /\s+(\d+).+requisition/;
        $hash{copies}         = $1 if !defined $hash{copies}         and /copies:(\d+)/;
        $hash{'ISBN/ISSN'}    = $1 if !defined $hash{'ISBN/ISSN'}    and m{ISBN/ISSN:(\S+)};
        $hash{title}          = $1 if !defined $hash{title}          and /Title:(.+)/;

        my ($hashReference) = \%hash;

        # Had to put this in to suppress warnings about $$hashReference{title} not being populated yet during the loop
        no warnings 'uninitialized';

        # Check to see if it's a repeat order and title, skip if it is.
        if (($previousOrder eq $$hashReference{orderID}) && ($previousOrder ne "")){
            if (($previousTitle eq $$hashReference{title}) && ($previousTitle ne "")) {
                print "Order: $previousOrder HashOrder: $$hashReference{orderID} Title: $previousTitle HashTitle: $$hashReference{title} \n";
                print "Order already processed. Skipping...\n";
                last;
            } # End if
        } # End If
        else {
            # Distributions started?
            if (/Distribution--/) {
                $worksheet->write($row,0,$$hashReference{fiscalCycle});
                $worksheet->write($row,1,$$hashReference{vendorID});
                $worksheet->write($row,2,$$hashReference{orderID});
                $worksheet->write($row,3,$$hashReference{requisitionNum});
                $worksheet->write($row,4,$$hashReference{title});
                $worksheet->write($row,5,$$hashReference{'ISBN/ISSN'});
                $worksheet->write($row,6,$$hashReference{copies});

                # Set the current title and Order number for duplicate checking
                $previousOrder = $$hashReference{orderID};
                $previousTitle = $$hashReference{title};

                # Save the current record separator
                my $oldRecSeparator = $/;

                # Set a new record separator
                local $/ = 'Distribution--';

                # Read the string like a file, a distribution 'chunk' at a time
                while (<$sh>) {
                    #I realise this hashing is now superfluous, with data being written
                    # direct to Excel, but am keeping changes to a minimum until I get
                    # the overall functionality correct.
                    my %tempHash;
                    ( $tempHash{holdingCode} )  = /code:(\S+)/;
                    ( $tempHash{copies} )       = /copies:(\d+)/;
                    ( $tempHash{dateReceived} ) = /received:(\S+)/;
                    ( $tempHash{dateLoaded} )   = /loaded:(\S+)/;
                    $worksheet->write($row,7,$tempHash{holdingCode});
                    $worksheet->write($row,8,$tempHash{dateReceived});
                    $worksheet->write($row,9,$tempHash{dateLoaded});
                    $worksheet->write($row,10,$tempHash{copies});
                    $row+=1;
                    push @{ $hash{distribution} }, \%tempHash;
                } # End While

                # Restore the old record separator
                $/ = $oldRecSeparator;
            } # End If
        } # End Else
    } # End While

    # Work with the filled-in %hash by sending a reference to it to a subroutine
    # This is a complete record
    # writeToSpreadSheet( \%hash );
    # print Dumper \%hash;

    # Done 'reading' the string
    close $sh;
} # End For - last of the loops

$workbook->close();

# Printing in a subroutine's not a good idea, but done here only to show how to access the hash
#sub writeToSpreadSheet {
#    my ($hashReference) = @_;
#
#    # The $$ notation dereferences the hash reference
#    print $$hashReference{vendorID}, "\n";
#
#    # The @{} notation dereferences the array reference; the arrow operator dereferences to get hash value
#    for my $distribution ( @{ $$hashReference{distribution} } ) {
#        print $distribution->{holdingCode}, "\n";
#    }
#
#    print "\n";
#}

__DATA__
List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM
Order ID:PO-9999 fiscal cycle:21112 Vendor ID:VEND99 order type:SUBSCRIPT
  15) requisition number:
      copies:9 call number:XX(9999999.999)
      ISBN/ISSN:9999-999X
      Title:Item title here.
      ISSN:9999-999X
      Publication info:More text here about stuff
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO1 copies:1 date received:27/6/2012 date loaded:27/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO3 copies:2 date received:27/9/2012 date loaded:27/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO2 copies:1 date received:25/8/2012 date loaded:27/6/2012
Order ID:PO-9999 fiscal cycle:21112 Vendor ID:VEND99 order type:SUBSCRIPT
  15) requisition number:
      copies:9 call number:XX(9999999.999)
      ISBN/ISSN:9999-999X
      Title:Item title here.
      ISSN:9999-999X
      Publication info:More text here about stuff
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO1 copies:1 date received:27/6/2012 date loaded:27/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO3 copies:2 date received:27/9/2012 date loaded:27/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO2 copies:1 date received:25/8/2012 date loaded:27/6/2012
Order ID:PO-9999 fiscal cycle:21112 Vendor ID:VEND99 order type:SUBSCRIPT
  15) requisition number:
      copies:9 call number:XX(9999999.999)
      ISBN/ISSN:9999-999X
      Title:Item title here 2.
      ISSN:9999-999X
      Publication info:More text here about stuff
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO1 copies:1 date received:27/6/2012 date loaded:27/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO3 copies:2 date received:27/9/2012 date loaded:27/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-999
      holding code:CODEINFO2 copies:1 date received:25/8/2012 date loaded:27/6/2012
List of Distributions
Produced Tuesday, 9 October, 2012 at 1:38 PM
Order ID:PO-1111 fiscal cycle:21112 Vendor ID:VEND11 order type:SUBSCRIPT
  15) requisition number:
      copies:417 call number:XX(11111111.111)
      ISBN/ISSN:1111-111X
      Title:Item title here.
      ISSN:9999-999X
      Publication info:More text here about stuff
    Distribution--
      packing list:STUFF-I-DONT-NEED-111
      holding code:CODEINFO9 copies:5 date received:11/6/2012 date loaded:12/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-111
      holding code:CODEINFO8 copies:4 date received:11/9/2012 date loaded:12/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-111
      holding code:CODEINFO7 copies:3 date received:11/8/2012 date loaded:12/6/2012
    Distribution--
      packing list:STUFF-I-DONT-NEED-111
      holding code:CODEINFO6 copies:2 date received:11/8/2012 date loaded:12/6/2012
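
One way to track this down might be to pull the nested duplicate check out into a tiny function and see which branch each case lands in. The sketch below is a simplification under assumptions: branch_taken is a hypothetical helper, it ignores the /Distribution--/ test inside the else, and the four cases are hand-written from the sample order IDs and titles.

```perl
use strict;
use warnings;

# Hypothetical helper: reports which branch the nested check takes for one
# line, given the previous and the current order/title values.
sub branch_taken {
    my ( $prevOrder, $prevTitle, $order, $title ) = @_;

    # Stand-in for "no warnings 'uninitialized'" in the real script
    $order = '' unless defined $order;
    $title = '' unless defined $title;

    if ( $prevOrder eq $order && $prevOrder ne '' ) {
        if ( $prevTitle eq $title && $prevTitle ne '' ) {
            return 'skip';
        }
        return 'neither';    # outer if was true, so the else is never reached
    }
    else {
        return 'write';      # the real script writes here on /Distribution--/
    }
}

for my $case (
    [ '',        '',                 'PO-9999', 'Item title here.'   ],
    [ 'PO-9999', 'Item title here.', 'PO-9999', 'Item title here.'   ],
    [ 'PO-9999', 'Item title here.', 'PO-9999', 'Item title here 2.' ],
    [ 'PO-9999', 'Item title here.', 'PO-1111', 'Item title here.'   ],
) {
    printf "%-8s prev=%s/%s cur=%s/%s\n", branch_taken(@$case), @$case;
}
```

The third case (same order, different title) comes back as 'neither': the outer if is true, so the else that does the writing is never entered, and that would match the symptom of only one row per order.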

In reply to Re^10: How best to strip text from a file? by bobdabuilda
in thread How best to strip text from a file? by bobdabuilda
