http://www.perlmonks.org?node_id=1003540


in reply to Re^10: How best to strip text from a file?
in thread How best to strip text from a file?

Hi, bobdabuilda!

Am glad this is shaping up for you, and I think you've done well building on to this...

Here are a few items for you to consider...

Use another hash to keep track of previously-seen Orders. Without re-posting the entire code, here's what you can add to make this functional:

Add a %seen hash:

my ( @records, @orders, %seen );

Add a RECORD: label at the start of the line, in front of the for that iterates through each record (Order):

RECORD: for my $record (@records) { ...

Finally, use an if() statement to process the Order ID (instead of a single line like with the other fields):

if (/Order ID:(\S+)/) { next RECORD if $seen{$1}; $seen{$1}++; $hash{orderID} = $1; }

I suspect you can see what this does, but if an Order ID: match is found, it first checks to see if the ID's already been 'seen.' If so, it gets the next record for processing, else it tags the Order as seen and sets the hash.

Write to your Excel file from within your subroutine. 1) You've got a complete record by the time the subroutine is called in the script (see its current position). You can send your subroutine the row number:

writeToSpreadSheet( $row, \%hash );

Then, in the subroutine:

sub writeToSpreadSheet { my ($row, $hashReference) = @_; ...

2) By writing to the Excel spread sheet from within the subroutine, you're somewhat modularizing your program by keeping its functionality separate.

Let me know how your spread sheet writing is going or if you encounter any other issues...

Replies are listed 'Best First'.
Re^12: How best to strip text from a file?
by bobdabuilda (Beadle) on Nov 13, 2012 at 03:54 UTC

    Hmmm yeah, ok... I see how that works. Will have a play with it and work out how best to get it to match both an Order ID and a Title at the same time to ensure matching record/title combo's are skipped.

    You mention in your comments that printing from a subroutine isn't a good idea. Would you mind explaining the reasoning for that (combined with the fact you're telling me to do it here lol) please?

      My apologies, as I think I may have misunderstood you about Order ID/Title. If you're looking to skip all orders that have the same Order ID and Title, you can use a similar construct as before except at the title-matching line:

      if(/Title:(.+)/){ next RECORD if $seen{$hash{orderID}}{$1}; $seen{$hash{orderID}}{$1}++; $hash{title} = $1; }

      The above builds a hash of hashes, like this:

      $hash{OrderID}{title}

      This will effectively keep track whether an identical Order ID and title has been previously seen.

      You can then restore the OrderID line to its former self:

      $hash{orderID} = $1 if !defined $hash{orderID} and /Order ID:(\ +S+)/;

      You mention in your comments that printing from a subroutine isn't a good idea.

      I meant printing to the console. Yet, I didn't find anything about this being bad practice when searching for it, so I need to reevaluate my position on this. Nevertheless, writing to a spread sheet from within a subroutine is just fine (and this sounds inconsistent with printing not being fine, so I appreciate you asking me about it).

        Hmmm... I'll have to go back and have a play with what you've suggested above. Before I noticed you'd come back again, I went off and had a play as promised, and came up with the following change to deal with the Order/Title issue :

        if (/Distribution--/) { # Check to see if the Order/Title combo has been seen before if ($seen{$$hashReference{orderID},$$hashReference{title}}) { # Visual prompt for debugging print "Bollox\n"; next RECORD; } # End if else { # Add the "unseen" Order/Title combo to the hash $seen{$$hashReference{orderID},$$hashReference{title}}++; #Write the Order "header" info to the spreadsheet $worksheet->write($row,0,$$hashReference{fiscalCycle}); $worksheet->write($row,1,$$hashReference{vendorID}); $worksheet->write($row,2,$$hashReference{orderID}); $worksheet->write($row,3,$$hashReference{requisitionNum}); $worksheet->write($row,4,$$hashReference{title}); $worksheet->write($row,5,$$hashReference{'ISBN/ISSN'}); $worksheet->write($row,6,$$hashReference{copies});

        And it SEEMS to be working ok so far - but that's only with a small subset of the data. About to branch out and run it on a larger subset and see how it goes with regard to both results and performance... or at least I will do once I fix the server I need to get the data from, which appears to have halted a couple of core services needed for me to get to it... never a dull moment ;)

        As you can see I've not gotten a chance as yet to re-visit how/when I am doing the writing to Excel, but I'll do that once I've confirmed the functionality of the script as a whole, and once I've had a bit of a play with your suggestions also.

        Thanks once again for the help... it's been VERY valuable and very much appreciated. I'm sure I could have done it without you... eventually... but to be frank - I don't have enough hair left to be able to spare what it would have cost me ;)