
Re^13: How best to strip text from a file?

by Kenosis (Priest)
on Nov 13, 2012 at 04:30 UTC

in reply to Re^12: How best to strip text from a file?
in thread How best to strip text from a file?

My apologies, as I think I may have misunderstood you about Order ID/Title. If you're looking to skip all orders that have the same Order ID and Title, you can use a construct similar to the earlier one, this time at the title-matching line:

if (/Title:(.+)/) {
    next RECORD if $seen{$hash{orderID}}{$1};
    $seen{$hash{orderID}}{$1}++;
    $hash{title} = $1;
}

The above builds a hash of hashes, keyed first by Order ID and then by title. This effectively keeps track of whether an identical Order ID and title have been seen before.
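
To make the shape of %seen concrete, here is a minimal, self-contained sketch. The records, the @kept array and the sample Order IDs and titles are made up for illustration and are not taken from your data:

```perl
use strict;
use warnings;

# Hypothetical sample records (Order ID / title pairs), not real data
my @records = (
    [ 'PO-1001', 'Perl Basics' ],
    [ 'PO-1001', 'Perl Basics' ],      # exact duplicate: skipped
    [ 'PO-1001', 'Advanced Perl' ],    # same order, different title: kept
    [ 'PO-1002', 'Perl Basics' ],      # different order, same title: kept
);

my ( %seen, @kept );

RECORD: for my $record (@records) {
    my ( $orderID, $title ) = @$record;

    # Skip when this Order ID/title pair has already been recorded
    next RECORD if $seen{$orderID}{$title};
    $seen{$orderID}{$title}++;

    push @kept, "$orderID|$title";
}

# %seen now holds:
# (
#     'PO-1001' => { 'Perl Basics' => 1, 'Advanced Perl' => 1 },
#     'PO-1002' => { 'Perl Basics' => 1 },
# )
print "$_\n" for @kept;
```

Only the second record is dropped, since it is the only one whose Order ID and title both match an earlier record.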

You can then restore the OrderID line to its former self:

$hash{orderID} = $1 if !defined $hash{orderID} and /Order ID:(\S+)/;
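
As a quick check of what that line does, here is a small sketch. The @lines array and its contents are hypothetical, and your actual record layout may differ:

```perl
use strict;
use warnings;

# Hypothetical lines from a single record
my @lines = (
    'Order ID:PO-1001',
    'Title:Perl Basics',
    'Order ID:PO-9999',    # a later match in the same record is ignored
);

my %hash;
for (@lines) {
    # \S+ captures the run of non-whitespace characters after the colon;
    # the !defined guard keeps the first Order ID found for the record
    $hash{orderID} = $1 if !defined $hash{orderID} and /Order ID:(\S+)/;
}

print "$hash{orderID}\n";
```

Once $hash{orderID} is defined, the guard short-circuits and the regex is not even tried for the remaining lines of that record.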

You mention in your comments that printing from a subroutine isn't a good idea.

I meant printing to the console. However, I couldn't find anything calling this bad practice when I searched for it, so I need to reevaluate my position. In any case, writing to a spreadsheet from within a subroutine is just fine (which is inconsistent with printing not being fine, so I appreciate you asking me about it).

Re^14: How best to strip text from a file?
by bobdabuilda (Sexton) on Nov 13, 2012 at 05:38 UTC

    Hmmm... I'll have to go back and have a play with what you've suggested above. Before I noticed you'd come back again, I went off and had a play as promised, and came up with the following change to deal with the Order/Title issue:

    if (/Distribution--/) {
        # Check to see if the Order/Title combo has been seen before
        if ($seen{$$hashReference{orderID},$$hashReference{title}}) {
            # Visual prompt for debugging
            print "Bollox\n";
            next RECORD;
        } # End if
        else {
            # Add the "unseen" Order/Title combo to the hash
            $seen{$$hashReference{orderID},$$hashReference{title}}++;

            # Write the Order "header" info to the spreadsheet
            $worksheet->write($row,0,$$hashReference{fiscalCycle});
            $worksheet->write($row,1,$$hashReference{vendorID});
            $worksheet->write($row,2,$$hashReference{orderID});
            $worksheet->write($row,3,$$hashReference{requisitionNum});
            $worksheet->write($row,4,$$hashReference{title});
            $worksheet->write($row,5,$$hashReference{'ISBN/ISSN'});
            $worksheet->write($row,6,$$hashReference{copies});

    And it SEEMS to be working ok so far - but that's only with a small subset of the data. About to branch out and run it on a larger subset and see how it goes with regard to both results and performance... or at least I will do once I fix the server I need to get the data from, which appears to have halted a couple of core services needed for me to get to it... never a dull moment ;)

    As you can see I've not gotten a chance as yet to re-visit how/when I am doing the writing to Excel, but I'll do that once I've confirmed the functionality of the script as a whole, and once I've had a bit of a play with your suggestions also.

    Thanks once again for the help... it's been VERY valuable and very much appreciated. I'm sure I could have done it without you... eventually... but to be frank - I don't have enough hair left to be able to spare what it would have cost me ;)

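      Yes, nice one! The keys you're creating to track Order ID/title will look something like "PO-9999" and "Item title here." joined into a single string. Perl joins them with the subscript separator $; (by default the non-printing character "\034") rather than a literal comma, so that should work.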
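
      One detail worth knowing: a comma-separated subscript such as $seen{$a,$b} is shorthand for $seen{join($;, $a, $b)}, so the stored key contains Perl's subscript separator rather than a literal comma. A minimal sketch with made-up values (the English module's $SUBSEP is just a readable alias for $;):

```perl
use strict;
use warnings;
use English qw(-no_match_vars);    # provides $SUBSEP as an alias for $;

my %seen;
my ( $orderID, $title ) = ( 'PO-9999', 'Item title here.' );    # hypothetical values

# A comma-separated subscript is shorthand for join($;, ...)
$seen{ $orderID, $title }++;

# The hash holds one flat key; split it on the separator to recover the parts
my ($key) = keys %seen;
my @parts = split /\Q$SUBSEP\E/, $key;

print join( ' | ', @parts ), "\n";
```

      This only becomes a problem if an Order ID or title could itself contain the separator character, which is unlikely with the default.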

      As a point of clarification, since you've declared %hash right below the start of the for loop that iterates through the records, it's not necessary to use a hash reference within that hash's scope. A hash reference to %hash was sent to the subroutine so that the entire hash wouldn't have to be copied in order to access its values for writing to the Excel spreadsheet.

      Here are some equivalents:

      $$hashReference{orderID} eq $hash{orderID} is true
      $$hashReference{title} eq $hash{title} is true

      This means that the following:

      if ($seen{$$hashReference{orderID},$$hashReference{title}}) { ...

      Can be written as:

      if ($seen{$hash{orderID},$hash{title}}) { ...

      It's certainly OK if you prefer working with the hash reference. I'd tend to work with the hash within its scope, and a reference to it outside its scope, e.g., within a subroutine.
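
      A small sketch of that split, assuming a hypothetical helper named describe_order in place of your spreadsheet-writing subroutine and sample values for the hash:

```perl
use strict;
use warnings;

# Hypothetical helper standing in for the spreadsheet-writing subroutine
sub describe_order {
    my ($hashReference) = @_;

    # Both dereference styles reach the same underlying hash
    return join ': ', $$hashReference{orderID}, $hashReference->{title};
}

my %hash = ( orderID => 'PO-9999', title => 'Item title here.' );    # sample values

my $inside  = join ': ', $hash{orderID}, $hash{title};    # direct access within scope
my $outside = describe_order( \%hash );                   # reference passed to the sub

print $inside eq $outside ? "same\n" : "different\n";
```

      Passing \%hash hands the subroutine a single scalar, so nothing is copied, and $hashReference->{title} is simply an alternative spelling of $$hashReference{title}.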

      It's been nice working with you on this, and it's clear to me that you would have done it on your own, but I appreciate this opportunity, as I learn from each task.

      As before, please let me know if you have any more questions about this...
