in reply to Re^4: How best to strip text from a file?
in thread How best to strip text from a file?
You're most welcome, bobdabuilda!
...there are usually numerous Orders containing the multiple distributions...
Suspected so. What separates these Orders? One option is to set the record separator ($/) to the text that separates Orders, and then do the matching on each Order.
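A minimal sketch of the record-separator idea, assuming (hypothetically) that Orders are separated by a blank line, begin with "Order ID:", and list their distributions on lines starting with "Dist:". The sample data and the `%dists_for` name are invented for illustration; they are not taken from the actual report.

```perl
use strict;
use warnings;

# Hypothetical sample data: two Orders separated by a blank line.
my $report = <<'END';
Order ID: 1001
Dist: A 5
Dist: B 3

Order ID: 1002
Dist: C 7
END

# Read the string through an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$report or die "Cannot open in-memory file: $!";

my %dists_for;    # order ID => array ref of distribution strings
{
    # Paragraph mode: an empty $/ makes each read return one
    # blank-line-delimited block, i.e. one whole Order.
    local $/ = '';
    while ( my $order = <$fh> ) {
        # Skip any block that carries no Order ID.
        my ($id) = $order =~ /^Order ID:\s*(\S+)/m or next;

        # Collect every distribution line inside this Order.
        push @{ $dists_for{$id} }, $order =~ /^Dist:\s*(.+)$/mg;
    }
}
close $fh;

for my $id ( sort keys %dists_for ) {
    print "$id: @{ $dists_for{$id} }\n";
}
```

Localizing `$/` inside a bare block keeps the change from leaking into other reads in the same script.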
Re^6: How best to strip text from a file?
by bobdabuilda (Beadle) on Nov 08, 2012 at 01:29 UTC | |
Yes, that's what I've been trying to do. The orders are only separated by a blank line, but they all start with the "Order ID:" text, so I'm looking at using that as the separator. The report also spans multiple pages, with a header on each page, which complicates things a little more... but I'll worry about that later, once I have the logic for the full order sorted. The page header should be automatically filtered out by the regex the way it stands anyway... I think.

One thing I *could* do with a suggestion on is how to handle breaking out of the loop at the end of each Order. About the only way I can think of to know when to stop processing distributions is to look for the start of the next Order record. To do that, though, the line containing data I want has to be read in at the "end" of the loop for the previous Order... and then, back at the start of the loop, the next line of the file is read in, dropping the previous one, which contains (some of) the data I'm after. Probably easier to show you what I mean in pseudocode to give a better idea:
So, from the above, the issue I am having is the two While loops... the second one "eats" the order info of any Orders following the first. I'm sure I could put some post-While processing there to trap the data before it loops to the next line... but that just seems a bit... uncouth, for want of a better word. I can't help thinking it should be more elegant (not to mention less likely to fail) than that.
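One possible way around the "eaten" line, sketched under assumptions: use a single loop and let the "Order ID:" line itself switch the current Order, so no inner loop ever consumes it. The "Dist:" line format, the sample data, and the names `@orders` and `$current` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample report: a page header, then two Orders.
my $report = <<'END';
Page 1 of 1
Order ID: 1001
Dist: A 5
Dist: B 3

Order ID: 1002
Dist: C 7
END

open my $fh, '<', \$report or die "Cannot open in-memory file: $!";

my @orders;     # one hash ref per Order, in file order
my $current;    # the Order being filled; undef before the first one

while ( my $line = <$fh> ) {
    chomp $line;
    if ( $line =~ /^Order ID:\s*(\S+)/ ) {
        # This line both ends the previous Order and starts the next,
        # so it is handled here instead of being read by an inner loop.
        $current = { id => $1, dists => [] };
        push @orders, $current;
    }
    elsif ( $current and $line =~ /^Dist:\s*(.+)/ ) {
        push @{ $current->{dists} }, $1;
    }
    # Anything else (blank lines, page headers) is ignored.
}
close $fh;

print "$_->{id}: @{ $_->{dists} }\n" for @orders;
```

Because the state lives in `$current` rather than in the loop structure, there is nothing to "back up" over when the next Order begins.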
by Kenosis (Priest) on Nov 08, 2012 at 05:05 UTC | |
Hi, bobdabuilda. You've given this much thought, and I think your pseudocode is on target.
The orders are only separated by a blank line, but they all start with the "Order ID:" text, so looking at using that as the separator.
Using "Order ID:" as the record separator makes sense.
The page header should be automatically filtered out by the regex the way it stands anyway... I think.
You're correct.
I've taken the liberty of implementing an interpretation of this. It does use two loops, but the outer loop is a for loop that iterates over an array of Order records:
Output
I've included a subroutine, and a call to it, that shows how to access the hash one record at a time. The code is commented to make it easier to follow. Let me know if you have any questions about this... Enjoy!
by bobdabuilda (Beadle) on Nov 09, 2012 at 04:55 UTC | |
Wow... that's awesome! Thank you VERY much! Worked great on my Windows test box using ActivePerl... but I get compile errors on the 'Nix server that I need to run it on (am only a tenant, not an admin etc. so no option of upgrading)... so I strongly suspect I'm coming across Perl versioning issues. Version on the server is "This is perl, v5.8.8 built for sun4-solaris" - which I suspect isn't compatible with something you've used in this script?
Note - this compiles and runs without issue on ActivePerl on my PC, as I'm sure it did on yours. Any suggestions on reading I should do to work out how best to fit this into the version of Perl on the server? Sorry to be a pain... you're being extremely helpful, and I'm being nothing but more problems lol
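The thread does not show which construct the 5.8.8 server rejected, so the following is only a sketch of likely suspects. `say`, the defined-or operator `//`, and named captures `(?<name>...)` all arrived in Perl 5.10 and fail to compile on 5.8.8. Each is shown in a comment next to a form that 5.8 accepts; the variable names and sample values are hypothetical.

```perl
use strict;
use warnings;

my %opt  = ( qty => 0 );
my $line = 'Order ID: 1001';

# 5.10+ only:  my $qty = $opt{qty} // 1;
# 5.8-compatible: test definedness explicitly, so a real 0 is kept.
my $qty = defined $opt{qty} ? $opt{qty} : 1;

# 5.10+ only:  say "Quantity: $qty";
# 5.8-compatible: print with an explicit newline.
print "Quantity: $qty\n";

# 5.10+ only:  $line =~ /^Order ID:\s*(?<id>\S+)/ and then use $+{id}
# 5.8-compatible: a numbered capture assigned in list context.
my ($id) = $line =~ /^Order ID:\s*(\S+)/;
print "Order: $id\n";
```

Running `perl -c` on the script on the server reports the line of the first syntax error, which should help narrow down which construct needs rewriting.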
by Kenosis (Priest) on Nov 09, 2012 at 06:02 UTC | |
by bobdabuilda (Beadle) on Nov 13, 2012 at 00:06 UTC | |
by bobdabuilda (Beadle) on Nov 13, 2012 at 01:51 UTC | |