PerlMonks  

Re: Bug in script when users use long names

by boftx (Chaplain)
on Sep 25, 2013 at 00:28 UTC (#1055574)


in reply to Bug in script when users use long names

I will offer a (very) general approach, or hint rather.

It appears that your data samples, especially sample 2, represent a form of fixed-length records: a header record, identified by a non-space character in col 1 (assuming 1-based column numbering), followed by continuation records, identified by space characters in cols 1 and 2.

Given that you can easily find the start of a master record, and that the continuation records appear to follow a standard structure, unpack becomes your best friend and makes the actual extraction trivial.

The presence of the header title records is what reduces this to a simple problem that unpack is uniquely intended to handle. All that you really need to do is keep track of what record/sub-record row number you are on so you apply the correct unpack pattern to it and extract the data to normal variables.
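As a rough sketch of that idea (not a drop-in solution): the column widths, field names, and the `parse_records` helper below are all hypothetical, since the actual layout of the data is not shown in this thread. It also assumes the column-title lines have already been skipped.

```perl
use strict;
use warnings;

# ASSUMED layout, for illustration only; adjust the widths to the real data.
my $HEADER_TPL = 'A20 A10';              # master record: name, size
my @CONT_TPL   = ('x2 A19', 'x2 A30');   # continuation rows 1 and 2

# Hypothetical helper: groups continuation rows under their master record.
sub parse_records {
    my @lines = @_;
    my (@records, $row);
    for my $line (@lines) {
        next unless $line =~ /\S/;       # ignore blank lines
        if ($line =~ /^\S/) {            # non-space in col 1: new master record
            my ($name, $size) = unpack($HEADER_TPL, $line);
            push @records, { name => $name, size => $size, extra => [] };
            $row = 0;                    # restart the sub-record row counter
        }
        elsif (@records && $line =~ /^  / && $row < @CONT_TPL) {
            # spaces in cols 1 and 2: apply the template for this row number
            push @{ $records[-1]{extra} }, unpack($CONT_TPL[$row++], $line);
        }
    }
    return @records;
}
```

The `A` format strips trailing spaces, so the extracted fields come back already trimmed. Continuation rows beyond the templates listed in `@CONT_TPL` are silently ignored in this sketch.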

On time, cheap, compliant with final specs. Pick two.


Re^2: Bug in script when users use long names
by Sparky (Initiate) on Sep 25, 2013 at 00:42 UTC

    Thanks Boftx, I'll check it out. Strange, pack/unpack never came up in any of my Perl books while I was figuring this out.

      Looking at this further, I notice that there doesn't appear to be any kind of format version number in the header info. Shame on your source for that. But even so, it appears to be a simple matter to check whether the Date/Time header is present in the first header line; if it is not, you know you are dealing with (what I presume is) the newer data format.
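A minimal sketch of that check, assuming the first header line has already been read into a string. The `detect_format` name and the 'old'/'new' labels are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 'old' when the first header line carries the
# Date/Time column title, and 'new' (the presumed newer format) otherwise.
sub detect_format {
    my ($first_header_line) = @_;
    return $first_header_line =~ m{Date/Time} ? 'old' : 'new';
}
```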

      On a side note, this kind of data format has been largely replaced by XML for B2B data exchange. Some vendors have opted for a CSV format, but it is simply amazing to see how many programmers cannot generate a proper CSV file, or an XML file for that matter. There is still something to be said for a fixed-length record after all. :)

      On time, cheap, compliant with final specs. Pick two.

        To me, it looks more like both are the same format, just with a wider "Name" column and with line breaks inserted by some tool, which makes the output more human-readable but harder to process automatically (my guess is Windows, perhaps PowerShell?).

        Sparky: perhaps your source can output CSV, XML, or something else that is easier to parse?

        Otherwise, you could record the position (line/column) of the relevant column header ("Creation Date/Time") and follow boftx's advice.
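One way this might look, as a sketch only: find where the column title starts in the header line and build the unpack template from that offset, so a wider "Name" column no longer matters. The `template_for_column` helper and the field width passed to it are assumptions, not something taken from the actual data.

```perl
use strict;
use warnings;

# Hypothetical helper: builds an unpack template that jumps to the offset
# where $title starts in the header line and reads $width characters there.
# Returns nothing if the title is not found.
sub template_for_column {
    my ($header_line, $title, $width) = @_;
    my $pos = index($header_line, $title);    # 0-based offset, -1 if absent
    return if $pos < 0;
    return sprintf('@%d A%d', $pos, $width);  # '@N' moves to absolute offset N
}
```

The resulting template can then be applied to each data row with `unpack`. Note that `unpack` dies if the `@` offset lies beyond the end of a row, so short rows would need a length check first.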
