PerlMonks  

Re: Data Salad Address Problem

by bofh_of_oz (Hermit)
on Jul 28, 2005 at 16:02 UTC (#479013)


in reply to Data Salad Address Problem

First, determine the logical formatting rules for the data. In your case:

- Records seem to be separated by a blank line (two \n)
- Every field has a fixed width (a set number of characters) on each line
- A field's value can continue across multiple lines
- ZIP code is in the format /\d{5}-\d{4}/
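
To make the first and last rules concrete, here is a minimal sketch. It assumes records are separated by one or more blank lines and that a ZIP code is five digits, a hyphen, and four digits. The helper names (split_records, find_zip) and the sample text are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# ZIP+4: five digits, a hyphen, four digits (assumed format).
my $ZIP_RE = qr/\b(\d{5}-\d{4})\b/;

# Split raw text into records on runs of blank (or whitespace-only) lines.
sub split_records {
    my ($text) = @_;
    return grep { /\S/ } split /\n(?:[ \t]*\n)+/, $text;
}

# Return the first ZIP+4 found anywhere in a record, or undef if there is none.
sub find_zip {
    my ($record) = @_;
    my ($zip) = $record =~ $ZIP_RE;
    return $zip;
}

# Made-up sample; not the original poster's data.
my $sample = "John Smith     12 Main St\n"
           . "               Apt 4      12345-6789\n"
           . "\n"
           . "Jane Doe       9 Oak Ave\n";

for my $rec (split_records($sample)) {
    my $zip = find_zip($rec);
    print defined $zip ? "ZIP: $zip\n" : "no ZIP found\n";
}
```

Searching the whole record for the ZIP means it does not matter which field it landed in, at the cost of a false hit if some other field ever contains the same pattern.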

You can use a multiline regexp, or process each line with substr, pushing the pieces into the corresponding array(s) or appending them to strings, whichever suits. I'm not clear about the ZIP codes: if they can appear in field 4 or 5, use a regex to find them; if they are only ever in field 5 (and we just can't see that because the HTML scrambled the text separators), then fixed-width extraction alone will be fine.
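
A rough sketch of the substr route, under loud assumptions: the column offsets and widths in @LAYOUT are hypothetical (the real ones have to be read off the actual data), and parse_record is an illustrative name, not existing code. Each line of a record is cut at the same columns, and continuation lines are appended to whichever field owns those columns.

```perl
use strict;
use warnings;

# Hypothetical column layout: [field name, start offset, width].
my @LAYOUT = (
    [ name    => 0,  15 ],
    [ address => 15, 20 ],
    [ zip     => 35, 10 ],
);

# Parse one record (a block of lines) into a hash ref of field => string.
sub parse_record {
    my ($record) = @_;
    my %field = map { $_->[0] => '' } @LAYOUT;
    for my $line (split /\n/, $record) {
        for my $col (@LAYOUT) {
            my ($name, $start, $width) = @$col;
            next if length($line) <= $start;     # line too short for this column
            my $piece = substr($line, $start, $width);
            $piece =~ s/^\s+|\s+$//g;            # trim column padding
            next unless length $piece;
            $field{$name} .= ' ' if length $field{$name};   # join continuation lines
            $field{$name} .= $piece;
        }
    }
    return \%field;
}

# Made-up record, built with sprintf so the columns line up with @LAYOUT.
my $rec = parse_record(
    sprintf("%-15s%-20s%-10s\n", 'John Smith', '12 Main St', '12345-6789')
  . sprintf("%-15s%-20s\n",      '',           'Apt 4')
);
print "$_: $rec->{$_}\n" for map { $_->[0] } @LAYOUT;
```

If the ZIP turns out to wander between fields, the fixed-width pass can still do the splitting, with a regex run over the resulting fields to pick the ZIP out.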

HTH

P.S. If you want, we can work on the code later...

--------------------------------
An idea is not responsible for the people who believe in it...


Re^2: Data Salad Address Problem
by djp (Hermit) on Jul 29, 2005 at 03:29 UTC
    Don't assume that you can derive the logical formatting rules correctly from the data supplied. Make every effort to get the supplier of the data to provide you with the rules they used to create the data.
      I agree. However, that works in only about 30% of situations, because data suppliers make every effort not to disclose their "private and confidential" data formats. That is, if they understand them at all... Often, studying the data is the only way to work out the logic. Granted, one would need a lot more data for statistical analysis than is provided here, but I was just outlining the idea...

