Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

parse array to replace specific spaces with pipes

by neurotoxx (Novice)
on Jul 09, 2009 at 23:21 UTC ( #778727=perlquestion: print w/ replies, xml ) Need Help??
neurotoxx has asked for the wisdom of the Perl Monks concerning the following question:

I'm having a trouble formatting a file into a pipe delimited set of records.

The record file (file.txt) looks like this:

0211121253 Mike Dell ID06533 0211121253 Chris Jones ID02014 0211121253 Pa Kettle ID65255 0111119112 Mitch Poo ID05983

I'm trying to read the file in to a handle and put a pipe ("|") at the beginning of each line. Then replace the spaces in each line with a pipe and then add a pipe to the. So, I would end up with a text file end of each line. It should resemble the following:

|0211121253|Mike|Dell|ID06533| |0211121253|Chris|Jones|ID02014| |0211121253|Pa|Kettle|ID65255| |0111119112|Mitch|Poo|ID05983|

Here's part of my code. I'm using the DBI module to query the original data from Oracle. I believe my error is because I'm working with an array and not string vars.

I would appriciate some advice on how this could be done.

$sth->execute; open (OUTFILE, "> file.txt") or die "Cant open dat file : $!"; while (@row = $sth->fetchrow_array) { $_ =~ s/\s/\|/g; print OUTFILE "@row\n"; } close OUTFILE;
Thanks, Neurotoxx

Comment on parse array to replace specific spaces with pipes
Select or Download Code
Re: parse array to replace specific spaces with pipes
by ELISHEVA (Prior) on Jul 09, 2009 at 23:40 UTC

    Replace $_ =~ s/\s/\|/g; with local $"='|';.

    $" sets the delimiter between array elements in a double quoted string. By default, it is set to a single space which is why you were originally seeing a space between array elements. The space wasn't part of the array elements, nor the array, which is why trying to substitute spaces with pipes didn't work.

    $" is a global variable. The keyword local keeps your custom setting of that variable from affecting other parts of your code. Your custom setting will only apply within the while loop.

    Best, beth

      Thanks beth. That works great too. I now understand the difference in working with scalars and lists. I need to work with the individual elements treating those as scalar data.

      Cheers, Neurotoxx
Re: parse array to replace specific spaces with pipes
by Polyglot (Monk) on Jul 09, 2009 at 23:41 UTC
    $sth->execute; open (OUTFILE, "> file.txt") or die "Cant open dat file : $!"; while (@row = $sth->fetchrow_array) { $_ =~ s/\s/\|/g; $_ =~ s/(.*)/|$1|/; print OUTFILE "@row\n"; } close OUTFILE;
    Does that fix it?

    Blessings,

    ~Polyglot~

    UPDATE: Elisheva has given a better answer. My answer assumes that the data was read from a file, one line at a time. Her answer recognizes that each of the parts of the line have not been read in together, but rather individually as separate records, such as from a database. I had thought your difficulty was in not seeing the pipe at the beginning and end of the line, but if indeed you were not seeing any pipes at all, then her answer will give the proper approach.

      That is correct. I did some more reading on arrays last night and now I understand how the data structure is different from scalar data. I did use your search expression, and it works great. Thank you very much, -Neurotoxx

      Thanks. The regex expressions work great. However, I have a couple more issues with the data. For example, I would like to replace only the spaces after the first element and before the last element. I need to keep the spaces that are part of the last name. For example:

      0211121253 Mike Dell ID06533 0211121253 Leo De Vinci ID02014 0211121253 Joe De La Marco ID65255

      Should look like this:

      0211121253|Mike|Dell|ID06533 0211121253|Leo|De Vinci|ID02014 0211121253|Joe|De La Marco|ID65255

      Also, I would like it to delete any lines that are missing field elements.

      0211121253|Mike|Dell|ID06533 Missing data|Leo|De Vinci|ID02014 <--delete line 0211121253|Joe|De La Marco|ID65255

      I have been trying to create a regex that keys on "ID" and replaces the space before it. I've been trying these:

      $_ =~\b\sID\d; $_ =~^\d+;

      But these expressions replace instead of add the pipe.

      I would really appreciate a way to add the pipes and not replace the search value in the regex.

      Thank you so much for your help,

      vince
Re: parse array to replace specific spaces with pipes
by jethro (Monsignor) on Jul 09, 2009 at 23:45 UTC

    join() to the rescue

    while (@row = $sth->fetchrow_array) { print OUTFILE '|',join('|',@row),"|\n"; }
      print OUTFILE '|',join('|',@row),"|\n";

      You could let join do all the work.

      print OUTFILE join '|', '', @row, "\n";

      I hope this is of interest.

      Cheers,

      JohnGG

        Join is very good. :) Thanks Jethro and Johngg

Re: parse array to replace specific spaces with pipes
by bichonfrise74 (Vicar) on Jul 10, 2009 at 02:04 UTC
    Another way...
    #!/usr/bin/perl use strict; open( my $out_file, '>', "/tmp/outfile" ) or die "Error: Cannot open file\n"; while( my $line = <DATA> ) { $line =~ s/\s/\|/g; print $out_file "|" . $line . "\n"; } close( $out_file ); __DATA__ 0211121253 Mike Dell ID06533 0211121253 Chris Jones ID02014 0211121253 Pa Kettle ID65255 0111119112 Mitch Poo ID05983
Re: parse array to replace specific spaces with pipes
by rovf (Priest) on Jul 10, 2009 at 08:23 UTC

    jethro's solution seems to be the most natural for me. In your solution, I don't see where $_ gets its value. Does fetchrow_array silently sets it?

    -- 
    Ronald Fischer <ynnor@mm.st>
Re: parse array to replace specific spaces with pipes
by roboticus (Canon) on Jul 10, 2009 at 12:23 UTC
    neurotic:

    While the other monks have provided the answer to your question, I'm going to provide you with a cautionary example:

    0213312247 Richard O. Boticus ID19897

    ...roboticus

      Ah, yes...that can be bad. I was able to structure the query to ignore middle initials, but a few slipped in.

      __DATA__ 0211121253|Mike|Dell|ID06533 0211121253|A|Chris|Jones|ID02014 # this is not good 0211121253|Pa|Kettle|ID65255 0111119112|Mitch|Poo|ID05983

      However, this leads me to the question: How do I remove the initial and/or period, leaving the data structure the same?

      pseudo code: if row contains an element with one letter then remove letter continue pipe delimiting...

      Thanks, vince

        neurotoxx:

        That would depend on your data source:

        • If it's a database, you could add some delimiters during the query
        • If it's a fixed-format flat file, you could take field lengths into account (substr, unpack)
        • If there are a fixed number of fields in the row, you could put all the "extras" in the name field
        • Perhaps you could take advantage of the fact that one of the fields matches /ID\d+/

        That's generally the problem with "squishy" data. Sometimes you have to look it over and get clever....

        ...roboticus

        You can use next to skip the rest of an iteration in a loop:

        while( ... ) { ... next if &lt;something>; ... }

        The <something> is treated as an expression. Expressions can be $variable =~ /<regex>/ or just a single variable, which evaluates to false if undef, 0 or ''.

Re: parse array to replace specific spaces with pipes
by Bloodnok (Vicar) on Jul 10, 2009 at 13:00 UTC
    rovfs understandable confusion (wrt the setting of $_) apart, you appear to be slightly confused as to what is going on:

    You say I'm trying to read the file..., whereas your code clearly generates the file (file.txt) using data read, a row at a time, from an Oracle query.

    Given this, it strikes me that the simplest solution to your formatting problem (which would, at the same time, solve the problem identified by roboticus) - is, IMO, to modify the query to force Oracle to change the default column separator - to '|' as required - in the report.

    A user level that continues to overstate my experience :-))

      That was my original plan, but I wasn't able to figure out how to set that in the SQL query. I'll check the docs. Thanks for you help, I sincerely appreciate it.

      -Neurotoxx
        The simplest approach is to specify the list of columns concatenated with the '|' character - something like (I can't remember the exact syntax):
        SELECT id + '|' + FirstName + '|' + Surname FROM some_table WHERE ....

        Otherwise, IIRC, you need to issue a set command (to set the column separator) prior to running the query - RTFM for further details.

        A user level that continues to overstate my experience :-))

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://778727]
Approved by ELISHEVA
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (3)
As of 2014-10-22 03:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (112 votes), past polls