PerlMonks  

working with non-delimited files

by semio (Friar)
on May 24, 2007 at 21:04 UTC ( [id://617334]=perlquestion )

semio has asked for the wisdom of the Perl Monks concerning the following question:

fellow monks,

I'm working with a number of data files that have no defined delimiter. The goal is to take the data, delimit it into appropriate columns, and then insert the data into a db. I could obtain the required results via the following code, but I would like to shoot for something more succinct. Thanks in advance for any insights and suggestions for improvement. cheers!

#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    my $line = $_;
    $line =~ s/\s+/\|/;
    $line =~ s/\s+/\|/;
    $line =~ s/\s+/\|/;
    $line =~ s/\s+/\|/;
    $line =~ s/\s+/\|/;
    print $line;
}

__DATA__
This# is stand alone data "but this data needs to , stay together/
and here is some more -but this is a single column

...which produces the desired output:

This#|is|stand|alone|data|"but this data needs to , stay together/
and|here|is|some|more|-but this is a single column

Replies are listed 'Best First'.
Re: working with non-delimited files
by FunkyMonk (Chancellor) on May 24, 2007 at 21:35 UTC
    I'd use split & join, rather than regex...

    print join "|", split /\s+/, $_, 6 for <DATA>
Re: working with non-delimited files
by runrig (Abbot) on May 24, 2007 at 22:02 UTC
    If the requirements are that the first five fields are space delimited, and the last field is the rest of the line, then just use the LIMIT argument of split:
    my @fields = split " ", $line, 6;
    print join("|", @fields);

    Update: Hmm, I thought I read all the replies, but FunkyMonk already had it.
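To make the LIMIT behaviour described above concrete, here is a minimal sketch. The helper name `delimit_line` and the `@sample` array are hypothetical and exist only for illustration; the sample lines are the ones from the original question, assuming single spaces between words.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a line into at most
# $limit whitespace-separated fields and rejoin them with "|".
sub delimit_line {
    my ($line, $limit) = @_;
    chomp $line;
    # split " " (a single-space string) splits on runs of whitespace and
    # skips leading whitespace. LIMIT caps the number of fields, so the
    # last field keeps its internal whitespace untouched.
    return join "|", split " ", $line, $limit;
}

# Sample lines taken from the question.
my @sample = (
    qq{This# is stand alone data "but this data needs to , stay together/\n},
    qq{and here is some more -but this is a single column\n},
);

print delimit_line($_, 6), "\n" for @sample;
```

One difference between the two split variants in this thread: with `split /\s+/`, a line that starts with whitespace yields an empty first field, whereas `split " "` discards the leading whitespace. Which one is appropriate depends on whether leading blanks can occur in the real data files.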

Re: working with non-delimited files
by blazar (Canon) on May 24, 2007 at 21:17 UTC

    You want to know about the /g modifier. Read about it in perldoc perlre:

    Update: on a second reading you probably want (code updated to reflect the change):

    #!/usr/bin/perl
    use strict;
    use warnings;

    while (my $line = <DATA>) {
        $line =~ s/\s+/|/ for 1..5;
        print $line;
    }

    __DATA__
    This# is stand alone data "but this data needs to , stay together/
    and here is some more -but this is a single column
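As a hedged illustration of why a plain /g substitution does not fit this data while five single substitutions do, the sketch below contrasts the two on one of the sample lines. The variable names `$all` and `$first_five` are made up for the example.

```perl
use strict;
use warnings;

my $line = 'and here is some more -but this is a single column';

# With /g every run of whitespace is replaced, including the runs
# inside what should remain the sixth column.
(my $all = $line) =~ s/\s+/|/g;

# Repeating a substitution without /g five times replaces only the
# first five runs, one per pass.
my $first_five = $line;
$first_five =~ s/\s+/|/ for 1 .. 5;

print "$all\n$first_five\n";
```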
      You've overwritten $_

      Try

      while (my $data = <DATA>) {
          $data =~ s/\s+/\|/ for 1..5;
          print $data;
      }
      Edit: Damn. Too Slow.
        Edit: Damn. Too Slow.

        You're right: I updated my node and said so. Then the update was wrong too, and I made a further update silently, in the hope that no one would notice... ;-) Of course your solution with split is far better, except that even for such a tiny thing I would use while instead of for. Until we're in full Perl 6 times, that is...
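The preference for while over for can be sketched as follows. In Perl 5, `for (<$fh>)` evaluates the filehandle in list context and reads every line before the loop starts, while `while (<$fh>)` fetches one line per iteration. The helper name `delimit_stream` is hypothetical, and in-memory filehandles are used only to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: read $in one line at a time and write the
# pipe-delimited version to $out. Returns the number of lines processed.
sub delimit_stream {
    my ($in, $out) = @_;
    my $count = 0;
    # One line is read per iteration, so memory use does not grow
    # with the size of the input file.
    while (my $line = <$in>) {
        chomp $line;
        print {$out} join("|", split " ", $line, 6), "\n";
        $count++;
    }
    return $count;
}

# Demonstrate with in-memory filehandles instead of real files.
my $input  = "and here is some more -but this is a single column\n";
my $output = '';
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$output or die "cannot open output: $!";
delimit_stream($in, $out);
close $out;
print $output;
```

For the small __DATA__ sections in this thread the difference is negligible; it starts to matter for large data files.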
