PerlMonks  

Regexp mystery (to me)

by barkingdoggy (Initiate)
on Mar 03, 2008 at 19:35 UTC ( [id://671697]=perlquestion )

barkingdoggy has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to use Perl to extract fields in a csv file I exported from Excel...
    #!/usr/bin/perl
    use strict;
    use warnings;
    our $list;
    our @clients;
    our $filedef1=$ARGV[0]; #name of client CSV file
    &read_clients ();
    # define regex components
    my $accode = qr(^"(.*)",.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*)x;
    my $name = qr(^.*,"(.*)",.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*)x;
    # do regex matches
    print "Extractions:\n";
    my @extractions = $list =~ m{(?: $name)}mxgc;
    print "$extractions[$_], " for 0.. $#extractions;
    print "End of Program!\n";
    ##Beginning of subroutine for reading the document source file.
    sub read_clients
    {
    open FILEDEF1, "< $filedef1" or die "error reading $filedef1-$!";
    while (<>)
        {
        push (@clients, <FILEDEF1>);
        }
    close FILEDEF1;
    $list = join(' ',@clients);
    print $list;
    } ##End of block for reading the document source file.
This code works like I want it to. When I substitute $accode for $name in the line

    my @extractions = $list =~ m{(?: $name)}mxgc;

it only extracts the first record/line match for $accode, while I get every record/line match if I do a separate run and match for $name instead. In other words, I get global, multiline matching with $name but not with $accode. Any idea why? I want every record match for $accode.

Replies are listed 'Best First'.
Re: Regexp mystery (to me)
by jwkrahn (Abbot) on Mar 03, 2008 at 20:19 UTC
    This code works like I want it to.

    It does? Really? OK.


    #!/usr/bin/perl
    use strict;
    use warnings;
    our $list;
    our @clients;
    our $filedef1=$ARGV[0]; #name of client CSV file

    Why are you declaring those variables here when you are only using them inside the read_clients() subroutine?

    &read_clients ();

    You shouldn't use & when calling subroutines, see perlsub for reasons why.

    # define regex components
    my $accode = qr(^"(.*)",.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*)x;

    my $name = qr(^.*,"(.*)",.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*)x;

    Why include the empty fields after the captured field?

    # do regex matches
    print "Extractions:\n";
    my @extractions = $list =~ m{(?: $name)}mxgc;

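    Why are you using the /c option? It only matters together with /g in scalar context, where it keeps pos() from being reset after a failed match, and that is normally only useful if you are using the \G zero-width assertion in the pattern.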

    print "$extractions[$_], " for 0.. $#extractions;
    print "End of Program!\n";
    ##Beginning of subroutine for reading the document source file.
    sub read_clients
    {
    open FILEDEF1, "< $filedef1" or die "error reading $filedef1-$!";
    while (<>)

    The special <> readline operator treats @ARGV as a list of file names and opens and reads each line from all of those files. Since $filedef1 is the first element of @ARGV, the same file is opened a second time through <>, and the first line from it is read into the $_ variable.

        {
        push (@clients, <FILEDEF1>);

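    You are pushing all the lines from the file onto the @clients array from inside the loop. In list context <FILEDEF1> reads every remaining line at once, so the first iteration pushes the whole file and the later iterations push nothing, because the filehandle is already at end-of-file. The while (<>) loop only re-reads the file through @ARGV and throws those lines away, so you don't need it.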
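A minimal sketch of reading the file once, without the outer loop. It assumes a lexical filehandle and three-argument open, and it changes read_clients() to take the file name and return the lines, which is not how the original is written. The temporary file and its two records are made up so the sketch runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical rewrite of read_clients(): takes the file name as an
# argument and returns the lines instead of filling package variables.
sub read_clients {
    my ($path) = @_;
    open my $fh, '<', $path or die "error reading $path: $!";
    my @lines = <$fh>;    # list context: reads every remaining line, once
    close $fh;
    return @lines;
}

# Made-up sample file so the sketch is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} qq("A001","Smith"\n"A002","Jones"\n);
close $tmp_fh;

my @clients = read_clients($tmp_name);
print scalar(@clients), " lines read\n";
```

Each element of @clients still ends with its newline, so the lines can later be joined with an empty string.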

        }
    close FILEDEF1;
    $list = join(' ',@clients);

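    You are joining the lines together with a single space character, so every line except the first has a space at the beginning. That is what confuses the /m match: ^ still matches right after each newline, but $accode requires a double quote as the very next character and now finds a space, so only the first record matches. $name still matches every record because its leading .* absorbs the space. Since each line already ends with a newline, join with an empty string instead.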

     print $list;
    } ##End of block for reading the document source file.
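The effect of the join separator can be reproduced with a small sketch. The two records are made up, and $first_field and $second_field are simplified stand-ins for $accode and $name (they are not the original patterns, only the same shape: a quoted field anchored with ^ under /m).

```perl
use strict;
use warnings;

# Two made-up records; each keeps its trailing newline, as <FH> returns it.
my @clients = (qq("A001","Smith",x\n), qq("A002","Jones",x\n));

# Simplified stand-ins for $accode and $name.
my $first_field  = qr/^"([^"]*)"/m;       # quote must follow ^ directly
my $second_field = qr/^.*?,"([^"]*)"/m;   # leading .*? can absorb a space

my $spaced = join ' ', @clients;   # second line now begins with a space
my $plain  = join '',  @clients;   # lines are left untouched

my @first_spaced  = $spaced =~ /$first_field/g;
my @first_plain   = $plain  =~ /$first_field/g;
my @second_spaced = $spaced =~ /$second_field/g;

print "first field,  join(' '): @first_spaced\n";
print "first field,  join(''):  @first_plain\n";
print "second field, join(' '): @second_spaced\n";
```

With the space separator the first-field pattern finds only the first record, while the second-field pattern still finds both, which is the asymmetry described in the question.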

      You shouldn't use & when calling subroutines, see perlsub for reasons why.
      A quick read of perlsub doesn't show what you might mean, and IMO there is no problem with consistently using & and parentheses.
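        With parentheses it's fine, as long as you're not relying on prototypes (a call with & bypasses them). Without the parentheses it could cause a problem: &foo; passes the caller's current @_ straight through to the subroutine.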
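To make the difference concrete, here is a small sketch. count_args and caller_with_args are made-up names used only for illustration.

```perl
use strict;
use warnings;

# Returns how many arguments it was called with.
sub count_args { return scalar @_ }

sub caller_with_args {
    my $with_parens    = &count_args();   # explicit empty list: no arguments
    my $without_parens = &count_args;     # reuses this sub's @_ as-is
    return ($with_parens, $without_parens);
}

my ($with, $without) = caller_with_args('a', 'b', 'c');
print "with parens: $with, without parens: $without\n";
```

The call without parentheses sees the three arguments that were passed to caller_with_args, which is rarely what the author intended.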

      Thank you. That space joining the lines together is the problem. BOY, IS MY FACE RED! Mystery explained!
        That may have been the problem in this specific bug, but you should really use Text::xSV instead of parsing CSV files with a regex. Regex-based solutions break on many legitimate CSV inputs, such as quoted fields that contain commas, escaped quotes, or embedded newlines.

        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Regexp mystery (to me)
by dragonchild (Archbishop) on Mar 03, 2008 at 19:37 UTC
    Use Text::xSV instead.

Re: Regexp mystery (to me)
by hipowls (Curate) on Mar 03, 2008 at 20:23 UTC

    There is also Text::CSV_XS; it requires a compiler or a binary distribution. It is easy to use, flexible, and fast, and it covers all those corner cases you will bump up against.
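A minimal sketch of pulling the first two columns with Text::CSV_XS, assuming the module is installed. The sample data is made up and read from an in-memory filehandle so the sketch runs without a real file; the column positions (account code first, name second) are taken from the question.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up sample data; the second name contains a comma inside quotes,
# which a field-counting regex would split in the wrong place.
my $data = qq("A001","Smith",10\n"A002","Jones, Inc.",20\n);

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my (@codes, @names);
while (my $row = $csv->getline($fh)) {
    push @codes, $row->[0];   # first column: account code
    push @names, $row->[1];   # second column: client name
}
close $fh;

print "codes: @codes\n";
print "names: ", join(' | ', @names), "\n";
```

For a real export, open the CSV file itself in place of \$data; the loop stays the same.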
