http://www.perlmonks.org?node_id=1026108

reaper9187 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I'm here to seek your wisdom on a certain issue. I'm implementing a text-processing script where the input file looks like this:
"A","B","C","D"
"A","B","C","D,E,F"
"A","B","C","D"
"A","B","C","D,R,T"
Now, I need help with regex pattern matching (I have to admit I'm not very good at it). I need to check the last column entry, and if it has multiple values separated by commas (e.g. "D,E,F" and "D,R,T" in this example), I need to insert a row before the next row for each extra value and display the entry there. In short, the output should look something like:
"A","B","C","D"
"A","B","C","D"
            "E"
            "F"
"A","B","C","D"
"A","B","C","D"
            "R"
            "T"
Please help. Some pointers and tips would be great! Thanks in advance. :)

Replies are listed 'Best First'.
Re: regex pattern match
by McA (Priest) on Mar 29, 2013 at 09:35 UTC

    Hi,

    Take Text::CSV and parse line by line, so that you have all the elements (A, B, C and "D,E,F") in one array. Then access the last element and split it with:

    my @lastletters = split /,/, $elements[-1];
    if (@lastletters > 1) {
        # more than one letter
        print "first row\n";
        print "additional rows\n";
    }

    McA
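A minimal sketch of how the two placeholder prints above could be filled in, assuming the fields have already been parsed into a list and the last field is not empty. The helper name expand_row is hypothetical, and aligning the extra values under the last column is a guess at the desired layout:

```perl
use strict;
use warnings;

# Hypothetical helper: takes the already-parsed fields of one row and
# returns the output lines (the first row, then one line per extra value).
sub expand_row {
    my @elements    = @_;
    my @lastletters = split /,/, $elements[-1];

    # Everything before the last column, re-quoted and comma-terminated.
    my $prefix = join '', map { qq{"$_",} } @elements[0 .. $#elements - 1];

    my @rows = ($prefix . qq{"$lastletters[0]"});

    # Additional rows: pad with spaces so each value lines up under the last column.
    push @rows, (' ' x length($prefix)) . qq{"$_"}
        for @lastletters[1 .. $#lastletters];

    return @rows;
}

print "$_\n" for expand_row('A', 'B', 'C', 'D,E,F');
```

Returning the lines instead of printing them directly keeps the helper easy to check against a known row.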

Re: regex pattern match
by hdb (Monsignor) on Mar 29, 2013 at 09:50 UTC

    Unless you have reason to split the first n-1 pieces as well, you only need the last one separated.

    Once you have it, you replace each comma with: quote, newline, the appropriate number of spaces, quote.

    Voila!

    while (<DATA>) {
        chomp;
        my ($first, $last ) = /(.+",)(.+)/;
        print $first;
        $first = " " x length( $first );
        $last =~ s/,/"\n$first"/g;
        print "$last\n";
    }
    __DATA__
    "A","B","C","D"
    "A","B","C","D,E,F"
    "A","B","C","D"
    "A","B","C","D,R,T"

      Nice approach.

        I do not like the line where I replace $first with spaces. It really should be:

        $first =~ s/./ /g;

        I love regular expressions! However, things can easily get out of hand and become unmaintainable. Also, these are quite format-dependent: should there be only one column, it would not work anymore.
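One possible way around the single-column limitation, as a sketch only: make the leading part of the match optional. The helper name split_last_column is hypothetical, and the pattern assumes the line is already chomped, the last field is double-quoted, and it contains no embedded quotes:

```perl
use strict;
use warnings;

# Hypothetical helper: put each extra value of the last quoted column on
# its own row. The leading part is optional, so a single-column line matches too.
sub split_last_column {
    my ($line) = @_;
    my ($first, $last) = $line =~ /^(.*,)?("[^"]*")$/
        or return $line;                  # not in the expected format: pass through
    $first = '' unless defined $first;    # single-column line
    (my $pad = $first) =~ s/./ /g;        # same width as $first, all spaces
    $last =~ s/,/"\n$pad"/g;              # comma -> quote, newline, padding, quote
    return $first . $last;
}

print split_last_column($_), "\n"
    for '"A","B","C","D,E,F"', '"D,E,F"';
```

Lines that do not fit the expected shape are returned unchanged rather than mangled, which is a design choice of this sketch and not something the code above does.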

Re: regex pattern match
by vagabonding electron (Curate) on Mar 29, 2013 at 16:27 UTC

    Since Text::ParseWords is a core module, it should not be a problem to use it.

    Here is my attempt:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::ParseWords;

    while ( my $line = <DATA> ) {
        chomp $line;
        my @fields = quotewords( ',', 0, $line );
        my @lastfield;
        if ( index( $fields[-1], ',') > -1 ) {
            @lastfield = quotewords(',',0,$fields[-1]);
            $fields[-1] = shift @lastfield;
        }
        @fields = map { '"'.$_.'"' } @fields;    # if quotes are obligatory
        print join( ',', @fields ), "\n";
        while ( my $rest = shift @lastfield ) {
            $rest = '"'.$rest.'"';
            my @temp = ('   ') x scalar @fields - 1;    # three spaces,
                                                        # two of these because of quotes
            push @temp, $rest;
            print join(' ', @temp ), "\n";
        }
    }
    __DATA__
    "A","B","C","D"
    "A","B","C","D,E,F"
    "A","B","C","D"
    "A","B","C","D,R,T"

    The output:

    "A","B","C","D"
    "A","B","C","D"
                "E"
                "F"
    "A","B","C","D"
    "A","B","C","D"
                "R"
                "T"

Re: regex pattern match
by kcott (Archbishop) on Mar 30, 2013 at 06:34 UTC

    G'day reaper9187,

    Here's my take on a solution:

    $ cat fred.dat
    "A","B","C","D"
    "A","B","C","D,E,F"
    "A","B","C","D"
    "A","B","C","D,R,T"
    $ perl -Mstrict -Mwarnings -E '
        while (<>) {
            my ($start, $end) = /^(.+?)("[^"]+")$/;
            my @finals = split /,/ => substr $end, 1, -1;
            say $start, q{"}, shift(@finals), q{"};
            say q{ } x length($start), q{"}, shift(@finals), q{"} while @finals;
        }
    ' fred.dat
    "A","B","C","D"
    "A","B","C","D"
                "E"
                "F"
    "A","B","C","D"
    "A","B","C","D"
                "R"
                "T"

    -- Ken

Re: regex pattern match
by reaper9187 (Scribe) on Apr 01, 2013 at 17:42 UTC
    Oh, I'm sorry.
    The problem was with the regex replacing the directory path within the double quotes. $current_path and $new_path are the two variables I need for the substitution; with those I perform a simple regex substitution. However, the double quotes and backslashes made the regex matching extra hard. The answer may not look great, but it works!
    $line =~s/\Q$current_path/$new_path\E/ig;
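For anyone finding this later, a small self-contained sketch of that substitution applied to the sample line from this thread. The helper name replace_path is hypothetical. Note that \E is conventionally placed at the end of the quoted part of the pattern; in the replacement it has nothing to close, so moving it should not change the result:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution.
sub replace_path {
    my ($line, $current_path, $new_path) = @_;
    # \Q...\E quotes regex metacharacters, so the backslashes, colon and
    # dots in the path are matched literally. The replacement side is
    # plain interpolation and needs no quoting.
    $line =~ s/\Q$current_path\E/$new_path/ig;
    return $line;
}

print replace_path('"ABC","DEF","B","C:\Users\Dave\Documents";',
                   'C:\Users\Dave', 'E:\Dave'), "\n";
```

Without \Q, the backslashes in C:\Users\Dave would be read as regex escapes (\U, \D) and the pattern would not match the literal path.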
Re: regex pattern match
by reaper9187 (Scribe) on Mar 29, 2013 at 09:41 UTC
    Hi, thank you for the quick reply. Is it somehow possible to do it without using any module? This is how far I've got with the code:
    $in_file = "test.txt";
    $out_file = "testout.txt";
    open (IN, "<$in_file") or die "Can't open $in_file: $!\n";
    open (OUT, ">$out_file") or die "Can't open $out_file: $!\n";
    while ( $line = <IN> ) {
        @fields = split /,/, $line;
        $line = join ",", $fields[0], @fields[1..5];
        print OUT $line;
        print OUT "\n";
    }
    close IN;
    close OUT;
    #read in output file and print to screen to confirm
    open (TEST, "<$out_file") or die "Can't open $out_file: $!\n";
    while ( <TEST> ) {
        print;
    }
    close TEST;
    There's only one problem bugging me: how do I exclude the double-quoted string from the pattern match in the split statement? Once I'm able to do that, the rest should be easy; otherwise it just increases the amount of processing I need to do.

      Hi

      what do you think about the following:

      # strip LF from line
      chomp $line;
      # delete the very first and last " from line
      for ($line) {
          s/^\s*"//;
          s/"\s*$//;
      }
      # now split
      @fields = split /"\s*,\s*"/, $line;
      # now you should have the pieces I mentioned above

      McA
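The same steps wrapped into a function, as a rough sketch. The name split_quoted is hypothetical, and it assumes every field is double-quoted and that no field contains the sequence "," or an escaped quote:

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of fully quoted fields without a module.
sub split_quoted {
    my ($line) = @_;
    chomp $line;
    for ($line) {
        s/^\s*"//;    # drop the opening quote of the first field
        s/"\s*$//;    # drop the closing quote of the last field
    }
    # Split on the quote-comma-quote separators between fields, so commas
    # inside a field are left alone.
    return split /"\s*,\s*"/, $line;
}

my @fields = split_quoted(qq{"A","B","C","D,E,F"\n});
print scalar(@fields), " fields, last one: $fields[-1]\n";
```

Because the separator includes the surrounding quotes, the commas inside "D,E,F" survive and can be split in a second step.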

Re: regex pattern match
by reaper9187 (Scribe) on Apr 01, 2013 at 14:16 UTC
    OK, so here's the breakdown.
    I have a text file filled with entries like:
    "ABC","DEF","B","C:\Users\Dave\Documents";
    The user inputs the following arguments:
    1. The part of the text to be replaced
    2. The new text (to replace the substituted text)

    So, for example, the user input for the text to be replaced (option 1) is: C:\Users\Dave
    and the input for option 2 is: E:\Dave

    The final output should be:
    "ABC","DEF","B","E:\Dave\Documents";
    Please help!
Re: regex pattern match
by reaper9187 (Scribe) on Apr 01, 2013 at 10:08 UTC
    A quick addition to the problem: I have a separate piece of text that I need to replace. For example, the input (read from a file) is of the form:
    "ABC","DEF","This is a test","Hello";
    and the output needs to look something like:
    "ABC","DEF","This is a substitution","Hello";
    In short, I need to replace a pattern within quotes in a line. I need help with the pattern match. Please help!
    Update:
    Could someone please help me debug this code?
    #!/usr/bin/perl -w
    my $variable;
    my @fields;
    my $line =0 ;
    use vars qw($current_path $filename);
    print "Enter the desired path: ";
    my $path = <STDIN>;
    chomp( $path );
    my $in_file = "test.txt";
    my $out_file = "testout.txt";
    open (DATA, "<$in_file") or die "Can't open $in_file: $!\n";
    open (OUT, ">$out_file") or die "Can't open $out_file: $!\n";
    while ( <DATA> ) {
        chomp;
        my ($first, $last ) = /(.+",)(.+)/;
        $last =~ s//"\n$first"/g;
        my @parameters = split /"\s*,\s*"/, $first;
        $parameters[1] =~ tr/"//d;
        my $path = "$path\\";
        my $new_path = $path.$parameters[1];
        @array = split(/,/,$last);
        foreach $entry(@array) {
            $entry =~ tr/"//d;
            $entry =~ s|(.+)\\|$1::|;
            ($current_path,$filename) = split/::/,$entry;
            $last =~s/$current_path/$new_path/ig; #rename the last entry which contains a path to folder name with the new path
            print "$last\n";
        }
        print "\n";
    }
    close DATA;
    close OUT;
      my ($first, $last ) = /(.+",)(.+)/;
      $last =~ s//"\n$first"/g;

      I second hdb's reply in general.

      Let me add that the
          $last =~ s//"\n$first"/g;
      statement from the posted code quoted above is very unlikely to do what you want: it will match the // null regex pattern globally and do a substitution against what is matched. If there has been a previous successful regex match (e.g., the match in the preceding line), the // regex matches using the previously matched regex. (If there has been no previous successful regex match, the // regex acts as a genuine empty pattern, which always matches!) As it stands, this substitution seems to be a no-op. If it is part of some carefully thought out strategy, I advise you to abandon it immediately: it has 'maintenance nightmare' written all over it! (In the code example below, \x22 stands in for an unbalanced " (double-quote) character.)

      >perl -wMstrict -le
      "$_ = qq{\"ABC\",\"DEF\",\"This is a test\",\"Hello\"};
       ;;
       my ($first, $last ) = /(.+\x22,)(.+)/;
       print qq{'$last'};
       ;;
       $last =~ s//\"\n$first\"/g;
       print qq{'$last'};
      "
      '"Hello"'
      '"Hello"'

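A smaller sketch of the pattern-reuse behaviour described above, in case the one-liner is hard to follow. The function name empty_pattern_demo and the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical demo: an empty pattern silently reuses the last pattern
# that matched successfully in the current scope.
sub empty_pattern_demo {
    my ($target) = @_;
    'xyz' =~ /y/ or die "setup match failed";    # last successful match is /y/
    $target =~ s//-/g;                           # behaves like s/y/-/g here
    return $target;
}

print empty_pattern_demo('yay'), "\n";
```

Here the empty pattern replaces every y rather than inserting a dash at every position, which is why relying on s//.../ is so fragile: its meaning depends on whatever matched last.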
      Your question is not easy to understand. Would it be possible to provide:

      • A sample input file, just a couple of lines,
      • the desired output file, and
      • some more ideas what you want to match and replace?
      Also, it seems, as your example grows more complex, you should be using Text::CSV, which allows you to first split your line properly along the commata, and then to split any entry along commata.

Re: regex pattern match
by reaper9187 (Scribe) on Apr 01, 2013 at 14:37 UTC
    OK, so I figured out the solution as soon as I posted this. Thanks, everyone, for the help. :)
      So, please, contribute to learning by others by posting your solution.

      If you didn't program your executable by toggling in binary, it wasn't really programming!