rocroc has asked for the wisdom of the Perl Monks concerning the following question:
Here is a text file containing data I need to munge:
"ADELMAN","John","adad","Ray"
"AGAN","John","agag","Aditya"
"AHMED","John","ahah","Conor"
Here is a perl script to do some munging:
my $username;
my $color;
while(<>){
chomp;
s/"//g;
($username,$color) = (split /,/,$_)[2,3];
if ("agag" =~ m/($username)/){print STDOUT "here is the username: $username\n"}
}
Here is the output when I invoke perl test.pl test.txt at the command line:
here is the username:
So note what's happening here: evidently, the interpolation of the variable $username works just fine in the match "agag" =~ m/($username)/. But then $username gets treated as empty (uninitialized?) inside the call to print.
I'm totally stumped.
Please help.
Re: scoping problem?
by Corion (Patriarch) on Dec 06, 2011 at 17:27 UTC
|
Are you sure you're modifying / running the same Perl script? Because for me, the following works:
use strict;
my $username;
my $color;
while(<DATA>){
chomp; s/"//g;
($username,$color) = (split /,/,$_)[2,3];
if ("agag" =~ m/($username)/){
print STDOUT "here is the username: $username\n"
}
}
__DATA__
"ADELMAN","John","adad","Ray"
"AGAN","John","agag","Aditya"
"AHMED","John","ahah","Conor"
... and outputs:
here is the username: agag
| [reply] [d/l] [select] |
|
And are you sure you're getting the contents of the text file to <>?
Bad path to text file? Wrong name of text file? Failure to test open ($FH, "< $text_file") or die "Can't open $text_file, $!"; or something similar?
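A minimal sketch of that kind of check, in case it helps. The helper name open_or_die is made up for illustration; it uses the three-argument form of open and reports the OS error if the file cannot be read.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading, or die with the OS error.
sub open_or_die {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    return $fh;
}

# Usage sketch: take the file name from the command line, as test.pl does.
if (@ARGV) {
    my $fh = open_or_die($ARGV[0]);
    while (my $line = <$fh>) {
        print $line;    # echo each line so you can see what is really read
    }
    close $fh;
}
```

With an explicit open, a wrong path fails loudly instead of the loop silently reading nothing.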
| [reply] [d/l] [select] |
Re: scoping problem?
by rocroc (Initiate) on Dec 06, 2011 at 20:26 UTC
|
OK, so first of all, thanks AnomalousMonk. The text file did have a blank line at the end. This means that my initial understanding of the problem was wrong: I had thought that $username was being interpolated like I expected in the matching expression but not in the print statement.
Now, indeed, when that blank line is read, the match is successful, and so the print gets called, but the variable $username is empty. This means that $username is failing to be interpolated in BOTH the match AND the call to print.
But that is not the whole story. Here's a slightly modified script: (the #NEW comment marks the one line I've added to the original):
use strict;
my $username;
my $color;
while(<>){
chomp;
s/"//g;
($username,$color) = (split /,/,$_)[2,3];
print STDOUT "$username\n"; #NEW
if ("agag" =~ m/($username)/){print STDOUT "here is the username: $username\n"}
}
When I run this with the same invocation as before (perl test.pl test.txt), here's the output I get (and by the way, the text file still ends with a blank line):
adad
agag
ahah
here is the username:
Weird, huh? The $username comes out as expected when used in the "naked" print statement (the line marked #NEW), but it fails to be interpolated in BOTH the match AND the print statement in the following line.
OK, now a second point here. (It gets weirder!) Corion tests the script by "hard coding" the data from the text file into the script like so:
use strict;
my $username;
my $color;
while(<DATA>){
chomp; s/"//g;
($username,$color) = (split /,/,$_)[2,3];
if ("agag" =~ m/($username)/){
print STDOUT "here is the username: $username\n"
}
}
__DATA__
"ADELMAN","John","adad","Ray"
"AGAN","John","agag","Aditya"
"AHMED","John","ahah","Conor"
Now, when I run this on my machine, I get the output I'm looking for -- i.e.
here is the username: agag
So, I'm still stumped. Is this a scoping problem (like $username goes out of scope when we get into the "if" statement)? Is there something I don't understand about the diamond operator (that's what Corion's finding suggests to me)? Could this be some sort of file permission issue? (I'm running Linux, and working in the shell as a regular user. And no, I'm not willing to run this script as root.)
| [reply] [d/l] [select] |
|
How do you know the result of split gives you (at least) four items? Add a diagnostic print to show you $_ and you'll probably see what's happening.
It's not a scoping problem or a readline problem or a permission problem.
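A rough sketch of such a diagnostic, assuming the same quote-stripping and comma split as the original script. The helper name fields_of and the two sample lines are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: strip quotes, split on commas, return the fields.
sub fields_of {
    my ($line) = @_;
    chomp $line;
    $line =~ s/"//g;
    return split /,/, $line;
}

# One complete record and one blank line, to compare the field counts.
for my $line (qq{"AGAN","John","agag","Aditya"\n}, "\n") {
    my @fields = fields_of($line);
    printf "%d field(s) in [%s]\n", scalar @fields, join('|', @fields);
}
```

A line that yields fewer than four fields leaves the [2,3] slice undefined, which is the case worth looking for.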
| [reply] [d/l] |
|
Thanks chromatic. Here's the script with additional tests to see what's in $_ at each point.
use strict;
use warnings;
my $username;
my $color;
while(<>){
chomp;
s/"//g;
($username,$color) = (split /,/,$_)[2,3];
print STDOUT "test of username: $username\n";#NEW
print STDOUT "test of dollar-underscore: $_\n";#NEW ALSO
if ("agag" =~ m/($username)/){
print STDOUT "here is the username: $username\n";
print STDOUT "here is dollar-underscore: $_\n";
}
}
Here's the output of 'perl test.pl test.txt' (the script includes "use warnings" this time, so the warnings are included in the output):
test of username: adad
test of dollar-underscore: ADELMAN,John,adad,Ray
test of username: agag
test of dollar-underscore: AGAN,John,agag,Aditya
test of username: ahah
test of dollar-underscore: AHMED,John,ahah,Conor
Use of uninitialized value $username in concatenation (.) or string at test.pl line 11, <> line 4.
test of username:
test of dollar-underscore:
Use of uninitialized value $username in regexp compilation at test.pl line 13, <> line 4.
Use of uninitialized value $username in concatenation (.) or string at test.pl line 14, <> line 4.
here is the username:
here is dollar-underscore:
Note that $_ and $username are both coming out as they should (i.e. the split is working) before we get to the if statement. It just looks as though the match "agag" =~ m/($username)/ is failing. (Note that the warnings are only being issued at the fourth "line" of the text file, that is, at the empty last line.)
Still stumped. Perhaps I'm missing something really basic and obvious about the match operator?
| [reply] [d/l] [select] |
|
use strict;
use warnings;
open my $tempOut, '>', 'delme.txt' or die "Can't create temp file: $!\n";
print $tempOut <<FILE;
"ADELMAN","John","adad","Ray"
"AGAN","John","agag","Aditya"
"AHMED","John","ahah","Conor"
FILE
close $tempOut;
@ARGV = 'delme.txt';
while(<>){
chomp;
s/"//g;
my ($username,$color) = (split /,/,$_)[2,3];
next if ! defined $color;
print "here is the username: $username\n" if "agag" =~ m/($username)/;
}
Prints:
here is the username: agag
However you seem to be parsing a CSV file so really you should be using one of the modules designed for that task such as Text::CSV.
Oh, and you really should use warnings in addition to strict!
True laziness is hard work
| [reply] [d/l] [select] |
Re: scoping problem?
by bliz (Acolyte) on Dec 06, 2011 at 18:57 UTC
|
Not that it matters much for this example, but any reason for the matching versus just a comparison?
if ("agag" =~ m/($username)/){
vs
if ("agag" eq $username){
| [reply] [d/l] [select] |
|
because in the "real" script, I'm looking for filenames in a directory that contain the username, but have other stuff in them as well.
| [reply] |
Re: scoping problem?
by keszler (Priest) on Dec 06, 2011 at 17:28 UTC
|
| [reply] |
|
>perl -wMstrict -le
"my $s = 'foo,,';
;;
my ($field0, $field1, $field2, $field3) = split /,/, $s;
print qq{field0 '$field0' field1 '$field1' field3 '$field3'};
;;
if ('bar' =~ m{ $field1 }xms) { print qq{bar matches '$field1'} }
if ('bar' =~ m{ $field3 }xms) { print qq{bar matches '$field3'} }
"
Use of uninitialized value $field3 in concatenation (.) or string ...
field0 'foo' field1 '' field3 ''
bar matches ''
Use of uninitialized value $field3 in regexp compilation ...
Use of uninitialized value $field3 in concatenation (.) or string ...
bar matches ''
Note that without warnings (you are using warnings, aren't you?), this all proceeds quite silently:
>perl -Mstrict -le
"my $s = 'foo,,';
;;
my ($field0, $field1, $field2, $field3) = split /,/, $s;
print qq{field0 '$field0' field1 '$field1' field3 '$field3'};
;;
if ('bar' =~ m{ $field1 }xms) { print qq{bar matches '$field1'} }
if ('bar' =~ m{ $field3 }xms) { print qq{bar matches '$field3'} }
"
field0 'foo' field1 '' field3 ''
bar matches ''
bar matches ''
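The same effect in script form, as a small sketch: an undefined variable interpolates as an empty string, so m/($username)/ becomes m/()/, which matches any string. The guard below is one possible way to avoid that; the helper name contains_username is invented here, and \Q...\E is added on the assumption that the username should be matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if $needle is a non-empty string
# that occurs literally somewhere in $haystack.
sub contains_username {
    my ($haystack, $needle) = @_;
    return 0 unless defined $needle && length $needle;
    return $haystack =~ m/\Q$needle\E/ ? 1 : 0;
}

{
    no warnings 'uninitialized';    # silence the warning for this demo only
    my $username;                   # undefined, as on a blank input line
    print "unguarded match succeeds\n" if 'agag' =~ m/($username)/;
}

print "guarded, undef:  ", contains_username('agag', undef), "\n";
print "guarded, 'agag': ", contains_username('report_agag.txt', 'agag'), "\n";
```

The guard keeps a blank or short line from producing a false match, and the literal match still finds a username embedded in a longer file name.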
| [reply] [d/l] [select] |
Re: scoping problem?
by rocroc (Initiate) on Dec 07, 2011 at 17:35 UTC
|
SOLVED!!!!!
The problem was indeed the encoding of the file. Opening the file via a filehandle with the :encoding(UTF-16le) layer did the trick. Here is the working code:
use strict;
use warnings;
use Encode;
my $username;
my $color;
my $filename = shift @ARGV;
my $fh;
open($fh, '<:encoding(UTF-16le):crlf', $filename) or die "Can't open $filename: $!";
binmode STDOUT, ':encoding(UTF-8)';
while(<$fh>){
chomp;
s/"//g;
($username,$color) = (split /,/,$_)[2,3];
if ('agag' =~ m/($username)/){
print STDOUT "here is the username: $username\n";
}
}
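For anyone wanting to try this without the original file, here is a self-contained sketch along the same lines. It is not the code above: the helper name usernames_from, the temporary sample file, and the leading :raw in the layer list (intended to keep it portable to systems with a default CRLF layer) are all assumptions added for illustration. It also skips blank or short lines before using the field.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the third field of every complete record
# in a UTF-16LE, comma-separated, double-quoted text file.
sub usernames_from {
    my ($filename) = @_;
    open(my $fh, '<:raw:encoding(UTF-16LE):crlf', $filename)
        or die "Can't open $filename: $!\n";
    my @usernames;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/"//g;
        my $username = (split /,/, $line)[2];
        next unless defined $username && length $username;  # blank or short line
        push @usernames, $username;
    }
    close $fh;
    return @usernames;
}

# Build a small UTF-16LE sample (one record plus a blank line) to try it on.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw:encoding(UTF-16LE)';
print {$out} qq{"AGAN","John","agag","Aditya"\r\n\r\n};
close $out;

print "$_\n" for usernames_from($path);
```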
Thanks so much to all of you for your help.
Interesting side note: I first learned Perl back in 2001, coded a whole lot in a job I had through 2003, and got pretty proficient at it. Then, I didn't code at all (except for numerical stuff in C) until just this week, when all of a sudden I had to deal with some text files. I plowed ahead as if nothing had changed, coding in exactly the same style as I had 10 years ago. Reading all of the stuff on Unicode (by the way, those links are great, Anonymous Monk), I see that that was very much the wrong choice! This is great, because it's given me a chance to learn about Unicode.
thanks again!
| [reply] [d/l] |
Re: scoping problem?
by rocroc (Initiate) on Dec 07, 2011 at 14:55 UTC
|
Update on the issue of possible non-printable characters:
The 'file' command tells me that the text file I'm reading in is encoded as follows:
Little-endian UTF-16 Unicode
Could this be causing my calls to the match operator to work differently than I think they are?
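A small sketch of why that encoding would matter, using the core Encode module to simulate what a read without a decoding layer hands back. In UTF-16LE every ASCII character is followed by a NUL byte, so the "username" is really eight bytes that usually print invisibly on a terminal. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Simulate the raw bytes of the field 'agag' as stored in a UTF-16LE file.
my $raw_field = encode('UTF-16LE', 'agag');

printf "raw length: %d\n", length $raw_field;    # 8 bytes, not 4 characters
print "raw field matches\n"     if 'agag' =~ m/\Q$raw_field\E/;

# Decoding turns the bytes back into the four characters we expected.
my $decoded = decode('UTF-16LE', $raw_field);
print "decoded field matches\n" if 'agag' =~ m/\Q$decoded\E/;
```

The same byte layout may also account for an apparent extra last line: the newline is stored as 0x0A 0x00, so reading by "\n" leaves a lone NUL byte after the final newline as a line of its own.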
|
| [reply] [d/l] |
Re: scoping problem?
by rocroc (Initiate) on Dec 07, 2011 at 17:38 UTC
|
sorry, one more thing:
seems like I ought to go back and change the title of this node. Obviously, "scoping" was not the problem.
any suggestions?
| [reply] |