Perl hash keys not considered unique

by elfstones65 (Novice)
on Apr 23, 2018 at 12:49 UTC (#1213425, perlquestion)

elfstones65 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I seem to have a strange issue where keys added to a hash are not being treated as unique.
Say I have two file names as the keys:

    $file1 = 'ENV.FILESOURCE.SOURCE.J2018058.N000001';
    $file2 = 'ENV.FILESOURCE.SOURCE.J2018059.N000001';

    open LOG, "<$log";
    while (defined($line=<LOG>)) {
        chomp $line;
        if ($line =~/File \[\/opt\/app\/data\/pulsefba\/process/) {
            ($processingFile) = ($line =~ /process\/(.+)]/);
            $files{"$processingFile"} = 1;
        }
        if ($line =~ /Finished/) {
            ($currentFile) = ($line =~ /process\/(.+?)]/);
            ($read) = ($line =~ /: Read \[(.+)\] events - Processed /);
            ($processed) = ($line =~ /Processed \[(.+?)\]/);
            ($wrote) = ($line =~ /Wrote \[(.+?)\]/);
            ($skippedRead) = ($line =~ /Skipped Read \[(.+?)\]/);
            ($skippedProcess) = ($line =~ /Skipped Process \[(.+?)\]/);
            ($skippedWrite) = ($line =~ /Skipped Write \[(.+?)\]/);
            ($totalCount) = ($line =~ /Total Trailer Count - \[(.+?)\]/);
            $files{"$currentFile"}{'read'} = $read;
            $files{"$currentFile"}{'processed'} = $processed;
            $files{"$currentFile"}{'wrote'} = $wrote;
            $files{"$currentFile"}{'skippedRead'} = $skippedRead;
            $files{"$currentFile"}{'skippedProcess'} = $skippedProcess;
            $files{"$currentFile"}{'skippedWrite'} = $skippedWrite;
            $files{"$currentFile"}{'totalTrailerCount'} = $totalCount;
            print "Debug: $currentFile set to Read: $read\n";
            print "File ENV.FILESOURCE.SOURCE.J2018058.N000001 count: $files{'ENV.FILESOURCE.SOURCE.J2018058.N000001'}{'read'}\n";
            print "File ENV.FILESOURCE.SOURCE.J2018059.N000001 count: $files{'ENV.FILESOURCE.SOURCE.J2018059.N000001'}{'read'}\n";
        }
    }

While looping through the log, the output looks like the lines below. Why is the file with J2018059 getting the values stored in the hash for file J2018058?
Debug: ENV.FILESOURCE.SOURCE.J2018058.N000001 set to Read: 1000
File ENV.FILESOURCE.SOURCE.J2018058.N000001 count: 1000
File ENV.FILESOURCE.SOURCE.J2018059.N000001 count: 1000
Debug: ENV.FILESOURCE.SOURCE.J2018058.N000001 set to Read: 2000
File ENV.FILESOURCE.SOURCE.J2018058.N000001 count: 2000
File ENV.FILESOURCE.SOURCE.J2018059.N000001 count: 2000
Debug: ENV.FILESOURCE.SOURCE.J2018058.N000001 set to Read: 3000
File ENV.FILESOURCE.SOURCE.J2018058.N000001 count: 3000
File ENV.FILESOURCE.SOURCE.J2018059.N000001 count: 3000
Debug: ENV.FILESOURCE.SOURCE.J2018058.N000001 set to Read: 4000
File ENV.FILESOURCE.SOURCE.J2018058.N000001 count: 4000
File ENV.FILESOURCE.SOURCE.J2018059.N000001 count: 4000
Debug: ENV.FILESOURCE.SOURCE.J2018058.N000001 set to Read: 5000
File ENV.FILESOURCE.SOURCE.J2018058.N000001 count: 5000
File ENV.FILESOURCE.SOURCE.J2018059.N000001 count: 5000

Re: Perl hash keys not considered unique (updated)
by haukex (Bishop) on Apr 23, 2018 at 13:19 UTC

    Welcome to the Monastery, elfstones65. In the future, please provide a Short, Self-Contained, Correct Example so that we can download and run the code and more easily see the issue that you are having.

    Anyway, I assume that when reading your input, you first encounter a line that matches your first regex /File .../, and sets $files{"$processingFile"} = 1;, so e.g. $files{"ENV.FILESOURCE.SOURCE.J2018058.N000001"} = 1;. Then, when your code encounters a line matching /Finished/ for the same file, you attempt to use $files{"ENV.FILESOURCE.SOURCE.J2018058.N000001"} as a hash reference by saying things like $files{"$currentFile"}{'read'}. However, $files{"ENV.FILESOURCE.SOURCE.J2018058.N000001"} contains the value 1 instead of a hash reference.

    Now Perl does something interesting: because you're not using strict 'refs', it will actually access a hash named %1 - and it will do this for all files that have the same value in the %files hash! So every file's /Finished/ data is ending up in the same hash, overwriting each other. You can see this yourself if you say use Data::Dumper; print Dumper(\%1); - you will see the data from the most recent Finished line collected in that hash. See Symbolic references.
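The behaviour described above can be reproduced in a few lines. This is only a minimal sketch: it leaves `strict` off on purpose and writes to the two file names from the question directly instead of parsing a log. The variable names `$read1`, `$read2` and `@shared_keys` are made up for the illustration.

```perl
# Deliberately no "use strict" here: the point is to reproduce the surprise.
use warnings;

my %files;
my $file1 = 'ENV.FILESOURCE.SOURCE.J2018058.N000001';
my $file2 = 'ENV.FILESOURCE.SOURCE.J2018059.N000001';

# As in the question: a plain scalar 1 is stored first, not a hash reference.
$files{$file1} = 1;
$files{$file2} = 1;

# Each of these dereferences the string "1" symbolically, so both writes
# land in the single package hash whose name is "1".
$files{$file1}{read} = 1000;
$files{$file2}{read} = 2000;

my $read1 = $files{$file1}{read};
my $read2 = $files{$file2}{read};

# Look at the shared hash by name (again a symbolic reference).
my @shared_keys = sort keys %{"1"};

print "$file1 read: $read1\n";                     # 2000, not 1000
print "$file2 read: $read2\n";                     # 2000
print "keys in the shared hash: @shared_keys\n";   # read
```

Both entries report the value written last, which matches the output in the question.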

    The best solution here is to Use strict and warnings! This will force you to avoid symbolic references, which is a good practice because it avoids the confusing behavior you're seeing. See also Why it's stupid to `use a variable as a variable name'.
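To see what `strict` changes, here is a small sketch of the same mistake with `use strict` enabled. The `eval` wrapper and the `$ok` and `$error` variables are only there so the script can report the error instead of dying; they are not part of the original code.

```perl
use strict;
use warnings;

my %files;
my $file = 'ENV.FILESOURCE.SOURCE.J2018058.N000001';
$files{$file} = 1;    # plain scalar, as in the question

# eval traps the fatal error so the script can report it and carry on.
my $ok = eval {
    $files{$file}{read} = 1000;    # "1" is not a hash reference
    1;
};
my $error = $ok ? '' : $@;

print "Assignment refused: $error" unless $ok;
```

Under `strict 'refs'` the bad dereference is a run-time error that names the offending string, so the mistake shows up on the first Finished line rather than as silently shared data.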

    Update: Several edits to improve the explanation.

      You hit the nail on the head. Trying to use the hash with, shall I say, multiple levels was the cause. I had:

       $files{"$processingFile"} = 1

      Simply changing it to the following fixed it:

       $files{"$processingFile"}{"file"} = $processingFile;

      Such a minor mistake, I was going nuts!

      Much appreciated.
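For reference, a sketch of how the corrected loop might look with `strict` and `warnings` on. The log lines in `@log_lines` are invented to fit the regexes from the question (the real log format is not shown in the thread), and only two of the counters are extracted to keep it short.

```perl
use strict;
use warnings;

my $dir   = '/opt/app/data/pulsefba/process';
my $file1 = 'ENV.FILESOURCE.SOURCE.J2018058.N000001';
my $file2 = 'ENV.FILESOURCE.SOURCE.J2018059.N000001';

# Hypothetical log lines, shaped only to satisfy the patterns below.
my @log_lines = (
    "File [$dir/$file1]",
    "Finished [$dir/$file1]: Read [1000] events - Processed [990]",
    "File [$dir/$file2]",
    "Finished [$dir/$file2]: Read [2000] events - Processed [1980]",
);

my %files;
for my $line (@log_lines) {
    # Store a nested entry from the start, never a bare scalar.
    if ($line =~ m{File \[/opt/app/data/pulsefba/process/(.+)\]}) {
        $files{$1}{file} = $1;
    }
    if ($line =~ /Finished/) {
        my ($current)   = $line =~ m{process/(.+?)\]};
        my ($read)      = $line =~ /: Read \[(.+?)\]/;
        my ($processed) = $line =~ /Processed \[(.+?)\]/;
        next unless defined $current;    # skip lines that do not parse
        $files{$current}{read}      = $read;
        $files{$current}{processed} = $processed;
    }
}

for my $name (sort keys %files) {
    print "$name: read=$files{$name}{read} processed=$files{$name}{processed}\n";
}
```

Because every value in `%files` is a hash reference from the first assignment on, each file keeps its own counters.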
Re: Perl hash keys not considered unique
by huck (Parson) on Apr 23, 2018 at 13:18 UTC

    You are assuming that $files{'ENV.FILESOURCE.SOURCE.J2018059.N000001'} and $files{'ENV.FILESOURCE.SOURCE.J2018058.N000001'} point to unique hashes. But what if, somewhere above this section, there was a line that said:

    $files{'ENV.FILESOURCE.SOURCE.J2018059.N000001'}=$files{'ENV.FILESOURCE.SOURCE.J2018058.N000001'};

    While there would be an entry in %files for both names, they would both point to the same hash, so modifying either $files{'ENV.FILESOURCE.SOURCE.J2018059.N000001'}{data} or $files{'ENV.FILESOURCE.SOURCE.J2018058.N000001'}{data} would modify the same underlying data.
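The aliasing scenario described here is easy to demonstrate. A minimal sketch follows; the `%fixed` hash and the shallow-copy idiom are additions for contrast, not something from the thread.

```perl
use strict;
use warnings;

my $file1 = 'ENV.FILESOURCE.SOURCE.J2018058.N000001';
my $file2 = 'ENV.FILESOURCE.SOURCE.J2018059.N000001';

# Aliased: the assignment copies the reference, not the hash it points to.
my %files;
$files{$file1} = { data => 0 };
$files{$file2} = $files{$file1};
$files{$file1}{data} = 1000;
print "aliased: $files{$file2}{data}\n";    # 1000, changed through the other key

# Independent: build a new anonymous hash from a shallow copy of the contents.
my %fixed;
$fixed{$file1} = { data => 0 };
$fixed{$file2} = { %{ $fixed{$file1} } };
$fixed{$file1}{data} = 1000;
print "copied:  $fixed{$file2}{data}\n";    # 0, unaffected
```

Note that the shallow copy only separates the top level; nested references inside the copied hash would still be shared.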

Re: Perl hash keys not considered unique
by Eily (Monsignor) on Apr 24, 2018 at 14:07 UTC

    FYI, to avoid having to escape every / in your pattern, you can use the m operator that lets you use another special character as a delimiter for your regex (or a pair of characters if they are symmetric). $line =~ m<File \[/opt/app/data/pulsefba/process>
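A quick sketch comparing the escaped form with a few alternative delimiters. The sample `$line` is made up to fit the pattern.

```perl
use strict;
use warnings;

# Hypothetical log line for illustration.
my $line = 'File [/opt/app/data/pulsefba/process/ENV.FILESOURCE.SOURCE.J2018058.N000001]';

# Default delimiter: every / in the path has to be escaped.
my $with_slashes = $line =~ /File \[\/opt\/app\/data\/pulsefba\/process/;

# Paired delimiters: no escaping of / needed.
my $with_angles = $line =~ m<File \[/opt/app/data/pulsefba/process>;
my $with_braces = $line =~ m{File \[/opt/app/data/pulsefba/process};

# A single repeated character works as well.
my $with_bangs = $line =~ m!File \[/opt/app/data/pulsefba/process!;

print "all four forms matched\n"
    if $with_slashes && $with_angles && $with_braces && $with_bangs;
```

The `[` still needs its backslash in every form, since it is a regex metacharacter regardless of the delimiter.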

    Or, since you are searching for a fixed string, index could work as well.
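The `index` approach might look like this. It is a sketch with invented sample lines; `index` returns the position of the substring, or -1 when it is absent, so the test is `>= 0`.

```perl
use strict;
use warnings;

# Fixed string to look for: no regex escaping needed at all.
my $needle = 'File [/opt/app/data/pulsefba/process';

# Hypothetical log lines for illustration.
my @lines = (
    '2018-04-23 12:00:00 File [/opt/app/data/pulsefba/process/ENV.FILESOURCE.SOURCE.J2018058.N000001]',
    '2018-04-23 12:00:05 Some unrelated message',
);

# Keep only the lines that contain the fixed string.
my @hits = grep { index($_, $needle) >= 0 } @lines;

print scalar(@hits), " matching line(s)\n";
```

A regex is still needed afterwards to capture the file name, so this mainly helps for the initial "is this line relevant" check.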

Node Type: perlquestion [id://1213425]
Approved by marto