Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Searching between two hashes

by TStanley (Canon)
on Nov 10, 2017 at 20:11 UTC ( [id://1203135]=perlquestion: print w/replies, xml ) Need Help??

TStanley has asked for the wisdom of the Perl Monks concerning the following question:

Hey there fellow monks! I am trying to help out a client of ours by fixing up an import file of their medical equipment assets that have my company's asset tracking tag on them. Most of the assets in their system were added in without the tags, and they were tracking the MAC addresses of the tags on a separate spreadsheet. I exported the data out of their system (.csv file), and started to match the assets up using the asset identification number for each. One thing I discovered is that some of the assets they had listed on their spreadsheet were not even in the system, so I would have to add them in as well. My concern is the logic where I am comparing the files, setting aside any non-matches, writing to the final output file, and then adding in the items that didn't exist. My code is shown below.


#!C:\Perl64\bin\perl use strict; use warnings; my @input_array; my @output_array; my @extras; my ($INPUT,$data,$OUTPUT); open $INPUT,"<","transfers.csv" || die "Can't open transfer.csv: $!\n" +; open $data,"<","VHC-DVTPump.csv" || die "Can't open VHC-DVTPump.csv: $ +!\n"; open $OUTPUT,">","Output.csv" || die "Can't open Output.csv: $!\n"; while(<$INPUT>){ chomp; my($MAC,$Serial,$ID)=split /,/; #print "MAC: $MAC\tSerial: $Serial\tID: $ID\n"; my %hash=('MAC'=>$MAC,'Serial'=>$Serial,'ID'=>$ID); push @input_array,\%hash; } while(<$data>){ chomp; my($AssetName,$AssetId,$Serial,$Type,$ActivityStatus,$BusinessStatus +,$Description,$PrimaryCategory, $Category1,$Category2,$Category3,$Department1,$Department2,$Depar +tment3,$Department4,$Department5, $Department6,$Group1,$Group2,$NetworkID1,$TagType1,$TagClassifica +tion1,$TagProtected1,$TagDuress1, $NetworkID2,$TagType2,$TagClassification2,$TagProtected2,$TagDure +ss2,$MapID,$GatewayGroup,$X,$Y,$Z, $NISTCertificationDueDate,$RentalReturnDate,$Escalation3,$NISTCer +tificate,$Escalation2,$NISTTag, $ContactName,$Escalation1,$PMCompleted,$RentalInUser,$lastcolumn, +$EOL)=split /,/; my %hash=('Asset Name'=>$AssetName,'Asset Application ID'=>$AssetId, +'Serial Number'=>$Serial,'Asset Type'=>$Type, 'Activity Status'=>$ActivityStatus,'Business Status'=>$Busin +essStatus,'Description'=>$Description, 'Primary Category'=>$PrimaryCategory,'Category 1'=>$Category +1,'Category 2'=>$Category2, 'Category 3'=>$Category3,'Department 1'=>$Department1,'Depar +tment 2'=>$Department2,'Department 3'=>$Department3, 'Department 4'=>$Department4,'Department 5'=>$Department5,'D +epartment 6'=>$Department6, 'Group 1'=>$Group1,'Group 2'=>$Group2,'Network ID 1'=>$Netwo +rkID1,'Tag Type 1'=>$TagType1, 'Tag Classification 1'=>$TagClassification1,'Tag Protected 1 +'=>$TagProtected1,'Tag Duress 1'=>$TagDuress1, 'Network ID 2'=>$NetworkID2,'Tag Type 2'=>$TagType2,'Tag Cla +ssification 2'=>$TagClassification2, 'Tag Protected 2'=>$TagProtected2,'Tag Duress 2'=>$TagDuress +2,'Map ID'=>$MapID,'Gateway Group'=>$GatewayGroup, 'X'=>$X,'Y'=>$Y,'Z'=>$Z,'NIST Certification Due Date'=>$NIST +CertificationDueDate, 'RentalReturnDate'=>$RentalReturnDate,'Escalation3'=>$Escala +tion3,'NIST Certificate'=>$NISTCertificate, 'Escalation2'=>$Escalation2,'NIST Tag'=>$NISTTag,'ContactNam +e'=>$ContactName,'Escalation1'=>$Escalation1, 'PM Completed'=>$PMCompleted,'Rental In User'=>$RentalInUser +,'last column'=>$lastcolumn,'EOL'=>$EOL); push @output_array,\%hash; } close $INPUT; close $data; print $OUTPUT "Asset Name,Asset Application Id,Serial Number,Asset Typ +e,Activity Status,Business Status,Description,Primary Category,Catego +ry 1,Category 2,Category 3,Department 1,Department 2,Department 3,Dep +artment 4,Department 5,Department 6,Group 1,Group 2,Network ID 1,Tag +Type 1,Tag Classification 1,Tag Protected 1,Tag Duress 1,Network ID 2 +,Tag Type 2,Tag Classification 2,Tag Protected 2,Tag Duress 2,Map ID, +Gateway Group,X,Y,Z,NIST Certification Due Date,Rental Return Date,Es +calation3,NIST Certificate,Escalation2,NIST Tag,ContactName,Escalatio +n1,PM Completed,Rental In User,last column,EOL\n"; for my $ref(@input_array){ my $mac=$$ref{'MAC'}; my $ID = $$ref{'ID'}; for my $ref2(@output_array){ my $ID2 = $$ref2{'Asset Application Id'}; my $mac2 = $$ref2{'Network ID 1'}; if($ID = $ID2 and $mac2 = ''){ $$ref2{'Network ID 1'} = $mac; }else{ push @extras,\$ref; } } } for (@output_array){ my $ref = $_; my($AssetName,$AssetId,$Serial,$Type,$ActivityStatus,$BusinessStatus +,$Description, $PrimaryCategory,$Category1,$Category2,$Category3,$Department1,$D +epartment2, $Department3,$Department4,$Department5,$Department6,$Group1,$Grou +p2,$NetworkID1, $TagType1,$TagClassification1,$TagProtected1,$TagDuress1,$Network +ID2,$TagType2, $TagClassification2,$TagProtected2,$TagDuress2,$MapID,$GatewayGro +up,$X,$Y,$Z, $NISTCertificationDueDate,$RentalReturnDate,$Escalation3,$NISTCer +tificate,$Escalation2, $NISTTag,$ContactName,$Escalation1,$PMCompleted,$RentalInUser,$la +stcolumn,$EOL); ## Need to assign the variables their values print $OUTPUT "$AssetName,$AssetId,$Serial,$Type,$ActivityStatus,$Bu +sinessStatus,$Description, $PrimaryCategory,$Category1,$Category2,$Category3, +$Department1,$Department2, $Department3,$Department4,$Department5,$Department6,$Group1,$Grou +p2,$NetworkID1, $TagType1,$TagClassification1,$TagProtected1,$TagDuress1,$Network +ID2,$TagType2, $TagClassification2,$TagProtected2,$TagDuress2,$MapID,$GatewayGro +up,$X,$Y,$Z, $NISTCertificationDueDate,$RentalReturnDate,$Escalation3,$NISTCer +tificate,$Escalation2, $NISTTag,$ContactName,$Escalation1,$PMCompleted,$RentalInUser,$la +stcolumn,$EOL\n"; } for (@extras){ my $ref = $_; my($AssetName,$AssetId,$Serial,$Type,$ActivityStatus,$BusinessStatus +,$Description, $PrimaryCategory,$Category1,$Category2,$Category3,$Department1,$Depa +rtment2, $Department3,$Department4,$Department5,$Department6,$Group1,$Group2, +$NetworkID1, $TagType1,$TagClassification1,$TagProtected1,$TagDuress1,$NetworkID2 +,$TagType2, $TagClassification2,$TagProtected2,$TagDuress2,$MapID,$GatewayGroup, +$X,$Y,$Z, $NISTCertificationDueDate,$RentalReturnDate,$Escalation3,$NISTCertif +icate,$Escalation2, $NISTTag,$ContactName,$Escalation1,$PMCompleted,$RentalInUser,$lastc +olumn,$EOL); ## Need to assign the variables their values print $OUTPUT "$AssetName,$AssetId,$Serial,$Type,$ActivityStatus,$B +usinessStatus,$Description, $PrimaryCategory,$Category1,$Category2,$Category3, +$Department1,$Department2, $Department3,$Department4,$Department5,$Department6,$Group1,$Grou +p2,$NetworkID1, $TagType1,$TagClassification1,$TagProtected1,$TagDuress1,$Network +ID2,$TagType2, $TagClassification2,$TagProtected2,$TagDuress2,$MapID,$GatewayGro +up,$X,$Y,$Z, $NISTCertificationDueDate,$RentalReturnDate,$Escalation3,$NISTCer +tificate,$Escalation2, $NISTTag,$ContactName,$Escalation1,$PMCompleted,$RentalInUser,$la +stcolumn,$EOL\n"; } close $OUTPUT;

TStanley
--------
People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell

Replies are listed 'Best First'.
Re: Searching between two hashes
by AnomalousMonk (Archbishop) on Nov 10, 2017 at 20:50 UTC
    if($ID = $ID2 and $mac2 = ''){ $$ref2{'Network ID 1'} = $mac; }else{ push @extras,\$ref; }

    Are you aware that the expressions  $ID = $ID2 and  $mac2 = '' are assignments and not comparisons? In any case, the  $mac2 = '' assignment will always be boolean false and the if-conditional will always be false. Probably what you meant was
        if ($ID == $ID2 and $mac2 eq '') { ... } else { ... }
    because  $mac2 is certainly being compared to a string, and I suspect that  $ID == $ID2 is likewise a string comparison and therefore should be  $ID eq $ID2 instead.

    Update: An example:

    c:\@Work\Perl\monks>perl -wMstrict -le "my ($ID, $ID2, $mac2) = ('foo', 'foo', ''); ;; if($ID eq $ID2 and $mac2 eq ''){ print 'true block'; }else{ print 'false block'; } " true block c:\@Work\Perl\monks>perl -wMstrict -le "my ($ID, $ID2, $mac2) = ('foo', 'foo', ''); ;; if($ID = $ID2 and $mac2 = ''){ print 'true block'; }else{ print 'false block'; } " false block


    Give a man a fish:  <%-{-{-{-<

      Thanks for that. After 17 years of programming with Perl, I still get those particular operators confused. Part of it is also the fact that I don't do a whole lot of programming any more.

      TStanley
      --------
      People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell
        I still get those particular operators confused.

        Top tip: since you know that you fall victim to this (which is more than half the battle), don't write them as $mac2 = '' which is plausible, but instead like '' = $mac2 which is not. The latter is trapped at compile time. Then when you see the compiler errors you can replace the offending expression with the correct '' eq $mac2. Always put the literal value on the left side of the operator to prevent it being a plausible assignment. HTH.

        (Edited for typo)

Re: Searching between two hashes
by AnomalousMonk (Archbishop) on Nov 10, 2017 at 22:32 UTC

    Quite apart from the main concern of your post, the comparison of hashes, another thing that comes to mind is that the sight of those serried ranks of lexical variables marching down the page makes my brain hurt. Not to go all sundialsvc4 on you, but this looks like a guaranteed, copper-bottomed MaintenanceNightmare™.

    One approach to this type of organizational problem is to define constant sets of tags. These tags, which can be, e.g., strings, can then be used as hash keys, header lines (when concatenated together), etc. A tag set never changes its order once defined. They can be used with hash slices to assign to and extract data from hashes as an ordered set. E.g.:

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le "use constant INPUT_TAGS => ( 'Asset Name', 'Asset Application Id', 'Serial Number', 'Asset Type', 'Activity Status', ); ;; use constant OUTPUT_TAGS => ( 'Serial Number', 'Asset Type', 'Asset Name', ); use constant OUTPUT_FORMAT => '%30s'; ;; my $input_record = 'Northwest Gammage Flocculator,AAI-123,SN9876,Flocculator,Taking Up + Space'; ;; my %hash; @hash{ INPUT_TAGS() } = split ',', $input_record; dd \%hash; ;; print join ' | ', map sprintf(OUTPUT_FORMAT, $_), OUTPUT_TAGS; print join ' | ', map sprintf(OUTPUT_FORMAT, $_), @hash{ OUTPUT_TAGS( +) }; " { "Activity Status" => "Taking Up Space", "Asset Application Id" => "AAI-123", "Asset Name" => "Northwest Gammage Flocculator", "Asset Type" => "Flocculator", "Serial Number" => "SN9876", } Serial Number | Asset Type | + Asset Name SN9876 | Flocculator | Nor +thwest Gammage Flocculator
    Note that to disambiguate a constant list for use with a hash slice, the  () (subroutine invocation) operator must be used (update:  & could be used as well, e.g.  &INPUT_TAGS (update: but see choroba's pertinent comment here)) to prevent Perl from interpreting something like  INPUT_TAGS as a string.

    Another point is that rather than using a naive split, it's almost always better to use Text::CSV_XS or Text::CSV for extracting CSV-ish data.

    Update: Corrected small code example error.


    Give a man a fish:  <%-{-{-{-<

      Note that using & for constants prevents them from inlining:
      $ perl -MO=Deparse -e 'use constant C => "abc"; print C' use constant ('C', 'abc'); print 'abc'; -e syntax OK $ perl -MO=Deparse -e 'use constant C => "abc"; print &C' sub C () { 'abc' } use constant ('C', 'abc'); print &C; -e syntax OK
      ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Searching between two hashes
by haukex (Archbishop) on Nov 11, 2017 at 10:35 UTC

    A couple of notes:

    • All three lists of my variables are identical
    • The keys of the big %hash are almost identical to the long print $OUTPUT "..." (only two whitespace and capitalization differences - one of them is causing a bug in your code: in the hash it's 'Asset Application ID' but later you say $$ref2{'Asset Application Id'})
    • The last two for loops are identical except for the variables being looped over (could be combined as for (@output_array, @extras))

    That's a lot of deduplication that could be done. In order to be able to test, I made a few changes to your code:

    • The = vs eq bugs noted by AnomalousMonk fixed
    • The precedence issue in open ... || die fixed
    • Output code added
    • I changed push @extras,\$ref; to push @extras,$ref2; because that made more sense to me, although I'm not sure about it

    I admit I don't quite understand the logic going on in the loops over @input_array and @output_array, because they produce a lot of repeated output (more fitting variable names would also help), perhaps you want to use a hash for lookups instead of nested loops. In that case, perhaps you don't need to slurp all of VHC-DVTPump.csv into memory, but could process it line-by-line.

    But anyway, the main point I wanted to make was not to understand the logic for now, but to reduce the duplication and use Text::CSV (also install Text::CSV_XS for speed). I think in this case Text::CSV's column_names, getline_hr, and print_hr are very useful. With some fake input data, the following produces the same output as your code modified as I described, hopefully it's a starting point:

    #!/usr/bin/env perl use warnings; use strict; use Text::CSV; my @COLUMNS = ("Asset Name","Asset Application ID","Serial Number", "Asset Type","Activity Status","Business Status","Description", "Primary Category","Category 1","Category 2","Category 3", "Department 1","Department 2","Department 3","Department 4", "Department 5","Department 6","Group 1","Group 2","Network ID 1", "Tag Type 1","Tag Classification 1","Tag Protected 1", "Tag Duress 1","Network ID 2","Tag Type 2", "Tag Classification 2","Tag Protected 2","Tag Duress 2","Map ID", "Gateway Group","X","Y","Z","NIST Certification Due Date", "Rental Return Date","Escalation3","NIST Certificate", "Escalation2","NIST Tag","ContactName","Escalation1", "PM Completed","Rental In User","last column","EOL"); sub read_csv { my ($filename,$cols) = @_; my $csv = Text::CSV->new({ binary=>1, auto_diag=>2 }); $csv->column_names($cols); open my $fh, '<', $filename or die "$filename: $!"; my $data = $csv->getline_hr_all($fh); $csv->eof or $csv->error_diag; close $fh; return $data; } my $out_filename = 'output.csv'; my $transfers = read_csv('transfers.csv', [qw/ MAC Serial ID /]); my $assets = read_csv('VHC-DVTPump.csv', \@COLUMNS); my @extra_assets; for my $trans (@$transfers) { for my $asset (@$assets) { if ( $$trans{'ID'} eq $$asset{'Asset Application ID'} && $$asset{'Network ID 1'} eq '' ) { $$asset{'Network ID 1'} = $$trans{'MAC'}; } else { push @extra_assets, $asset; } } } my $out_csv = Text::CSV->new({ binary=>1, auto_diag=>2, eol=>$/ }); $out_csv->quote_char(''); # to make output identical (I'd remove this) $out_csv->column_names(@COLUMNS); open my $out_fh, '>', $out_filename or die "$out_filename: $!"; $out_csv->print($out_fh, \@COLUMNS); for my $asset (@$assets, @extra_assets) { $out_csv->print_hr($out_fh, $asset); } close $out_fh;

    By the way, my understanding is that your input files don't contain a header row, but your output file does? If that's not correct and your input file(s) also contain a header row, note that you could do away with the arrays of column names in the code entirely and use Text::CSV's header function instead.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1203135]
Front-paged by Discipulus
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (4)
As of 2024-03-19 03:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found