http://www.perlmonks.org?node_id=878022


in reply to Removing Duplicates in a HoH

It's not clear what you intend the code to do, but perhaps the issue isn't how to remove "duplicate" keys. Perhaps what you really ought to consider is whether there is a better way to store the data in the first place, so that you don't need to remove duplicate keys at all. I notice that THERMOMETER has two different values for quantity. Is arbitrarily removing one of the thermometer entries what you really want? I can't say, since I don't know whether the quantity data is important.

Anyway, my main point is that it might be better to store your data by part number.

$HoH{$part_number}{quantity}    = $qty;
$HoH{$part_number}{description} = $description;
...
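
To make that a bit more concrete, here is a minimal sketch of a hash keyed by part number. The sample records, the field order, and the `tags` sub-hash are my own assumptions for illustration, not your actual data, and whether quantities should be summed (as here) or simply overwritten depends on what the quantity field means in your file.

```perl
use strict;
use warnings;

# Hypothetical sample records: [part number, description, quantity, tag].
my @records = (
    [ '2203505000',     'CONDUCTIVITY CELL', 1, 'AE-100' ],
    [ '2203505000',     'CONDUCTIVITY CELL', 1, 'AE-200' ],
    [ '1003382553-M25', 'CNTFGL PUMP',       1, 'PU-200' ],
);

my %HoH;
for my $rec (@records) {
    my ($part_number, $description, $qty, $tag) = @$rec;

    $HoH{$part_number}{description} = $description;
    $HoH{$part_number}{quantity}   += $qty;    # assumption: quantities accumulate
    $HoH{$part_number}{tags}{$tag}  = 1;       # a hash of tags keeps each tag once
}

# Print one line per part number, with its tags sorted and space-separated.
for my $part_number (sort keys %HoH) {
    my $entry = $HoH{$part_number};
    my $tags  = join ' ', sort keys %{ $entry->{tags} };
    print "$part_number: $entry->{description}, quantity=$entry->{quantity}, tags=$tags\n";
}
```

Because the part number is the top-level key, a repeated part number updates the existing entry instead of creating a second one, so there are no duplicate keys to remove afterwards.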

Re^2: Removing Duplicates in a HoH
by DunLidjun (Acolyte) on Dec 20, 2010 at 15:10 UTC

    I would normally store the information by part number; however, the file that produces this information is extremely large and contains duplicate part numbers. This script is actually trying to combine and reduce the data to single part numbers, as well as group the tags and add the quantities.

      Storing the information by part number does what you are looking to accomplish. See the response from scorpio17, which provides an example of what I'm talking about.

        I really like this reduction. I previously used a hash to remove duplicate lines like the ones below:

        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        559: partnum=2203505000,description=CONDUCTIVITY CELL,quantity=2.0000,tags=AE-100
        559: partnum=2203505000,description=CONDUCTIVITY CELL,quantity=2.0000,tags=AE-100 AE-200
        559: partnum=2203505000,description=CONDUCTIVITY CELL,quantity=2.0000,tags=AE-100 AE-200
        559: partnum=2203505000,description=CONDUCTIVITY CELL,quantity=2.0000,tags=AE-100 AE-200
        559: partnum=2203505000,description=CONDUCTIVITY CELL,quantity=2.0000,tags=AE-100 AE-200

        Once I loaded this into a hash it reduced to the following:

        1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200
        559: partnum=2203505000,description=CONDUCTIVITY CELL,quantity=2.0000,tags=AE-100 AE-200
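
For reference, that kind of line-level reduction can be sketched roughly as follows. This is only an illustration of the general "seen hash" idiom, not the actual script: the `@lines` array is a hypothetical stand-in for the real input file, and the lines are abbreviated samples in the format shown above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for lines read from the real input file.
my @lines = (
    '1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200',
    '1: partnum=1003382553-M25,description=CNTFGL PUMP,quantity=1.0000,tags=PU-200',
    '559: partnum=2203505000,description=CONDUCTIVITY CELL,quantity=2.0000,tags=AE-100 AE-200',
    '559: partnum=2203505000,description=CONDUCTIVITY CELL,quantity=2.0000,tags=AE-100 AE-200',
);

# Keep a line only the first time it is seen; %seen counts occurrences,
# so the post-increment is false (0) exactly once per distinct line.
my %seen;
my @unique = grep { !$seen{$_}++ } @lines;

print "$_\n" for @unique;
```

Only byte-for-byte identical lines collapse this way; two lines for the same part number that differ in any field (for example, in their tags) are both kept.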

        Unfortunately, the current modification by scorpio17 adds the quantities of these duplicate lines together.

        Thanks for the help and the insight. I really appreciate it.

        Shawn Way