PerlMonks  

Re: Removing Duplicates from a multiline entry

by Kenosis (Priest)
on Feb 27, 2013 at 19:21 UTC


in reply to Removing Duplicates from a multiline entry

...I need to remove duplicate product entries...

Perhaps your mentioning the address comparison was only a solution proposal. If I'm understanding you correctly--that you only want to "remove duplicate product entries"--then consider the following:

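use strict;
use warnings;

# Paragraph mode: each read returns one blank-line-separated entry.
local $/ = '';

my ( %products, %records );

while (<>) {
    # Capture the product line, e.g. "Product 3".
    if (/(Product.+)/) {
        $products{$1}++;       # how many times this product was seen
        $records{$1} = $_;     # the full record for this product
    }
}

# Print only the records whose product occurred exactly once.
print $records{$_} for grep $products{$_} == 1, keys %records;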

Usage: perl script.pl dataFile [>outFile]

Output on your data set:

Product 3
------------------------------------------------------------------
storeId = 2123
phoneNumber = (111) 111-1111
availbilityCode = 1
stockStatus = Limited stock
distance = 8.83
city = some city
fullStreet = some address

Product 2
------------------------------------------------------------------
storeId = 2117
phoneNumber = (111) 111-1111
availbilityCode = 2
stockStatus = In stock
distance = 7.49
city = some city
fullStreet = some address

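The script reads the file in paragraph mode (local $/ = ''), so each blank-line-separated entry arrives as one record. It builds two hashes keyed on the product line: %products counts how many times each product occurs, and %records holds the text of the record. A record is printed only if its product was seen exactly once. Because keys %records returns the keys in no particular order, the output order may differ from the input order.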
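If the output order matters, one possible variation is to remember the order in which each product first appears. The sketch below is only an illustration, not part of the script above: unique_records, @order and the sample records are made-up names and data, and it takes the records as a list instead of reading them from a file.

```perl
use strict;
use warnings;

# Return the records whose "Product ..." line occurs exactly once,
# in the order the products were first seen.
sub unique_records {
    my @records = @_;
    my ( %count, %record, @order );

    for my $rec (@records) {
        next unless $rec =~ /(Product.+)/;
        push @order, $1 unless $count{$1}++;    # remember first appearance
        $record{$1} = $rec;
    }

    return map { $record{$_} } grep { $count{$_} == 1 } @order;
}

# Hypothetical sample records, not the original poster's data.
my @sample = (
    "Product 1\nstoreId = 1\n",
    "Product 2\nstoreId = 2\n",
    "Product 1\nstoreId = 3\n",
    "Product 3\nstoreId = 4\n",
);

# Product 1 occurs twice, so only Product 2 and Product 3 are printed.
print join "\n", unique_records(@sample);
```

To use this on a file, the records could be collected with the same paragraph-mode read as in the script above and then passed to unique_records.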

Hope this helps!

