Re: Removing Duplicates from a multiline entry

by Kenosis (Priest)
on Feb 27, 2013 at 19:21 UTC (#1020950)

in reply to Removing Duplicates from a multiline entry

...I need to remove duplicate product entries...

Perhaps your mentioning the address comparison was only a solution proposal. If I'm understanding you correctly--that you only want to "remove duplicate product entries"--then consider the following:

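    use strict;
    use warnings;

    # Paragraph mode: each blank-line-separated block is read as one record.
    local $/ = '';
    my ( %products, %records );

    while (<>) {
        if (/(Product.+)/) {
            $products{$1}++;       # count occurrences of this product line
            $records{$1} = $_;     # remember the full record for it
        }
    }

    # Print only the records whose product line was seen exactly once.
    print $records{$_} for grep $products{$_} == 1, keys %records;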

Usage: perl script.pl dataFile [>outFile]

Output on your data set:

    Product 3
    ------------------------------------------------------------------
    storeId = 2123
    phoneNumber = (111) 111-1111
    availbilityCode = 1
    stockStatus = Limited stock
    distance = 8.83
    city = some city
    fullStreet = some address

    Product 2
    ------------------------------------------------------------------
    storeId = 2117
    phoneNumber = (111) 111-1111
    availbilityCode = 2
    stockStatus = In stock
    distance = 7.49
    city = some city
    fullStreet = some address

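Setting $/ to the empty string puts Perl in paragraph mode, so each blank-line-separated block is read as a single record. The script builds two hashes, both keyed on the product line: %products counts how many times each product occurs, and %records holds the record text. A record is printed only if its product was seen exactly once, so every copy of a duplicated product is dropped, not just the extras. Because keys returns hash keys in no particular order, the output order may differ from the input order.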
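
For illustration, the same counting idea can be sketched as a small function that also preserves input order. This is a minimal sketch, not part of the original script: the name unique_records, the @order array, the stricter /^(Product\s+\d+)/m pattern (which assumes each record starts with "Product" followed by a number), and the sample data are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the records whose
# product line occurs exactly once, in order of first appearance.
sub unique_records {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode: one blank-line-separated block per read
    my ( %count, %record, @order );
    while ( my $rec = <$fh> ) {
        # Assumption: the record header looks like "Product <number>".
        next unless $rec =~ /^(Product\s+\d+)/m;
        push @order, $1 unless $count{$1}++;    # remember first-seen order
        $record{$1} = $rec;
    }
    return map { $record{$_} } grep { $count{$_} == 1 } @order;
}

# Made-up sample data, for illustration only.
my $data = <<'END';
Product 1
---
storeId = 1

Product 2
---
storeId = 2

Product 1
---
storeId = 3
END

# Read from an in-memory filehandle so the example is self-contained.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @kept = unique_records($fh);
close $fh;

print @kept;
```

With this sample, only the "Product 2" record should survive, since "Product 1" appears twice. If the goal were instead to keep the first copy of each duplicated product, one possible change would be to store the record only on first sight and drop the grep.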

Hope this helps!
