PerlMonks  

Re^2: ignore duplicates and show unique values between 2 text files

by perlnoobster (Sexton)
on Apr 29, 2013 at 15:42 UTC (#1031246)


in reply to Re: ignore duplicates and show unique values between 2 text files
in thread ignore duplicates and show unique values between 2 text files

Hi kennethk, I am unsure how to "reply to all", but can the script be modified to take two columns into account? For example:

FILE 1

261293    'snow > equipment'
261293    'snow > equipment > boots'
261293    'snow > equipment > facemasks'
261293    'snow > equipment > goggles'
261293    'snow > equipment > helmets'
261293    'surf > accessories > books'

FILE 2

261293    'snow > equipment'
261293    'snow > equipment > boots'
261293    'snow > equipment > facemasks'
261293    'snow > equipment > goggles'
261293    'surf > accessories > books'

OUTPUT

261293    'snow > equipment > helmets'

The two columns are separated by a tab. Is this possible?

Thank you


Re^3: ignore duplicates and show unique values between 2 text files
by LanX (Canon) on Apr 29, 2013 at 15:58 UTC
    > is this possible?

    Yes, but we won't post the whole code!

    Apply

    my ($number,$article) = split /\s+/, $line, 2;

    to each input line and decide which part should be unique.

    Learn to do it yourself with split.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    UPDATE

    added missing third parameter for split

      I think your posted code will not follow the posted spec. The posted lines contain additional whitespace, so my ($number,$article) = split /\s+/, $line will yield
      $number  = 261293
      $article = 'snow
      as opposed to split /\t/, $line, which would yield
      $number  = 261293
      $article = 'snow > equipment > helmets'
      Update: Parent code updated

      #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
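The difference between the split variants discussed here can be checked with a short script. This is only a sketch: the sample strings are copied from the OP's expected output, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sample line as posted: the columns are separated by whitespace, and the
# second column itself contains spaces.
my $line = "261293    'snow > equipment > helmets'";

# No limit: every run of whitespace splits, so the article is cut short.
my ($n1, $a1) = split /\s+/, $line;

# Limit 2: only the first run of whitespace splits; the rest stays intact.
my ($n2, $a2) = split /\s+/, $line, 2;

# Tab delimiter: gives the full article, but only if the line really
# contains a tab between the columns.
my $tabbed = "261293\t'snow > equipment > helmets'";
my ($n3, $a3) = split /\t/, $tabbed;

print "no limit: [$a1]\n";   # ['snow]
print "limit 2:  [$a2]\n";   # ['snow > equipment > helmets']
print "tab:      [$a3]\n";   # ['snow > equipment > helmets']
```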

        Well, sorry, I noticed my problem after I saw your code and updated it in the meantime. =)

        Anyway, considering the bad data the OP has shown so far, it's better not to expect '\t' as the delimiter...

        ... at least as long as there is no third column coming into play. =)

        Cheers Rolf

        ( addicted to the Perl Programming Language)
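One way to avoid depending on the delimiter is to normalize every line to a canonical key before comparing. The sketch below assumes exactly two columns, as in the OP's data; normalize_key is a hypothetical helper name, not something from the thread. As noted above, a third column would end up inside $article.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a line to a canonical "number<TAB>article" key,
# whatever whitespace separated the two columns in the input.
sub normalize_key {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;    # trim both ends, including the newline
    my ($number, $article) = split /\s+/, $line, 2;
    return join "\t", $number // '', $article // '';
}

my $with_tab    = "261293\t'snow > equipment > goggles'\n";
my $with_spaces = "261293    'snow > equipment > goggles'\n";

print normalize_key($with_tab) eq normalize_key($with_spaces)
    ? "same record\n"
    : "different records\n";
```

Using the normalized string as a hash key lets the same lookup work whether a file uses tabs or runs of spaces.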

Re^3: ignore duplicates and show unique values between 2 text files
by kennethk (Monsignor) on Apr 29, 2013 at 15:59 UTC
    This is Perl; just about everything is "possible". However, I fail to see why the two column example is functionally different than a full line comparison. "261293\t'snow > equipment > goggles'" will equal "261293\t'snow > equipment > goggles'" just as much as the two substrings would. Are you dealing with a case where the numbers change and you need to be insensitive to that?

    Breaking the two columns apart can easily be achieved with code like my @terms = split /\t/, $line;. See split.



      However, I fail to see why the two column example is functionally different than a full line comparison. "261293\t'snow > equipment > goggles'" will equal "261293\t'snow > equipment > goggles'" just as much as the two substrings would. Are you dealing with a case where the numbers change and you need to be insensitive to that?

      Yes, I totally agree, the same code should just work as well.
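A whole-line comparison along these lines could look like the sketch below. It is not the code from earlier in the thread; only_in_first is a hypothetical name, and the two arrays stand in for lines that a real script would read from the two files and chomp.

```perl
use strict;
use warnings;

# Return the lines of @$first that do not occur in @$second, in their
# original order. Lines are compared as whole strings.
sub only_in_first {
    my ($first, $second) = @_;
    my %seen = map { $_ => 1 } @$second;
    # The ++ also drops repeated lines within @$first.
    return grep { !$seen{$_}++ } @$first;
}

# In-memory stand-ins for FILE 1 and FILE 2 (tab-separated, already chomped).
my @file1 = (
    "261293\t'snow > equipment'",
    "261293\t'snow > equipment > boots'",
    "261293\t'snow > equipment > facemasks'",
    "261293\t'snow > equipment > goggles'",
    "261293\t'snow > equipment > helmets'",
    "261293\t'surf > accessories > books'",
);
my @file2 = (
    "261293\t'snow > equipment'",
    "261293\t'snow > equipment > boots'",
    "261293\t'snow > equipment > facemasks'",
    "261293\t'snow > equipment > goggles'",
    "261293\t'surf > accessories > books'",
);

print "$_\n" for only_in_first(\@file1, \@file2);
```

With the OP's sample data this prints only the helmets line. If the delimiters differ between the files, the lines would need to be normalized before being used as hash keys.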
