http://www.perlmonks.org?node_id=1031246


in reply to Re: ignore duplicates and show unique values between 2 text files
in thread ignore duplicates and show unique values between 2 text files

Hi kennethk, I am unsure how to "reply to all", but can the script be modified to take two columns into account? For example:

FILE 1

261293 'snow > equipment'
261293 'snow > equipment > boots'
261293 'snow > equipment > facemasks'
261293 'snow > equipment > goggles'
261293 'snow > equipment > helmets'
261293 'surf > accessories > books'

FILE 2

261293 'snow > equipment'
261293 'snow > equipment > boots'
261293 'snow > equipment > facemasks'
261293 'snow > equipment > goggles'
261293 'surf > accessories > books'

OUTPUT

261293    'snow > equipment > helmets'

The two columns are separated by a tab. Is this possible?

Thank you

Re^3: ignore duplicates and show unique values between 2 text files
by kennethk (Abbot) on Apr 29, 2013 at 15:59 UTC
    This is Perl; just about everything is "possible". However, I fail to see why the two column example is functionally different than a full line comparison. "261293\t'snow > equipment > goggles'" will equal "261293\t'snow > equipment > goggles'" just as much as the two substrings would. Are you dealing with a case where the numbers change and you need to be insensitive to that?

    Breaking the two columns apart can easily be achieved with code like: my @terms = split /\t/, $line; (see the documentation for split).
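
A minimal sketch of that split, assuming each line holds a number, a single tab, and then the quoted category. The sample line is a hypothetical stand-in for one line read from a file.

```perl
use strict;
use warnings;

# Hypothetical sample line: number, one tab, then the quoted category.
my $line = "261293\t'snow > equipment > goggles'";

# Split on the tab only, so the spaces inside the category are preserved.
my @terms = split /\t/, $line;

print "number:  $terms[0]\n";
print "article: $terms[1]\n";
```

Because the pattern matches only the tab, the spaces around each ">" stay inside the second field.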


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      However, I fail to see why the two column example is functionally different than a full line comparison. "261293\t'snow > equipment > goggles'" will equal "261293\t'snow > equipment > goggles'" just as much as the two substrings would. Are you dealing with a case where the numbers change and you need to be insensitive to that?

      Yes, I totally agree; the same code should work just as well.
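
To illustrate why a full-line comparison is enough here, the following is a sketch only, not the script discussed earlier in the thread. It assumes the goal is to list lines of file 1 that do not appear in file 2; the arrays @file1 and @file2 are hypothetical in-memory stand-ins for the chomped lines of the two files.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the chomped lines of the two files.
my @file1 = (
    "261293\t'snow > equipment'",
    "261293\t'snow > equipment > goggles'",
    "261293\t'snow > equipment > helmets'",
    "261293\t'surf > accessories > books'",
);
my @file2 = (
    "261293\t'snow > equipment'",
    "261293\t'snow > equipment > goggles'",
    "261293\t'surf > accessories > books'",
);

# Record every complete line of file 2 as a hash key.
my %seen;
$seen{$_} = 1 for @file2;

# Keep the lines of file 1 that never occur in file 2.
my @only_in_file1 = grep { !$seen{$_} } @file1;

print "$_\n" for @only_in_file1;
```

The whole line, tab included, serves as the hash key, so no splitting is needed as long as both columns must match.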

Re^3: ignore duplicates and show unique values between 2 text files
by LanX (Saint) on Apr 29, 2013 at 15:58 UTC
    > is this possible?

    Yes, but we won't post the whole code!

    Apply

    my ($number,$article) = split /\s+/, $line, 2;

    to each input line and decide which part should be unique.

    Learn to do it yourself with split.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    UPDATE

    added missing third parameter for split
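
One way to read "decide which part should be unique" is to key the comparison on a single column. The sketch below assumes the article column is the key and that the number may differ between files. The sample data and the names @file1, @file2 and %seen_article are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the number differs for the shared article.
my @file1 = (
    "261293 'snow > equipment > helmets'",
    "261293 'surf > accessories > books'",
);
my @file2 = (
    "999999 'surf > accessories > books'",
);

# Remember each article seen in file 2, ignoring the number column.
my %seen_article;
for my $line (@file2) {
    my ($number, $article) = split /\s+/, $line, 2;
    $seen_article{$article} = 1;
}

# Keep lines of file 1 whose article never occurs in file 2.
my @unique = grep {
    my ($number, $article) = split /\s+/, $_, 2;
    !$seen_article{$article};
} @file1;

print "$_\n" for @unique;
```

The limit of 2 stops split after the first run of whitespace, so the article keeps its internal spaces.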

      I think your posted code will not follow the posted spec. The posted lines contain additional whitespace, so my ($number,$article) = split /\s+/, $line will yield
      $number = 261293
      $article = 'snow
      as opposed to split /\t/, $line, which would yield
      $number = 261293
      $article = 'snow > equipment > helmets'
      Update: Parent code updated
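
The difference between the variants can be checked side by side. This is a small sketch on a hypothetical tab-separated sample line; the variable names are for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample line with a tab between the two columns.
my $line = "261293\t'snow > equipment > helmets'";

# No limit: the article is cut at its first internal space.
my ($n1, $a1) = split /\s+/, $line;

# Limit of 2: only the first run of whitespace separates the fields.
my ($n2, $a2) = split /\s+/, $line, 2;

# Tab only: spaces inside the article are never treated as separators.
my ($n3, $a3) = split /\t/, $line;

print "no limit: $a1\n";
print "limit 2:  $a2\n";
print "tab only: $a3\n";
```

On two-column input the last two variants agree. They would diverge if a third tab-separated column appeared, since the limit of 2 would leave it attached to the article.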

        Well, sorry, I noticed my problem after I saw your code and updated it in the meantime. =)

        Anyway, considering the bad data the OP has shown so far, it's better not to expect '\t' as the delimiter...

        ... at least as long as there is no third column coming into play. =)

        Cheers Rolf