Re: (thoughts on) Detect common lines between two files, one liner from shell
by Albannach (Monsignor) on Dec 14, 2000 at 02:36 UTC
|
While I can see how Stephen is looking at this
as an obfuscation-in-progress, and others are looking at it
as just plain obfuscated, this type of thing is a great example
of why I love playing around with Perl. Don't get me wrong, I
certainly thought merlyn had left out a few lines when I
first looked at it, but after a couple of minutes it really
started to look beautiful (I'm sick, I know...).
In this fine example, merlyn:
- didn't redefine any defaults
- didn't use any obscure, poorly documented features
- didn't even use single-letter variable names
- heck, it's even full of wasted spaces
He simply made excellent use of the well-documented default
behaviours which even I use every day. It still amazes
me that St.Larry (and some Perl elves) thought up all these
behaviours which often look odd to me at first, but eventually dovetail together
so well it's hard to imagine all of these uses weren't considered.
Maybe after I've been here long enough I'll be able to come up with better
ways to fit the parts together too.
--
I'd like to be able to assign to an luser
|
Yeah, it's not an obfuscation by any means. Just a nice way to put together a lot of common features. In fact, I'd take it one step further to fulfill one additional
monkey wrench thrown in after that posting was made: what if the line appears more than once in either fileA or fileB or both, but you still want only one copy of the line?
Well, the answer is just as straightforward. Remove the dollar from the regex!
I'll leave that explanation as an exercise to the clever reader. {grin}
-- Randal L. Schwartz, Perl hacker
update: Bleh! My mistake, the dollar was added to handle this case!
I knew I had needed to deal with multiple hits somehow.
Remind me never to post again. {grin}
|
This should also work for an arbitrary number of files, which AFAIK has no equivalent UNIX command. To show common lines in 4 files:
perl -ne 'print if ($seen{$_} .= @ARGV) =~ /32+1+0$/' fileA fileB fileC fileD
|
Re: Detect common lines between two files, one liner from shell
by stephen (Priest) on Dec 13, 2000 at 23:31 UTC
|
To add insult to injury, you could do this:
perl -ne '@foo[10]="print"; eval( $foo[ substr( ( $seen{$_} .= @ARGV), -2)])' file1 file2
Adding just that extra bit of obscurity to make it completely byzantine...
stephen
Re: Detect common lines between two files, one liner from shell
by b (Beadle) on Dec 13, 2000 at 22:16 UTC
|
|
I will point to the pieces of documentation from which you
can figure it out. I suggest locating it with perldoc, but I will also provide links to site documentation.
The meaning of the -n and -e switches is explained in perlrun, which also tells you what $_ holds during the script. The two filenames are on the command line, and as Perl scans through the files the contents of @ARGV change: each filename is shifted off as its file is opened. The append is done in scalar context, where @ARGV gives the number of elements it still holds. The pattern matches when the hash value ends with "10". The output is redirected to a file that you look at.
The trick is that for the hash value to get a 1 in it, the line must appear in the first file. For it to get a 0 in it, it must appear in the second. It will only match /10$/ on the first occurrence in the second file when it already appeared in the first.
|
Hey, thanks a lot!
I couldn't understand where the files are read from. There is
no <> anywhere and the @ARGV is only the
file names.
The trick with the 0 and 1 is cool.
|
Sorry, I'm new to this and I don't have too much time to read the tutorials, but I still don't understand this:
$seen{$_} I know $_ is the current stream.
.= is like adding it at the end. But what is that
hash /10$/ ? This only matches the exact
line length; it doesn't look for the same word.
What if I want to find a word in both files and print it out on screen?
Thanks.