Re: Building Networks of Matches

by hv (Prior)
on Dec 23, 2004 at 14:20 UTC ( [id://417091]=note )


in reply to Building Networks of Matches

I didn't fully understand your suggestion for how you might go about it, but below is the simplistic approach I'd use as a starting point. The key features of the scan are a) to build two data structures simultaneously: one to contain known sets of equivalences, and the second to contain the elements that those sets match; b) when new equivalences are found, the data structures for the equivalent sets are merged.

#!/usr/bin/perl -w
use strict;

my $data = read_input();
my $sets = scan($data);
for (@$sets) {
    printf "{ %s }\n", join ' ', sort { $a <=> $b } @$_;
}

# Read one group of elements per input line.
sub read_input {
    my @data;
    local $_;
    while (<DATA>) {
        push @data, [ grep defined, split /\s+/ ];
    }
    \@data;
}

# %results maps a set id to the line indexes in that set;
# %matches maps the same id to the elements those lines contain.
sub scan {
    my $data = shift;
    my(%matches, %results);
    for my $index (0 .. $#$data) {
        my @equal;
        my $these = $data->[$index];
        # Find every existing set that shares an element with this line.
        for my $key (keys %matches) {
            my $compare = $matches{$key};
            if (grep exists $compare->{$_}, @$these) {
                push @equal, $key;
            }
        }
        # Merge the matching sets into a new set keyed on this line.
        $results{$index} = [ $index, map @{ delete $results{$_} }, @equal ];
        $matches{$index} = {
            map(($_ => 1), @$these),
            map %{ delete $matches{$_} }, @equal
        };
    }
    [ values %results ];
}

__END__
a b c
d e f
b g h
i j k
l m f

If this isn't fast enough, my first thought to improve it would be to find some way of using bit vectors to represent the elements, so that matches can be checked with a bitwise-and of two strings. To do that, you'd need to find a way to translate elements into numbers that you can use as a bit offset.
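A minimal sketch of the bit-vector idea, not a tuned implementation. The helper names `bit_offset`, `to_bits` and `overlaps` are hypothetical, and handing out offsets in order of first appearance is just one assumed way of translating elements into numbers. It relies on Perl's `vec` and on `&` acting bytewise when both operands are strings (the default unless the `bitwise` feature is enabled).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping: each distinct element gets the next free bit offset.
my %offset;
my $next = 0;

sub bit_offset {
    my $element = shift;
    $offset{$element} = $next++ unless exists $offset{$element};
    return $offset{$element};
}

# Build a bit string with one bit set per element.
sub to_bits {
    my $bits = '';
    vec($bits, bit_offset($_), 1) = 1 for @_;
    return $bits;
}

# Two sets share an element if the bitwise-and of their strings
# contains any non-zero byte.
sub overlaps {
    my ($x, $y) = @_;
    return ($x & $y) =~ /[^\0]/ ? 1 : 0;
}

my $first  = to_bits(qw(a b c));
my $second = to_bits(qw(d e f));
my $third  = to_bits(qw(b g h));

print overlaps($first, $second) ? "match\n" : "no match\n";   # no match
print overlaps($first, $third)  ? "match\n" : "no match\n";   # match

# Merging two equivalent sets is then a bitwise-or of their strings.
my $merged = $first | $third;
```

In `scan`, the per-set hash in `%matches` would become one such string, and the inner `grep exists ...` test would become a single call to `overlaps`.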

However, if there are lots of elements, most of which appear only once, it may be better to do a prepass to get a list of repeated elements, and then consider only those repeats in the main loop.
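Something like the following could serve as that prepass. It is only a sketch: `keep_repeats` is a hypothetical helper, and it assumes the data is in the same array-of-arrays shape that `read_input` returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the data that keeps only the
# elements occurring on more than one line.
sub keep_repeats {
    my $data = shift;
    my %lines_with;
    for my $line (@$data) {
        my %seen;
        # Count each element once per line, so a duplicate within a
        # single line does not count as a repeat.
        $lines_with{$_}++ for grep { !$seen{$_}++ } @$line;
    }
    return [
        map {
            my $line = $_;
            [ grep { $lines_with{$_} > 1 } @$line ]
        } @$data
    ];
}

my $data = [ [qw(a b c)], [qw(d e f)], [qw(b g h)], [qw(i j k)], [qw(l m f)] ];
my $reduced = keep_repeats($data);
print join(' ', @$_), "\n" for @$reduced;
```

Lines that end up empty are kept in place rather than dropped, so the indexes that `scan` reports still refer to the original input lines. A line with no repeated elements simply becomes a set of its own.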

Hope this helps,

Hugo

Re^2: Building Networks of Matches
by bowsie (Initiate) on Dec 23, 2004 at 14:59 UTC
    This is VERY fast and very good - thank you! As for the prepass, I can do that easily with a sort unique in unix. :)

    You're a genius! This may be one of the best Xmas gifts I get this year!

    Thanks!

    Bowsie
