Clever. I like it.
Would it not be a good idea to record each file name only
once per word? Like:
foo: index.html,5,18;
bar: index.html,6;page1.html,1;
baz: index.html,7;
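Writing that format out is cheap. Here is a rough sketch, assuming the index is held in memory as a nested hash (word => file => list of positions). The name index_line is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical in-memory index: word => { file => [positions] }.
my %seq = (
    foo => { 'index.html' => [5, 18] },
    bar => { 'index.html' => [6], 'page1.html' => [1] },
    baz => { 'index.html' => [7] },
);

# Build one compact line per word: each file name appears once,
# followed by its comma-separated positions and a terminating ';'.
sub index_line {
    my ($word, $files) = @_;
    my $line = "$word: ";
    for my $file (sort keys %$files) {
        $line .= join(',', $file, @{ $files->{$file} }) . ';';
    }
    return $line;
}

print index_line($_, $seq{$_}), "\n" for sort keys %seq;
```

Sorting the file names just makes the output stable; the format doesn't need it.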
That would make the index smaller. Then, with some trivial
splitting on ':', ';' and ',', you end up with something like:
$seq{'foo'}{'index.html'} = [5,18];
$seq{'bar'}{'index.html'} = [6];
$seq{'bar'}{'page1.html'} = [1];
$seq{'baz'}{'index.html'} = [7];
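The "trivial splitting" could look roughly like this. It's a sketch only. It assumes words never contain ':' and file names never contain ';' or ','. The name parse_line is hypothetical:

```perl
use strict;
use warnings;

# Parse one compact index line, e.g. "bar: index.html,6;page1.html,1;",
# into the nested hash %$seq (word => file => [positions]).
sub parse_line {
    my ($seq, $line) = @_;
    chomp $line;                                # tolerate lines read from a file
    my ($word, $rest) = split /:\s*/, $line, 2; # split at the first ':' only
    for my $entry (split /;/, $rest) {
        next unless length $entry;              # ignore empty entries, e.g. from ';;'
        my ($file, @pos) = split /,/, $entry;
        push @{ $seq->{$word}{$file} }, @pos;
    }
}

my %seq;
parse_line(\%seq, $_) for (
    'foo: index.html,5,18;',
    'bar: index.html,6;page1.html,1;',
    'baz: index.html,7;',
);
```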
Now I just need a clever algorithm to iterate over it all
and figure out which document has foo bar baz in order.
Hmmm.
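One possible approach is sketched below. It assumes "in order" means adjacent positions, so that word $i sits at position $start + $i, the way foo/bar/baz sit at 5/6/7 in index.html. For each file that contains the first word, build a position lookup for every later word. Then check each starting position of the first word. The name phrase_files is made up:

```perl
use strict;
use warnings;

my %seq = (
    foo => { 'index.html' => [5, 18] },
    bar => { 'index.html' => [6], 'page1.html' => [1] },
    baz => { 'index.html' => [7] },
);

# Return the files in which @words occur at consecutive positions.
sub phrase_files {
    my ($seq, @words) = @_;
    return () unless @words;
    my $first = $seq->{ $words[0] } or return ();
    my @hits;
    FILE: for my $file (sort keys %$first) {
        # One position set per later word; a word missing from this
        # file rules the file out (the && avoids autovivification).
        my @sets;
        for my $w (@words[1 .. $#words]) {
            my $pos = $seq->{$w} && $seq->{$w}{$file} or next FILE;
            push @sets, { map { $_ => 1 } @$pos };
        }
        # Try every position of the first word as a phrase start.
        START: for my $start (@{ $first->{$file} }) {
            for my $i (0 .. $#sets) {
                next START unless $sets[$i]{ $start + $i + 1 };
            }
            push @hits, $file;
            next FILE;
        }
    }
    return @hits;
}

print join(', ', phrase_files(\%seq, qw(foo bar baz))), "\n";  # prints index.html
```

The hash sets make each position check a single lookup. If you only need "in order" with gaps allowed, you'd instead walk sorted position lists and look for the next larger position.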