http://www.perlmonks.org?node_id=636741


in reply to Re: Make your 404 pages smarter with metaphone matching
in thread Make your 404 pages smarter with metaphone matching

As far as I can see, the script filters all files by extension: everything ending in .html gets indexed, and everything else doesn't.

Re^3: Make your 404 pages smarter with metaphone matching
by merlyn (Sage) on Sep 04, 2007 at 02:38 UTC
    Everything with .html gets indexed, everything else isn't.
    And ... what?

    That doesn't address my concern at all. If I have a private URL that ends in ".html", it'll still likely get indexed. Then someone guesses a URL similar to that, and boom, they're in.

    A good solution would also have an additional regex or blacklist of things that should never be offered as a suggestion.
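
    As a rough illustration of the blacklist idea above, here is a minimal sketch in Python (not the language of the original script). The pattern list and the names `DENY_PATTERNS`, `is_suggestable`, and `filter_suggestions` are hypothetical; a real site would need its own patterns.

```python
import re

# Hypothetical deny patterns; these are assumptions for illustration,
# not taken from the original script. Adjust to the site's own layout.
DENY_PATTERNS = [
    re.compile(r"^/private/"),     # anything under a private tree
    re.compile(r"^/admin/"),       # administrative pages
    re.compile(r"(^|/)\."),        # dotfiles and hidden directories
]

def is_suggestable(url_path):
    """Return True if the path may be offered as a 404 suggestion."""
    return not any(p.search(url_path) for p in DENY_PATTERNS)

def filter_suggestions(candidates):
    """Drop every candidate that matches a deny pattern, keeping order."""
    return [c for c in candidates if is_suggestable(c)]
```

    The same check could be applied when the index is built rather than when suggestions are produced, so that private paths never enter the index in the first place.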

      If I have a private URL that ends in ".html", it'll still likely get indexed.

      It's not merely likely; it will get indexed for sure. I don't think this is meant as a finished solution, but rather as a demonstration of a general way to do such things.
      I am afraid, however, that there will be more cutting and pasting than actual reading.