PerlMonks and Google

by stefp (Vicar)
on Mar 31, 2001 at 22:43 UTC ( #68686=monkdiscuss )

... or any other search engine; I just happen to prefer Google. PerlMonks uses only one URL path because it relies on CGI requests. On the other hand, indexing engines don't follow links that carry query strings, because doing so often leads to an unlimited number of virtual pages.

I suppose it should be pretty easy to make perlmonks accessible at another hostname (say id.perlmonks.org), with the path built directly from the id of the node accessed.

say: http://id.perlmonks.org/31579.html.

The links within such a page would follow the same syntax.

I suppose that most indexing engines would correctly index such a site. I don't think that an Internet abbey must stay a secluded place!
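One way such a scheme might be wired up (a sketch only: the virtual host name, CGI path, and parameter name below are assumptions for illustration, not PerlMonks' actual configuration) is an Apache mod_rewrite rule that maps the static-looking path back onto the CGI request:

```apache
# Hypothetical vhost serving static-looking node URLs.
# /31579.html is rewritten internally to the CGI request,
# so crawlers never see a query string.
<VirtualHost *:80>
    ServerName id.perlmonks.org
    RewriteEngine On
    RewriteRule ^/(\d+)\.html$ /cgi-bin/perlmonks.cgi?node_id=$1 [PT,L]
</VirtualHost>
```

Crawlers would then see only plain `.html` paths, while the site itself stays fully dynamic.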

-- stefp

(jeffa) Re: PerlMonks and Google
by jeffa (Chancellor) on Mar 31, 2001 at 22:57 UTC
    Very interesting. . . . getting archives in search engines is a good thing, IMHO.

    I am currently happy with the results when I search for jeffa on Google. I wonder if jcwren, kudra, and damian1301 are as well. . . :P

    Jeff

    R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
    L-L--L-L--L-L--L-L--L-L--L-L--L-L--
    
Re: PerlMonks and Google
by footpad (Monsignor) on Mar 31, 2001 at 23:02 UTC

    This is an interesting idea, though I expect it would take a bit of work.

    If pursued, I'd recommend hiding the Chatterbox and the Other Users nodelets, given the previous discussions regarding CB logging and the fact that google hits include the CB.

    Alternatively, you could hide all nodelets on the "id" server, thereby providing a convenient way to nicely print nodes without having to hide the nodelets yourself.

    In fact, you could even detect whether or not it's a (common) robot/spider and, if so, redirect them to the "id" version of the node, thereby (eventually) removing the current CB "logs" in google's cache.

    I know you couldn't get all of the robots, but you could certainly get most of them.
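A robot check of that sort could be sketched as below. The User-Agent substrings and the "id" URL scheme are assumptions for illustration (era-typical crawler names), not an actual PerlMonks implementation:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return true if the User-Agent string looks like a common crawler.
# The list of names is illustrative, not exhaustive.
sub is_robot {
    my ($ua) = @_;
    return 0 unless defined $ua;
    return $ua =~ /googlebot|slurp|scooter|lycos|infoseek/i ? 1 : 0;
}

# Build the static-looking URL for a node id on the hypothetical "id" host.
sub id_url {
    my ($node_id) = @_;
    return "http://id.perlmonks.org/$node_id.html";
}

# In a CGI handler one might then do something like:
#   if ( is_robot( $ENV{HTTP_USER_AGENT} ) ) {
#       print "Location: ", id_url($node_id), "\n\n";    # redirect the spider
#   }
```

Unrecognized spiders would fall through to the normal page, which is why this catches most robots rather than all of them.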

    Again, I realize there's some work (and therefore time) involved, but it might help raise awareness of the site, increase traffic, etc., etc.

    --f

Re: PerlMonks and Google
by Hero Zzyzzx (Curate) on Apr 01, 2001 at 00:16 UTC

    Sorry, this is a little off-topic, and not completely original, but interesting (at least to me) nonetheless.

    I've done something similar to this: make a dynamic site appear static for the purpose of search engine indexing.

The way I did it: at the creation of each item, I output an SSI statement into a file with the proper parameters to call the dynamic content.
Example: I have a directory called "/redir/" that holds all my SSI files. Each file is tiny and looks something like this:
<!--#include virtual="/cgi-bin/process.pl?action=viewitem&id=97"--> Each link on my site then points to "/redir/97.html". All the links look static, but the pages are generated on request thanks to the SSI trickery.
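The stub-generation step described above can be sketched like this (the "/redir/" directory and "/cgi-bin/process.pl" names come from the post; the helper itself is illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write a tiny SSI stub so that "/redir/<id>.html" pulls in the
# dynamic page when the web server processes the include directive.
sub write_stub {
    my ($dir, $id) = @_;
    my $path = "$dir/$id.html";
    open my $fh, '>', $path or die "can't write $path: $!";
    print $fh qq{<!--#include virtual="/cgi-bin/process.pl?action=viewitem&id=$id"-->\n};
    close $fh;
    return $path;
}
```

Called once per item at creation time (e.g. `write_stub("/redir", 97)`), this keeps every public link looking static while the content stays dynamic.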

The "caught in an infinite indexing loop" problem is important to think about; I don't know how it would be handled on the perlmonks site.

    Obviously google does index some dynamic content, as evidenced by jeffa's search above. Do we know how google decides what dynamic content to index? This whole point may be moot.

I have checked the word perlmonks on Google and there is almost nothing. There must be many PerlMonks pages talking about perlmonks! So I doubt Google indexes any pages but explicitly submitted ones.

Probably jeffa explicitly submitted his page.

      -- stefp

I did indeed. Most of the links point not to perlmonks but to pages from other sites speaking of perlmonks, like paris.pm.org/meetings/2000/0906.html/

I suspect links to perlmonks have been either explicitly submitted or reached by Google from other sites. Following CGI requests within a site can lead to infinite recursion, so I doubt an indexing engine ever does it.

        Self referential note: material in italic was not present in the first edition of this node

        -- stefp

(crazyinsomniac) Re: PerlMonks and Google
by crazyinsomniac (Prior) on Apr 01, 2001 at 04:12 UTC

    I am really outraged.

    If you do a search for crazyinsomniac on google, you'll get two pages full of results, with most of them cached.

    In fact, if you do a search on any particular user from the monastery, and throw in perl monk, you'll get their homenode.

I for one do not appreciate having a history of my homenode available 3 months after it was last updated.

    Can't we stop google from doing this?

I really, really would like to prevent this. It does not seem to me like they asked permission, and this has to violate some kind of copyright law. The web is supposed to be ever-changing, and it really pisses me off when people take snapshots for 'commercial' use (yes, I consider it commercial use) without permission.

    UPDATE:
    Yes, google's cache feature is useful, and yes I do like it and use it everyday, and yes I could just not post information in a public arena, BUT I still don't like it and feel they should ask or say someplace:

Hey baby, we're a-caching your goodies, so beware.

    UPDATE: (April 26, 2001, 11:10am PST)
I do appreciate all the responses and whatnot, but you can't blame me for being paranoid. That is what sleep deprivation (among other things) does to you.

     
    ___crazyinsomniac_______________________________________
    Disclaimer: Don't blame. It came from inside the void

    perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

      In a nutshell, I don't believe we can. Google is as useful as it is because of that caching facility.
As for it violating copyright law, I think it was already decided that it didn't; otherwise every company that operated a webcache would be violating that copyright.
      Personally, I don't have any issues at all with cached copies of stuff appearing on google (interestingly enough, it seems to have archived stuff from the chatbox too.. I found an old comment of mine appearing on there on a search for "Malkavian perl monk")..
      Bear in mind, it only appears if someone's actively looking for that particular data.
      If someone is actually interested in looking for Malkavian the Perl Monk, I'm pretty sure they'd come to the Perl Monks site, and look at my home node.
      If someone's interested enough in me to go and look me up on a search engine, I think I'd count myself flattered. :)
The idea behind Google's web cache is that quite frequently some very interesting stuff is lost, either through user accounts being shut down at universities or through machines temporarily falling off the Internet. In those cases Google offers a snapshot of the data for a time (not forever, as they don't have infinite space to store it), on the understanding that it's a cache, and thus likely out of date, but it still allows you a glimpse of what otherwise wouldn't be reachable.
      Anyhow, that's just my take on it.

      Cheers,

      Malk

      If you don't want in a cache, then don't post it in a public arena. If it ain't in google's cache, then it'll be somewhere else.

      Of course, you could use frames, I suppose. Those are a lot harder to cache.

      You've got to be kidding!

I'm sure I'm not the only one here who thinks Google's cached results are an immensely useful feature. I use them daily; perhaps hourly.

If I don't want information made publicly available or archived, I simply don't put it online, ever, in any form. I don't believe you can have it both ways; you can't advocate a free, searchable, enlightening and informative internet while crying "copyright violation!" when you spot a stale "about me" page in some search engine's web-cache. Perhaps they are violating copyright law; well, so much the better, and kudos to them.

      Yes, Google is commercial, but they are really the only search engine built by and for people with Clue, and that makes them Good Guys, in my book. If Google hears enough complaints of this sort, they will certainly relent, and become another worthless fenced-in MSN/aol-style portal. That would be a very sad day, for users of Google and for the internet at large.

         MeowChow                                   
                     s aamecha.s a..a\u$&owag.print
Yep, simply tell the search robots not to index or follow, whether in the HTML header or in robots.txt.
And by the way, I think Google itself would rather not have to index more-or-less useful information about us.
If you have your own homepage, by all means let Google search it. But here we cannot ask for anyone's permission, which would be needed, since the person concerned cannot intervene and cannot stop search engines like Google from indexing things that also touch somewhat on their privacy.
Otherwise we prevent perlmonks from saying anything about themselves in their homenodes.
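The two mechanisms mentioned, as they might look here (the paths are assumed for illustration):

```
# robots.txt at the site root -- ask all compliant crawlers to stay out:
User-agent: *
Disallow: /
```

or, per page, in the HTML header:

```html
<!-- keep this page out of indexes and don't follow its links -->
<meta name="robots" content="noindex,nofollow">
```

Both rely on crawlers honoring the robots exclusion convention; well-behaved engines like Google do.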
      But however I might think :
      Have a nice day
      All decision is left to your taste

Node Type: monkdiscuss [id://68686]
Approved by root