PerlMonks  

The Web is Set Up All Wrong

by InfiniteSilence (Curate)
on Apr 20, 2011 at 14:40 UTC ( [id://900358]=perlmeditation )

You know, I now realize why I hate surfing the web for technical information nowadays: advertisements are ruining our brains.

Tim Berners-Lee said recently: "How do we build the web so that every now and then it introduces us to people who are not friends of friends ...?"

I say something similar -- how do we change the web so that we can separate actual content from advertisements?

Consider the following Google search: http://www.google.com/search?q=windows+azure+sharpdevelop. The top links returned have nothing whatsoever to do with Microsoft's cloud computing platform, dubbed Azure. Instead, each page has a link at the bottom containing 'Windows Azure', while the remainder of the page is relevant only to the 'SharpDevelop' part. In essence, the page contains both keywords, but they have nothing to do with each other in the body.

This is worse than being forced to watch the commercials stored along with a TV program on a DVR. At least in that case you still get to see the program you want. Here, the search results bait you with the promise of information and then hand you waste material.

What we appear to need are tools and standards to help us cut through the clutter and get to the information we need without wasting time. I propose we start thinking a bit outside the box: take a look at implementing new web standards and communication protocols, and build Perl tools around them.
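For instance, if ads ever were marked up under an agreed convention, a filter could drop them before display or indexing. This is only a sketch under that assumption: the `class="ad"` convention is hypothetical (no such standard exists today), and the regex is deliberately naive; a real tool would use a proper parser such as HTML::TreeBuilder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical convention: ads are wrapped in <div class="ad"> ... </div>.
# A regex is only adequate for this simplified, non-nested sketch; real
# code should parse the HTML (e.g. with HTML::TreeBuilder).
sub strip_marked_ads {
    my ($html) = @_;
    $html =~ s{<div\s+class="ad">.*?</div>}{}gs;   # non-greedy, across newlines
    return $html;
}

my $page = '<p>Real content</p><div class="ad">Buy stuff!</div><p>More content</p>';
print strip_marked_ads($page), "\n";   # prints: <p>Real content</p><p>More content</p>
```

The hard part, of course, is not the filter but getting anyone to adopt the convention in the first place.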

Celebrate Intellectual Diversity

Replies are listed 'Best First'.
Re: The Web is Set Up All Wrong
by davido (Cardinal) on Apr 20, 2011 at 15:26 UTC

    You're proposing, then, that web developers make it easy to distinguish ads from content, so that search engines can return content rather than ads. You're also proposing that search engines redesign themselves to take advantage of these clues. But much of the trouble already comes from developers of poor-quality, ad-laden content trying to seed search engines so that their pages are returned with higher priority. You will always have people with an incentive to game the system, and those people don't have much incentive to follow rules. Do you have any suggestion for how to deal with that? Because I see that as the primary issue.


    Dave

      "...You're proposing then, that web developers make it easy to distinguish ads from content, so that search engines can return content rather than ads..."

      Let me say that I'd be happy if search engines simply returned relevant results rather than junk. I see this as a protocol/toolset problem akin to spam. Once upon a time people said that spam could never be beaten. We were doomed to read e-mails about Viagra until Hades opened popsicle stands, but then someone built a neato tool called a spam filter. We still get spam, but it is tolerable.
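      The same trick could plausibly be turned on search results: score each result by tokens learned from results the user previously flagged as junk or useful, in the spirit of a Bayesian spam filter. A toy sketch; the training counts, tokens and threshold are entirely made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy "result filter": score a snippet by how often its tokens were seen
# in junk results versus useful ones. Counts here are illustrative only.
my %junk_count = ( buy => 5, cheap => 4, deal => 3 );
my %good_count = ( module => 5, documentation => 4, tutorial => 3 );

sub junk_score {
    my ($text) = @_;
    my ($junk, $good) = (1, 1);                 # add-one smoothing
    for my $tok (split /\W+/, lc $text) {
        $junk += $junk_count{$tok} // 0;
        $good += $good_count{$tok} // 0;
    }
    return $junk / ($junk + $good);             # closer to 1 => more junk-like
}

printf "%.2f\n", junk_score("cheap deal buy now");              # prints 0.93
printf "%.2f\n", junk_score("module documentation and tutorial"); # prints 0.07
```

A real filter would train on far more data and weigh token probabilities properly, but the shape of the solution is the same one that tamed e-mail spam.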

      "...You will always have people with incentive to try to game the system..."

      There are always going to be people who will try to break into my car. Does that mean I should purchase a car without locks next time?

      Celebrate Intellectual Diversity

Re: The Web is Set Up All Wrong
by ww (Archbishop) on Apr 20, 2011 at 17:01 UTC
    The Perl nexus to this node is pretty thin... I could just as well suggest that we stage boycotts of motor vehicles to protest high gas prices and "build Perl tools around them."

    However, the relevant issues here are economics, business (mores || ethics || practices) and human nature.

    Which funding source (aka "advertiser") will support (aka "buy advertising") the internet if their ads are filtered out so that we see chiefly (only?) the content we want? Which of us will routinely seek out advertisements for products we (are unaware of || don't care about)?

    And re your hope "that we can separate actual content from advertisements," we can: we just have to use our powers of discrimination (and maybe enhance our search-fu) rather than hope someone (our new masters?) will do it for us.

      On the one hand you say this has little to do with Perl, and in the next breath I read, "...maybe enhance our search-fu...". If you are used to using Perl and the web, are these not inter-related? I can guess that your retort might be, 'but I don't use Perl to search/surf/look up things on the web' -- and that is precisely my point. I don't either. I use search engines. However, due to the great push to compete for mindshare on search engines, computing information is becoming hopelessly cluttered. We need tools to help us unclutter the clutter. Would you rather we write those tools in Rexx?

      "...However, the relevant issues here are economics, business (mores || ethics || practices) and human nature..."

      I kind of saw them as 'freedom, peace of mind, ability to find cogent data first and useless data second' but we can go with yours too.

      "Which funding source (aka "advertiser") will support (aka "buy advertising") the internet ..."

      Let me quickly say that I am somewhat appalled by this statement. The web was created as a knowledge-sharing tool. True, the perverse reality is that advertising has taken over, and perhaps in many cases that is okay -- until you start looking for something important/useful on the web. Suddenly you must dig through pages' worth of crapola. Search engines are supposed to give us the information we are asking for first, and stuff we might be interested in second. They can do it the other way around if it is clear that we are still basically getting what we want. But when a search tool returns completely wrong data while I get a full dose of advertising 'sugar', I'm a bit disgusted. I would think you would be too.

      What I'm saying here is that search engines get their money from advertisers, so they are never going to add a -minus::Advertisements to their listings. We have to do it. Welcome to the new digital democracy.

      Celebrate Intellectual Diversity

        "perverse reality" ?!

        My understanding is that any perversity associated with "reality" is in the eye of the beholder.

        As for the rest of your argument, idealistic though it may be, consider:

        • These tools you want to build must somehow filter out what you regard as objectionable material. How are you going to define that? It seems to me it's pretty much the same problem alluded to many decades ago by a Justice of the US Supreme Court, who threw up his hands over an attempt to distinguish between protected content (words, images, et al.) and pornography: he said he couldn't define the distinction in words (law), but that "I know it when I see it."
        • Are your tools going to replace the bots from Google, Bing and Yahoo (inter alia) and index the parts of the internet that satisfy your definition of sites which act "as a knowledge sharing tool"? Where will you keep your server farm? How will you pay for the pipes, the electricity and the real estate? Will everyone seeking your www.nirvana need to pay you, or do you plan to make your results available at your own expense, for the right to be the sole judge of what to include? (Now, there's a notion that appalls me.)
        • You also say you are "somewhat appalled" and "a bit disgusted" by the notion (paraphrasing myself) that advertising revenue makes the web go 'round. I'm not inclined to get upset (at least, not after thinking about it) over the facts that apples fall down, life is unpredictable, and economic gain is often a driving motivation for members of homo sap.
Re: The Web is Set Up All Wrong
by moritz (Cardinal) on Apr 21, 2011 at 06:47 UTC
    What we appear to need are tools and standards to help ourselves cut through clutter

    Standards don't help at all unless there's some incentive to follow them.

    Currently you'd have a hard time convincing an advertiser to mark its ads semantically, because doing so goes against the goals of advertising.

    There have been lots of proposals for more semantic markup, but none of them have really been successful on a large scale. Why? Because it's extra hassle for the content creator: first you have to learn a new markup format, enter your data in that format, and then turn it back into HTML so that all the browsers in the world can view it again, and in the end nobody ends up using the more semantic version anyway.

Re: The Web is Set Up All Wrong
by JavaFan (Canon) on Apr 20, 2011 at 20:02 UTC
    Hmm, if I click the link you provide, the top two links provided by Google both point to Microsoft.

    Anyway, the answer is money. As soon as a significant number of people are willing to pay for a Google-like service that weeds out "ads", someone will create the service.

    But people don't want to pay. They want it free. And free of ads and garbage even.

    That's not how the world works.

Re: The Web is Set Up All Wrong
by scorpio17 (Canon) on Apr 21, 2011 at 15:13 UTC

    Google is constantly working to improve the quality of its search results, and spammers are constantly working to find ways to game the system (maybe "spammer" is the wrong word -- is there a name for someone who creates a webpage with poor-quality content, yet somehow tricks Google into giving it a high PageRank, for the sole purpose of collecting ad revenue?). Here's one possible solution: Google could add an upvote/downvote system, similar to what is used here on PerlMonks. When people search for something and get bogus results, they could downvote them. Google could aggregate this user feedback (positive and negative) and use it to modify its ranking. Of course, spammers would try to upvote their own pages... How can you build a system that can't be "gamed"? You really can't. It's a very hard problem. Solve it and maybe you'll become rich and famous.
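    One way to blunt the most obvious gaming is to make votes cheap to earn at first and expensive thereafter: count each user once, and let the net vote move the score only logarithmically. A sketch with deliberately arbitrary numbers (the damping function and scores are illustrative, not anyone's real ranking formula):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fold user votes into a base relevance score. Each user counts once
# (the %seen filter), and the adjustment grows only logarithmically,
# so stuffing the ballot box has rapidly diminishing returns.
sub adjusted_rank {
    my ($base_score, @votes) = @_;      # each vote: [user_id, +1 or -1]
    my (%seen, $net);
    for my $v (@votes) {
        my ($user, $dir) = @$v;
        next if $seen{$user}++;         # ignore duplicate votes per user
        $net += $dir;
    }
    $net //= 0;
    my $sign = $net <=> 0;
    return $base_score + $sign * log(1 + abs $net);
}

# One user upvoting 100 times moves the score no further than voting once:
printf "%.3f\n", adjusted_rank(10, map { ["mallory", +1] } 1 .. 100);  # prints 10.693
```

This doesn't solve vote fraud from many coordinated accounts, which is the genuinely hard part of the problem.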

Re: The Web is Set Up All Wrong
by John M. Dlugosz (Monsignor) on Apr 22, 2011 at 04:55 UTC
    A long time ago I noticed that sponsored links are less useful than the real links. Searching for a Brand model xxxyyyz, the sponsored link triggers on Brand alone and points to a generic page for that site's catalog. The real links, possibly into the same store, will be for the xxxyyyz page.

    I've also run into sites that put every keyword for the site in the header of every page, so the search engine can't find the right page. I understand that SE's don't even look at the header keywords anymore.

    Generalizing it, if the search engine sees the same footer on every page, it can treat that as spam noise.
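    That heuristic is easy to prototype: collect the lines each page contains and flag the ones that appear on (nearly) every page of a site as boilerplate. The 90% threshold below is arbitrary, and real pages would need normalization first; this is only a sketch of the idea.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lines that recur on (nearly) every page of a site are probably
# boilerplate (footers, nav), not content, and an indexer could
# down-weight them when matching queries.
sub boilerplate_lines {
    my (@pages) = @_;
    my %seen_on;
    for my $page (@pages) {
        my %uniq = map { $_ => 1 } split /\n/, $page;  # count each line once per page
        $seen_on{$_}++ for keys %uniq;
    }
    return grep { $seen_on{$_} >= 0.9 * @pages } keys %seen_on;
}

my @pages = (
    "Azure article\nCopyright Example Corp",
    "SharpDevelop tips\nCopyright Example Corp",
);
print "$_\n" for boilerplate_lines(@pages);   # prints: Copyright Example Corp
```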

    I think microformats are of interest here. You could indicate that a page is built using X rather than being about X.

    I've also run into the problem where every dictionary/glossary page fills a search because both words are on the page. They are not connected though; it just lists all those words separately.

    Maybe "nearness" is a good indicator that content is really relevant to both search terms at once. I think Google doesn't support "near" anymore, though.
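    Even if the engine won't do it, a nearness check is simple enough to run yourself over fetched result pages. A sketch: the minimum token distance between any occurrences of the two terms, so a glossary page that merely lists both words in unrelated entries scores a large distance, while a page discussing them together scores a small one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Smallest token distance between any occurrence of term $a and any
# occurrence of term $b; returns undef if either term is missing.
sub min_distance {
    my ($text, $a, $b) = @_;
    my @toks = split /\W+/, lc $text;
    my (@pos_a, @pos_b);
    for my $i (0 .. $#toks) {
        push @pos_a, $i if $toks[$i] eq lc $a;
        push @pos_b, $i if $toks[$i] eq lc $b;
    }
    return undef unless @pos_a && @pos_b;
    return min(map { my $p = $_; min(map { abs($p - $_) } @pos_b) } @pos_a);
}

print min_distance("Deploying SharpDevelop projects to Azure",
                   "azure", "sharpdevelop"), "\n";   # prints 3
```

Ranking results by this distance, smallest first, would push the dictionary-page noise to the bottom.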

Re: The Web is Set Up All Wrong
by pemungkah (Priest) on Apr 21, 2011 at 19:43 UTC
    Does this search work better for you? https://blekko.com/ws/%22windows+azure%22+sharpdevelop. That's Blekko, specifically designed to filter the bad stuff out. (Note: I used to work there.)

    The alternative at Blekko is to add sites you trust to a slashtag and then search with that - like "windows azure" /sharpdevelop. It has the disadvantage that you don't get hits from sites that you didn't already know about. You can use a previous search (like the first one) to build a slashtag, so maybe a combination will help.

    It's definitely not perfect, mostly because Blekko's relevance isn't always quite as good as Google's, especially with multi-word queries, but it can be a lot better in many cases - and in highly-spammed categories, Blekko tends to do better, because they use an algorithm that detects highly-spammed searches and adds a pre-generated slashtag automatically.

    Again, I used to work there, and I think they've done some really cool stuff. And it's Perl, too. :)

    Google does seem to be attacking the problem, but not as aggressively as Blekko -- mostly, I think (this is all my opinion, not Blekko's or Google's, or any other employer's -- I don't speak for or work for either of the search engines), because there's a tension between providing good results and dropping sites that make Google money, however skeevy that money is. At the moment, Blekko doesn't have that tension, as they're not selling ads, nor do they have sponsored results. There's no impetus for them to do anything except try to provide better results than Google does, to lure traffic away from Google. When they finally do add monetization, it will be interesting to see how (and if) they keep their anti-spam stance as aggressive as it is now.

Node Type: perlmeditation [id://900358]
Approved by ww