PerlMonks  

Sometimes Perl is awesome: Duck Duck Go edition

by educated_foo (Vicar)
on Feb 07, 2012 at 20:16 UTC ( #952336=CUFP )

Because I have always assumed that Google would aggregate tracking information across its various web properties, I have long used Scroogle for most of my web searches. It turns out I was right: every mail message you read, every search you perform, and every page you visit with a Google-served ad or analytics tracker is linked to your unique identity. With Google making Scroogle unreliable lately, I have given Duck Duck Go (implemented in a mixture of Perl and JavaScript) another look, and I have been pleasantly surprised: it gives pretty good results with an uncluttered interface and a sane privacy policy.

It also has an easy-to-use API, without license keys or similar BS. While there are a couple of CPAN modules for this API, they're pretty heavy for such a simple task. Here's a simple script that, on Perl >= 5.14, has no non-core dependencies:

use JSON::PP;
use HTTP::Tiny;

# Recursively drop empty strings, empty hashes, and empty arrays.
sub ddg_clean
{
    my $res = shift;
    for my $k (keys %$res) {
        delete $res->{$k} if $res->{$k} eq '';
        if (ref $res->{$k} eq 'HASH') {
            ddg_clean($res->{$k});
            delete $res->{$k} unless keys %{$res->{$k}};
        } elsif (ref $res->{$k} eq 'ARRAY') {
            ddg_clean($_) for @{$res->{$k}};
            delete $res->{$k} unless @{$res->{$k}};
        }
    }
}

# Query the API and return the decoded, cleaned response.
sub ddg
{
    my $q = shift;
    my $h = HTTP::Tiny->new;
    my $res = $h->get(
        'http://api.duckduckgo.com/?'
        . $h->www_form_urlencode({ format => 'json', q => $q }));
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    ddg_clean($res = decode_json($res->{content}));
    $res;
}

sub INDENT() { ' ' }

# Render the nested structure as indented "key: value" text.
sub ddg_format
{
    my ($it, $lev) = @_;
    if (!ref $it) {
        wrap(INDENT x $lev, INDENT x $lev, $it);
    } elsif (ref $it eq 'HASH') {
        join "\n", map {
            my $val = ddg_format($it->{$_}, $lev+1);
            if ($val =~ /\n/) {
                INDENT x $lev . "$_:\n$val";
            } else {
                $val =~ s/^\s+//;
                INDENT x $lev . "$_: $val";
            }
        } sort keys %$it;
    } else {
        join "\n", map { ddg_format($_, $lev+1) } @$it;
    }
}

# Run only when invoked as a script, not when loaded as a library.
if (!defined caller) {
    eval 'use Text::Wrap';
    print ddg_format(ddg("@ARGV"), 0), "\n";
}
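To see what the cleaning pass does without touching the network, here is a minimal offline sketch. `prune_empty` is a hypothetical stand-in that follows the same idea as `ddg_clean` above, and the sample hash is made up for illustration; it is not a real API response.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ddg_clean: recursively drop empty strings,
# undefined values, empty hashes, and empty arrays.
sub prune_empty {
    my $node = shift;
    return unless ref $node eq 'HASH';
    for my $k (keys %$node) {
        my $v = $node->{$k};
        if (ref $v eq 'HASH') {
            prune_empty($v);
            delete $node->{$k} unless keys %$v;
        }
        elsif (ref $v eq 'ARRAY') {
            prune_empty($_) for @$v;
            delete $node->{$k} unless @$v;
        }
        elsif (!defined $v || $v eq '') {
            delete $node->{$k};
        }
    }
    return $node;
}

# Made-up sample data shaped loosely like a decoded JSON response.
my $sample = {
    Heading       => 'Perl',
    Abstract      => '',
    Answer        => undef,
    Results       => [],
    Infobox       => {},
    RelatedTopics => [ { Text => 'Perl 5', FirstURL => '' } ],
};

prune_empty($sample);
print join(', ', sort keys %$sample), "\n";   # prints "Heading, RelatedTopics"
```

Only the keys that carry data survive, which is what keeps the formatted output short.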

Re: Sometimes Perl is awesome: Duck Duck Go edition
by Anonymous Monk on Feb 07, 2012 at 21:13 UTC

    DuckDuckGo even supports https and an html interface, though I am at a loss as to why PerlMonks chooses to link to the http versions instead of https.

    But DDG seems to have issues with exact duplicates, especially of PerlMonks pages (Google is marginally better at eliminating exact duplicates).

    FWIW, I lament what search engines have become: almost useless. If they're not limiting results to 10 per page or auto-correcting your queries, they're limiting the total results and lying about it.

      That was my "choice," not Perlmonks', and it wasn't deliberate. My feeling is that if someone is tracking you closely enough to sniff unencrypted packets, encryption won't stop them. Google's results are still better, but they come at a significant price. I have found DDG good enough for most queries. It falls down on more obscure stuff -- it has a smaller index -- but I always try it first before I resort to Scroogle and Google.

        I switched to DuckDuckGo primarily because google's results were becoming frustrating more and more often, and very often completely useless. The fact that DuckDuckGo has sane privacy features was "just" a huge added 'plus'.

        Over a decade ago I realized that the single most important feature of a search engine is: Show results that match all of the search terms before showing any that only match some of them. The number two feature for me is to make it obvious where the line between "matches all" and "matches some" falls in the presented results.
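        The two features described above can be sketched in a few lines of Perl. This is only an illustration under simple assumptions (whole-word, case-insensitive matching over an in-memory list); `matched_terms`, `rank_results`, and the sample documents are hypothetical, not how any real search engine works.

```perl
use strict;
use warnings;

# Count how many of the query terms occur in the text
# (case-insensitive, whole words).
sub matched_terms {
    my ($text, @terms) = @_;
    return scalar grep { $text =~ /\b\Q$_\E\b/i } @terms;
}

# Return full matches first, then a visible divider, then partial matches.
# Documents matching none of the terms are dropped.
sub rank_results {
    my ($docs, @terms) = @_;
    my (@all, @some);
    for my $doc (@$docs) {
        my $n = matched_terms($doc, @terms);
        next unless $n;
        if ($n == @terms) { push @all, $doc } else { push @some, $doc }
    }
    my @divider = @some ? ('--- results below match only some terms ---') : ();
    return (@all, @divider, @some);
}

# Made-up documents for illustration.
my @docs = (
    'python http client',
    'perl http client',
    'perl regex tutorial',
    'cooking with ducks',
);

print "$_\n" for rank_results(\@docs, qw(perl http));
```

        The divider line is the "obvious line" between the two groups: everything above it matched every term, everything below it matched only some.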

        Google spent a good decade following this maxim by virtue of only showing results that matched all search terms. Over that time, I've griped a lot about places that make this "classic rookie mistake" in regard to searching. CPAN is a great example of a place that does a truly horrid job when given more than one search term.

        But several months ago, google decided to start silently throwing in results that only matched some of your search terms. This was not just "the last straw" for me atop the growing pile of steps google kept taking to pay less and less attention to what I actually asked them to search for. It was also a fundamental mistake that, as it always does, turned google into a frequently useless search service.

        Yes, sometimes my DuckDuckGo search finds nothing and I fall back to another service. But for most other cases, DuckDuckGo gives better and often much better results than google. By searching DuckDuckGo first, I get both of my top features in a search engine. The obvious "line" is when I resort to google's much more numerous and much less selective list of results.

        I thought PM linked to the https+html version but it is indeed currently linking to the http+html version. https+html seems a better choice to me as well.

        - tye        

        I have found DDG good enough for most queries. It falls down on more obscure stuff -- it has a smaller index -- but I always try it first before I resort to Scroogle and Google.

        DDG's !bang syntax lets you have your cake and eat it too.

        If I find DDG's results inadequate, and want to continue the search with Google, I just prepend !g to my search terms. DDG will get me Google's results, with all the no-tracking goodness.
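        Since the bang is just part of the query text, a script can build such a search URL itself. A minimal sketch: `bang_url` is a hypothetical helper, and the base URL `https://duckduckgo.com/` is an assumption about where the search form lives. It only prints the URL and makes no request.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: prefix the search terms with a !bang and
# build a URL-encoded search link.
sub bang_url {
    my ($bang, @terms) = @_;
    my $query = HTTP::Tiny->new->www_form_urlencode({ q => "!$bang @terms" });
    return "https://duckduckgo.com/?$query";   # base URL is an assumption
}

print bang_url('g', qw(perl http tiny)), "\n";
```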

Node Type: CUFP [id://952336]
Approved by LanX
Front-paged by LanX