Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Download free Microsoft ebooks, fun with Mojolicious and CSS selectors

by marto (Bishop)
on Jul 21, 2017 at 15:35 UTC ( #1195726=CUFP: print w/replies, xml ) Need Help??

I was made aware that Microsoft are giving away free ebooks. Excuse the clickbaity page title, I have nothing to do with it. While people have posted wget scripts to download them all, it doesn't rename them so you end up with some random file names. I threw the script below together really quickly, consider it a cheap hacky but functional (no errors here) script. For each 'Category' it creates a directory, and uses Mojolicious/Mojo::UserAgent to get the page, parse what we need from it, download each file to the it's associated category directory, with the actual ebook name.

Caveats:

  • Ensure you have an up to date Mojolicious installed (cpanm Mojolicious).
  • Copy the script below into it's own directory before running.
  • Not all ebooks are available in all formats. I just select the top one in the list. Most are PDF, some are epub or .doc
#!/usr/bin/perl use strict; use warnings; no warnings 'utf8'; use Mojo::UserAgent; my $ebookURL = 'https://blogs.msdn.microsoft.com/mssmallbiz/2017/07/11/largest-free-m +icrosoft-ebook-giveaway-im-giving-away-millions-of-free-microsoft-ebo +oks-again-including-windows-10-office-365-office-2016-power-bi-azure- +windows-8-1-office-2013-sharepo/'; =head1 NAME ms-ebook-dl - Download free Microsoft ebooks =head1 DESCRIPTION A quick hack using L<Mojolicious> to download and properly name a bunc +h of free ebooks from Microsoft. =head1 INSTALLATION Ensure you have an up to date L<Mojolicious> installed: C<cpanm Mojolicious> Clone the repo: C<git clone https://github.com/MartinMcGrath/ms-ebook-dl> =head1 LICENSE This is released under the Artistic License. See L<perlartistic>. =head1 AUTHOR marto L<https://github.com/MartinMcGrath/> =head1 SEE ALSO L<http://perlmonks.org/?node_id=1195726> L<https://blogs.msdn.microsoft.com/mssmallbiz/2017/07/11/largest-free- +microsoft-ebook-giveaway-im-giving-away-millions-of-free-microsoft-eb +ooks-again-including-windows-10-office-365-office-2016-power-bi-azure +-windows-8-1-office-2013-sharepo/> =cut my $ua = Mojo::UserAgent->new; print "Get page\n"; my $res = $ua->get( $ebookURL )->res; # css selector we want the first table witin the entry-content div, sk +ipping # the first row which is a header, but not a 'th' tag. my $selector = 'div.entry-content table:first-of-type tr:not(:first-of +-type)'; warn "Parse page\n"; $res->dom->find( $selector )->each( sub{ my $category = $_->children->[0]->all_text; my $title = $_->children->[1]->all_text; my $url = $_->children->[2]->at('a')->attr('href'); my $type = $_->children->[2]->at('a')->all_text; # download each file print "downloading: $title\n"; # create category directory unless it already exists mkdir $category unless( -d $category ); $ua->max_redirects(5) ->get( $url ) ->result->content->asset->move_to($category . '/' . $title . '.' + . $type); # play nice sleep(7); });

Update: code updated with some POD, also on on github.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: CUFP [id://1195726]
Front-paged by Corion
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2017-12-11 08:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What programming language do you hate the most?




















    Results (288 votes). Check out past polls.

    Notices?