Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

Tonight I was going to do some work, but first I had to set up my environment on a new (shared) computer. That meant configuring my WM (fluxbox), which included replacing the drab light grey background that comes set by default. Since I had some time and didn't really want to work on real things, I ended up writing something I wanted to have for a while: a script to set the background (wallpaper) to NASA's Astronomy Picture of the Day.

The intended use is to run it as a cron job. The default functionality will maintain a symlink to the latest image providing easy access for a WM or whatever else you want to use it. The metadata (title, credit, description) is extracted from the page and stored in a YAML file, mainly because I'm considering adding an ImageMagick script to create an annotated image. This would require the display resolution to be provided in the config section so as to render the text in an appropriate size and position relative to the screen. Depending on your WM, it may be simpler to create overlays/widgets/gadgets/etc. to display this data. Or you can just ignore it.

You can also pass a date as the argument (in YYMMDD format, to match the URL format of the APOD archive) to fetch old pictures. The date is not validated beyond checking that it consists of 6 digits. Also the script will refuse to overwrite files, because that makes sense to me. I used a number of such "ghetto" Ways To Do It (noted in comments) because I wanted to keep it simple and functional. I had to use some pretty ghetto techniques to parse the APOD page, which is completely invalid and uses no semantic markup at all, so I figured I might as well ghetto-ize the whole script. If you want to use this, you should probably tinker with it to suit your needs.

Requires HTML::Parser, LWP and YAML::XS. Intended for POSIX systems only. If you're going to use this in Windows, you should at the very least set $LINK to undef

EDIT: added a line to remove old CURRENT link before updating it... apparently symlink will fail when asked to overwrite a link...

#!/usr/bin/perl -w # Downloads NASA Astromy Picture of the Day # with metadatas in YAML format (HTML is stripped) # # Images are downloaded to a directory and stored # by date. # # Script can also maintain a symlink to the latest # picture for anyone who cares to use it to any end # # Modify the configuration in the main package below use strict; package HTML::Parser::AOTD; use base q/HTML::Parser/; my $state = 'date'; my $count = 0; sub start { my ($self, $tagname, $attr, $attrseq, $text) = @_; if ($state eq 'date') { ++$count if $tagname eq 'p'; } elsif ($state eq 'image') { if ($tagname eq 'a') { $self->{image} = $attr->{href}; $state = 'title'; } } elsif ($state eq 'title') { if ($tagname eq 'b') { ++$count; } } elsif ($state eq 'desc') { if ($tagname eq 'center') { # ghetto whitespace tidying for ($self->{desc}) {s/\s/ /gs;s/ / /g;s/^ +//;s/ +$//} $count = 0; $state = 'done'; } } } sub text { my ($self, $text) = @_; if ($state eq 'date' && $count == 2) { $text =~ s/\s*$//; $text =~ s/^\s*//; $self->{date} = $text; $count = 0; $state = 'image'; } elsif ($state eq 'title' && $count) { for($text){s/^ +//;s/ +$//} $self->{title} = $text; $count = 0; $state = 'credit'; } elsif ($state eq 'credit') { if ($count > 1) { chomp $text; $self->{credit} .= $text; } elsif ($text =~ /Image Credit/i) { ++$count; } } elsif ($state eq 'desc') { if ($count > 1) { $self->{desc} .= $text; } elsif ($text =~ /Explanation/i) { ++$count; } } } sub end { my ($self, $tagname) = @_; if ($state eq 'credit') { ++$count if $tagname eq 'b'; if ($tagname eq 'center') { $count = 0; $state = 'desc'; } } elsif ($state eq 'desc') { ++$count if $tagname eq 'b'; } } 1; package main; use LWP::Simple; use YAML::XS; # CONFIG # base url my $AOTD_URL = q||; # directory to store data my $DIRECTORY = q|/home/shared/apod/|; # name of symlink to current image/yaml # written for Unix-like systems # to disable, set to undef my $LINK = "$DIRECTORY/CURRENT"; # ghetto date conversion my %month = ( January => '01', February => '02', March => '03', April => '04', May => '05', June => '06', July => '07', August => '08', September => '09', October => '10', November => '11', December => '12' ); sub usage { print <<END; Usage: [YYMMDD] See source for configuration options END exit } my $arg = shift; # ghetto help option usage if ($arg && $arg =~ /^--?h(elp)?$/); if ($arg) { if ($arg =~ /^\d{6}$/) { # ghetto date validation $AOTD_URL .= "ap$arg.html"; } else { warn "Invalid date. Using current image."; $arg = undef; } } my $p = HTML::Parser::AOTD->new(); my $html = get($AOTD_URL) or die "Could not fetch from $AOTD_URL. Maybe the date is wrong?\n"; $p->parse($html); for ($p->{date}) { s/ /-/g; s/([A-Za-z]+)/$month{$1}/e; # ghetto date conversion } my $file = "$DIRECTORY/$p->{date}.yaml"; die "$file exists - remove manually to download again\n" if -e $file; my ($ext) = $p->{image} =~ /\.(\w+)$/; my $imgfile = "$DIRECTORY/$p->{date}.$ext"; getstore("$AOTD_URL_BASE/$p->{image}", $imgfile) == RC_OK or die "Something went wrong when downloading $p->{image}: $!"; open my $fh, '>', $file or die "can't write to $file: $!"; print $fh Dump { title => $p->{title}, credit => $p->{credit}, desc => $p->{desc}, imgfile => $imgfile, }; close $fh; if ($LINK && !$arg) { -e && unlink for ($LINK, "$LINK.yaml") symlink $file, "$LINK.yaml"; symlink $imgfile, $LINK; } __END__ `,= (K) 3178 \/\/_____\/ /\ \ / / \ /__/__ ALL RIGHTS REVERSED

$,=qq.\n.;print q.\/\/____\/.,q./\ \ / / \\.,q.    /_/__.,q..
Happy, sober, smart: pick two.

In reply to NASA's Astronomy Picture of the Day by xyzzy

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others musing on the Monastery: (7)
    As of 2020-02-21 18:48 GMT
    Find Nodes?
      Voting Booth?
      What numbers are you going to focus on primarily in 2020?

      Results (97 votes). Check out past polls.