PerlMonks  

Crawling old podcast mp3 files

by tessio (Initiate)
on Nov 15, 2012 at 19:43 UTC (#1004056)
tessio has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

There are some old links to mp3 files from a podcast I like scattered on the web. The links have the form:

http://p.audio.uol.com.br/mtv/colaboradores/matandorobosgigantes/games/MRG_87G_Aperte_Aqui_Para_Ser_Babaca.mp3

But when I try to access the path /games, or the file /games/index.html, to get the list of all mp3 files in the directory, I'm redirected to an error page.

Is there a way to crawl this directory for its .mp3 files?

Thanks!

Re: Crawling old podcast mp3 files
by grondilu (Pilgrim) on Nov 15, 2012 at 20:13 UTC

    The webmaster does not want you to access the list of files in this directory. So I doubt there is any way to do it, unless you brute-force the possible filenames, which would pretty much amount to an attack on the website. So don't do that.

    If you really want those files, I guess the best way is to send an email to the webmaster and ask politely.

Re: Crawling old podcast mp3 files
by Tommy (Chaplain) on Nov 15, 2012 at 20:18 UTC

    Can you clarify whether or not you already know the file names of the mp3 files that you want? Your post would suggest you do not know the filenames.

    If you do not know the filenames, is there a web page that lists them? Your original post would suggest there is not.

    If you do not know the filenames in advance, and you have no way to obtain the filenames, this isn't a Perl issue. You have to solve one of those two problems before you can consider using Perl tools to automatically crawl the podcasts with something like WWW::Mechanize.
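
    For illustration only: if a page listing the episodes does turn up, the WWW::Mechanize approach might look roughly like the sketch below. It assumes the listing page links directly to the .mp3 files; the helper name mp3_links and the one-second pause are hypothetical choices, not something from the thread. The script takes the listing page URL(s) on the command line and saves each file under its own name in the current directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $page_url and return the absolute URLs of
# all links on it that end in ".mp3", with duplicates removed.
sub mp3_links {
    my ($mech, $page_url) = @_;
    $mech->get($page_url);
    my %seen;
    return grep { !$seen{$_}++ }
           map  { $_->url_abs->as_string }
           $mech->find_all_links( url_regex => qr/\.mp3$/i );
}

# autocheck makes failed requests die instead of failing silently.
my $mech = WWW::Mechanize->new( autocheck => 1 );

# Each command-line argument is assumed to be a page that lists episodes.
for my $page_url (@ARGV) {
    for my $url ( mp3_links($mech, $page_url) ) {
        my ($file) = $url =~ m{([^/]+)$};    # last path segment as file name
        print "fetching $url\n";
        $mech->get( $url, ':content_file' => $file );
        sleep 1;                             # be gentle with the server
    }
}
```

    This does nothing to discover files that no page links to, so it only helps once the filename problem above is solved.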

    --
    Tommy
    $ perl -MMIME::Base64 -e 'print decode_base64 "YWNlQHRvbW15YnV0bGVyLm1lCg=="'
Re: Crawling old podcast mp3 files
by Anonymous Monk on Nov 15, 2012 at 23:26 UTC
    To obtain a list of files on the server you must study SQL injection and the latest PHP CVEs.
