PerlMonks  

How to screen-scrape a page that uses a Java applet for authentication?

by grinder (Bishop)
on Sep 09, 2004 at 09:10 UTC ( #389595=perlquestion )

grinder has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I have an Overland NEO4100 tape library, a beast with 60-odd tapes and two tape drives. Sometimes the library's notion of which tape is stored where gets out of sync with the backup software's. This causes a certain amount of grief because the software tells the library to load tape A, and when it reads the tape header it finds it has tape B instead.

As far as I know this information (tape A is in slot 14, tape B is mounted on drive 2 etc. etc.) isn't available via SNMP. The tape library has a web interface that displays this information, and so I'd like to scrape it. Access to the web page is authenticated. No password, no web.

The problem is that the authentication is performed by a Java applet. All it does is take a password (two different passwords are admitted, for read-only or administration privileges) and a couple of checkboxes, one of which lets you choose a presentation that uses <frame>s (the default) or without.

If I GET the home page of the library, it gives, in all its glory:

<html>
<head>
<P><TABLE BORDER=0 CELLPADDING=0 CELLSPACING=0 WIDTH="100%">
<TR><TD WIDTH="100%" ALIGN=CENTER>
<APPLET CODE="Login.class" WIDTH=545 HEIGHT=418 ALT="[ Login applet is not available ]">
<PARAM NAME=BrowserId VALUE=8b9b1edea86776b6953e24cdfab8e8ce>
<PARAM NAME=Personality VALUE=0>
<P ALIGN=LEFT>If this message is visible, you probably need to enable Java operation for your brows.
</APPLET>
</TD></TR></TABLE>
</BODY></HTML>

Does anyone have some clues on the best way to proceed? I have a feeling that I shall have to use tcpdump to see what the Java applet sends to the server, and mimic that. Or is there Another Way To Do It?

Aside: I do like the Extreme Body Modification bit about having brows that are Java-enabled.

- another intruder with the mooring of the heat of the Perl


Replies are listed 'Best First'.
Re: How to screen-scrape a page that uses a Java applet for authentication?
by Corion (Pope) on Sep 09, 2004 at 09:16 UTC

    As the Java applet has to respect your proxy settings, first check whether it uses simple HTTP requests by setting up a small logging proxy, possibly through HTTP::Recorder or HTTP::Proxy.

    If the Java applet connects through a port other than port 80 and doesn't use HTTP, you will have to break out an actual network sniffer and sniff the connection. I would use Net::Pcap as the sniffer, because in the end you will want to write Perl code to emulate the Java applet anyway, but of course if you're more comfortable with Ethereal or tcpdump, use that.
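A minimal sketch of the logging-proxy idea, assuming HTTP::Proxy is installed and that the applet really does talk plain HTTP through the browser's proxy setting (neither is confirmed in this thread). The sub names and the choice of port are made up for illustration. It prints each request line and its headers to STDERR:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format one log entry: the request line followed by the raw headers.
sub format_request {
    my ( $method, $uri, $headers ) = @_;
    return "$method $uri\n$headers\n";
}

# Start a proxy on the given port that logs every request passing through.
# The modules are loaded lazily so the formatter above works without them.
sub run_proxy {
    my ($port) = @_;
    require HTTP::Proxy;
    require HTTP::Proxy::HeaderFilter::simple;

    my $proxy = HTTP::Proxy->new( port => $port );
    $proxy->push_filter(
        request => HTTP::Proxy::HeaderFilter::simple->new(
            sub {
                my ( $self, $headers, $request ) = @_;
                print STDERR format_request(
                    $request->method, $request->uri, $headers->as_string
                );
            }
        ),
    );
    $proxy->start;    # blocks and serves requests until interrupted
}

# Only start the proxy when a port is given on the command line.
run_proxy( $ARGV[0] ) if @ARGV;
```

Run it with a port number (say 8080, an arbitrary choice), point the browser's proxy at localhost on that port, then log in through the applet and watch what shows up. If the password travels in a POST body, a header filter alone will not show it; a body filter such as HTTP::Proxy::BodyFilter::simple would probably be needed as well.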

Re: How to screen-scrape a page that uses a Java applet for authentication?
by paulbort (Hermit) on Sep 09, 2004 at 18:21 UTC
    Another way to proceed would be to hack on the browser side. If you're using Firefox, an Autofill extension might help.

    If you're using Windows, you might be able to use the Win32::CtrlGUI modules to get the handle of the Java applet's window, and fill in the fields that way. (I have no idea if the same kind of thing can be done in X.)

    If you are using the AMANDA backup software, or are allowed to look at its source code, it has a complete and working implementation of using the changer's built-in bar code reader to determine which tapes are in which slots. (Overland's web site says that bar code support is built in.)

    I checked the Overland web site, but couldn't find any reference to an API.

    If you go the packet-sniffing route, be prepared for at least some encryption. The "BrowserId" parameter looks like a session ID or possibly part of an encryption key. Check to see if it changes from time to time, always increases, etc.
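One way to check whether BrowserId changes is to fetch the login page a few times and compare the values. The following is only a sketch: the sub names are invented here, the regex is tuned to the PARAM tags quoted in the question (a real parser such as HTML::TokeParser would be more robust), and LWP::Simple is assumed to be installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull NAME/VALUE pairs out of the <PARAM> tags of the login page.
sub applet_params {
    my ($html) = @_;
    my %param;
    while ( $html =~ /<PARAM\s+NAME=["']?([^\s"'>]+)["']?\s+VALUE=["']?([^\s"'>]*)["']?\s*>/gi ) {
        $param{$1} = $2;
    }
    return \%param;
}

# Fetch the page $count times and return the distinct BrowserId values seen.
sub sample_browser_ids {
    my ( $url, $count ) = @_;
    require LWP::Simple;
    my %seen;
    for ( 1 .. $count ) {
        my $html = LWP::Simple::get($url);
        next unless defined $html;
        my $id = applet_params($html)->{BrowserId};
        $seen{$id}++ if defined $id;
        sleep 1;    # be gentle with the library's little web server
    }
    return sort keys %seen;
}

# Usage: pass the URL of the library's home page as the only argument.
if (@ARGV) {
    print "$_\n" for sample_browser_ids( $ARGV[0], 5 );
}
```

A single value printed would suggest a fixed identifier; five different values would point towards a per-request session ID or nonce.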

    Of all these options, I think your best bet is to talk to the changer as a SCSI device if possible. (I'm assuming the device is connected to a server via SCSI.)


    --
    Spring: Forces, Coiled Again!
Re: How to screen-scrape a page that uses a Java applet for authentication?
by duaneg (Novice) on Sep 09, 2004 at 18:54 UTC
    If you decide to start packet sniffing you might also want to attack it from the other side at the same time by decompiling or disassembling the Java applet. Java decompiles very well, thanks to its relatively high-level bytecode. Unless it has been obfuscated you can often get quite close to the original source code using tools like JODE. Or you can just look at the bytecode directly with the standard JDK disassembler, javap.
Re: How to screen-scrape a page that uses a Java applet for authentication?
by YuckFoo (Abbot) on Sep 09, 2004 at 21:27 UTC
    You might be able to use the mtx utility to query your tape library.

    YuckFoo

      I agree, grinder, try mtx. There is a standard set of SCSI commands for handling changers, which are likely whatever your backup system is using anyway. A quick Google search shows that the library can use a SCSI LVD/SE, SCSI HVD, or FC interface. Even if you have the FC version, it should accept commands and inquiries from mtx.
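If mtx can see the changer, its status report could be turned into a slot-to-tape map with a few regexes. This is a rough sketch, not tested against a NEO4100: it assumes the usual `mtx -f DEVICE status` layout, with lines such as `Data Transfer Element 0:Full (Storage Element 14 Loaded):VolumeTag = A00014` and `Storage Element 1:Full :VolumeTag=A00001`. The sub name and the returned structure are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse the text of "mtx status" into drives and slots.
# An empty drive or slot maps to undef; a full slot with no
# bar code maps to the empty string.
sub parse_mtx_status {
    my ($text) = @_;
    my ( %drive, %slot );
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /Data Transfer Element (\d+):Full(?: \(Storage Element (\d+) Loaded\))?(?::VolumeTag\s*=\s*(\S+))?/ ) {
            $drive{$1} = { from_slot => $2, tag => $3 };
        }
        elsif ( $line =~ /Data Transfer Element (\d+):Empty/ ) {
            $drive{$1} = undef;
        }
        elsif ( $line =~ /Storage Element (\d+)(?: IMPORT\/EXPORT)?:Full\s*(?::VolumeTag\s*=\s*(\S+))?/ ) {
            $slot{$1} = defined $2 ? $2 : '';
        }
        elsif ( $line =~ /Storage Element (\d+)(?: IMPORT\/EXPORT)?:Empty/ ) {
            $slot{$1} = undef;
        }
    }
    return { drives => \%drive, slots => \%slot };
}

# Usage: pass the changer's device node (for example a /dev/sg* entry,
# which will differ from system to system) as the only argument.
if (@ARGV) {
    open my $fh, '-|', 'mtx', '-f', $ARGV[0], 'status'
        or die "cannot run mtx: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    my $status = parse_mtx_status( defined $text ? $text : '' );
    for my $n ( sort { $a <=> $b } keys %{ $status->{slots} } ) {
        my $tag = $status->{slots}{$n};
        printf "slot %2d: %s\n", $n, defined $tag ? $tag : '(empty)';
    }
}
```

Comparing this map with what the backup software believes should show which side has lost track.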

      Too lazy to log in,
      hattmoward =)

      Oh also, shameless backup plug: Bacula keeps your servers smelling fresh.

Node Type: perlquestion [id://389595]
Approved by Arunbear
Front-paged by broquaint