parsing weblogs

by VicBalta (Scribe)
on Aug 21, 2001 at 20:22 UTC

VicBalta has asked for the wisdom of the Perl Monks concerning the following question:


I want to parse my web log files and then push them into a database. For the database I'm using MySQL. This question isn't about the database part, but about how to parse the web log. I have searched the archives and CPAN, and what I have found are two modules: first, Apache::ParseLog, and second, AnyData::Format::Weblog. These are the fields of my web log file:

#Fields: date time c-ip s-sitename s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs-host cs(User-Agent)

If anyone has worked with these and knows which module would best suit me, let me know.

Re: parsing weblogs
by count0 (Friar) on Aug 21, 2001 at 21:03 UTC
    I'm fairly certain that the AnyData module isn't what you are looking for, especially if you aren't using AnyData for anything else.

    The Apache::ParseLog module is *wonderful* for parsing Apache logs. I personally love it... But the format you show there doesn't look like any Apache log that I've seen before (although with something like mod_log_config you could change it; I don't know much about it).

    But I digress..
    In my opinion, it might just be easiest to parse that manually. Just use split to break each line into its fields and stick them into arrays, or into an AoA (array of arrays) for the entire log. There are tons of ways to work with the data once you break it up, depending on what you're storing in the database and how.
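
    As a rough illustration of the split approach, here is a minimal sketch. It assumes the fields are separated by whitespace and never contain embedded spaces, and it takes the column names from the "#Fields:" line. The sub name parse_log_lines and the sample log line are made up for this example; they are not from the original log. It builds an array of hashes rather than an AoA so each value can be looked up by field name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn log lines into an array of hashes keyed by
# the names given on the "#Fields:" directive line.
# Assumption: fields are whitespace-separated with no embedded spaces.
sub parse_log_lines {
    my @lines = @_;
    my (@fields, @records);
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;                 # strip the line ending
        next if $line =~ /^\s*$/;             # skip blank lines
        if ($line =~ /^#Fields:\s*(.*)$/) {   # remember the column names
            @fields = split ' ', $1;
            next;
        }
        next if $line =~ /^#/;                # skip other directive lines
        next unless @fields;                  # no column names seen yet
        my @values = split ' ', $line;
        next unless @values == @fields;       # skip malformed lines
        my %rec;
        @rec{@fields} = @values;              # hash slice: name => value
        push @records, \%rec;
    }
    return \@records;
}

# Made-up sample data matching the field list in the question.
my @sample = (
    '#Fields: date time c-ip s-sitename s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs-host cs(User-Agent)',
    '2001-08-21 20:22:00 192.0.2.1 W3SVC1 192.0.2.10 GET /index.html - 200 1234 321 15 www.example.com Mozilla/4.0+(compatible;+MSIE+5.5)',
);

my $records = parse_log_lines(@sample);
for my $rec (@$records) {
    print join(' ', @{$rec}{qw(date time c-ip cs-uri-stem sc-status)}), "\n";
}
```

    To read a real file, the same sub could be fed the lines from an opened filehandle. Each hash then maps straight onto the columns of a database row.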
Node title "parsing weblogs" is misleading
by dave_aiello (Pilgrim) on Aug 22, 2001 at 17:25 UTC

    I suggest that you change the title of your question node, because the word "weblog" means a Web Site like Slashdot or Scripting News to me. When I was looking through the Newest Nodes and saw only the title, I thought that your question pertained to parsing Web Site content. Perhaps some other people here had a similar reaction.

    Dave Aiello
    Chatham Township Data Corporation

      Yes, actually I had the same problem searching here on PerlMonks for Perl weblog issues and finding that the term "weblog" is mostly used to mean "Apache log file".

      what is a weblog?
      "a website that is updated frequently, with new material posted at the top of the page."
      or according to
      "A weblog is personal -- it's done by a person, not an organization.(..)
      A weblog is on the Web -- it doesn't get printed, it can be updated frequently, it's very low cost to produce, and it can be accessed through a Web browser.
      A weblog is published -- words flow through templates, the process is automated, the writer and designer are elevated.(..)
      And finally, a weblog is part of communities. No weblog stands alone, they are relative to each other and to the world.(..)"

      But it seems to me the term is not completely defined, as Slashdot is also called a weblog.
      If one wants more information on weblogs, I can recommend the page of Rebecca Blood, who even wrote some books about 'blogging (find more information on her site).

      a good search engine for blogs is

      BTW: the PerlMonks site seems to be some hybrid of a wiki and a weblog... or did I miss something?

