
Re^2: parsing a bibliography

by ww (Archbishop)
on Dec 02, 2004 at 16:48 UTC (#411853)

in reply to Re: parsing a bibliography
in thread parsing a bibliography

Wonderful! Wish I could ++ your solution repeatedly! This writeup led to a "Eureka!" moment, the kind of haze-clearing that makes PM so valuable to beginners like me.

Request: please add to our understanding by commenting the lines of the regex, especially the part of line 8 reading


(grouped but non-capture??)

and in line 13,

( [^\[]+ )

which, as I read Owl (pocket ref), means "capture one or more of a class including not-an-open-bracket and close-bracket"... which doesn't make sense to me and, more importantly, doesn't seem to WORK that way.
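Seen in isolation, the two constructs behave as follows. This is a minimal sketch: it assumes the line 8 fragment is the lookahead (?=[A-Z][a-z]) from the author part of the regex, and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: authors (with initials) followed by a title.
my $s = 'Smith, J. A. The Title of the Book.';

# 1) Lookahead. Without it, the non-greedy group stops at the first '. ',
#    which here is right after the initial "J.".
my ($short) = $s =~ /^(.*?\.)\s/;

# With (?=[A-Z][a-z]) the '. ' only counts if the next word starts with an
# uppercase letter followed by a lowercase one, so the initial "A." is skipped.
# (?=...) groups but captures nothing and consumes no characters, so this
# match yields exactly two captures and $rest still starts at "The".
my @caps = $s =~ /^(.*?\.)\s(?=[A-Z][a-z])(.*)$/;
my ($authors, $rest) = @caps;

# 2) Negated class. In [^\[] the leading ^ negates the class and \[ is a
#    literal '['; the final ']' only closes the class. So [^\[]+ means
#    "one or more characters that are not '['", and ']' is allowed.
my ($comment) = 'A fine read. [42]' =~ /^([^\[]+)/;
my ($upto)    = 'a]b[c'             =~ /^([^\[]+)/;

print "without lookahead: '$short'\n";              # 'Smith, J.'
print "with lookahead:    '$authors'\n";            # 'Smith, J. A.'
print "captures:          ", scalar(@caps), "\n";   # 2
print "rest:              '$rest'\n";               # 'The Title of the Book.'
print "comment:           '$comment'\n";            # 'A fine read. '
print "upto:              '$upto'\n";               # 'a]b'
```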

Re^3: parsing a bibliography
by BrowserUk (Pope) on Dec 02, 2004 at 17:36 UTC

    The regex commented.

    my( $authors, $title, $thing, $pub, $date, $comment, $no ) = m/
        ^
        ## Author(s): Capture the minimum needed to satisfy that:
        ##   a) It ends with a '. '
        ##   b) And the next word is not an initial
        ## IE: Lookahead and check the next word starts with
        ##     1 uppercase *and* one lowercase character.
        ( .*? \. ) \s (?=[A-Z][a-z])

        ## Title: Greedily capture something that ends with '. '
        ( .+ ) \.\s+

        ## Location: Non-greedily capture
        ##   Ends with a ': '.
        ##   Doesn't contain a ':'
        ( [^:]+? ) : \s+

        ## Publisher:
        ##   Single word followed by a ', '
        (\S+), \s+

        ## Year: Capture four digits
        ##   Discard anything else up to '. '
        ( \d{4} ) [^.]* \. \s+

        ## Comment: Greedy capture non-'[' characters
        ##   Ie. Stop capturing when you see a '['
        ( [^\[]+ )

        ## No: Capture 1 (or more) digits between '[' & ']'
        ##   Discard any trailing space to the EOS.
        \[ ( \d+ ) \] \s* $
    /x;
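A minimal sketch of the regex applied to one entry. The entry is a made-up example in the "Authors. Title. Location: Publisher, Year. Comment [No]" layout that the comments describe, and the inline comments are shortened.

```perl
use strict;
use warnings;

# Hypothetical entry laid out as the regex expects:
#   Authors. Title. Location: Publisher, Year. Comment [No]
$_ = 'Smith, J. A. The Title of the Book. New York: Penguin, 1999. A fine read. [42]';

my( $authors, $title, $thing, $pub, $date, $comment, $no ) = m/
    ^
    ( .*? \. ) \s (?=[A-Z][a-z])   # authors
    ( .+ ) \.\s+                   # title, greedy then backtracking
    ( [^:]+? ) : \s+               # location
    (\S+), \s+                     # publisher
    ( \d{4} ) [^.]* \. \s+         # year
    ( [^\[]+ )                     # comment
    \[ ( \d+ ) \] \s* $            # number
/x or die "entry did not match\n";

print "authors:   $authors\n";     # Smith, J. A.
print "title:     $title\n";       # The Title of the Book
print "location:  $thing\n";       # New York
print "publisher: $pub\n";         # Penguin
print "year:      $date\n";        # 1999
print "comment:   '$comment'\n";   # 'A fine read. '
print "number:    $no\n";          # 42
```

The greedy title first runs to the end of the line and then gives back characters until the location, publisher and year parts can match, which is why it stops at "Book. " rather than at a later '. '. The comment capture keeps the space before the '['. Under /x, a comment inside the pattern must not contain the / delimiter, because Perl finds the end of m/.../ before the comments are stripped.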

    Examine what is said, not who speaks.
    "But you should never overestimate the ingenuity of the sceptics to come up with a counter-argument." - Myles Allen
    "Think for yourself!" - Abigail
    "Time is a poor substitute for thought" - theorbtwo
    "Efficiency is intelligent laziness." - David Dunham
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
