Re^3: parsing a bibliography

by BrowserUk (Pope)
on Dec 02, 2004 at 17:36 UTC (#411875=note)

in reply to Re^2: parsing a bibliography
in thread parsing a bibliography

The regex, commented:

    my( $authors, $title, $thing, $pub, $date, $comment, $no ) = m/
        ^
        ## Author(s): Capture the minimum needed to satisfy that:
        ##   a) It ends with a '. '
        ##   b) And the next word is not an initial
        ## i.e. look ahead and check that the next word starts with
        ## one uppercase *and* one lowercase character.
        -( .*? \. ) \s(?=[A-Z][a-z])

        ## Title: Greedily capture something that ends with '. '
        ( .+ ) \.\s+

        ## Location: Non-greedily capture
        ##   Ends with a ': '.
        ##   Doesn't contain a ':'
        ( [^:]+? ) : \s+

        ## Publisher:
        ##   Single word followed by a ', '
        (\S+), \s+

        ## Year: Capture four digits
        ##   Discard anything else up to '. '
        ( \d{4} ) [^.]* \. \s+

        ## Comment: Greedy capture non-'[' characters
        ##   i.e. stop capturing when you see a '['
        ( [^\[]+ )

        ## No: Capture 1 (or more) digits between '[' & ']'
        ##   Discard any trailing space to the EOS.
        \[ ( \d+ ) \] \s* $
    /x;
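For anyone who wants to poke at it, here is a minimal sketch that wraps the same pattern in a sub and runs it against one line. The sample entry and the name parse_entry are my own inventions for illustration, not data from the thread. The entry starts with a '-' only because the regex as posted expects a literal '-' right after the start of the line.

```perl
use strict;
use warnings;

# Hypothetical sample entry (not from the thread). The leading '-' is
# there because the pattern requires one immediately after '^'.
my $entry = '-Smith, J. A. and Jones, B. Parsing for fun. '
          . 'London: Penguin, 1999. A useful survey [12]';

# parse_entry is an illustrative name. It returns the seven captured
# fields, or an empty list when the line does not match.
sub parse_entry {
    my( $line ) = @_;
    my @fields = $line =~ m/
        ^ -( .*? \. ) \s (?=[A-Z][a-z])    # authors
        ( .+ ) \. \s+                      # title
        ( [^:]+? ) : \s+                   # location
        ( \S+ ) , \s+                      # publisher
        ( \d{4} ) [^.]* \. \s+             # year
        ( [^\[]+ )                         # comment
        \[ ( \d+ ) \] \s* $                # number
    /x;
    return @fields;
}

my( $authors, $title, $thing, $pub, $date, $comment, $no )
    = parse_entry( $entry )
    or die "Entry did not match\n";

print "<$_>\n" for $authors, $title, $thing, $pub, $date, $comment, $no;
```

Two things to watch for when trying it on real data: the comment field keeps the space that precedes the '[', and a publisher of more than one word will make the whole match fail, since that field is a single \S+ followed by a comma.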

Examine what is said, not who speaks.
"But you should never overestimate the ingenuity of the sceptics to come up with a counter-argument." -Myles Allen
"Think for yourself!" - Abigail        "Time is a poor substitute for thought"--theorbtwo         "Efficiency is intelligent laziness." -David Dunham
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
