Re: Enforcing growth of regex

by Eimi Metamorphoumai (Deacon)
on Nov 23, 2005 at 16:21 UTC ( [id://511158]=note )


in reply to Enforcing growth of regex

I think I understand what your problem is. Basically, you're running a regexp match in a loop and creating your own backtracking by altering the part that was matched. As you found out, that's really not going to work. Although I'm sure there are different ways to approach this, the most natural seems to be to put all your matching into a single regexp. So when you say you want the journal to be no more than 10 words, specify that in your regexp (here, with a {1,10} quantifier on the journal words). Then the regexp engine can do the backtracking for you, and life should be good. My code below is rather severely rewritten, mostly because that's what it took for me to understand what you were doing.
#!/usr/bin/perl
#
# parse publications strings
#
use warnings;
use strict;
use Data::Dumper;

my $TITLE      = 'title';
my $YEAR       = 'year';
my $START_PAGE = 'start_page';
my $END_PAGE   = 'end_page';
my $JOURNAL    = 'journal';
my $TYPE       = 'type';
my $AUTHORS    = 'authors';
my $VOLUME     = 'volume';

sub parse_pub ($) {
    my $string = shift @_;
    local $_;
    my %ret = ();
    @ret{$AUTHORS, $TITLE, $TYPE, $JOURNAL, $VOLUME, $START_PAGE, $END_PAGE, $YEAR} =
        $string =~ m/^\d+\.\s+                 #citation number
                     ([^:]+):\s+                #authors
                     (.+?[.?!])\s+              #title (as short as possible)
                     (\(\w+.?\)\s+)?            #type (optional)
                     ((?:\w+[.?!]?\s+){1,10}?)  #journal
                     ([\w()]+):\s+              #volume
                     (\d+)-(\d+),\s+            #start page, end page
                     (\d+)\.?$                  #year
                    /x
        or return;    # bare return gives an empty list, so a caller's hash stays empty on failure
    $ret{$JOURNAL} =~ s/\s+$//;    # strip the whitespace captured after the last journal word
    return %ret;
}

my $line = "110. Wunder, E.; Burghardt, U.; Lang, B.; Hamilton, L.: Fanconi's anemia: anomaly of enzyme passage through the nuclear membrane? Anomalous intracellular distribution of topoisomerase activity in placental extracts in a case of Fanconi's anemia. Hum. Genet. 58: 149-155, 1981.";
print "$line\n";
my %pub = parse_pub($line);
#print Dumper(\%pub);
print "J:$pub{$JOURNAL}\n\n";
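
To watch the engine do the backtracking on its own, here is a minimal sketch on a made-up string. The helper name split_title_journal, the sample text, and the limit of 3 words are assumptions chosen for illustration; they are not part of the code above. The title is lazy and the journal is capped, so when the first sentence end does not leave a valid journal and volume behind it, the engine quietly extends the title and tries again.

```perl
use strict;
use warnings;

# Hypothetical helper: split "Title. Journal Vol:" into pieces with one regexp.
sub split_title_journal {
    my ($text) = @_;
    my ($title, $journal, $volume) = $text =~ m/
        ^(.+?[.?!])\s+              # title: as short as possible
        ((?:\w+[.?!]?\s+){1,3}?)    # journal: at most 3 words in this sketch
        (\d+):                      # volume, followed by a colon
    /x or return;
    $journal =~ s/\s+$//;           # drop the trailing whitespace
    return ($title, $journal, $volume);
}

my ($title, $journal, $volume) =
    split_title_journal("Is it big? Yes it is very big. Hum. Genet. 58: 1-2");
print "T:$title\nJ:$journal\nV:$volume\n";
```

The first candidate title, "Is it big?", is rejected because no run of one to three words after it is followed by a numeric volume and a colon, so the title grows to the next sentence end and the journal comes out as "Hum. Genet.".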

Re^2: Enforcing growth of regex
by Hena (Friar) on Nov 24, 2005 at 07:53 UTC
    That does it, thanks :). I had it split into two sections, since there are citations that do not follow the "normal" format given above, and the first would have parsed that input as well.

    But I guess I can try another pattern match if that pattern fails, since that really does the same thing.
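
One way the "try another pattern if the first fails" idea might look, as a rough sketch. The two patterns, their names, and the sample citation formats are invented for illustration; the real non-"normal" formats are not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical patterns, tried in order: the strict "normal" format first,
# then a looser fallback for citations without a page range.
my @PATTERNS = (
    [ normal   => qr/^(\d+)\.\s+(.+?):\s+(\d+)-(\d+),\s+(\d{4})\.?$/ ],
    [ no_pages => qr/^(\d+)\.\s+(.+?),\s+(\d{4})\.?$/ ],
);

# Return the name of the first pattern that matches plus its captures,
# or an empty list if none matches.
sub parse_with_fallback {
    my ($line) = @_;
    for my $entry (@PATTERNS) {
        my ($name, $re) = @$entry;
        my @fields = $line =~ $re or next;
        return ($name, @fields);
    }
    return;
}

my ($format, @fields) = parse_with_fallback("8. Some Book, 1990");
print "matched '$format' with fields: @fields\n";
```

Putting the strictest pattern first keeps a looser one from swallowing input that the "normal" format should have handled.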

    Btw, how does that assignment to %ret work? AFAIK the match returns the list of words (Camel book p. 151 has an example). But I do not understand how @ret turns into %ret.
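
On the @ret{...} line, a small sketch may help; the hash name %pub and the sample string here are made up. With curly braces, @ret{...} is a hash slice of %ret, not a separate array @ret. The @ sigil only says that several values are addressed at once, and the braces say they live in a hash. In list context the match returns its captures, and assigning that list to the slice stores each capture under the key in the same position.

```perl
use strict;
use warnings;

my %pub;

# Hash slice: @pub{LIST} addresses several elements of %pub at once.
# The match in list context returns ($1, $2, $3), assigned key by key.
@pub{'journal', 'volume', 'year'} =
    "Hum. Genet. 58: 1981" =~ /^(.+?)\s+(\d+):\s+(\d{4})$/;

# Equivalent to three separate scalar assignments:
#   $pub{journal} = $1;  $pub{volume} = $2;  $pub{year} = $3;
print "$_ => $pub{$_}\n" for sort keys %pub;
```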
