in reply to Extracting web data

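Your use of '^' is causing problems. '^' anchors the match to the beginning of the string (or of a line, under /m) and '$' anchors it to the end ("the buck stops here"). With '^' in place, the pattern can only match when "PMID: " followed by digits is the very first thing in the string.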

If the PMID is always numeric, simply use:

if($foo->identifiers() =~ /PMID: (\d+)/) {print FH "$1\n";}
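
To illustrate the difference the anchor makes, here is a minimal sketch. The sample string is made up for the example; the real value returned by $foo->identifiers() may look different.

```perl
use strict;
use warnings;

# Hypothetical sample; the real string returned by identifiers() may differ.
my $sample = "doi:10.1000/example PMID: 12345";

# Anchored: only matches when the string starts with "PMID: " plus digits.
my ($anchored) = $sample =~ /^PMID: (\d+)/;

# Unanchored: matches "PMID: " followed by digits anywhere in the string.
my ($unanchored) = $sample =~ /PMID: (\d+)/;

print defined $anchored ? "anchored: $anchored\n" : "anchored: no match\n";
print "unanchored: $unanchored\n";
```

In list context a failed match returns an empty list, so $anchored stays undefined here while $unanchored receives the captured digits.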

Update: reworded post.

Re^2: Extracting web data
by smandape1 (Acolyte) on Jun 13, 2011 at 14:37 UTC

    Thank you for your reply. I tried the code above, but it doesn't seem to work. I am still getting the same output, shown below.

    http://www.ncbi.nlm.nih.gov/pubmed/4012367 PMID: PMID: 4012367 http://www.ncbi.nlm.nih.gov/pubmed/20215333 PMID: doi:10.1093/fampra/cmq003PMID: 20215333 http://www.ncbi.nlm.nih.gov/pubmed/20429974 PMID: PMID: 20429974 http://www.ncbi.nlm.nih.gov/pubmed/20338007 PMID: doi:10.1111/j.1600-0838.2009.01081.xPMID: 20338007 http://www.ncbi.nlm.nih.gov/pubmed/17438827 PMID: PMID: 17438827 http://www.ncbi.nlm.nih.gov/pubmed/17447555 PMID: PMID: 17447555

    Also, the PMID is always a number.

      Did you uncomment the if statement and comment out the other print? The code I posted cannot print non-numeric data. If the regular expression I provided does not match anything, then please post exactly what $foo->identifiers() prints.

        Yes, I uncommented the if statement and commented out the other print, but I didn't get any output. $foo->identifiers() prints the following:

        PMID: PMID: 4012367

        It prints this when the PMID is the only identifier. When there are two identifiers, it prints both the DOI and the PMID, as follows:

        PMID: doi:10.1093/fampra/cmq003PMID: 20215333
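
A quick way to check the pattern against these two strings in isolation is sketched below. It assumes the strings above are exactly what identifiers() returns as a plain scalar; the helper name extract_pmid is made up for the example.

```perl
use strict;
use warnings;

# The two strings reported above, copied verbatim from the thread.
my @reported = (
    'PMID: PMID: 4012367',
    'PMID: doi:10.1093/fampra/cmq003PMID: 20215333',
);

# Hypothetical helper, for illustration only: returns the digits that
# follow "PMID: ", or undef when the pattern does not match.
sub extract_pmid {
    my ($text) = @_;
    return $text =~ /PMID: (\d+)/ ? $1 : undef;
}

for my $text (@reported) {
    my $pmid = extract_pmid($text);
    print defined $pmid ? "$pmid\n" : "no match\n";
}
```

If this prints both numbers while the real script shows nothing, the cause may lie outside the regular expression. One possibility is that the output is going to the FH filehandle rather than to the screen; another is that identifiers() returns something other than a plain string.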