PerlMonks  

find one by one occurrences

by Selvakumar (Scribe)
on Jul 21, 2010 at 08:41 UTC (#850574=perlquestion)
Selvakumar has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,
I want to find the years in a line. For each year, starting with the first, I need to collect the two words before it and the two words after it. How can I do that?
For example, input:
Hobbs, F. 2005. Examining American Composition: 1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports, CENSR-24. I.S. Government Printing Office, Washington, DC.
Required output:

Hobbs, F. 2005. Examining American American Composition: 1990 and 2000. 1990 and 2000. U.S. Census

Re: find one by one occurrences
by Corion (Pope) on Jul 21, 2010 at 08:45 UTC

    What have you tried?

    What is your code?

    Where do you have problems?

    Please show a small, self-contained script (maybe 20 lines, no longer than 50 lines) that shows input data, actual output, desired output. Please describe where you encounter problems.

    Maybe you want to see perlretut or perlre.

      Hi Corion,
      I have already given the required input and output. The code below shows what I am trying to do. I need to identify years in a line, and they may appear as 2005, 2005a, a2005, May 31, 2005, or something similar. As a first step, I am trying to find each occurrence.
      I need the following:
      1. Find the words that contain a four-digit number. This is my first requirement.
      2. Then collect all the years and find the two words before and after each one, because full dates may appear.

      $var='Hobbs, F. 2005a. Examining American Household Composition: b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports, CENSR-24. I.S. Government Printing Office, Washington, DC.';
      #$var=~s/(\w?) (\w?) ([0-9]{4}+[a-zA-Z]?) (\w?) (\w?)/&identify_year($1.$2.$3.$4.$5)/ge;
      $var=~s/([0-9]{4})/&identify_year($1)/ge;
      sub identify_year
      {
          my ($input)=@_;
          print "$input\n";
          return ($input);
      }

        Why do you try s/.../identify_year($1)/ge? What is that supposed to do? I thought your objective was to identify a year and the surrounding words?

        If you want to know whether there are one or more occurrences of a regular expression, you can use the following idiom:

        my $var = "This is the year 2000.";
        my @matches = ($var =~ /([0-9]{4})/g);

        The regular expression I gave will only find four digits. You will need to modify that regular expression to also recognize two words before that year and two words after that year.
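As a minimal sketch of the idiom above applied to the sample line from the question: in list context, a `/g` match returns every captured group, so the array holds one entry per four-digit run. The variable name `@years` is an illustrative assumption, and the pattern is deliberately left as given, so it does not yet collect the surrounding words.

```perl
use strict;
use warnings;

my $var = 'Hobbs, F. 2005. Examining American Composition: 1990 and 2000. '
        . 'U.S. Census Bureau, Census 2000 Special Reports, CENSR-24. '
        . 'I.S. Government Printing Office, Washington, DC.';

# List context + /g: one captured value per match, in order of appearance.
my @years = ( $var =~ /([0-9]{4})/g );

# scalar(@years) is the number of occurrences found.
print scalar(@years), " matches: @years\n";   # prints: 4 matches: 2005 1990 2000 2000
```

Note that "Census 2000" is matched as well; the pattern cannot tell a year from any other four-digit number.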

Re: find one by one occurrences
by linuxer (Deacon) on Jul 21, 2010 at 09:02 UTC

    You could split your string into words and examine each word.

    If the word contains four connected digits, print the previous two words, the examined word itself, and the following two words.
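The split-and-examine approach described above could be sketched roughly as follows. The helper name `years_with_context` and its `$width` parameter are hypothetical, and "word" is assumed to mean any run of non-whitespace characters.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one string per word that contains four
# consecutive digits, made of that word plus up to $width words on each side.
sub years_with_context {
    my ($line, $width) = @_;
    $width = 2 unless defined $width;

    my @words = split ' ', $line;    # split on runs of whitespace
    my @found;
    for my $i (0 .. $#words) {
        next unless $words[$i] =~ /[0-9]{4}/;

        # Clamp the window so it never runs past either end of the line.
        my $from = $i - $width < 0       ? 0       : $i - $width;
        my $to   = $i + $width > $#words ? $#words : $i + $width;
        push @found, join ' ', @words[$from .. $to];
    }
    return @found;
}

my $input = 'Hobbs, F. 2005. Examining American Composition: 1990 and 2000. '
          . 'U.S. Census Bureau, Census 2000 Special Reports, CENSR-24. '
          . 'I.S. Government Printing Office, Washington, DC.';

print "$_\n" for years_with_context($input);
```

Because each window is taken from the word list rather than consumed from the string, neighbouring years such as "1990 and 2000." each get their full context. On the sample line this also reports "Census 2000", which the requested output in the question does not include.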

Re: find one by one occurrences
by Ratazong (Prior) on Jul 21, 2010 at 09:09 UTC

    Apart from Corion's request, please ask yourself the following questions:

    • How do I identify a year? (4 digits might do the trick)
    • How do I identify a word? (any characters surrounded by blanks is probably a good start)

    Now you may try to find the sequence "word word year word word" in your text. This can be done with a regex (Corion has already pointed to the documentation). Regexes can also loop through a string to find all occurrences; check the documentation.

    Once the basics work, you can fine-tune as you want (words separated by colons, years like 100BC ...)

    HTH Rata
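One possible way to loop over all occurrences with a regex is sketched below. A single "word word year word word" pattern used with `/g` consumes the words it matches, so two years that sit close together (as in "1990 and 2000.") would lose part of their context. This sketch therefore matches only the year word in the loop and reads the context from the text before and after it, using the built-in `@-` and `@+` offset arrays. The year pattern (any word containing four consecutive digits, so that 2005a and a2005 qualify) and the variable names are assumptions.

```perl
use strict;
use warnings;

my $line = 'Hobbs, F. 2005. Examining American Composition: 1990 and 2000. '
         . 'U.S. Census Bureau, Census 2000 Special Reports, CENSR-24. '
         . 'I.S. Government Printing Office, Washington, DC.';

my @contexts;
# Scalar-context /g: each iteration resumes where the previous match ended.
while ( $line =~ /(\S*[0-9]{4}\S*)/g ) {
    my $year   = $1;
    my $before = substr $line, 0, $-[1];   # text left of the year word
    my $after  = substr $line, $+[1];      # text right of the year word

    # Up to two words immediately before and immediately after the year.
    my ($pre)  = $before =~ /((?:\S+\s+){0,2})\z/;
    my ($post) = $after  =~ /\A((?:\s+\S+){0,2})/;

    push @contexts, "$pre$year$post";
}

print "$_\n" for @contexts;
```

The inner matches run on separate copies of the text, so they do not disturb the search position stored on `$line`. Stricter rules for what counts as a year (a plausible range, or month names nearby) would be a further refinement.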
