PerlMonks  

find one by one occurances

by Selvakumar (Scribe)
on Jul 21, 2010 at 08:41 UTC ( [id://850574]=perlquestion )

Selvakumar has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Replies are listed 'Best First'.
Re: find one by one occurances
by Corion (Patriarch) on Jul 21, 2010 at 08:45 UTC

    What have you tried?

    What is your code?

    Where do you have problems?

    Please show a small, self-contained script (maybe 20 lines, no longer than 50 lines) that shows input data, actual output, desired output. Please describe where you encounter problems.

    Maybe you want to see perlretut or perlre.

      Hi Corion,
      I have already given the required input and output. The code below shows what I am trying to do. I need to identify years in a line; they may appear as 2005, 2005a, a2005, May 31, 2005, and so on. As a first step I am trying to find each occurrence.
      I need the following output:
      1. First, I need to find the words that contain a four-digit number.
      2. Then I need to collect all the years and look at the two words before and after each one, because a year may be part of a full date.

      $var='Hobbs, F. 2005a. Examining American Household Composition: b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports, CENSR-24. I.S. Government Printing Office, Washington, DC.';
      #$var=~s/(\w?) (\w?) ([0-9]{4}+[a-zA-Z]?) (\w?) (\w?)/&identify_year($1.$2.$3.$4.$5)/ge;
      $var=~s/([0-9]{4})/&identify_year($1)/ge;
      sub identify_year {
          my ($input)=@_;
          print "$input\n";
          return ($input);
      }

        Why do you try s/.../identify_year($1)/ge? What is that supposed to do? I thought your objective was to identify a year and the surrounding words?

        If you want to know whether a regular expression matches once or several times, you can use the following idiom:

        my $var = "This is the year 2000.";
        my @matches = ($var =~ /([0-9]{4})/g);

        The regular expression I gave will only find four digits. You will need to modify that regular expression to also recognize two words before that year and two words after that year.
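        As a first step toward that, here is a minimal sketch that widens the four-digit pattern to the letter variants from the question (2005a, a2005) and walks through the matches one at a time. The names `$year_re` and `find_years` are hypothetical, and the pattern assumes at most one letter directly before or after the digits. It does not yet collect the surrounding words.

```perl
use strict;
use warnings;

# Assumed pattern (not from the thread): exactly four digits with an optional
# single letter before or after. The \b anchors skip longer runs such as 12345.
my $year_re = qr/\b[A-Za-z]?[0-9]{4}[A-Za-z]?\b/;

# Hypothetical helper: returns one hash per match with the matched token
# and the offset in the string where it starts.
sub find_years {
    my ($text) = @_;
    my @hits;
    # In scalar context, /g resumes after the previous match on every pass,
    # so the loop visits the occurrences one by one.
    while ($text =~ /($year_re)/g) {
        push @hits, { token => $1, offset => $-[1] };
    }
    return @hits;
}

my $var = 'Hobbs, F. 2005a. Examining American Household Composition: '
        . 'b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports, '
        . 'CENSR-24.';

printf "%-6s at offset %d\n", $_->{token}, $_->{offset} for find_years($var);
```

        The offsets come from the built-in `@-` array, which could later serve as a starting point for looking at the text on either side of each match.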

Re: find one by one occurances
by Ratazong (Monsignor) on Jul 21, 2010 at 09:09 UTC

    Apart from Corion's request, please ask yourself the following questions:

    • How do I identify a year? (4 digits might do the trick)
    • How do I identify a word? (any characters surrounded by blanks is probably a good start)

    Now you can try to find the sequence "word word year word word" in your text. This can be done with a regex (Corion has already pointed to the documentation). A regex can also loop through a string to find all occurrences; check the documentation.

    Once the basics work, you can fine-tune as you want (words separated by colons, years like 100BC, ...)
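    The "word word year word word" idea could be sketched roughly as follows. Everything here is an assumption made for illustration: a word is any run of non-blank characters, a year is four digits with an optional letter and optional punctuation on either side, and the names `$word`, `$year`, `$re` and `years_in_context` are hypothetical. The trailing words sit in a lookahead so that they remain available to the next match.

```perl
use strict;
use warnings;

my $word = qr/\S+/;   # assumed: a word is a run of non-blank characters
# Assumed: a year token is four digits, an optional letter on either side,
# and optional punctuation around it, e.g. "2005a." or "(2005)".
my $year = qr/[[:punct:]]*[A-Za-z]?[0-9]{4}[A-Za-z]?[[:punct:]]*/;

my $re = qr/
    (?<!\S)                    # start only at the beginning of a word
    ((?:$word\s+){0,2}?)       # up to two leading words (lazy, so that a
                               # year is never swallowed as a leading word)
    ($year)                    # the year token itself
    (?!\S)                     # the token must end at a blank or at the end
    (?=((?:\s+$word){0,2}))    # up to two trailing words, not consumed
/x;

# Hypothetical helper: returns [before, year, after] triples.
sub years_in_context {
    my ($text) = @_;
    my @hits;
    while ($text =~ /$re/g) {
        my ($before, $token, $after) = ($1, $2, $3);
        s/^\s+|\s+$//g for $before, $after;   # trim surrounding blanks
        push @hits, [$before, $token, $after];
    }
    return @hits;
}

my $var = 'Hobbs, F. 2005a. Examining American Household Composition: '
        . 'b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports.';
print join(' | ', @$_), "\n" for years_in_context($var);
```

    One limitation of this sketch: the leading words are consumed by each match, so a year that closely follows another year reports fewer leading words (in "b1990 and 2000." the second year only gets "and"). The year token also keeps its punctuation.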

    HTH Rata
Re: find one by one occurances
by linuxer (Curate) on Jul 21, 2010 at 09:02 UTC

    You could split your string into words and examine each word.

    If the word contains four connected digits, print the previous two words, the examined word itself, and the following two words.
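    A minimal sketch of this split-and-examine approach, under the assumption that "four connected digits" means exactly four, so that a run like 12345 is skipped. The function name `years_by_splitting` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one string per year-like word, made of up to
# two words before it, the word itself, and up to two words after it.
sub years_by_splitting {
    my ($text) = @_;
    my @words = split ' ', $text;   # split on runs of whitespace
    my @hits;
    for my $i (0 .. $#words) {
        # Assumption: exactly four digits, not part of a longer digit run.
        next unless $words[$i] =~ /(?<![0-9])[0-9]{4}(?![0-9])/;
        # Clamp the window to the bounds of the word list.
        my $from = $i - 2 < 0       ? 0       : $i - 2;
        my $to   = $i + 2 > $#words ? $#words : $i + 2;
        push @hits, join ' ', @words[$from .. $to];
    }
    return @hits;
}

my $var = 'Hobbs, F. 2005a. Examining American Household Composition: '
        . 'b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports.';
print "$_\n" for years_by_splitting($var);
```

    Because every word is examined independently, neighbouring years do not hide each other's context: in "b1990 and 2000." both years report the full window.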
