What have you tried?
What is your code?
Where do you have problems?
Please show a small, self-contained script (maybe 20 lines, no longer than 50 lines) that shows input data, actual output, desired output. Please describe where you encounter problems.
You may also want to read perlretut or perlre.
Hi Corion,
I have already given the required input and output. See the code below for what I am trying to do. I need to identify years in a line, and they may appear as 2005, 2005a, a2005, May 31, 2005, or something similar. First I am trying to find the occurrences.
I need the following output:
1. I need to find the words that contain a four-digit number. This is my first requirement.
2. Then I should collect all the years and find the two words before and after each one, because full dates may appear.
$var='Hobbs, F. 2005a. Examining American Household Composition: b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports, CENSR-24. I.S. Government Printing Office, Washington, DC.';
#$var=~s/(\w?) (\w?) ([0-9]{4}+[a-zA-Z]?) (\w?) (\w?)/&identify_year($1.$2.$3.$4.$5)/ge;
$var=~s/([0-9]{4})/&identify_year($1)/ge;
sub identify_year
{
    my ($input)=@_;
    print "$input\n";
    return ($input);
}
Why do you try s/.../identify_year($1)/ge? What is that supposed to do? I thought your objective was to identify a year and the surrounding words?
If you want to know whether there is one or more occurrence of a regular expression, you can use the following idiom:
my $var = "This is the year 2000.";
my @matches = ($var =~ /([0-9]{4})/g);
The regular expression I gave will only find four digits. You will need to modify that regular expression to also recognize two words before that year and two words after that year.
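As a rough illustration of the list-context idiom above, here is a minimal sketch that also accepts one optional letter before or after the four digits, so that forms like 2005a and b1990 from the question are caught. The pattern and the name @years are my own assumptions, not something the original poster specified.

```perl
use strict;
use warnings;

my $var = 'Hobbs, F. 2005a. Examining American Household Composition: '
        . 'b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports, '
        . 'CENSR-24. I.S. Government Printing Office, Washington, DC.';

# Assumed pattern: an optional letter, exactly four digits, an optional letter.
# The \b anchors keep longer digit runs (e.g. 12345) from matching.
my @years = ($var =~ /\b([a-zA-Z]?[0-9]{4}[a-zA-Z]?)\b/g);

print "$_\n" for @years;    # prints 2005a, b1990, 2000, 2000 (one per line)
```

In list context with /g, the match returns every captured group, so no explicit loop is needed just to collect the years.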
Apart from Corion's request, please ask yourself the following questions:
- How do I identify a year? (4 digits might do the trick)
- How do I identify a word? (any characters surrounded by blanks is probably a good start)
Now you can try to find the sequence "word word year word word" in your text. This can be done with a regex (Corion has already pointed to the documentation).
Regexes can also loop through a string to find all occurrences; check the documentation for the /g modifier.
Once the basics work, you can fine-tune as you want (words separated by colons, years like 100BC ...)
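One possible way to loop over all occurrences is a while loop with /g in scalar context. The sketch below matches only the year and then takes the neighbouring words from the text on either side, which avoids losing a year that sits within two words of another one (as with "b1990 and 2000"). Treating a word as a run of \w characters is an assumption on my part, and @rows is a hypothetical name.

```perl
use strict;
use warnings;

my $var = 'Hobbs, F. 2005a. Examining American Household Composition: '
        . 'b1990 and 2000. U.S. Census Bureau';

my @rows;
while ($var =~ /\b([a-zA-Z]?[0-9]{4}[a-zA-Z]?)\b/g) {
    my $year   = $1;
    my $before = substr($var, 0, $-[0]);   # text left of the match
    my $after  = substr($var, $+[0]);      # text right of the match

    # Assumption: a "word" is a run of \w characters.
    my @prev = ($before =~ /(\w+)/g);
    my @next = ($after  =~ /(\w+)/g);
    @prev = @prev[-2, -1] if @prev > 2;    # keep the last two words
    @next = @next[0, 1]   if @next > 2;    # keep the first two words

    push @rows, join(' | ', "@prev", $year, "@next");
}

print "$_\n" for @rows;
# Hobbs F | 2005a | Examining American
# Household Composition | b1990 | and 2000
# b1990 and | 2000 | U S
```

The last row shows the limit of the \w+ assumption: "U.S." is split into two words, which is the kind of thing to fine-tune once the basics work.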
HTH Rata
You could split your string into words and examine each word.
If the word contains four connected digits, print the previous two words, the examined word itself, and the following two words.
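The split approach could look roughly like this. It is only a sketch: the sample sentence is made up, @contexts is a hypothetical name, and words are simply whatever split ' ' produces, so punctuation stays attached to them.

```perl
use strict;
use warnings;

# Made-up sample line containing a full date.
my $var = 'Published May 31, 2005 by the bureau';

my @words = split ' ', $var;    # split on runs of whitespace
my @contexts;

for my $i (0 .. $#words) {
    # "Four connected digits": note this also accepts longer digit runs.
    next unless $words[$i] =~ /[0-9]{4}/;

    # Clamp the window so it never runs past either end of the list.
    my $from = $i - 2 < 0       ? 0       : $i - 2;
    my $to   = $i + 2 > $#words ? $#words : $i + 2;

    push @contexts, join(' ', @words[$from .. $to]);
}

print "$_\n" for @contexts;    # prints: May 31, 2005 by the
```

Because the window is taken by index, a year near the start or end of the line just gets fewer neighbours instead of causing an error.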