I am trying to parse inter-language links in Wikipedia. I need to know which languages a given page links to.
Let's say that $string is the page's text:
$string = "bla bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla bla";
And I need the list:
qw(en de ga);
I can do this:
@matches = ( $string =~ m/\[\[(en|de|ga):.+?\]\]/g );
Then I get qw(en de ga) in @matches, but only because the regex has a single pair of capturing brackets, which is a limitation.
If I add a second pair, for example:
@matches = ( $string =~ m/\[\[(en|de|ga):(.+?)\]\]/g );
Then I'll get qw(en English de German ga Irish).
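To make the two cases easy to compare side by side, here is a minimal self-contained sketch. The variable names @one and @two are only for illustration; the regexes are the same ones as above.

```perl
use strict;
use warnings;

my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';

# One capturing group: list-context /g returns one element per match.
my @one = $string =~ m/\[\[(en|de|ga):.+?\]\]/g;

# Two capturing groups: every group of every match comes back,
# flattened into a single list.
my @two = $string =~ m/\[\[(en|de|ga):(.+?)\]\]/g;

print "one group:  @one\n";
print "two groups: @two\n";
```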
Is there a clever way to get a list of all the results from one pair of capturing brackets?
I tried using Perl 5.10's named captures and %-. Either it can't be done this way or I am doing it incorrectly. I tried this:
use 5.010;          # enables say; named captures need Perl 5.10 or later
use Data::Dumper;

my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';
if (my @matches = ($string =~ m/\[\[(?<lang>en|de|ga):(.+?)\]\]/g)) {
    say 'matches';
    say 'matches: ', Dumper(\@matches);
    say 'minus : ', Dumper(\%-);
    say 'plus : ', Dumper(\%+);
}
I get this output:
matches
matches: $VAR1 = [
'en',
'English',
'de',
'German',
'ga',
'Irish'
];
minus : $VAR1 = {
'lang' => [
'ga'
]
};
plus : $VAR1 = {
'lang' => 'ga'
};
As you can see, %- holds only ('ga'), the value from the last match. Is there some way to get:
$VAR1 = {
'lang' => [
'en',
'de',
'ga'
]
};
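For comparison, that structure can apparently be built by hand by matching in scalar context inside a loop and collecting $+{lang} on each iteration. This is only a minimal sketch, assuming Perl 5.10 or later; %captures is a hypothetical name of my own, not something Perl provides.

```perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;

my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';

# %captures is a made-up name for the hand-built equivalent of %-.
my %captures;

# In scalar context, m//g finds one match per iteration and resumes
# where the previous match ended, so %+ is refreshed every time round.
while ($string =~ m/\[\[(?<lang>en|de|ga):(.+?)\]\]/g) {
    push @{ $captures{lang} }, $+{lang};
}

print Dumper(\%captures);
```

It works, but it is a manual loop rather than a single list-context match, which is what I was hoping for.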
Thanks in advance for any help.