Re^2: Perl Module for identifying country name

by maheshkumar (Sexton)
on Aug 03, 2012 at 15:07 UTC

in reply to Re: Perl Module for identifying country name
in thread Perl Module for identifying country name

Actually, what I want is just to find which country names appear in a text file.

With grep, I think I would need to specify each name, such as United States or Germany, right? That way I could miss a country name like Canada if it is in the file.

Re^3: Perl Module for identifying country name
by CountZero (Bishop) on Aug 03, 2012 at 15:54 UTC
    You can use a regular expression to find all (English) country names.
    (?-xism:(?:S(?:a(?:int (?:(?:Vincent and the Grenadine|Kitts and Nevi)s|Lucia)|o Tome and Principe|(?:udi Arabi|mo)a|n Marino)|o(?:uth (?:(?:Afric|Kore)a|Sudan)|lomon Islands|malia)|(?:(?:lov(?:ak|en)|yr)i|ri Lank)a|w(?:(?:itzer|azi)land|eden)|e(?:ychelles|negal|rbia)|i(?:erra Leon|ngapor)e|u(?:riname|dan)|pain)|B(?:o(?:(?:snia and Herzegovi|tswa)n|livi)a|a(?:h(?:amas|rain)|ngladesh|rbados)|u(?:r(?:kina Faso|undi|ma)|lgaria)|e(?:l(?:arus|gium|ize)|nin)|r(?:azil|unei)|hutan)|M(?:a(?:l(?:a(?:ysia|wi)|dives|ta|i)|urit(?:ania|ius)|c(?:edonia|au)|rshall Islands|dagascar)|o(?:n(?:(?:tenegr|ac)o|golia)|zambique|ldova|rocco)|icronesia|exico)|C(?:o(?:(?:sta Ric|lombi)a|te d'Ivoire|moros)|a(?:m(?:bodia|eroon)|pe Verde|nada)|(?:entral African|zech) Republic|h(?:i(?:le|na)|ad)|(?:roati|ub)a|yprus)|T(?:u(?:rk(?:menistan|ey)|nisia|valu)|a(?:(?:jikist|iw)an|nzania)|rinidad and Tobago|o(?:nga|go)|imor-Leste|hailand)|A(?:(?:n(?:tigua and Barbud|dorr|gol)|(?:l(?:ban|ger)|ustr(?:al)?)i|r(?:gentin|meni))a|(?:fghanist|zerbaij)an)|P(?:a(?:l(?:estinian Territories|au)|(?:pua New Guine|nam)a|kistan|raguay)|o(?:rtugal|land)|hilippines|eru)|N(?:e(?:therland(?:s Antille)?s|w Zealand|pal)|i(?:ger(?:ia)?|caragua)|or(?:th Korea|way)|a(?:mibia|uru))|G(?:u(?:inea(?:-Bissau)?|(?:atemal|yan)a)|e(?:orgia|rmany)|re(?:nada|ece)|a(?:mbia|bon)|hana)|E(?:(?:(?:quatorial Guin|ritr)e|(?:thiop|ston)i)a|(?:(?:l Salv|cu)ad|ast Tim)or|gypt)|L(?:i(?:(?:b(?:eri|y)|thuani)a|echtenstein)|e(?:banon|sotho)|a(?:tvia|os)|uxembourg)|U(?:nited (?:States of America|Arab Emirates|Kingdom)|zbekistan|kraine|ruguay|ganda)|D(?:e(?:mocratic Republic of the Congo|nmark)|ominica(?:n Republic)?|jibouti)|I(?:r(?:a[nq]|eland)|nd(?:ones)?ia|celand|srael|taly)|K(?:(?:azakh|yrgyz)stan|iribati|osovo|uwait|enya)|R(?:(?:(?:oman|uss)i|wand)a|epublic of the Congo)|H(?:o(?:n(?:g Kong|duras)|ly See)|ungary|aiti)|V(?:enezuela|anuatu|ietnam)|J(?:a(?:maica|pan)|ordan)|F(?:i(?:nland|ji)|rance)|Z(?:imbabwe|ambia)|(?:Yeme|Oma)n|Qatar))
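    A minimal sketch of how a pattern of this kind can be applied to a text file to list the country names it contains. It does not reproduce the regex above: it assumes you have a plain list of names (the short @countries list is a hypothetical placeholder) and builds a simpler alternation from it with core Perl only. Unlike the posted regex, it wraps the alternation in \b so a name does not match inside a longer word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical short list for illustration only; replace it with a full
# list of the country names you want to detect.
my @countries = (
    'United States of America', 'United Kingdom', 'Germany',
    'Canada', 'Niger', 'Nigeria', 'Guinea', 'Guinea-Bissau',
);

# Build one alternation from the list. Longer names go first so that
# "Nigeria" is tried before "Niger" and "Guinea-Bissau" before "Guinea";
# quotemeta escapes characters such as "-" and spaces in the names.
sub build_country_regex {
    my @names = sort { length($b) <=> length($a) } @_;
    my $alternation = join '|', map { quotemeta } @names;
    # \b keeps a name from matching inside a longer word.
    return qr/\b(?:$alternation)\b/;
}

# Count every occurrence of every name found in a string.
sub find_countries {
    my ($regex, $text) = @_;
    my %seen;
    $seen{$1}++ while $text =~ /($regex)/g;
    return \%seen;
}

# Usage: perl find_countries.pl file1.txt [file2.txt ...]
# (Does nothing when no files are given on the command line.)
if (@ARGV) {
    my $regex = build_country_regex(@countries);
    my %total;
    while (my $line = <>) {
        my $found = find_countries($regex, $line);
        $total{$_} += $found->{$_} for keys %$found;
    }
    printf "%-30s %d\n", $_, $total{$_} for sort keys %total;
}
```

    Because the whole list is one regex, the file is scanned once and every name on the list is reported, so there is no need to grep for each country separately.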

    BTW, you will not find "United States" with the regex I posted, since it only contains the official name "United States of America".
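    If your texts use short or informal names, one option is an alias table that maps each variant to a canonical name. This is a hedged sketch, not part of the regex above; the entries in %canonical (including "USA") are hypothetical examples to extend as needed.

```perl
use strict;
use warnings;

# Hypothetical alias table: each alternative spelling maps to the
# canonical name. Extend it with whatever variants your texts use.
my %canonical = (
    'United States of America' => 'United States of America',
    'United States'            => 'United States of America',
    'USA'                      => 'United States of America',
    'Germany'                  => 'Germany',
    'Canada'                   => 'Canada',
);

# Longest alias first, so "United States of America" wins over
# "United States" when both could match at the same position.
my $alternation = join '|',
    map { quotemeta } sort { length($b) <=> length($a) } keys %canonical;
my $country_re = qr/\b($alternation)\b/;

# Return the distinct canonical names found in a string, sorted.
sub countries_in {
    my ($text) = @_;
    my %found;
    $found{ $canonical{$1} } = 1 while $text =~ /$country_re/g;
    return sort keys %found;
}
```

    Here the ordering matters even with \b: "United States" is followed by a word boundary inside "United States of America", so if it were tried first it would match and leave " of America" unmatched.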


