http://www.perlmonks.org?node_id=985257


in reply to Perl Module for identifying country name

It's a little difficult to comprehend what you're asking for, but my guess is that you could achieve your goal by using grep on the file for the country name that you're looking for. Does that help?
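If grep is enough, a minimal sketch of that idea in Perl might look like the following. The helper name `lines_mentioning` and the sample text are made up for illustration; in practice you would read the lines from your own file.

```perl
use strict;
use warnings;

# Return the lines of $text that mention $country
# (case-insensitive, matched as a whole word or phrase).
sub lines_mentioning {
    my ($country, $text) = @_;
    return grep { /\b\Q$country\E\b/i } split /\n/, $text;
}

# Hypothetical sample input standing in for the contents of the file.
my $sample = "Shipped from Germany on Monday.\nNo country here.\nArrived in Canada.\n";

print "$_\n" for lines_mentioning('Germany', $sample);
```

This only finds the one name you pass in, so it fits the case where you already know which country you are looking for.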

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re^2: Perl Module for identifying country name
by maheshkumar (Sexton) on Aug 03, 2012 at 15:07 UTC

    Actually, what I want is just to find which country names appear in a text file.

    With grep, I think I would need to specify the name to search for, such as United States or Germany, right? That way I could miss the country name Canada if it is in the file.

      You can use a regular expression to find all (English) country names.
      (?-xism:(?:S(?:a(?:int (?:(?:Vincent and the Grenadine|Kitts and Nevi)s|Lucia)|o Tome and Principe|(?:udi Arabi|mo)a|n Marino)|o(?:uth (?:(?:Afric|Kore)a|Sudan)|lomon Islands|malia)|(?:(?:lov(?:ak|en)|yr)i|ri Lank)a|w(?:(?:itzer|azi)land|eden)|e(?:ychelles|negal|rbia)|i(?:erra Leon|ngapor)e|u(?:riname|dan)|pain)|B(?:o(?:(?:snia and Herzegovi|tswa)n|livi)a|a(?:h(?:amas|rain)|ngladesh|rbados)|u(?:r(?:kina Faso|undi|ma)|lgaria)|e(?:l(?:arus|gium|ize)|nin)|r(?:azil|unei)|hutan)|M(?:a(?:l(?:a(?:ysia|wi)|dives|ta|i)|urit(?:ania|ius)|c(?:edonia|au)|rshall Islands|dagascar)|o(?:n(?:(?:tenegr|ac)o|golia)|zambique|ldova|rocco)|icronesia|exico)|C(?:o(?:(?:sta Ric|lombi)a|te d'Ivoire|moros)|a(?:m(?:bodia|eroon)|pe Verde|nada)|(?:entral African|zech) Republic|h(?:i(?:le|na)|ad)|(?:roati|ub)a|yprus)|T(?:u(?:rk(?:menistan|ey)|nisia|valu)|a(?:(?:jikist|iw)an|nzania)|rinidad and Tobago|o(?:nga|go)|imor-Leste|hailand)|A(?:(?:n(?:tigua and Barbud|dorr|gol)|(?:l(?:ban|ger)|ustr(?:al)?)i|r(?:gentin|meni))a|(?:fghanist|zerbaij)an)|P(?:a(?:l(?:estinian Territories|au)|(?:pua New Guine|nam)a|kistan|raguay)|o(?:rtugal|land)|hilippines|eru)|N(?:e(?:therland(?:s Antille)?s|w Zealand|pal)|i(?:ger(?:ia)?|caragua)|or(?:th Korea|way)|a(?:mibia|uru))|G(?:u(?:inea(?:-Bissau)?|(?:atemal|yan)a)|e(?:orgia|rmany)|re(?:nada|ece)|a(?:mbia|bon)|hana)|E(?:(?:(?:quatorial Guin|ritr)e|(?:thiop|ston)i)a|(?:(?:l Salv|cu)ad|ast Tim)or|gypt)|L(?:i(?:(?:b(?:eri|y)|thuani)a|echtenstein)|e(?:banon|sotho)|a(?:tvia|os)|uxembourg)|U(?:nited (?:States of America|Arab Emirates|Kingdom)|zbekistan|kraine|ruguay|ganda)|D(?:e(?:mocratic Republic of the Congo|nmark)|ominica(?:n Republic)?|jibouti)|I(?:r(?:a[nq]|eland)|nd(?:ones)?ia|celand|srael|taly)|K(?:(?:azakh|yrgyz)stan|iribati|osovo|uwait|enya)|R(?:(?:(?:oman|uss)i|wand)a|epublic of the Congo)|H(?:o(?:n(?:g Kong|duras)|ly See)|ungary|aiti)|V(?:enezuela|anuatu|ietnam)|J(?:a(?:maica|pan)|ordan)|F(?:i(?:nland|ji)|rance)|Z(?:imbabwe|ambia)|(?:Yeme|Oma)n|Qatar))

      BTW, you will not find "United States" with this regex since the official name is "United States of America".
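One way around the "United States" gap is to build the pattern from your own list of names and aliases. The sketch below is only an illustration, not how the regex above was generated: the `%canonical` table is a deliberately short, made-up sample, and `countries_in` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical, deliberately short table: keys are the spellings to look
# for, values are the canonical name each spelling should be reported as.
my %canonical = (
    'United States of America' => 'United States of America',
    'United States'            => 'United States of America',
    'Germany'                  => 'Germany',
    'Canada'                   => 'Canada',
    'Niger'                    => 'Niger',
    'Nigeria'                  => 'Nigeria',
);

# Longest spellings first, so the full official name is tried before
# its shorter alias.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } keys %canonical;
my $country_re = qr/\b($alternation)\b/;

# Return the distinct canonical country names found in $text, sorted.
sub countries_in {
    my ($text) = @_;
    my %seen;
    $seen{ $canonical{$1} } = 1 while $text =~ /$country_re/g;
    return sort keys %seen;
}

print join(', ', countries_in('Flights from Canada to the United States and Nigeria.')), "\n";
```

The word boundaries keep "Niger" from matching inside "Nigeria", and the alias entry lets plain "United States" be reported under the official name.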

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics