PerlMonks  

Dealing with Names

by walkingthecow (Friar)
on Aug 28, 2008 at 18:31 UTC (#707552=perlquestion)

walkingthecow has asked for the wisdom of the Perl Monks concerning the following question:

I have an array that holds people's names. The names can look like this:
Daniel R Von Vanderschmidt
Daniel Von Vanderschmidt
Daniel De La Silvia
Daniel De Silvia
Daniel La Silvia
and so on...
Basically, when my script comes across "Von" followed by a surname, I want to join those two elements into one; the same goes for "De La" followed by a surname, and "La" followed by a surname. After the elements are joined, the now-redundant trailing elements should be left empty.


So, for example:

0. Daniel  1. R  2. De  3. La  4. Amaya
becomes

0. Daniel  1. R  2. De La Amaya  3. (empty)  4. (empty)

where 0 to 4 are the element indices.
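A minimal Perl sketch of the transformation described above, assuming the name has already been split on spaces into an array. The particle list in `%is_particle` and the sub name `merge_particles` are illustrative assumptions, not from the thread. The first element is assumed to be the given name and is never treated as a particle. Each particle, or run of particles such as "De La", is joined with the word after it, and the result is padded with empty strings so the element count stays the same:

```perl
use strict;
use warnings;

# Illustrative particle list (lowercase, so the lookup is case-insensitive).
my %is_particle = map { $_ => 1 } qw(von van der de la del le);

# Join each particle (or run of particles such as "De La") with the word
# that follows it, then pad with '' so the array keeps its original length.
# Element 0 is assumed to be the given name and is never treated as a particle.
sub merge_particles {
    my @in = @_;
    my @out;
    my $i = 0;
    while ($i < @in) {
        if ($i > 0 && $is_particle{ lc $in[$i] }) {
            my $j = $i;
            # Extend over consecutive particles, stopping at the next word.
            $j++ while $j < $#in && $is_particle{ lc $in[$j] };
            push @out, join ' ', @in[ $i .. $j ];
            $i = $j + 1;
        }
        else {
            push @out, $in[ $i++ ];
        }
    }
    push @out, '' while @out < @in;
    return @out;
}

print join('|', merge_particles(qw(Daniel R De La Amaya))), "\n";   # prints Daniel|R|De La Amaya||
```

For a name with no particle, such as `qw(Daniel R Silva)`, the array comes back unchanged.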

Re: Dealing with Names
by betterworld (Curate) on Aug 28, 2008 at 18:37 UTC

    Hm, are you looking for something like this?

    my @names = (
        'Daniel R Von Vanderschmidt',
        'Daniel Von Vanderschmidt',
        'Daniel De La Silvia',
        'Daniel De Silvia',
        'Daniel La Silvia',
    );
    for my $name (@names) {
        my @comps = $name =~ m{(?:Von|De La|La).*|\w+}g;
        print "[$_]" for @comps;
        print "\n";
    }

    The output is:

    [Daniel][R][Von Vanderschmidt]
    [Daniel][Von Vanderschmidt]
    [Daniel][De La Silvia]
    [Daniel][De][Silvia]
    [Daniel][La Silvia]

    Update: Added the .* after re-reading your example output.
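A possible refinement of the pattern above, offered as a sketch. `Von` and `La` are not anchored to word boundaries, so a given name that merely begins with those letters (for example "Lana") is caught by the particle branch, and `.*` then swallows the whole name as one component. Adding `\b` anchors avoids that. Listing a lone `De` after `De La` (so that "De Silvia" is joined too) is an extra assumption about what the OP wants, and the sub name `split_name` is hypothetical:

```perl
use strict;
use warnings;

# Same idea as the reply above, with \b anchors added and a lone "De"
# included; both changes are assumptions, not part of the original reply.
sub split_name {
    my ($name) = @_;
    # "De La" is listed before "De" so the longer particle is tried first;
    # \b stops "La" from matching the start of a word such as "Lana".
    return $name =~ m{\b(?:Von|De La|De|La)\b.*|\w+}g;
}

for my $name ('Daniel De La Silvia', 'Lana De Silvia', 'Daniel R Silvia') {
    print "[$_]" for split_name($name);
    print "\n";
}
```

This should print `[Daniel][De La Silvia]`, `[Lana][De Silvia]` and `[Daniel][R][Silvia]`, one per line; with the unanchored pattern the second name would come back as the single component `[Lana De Silvia]`.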

      I'm not trying to match them; I am trying to make anything like "De La Whatever" become one name in a single element of the array.

      Say we have an array that looks like this:

      {Daniel}{De}{La}{Silva} (each pair of braces is one element)

      I am trying to make the array look like this:

      {Daniel}{De La Silva}{}{}

      However, say an array looks like this:

      {Daniel}{R}{Silva}

      then leave the array as it is. Only join surnames that contain spaces (e.g. von Derfen).

        This is fun to play with, but keep in mind that it cannot be solved perfectly. Von is not an entirely uncommon given or middle name, so Mark Von Shepard might correctly be {Mark}{Von}{Shepard} rather than {Mark}{Von Shepard}. If the name data isn't delimited correctly at the point of input, you will never reverse-engineer it with 100% reliability.

        I think betterworld has given a pretty simple and elegant solution. Given the code you posted in Re^2: Dealing with Names, you could adapt betterworld's code like so:

        my $name_count = scalar split / /, $new_gecos;
        my @comps = $new_gecos =~ m{(?:Von|De La|La).*|\w+}g;
        push @comps, '' while scalar @comps < $name_count;

        The first and third lines are only there to give you the blank array elements you apparently want. I'm putting in an empty string there, but maybe you want undef. Season to taste.
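The same three lines wrapped in a self-contained sub, as a sketch; `gecos_parts` is a hypothetical name and `$new_gecos` is the OP's variable. Here the word count comes from a list assignment rather than `scalar split`, which yields the same number:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the three lines above.
sub gecos_parts {
    my ($new_gecos) = @_;
    my @words = split / /, $new_gecos;               # original element count
    my @comps = $new_gecos =~ m{(?:Von|De La|La).*|\w+}g;
    push @comps, '' while @comps < @words;           # pad with blank elements
    return @comps;
}

print join('|', gecos_parts('Daniel R De La Amaya')), "\n";
```

Called with `'Daniel R De La Amaya'` it returns `('Daniel', 'R', 'De La Amaya', '', '')`.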

Re: Dealing with Names
by hsmyers (Canon) on Aug 29, 2008 at 02:57 UTC
    You might investigate the CPAN module Lingua::EN::NameParse (distribution Lingua-EN-NameParse-1.24). It isn't clear from your question, but parsing the names with that module before the arrays are created might avoid the problem entirely.

    --hsm

    "Never try to teach a pig to sing...it wastes your time and it annoys the pig."
Re: Dealing with Names
by GrandFather (Sage) on Aug 28, 2008 at 21:19 UTC

    I presume your problem is that you have the names stored in pieces in an array. So reassemble each name using a separator character that won't occur in the name text, perform the fix-up, then split the string apart again:

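    use strict;
    use warnings;

    my $match = join '|', 'Von', 'De La', 'De', 'La';
    my @names = map { [split] } (
        'Daniel R Von Vanderschmidt',
        'Daniel Von Vanderschmidt',
        'Daniel De La Silvia',
        'Daniel De Silvia',
        'Daniel La Silvia',
    );

    for my $name (@names) {
        # Reassemble the parts using a separator that never occurs in a name.
        my $nameStr = join '~', @$name;
        # Glue each particle to the following part. Repeat until nothing
        # changes, so "De~La~Silvia" -> "De La~Silvia" -> "De La Silvia"
        # (a single s///g pass would stop after joining "De" and "La").
        1 while $nameStr =~ s/~($match)~/~$1 /;
        # Split the string back into parts.
        $name = [split '~', $nameStr];
    }

    print join('|', @$_), "\n" for @names;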

    Prints:

    Daniel|R|Von Vanderschmidt
    Daniel|Von Vanderschmidt
    Daniel|De La Silvia
    Daniel|De Silvia
    Daniel|La Silvia

    Perl reduces RSI - it saves typing
Re: Dealing with Names
by toolic (Bishop) on Aug 28, 2008 at 18:39 UTC
    Please show the Perl code you have tried thus far. Enclose the code in "code" tags, as described in Writeup Formatting Tips.

    Also, show the exact output you get, any error/warning messages, and the exact output you expect, again in code tags.

      @name_breakdown = split(/ /, $new_gecos);
      if (   $name_breakdown[1] eq "de"  || $name_breakdown[1] eq "von"
          || $name_breakdown[1] eq "van" || $name_breakdown[1] eq "der"
          || $name_breakdown[1] eq "la"  || $name_breakdown[1] eq "del"
          || $name_breakdown[1] eq "el"  || $name_breakdown[1] eq "le")
      {
          if ($name_breakdown[2] && $name_breakdown[3]) {
              $name_breakdown[1] = join(' ', $name_breakdown[1],
                                        $name_breakdown[2], $name_breakdown[3]);
              @END = splice(@name_breakdown, -2);
          }
          else {
              $name_breakdown[1] = join(' ', $name_breakdown[1], $name_breakdown[2]);
              pop @name_breakdown;
          }
      }
      This does not work when there is a middle initial (Daniel R De La Silva); it only works for names like "Daniel De La Silva".
        • Turn the above into a subroutine, passing the subscript of $name_breakdown you wish to test (rather than 1).
        • Modify the code to use the parameter, rather than hardcoding 1, 2, and 3.
        • Return whether or not the replacement took place.

        Call the routine with a parameter of 1. If it does not do the replace, call it with a parameter of 2.
        (If you're really ambitious, store the name prefixes in a hash, and use the hash as a test. Sooner or later, the list will change...)
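A sketch of the subroutine described above, under a few assumptions: the prefix list is copied from the OP's code into a hash, `join_prefix` is a hypothetical name, and the sub takes an array reference plus the subscript to test. Lookups are lowercased, so "De" and "de" both match. Instead of the fixed "one or two more words" logic, it joins the prefix with any prefixes directly after it and then one more word, and returns true only when it made a change:

```perl
use strict;
use warnings;

# Name prefixes from the original code, kept in a hash so the list is easy to change.
my %prefix = map { $_ => 1 } qw(de von van der la del el le);

# If element $i of @$parts is a prefix, join it (plus any prefixes that
# directly follow it) with the next word. Returns true if a join happened.
sub join_prefix {
    my ($parts, $i) = @_;
    return 0 unless $i < $#$parts && $prefix{ lc $parts->[$i] };
    my $end = $i + 1;
    $end++ while $end < $#$parts && $prefix{ lc $parts->[$end] };
    splice @$parts, $i, $end - $i + 1, join ' ', @$parts[ $i .. $end ];
    return 1;
}

for my $gecos ('Daniel De La Silva', 'Daniel R De La Silva', 'Daniel R Silva') {
    my @name_breakdown = split / /, $gecos;
    # Try the second element first, then the third (after a middle initial).
    join_prefix(\@name_breakdown, 1) or join_prefix(\@name_breakdown, 2);
    print join('|', @name_breakdown), "\n";
}
```

For "Daniel R De La Silva" the call with 1 returns false and the call with 2 produces `Daniel|R|De La Silva`; "Daniel R Silva" is left alone. Like the original code, this splices the joined words out of the array; pad with empty strings afterwards if the element count must be preserved.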

Node Type: perlquestion [id://707552]
Approved by moritz
Front-paged by swampyankee