Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

split then join based on common value in field

by brianjb (Novice)
on Jun 01, 2012 at 23:53 UTC ( #973890=perlquestion: print w/ replies, xml ) Need Help??
brianjb has asked for the wisdom of the Perl Monks concerning the following question:

Good afternoon! I am new to this forum. In fact this is my first post. I am querying a database and then trying to join the data onto a single line that has the same id in the first field. Here is an example of what the beginning date looks like:

9885 10.10.9.48 Room 1105A 9885 10.10.9.48 Jack 1105A--05D 9885 10.10.9.48 org_code B703 9885 10.10.9.48 Building 1268A 114948 10.10.184.0 nasa_nets off 114948 10.10.184.0 blockSecName test name 114948 10.10.184.0 blockTechName brian test 114948 10.10.184.0 blockAdminName test admin 114949 10.10.184.0 blockSecName John G. Smooth 114949 10.10.184.0 blockTechPhone 222-555-1212 114949 10.10.184.0 blockAdminName Lucy P. Wallice 114949 10.10.184.0 blockAdminId 8878787 114949 10.10.184.0 block_name unknown 114949 10.10.184.0 blockSecId 787878 114949 10.10.184.0 blockAdminEmail lucy.p.wallice@google.com 114949 10.10.184.0 blockTechName TEST LAN 114949 10.10.184.0 blockSecPhone 222-555-3232 114949 10.10.184.0 blockTechEmail terCInternal@google.com 114949 10.10.184.0 blockSecEmail John.goody@google.com 114949 10.10.184.0 nasa_nets off

Note that the first field has an ID. the second, third, and fourth may be different. I don't need to keep the first field at all. Here is what I would like the end result to look like:

10.10.9.48|Room=1105A|Jack=1105A--05D|org_code=B703|Building=1268A 10.10.184.0|nasa_nets=off|blockSecName=test name|blockTechName=brian t +est|blockA dminName=test admin|blockSecName=John G. Smooth|blockTechPhone=222-555 +-1212|bloc kAdminName=Lucy P. Wallice|blockAdminId=8878787|block_name=unknown|blo +ckSecId=78 7878|blockAdminEmail=lucy.p.wallice@google.com|blockTechName=TEST LAN| +blockSecPhon e=222-555-3232|blockTechEmail=terCInternal@google.com|blockSecEmail=Jo +hn.good y@google.com|nasa_nets=off

Notice that every line that had the same first field was put on the same line. In fact, I suppose we can just ignore the first field and just use the second field. So now that you know I am least trying, here is what I have tried:

$out_file="/tmp/brian_test.$datestring"; open(IN,"<$TMPCONTLIST") or die "Can't open $TMPCONTLIST!\n"; open(OUT,">$out_file") or die "Can't open $out_file!\n"; while ($line=<IN>) { @fields = split /\t/,$line; $line = join "|", @fields[1,2,3,4]; print OUT $line; print OUT "\n"; } close IN; close OUT;

It is giving me this....just a sample

10.10.9.48|User_POC_Phone|222-555-2322 | 10.10.9.48|Room|1105A | 10.10.9.48|Jack|1105A--05D | 10.10.9.48|org_code|B703 | 10.10.9.48|Building|1268A

Can you please help? Thanks in advance!!!

Comment on split then join based on common value in field
Select or Download Code
Re: split then join based on common value in field
by spazm (Monk) on Jun 02, 2012 at 00:15 UTC
    #!/usr/bin/perl use warnings; use strict; use 5.10.0; my $current_ip = ''; my @current = (); while (<DATA>) { chomp; my ( $id, $ip, $key, $value ) = split(/\t/); if ( $ip ne $current_ip ) { say join( '|', $current_ip, @current ) if @current; $current_ip = $ip; @current = (); } push @current, "$key=$value"; } say join( '|', $current_ip, @current ) if @current; #note: data should be tab sep __DATA__ 9885 10.10.9.48 Room 1105A 9885 10.10.9.48 Jack 1105A--05D 9885 10.10.9.48 org_code B703 9885 10.10.9.48 Building 1268A 114948 10.10.184.0 nasa_nets off 114948 10.10.184.0 blockSecName test name 114948 10.10.184.0 blockTechName brian test 114948 10.10.184.0 blockAdminName test admin 114949 10.10.184.0 blockSecName John G. Smooth 114949 10.10.184.0 blockTechPhone 256-555-1212 114949 10.10.184.0 blockAdminName Lucy P. Wallice 114949 10.10.184.0 blockAdminId 8878787 114949 10.10.184.0 block_name unknown 114949 10.10.184.0 blockSecId 787878 114949 10.10.184.0 blockAdminEmail lucy.p.wallice@nasa.gov 114949 10.10.184.0 blockTechName TEST LAN 114949 10.10.184.0 blockSecPhone 256-555-3232 114949 10.10.184.0 blockTechEmail terCInternal@mail.nasa.gov 114949 10.10.184.0 blockSecEmail John.goody@nasa.gov 114949 10.10.184.0 nasa_nets off
    outputs:
    10.10.9.48|Room=1105A|Jack=1105A--05D|org_code=B703|Building=1268A 10.10.184.0|nasa_nets=off|blockSecName=test name|blockTechName=brian t +est|blockAdminName=test admin|blockSecName=John G. Smooth|blockTechPh +one=256-555-1212|blockAdminName=Lucy P. Wallice|blockAdminId=8878787| +block_name=unknown|blockSecId=787878|blockAdminEmail=lucy.p.wallice@n +asa.gov|blockTechName=TEST LAN|blockSecPhone=256-555-3232|blockTechEm +ail=terCInternal@mail.nasa.gov|blockSecEmail=John.goody@nasa.gov|nasa +_nets=off
Re: split then join based on common value in field
by Kenosis (Priest) on Jun 02, 2012 at 02:01 UTC

    Here's another option that will generate your desired results:

    use Modern::Perl; use Sort::Key::IPv4 qw(ipv4sort); my %hash; do { chomp( my ( $ip, $fld2, $fld3 ) = ( split /\t/ )[ 1 .. 3 ] ); push @{ $hash{$ip} }, "$fld2=$fld3"; } for <DATA>; print "$_|" . join( '|', @{ $hash{$_} } ) . "\n\n" for ipv4sort keys % +hash; __DATA__ 9885 10.10.9.48 Room 1105A 9885 10.10.9.48 Jack 1105A--05D 9885 10.10.9.48 org_code B703 9885 10.10.9.48 Building 1268A 114948 10.10.184.0 nasa_nets off 114948 10.10.184.0 blockSecName test name 114948 10.10.184.0 blockTechName brian test 114948 10.10.184.0 blockAdminName test admin 114949 10.10.184.0 blockSecName John G. Smooth 114949 10.10.184.0 blockTechPhone 256-555-1212 114949 10.10.184.0 blockAdminName Lucy P. Wallice 114949 10.10.184.0 blockAdminId 8878787 114949 10.10.184.0 block_name unknown 114949 10.10.184.0 blockSecId 787878 114949 10.10.184.0 blockAdminEmail lucy.p.wallice@nasa.gov 114949 10.10.184.0 blockTechName TEST LAN 114949 10.10.184.0 blockSecPhone 256-555-3232 114949 10.10.184.0 blockTechEmail terCInternal@mail.nasa.gov 114949 10.10.184.0 blockSecEmail John.goody@nasa.gov 114949 10.10.184.0 nasa_nets Soff

    Output:

    10.10.9.48|Room=1105A|Jack=1105A--05D|org_code=B703|Building=1268A 10.10.184.0|nasa_nets=off|blockSecName=test name|blockTechName=brian t +est|blockAdminName=test admin|blockSecName=John G. Smooth|blockTechPh +one=256-555-1212|blockAdminName=Lucy P. Wallice|blockAdminId=8878787| +block_name=unknown|blockSecId=787878|blockAdminEmail=lucy.p.wallice@n +asa.gov|blockTechName=TEST LAN|blockSecPhone=256-555-3232|blockTechEm +ail=terCInternal@mail.nasa.gov|blockSecEmail=John.goody@nasa.gov|nasa +_nets=Soff

    Update:

    I've added the Sort::Key::IPv4 module to use its ipv4sort on the IP4 addresses in your data set. The advantage is an ordered list of results. Its limitation is not sorting on IP6 addresses. This IP4 sorting was easily added, and can be easily removed if not needed.

      I am a newbie here with perl. I see what you guys are offering, but I don't grasp how to implement that within my script. Can you take my script and then paste exactly where it needs to be? Then maybe explain what I am missing. Thanks for being nice! I am trying to learn as I go.

      #!/usr/bin/perl use POSIX qw(strftime); use FileHandle; use Getopt::Long; use IO::Handle; $MYSQL='/pathtomysql/mysql'; # Doing the following to get the date into consistent format my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(t +ime); $year += 1900; my @moname = qw( 01 02 03 04 05 06 07 09 10 11 12 ); my $datestring = "$year.$moname[$mon]$mday.$hour$min.$sec"; my ($dbh,$CONTAINER,$DEBUG,$DOMAIN,$DOMTYPE,$USER,$PASSWORD,$USAGE); $Getopt::Long::autoabbrev=1; &GetOptions('-zdebug'=>\$DEBUG,'container:s'=>\$CONTAINER,'-user:s'=>\ +$USER,'-password:s'=>\$PASSWORD,'-x'=>\$USAGE,'v'=>\$VERSION); if ($VERSION) { print "Version 1.0 Build date: 23 May 2012 \n"; exit; } if (! $CONTAINER) { print "missing container: $CONTAINER \n"; &usage; } if (! $USER) { print "missing user: $USER\n"; &usage; } if (! $PASSWORD) { print "missing password: $PASSWORD\n"; &usage; } if ($USAGE) { &usage; } print "container: $CONTAINER \n" if $DEBUG; # This is converting it to all uppercase $CONTAINER =~ tr/a-z/A-Z/; my $TMPSQLCONT="/tmp/sql.$datestring"; my $TMPCONTLIST="/tmp/sql.contlist.$datestring"; # Getting the container ID's for the subnets that we want to get data +on $SQLCONT="select user_defined_fld_value.objectid, inet_ntoa(block.ST +ARTADDR), user_defined_fld.tag, user_defined_fld_value.VALUE from user_defined_fld_value left join user_defined_fld on user_defined_fld_value.USER_DEFINED_FLD_ID=user_defined_fld.id left join container_block on container_block.blockid=user_defined_fld_value.objectid left join block on block.id=container_block.blockid left join container on container.id=container_block.containerid where container.notes like '%$CONTAINER%'"; my $tmpcont=FileHandle->new; $tmpcont->open(">$TMPSQLCONT"); print $tmpcont "$SQLCONT;\n"; $tmpcont->close; system("$MYSQL -u $USER -p$PASSWORD instancename <$TMPSQLCONT > $T +MPCONTLIST"); ###################################################################### +### sub usage { print "Usage: \n"; print " -c: Container \n"; print " -u: login \n"; print " -p: password\n"; print " -z: debug \n"; print " -x: This message \n"; exit 0; } ###################################################################### +###

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://973890]
Approved by Eliya
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (21)
As of 2014-07-29 14:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (217 votes), past polls