Re: Joining separate data files to make one.

in reply to Joining separate data files to make one.

Here's another way to do it:

#!perl

use 5.10.0;
use strict;
use warnings;

my %merged = ();
my $index = 0;

map {
    open my $file, '<', $_ or die $!;

    map {
        $merged{$_->[0]} //= [ qw{null} x 3 ];
        $merged{$_->[0]}[$index] = $_->[1];
    }
    map {
        [ m{ \A ( \S+ \s \S+ \s \S+ \s \S+ ) \s ( \S+ ) \z }msx ]
    }
    map {
        chomp; $_
    } (<$file>);

    close $file;

    ++$index;
} qw{gravity magnetics bathymetry};

say join(' ', $_, @{$merged{$_}}) for sort keys %merged;

I put the script in a file called geo_file_join.pl and made some short test files:

ken@Miranda ~/c/_/tmp
$ cat gravity
2010-10-01 00:00:03 lat1 long1 grav1
2010-10-02 00:00:05 lat2 long2 grav2
2010-10-03 00:00:07 lat3 long3 grav3

ken@Miranda ~/c/_/tmp
$ cat magnetics
2010-10-02 00:00:05 lat2 long2 mag1
2010-10-03 00:00:07 lat3 long3 mag2
2010-10-04 00:00:09 lat4 long4 mag3

ken@Miranda ~/c/_/tmp
$ cat bathymetry
2010-10-03 00:00:07 lat3 long3 bath1
2010-10-04 00:00:09 lat4 long4 bath2
2010-10-05 00:00:01 lat3 long3 bath3

Here's the output:

ken@Miranda ~/c/_/tmp
$ geo_file_join.pl
2010-10-01 00:00:03 lat1 long1 grav1 null null
2010-10-02 00:00:05 lat2 long2 grav2 mag1 null
2010-10-03 00:00:07 lat3 long3 grav3 mag2 bath1
2010-10-04 00:00:09 lat4 long4 null mag3 bath2
2010-10-05 00:00:01 lat3 long3 null null bath3

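The hash key is the first 4 fields (i.e. date, time, latitude and longitude) as a single string, and the output is sorted on that string. Assuming your latitudes and longitudes are in a format that sorts correctly as text, the lines come out ordered by those 4 fields.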
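A minimal sketch of what "sortable format" means here, using made-up keys that are not from the original data. ISO-style dates and zero-padded times order correctly as strings, but unpadded numeric coordinates may not:

```perl
use strict;
use warnings;

# Hypothetical keys: same date and time, differing only in latitude.
my @keys = (
    '2010-10-01 00:00:03 9.5 150.0',
    '2010-10-01 00:00:03 10.5 150.0',
);

# The default sort compares strings character by character,
# so '10.5' sorts before '9.5' because '1' lt '9'.
my @sorted = sort @keys;
print "$_\n" for @sorted;
```

If the real coordinates look like this, zero-padding them or sorting with a custom comparison would be needed to get numeric order.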


Re^2: Joining separate data files to make one. by msexton (Initiate) on Oct 07, 2010 at 09:40 UTC
Hi,

Thanks for this, it worked well. The only problem is, I can't for the life of me figure it out. The multiple calls to map have me perplexed. I spent most of the day reading up about map, and am still a bit confused.

I know I didn't give an example of my data files, but you were almost spot on. If I can ask a favour, how would the code vary if the gravity and magnetics files had a 6th field, whilst the bathymetry remained at five?

One of the other replies I received was a bit easier to understand, but did not handle the situation where a later file (e.g. bathymetry) ends before (in time) an earlier file (e.g. magnetics). It did not add nulls to the hash. It worked well when files started later than previous files.
Re^3: Joining separate data files to make one. by afoken (Chancellor) on Oct 07, 2010 at 15:10 UTC
"The multiple calls to map have me perplexed."

Sure. map is abused here to work as `foreach` and `while`.

    map {
        open my $file, '<', $_ or die $!;

        map {
            $merged{$_->[0]} //= [ qw{null} x 3 ];
            $merged{$_->[0]}[$index] = $_->[1];
        }
        map {
            [ m{ \A ( \S+ \s \S+ \s \S+ \s \S+ ) \s ( \S+ ) \z }msx ]
        }
        map {
            chomp; $_
        } (<$file>);

        close $file;

        ++$index;
    } qw{gravity magnetics bathymetry};

The outer map is really a `foreach`:

    foreach my $filename (qw{gravity magnetics bathymetry}) {
        open my $file, '<', $filename or die $!;

        map {
            $merged{$_->[0]} //= [ qw{null} x 3 ];
            $merged{$_->[0]}[$index] = $_->[1];
        }
        map {
            [ m{ \A ( \S+ \s \S+ \s \S+ \s \S+ ) \s ( \S+ ) \z }msx ]
        }
        map {
            chomp; $_
        } (<$file>);

        close $file;

        ++$index;
    }

The last `map` inside the `foreach` loop simply iterates over all lines of the file and strips trailing newlines. Then it passes each line to the middle `map`, which extracts some parts of the line and returns an array reference with the matches. The first `map` is again abused as a `foreach`. Using `while (<$file>)` would make that more readable:

    foreach my $filename (qw{gravity magnetics bathymetry}) {
        open my $file, '<', $filename or die $!;

        while (<$file>) {
            chomp;
            my @a = m{ \A ( \S+ \s \S+ \s \S+ \s \S+ ) \s ( \S+ ) \z }msx;
            $merged{$a[0]} //= [ qw{null} x 3 ];
            $merged{$a[0]}[$index] = $a[1];
        }

        close $file;

        ++$index;
    }

Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
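To see the right-to-left flow of chained maps in isolation, here is a minimal sketch on two made-up lines (the data and the `@out` name are illustrative, not part of the original script):

```perl
use strict;
use warnings;

my @lines = ("a 1\n", "b 2\n");

# Chained maps run from the last one written to the first:
#   3rd map: strip the trailing newline from each line
#   2nd map: split each line into an array reference of fields
#   1st map: turn each array reference into a "key=value" string
my @out =
    map { join('=', @$_) }
    map { [ split ' ' ] }
    map { chomp; $_ }
    @lines;

print "$_\n" for @out;
```

Unlike the script above, this version uses the list that the first map returns, which is the usual reason to reach for `map` rather than `foreach`.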
Re^3: Joining separate data files to make one. by kcott (Archbishop) on Oct 07, 2010 at 18:19 UTC
"Thanks for this it worked well."

You're welcome. I enjoyed writing it.

"The only problem is, I can't for the life of me figure it out. The multiple calls to map have me perplexed. I spent most of the day reading up about map, and am still a bit confused."

Alexander has provided a breakdown of what's going on here. Feel free to ask if anything needs further explanation.

"I know I didn't give an example of my data files, but you were almost spot on."

Your original question was pretty clear. I felt I had a reasonable understanding of what you were after.

"If I can ask a favour, how would the code vary, if the gravity and magnetics files had a 6th field, whilst the bathymetry remained at five?"

I'm happy to answer that with a little more information. Is the 6th field to be added to the final data as an additional field? Are the 5th and 6th fields to be combined and then added? Is the 6th field just extraneous data to be discarded? Something else?

Finally, on the timing issue, I staggered the date-time fields through the test data to take that into consideration. Within each test file the times are in order, though. If your live data is not necessarily in chronological order, you might want to jumble up the lines in one or more of the test files. I think it should still work, but I didn't specifically test for that scenario.

Regards,

Ken
Re^4: Joining separate data files to make one. by msexton (Initiate) on Oct 08, 2010 at 12:09 UTC
Hi Ken and Alexander,

I didn't get a chance to examine your responses today, as I spent most of the day implementing the suggestions from one of the other respondents. I eventually got it to work correctly. The biggest problem I had was extracting the elements out of the hash to write them out to the final output file. I eventually got there. I will examine your suggestions on Monday.

In the meantime, I have appended a couple of fields to the datasets you sent me to show you basically what I have. The hash should contain: Date, Time, all 5 remaining fields from gravity, all 4 remaining fields from magnetics, and three remaining fields from bathymetry.

Whilst the navigation should be the same in all three files, by putting them into the hash I can check that they are. If they are not essentially the same, then I know that a problem exists.

No further processing of the fields is done (i.e. addition, etc.). They are just read from the hash and written in a specific format (MGD77) to an output file. With the work I did today, I think I can manage that. Where there are multiple navigations (most times), I have a hierarchy and select what I believe is the best for output (usually bathymetry).

Here is the updated file structure you sent me (I can't see how to make the attachment you did):

$ cat gravity
2010-10-01 00:00:03 lat1 long1 grav1 g_anom1 eotvos1
2010-10-02 00:00:05 lat2 long2 grav2 g_anom2 eotvos2
2010-10-03 00:00:07 lat3 long3 grav3 g_anom3 eotvos3

$ cat magnetics
2010-10-02 00:00:05 lat2 long2 mag1 m_anom1
2010-10-03 00:00:07 lat3 long3 mag2 m_anom2
2010-10-04 00:00:09 lat4 long4 mag3 m_anom3

$ cat bathymetry
2010-10-03 00:00:07 lat3 long3 bath1
2010-10-04 00:00:09 lat4 long4 bath2
2010-10-05 00:00:01 lat3 long3 bath3

Thanks once again,

Mike
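One way the layout described above might be handled is sketched below. It is not code from any reply in this thread. It assumes the key is date plus time only (so the navigation fields from each file are kept side by side for comparison), and that the field counts are 5, 4 and 3 as in the sample data. `merge_lines` and `@sources` are hypothetical names, and the lines are passed in as arrays rather than read from files to keep the example self-contained.

```perl
use 5.010;
use strict;
use warnings;

# Assumed layout: number of fields after the date and time in each source.
my @sources = (
    [ gravity    => 5 ],
    [ magnetics  => 4 ],
    [ bathymetry => 3 ],
);

# Hypothetical helper: takes { source_name => [ lines ] } and returns
# { "date time" => [ all fields, 'null' where a source has no record ] }.
sub merge_lines {
    my ($lines_by_source) = @_;
    my %merged;
    my $total = 0;
    $total += $_->[1] for @sources;

    my $offset = 0;
    for my $source (@sources) {
        my ($name, $width) = @$source;
        for my $line (@{ $lines_by_source->{$name} || [] }) {
            my ($date, $time, @rest) = split ' ', $line;
            next unless defined $time;    # skip blank or short lines
            my $key = "$date $time";
            # Pre-fill every column with 'null' the first time a key is seen.
            $merged{$key} //= [ ('null') x $total ];
            # Copy this source's fields into its own range of columns.
            for my $i (0 .. $width - 1) {
                $merged{$key}[ $offset + $i ] = $rest[$i] if defined $rest[$i];
            }
        }
        $offset += $width;
    }
    return \%merged;
}

my $merged = merge_lines({
    gravity    => [ '2010-10-02 00:00:05 lat2 long2 grav2 g_anom2 eotvos2' ],
    magnetics  => [ '2010-10-02 00:00:05 lat2 long2 mag1 m_anom1' ],
    bathymetry => [ '2010-10-03 00:00:07 lat3 long3 bath1' ],
});

say join(' ', $_, @{ $merged->{$_} }) for sort keys %$merged;
```

Because every key starts with all columns set to null, it does not matter which file starts or ends first. Comparing the three latitude/longitude pairs within a row is then a matter of looking at fixed column positions.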
