Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options

One to Many is Too Many?

by Jamin (Novice)
on Apr 22, 2008 at 19:57 UTC ( #682261=perlquestion: print w/replies, xml ) Need Help??
Jamin has asked for the wisdom of the Perl Monks concerning the following question:

I need some Perl help from the masters. I haven't been able to figure this one out, or find anyone who can even begin to figure it out.

I have 2 data files I am trying to combine, and I have not been able to figure out a good way to do this that is not really sloppy. As an example, consider a Bug Tracking database.

One file contains the Bugs:

BUGS.csv BugID,Date,Subject,Priority 9551,03/01/2008,This is the subject,1 9552,03/03/2008,This is the subject,2 9553,03/06/2008,This is the subject,1 9554,03/07/2008,This is the subject,0 9555,03/08/2008,This is the subject,5
BUG_Comments.csv CommentId, BugId, Date, CommentBody 1023,9555,03/06/2008,This is the body 1024,9555,03/07/2008,This is the body 1025,9555,03/08/2008,This is the body

I need to have all the Comments for one bug, concatenated into one field.

WANTED: BUG_Comments_COMBINED.csv BugId, CommentBody 9555,"03/06/2008 This is the body 03/07/2008 This is the body 03/08/20 +08 This is the body"

The history is this: I am writing a script to manipulate data to migrate from our old bug tracker to our new one. The new one has only one field for ALL the comments. It also will only allow a one-time import, no updates. So I have to have ALL the comments from ONE bug, in a giant comments field.

I have started this off 100 times and it get's so stinkin' convoluted, I can't even read it anymore.

ANY ideas? At All? This MUST have come up somewhere before?



Replies are listed 'Best First'.
Re: One to Many is Too Many?
by apl (Monsignor) on Apr 22, 2008 at 20:04 UTC
    Hint: You want a hash keyed on the BugID from BUGS.csv

    Given that, which file do you think you'd need to read first?

    After you've processed the entire first file, how would you process the second file?

    After processing the second file, what would you have to do to complete your processing?

      Considering the desired output, there's only one file that needs to be read.

      Caution: Contents may have been coded under pressure.
        I agree and disagree. You're correct that all necessary information is contained in a single file. However, if I were migrating (as the OP is), I'd want to be able to say "Hey, I found the following Bug_Comments.csv record that does not have an associated Bugs.csv record!".

        There may be more information in the first file that will be merged, but was not not being provided for the sake of brevity. In any event, I'm being pedantic. 8-)

Re: One to Many is Too Many?
by Eimi Metamorphoumai (Deacon) on Apr 22, 2008 at 20:53 UTC
    Doesn't seem that hard. The code below outputs the BUG_Comments_COMBINED.csv file, though it doesn't worry at all about the order in which it outputs lines. If that matters to you, it can be adjusted.
    #!/usr/bin/perl -l use strict; use warnings; my %comment; #comment for each bug id <DATA>; #skip header line while(<DATA>){ chomp; #the 4 in the split below ignores all comma in the comment text my (undef, $bugid, $date, $body) = split(/,/, $_, 4); $comment{$bugid} .= "$date $body "; } while (my ($id, $comment) = each %comment){ print "$id, $comment"; } __DATA__ CommentId, BugId, Date, CommentBody 1023,9555,03/06/2008,This is the body 1024,9555,03/07/2008,This is the body 1025,9555,03/08/2008,This is the body
Re: One to Many is Too Many?
by roboticus (Chancellor) on Apr 22, 2008 at 21:37 UTC

    You might try modifying a standard sort/merge operation. A brief pseudocode description is:

    SortByBugID("BugDescriptions","BugDescriptions.s"); SortByBugID("BugComments","BugComments.s"); BD = open "<BugDescriptions.s" COM = open "<BugComments.s" OUT = open ">ResultFile" curBD = <BD> curCOM = <COM> while !BD.eof && !COM.eof { if curBD.BugID < curCOM.BugID { write OUT, curBD curBD = <BD> } elsif curBD.BugID == curCOM.BugID { curBD.Comment .= curCOM.Comment curCOM = <COM> } else { die "Missing bug description?" } }

    If you're interested in this method, I have an earlier node with working code for a similar problem.

Re: One to Many is Too Many?
by ikegami (Pope) on Apr 22, 2008 at 22:57 UTC
    DBD::CSV? It's not the fastest solution, but that shouldn't matter if it's one-time migration.

      Thank you all for swift and easy to understand answers! It works! I knew there had to be an easier way to do it than I was. I was opening both files twice each. I think I was up to 40 lines.

      I based my solution on what Eimi Metamorphoumai proposed. Thank you Eimi!

      Here is what I ended up with

      foreach (@bz_comments){ ($who, $bug_id, undef, $date, undef, $body) = split(/,/, $_); $BZ_comment{$bug_id} .= "<p><strong>$who $date</strong></p> <p +>$body</p>"; } while (my ($id, $comment) = each %BZ_comment){ push(@bz_comments_OUTPUT, "$id , $comment"); }

      Thank you all again!


Re: One to Many is Too Many?
by pc88mxer (Vicar) on Apr 22, 2008 at 21:01 UTC
    Although I haven't seen what you've tried, I can guess at it. The key idea here is to read both files into appropriate data structures and then do the merge operation. Things can get convoluted if you open and read from the same file multiple times.

    Let's see how well apl's Socratic approach works. :-) If you still find yourself stuck, post your latest attempt, and we'll give you some more pointers.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://682261]
Approved by kyle
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (4)
As of 2017-05-30 03:02 GMT
Find Nodes?
    Voting Booth?