PerlMonks  

Parsing multiple rows of text and converting to columns

by gingeremmie (Initiate)
on Jun 24, 2013 at 17:38 UTC (node #1040463)
gingeremmie has asked for the wisdom of the Perl Monks concerning the following question:

I am fairly new to Perl. I have a file with data in it which looks like this:
> <Study Design|2|Additional Findings|1|Timepoint>
15

> <Study Design|2|Additional Findings|1|Timepoint Unit>
Days

> <Study Design|2|Additional Findings|1|Dose (Mg/Kg)>
20

> <Study Design|2|Additional Findings|1|Average Value>
412.22

> <Study Design|2|Additional Findings|1|Sd>
61.05

> <Study Design|2|Additional Findings|2|Timepoint>
15

> <Study Design|2|Additional Findings|2|Timepoint Unit>
Days

> <Study Design|2|Additional Findings|2|Dose (Mg/Kg)>
20

> <Study Design|2|Additional Findings|2|Average Value>
71.74

> <Study Design|2|Additional Findings|2|Sd>
11.07

And I want to end up with tab-separated column output, one row per record:
Timepoint	Timepoint Unit	Dose	Average Value	SD
15	Days	20	412.22	61.05
I have worked out how to write a very basic Perl script which opens the file, reads it, and prints out the pairs of lines, but I'm not sure where to go from here. Can someone point me in the right direction? I'm really new to Perl and to scripting altogether.
Thank you

Re: Parsing multiple rows of text and converting to columns
by PerlSufi (Friar) on Jun 24, 2013 at 17:47 UTC
    Hi gingeremmie. First, it is PerlMonks best practice to put your code in <code> </code> tags so we have a better idea of what you're trying to achieve. :)
    It will look like this:
    my $code = 'asdf';
      ... best practice [to] put your code in <code> </code> tags ...

      gingeremmie:
      ... and also data, input and output. For example, if the data of the OP had been in code tags, it would not be necessary to tediously edit (and possibly corrupt) the data in order to provide an example solution, and that might make it more likely that some busy monk will take the time to provide the example. Please see Writeup Formatting Tips, Markup in the Monastery and How do I post a question effectively?.
      (BTW: Because you posted originally as a registered monk, you can go back and update your OP to include the code tags! Hint, hint...)

Re: Parsing multiple rows of text and converting to columns
by Laurent_R (Parson) on Jun 24, 2013 at 17:57 UTC

    You could use a regex to fetch the last part of the line and read immediately the next line. Something like this:

    my %vals;
    while (my $line = <$IN>) {
        # Header lines end with the field name, e.g. "...|Timepoint>"
        if ($line =~ /\|([^|]+)>\s*$/) {
            my $element = $1;
            $line = <$IN>;    # the value is on the next line
            chomp $line;
            $vals{$element} = $line;
        }
    }

    Then you only need to print the hash.
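
    One caveat with a hash keyed only on the field name: record 2 would overwrite record 1. A minimal sketch of the same line-by-line idea, keyed on the record number as well, might look like the following. The sub name rows_from_lines and the in-memory sample data are assumptions made for illustration; in a real script $IN would be a handle on the data file.

```perl
use strict;
use warnings;

# Trimmed sample input (two fields per record) so the sketch runs on its own.
my $data = <<'END';
> <Study Design|2|Additional Findings|1|Timepoint>
15

> <Study Design|2|Additional Findings|1|Average Value>
412.22

> <Study Design|2|Additional Findings|2|Timepoint>
15

> <Study Design|2|Additional Findings|2|Average Value>
71.74
END

# Hypothetical helper: returns a header row followed by one
# tab-separated row per record number.
sub rows_from_lines {
    my ($text) = @_;
    open my $IN, '<', \$text or die "Cannot open in-memory file: $!";
    my (%vals, @columns, %seen);
    while (my $line = <$IN>) {
        # Capture the record number and the field name from a header line.
        next unless $line =~ /\|(\d+)\|([^|]+)>\s*$/;
        my ($record, $element) = ($1, $2);
        my $value = <$IN>;              # the value is on the next line
        last unless defined $value;
        chomp $value;
        $vals{$record}{$element} = $value;
        # Remember each field name once, in order of first appearance.
        push @columns, $element unless $seen{$element}++;
    }
    close $IN;
    my @rows = (join "\t", @columns);
    for my $record (sort { $a <=> $b } keys %vals) {
        push @rows, join "\t", map { $vals{$record}{$_} // '' } @columns;
    }
    return @rows;
}

print "$_\n" for rows_from_lines($data);
```

    Taking the column order from first appearance avoids hard-coding the field names, at the cost of depending on the order of the input.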

Re: Parsing multiple rows of text and converting to columns
by hdb (Prior) on Jun 25, 2013 at 06:23 UTC

    If you set $/ = "" then the <FILE> construct will read in paragraphs, i.e. blocks of lines separated by empty lines.

    Within these paragraphs, you can extract your data with a regular expression. For example, /Findings\|(\d+)\|(.*)>\n(.*)/ would extract the record number, the item description and the value as $1, $2 and $3. You could store them in a hash like this: $hash{$1}{$2} = $3, ready for further processing, sorting and printing.
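
    Put together, the paragraph-mode idea could be sketched roughly as follows. The sub name records_by_paragraph, the @columns list and the in-memory copy of the sample data are assumptions for illustration; the regex and the $hash{$1}{$2} = $3 layout are the ones described above.

```perl
use strict;
use warnings;

# Sample data copied from the original post.
my $data = <<'END';
> <Study Design|2|Additional Findings|1|Timepoint>
15

> <Study Design|2|Additional Findings|1|Timepoint Unit>
Days

> <Study Design|2|Additional Findings|1|Dose (Mg/Kg)>
20

> <Study Design|2|Additional Findings|1|Average Value>
412.22

> <Study Design|2|Additional Findings|1|Sd>
61.05

> <Study Design|2|Additional Findings|2|Timepoint>
15

> <Study Design|2|Additional Findings|2|Timepoint Unit>
Days

> <Study Design|2|Additional Findings|2|Dose (Mg/Kg)>
20

> <Study Design|2|Additional Findings|2|Average Value>
71.74

> <Study Design|2|Additional Findings|2|Sd>
11.07
END

# Assumed output column order, using the field names as they appear in the data.
my @columns = ('Timepoint', 'Timepoint Unit', 'Dose (Mg/Kg)', 'Average Value', 'Sd');

# Hypothetical helper: reads the text paragraph by paragraph and
# returns a reference to a hash of the form $hash{record}{field} = value.
sub records_by_paragraph {
    my ($text) = @_;
    open my $FILE, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "";    # paragraph mode: each read returns one blank-line-separated block
    my %hash;
    while (my $para = <$FILE>) {
        if ($para =~ /Findings\|(\d+)\|(.*)>\n(.*)/) {
            $hash{$1}{$2} = $3;
        }
    }
    close $FILE;
    return \%hash;
}

my $hash = records_by_paragraph($data);
print join("\t", @columns), "\n";
for my $record (sort { $a <=> $b } keys %$hash) {
    print join("\t", map { $hash->{$record}{$_} // '' } @columns), "\n";
}
```

    Using local keeps the change to $/ confined to the sub, so other reads in the program still work line by line. This only works if the records really are separated by blank lines.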

      Thank you everybody, much appreciated, I will go and do some experimenting with the ideas you have given me
      Not sure why I was told to post my code in code tags, because I didn't post any code. My code is nonexistent at the moment, hehe! I just posted the raw data I was trying to extract. Should that have been in code tags? I wouldn't have thought so, but I'm new to this, so I would be grateful if someone could let me know.
Re: Parsing multiple rows of text and converting to columns
by Cristoforo (Deacon) on Jun 25, 2013 at 14:52 UTC
    Does your data look like this (blank lines between the entries)?
    > <Study Design|2|Additional Findings|1|Timepoint>
    15

    > <Study Design|2|Additional Findings|1|Timepoint Unit>
    Days

    > <Study Design|2|Additional Findings|1|Dose (Mg/Kg)>
    20

    > <Study Design|2|Additional Findings|1|Average Value>
    412.22

    > <Study Design|2|Additional Findings|1|Sd>
    61.05

    > <Study Design|2|Additional Findings|2|Timepoint>
    15

    > <Study Design|2|Additional Findings|2|Timepoint Unit>
    Days

    > <Study Design|2|Additional Findings|2|Dose (Mg/Kg)>
    20

    > <Study Design|2|Additional Findings|2|Average Value>
    71.74

    > <Study Design|2|Additional Findings|2|Sd>
    11.07
    Or does it look like this (no blank lines in between)?
    > <Study Design|2|Additional Findings|1|Timepoint>
    15
    > <Study Design|2|Additional Findings|1|Timepoint Unit>
    Days
    > <Study Design|2|Additional Findings|1|Dose (Mg/Kg)>
    20
    > <Study Design|2|Additional Findings|1|Average Value>
    412.22
    > <Study Design|2|Additional Findings|1|Sd>
    61.05
    > <Study Design|2|Additional Findings|2|Timepoint>
    15
    > <Study Design|2|Additional Findings|2|Timepoint Unit>
    Days
    > <Study Design|2|Additional Findings|2|Dose (Mg/Kg)>
    20
    > <Study Design|2|Additional Findings|2|Average Value>
    71.74
    > <Study Design|2|Additional Findings|2|Sd>
    11.07
