PerlMonks  

Help in manipulating values from two arrays

by rsriram (Hermit)
on Mar 24, 2007 at 08:02 UTC (#606394, perlquestion)

rsriram has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I am working on a coded file in the following format:

<act>Key</act><emp>3384</emp><job>78082</job><chap>6</chap><pg>20</pg><time>0.7</time><prod>114.285714285714</prod>
<act>Reconcile</act><emp>3017</emp><job>78062</job><chap>2-7</chap><pg>0</pg><time>1.4</time><prod>Insufficient Information</prod>
<act>Training</act><emp>3384</emp><job>77654</job><chap>-</chap><pg>0</pg><time>5.1</time><prod>Non-Billable</prod>
<act>Management</act><emp>3017</emp><job>77893</job><chap>-</chap><pg>0</pg><time>4.4</time><prod>Non-Billable</prod>
<act>Break</act><emp>3379</emp><job>33843</job><chap>-</chap><pg>0</pg><time>0.2</time><prod>Non-Billable</prod>
<act>Excess overload</act><emp>3379</emp><job>77570</job><chap>14</chap><pg>1</pg><time>0.5</time><prod>6.66666666666667</prod>
<act>Management</act><emp>3123</emp><job>88898</job><chap>-</chap><pg>0</pg><time>0.5</time><prod>Non-Billable</prod>
<act>Management</act><emp>3123</emp><job>22304</job><chap>-</chap><pg>0</pg><time>0.3</time><prod>Insufficient Information</prod>
<act>Management</act><emp>3123</emp><job>11121</job><chap>-</chap><pg>0</pg><time>1.4</time><prod>Non-Billable</prod>
<act>Adapt</act><emp>3123</emp><job>78143</job><chap>08-</chap><pg>0</pg><time>0.3</time><prod>Insufficient Information</prod>
<act>Import</act><emp>3417</emp><job>76584</job><chap>App K</chap><pg>4</pg><time>1.0</time><prod>11.4285714285714</prod>
<act>Break</act><emp>3123</emp><job>22732</job><chap>-</chap><pg>0</pg><time>0.4</time><prod>50.65687</prod>
<act>key</act><emp>3123</emp><job>78143</job><chap>08</chap><pg>0</pg><time>3.3</time><prod>45.5544</prod>
<act>Supervision</act><emp>3192</emp><job>54281</job><chap>-</chap><pg>0</pg><time>4.0</time><prod>Non-Billable</prod>

In the above file, <emp> is the employee number, and I want to print the average productivity (<prod>) for every <emp>. Records whose <prod> element does not hold a number (e.g. Non-Billable or Insufficient Information) should be left out of the average. The output should look like this:

3384 - 114.285714285714
3379 -
3017 -
3379 - 6.66666666666667
3123 - 48.105635
3417 - 11.4285714285714
3192 -

I tried the following code, but I could not get it to work.

while(<F5>) {
    for my $a(0..$#inlst) {
        if($_ =~ /<emp>@inlst[$a]<\/emp>/) {
            $_ =~ /<prod>(.+?)<\/prod>/g;
            $prod=$1;
            if($prod > 0) {
                $sum=$sum+$prod;
            }
        }
    }
    print "@inlst[$a]\t$sum\n";
}
}

@inlst contains all the employee codes. Can anyone help me with this?

Replies are listed 'Best First'.
Re: Help in manipulating values from two arrays
by GrandFather (Saint) on Mar 24, 2007 at 08:44 UTC

    How many times do we have to tell the children? Do not hand-roll code to parse HTML/XML; life just ain't long enough for that to be worthwhile, even as an exercise.

    Use HTML::TreeBuilder for HTML or XML::Twig for XML. In this case it looks like XML, so let's wrap a root element around the sample data provided and see what we can do:

    use strict;
    use warnings;
    use XML::Twig;

    my $xml = <<XML;
    <root>
    <!-- the sample records from the question go here -->
    </root>
    XML

    my $t= XML::Twig->new (twig_roots => {emp => \&emp, prod => \&prod});
    my %emp;
    my $currEmp;

    $t->parse ($xml);
    print "$_ - $emp{$_}\n" for sort keys %emp;

    sub emp {
        my ($t, $data) = @_;
        $currEmp = $data->trimmed_text ();
        $emp{$currEmp} ||= '';
    }

    sub prod {
        my ($t, $data) = @_;
        my $text = $data->trimmed_text ();
        return if $text !~ /^\d+(\.\d*)?/;
        $emp{$currEmp} ||= 0;
        $emp{$currEmp} += $text;
    }

    Prints:

    3017 -
    3123 - 96.21127
    3192 -
    3379 - 6.66666666666667
    3384 - 114.285714285714
    3417 - 11.4285714285714
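    Note that the figures printed above are per-employee totals (3123 shows 96.21127, which is 50.65687 + 45.5544), while the question asked for averages. What follows is a minimal sketch of the averaging step only, assuming that whatever parser is used hands over (employee, prod text) pairs. The helper name `average_by_emp` is hypothetical and not part of the thread or of XML::Twig, and the end-anchored number pattern is an assumption that is slightly stricter than the one in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes [emp, prod_text] pairs and
# returns a hash of emp => average, or '' when no numeric value was seen.
sub average_by_emp {
    my @pairs = @_;
    my (%sum, %count, %avg);
    for my $pair (@pairs) {
        my ($emp, $prod) = @$pair;
        $avg{$emp} = '' unless exists $avg{$emp};   # remember every employee
        next unless $prod =~ /^\d+(?:\.\d*)?$/;     # skip "Non-Billable" etc.
        $sum{$emp}   += $prod;                      # running total
        $count{$emp} += 1;                          # number of numeric values
    }
    $avg{$_} = $sum{$_} / $count{$_} for keys %count;
    return %avg;
}

# Usage with a few values taken from the sample data in the question.
my %avg = average_by_emp(
    [3123, 'Non-Billable'],
    [3123, '50.65687'],
    [3123, '45.5544'],
    [3192, 'Non-Billable'],
);
print "$_ - $avg{$_}\n" for sort keys %avg;
```

    Keeping a separate count per employee is what turns the total into a mean; employees with no numeric value keep the empty string, as in the requested output.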

    DWIM is Perl's answer to Gödel
      I, for one, often like to find the solutions that don't involve modules. As a fairly new coder, I am not confident in their use. Additionally, I think that solving problems without a module, where that is possible, enhances my understanding of how Perl works.

      On the other hand, learning something about modules is helpful as well... and helping everyone learn is what this site is supposed to be about.

        All problems that can be solved using a module (written by someone else) can be solved without using the module. It's just that you may end up rewriting the module! Redoing hundreds or thousands of hours of work may be a good way of learning, but it doesn't get the task at hand achieved in a timely fashion.

        Of course if you really want to learn stuff try solving the same problems in assembly language or Ook! - you'll learn all sorts of stuff about frustration and low productivity, but those are probably not the things you want to learn.

        One of the important lessons to learn here is that there are a lot of very clever people writing modules for Perl and making them freely available. Using those modules can save you a lot of time. Peeking at the internals of those modules can teach you a lot about coding techniques. Using modules you can win both ways - learning and saving time.


        DWIM is Perl's answer to Gödel
        I, for one, often like to find the solutions that don't involve modules. As a fairly new coder, I am not confident in their use.

        That attitude can be a little bit dangerous. If you have a chance, skim the XML 1.0 specification. I'm certainly not going to hand-roll code to parse XML in a couple of hours, and I'm a fairly experienced coder.

        You're a lot better off spending your learning time figuring out how to take advantage of work other people have already done. This particular case is awfully complex.

      hi

      Thank you very much!! Your code is amazing. I just need help with one more thing. I have the contents (<act>key</act>...) in a separate XML file. It does not work when I use open, or when I store the file in a variable and use that variable in the place where you put the content. Can you please tell me how I can read an external file here?

      Thanks for all your help with this.

        It's not clear to me what you are trying to achieve with the separate file. Perhaps you could sketch in code what you are trying to do? Not a fully worked attempt to provide a solution to the problem, but an outline of the steps you think are required.


        DWIM is Perl's answer to Gödel
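        One possible reading of the follow-up is that the records sit in a file of their own, with no enclosing root element. Under that assumption, a sketch along the following lines reads the file into a string, wraps it in <root>, and hands it to the same handlers as before. The sub name `totals_from_file` and passing the file name on the command line are hypothetical choices, since the thread never names the file. If the file already has a single root element (or an XML declaration), the wrapping step should be dropped and XML::Twig's parsefile method used instead.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical wrapper (not from the thread): returns emp => total for the
# records stored in $file, which is assumed to have no root element.
sub totals_from_file {
    my ($file) = @_;

    # Slurp the whole file into one string.
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $records = do { local $/; <$fh> };
    close $fh;

    # Wrap the records so the parser sees one well-formed document.
    my $xml = "<root>\n$records\n</root>";

    my (%emp, $currEmp);
    my $t = XML::Twig->new (twig_roots => {
        emp  => sub {
            $currEmp = $_[1]->trimmed_text ();
            $emp{$currEmp} ||= '';
        },
        prod => sub {
            my $text = $_[1]->trimmed_text ();
            return if $text !~ /^\d+(\.\d*)?/;   # skip non-numeric entries
            $emp{$currEmp} ||= 0;
            $emp{$currEmp} += $text;
        },
    });
    $t->parse ($xml);
    return %emp;
}

# Usage: perl script.pl records.xml  (the file name is only an example)
if (@ARGV) {
    my %emp = totals_from_file ($ARGV[0]);
    print "$_ - $emp{$_}\n" for sort keys %emp;
}
```

        Like the original answer, this prints totals per employee rather than averages.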
Re: Help in manipulating values from two arrays
by Anno (Deacon) on Mar 24, 2007 at 09:02 UTC
    Your output sample shows employee 3379 twice. I'll assume that is an error.

    You are assuming you already have a list @inlst of employees, though your code doesn't show where it comes from. The program should probably build the list as it goes along.

    Here is one way to do it (__DATA__ section not shown):

    use List::Util qw( sum);

    my (@emp_lst, %prod);
    while ( <DATA> ) {
        # A do block is needed here: in "warn ..., next" the next would run
        # while warn's argument list is built, so nothing would be printed.
        my ( $emp) = m{<emp>(\d+)</emp>}
            or do { warn "No <emp> in line $.\n"; next };
        push @emp_lst, $emp unless $prod{ $emp};
        $prod{ $emp} ||= [];
        my ( $prod) = m{<prod>([.\d]+)</prod>} or next;
        push @{ $prod{ $emp} }, $prod;
    }
    for ( values %prod ) {
        $_ = @$_ ? sum( @$_)/@$_ : '';
    }
    print "$_ - $prod{ $_}\n" for @emp_lst;
    Anno

    Update: I agree with GrandFather that an appropriate parser is a better solution.

    Update: Noticed an optimisation. The pesky $prod{ $emp} ||= []; can be dropped from the while loop, and everything can rely on autovivification, as it should:

    while ( <DATA> ) {
        # do block so that the warning is printed before next leaves the iteration
        my ( $emp) = m{<emp>(\d+)</emp>}
            or do { warn "No <emp> in line $.\n"; next };
        push @emp_lst, $emp unless $prod{ $emp};
        push @{ $prod{ $emp} }, m{<prod>([.\d]+)</prod>};
    }
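    The update shows only the loop. As a sketch of how it might fit together with the averaging step from the first listing, the version below wraps it in a sub that takes the record lines as arguments. The sub name `averages` and the choice to return references are assumptions made for illustration, not part of the reply.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical wrapper around the updated loop: takes record lines, returns
# the employees in first-seen order and a hash of their averages.
sub averages {
    my @lines = @_;
    my (@emp_lst, %prod);
    for my $line (@lines) {
        my ($emp) = $line =~ m{<emp>(\d+)</emp>} or next;
        push @emp_lst, $emp unless $prod{$emp};
        # Dereferencing $prod{$emp} autovivifies an empty array, even when
        # the match fails and there is nothing to push.
        push @{ $prod{$emp} }, $line =~ m{<prod>([.\d]+)</prod>};
    }
    my %avg;
    for my $emp (@emp_lst) {
        my @vals = @{ $prod{$emp} };
        $avg{$emp} = @vals ? sum(@vals) / @vals : '';   # '' when nothing numeric
    }
    return (\@emp_lst, \%avg);
}

# Usage with three records from the sample data (other fields omitted).
my ($order, $avg) = averages(
    '<emp>3123</emp><prod>50.65687</prod>',
    '<emp>3192</emp><prod>Non-Billable</prod>',
    '<emp>3123</emp><prod>45.5544</prod>',
);
print "$_ - $avg->{$_}\n" for @$order;
```

    Because the push autovivifies the array even for a non-numeric <prod>, an employee whose first record is "Non-Billable" still gets a true entry in %prod and is added to @emp_lst only once.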
