PerlMonks  

Perl's hidden depths

by talexb (Chancellor)
on Nov 28, 2024 at 16:56 UTC ( [id://11162937]=perlmeditation )

I'm semi-retired, which means I take care of a client's system of Perl scripts that mostly run without my intervention. I log everything with the excellent Log::Log4perl module, and sometimes I tail those files to keep an eye on the various scripts that run. One group of scripts creates tickets for new orders, and other scripts update these tickets based on what Sage (the accounting system) says.

Eventually, I started to think about understanding the life-cycle of these tickets -- they get created (that's logged in one file), they get updated (logged in a couple of other files), and they get closed (logged in two other files). Could I parse all of the log files and see the life-cycle just by drawing inferences? It's an academic exercise, since all I have to do is query the ticketing system's API about the history of a ticket, but like I said, I'm mostly retired, and still curious.

The lines are like this:

    2024/11/28 10:54:04 INFO : Update ticket 425955 to add invoice 802436 tag .. OK
    2024/11/28 10:54:05 INFO : Update ticket 425912 to add invoice 802435 tag .. OK
    2024/11/28 10:54:06 INFO : Add note to ticket 425912 with info about invoice 802435 .. OK
    2024/11/28 10:57:02 INFO : Create FD ticket 425991 for order 662626 .. OK

So I created an AoH (array of hashes) data structure with the filename, a useful regular expression, and an action (create or update). (Because for me, it always starts with a data structure to organize the logic.) But then I realized each log file had different elements that needed collecting. How do I handle that without having to write code for each log file? Can't I just add something clever to my data structure?

Eventually, some of my brain cells told me I needed to use a named capture in the regular expressions to handle this. Other brain cells complained that I'd never used that before, but the first group of brain cells said, Nonsense (or Buck Up, I forget), it's all in the Camel if you just look.

So, when you're capturing stuff in a regexp with a clause like (\d+), that first capture just gets stashed in $1. But you can also name that capture (a feature I never needed until now), like this: (?<ticket>\d+). And you get it out by looking at the magic variable %+, so the ticket value is available as $+{ ticket }. SO COOL!
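
A minimal sketch of that idea, for anyone else who hasn't used named captures before. The log line is one of the samples above; the pattern is made up for this example and is not one of the production regexps.

```perl
use strict;
use warnings;

# Sample line taken from the log excerpt in this post.
my $line = '2024/11/28 10:57:02 INFO : Create FD ticket 425991 for order 662626 .. OK';

# Hypothetical pattern with two named captures, written only for this example.
my $re = qr/Create FD ticket (?<ticket>\d+) for order (?<order>\d+)/;

my %captured;
if ( $line =~ $re ) {
    # %+ reflects the most recent successful match,
    # so copy it if the values are needed later.
    %captured = %+;

    # Prints: ticket=425991 order=662626
    printf "ticket=%s order=%s\n", $+{ticket}, $+{order};
}
```

The numbered variables still work alongside the names, so `$1` and `$+{ticket}` hold the same value here.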

I was then able to write a bunch of regular expressions, all with named captures, and collect whatever I needed from the log lines. Then, if a particular element was there, I would add it to the history hash I was building. So one of the AoH entries looked like this:


    { filename => 'status.log',
      regexp   => qr/Update (?<ticket>\d+) status to (?<status>.+) \.\./,
      action   => 'update' },

Then, putting stuff into the history hash was this large statement:

    $history{ $+{ ticket } }{ $entry->{ action } } = {
                                         date  => $words[0],
                                        'time' => $words[1],
      ( exists ( $+{ order } ) ? (       order => $+{ order } ) : () ),
      ( exists ( $+{ invoice } ) ? (   invoice => $+{ invoice } ) : () ),
      ( exists ( $+{ shipment } ) ? ( shipment => $+{ shipment } ) : () ),
      ( exists ( $+{ scheduled_date } ) ? (
                                scheduled_date => $+{ scheduled_date } ) : () ),
      ( exists ( $+{ status } ) ? (     status => $+{ status } ) : () ),
    };

I wanted to do all of this in a single statement, rather than have individual if statements for each possible element.
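
To show how the pieces might fit together, here is a rough sketch of the whole loop. It is not the production code: the file names `create.log` and `invoice.log`, both patterns, and the in-memory `%log_lines` hash (standing in for reading the real files) are assumptions made so the example runs on its own. Only the shape of the AoH entries and of the assignment comes from the description above.

```perl
use strict;
use warnings;

# Stand-in for the real log files: filename => sample lines.
# The file names are hypothetical; the lines are samples from this post.
my %log_lines = (
    'create.log'  => [
        '2024/11/28 10:57:02 INFO : Create FD ticket 425991 for order 662626 .. OK',
    ],
    'invoice.log' => [
        '2024/11/28 10:54:04 INFO : Update ticket 425955 to add invoice 802436 tag .. OK',
    ],
);

# The AoH: one entry per log file, each with its own named captures.
# Both patterns are assumptions written to fit the sample lines.
my @sources = (
    { filename => 'create.log',
      regexp   => qr/Create FD ticket (?<ticket>\d+) for order (?<order>\d+)/,
      action   => 'create' },
    { filename => 'invoice.log',
      regexp   => qr/Update ticket (?<ticket>\d+) to add invoice (?<invoice>\d+)/,
      action   => 'update' },
);

my %history;
for my $entry (@sources) {
    for my $line ( @{ $log_lines{ $entry->{filename} } } ) {
        # Split first, so nothing runs between the match and the use of %+.
        my @words = split ' ', $line;
        next unless $line =~ $entry->{regexp};

        # Only the elements this particular regexp captured are added.
        $history{ $+{ticket} }{ $entry->{action} } = {
            date   => $words[0],
            'time' => $words[1],
            ( exists $+{order}   ? ( order   => $+{order} )   : () ),
            ( exists $+{invoice} ? ( invoice => $+{invoice} ) : () ),
            ( exists $+{status}  ? ( status  => $+{status} )  : () ),
        };
    }
}
```

Because the history is keyed by ticket and then by action, a later update line for the same ticket replaces the earlier one in this sketch.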

The code runs fine, and does what I expect. Named captures are a very cool feature, and they do exactly what I needed. Props to all the smart folks who came up with that idea (and then implemented it). What a cool language.

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Replies are listed 'Best First'.
Re: Perl's hidden depths
by ikegami (Patriarch) on Nov 28, 2024 at 17:18 UTC

    You have the following:

    $history{ $+{ ticket } }{ $entry->{ action } } = {
                                         date  => $words[0],
                                        'time' => $words[1],
      ( exists ( $+{ order } ) ? (       order => $+{ order } ) : () ),
      ( exists ( $+{ invoice } ) ? (   invoice => $+{ invoice } ) : () ),
      ( exists ( $+{ shipment } ) ? ( shipment => $+{ shipment } ) : () ),
      ( exists ( $+{ scheduled_date } ) ? (
                                scheduled_date => $+{ scheduled_date } ) : () ),
      ( exists ( $+{ status } ) ? (     status => $+{ status } ) : () ),
    };
    

    That's a lot less readable than the following:

    $history{ $+{ ticket } }{ $entry->{ action } } = {
                                         date           => $words[0],
                                         time           => $words[1],
      ( exists( $+{ order          } ) ? order          => $+{ order          } : () ),
      ( exists( $+{ invoice        } ) ? invoice        => $+{ invoice        } : () ),
      ( exists( $+{ shipment       } ) ? shipment       => $+{ shipment       } : () ),
      ( exists( $+{ scheduled_date } ) ? scheduled_date => $+{ scheduled_date } : () ),
      ( exists( $+{ status         } ) ? status         => $+{ status         } : () ),
    };
    

    It can be further cleaned using a loop.

    $history{ $+{ ticket } }{ $entry->{ action } } = {
       date => $words[0],
       time => $words[1],
       (\%+)->%{ grep exists( $+{ $_ } ), qw( order invoice shipment scheduled_date status ) }
    };

    The mess can be hidden in a sub.

    sub kv { my $h = shift; $h->%{ grep exists( $h->{ $_ } ), @_ } }

    kv( \%+, qw( ... ) )
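
A small runnable sketch of the `kv` helper in use, assuming Perl 5.24 or later for the postfix `->%{...}` slice. The pattern is invented for the example; the log line is one of the samples from the original post.

```perl
use v5.24;      # postfix dereference ($h->%{...}) is stable from 5.24
use warnings;

# Return key/value pairs from hash ref $h, limited to the listed keys that exist.
sub kv { my $h = shift; $h->%{ grep exists( $h->{ $_ } ), @_ } }

my $line = '2024/11/28 10:57:02 INFO : Create FD ticket 425991 for order 662626 .. OK';

my %fields;
if ( $line =~ /Create FD ticket (?<ticket>\d+) for order (?<order>\d+)/ ) {
    # Ask for every optional field; only the ones this regexp captured come back.
    %fields = kv( \%+, qw( order invoice shipment scheduled_date status ) );
}
# %fields now holds just ( order => 662626 ); ticket was not requested.
```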

    We can simplify further if we don't need 100% equivalence.

    If you don't care about undef fields, it simplifies to the following:

    $history{ $+{ ticket } }{ $entry->{ action } } = {
       date => $words[0],
       time => $words[1],
       (\%+)->%{qw( order invoice shipment scheduled_date status )}
    };

    If there are no extra captures (or you don't care if they end up as extra fields), it simplifies to the following:

    $history{ $+{ ticket } }{ $entry->{ action } } = {
       date => $words[0],
       time => $words[1],
       %+,
    };
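
The difference between those last two variants can be seen in a short sketch. It assumes Perl 5.20 or later for the key/value slice, works on a copy of `%+` named `%cap` (a name chosen for this example), and uses a made-up pattern.

```perl
use v5.20;      # key/value hash slices (%hash{...}) need 5.20
use warnings;

my $line = '2024/11/28 10:57:02 INFO : Create FD ticket 425991 for order 662626 .. OK';
$line =~ /Create FD ticket (?<ticket>\d+) for order (?<order>\d+)/
    or die "sample line did not match";

# Copy the captures so later matches cannot change them.
my %cap = %+;

# Plain slice: every requested key is present; uncaptured ones are undef.
my %sliced = %cap{ qw( order invoice ) };

# Whole hash: only what was captured, but that includes ticket as well.
my %spread = %cap;
```

Here `%sliced` has an `invoice` key whose value is undef, while `%spread` has no `invoice` key but does carry `ticket`.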

      .. and that's why posting to Perlmonks is so awesome. Someone way brighter can point out how the code can be simplified even more. Thanks!

      Alex / talexb / Toronto

