fennewald has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing some one-off functionality to work with other CLI tools. I was curious (1) what the idiomatic way is to do what I describe, and (2) whether there is a faster way.

This specific example deals with the command gst-inspect-1.0. You don't need to know what it does, just this: calling gst-inspect-1.0 without arguments returns a list of plugins, and calling gst-inspect-1.0 with a plugin name returns information about that specific plugin. Each call to gst-inspect-1.0 takes about half a second, give or take (so surprisingly long).

I want to work with the description for each plugin, so I wrote the following snippet:

my %plugins =
    map  { $_ => scalar `gst-inspect-1.0 $_` }   # Inspect the plugin
    map  { $1 if /^\w+:\s+(\w+)\b/ }             # Reduce to plugin name
    grep { /^\w+:/ }                             # Filter header and empty lines out
    split /\n/, `gst-inspect-1.0`;               # Bring in input

Is my code idiomatic? Should I be creating separate arrays for each of these steps? Is there any way to have line 2 run in parallel? Each call does take a while and it adds up fast with ~1000 plugins!

Replies are listed 'Best First'.
Re: More Effecient Method Chaining
by ikegami (Pope) on Apr 21, 2021 at 22:20 UTC

    You shell out multiple times. The performance of the Perl code seems inconsequential.

    This post covers some cleanups (but leaves the important question of parallelism unanswered).


    I don't think

    map { $1 if /^\w+:\s+(\w+)\b/ }

    has a defined behaviour. You want

    map { /^\w+:\s+(\w+)\b/ }

    And the grep { /^\w+:/ } is completely redundant.
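
    To illustrate the point about the bare match: in list context a successful match returns its captures, and a failed match returns an empty list, so map drops non-matching lines by itself. A minimal sketch follows; the sample lines are made up to resemble the command's listing and are an assumption, not real output.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the output of `gst-inspect-1.0`.
my @lines = (
    "coreelements:  capsfilter: CapsFilter\n",
    "\n",
    "coreelements:  fakesink: Fake Sink\n",
    "Total count: 2 plugins\n",
);

# In list context a match returns ($1) on success and () on failure,
# so lines that do not match simply contribute nothing to the result.
my @names = map { /^\w+:\s+(\w+)\b/ } @lines;

print "$_\n" for @names;
```

    Because the empty and summary lines yield an empty list, no separate grep is needed in front of the map.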


    Backticks return the captured output as lines in list context, so

    split /\n/, `gst-inspect-1.0`

    can be replaced with

    `gst-inspect-1.0`

    It's not equivalent since it leaves the line feeds in, but that's not an issue for you.


    Cleaned:

    my %plugins =
        map { $_ => scalar `gst-inspect-1.0 $_` }
        map { /^\w+:\s+(\w+)\b/ }
        `gst-inspect-1.0`;

    You could even combine the two maps, but I wouldn't.

    my %plugins =
        map { /^\w+:\s+(\w+)\b/ ? $1 => scalar `gst-inspect-1.0 $1` : () }
        `gst-inspect-1.0`;


      > shell out
      Concur, dr. If this external "binary" is a perl program, OP is committing great shame in not at least crafting a modulino and using the functionality in the same process as a simple library call.

        > shell out Concur, dr. If this external "binary" is a perl program, OP is committing great shame in not at least crafting a modulino and using the functionality in the same process as a simple library call.

        it is shameful to call shame

        or promote modulino

Re: More Effecient Method Chaining
by eyepopslikeamosquito (Bishop) on Apr 22, 2021 at 09:21 UTC

    This specific example deals with the command gst-inspect-1.0. You don't need to know what it does, just this. Calling gst-inspect-1.0 without arguments returns a list of plugins. Calling gst-inspect-1.0 with a plugin name returns information about that specific plugin. Each call to gst-inspect-1.0 takes about half a second, give or take (so surprisingly long).
    Please don't assume we don't need to know. Given that running this command takes "surprisingly long", we might be able to offer tips on how to speed it up, for example by avoiding the system shell, or by "batching" the command somehow.

    I further noticed that you are not checking for errors when running this external command. A well-behaved command should exit with code zero on success and 1-255 on failure, and should also write a clear error message to stderr (but you are not checking for any of this).
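
    As a rough sketch of what such a check could look like, the helper below (capture_or_die is a hypothetical name, not something from the original post) runs a command through a list-form pipe open, which bypasses the shell, and dies if the command cannot be started or exits non-zero. It assumes a Unix-like system.

```perl
use strict;
use warnings;

# Capture a command's stdout; die with a message if it cannot be run
# or if it exits with a non-zero status. The list form of open avoids the shell.
sub capture_or_die {
    my @cmd = @_;
    open my $fh, '-|', @cmd
        or die "Cannot start '@cmd': $!\n";
    my $out = do { local $/; <$fh> };    # slurp everything the command printed
    unless (close $fh) {
        # For a piped open, close fails on a non-zero exit; $! is then 0.
        die $! ? "Error closing pipe from '@cmd': $!\n"
               : "'@cmd' exited with status " . ($? >> 8) . "\n";
    }
    return $out;
}
```

    In the original snippet, scalar `gst-inspect-1.0 $_` could then become capture_or_die('gst-inspect-1.0', $_). Anything the command writes to stderr still goes to the terminal, since only stdout is captured here.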

    Googling found this gst-inspect-1.0 command. Is that the one you are using? From its description:

    gst-inspect-1.0 is a tool that prints out information on available GStreamer plugins, information about a particular plugin, or information about a particular element. When executed with no PLUGIN or ELEMENT argument, gst-inspect-1.0 will print a list of all plugins and elements together with a summary. When executed with a PLUGIN or ELEMENT argument, gst-inspect-1.0 will print information about that plug-in or element.
    I also see it may run on multiple operating systems (Linux, Windows, ...). Do you need it to run on Linux and Windows? Or just a single platform? Is performance of running this command important to you? What about security?

    To give you a feel for where I'm coming from, have a look at these old nodes:

    As you can see, running external commands safely, robustly, portably and efficiently is surprisingly tricky.

Re: More Effecient Method Chaining
by jcb (Parson) on Apr 23, 2021 at 23:14 UTC

    First, run the script as it is with top in another window. We will need to know if gst-inspect-1.0 is actually burning half a second or so of CPU time or just spending most of its time blocked for some strange reason. If it is "blocky", running many instances in parallel will result in a huge performance improvement. If it is wasteful, running many instances in parallel will only help to the limit of your available hardware threads.
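
    If parallel runs do turn out to help, one way to sketch them with core Perl only is to open several pipes before reading any of them, so a whole batch of children runs concurrently. The function name inspect_parallel and the batch size are assumptions for illustration, and the sketch assumes a Unix-like system.

```perl
use strict;
use warnings;

# Run "@$cmd $name" for every name, with at most $max children alive at once.
# Returns a list of name => captured stdout pairs.
sub inspect_parallel {
    my ($cmd, $names, $max) = @_;
    my %output;
    my @queue = @$names;
    while (my @batch = splice @queue, 0, $max) {
        # Each list-form pipe open starts its child immediately, without a shell.
        my %fh;
        for my $name (@batch) {
            open $fh{$name}, '-|', @$cmd, $name
                or die "Cannot run @$cmd $name: $!\n";
        }
        # The whole batch is already running; now collect the output one by one.
        for my $name (@batch) {
            local $/;    # slurp mode
            $output{$name} = readline $fh{$name};
            close $fh{$name}
                or warn "@$cmd $name exited with status ", $? >> 8, "\n";
        }
    }
    return %output;
}

# Possible usage, given a list of plugin names in @names:
#   my %plugins = inspect_parallel(['gst-inspect-1.0'], \@names, 8);
```

    Batching keeps the number of simultaneous processes bounded; a sensible value for the batch size depends on whether the command is CPU-bound or mostly blocked, as discussed above.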

    A better option may be to cache the gst plugins in a simple database. There is an SDBM module (SDBM_File) bundled with Perl that provides tied persistent hashes with a few limitations, most notably that each record (key and value together) can be no larger than about 1KB. There are other DBM modules also available, but SDBM is self-contained, whereas the others require the relevant libraries to be available when Perl is built.
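
    A minimal sketch of such a cache with SDBM_File might look like the following. The helper names open_cache and cached are hypothetical, and the record size limit means that a long plugin description would make the store fail, so this only suits short values.

```perl
use strict;
use warnings;
use Fcntl;        # exports O_RDWR and O_CREAT
use SDBM_File;

# Tie a hash to an SDBM file (created on first use) and return a reference to it.
sub open_cache {
    my ($path) = @_;
    tie my %cache, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666
        or die "Cannot open SDBM file $path: $!\n";
    return \%cache;
}

# Return the cached value for $key, computing and storing it only on a miss.
sub cached {
    my ($cache, $key, $compute) = @_;
    $cache->{$key} = $compute->($key) unless exists $cache->{$key};
    return $cache->{$key};
}

# Possible usage (the file name is an assumption):
#   my $cache = open_cache('gst-plugins');
#   my $info  = cached($cache, $name, sub { scalar `gst-inspect-1.0 $_[0]` });
```

    On later runs the tied hash already holds the earlier results, so the slow command is only invoked for names that have not been seen before.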

    If DBM is insufficient or unavailable, DBI provides SQL database bindings, with the DBD::SQLite backend bundling the required SQLite.

    I have used both SDBM and SQLite in the past, with the deciding factor being the application. Since you do not mention what you seek to accomplish, I cannot really recommend one over the other for you.