Re: Splitting up quoted/escaped command line arguments
by Tanktalus (Canon)
on Feb 12, 2014 at 00:33 UTC
Suggestion: Skip it all, and call system.
Ok, I know. You said that the shell involves a lot of overhead, and it's significant. However, compared to the overhead of ssh and the remote shell and the actual code you're calling on the remote side? Maybe not so much. Would have to benchmark the full thing.
Also, all the work you're doing in pretending to be the shell? You're writing it in perl instead of C. Not sure that'll be a win. The only overhead you're saving is re-initialisation of the C runtime library, and that will get partially eaten up by the fact that you're parsing in perl while the shell parses in C. Remember that the shell parser has two advantages over code you might write: 1. it's written in C, and has likely been overoptimised over the years, and 2. it's correct by definition: bugs have been worked out over the last few decades, and your sysadmins are used to the bugs that remain (thinking of them as features, like the method to escape single quotes). If you aren't bug-for-bug compatible with the shell, it'll be you that is wrong, not the sysadmin. You can't win that game, only triage it until the number of bug reports coming in over your misparsing slows to a manageable crawl.
There are other ways to mitigate this. Some of them are crazier than others.
One is to reduce your fork overhead. If your perl process takes up a lot of memory, then every time you fork and exec, regardless of what you exec, that's a lot of copy-on-write mappings to set up and then throw away again. I've seen the author of AnyEvent::Fork create a small template process that he shunts the work of forking off to. That process is kept as small as possible, and is then instructed by the parent as to what it should fork and exec. He claims a speedup from that.
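To make the idea concrete, here is a rough sketch using only core Perl. It is not how AnyEvent::Fork does it, just the same principle: fork a helper early, while the process is still small, and let that helper do every later fork and exec. The name run_via_helper, the one-command-per-line protocol, and the use of /bin/sh are all my own assumptions.

```perl
use strict;
use warnings;
use IO::Handle;

# Assumption: this runs at program start, before the parent grows large.
pipe(my $cmd_r, my $cmd_w) or die "pipe: $!";   # parent -> helper: commands
pipe(my $res_r, my $res_w) or die "pipe: $!";   # helper -> parent: wait statuses

defined(my $helper_pid = fork()) or die "fork: $!";
if ($helper_pid == 0) {
    # Helper: stays small and does the fork+exec on the parent's behalf.
    close $cmd_w;
    close $res_r;
    $res_w->autoflush(1);
    while (defined(my $cmd = <$cmd_r>)) {
        chomp $cmd;
        my $status = system('/bin/sh', '-c', $cmd);
        print {$res_w} "$status\n";
    }
    exit 0;
}
close $cmd_r;
close $res_w;
$cmd_w->autoflush(1);

# Hypothetical helper: run one single-line command, return its exit code
# (or -1 if the command could not be started at all).
sub run_via_helper {
    my ($cmd) = @_;
    die "command must be a single line\n" if $cmd =~ /\n/;
    print {$cmd_w} "$cmd\n";
    my $status = <$res_r>;
    defined $status or die "helper process went away\n";
    chomp $status;
    return $status < 0 ? -1 : $status >> 8;
}
```

The parent never forks again after startup; it only writes a line and reads a line. Whether that is actually faster for your workload is something you would have to benchmark.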
Another one is to leave the shell open. Basically, open a shell, and run your command there, but leave it open. Something like this:
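A minimal sketch of that idea, assuming a POSIX /bin/sh; the helper name run_serially is made up for illustration:

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: feed each command to one long-lived shell.
sub run_serially {
    my @commands = @_;
    open(my $sh, '|-', '/bin/sh') or die "cannot start /bin/sh: $!";
    $sh->autoflush(1);
    for my $cmd (@commands) {
        # The parentheses run the command in a subshell, so cd and variable
        # changes do not leak into the next command. </dev/null stops the
        # command from reading the pipe that carries the following commands.
        print {$sh} "( $cmd ) </dev/null\n";
    }
    close($sh);        # waits for the shell to finish and sets $?
    return $? >> 8;    # exit code of the last command the shell ran
}

# The second pwd still prints the original directory, not /tmp.
run_serially('cd /tmp && pwd', 'pwd');
```

The commands' output simply goes to whatever stdout the perl process has, and only the last exit code comes back.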
You can do a bit more here. For example, if you use IPC::Open2 or IPC::Open3, you can capture stdout and stderr, and you can then encode the return code in the shell output as well. Or, with a bit more work, you can do ; echo $rc >&3 inside there, which requires you to have filehandle 3 open so that you can receive what it prints. Notice that I'm using parentheses here in an attempt to limit environment changes, including the current working directory.
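Here is roughly what the IPC::Open2 variant might look like, with the return code encoded in the output. The marker string and the helper name run_in_shell are assumptions of mine, and stderr is simply folded into stdout to keep the sketch short (IPC::Open3 would let you keep them apart).

```perl
use strict;
use warnings;
use IO::Handle;
use IPC::Open2;

# One long-lived shell; open2 hands back its stdout and its stdin.
my $shell_pid = open2(my $from_sh, my $to_sh, '/bin/sh');
$to_sh->autoflush(1);

# Hypothetical marker; it only has to be unlikely to appear in real output.
my $marker = '__CMD_DONE_' . $$ . '__';

# Hypothetical helper: returns (exit code, combined stdout+stderr).
sub run_in_shell {
    my ($cmd) = @_;
    # Subshell in parentheses, stdin detached, then a marker line with $?.
    print {$to_sh} "( $cmd ) </dev/null 2>&1; echo \"$marker \$?\"\n";
    my $output = '';
    while (defined(my $line = <$from_sh>)) {
        # The marker may follow output that lacked a trailing newline.
        if ($line =~ /^(.*)\Q$marker\E (\d+)$/) {
            return ($2, $output . $1);
        }
        $output .= $line;
    }
    die "shell exited unexpectedly\n";
}

my ($rc, $text) = run_in_shell('echo hi; exit 2');
print "rc=$rc output=$text";
```

Each call blocks until the marker line arrives, so a command that never finishes will hang the caller; a timeout is left out of this sketch.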
You can run multiple commands through this shell, eliminating all the startup costs while keeping the shell's ability to parse the commands. There is some risk of bleed-through (a command with mismatched parentheses can ruin your whole day), but you can blame your sysadmins for those :) On the other hand, each command here runs serially, though you could keep multiple shells open behind a job queue to run them in parallel if you wanted.
Note that something like AnyEvent::Util::run_cmd can make this less difficult to handle, IMO. YMMV. There are likely other similar, or even better, options on CPAN. Finding them and figuring them out is left as an exercise for the reader :)
Note that I currently have a system that runs multiple ssh's in parallel to multiple co-located servers. I had planned to figure out a way to re-use ssh connections, since connection setup looked like a potential performance bottleneck when I'm calling ssh thousands of times. But I've not gone down that road in the four years I've been doing this because, quite simply, the performance hit has not been significant enough to warrant the time. (That could be very different if I was ssh'ing over VPN to another continent. I don't know. But that's not possible for my current job, so I'm unconcerned with it.) At this point, I might save 5-10 seconds over the course of a 20-hour job. Probably not even that much.