Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Cool Uses for Perl

( #1044=superdoc: print w/ replies, xml ) Need Help??

This section is the place to post your general code offerings.

CUFP's
Identifying scripts (writing systems)
2 direct replies — Read more / Contribute
by AppleFritter
on Sep 16, 2014 at 17:32

    Dear monks and nuns, priests and scribes, popes and antipopes, saints and stowaways lurking in the monastery, lend me your ears. (I promise I'll return them.) I'm still hardly an experienced Perl (user|programmer|hacker), but allow me to regale you with a story of how Perl has been helping me Get Things Done™; a Cool Use for Perl, or so I think.

    I was recently faced with the problem of producing, given a number of lines each written in a specific script (i.e. writing system; Latin, Katakana, Cyrillic etc.), a breakdown of scripts used and how often they appeared. Exactly the sort of problem Perl was made for - and thanks to regular expressions and Unicode character classes, a breeze, right?

    I started by hardcoding a number of scripts to match my snippets of text against:

    my %scripts; foreach (@lines) { my $script = m/^\p{Script=Latin}*$/ ? "Latin" : m/^\p{Script=Cyrillic}*$/ ? "Cyrillic" : m/^\p{Script=Han}*$/ ? "Han" : # ... "(unknown)"; $scripts{$script}++; }

    Obviously there's a lot of repetition going on there, and though I had a list of scripts for my sample data, I wasn't sure new and uncontemplated scripts wouldn't show up in the future. So why not make a list of all possible scripts, and replace the hard-coded list with a loop?

    my %scripts; LINE: foreach my $line (@lines) { foreach my $script (@known_scripts) { next unless $line =~ m/^\p{Script=$script}*$/; $scripts{$script}++; next LINE; } $scripts{'(unknown)'}++; }

    So far, so good, but now I needed a list of the scripts that Perl knew about. Not a problem, I thought, I'll just check perluniprops; the list of properties Perl knows about was staggering, but I eventually decided that any property of the form "\p{Script: ...}" would qualify, so long as it had short forms listed (which I took as an indication that that particular property was the "canonical" form for the script in question). After some reading and typing and double-checking, I ended up with a fairly long list:

    my @known_scripts = ( "Arabic", "Armenian", "Avestan", "Balinese", "Bamum", "Batak", "Bengali", "Bopomofo", "Brahmi", "Br +aille", "Buginese", "Buhid", "Canadian_Aboriginal", "Carian", "Chakma", "Cham", "Cherokee", "Coptic", "Cuneiform", "Cypriot", "Cyrillic", # ... );

    Unfortunately, when I ran the resulting script, Perl complained:

    Can't find Unicode property definition "Script=Chakma" at (...) line ( +...)

    What had gone wrong? Versions, that's what: I'd looked at the perluniprops page on perl.org, documenting Perl 5.20.0, but this particular Perl was 5.14.2 and didn't know all the scripts that the newer version did, thanks to being built against an older Unicode version. Now, I could've just looked at the locally-installed version of the same perldoc page, but - wouldn't it be nice if the script automatically adapted itself to the Perl version it ran on? I sure reckoned it'd be.

    What scripts DID the various Perl versions recognize, anyway? What I ended up doing (perhaps there's an easier way) was to look at lib/unicore/Scripts.txt for versions 5.8, 5.10, ..., 5.20 in the Perl git repo (I skipped 5.6 and earlier, because a) the relevant file didn't exist in the tree yet back then, and b) those versions are ancient, anyway). And by "look at", I mean download (as scripts-58.txt etc.), and then process:

    $ for i in 8 10 12 14 16 18 20; do perl scripts.pl scripts-5$i.txt >5$ +i.lst; done $ for i in 8 10 12 14 16 18; do diff --unchanged-line-format= --new-li +ne-format=%L 5$i.lst 5$((i+2)).lst >5$((i+2)).new; done $

    scripts.pl was a little helper script to extract script information (apologies for the confusing terminology, BTW):

    #!/usr/bin/perl use strict; use warnings; use feature qw/say/; my %scripts; while(<>) { next unless m/; ([A-Za-z_]*) #/; $scripts{$1}++; } $, = "\n"; say sort { $a cmp $b } map { $_ = ucfirst lc; $_ =~ s/(?<=_)(.)/uc $1/ +ge; qq/"$_"/ } keys %scripts;

    I admit, I got lazy at this point and manually combined those files (58.lst, as well as 510.new, 512.new etc.) into a hash holding all the information, instead of having a script output it. Nonetheless, once this was done, I could easily load all the right scripts for a given Perl version:

    # New Unicode scripts added in Perl 5.xx my %uniscripts = ( '8' => [ "Arabic", "Armenian", "Bengali", "Bopomofo", "Buhid", "Canadian_Aboriginal", "Cherokee", "Cyrillic", "Deseret", "Devanagari", "Ethiopic", "Georgian", "Gothic", "Greek", "Guja +rati", "Gurmukhi", "Han", "Hangul", "Hanunoo", "Hebrew", "Hiragana", "Inherited", "Kannada", "Katakana", "Khmer", "Lao", "Latin", "Malayalam", "Mongolian", "Myanmar", "Ogham", "Old_Italic", "O +riya", "Runic", "Sinhala", "Syriac", "Tagalog", "Tagbanwa", "Tamil", "Telugu", "Thaana", "Thai", "Tibetan", "Yi" ], '10' => [ "Balinese", "Braille", "Buginese", "Common", "Coptic", "Cuneif +orm", "Cypriot", "Glagolitic", "Kharoshthi", "Limbu", "Linear_B", "New_Tai_Lue", "Nko", "Old_Persian", "Osmanya", "Phags_Pa", "Phoenician", "Shavian", "Syloti_Nagri", "Tai_Le", "Tifinagh", "Ugaritic" ], '12' => [ "Avestan", "Bamum", "Carian", "Cham", "Egyptian_Hieroglyphs", "Imperial_Aramaic", "Inscriptional_Pahlavi", "Inscriptional_Parthian", "Javanese", "Kaithi", "Kayah_Li", "Lepcha", "Lisu", "Lycian", "Lydian", "Meetei_Mayek", "Ol_Chik +i", "Old_South_Arabian", "Old_Turkic", "Rejang", "Samaritan", "Saurashtra", "Sundanese", "Tai_Tham", "Tai_Viet", "Vai" ], '14' => [ "Batak", "Brahmi", "Mandaic" ], '16' => [ "Chakma", "Meroitic_Cursive", "Meroitic_Hieroglyphs", "Miao", "Sharada", "Sora_Sompeng", "Takri" ], '18' => [ ], '20' => [ ], ); (my $ver = $^V) =~ s/^v5\.(\d+)\.\d+$/$1/; my @known_scripts; foreach (keys %uniscripts) { next if $ver < $_; push @known_scripts, @{ $uniscripts{$_} }; } print STDERR "Running on Perl $^V, ", scalar @known_scripts, " scripts + known.\n";

    The number of scripts Perl supports this way WILL increase again soon, BTW. Perl 5.21.1 bumped the supported Unicode version to 7.0.0, adding another bunch of new scripts as a result:

    # tentative! '22' => [ "Bassa_Vah", "Caucasian_Albanian", "Duployan", "Elbasan", "Gra +ntha", "Khojki", "Khudawadi", "Linear_A", "Mahajani", "Manichaean", "Mende_Kikakui", "Modi", "Mro", "Nabataean", "Old_North_Arabia +n", "Old_Permic", "Pahawh_Hmong", "Palmyrene", "Pau_Cin_Hau", "Psalter_Pahlavi", "Siddham", "Tirhuta", "Warang_Citi" ],

    But that's still in the future. For now I just tested this on 5.14.2 and 5.20.0 (the two Perls I regularly use); it worked like a charm. All that was left to do was outputting those statistics:

    print "Found " . scalar keys(%scripts) . " scripts:\n"; print "\t$_: " , $scripts{$_}, " line(s)\n" foreach(sort { $a cmp $b } + keys %scripts);

    (You'll note that in the above two snippets, I'm using print rather than say, BTW. That's intentional: say is only available from Perl 5.10 on, and this script is supposed to be able to run on 5.8 and above.)

    Fed some sample data that I'm sure Perlmonks would mangle badly if I tried to post it, this produced the following output:

    Running on Perl v5.14.2, 95 scripts known. Found 18 scripts: Arabic: 21 line(s) Bengali: 2 line(s) Cyrillic: 12 line(s) Devanagari: 3 line(s) Georgian: 1 line(s) Greek: 1 line(s) Gujarati: 1 line(s) Gurmukhi: 1 line(s) Han: 29 line(s) Hangul: 3 line(s) Hebrew: 1 line(s) Hiragana: 1 line(s) Katakana: 1 line(s) Latin: 647 line(s) Sinhala: 1 line(s) Tamil: 4 line(s) Telugu: 1 line(s) Thai: 1 line(s)

    Problem solved! And not only that, it's futureproof now as well, adapting to additional scripts in my input data, and easily extended when new Perl versions support more scripts, while maintaining backward compatibility.

    What could still be done? Several things. First, I should perhaps find out if there's an easy way to get this information from Perl, without actually doing all the above.

    Second, while Perl 5.6 and earlier aren't supported right now, they could be. Conveniently, the 3rd edition of Programming Perl documents Perl 5.6; the \p{Script=...} syntax for character classes doesn't exist yet, I think, but one could write \p{In...} instead, e.g. \p{InArabic}, \p{InTamil} and so on. Would this be worth it? Not for me, but the possibility is there if someone else ever had the need to run this on an ancient Perl. (Even more ancient Perls may not have the required level of Unicode support for this, though I wouldn't know for sure.)

    Lastly, since the point of this whole exercise was to identify writing systems used for snippets of text, there's room for optimization. Perhaps it would be faster to precompile a regular expression for each script, especially if @lines is very large. Most of the text I'm dealing with is in the Latin script; as such, I should perhaps test for that before anything else, and generally try to prioritize so that lesser-used scripts are pushed further down the list. Since I'm already keeping a running total of how often each script has been seen, this could even be done adaptively, though whether doing so would be worth the overhead in practice is another question, one that could only be answered by measuring.

    But neither speed nor support for ancient Perls is crucial to me, so I'm done. This was a fun little problem to work on, and I hope you enjoyed reading about it.

Mojolicious starting template
1 direct reply — Read more / Contribute
by neilwatson
on Sep 05, 2014 at 14:03

    I like Mojolicious, but it was hard to learn. More than six months later I still feel I'm just scratching the surface. So, what I'm about to offer may not be great, but it is as far as I've come. You'll still need to study the Mojolicious documentation, but you can start with this rather than nothing.

    Template here.

    And a tip of the hat to Sebastian and his team mates, who have answered my novice questions and have been quick to improve the documentation to help newbies like me.

    Neil Watson
    watson-wilson.ca

Commodore disk image processor thingy
5 direct replies — Read more / Contribute
by rje
on Sep 01, 2014 at 19:57
    Dear Perlmonks,

    I wrote a Perl library, and I think it's pretty cool, but I'm also asking your opinions about it - is it worth putting on CPAN, for instance.

    It is a pure-Perl library for handing Commodore disk images. For those needing a refresher, these are digital images of diskettes and hard disks used by Commodore computers in the late 1970s thru the 1980s.

    It's hosted inside my network, behind my modem's firewall, by my Raspberry Pi running a cheapo web server I wrote (also in Perl) specifically for the purpose of serving and manipulating Commodore disk images.

    My library handles D64, D71, D81, D67, D80, D82, and X64 image types. Each format is a little package (about 8k) with data specific to that image. I made them packages, although I could have just used parametric data. These packages are essentially parametric data anyhow, and provide context to a generic engine that knows how Commodore disk images work.

    The library is 140k (includes good POD documentation, which is rare for me) split among about 20 files.

    First, is it worth posting to CPAN. It's awfully specialized. Maybe it would be better just to post it as a tarball on a website (or github?).

    Second, it's been nearly 10 years since I've uploaded to CPAN, and I am intimidated by the process. Yes, I've read the rules, but I'm concerned about uploading 20 related files in one batch. Anyone have any advice beyond what PAUSE has to say?

    Thanks for listening.

Duct taping spam-bot protection to a web forum
1 direct reply — Read more / Contribute
by aitap
on Aug 31, 2014 at 10:24

    There is a free web hosting which offers a third-level domain name and an installation of their proprietary CMS. It is somewhat widely known in the ex-USSR countries. They have "Web 2.0" AJAX interface, a lot of modules for nearly everything, from a simple forum to a web shop, and a primitive read-only API. I happen to be moderating one of such forums. Despite not being popular in terms of human population it has recently gained a lot of popularity among spam-sending robots.

    At first they all were making the same mistake of posting messages with titles equal to their nicknames, and so the first version of bothunter.pl was born. It employed link parsing routines of WWW::Mechanize and reproduced a sniffed AJAX request by some black magic of parsing JavaScript source for variables. Needless to say, soon it broke, both because JavaScript source slightly changed and because bots became slightly smarter, so the moderators went back to deleting bots manually.

    Yesterday I thought: with PhantomJS, I could mimic the browser and click all these AJAX buttons required to ban a user. As for the spam, maybe it's possible to count unique words in a message and warn if some of them is repeated a lot, and a list of stop-words could help, too... Before I started thinking of ways to automatically build a list of stop words from spam messages I realised that I was reiventing the wheel and searched for spam detection engines.

    My first try was Mail::SpamAssassin, because it's written in Perl and I heard a lot of stories about plugging it into other programs. It turned out to be not so easy to make it work with plain text (non-mail) messages, so I searched for alternatives. It is Mail::SpamAssassin, after all. Bogofilter is not written in Perl, but still was easy to plug in my program, thanks to its -T option, and it happily works with plain text without complaining.

    Interfacing with the site was not so easy. Banning a spam robot (click-click-tab-tab-"spam robot"-tab-space-tab-space) exploded into a mess of ->clicking xpath-found elements; at one time the site refused to register my click no matter how I tried, so I had to call the corresponding JS function manually; in the other place of program I find myself logged out, and the only way to get back in is to load the page I'm going to delete, load the login page, log in, then load the to-be-deleted page again. Kludges. Ew.

    So, here it is: the second version of bot hunter Perl program. I hope it won't break as fast as the first one. Or, at least, will be easier to fix.

LED blinking Morse code from Raspberry Pi
3 direct replies — Read more / Contribute
by rje
on Aug 30, 2014 at 11:23
    With the handy Device::BCM2835 package, I tossed together this little package so I could send "morse code" out on an LED I connected to GPIO Pins #5 and #7.

    (FYI: the Raspberry Pi can run a number of Linux systems, all with Perl. I wrote a tiny HTTP server on it which lets me create and serve Commodore disk images, also allowing me to extract and inject files, in Perl of course. It runs behind my firewall...)

'bld' project - signature(SHA1) based replacement for 'make'
4 direct replies — Read more / Contribute
by rahogaboom
on Aug 22, 2014 at 15:14
    'bld' is entirely in Perl. 'bld' is a replacement for the 'make' command. It is based on determi +ning out of dateness by signatures(SHA1) not dates. For a critique of 'make' and why you woul +d want to do this see: http://www.scons.org/wiki/FromMakeToScons. Installing 'bld' is very simple. Download bld-1.0.0.tar.xz from https +://github.com/rahogaboom/bld. Unpack wherever in your home directory and install the experimental.pm + Perl module. Make sure you have access to GNU 'cpp' and 'ldd'. To run the examples +(examples, git, svn, systemd) you'll need gcc(1)/g++(1) (http://gcc.gnu.org/) and clang(1) (http://l +lvm.org/). That's It! I used the git, svn and systemd projects as complex multi-target examp +les of how bld would be used to re-'make' these projects. They are well known and widely used. An +y other projects might do. Read the bld.README file. Do './bld -h'. Do 'perldoc bld'. Do './bld' to build the exec-c executable "Hello, world!" program. Th +is creates the bld.info, bld.warn and Bld.sig files which along with the Bld file + gives an illustration of how to construct Bld files and the output that bld + creates. I plan on adding an App::bld distribution to CPAN. The following is an abbreviated version of 'perldoc bld': NAME bld VERSION bld version 1.0.0 USAGE usage: bld [-h] -h - this message.(exit) ARGUMENTS None OPTIONS bld [-h] -h help message(exit) ENVIRONMENT VARIABLES None RC CONFIGURATION FILES None DEPENDENCIES Required for execution: experimental.pm(3pm) - for smartmatch and switch features cpp(1) - gnu cpp cmd is required for dependency determination ldd(1) - used for library dependency determination Required for test: gcc(1)/g++(1) (http://gcc.gnu.org/) clang(1) (http://llvm.org/) FEATURES AND ADVANTAGES 1. Everything is done with SHA1 signatures. No dates are used an +ywhere. Signatures are a property of the file and not meta data from the system used for the build. An +y time issues, whether related to local clocks, networked host clocks or files touched by command acti +vities are eliminated. Modern signature algorithms are strongly randomized even for small file changes + - for the 160 bit SHA1 hash collisions are unlikely in the extreme. The Digest::SHA module is fast. The + expense of signature calculation times is small relative to the expense of programmer time. 2. bld is REALLY simple to use. There are no arguments, no optio +ns(except -h), no environment variables and no rc files. The entire bld is controlled from the Bld(and Bl +d.gv file) file. Only a minimal knowledge of perl is needed - variable definitions and simple regular ex +pressions. 3. Automatic dependency checking - GNU cpp is used to find the he +ader file dependencies. Optionally, header file checking may be done for user header files only or for si +multaneously both system header and user header files. All header file dependency information associat +ed with each source is saved to the bld.info file. 4. bld is not hierarchical. A single Bld file controls the const +ruction of a single target(a target is an executable or library(static or shared)). Complex multi-targe +t projects use one Bld.gv(global values) file and many Bld files - one to a target. 5. Each source file will have three signatures associated with it + - one for the source file, one for the corresponding object file and one for the cmds use to rebuild +the source. A change in any of these will result in a rebuild. A change in the target signature will re +sult in a rebuild. Optionally, the signatures of dynamic libraries may be tracked. If a library +signature changes the bld may warn or stop the rebuild. If dynamic libraries are added or deleted from t +he bld this can ignore/warn/fatal. 6. If any files in the bld have the same signature this is warned + about e.g. two header or source files of the same or different names. 7. Complex multi-target projects are built with a standard direct +ory setup and a standard set of scripts - Directories: Bld.<project>/<version> - has all files controlling +<project> <version>s blds and bld target output files bld.<project>/<version> - source code for <project> +<version>s Files: bld.<project> - for initiating single targ +et, multi-target or all target blds of a <project> bld.<project>.rm - for initiating single targ +et, multi-target or all target clean of a <project> bld.<project>.targets - list of all <project> targ +ets bld.<project>.README - <project> README bld.<project>.install - <project> install script bld.<project>.script.<script> - scripts called by the Bld. +<project>.<target> files Bld.<project>.<target> - the Bld file for each <pro +ject> <target> Bld.gv.<project> - global values imported int +o all Bld.<project>.<target> files 8. Security - since the signatures of everything(source, objects, + libraries, executable) are checked it is more difficult to insinuate an exploit into an object or libra +ry during the build process. 9. The capture of the full build process in the bld.info, bld.war +n and bld.fatal files allows easy access to and saving of this information. For multi-target projects wit +h the target names appended to these files it allows quick investigation of the build process of many int +errelated targets at the same time. 10. Perl - since bld is all perl and since all warnings and fatal +s have the source line number associated with them, it is very easy to locate in the souce code the exact l +ocation of an error and examine the context about which the error occurred and routine that the error was + produced in. 11. Time - programmer time; learning about, maintaining/debugging + Makefiles and Makefile hierarchies, dependency checking integration and formulation of Makefile strategies, +automatic Makefile generation with Autotools - these all dominate the programmer time and expense of 'make'. + bld only requires basic perl variables(in the Bld file EVAL section) and dir:regex:{} line specificatio +ns(in the Bld file DIRS section). 12. 'make' and it's difficulties: http://www.scons.org/wiki/FromMakeToScons - a detailed critique of make and some alternatives ftp://ftp.gnu.org/old-gnu/Manuals/autoconf/html_mono/auto +conf.html#SEC3 - a brief critique of make and how GNU automake from th +e GNU Build System contributes PROJECT STATE State: 1. The code is mostly done - unless someone finds a bug or sugges +ts an enhancement. 2. The in code documentation is done. 3. The testing is 80%-90% done. 4. The usage msg is done - the perldoc is 50%-60% done, needs a l +ot of work. Needed: 1. The code is in very good shape unless someone discovers a bug +or suggests an enhancement. My current focus is on the documentation and testing. 2. The git, svn and systemd projects need work. I ran ./configur +e before each bld. I used no options. How options affect the generated code and thus th +e Bld files is important. Anyone willing to investigate configure options and how these +options affect the Bld files is welcome. 3. The bld.<project>.install scripts all need to be done. I'd pr +efer to partner with someone knowledgeable about the installation of git, svn and systemd. 4. All the Bld.gv.<project> files should be vetted by a <project> + knowledgeable builder. 5. The git, svn and systemd projects will all be creating new ver +sions eventually. Anyone that would like to add bld.<project>/<version> and Bld.<projec +t>/<version> directories with the new versions is welcome. 6. I need someone with substantial experience building the linux +kernel to advise me or partner with me on the construction of 3.15 or later. 7. If you successfully bld a new project and wish to contribute t +he bld, please do so. I'm interested in how others construct/organize/document/debug pro +jects and their Bld files. DESCRIPTION bld(1.0.0) is a simple flexible non-hierarchical program that bui +lds a single C/C++/Objective C /Objective C++/Assembler target(executable or library(static or s +hared)) and, unlike 'make', uses SHA1 signatures(no dates) for building software and GNU cpp for automa +tic header file dependency checking. The operation of bld depends entirely on the construct +ion of the Bld(bld specification) and Bld.gv(bld global values) files. See the bld.README file. T +here are no cmd line arguments or options(except for -h(this msg)) or $HOME/.bldrc or ./.bldrc file +s and no environment variables are used. Complex multi-target projects are bld't with the use of a +Bld.<project> (Bld files and target bld output files) directory, bld.<project>(project source) + directory, bld.<project>(target construction) script, bld.<project>.rm(target and bld.<info|warn| +fatal>.<target> file removal) script, Bld.<project>.gv(project global values) file, bld.<projec +t>.install(target and file install) script and bld.<project>.README(project specific documen +tation) file. Current example projects: Bld.git - the git project http://git-scm.com/ Bld.svn - the subversion project http://subversion.apache.org +/ Bld.systemd - the systemd project http://www.freedesktop.org/ +wiki/Software/systemd/ Bld.example - misc examples intended to show how to create Bl +d and Bld.gv files bld is based upon taking the SHA1 signature of anything that, whe +n changed, would require a rebuild of the executable/library. It is not, like 'make', based + in any way on dates. This means that source or header files may be moved about, and if the +files do not change then nothing needs to, or will, be rebuilt. bld is not hierarchical; +all of the information to rebuild the executable is contained in the Bld(and Bld.gv) file. + The rebuild is based on Perl's regex engine to specify source file patterns along with the Perl +eval{} capability to bring variable definitions from the Bld file into the source. bld reads the Bld file which describes the build. This example B +ld file serves for the following discussion: Program description and Bld file explanatory comments go here. +(and are ignored by bld) EVAL DIRS The Bld file has three sections , a starting comment section to d +ocument the Bld, an EVAL and DIRS. Variables to be used for interpolation into build commands are de +fined in the EVAL section. The variables are all Perl variables. The entire EVAL section is + eval{}'ed in bld. Any errors will terminate the run. The DIRS section has three field( +: 0) lines which are the directory, the matched files to a Perl regular expression, an +d a build command for the line matched files. EVAL section variable definitions are interpolate +d into the build commands. bld will execute "$cmd $dir/$s"; for each source file, with $cmd +from the interpolated third field, $dir from the first field, and $s from the matched source +second field of the DIRS section lines. Rebuilds will happen only if: 1. a source file is new or has changed 2. the corresponding object file is missing or has changed 3. the command that is used to compile the source has changed 4. a dependent header file has changed 5. the command to link the executable or build the library arc +hive has changed 6. the executable or library has changed or is missing The Bld.sig file, automatically built, holds the source/object/he +ader/executable/library file names and the corresponding signatures used to determine if a sou +rce should be rebuilt the next time bld is run. Normally, system header files are included + in the rebuild criteria. However, with the -s switch, signature testing of these files can + be disabled to improve performance. It is unusual for system header files to change exc +ept after a new OS installation. add description of directory structure - o dir - build dir QUICK START 1. Bld'ing the systemd project - http://www.freedesktop.org/wiki/ +Software/systemd/ a. cd Bld.systemd/systemd-208 # puts you into the systemd(syst +emd-208) project directory b. ./bld.systemd --all # bld's all of the systemd targe +ts and bld target output files - the bld.info.systemd.<targ +et>, the bld.warn.systemd.<targ +et>, the bld.fatal.systemd.<tar +get>, files c. ./bld.systemd.rm --all # cleans up everything 2. Bld'ing the svn project - https://subversion.apache.org/ a. cd Bld.svn/subversion-1.8.5 # puts you into the svn(subvers +ion-1.8.5) project directory b. ./bld.svn --all # bld's all of the svn targets +and bld target output files - the bld.info.svn.<target> +, the bld.warn.svn.<target> +, the bld.fatal.svn.<target +>, files c. ./bld.svn.rm --all # cleans up everything 3. Bld'ing the git project - http://www.git-scm.com/ a. cd Bld.git/git-1.9.rc0 # puts you into the git(git-1.9.rc0) + project directory b. ./bld.git --all # bld's all of the git targets and b +ld target output files - the bld.info.git.<target>, the bld.warn.git.<target>, the bld.fatal.git.<target>, files c. ./bld.git.rm --all # cleans up everything 4. Bld'ing any single target a. cd bld # the main bld directory - cd here when you un +pack the bld.tar.xz file b. Install the source code in a sub-directory of the bld direc +tory c. Create a Bld file - the Bld file entirely controls the targ +et bld - see example below d. ./bld -h # the bld usage msg e. ./bld # do the bld f. ./bld.rm # clean up g. vi Bld.sig # examine the bld signature file h. vi bld.info # detailed info about the stages of the bld i. vi bld.warn # warning msgs from the bld j. vi bld.fatal # fatal msgs that terminated the bld - should +be empty if bld is successful FILES ~/bld directory files: bld - the bld perl script bld.rm - script to clean the bld directory bld.README - for first point of contact quick start Bld - the bld file which controls bld and the construction + of a target Bld.gv - the file of global values imported into the Bld file +(unusually used only for multi-target builds) Bld.sig - the signature(SHA1) file created from the Bld file bld.info - information about the bld bld.warn - warnings from the bld bld.fatal - the fatal msg that ended the bld ~/bld directories: Bld.<project>/<version> - has all files controlling <project> <ve +rsion>s blds and bld target output files bld.<project>/<version> - source code for <project> <version>s aux - template scripts for <project> blds ~/bld/aux files: aux/bld.<project> - template copied to Bld.<project>/<versi +on> directories to bld multi-target projects aux/bld.<project>.rm - template copied to Bld.<project>/<versi +on> directories to clean multi-target projects ~/bld/Bld.<project>/<version> files: bld.<project> - for initiating single target, mul +ti-target or all target blds of a <project> bld.<project>.rm - for initiating single target, mul +ti-target or all target clean of a <project> bld.<project>.targets - list of all <project> targets bld.<project>.README - <project> README bld.<project>.install - <project> install script bld.<project>.script.<script> - scripts called by the Bld.<projec +t>.<target> files Bld.<project>.<target> - the Bld file for each <project> < +target> Bld.gv.<project> - global values imported into all B +ld.<project>.<target> files Bld.sig.<project>.<target> - the signature(SHA1) file for each + <project> <target> bld.info.<project>.<target> - the bld.info file for each <proje +ct> <target> bld.warn.<project>.<target> - the bld.warn file for each <proje +ct> <target> bld.fatal.<project>.<target> - the bld.fatal file for each <proj +ect> <target> bld.<project>.targets - all of the <project> targets PRIMARY PROGRAM DATA STRUCTURES TBD NOTES 1. bld assumes that a source will build a derived file e.g. .o fi +les in the same directory and have the same root name as the source. 2. bld assumes that all targets in multi-target bld's will be uni +quely named - all targets go into the same project directory. 3. Some projects violate either or both of these target naming or + object file naming/location requirements, but reconstructing these projects with bld shoul +d be relatively easy e.g. systemd. 4. bld executes cmd fields({}) in the bld directory and then move +s all created files to the source directory. Very old notes - needs updating: 1. bld uses two adjunct files: Bld and Bld.sig. The Bld file +describes all the sources, their locations, and the rules for building them. Since bl +d is not hierarchical the Bld file is the only place where build information is locat +ed. The Bld file consists of two subsection types: EVAL and DIRS. The file must begi +n with one EVAL section. The EVAL section is Perl code that is eval{}'ed by bld. Va +riables that the user wants to have interpolated into source file build commands are de +fined here. Three variables are mandatory: $exec(the executable name), $link(the execut +able building rule), and $libs(the executable libraries spec). Other variables that + are to be used in the expansion of source file build commands are defined here. +The EVAL keyword must start the line. An eval{} error in the EVAL section will cause t +he termination of the program. A DIRS section is specified after the EVAL sectio +n and at the end of the Bld file. The DIRS keyword must start the line. This section +consists of whitespace lines, which are ignored, and build specification lines with three + colon separated fields. The fields are the directory relative to the bld executable dir +ectory, a Perl regular expression including /'s selecting the source files that th +is line is to control the building of, and the build command for the selected source +files. A directory may have several DIRS section lines selecting for different source f +iles with different build commands. Any variables in the build commands are interpol +ated from variables defined in the EVAL section. All unnecessary whitespace in the com +mand third field of DIRS section lines is compressed out. Thus changing the spacing +(spaces, tabs, and \n) of the command field will not affect any source rebuild. If a +ny source file in any of the directories is not selected by some regular expression, it +is not included in the build. A simple and an extended example of Bld files is given belo +w. 2. The Bld.sig signature file is automatically created and upd +ated. It contains one line for each source, one line for each header file, and one lin +e for the executable. The header file lines and the executable line have two fields: +the file name and its signature. The source lines have three or four fields: the + file name, the signatures of the source file, the command use to build the source, an +d the object file. The user can modify this file to force the rebuild of files by +altering the signature or even by deleting a line, however, any modification to a sou +rce or header file, or build command string will do the same thing. 3. The user should always run bld -b after completion of a new + executable Bld file. This option processes the Bld file and prints the expanded +build command and all source files that each specification line selects for. Thu +s, the user can see exactly what is going to be included in each build and how +all files are going to be rebuilt. Any variables defined for use in the build rul +es are also printed out. 4. A bld run may be ^C interrupted. When a normal uninterrupt +ed run is completed the Bld.sig file is rewritten using only those files that were +included in the build. Thus, if files were added or deleted this would reflect in +the new Bld.sig file. If a run is interrupted, all files read in from the original Bld +.sig file and any new files already built are written out to the Bld.sig file. This en +sures that new files already built and old files not yet examined will not be rebuilt. +This may result in some entries in the Bld.sig file that no longer exist, but this +will be corrected at the end of the next normally completed run. 5. The -h, -r options exit without rebuilding the executable. 6. In the Bld file anything before the EVAL is ignored. This +allows Bld file explanatory comments to be inserted at the start of this file. 7. There are no imported environment variables. The mkdir, mv +, and cpp commands must be accessible, as well as, the commands in $cmd and $link v +ariables. 8. A non C or C++ source will be rebuilt if its build command +has been changed or the source file itself has been changed. The rebuilt output wi +ll be put back in the directory where the source came from under the assumption t +hat it will be a C file that will then need itself to be rebuilt later on in the bu +ild sequence. Bld FILE FORMAT The Bld file(and Bld.gv) controls the entire target bld. It is d +ivided into three sections - Comment(s), EVAL and DIRS: Add comments before the EVAL line EVAL # mandatory defined variables $bld=""; $bldcmd = ""; $lib_dirs = ""; $opt_s = ""; $opt_r = ""; $opt_lib = ""; DIRS # {cmds} cmd blocks or dir:regex:{cmds} specifications {cmds} dir:regex:{cmds} dir:regex:{cmds} ... 1. a comment section 2. An EVAL(starts a line) section - this is perl code that is eva +l'ed in bld. Six variables are required. These are: e.g. EVAL # mandatory defined variables # the target to built e.g. executable, libx.a, libx.so $bld="exec-c"; # cmd used in perl system() call to build $bld target +- requires '$bld'(target) and '$O'(object files) internally $bldcmd = "$CC -lm -o \$bld \$O"; # space separated list of directories to search for li +braries $lib_dirs = "example/lib /usr/lib /lib /usr/local/lib" +; # use system header files in dependency checking("syst +em" or "nosystem") $opt_s = "system"; # inform about any files that will require rebuilding, + but do not rebuild("rebuild" or "norebuild") $opt_r = "rebuild"; # do dependency checking on libraries("libcheck", "nol +ibcheck", "warnlibcheck" or "fatallibcheck") $opt_lib = "fatallibcheck"; Any other simple perl variables can be defined in the EVAL + section and used in the DIRS section. Environment variables may be set. 3. A DIRS(starts a line) section - this section will have either +{cmds} cmd blocks or dir:regex:{cmds} specifications. The {cmds} blocks are just a group of shell cmds, always execu +ted. A dir specification is a source directory relative to the bld directory. The regex specification is a perl regul +ar expression that will pick up one or more of the source files in dir. The {cmds} specification describes how t +o build the selected source files. Any number of cmds, ';' separated, may be specified within the {} brackets. Example Bld Files: Simplest(Bld.example/example/Bld.example.helloworld-c): The 'Hello World!' program with only the minimal required +definitions. Comment(s) EVAL $CC = "gcc"; # mandatory defined variables # the target to built e.g. executable, libx.a, libx.so $bld="helloworld-c"; # cmd used in perl system() call to build $bld target +- requires '$bld'(target) and '$O'(object files) internally $bldcmd = "$CC -o \$bld \$O"; # space separated list of directories to search for li +braries $lib_dirs = "/usr/lib /lib /usr/local/lib"; # use system header files in dependency checking("syst +em" or "nosystem") $opt_s = "system"; # inform about any files that will require rebuilding, + but do not rebuild("rebuild" or "norebuild") $opt_r = "rebuild"; # do dependency checking on libraries("libcheck", "nol +ibcheck", "warnlibcheck" or "fatallibcheck") $opt_lib = "warnlibcheck"; DIRS bld.example/example : ^helloworld\.c$ : { $CC -c $s; } Complex(Bld.example/example/Bld.example.exec-c): A well commented example of all of the features of a Bld f +ile. The code routines are all just stubs designed to illustrate a Bld file. Comment(s) EVAL # this section will define perl variables to be interpolat +ed into DIRS section cmd fields # the compiler $CC = "clang"; # mandatory defined variables # the target to built e.g. executable, libx.a, libx.so $bld="exec-c"; # cmd used in perl system() call to build $bld target +- requires '$bld'(target) and '$O'(object files) internally $bldcmd = "$CC -lm -o \$bld \$O"; # space separated list of directories to search for li +braries $lib_dirs = "example/lib /usr/lib /lib /usr/local/lib" +; # use system header files in dependency checking("syst +em" or "nosystem") $opt_s = "system"; # inform about any files that will require rebuilding, + but do not rebuild("rebuild" or "norebuild") $opt_r = "rebuild"; # do dependency checking on libraries("libcheck", "nol +ibcheck", "warnlibcheck" or "fatallibcheck") $opt_lib = "fatallibcheck"; # some examples of variables that will be interpolated int +o DIRS section cmd fields $INCLUDE = "-I bld.example/example/include"; $LSOPTIONS = "-l"; # "a" or "b" to conditionally compile main.c $COND = "a"; DIRS # this section will have either {cmds} cmd blocks or dir:r +egex:{cmds} specifications # example of use of conditional compilation bld.example/example/C : ^main\.c$ : { # can have comments here too if [ "$COND" == 'a' ]; then $CC -S $INCLUDE $s; fi if [ "$COND" == 'b' ]; then $CC -O4 -S $INCLUDE $s; fi } # example of execution of a bare block of cmds - '{' and ' +}' may be on separate lines { ls $LSOPTIONS; } # the cmd field may be put on another line(s) and indented bld.example/example/C : ^g\.x\.C$ : { $CC -c $INCLUDE $s; } # all three fields - dir, regex and cmd - may be put on se +parate lines(even with extra blank lines). # directories may have embedded blanks('a b'). bld.example/example/C/a b : ^m\.c$ : {$CC -c $INCLUDE $s;} # example of regex field that captures multiple source fil +es(h.c and i.c) and example of a # cmd field with multiple cmds - white space is irrelevant +(a change should not cause a rebuild) # example of cmd fields with multiple cmds(ls and $CC) bld.example/example/C : ^(h|i)\.c$ : { ls -l $s; +$CC -c $INCLUDE $s; } # example of assembler source # Note: the $CC compile produces .o output by changing the + c to an o. # the as output needs to be specified by the -o opti +on. bld.example/example/C : ^main\.s$ : {as -c -o main.o $ +s;} bld.example/example/C/ww : ^u\.c$ : {$CC -c $INCLUDE $ +s;} # example of use of recursive directory search - the same +regex and cmd fields # are applied to all subdirectories of the specified dir f +ield(right after the 'R') R bld.example/example/C/y : ^.*\.c$ : {$CC -c $INCLUDE $ +s;} bld.example/example/C/x : ^t\.c$ : {$CC -c $INCLUDE $ +s;} bld.example/example/C/z : ^(w|w1)\.c$ : {$CC -c $INCL +UDE $s;} # cmd blocks may execute multiple cmds(ls and pwd) { ls -lfda; pwd; ls; } DIAGNOSTICS Warnings(Warning ID(WID)): # routine: rebuild_target_bool() # ldd return has a library 'not found' entry warning("WID 1: ldd return: $libname library is 'not found'"); . . . Fatals(Fatal ID(FID)): # routine: main bld program block # bad option entered - only '-h' allowed fatal("FID 1: GetOptions() failed(use bld -h)."); . . . TODO/CONTEMPLATE/INVESTIGATE/EXAMINE/CHECKOUT/THINK ABOUT/HACK ON Software Development: . investigate `cd Bld.systemd/systemd-208; ldd ` cd before ldd . investigate elimination of -L(use lib_dirs?) option in Bld.<pro +ject>.<target> files . check: is $lib_dirs working? does this require a full path? . Bld.example/example/Bld.example.rdx-c - mod rdx to delete() and + search() on from one to n keys - do not require the full set of keys - should be easy Test: . bld - install kernel - think about automation of 'make V=1' to +Bld file . test execution of arbitrary perl code in Bld.<project>.gv and E +VAL sections . do a 'make V-1 install' in the git project and see what it take +s to install git . add new version of one of git/subversion/systemd to show how mu +ltiple versions of <project> are bld't . run on Mac . test library changes . investigate running under the debugger and profiling . Unit Tests: . empty Bld file . Bld file with both, in order, EVAL and DIRS lines . EVAL section is correct(required variables) and DIRS secti +on is empty . EVAL section is empty(required variables) and DIRS section + is correct(non-empty) . EVAL section is empty(required variables) and DIRS section + is empty Doc: . add explanation of aux directory . go thru code line by line and do code scrub, comment scrub and +make list of features and advantages . do bld perldoc . doc extensions.pl and clang-gcc-headers.pl . Bld file with only a {} block will execute the block but fail b +ecause of no .o files to build the target . make list of install component requirements: 1. experimental.pm 2. gcc/clang . add note that two project builds cannot run at the same time - +Bld.sig and Bld.gv files . add Note about if two targets are built from the same source fi +les - they both use the same .o files and may compile different .o files depending on the compile + options - this results in the building of one target interfering with the build of the other targe +t since the signature of the .o files are now different because of the other build. . add doc/notes about ldd and env variable LD_LIBRARY_PATH - add +code to add special paths to LD_LIBRARY_PATH?? . doc packages needed for execution of all Bld.example's : clang(LLVM), gcc, g++, GNUstep header of the GNUstep Base l +ibrary package . add perldoc section on how to use bld to build a multi executab +le/library project e.g. subversion . major routine call tree . explain modules to install and all build/run requirements experimental.pm gcc/g++ clang ldd . required executables are available and full paths - operation: +ldd, cpp, mv, which - build: gcc, g++, clang, as add doc with list of all external cmds . FFAQ - (Fake)FAQ . add section on design notes - section on design philosophy . write subroutines summary . add doc about the 6 required variables and how to construct the +m Outside resources: . read Intermediate Perl 2nd Ed for ideas . investigate scons on scons.tigres.org . read "Managing Projects with GNU Make" . Links: . http://www.scons.org/wiki/FromMakeToScons . http://producingoss.com/ - Carl Fogel . http://coding.smashingmagazine.com/2013/01/03/starting-open +-source-project/ . http://oss-watch.ac.uk/resources/howtobuildcommunity Release: . on release ask for users to compile new projects and incorporat +e them into a Release bld.<project>.<version>.tar.gx file . slide presentation . article . LLVM community . GNU community . Apache Software Foundation . send the perlbrew team a copy . CPAN.org . hunspell Notes: . to do a forced rebuild of everything delete Bld.sig or *.o . note on how to build both static and dynamic libraries . why no 'include' DIRS directive? . exit codes 0=success, 255=error - see bld.* . EVAL section %@ defines are ignored - only scalar variables - $ +cmd to EVAL 0 . if a source is moved it will rebuild since the Bld.sig signatur +e is keyed to the source file and it's path from the bld parent directory . same basename with different extensions will fail and same base +name with same extension in different locations will fail because same named object +will be created . symbolic links to source files should work - a file in one plac +e and a link to that same file in another place will be detected/warned/fatal . g++ is installed with the gcc-c++ package Consult . open source release consult?? INCOMPATIBILITIES None Known BUGS AND LIMITATIONS None Known SEE ALSO bld.README Critique of 'make': http://www.scons.org/wiki/FromMakeToScons - a detailed critique of make and some alternatives ftp://ftp.gnu.org/old-gnu/Manuals/autoconf/html_mono/autoconf +.html#SEC3 - a brief critique of make and how GNU automake from the GN +U Build System contributes GITHUB RELEASES https://github.com/rahogaboom/bld bld-1.0.0.tar.gz - initial release AUTHOR Richard A Hogaboom richard.hogaboom@gmail.com LICENSE and COPYRIGHT and (DISCLAIMER OF) WARRANTY Copyright (c) 1998-2014, Richard A Hogaboom All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions + are met: * Redistributions of source code must retain the above copyright n +otice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyrigh +t notice, this list of conditions and the following disclaimer in th +e documentation and/or other materials provided with the distributio +n. * Neither the name of the {organization} nor the names of its cont +ributors may be used to endorse or promote products derived from this softw +are without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTOR +S "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMI +TED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTIC +ULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECI +AL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO +, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEOR +Y OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDI +NG NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. :-)
DBIx::Class recursive subquery construct
No replies — Read more | Post response
by maruhige
on Aug 15, 2014 at 14:38

    Hello Monks,

    My first encounter with DBIx::Class's subquery construct went so well (and my need of n subqueries so great) that I tried a programmatic way of creating nested subqueries, shared here.

    Context is simply a tag search function, where all the tags are in one table, with an intermediate 2 integer column table to the post table. Tags are retrieved from the tag table in ascending order of weight to ensure the smallest possible start set, getting progressively smaller.

    sub recursive_subquery{ my($schema,$join_column,$key,$values) = @_; my ($value1) = shift(@$values); my $sub_query =$schema->search({ $key => $value1, }); for(@$values){ my $tmp_query = $schema->search({ $join_column => { -in => $sub_query->get_column($join_column)->as_query }, $key => $_, }); $sub_query = $tmp_query; } return $sub_query; }

    And a short example of using it manually in Catalyst:

    $c->stash->{rows} = $c->model('DB::Post')->search({ 'posts.post_id' => { -in => recursive_subquery( $c->model('DB::TagCloud1'), ,'post_id', 'tag_def_id',[ 16042, 190712, ])->get_column('post_id')->as_query }, });

    As an aside, the performance of this highlighted the unhappy fact that mysql tends to evaluate sub queries from the outside-in, so you may want to reverse the weighting in that setup.

Draw a Square With Perl!
4 direct replies — Read more / Contribute
by Dipseydoodle
on Aug 14, 2014 at 10:41

    Good morning monks. Today I figured I'd post this little script I wrote. It's not that cool and you can propbably point out errors in my writing/syntax style, so feel free to yell at me :)

    #!/usr/bin/perl # Put on your nerd glasses and draw a square! use strict; use warnings; my $balancex = 10; # width my $repeatx = $balancex; #don't change repeatx!!! use balancex instead +. my $repeaty = 10; # height do{ while($repeatx > 0){ print ". "; # change the period to print another character, but ke +ep the extra space. $repeatx -= 1; } print "\n"; $repeatx += $balancex; $repeaty -= 1; } until ($repeaty == 0); #corrected by Athanasius & AppleFritter

    No doubt this could be done in much fewer lines, or even as a one-liner :P But as it says it draws a square with periods, and is just fun to look at.

Check popular review sites for new reviews.
1 direct reply — Read more / Contribute
by wrinkles
on Jul 28, 2014 at 19:25

    This script checks select pages on some popular review sites for the latest review, and writes the date of the most recent review from each site to a file. Each time it is run, it checks against the previous results and sends an email notification with the date and link to page(s) with fresh reviews.

    "mailx" was used to send email. I suspect that this may not be available in Windows, I tested only on Mac OS X and Ubuntu.

    The following script has the pages hard-coded, as it was written for my school. Those pages (and your email addresses) could easily be replaced to suit your requirements.

    I found "The 10-minute XPath Tutorial" ("Automating System Administration with Perl, 2nd ed.) very helpful in understanding XPath. Thanks also to the help of fellow perl monks!

    By the way, "EB" and "MA" are shorthand for two separate campuses within our school.

    Update 2014-07-28 - I ran perlcritic and fixed some potential problems

    #!/usr/bin/env perl use strict; use warnings; use utf8; use Text::CSV; use Carp; use LWP::Simple qw(get); use Text::Unidecode qw(unidecode); use HTML::TreeBuilder::XPath; # Email Settings my %email = ( to => 'me@example.com,you@example.com', subject => 'New ECDS reviews found' ); # Reviews subroutine and URLs to check my $review_sites = [ { site => 'Yelp', sub => \&yelp_checker, review_pages => { 'EB' => 'http://www.yelp.com/biz/encinitas-country-day-school-encinitas?sort_b +y=date_desc', 'MA' => 'http://www.yelp.com/biz/encinitas-country-day-school-encinitas-2?sort +_by=date_desc' } }, { site => 'GreatSchools', sub => \&gs_checker, review_pages => { 'MA' => 'http://www.greatschools.org/california/encinitas/9670-Encinitas-Count +ry-Day-School/?tab=reviews' } }, { site => 'PrivateSchoolReview', sub => \&psr_checker, review_pages => { 'MA' => 'http://www.privateschoolreview.com/school_ov/school_id/ +2039' } }, { site => 'Kudzu', sub => \&kudzu_checker, review_pages => { 'MA' => 'http://www.kudzu.com/m/Encinitas-Country-Day-School-135 +71675' } }, { site => 'MerchantCircle', sub => \&mc_checker, review_pages => { 'MA' => 'http://www.merchantcircle.com/business/Encinitas.Country.Day.School.7 +60-942-1111?sort=created&dir=desc' } } ]; # Default date if no record my $default_date = '00-00-0000'; # Month name to number conversion my %month = ( January => '01', February => '02', March => '03', April => '04', May => '05', June => '06', July => '07', August => '08', September => '09', October => '10', November => '11', December => '12' ); # Where is the reviews file? my $reviews_filepath = "reviews.txt"; # Where is the alert message file? my $msg_filepath = "msg.txt"; # Slurp hash from reviews file my $old_reviews = hash_from_csv($reviews_filepath); my %new_reviews; # Iterate through each site for my $review_site (@$review_sites) { my $pages = $review_site->{review_pages}; # iterate through each campus html and collect xpath nodes while ( my ( $campus, $url ) = each %$pages ) { my $html = get $url or croak("Can't reach $url $!\n"); $html =~ s/([^[:ascii:]]+)/unidecode($1)/ge; my $tree = HTML::TreeBuilder::XPath->new; $tree->parse($html) or croak("Parse failed: $!\n"); my ($date) = $review_site->{'sub'}->($tree); # create hash keys from campus and review site names my $campus_site = $campus . '_' . $$review_site{'site'}; push( @{ $new_reviews{$campus_site} }, $date ); push( @{ $new_reviews{$campus_site} }, $url ); } } # Write message if new reviews my $msg = ''; while ( my ( $item, $data ) = each %new_reviews ) { unless ( $$old_reviews{$item}[0] eq $$data[0] ) { $msg .= "New review on $$data[0]: \n $$data[1]\n"; } } # Save message. open my $fh, ">:encoding(utf8)", "$msg_filepath" or croak("cannot open $msg_filepath: $!"); print {$fh} $msg or croak("Can't print message:\n$msg\n$!"); close $fh; # Write new review data to file. hash_to_csv( \%new_reviews, $reviews_filepath ); # Email message if exists send_email($msg) if length($msg); ######## SUBROUTINES ####### # import old data from file sub hash_from_csv { my $filepath = shift; open my $fh, "<:encoding(utf8)", "$filepath" or croak("cannot open $filepath: $!"); my $csv = Text::CSV->new( { binary => 1 } ); my %hash; map { $hash{ shift @{$_} } = $_ } @{ $csv->getline_all($fh) }; close $fh; return \%hash; } # write new data to file sub hash_to_csv { my ( $hash, $filepath ) = @_; open my $fh, ">:encoding(utf8)", "$filepath" or croak("cannot open $filepath: $!"); my $csv = Text::CSV->new( { binary => 1, eol => "\n" } ); for ( keys %$hash ) { my $colref = [ $_, $$hash{$_}->[0] ]; $csv->print( $fh, $colref ); } close $fh; return; } # send email notifications sub send_email { my ($body) = @_; open my $pipe, '|-', '/usr/bin/mailx', '-s', $email{subject}, $ema +il{to} or croak("can't open pipe to mailx: $!\n"); print $pipe $body; close $pipe; croak("mailx exited with a non-zero status: $?\n") if $?; return; } # extract date of most recent review from GreatSchools tree sub gs_checker { my $tree = shift; my $xpath = '//div[contains(@class,"media mbs")]/div[(@class="author small make-99 +9999 fl pbn mbn")]'; my $dates = $tree->findnodes($xpath); # dates returned as 'month dd, yyyy' my $date; $date = $$dates[0]->as_trimmed_text() if ( $$dates[0] ); if ( $date =~ /(\w{3,9})\s+(\d{1,2}),\s+(\d{4})/ ) { $date = $3 . '-' . $month{$1} . '-' . $2; } return ( $date || $default_date ); } # extract date of most recent review from Yelp tree sub yelp_checker { my $tree = shift; my $xpath = '//meta[@itemprop="datePublished"][1]'; my $dates = $tree->findnodes($xpath); # dates returned as 'yyyy-mm-dd' if ( $$dates[0] ) { return $$dates[0]->attr('content'); } else { return ( $$dates[0] || $default_date ); } } # extract date of most recent review from PrivateSchoolReview tree sub psr_checker { my $tree = shift; my $xpath = '//meta[@itemprop="datePublished"][1]'; my $dates = $tree->findnodes($xpath); # dates returned as 'yyyy-mm-dd' if ( $$dates[0] ) { return $$dates[0]->attr('content'); } else { return ( $$dates[0] || $default_date ); } } # extract date of most recent review from Kudzu tree sub kudzu_checker { my $tree = shift; my $xpath = '//div[@class="review_post_date"]/p/span[@class="rp-da +te"]'; my $dates = $tree->findnodes($xpath); # date returned as 'mm/dd/yyyy' my $date; $date = $$dates[0]->as_trimmed_text() if ( $$dates[0] ); if ( $date =~ /(\d{1,2})\/(\d{1,2})\/(\d{4})/ ) { $date = $3 . '-' . $1 . '-' . $2; } return ( $date || $default_date ); } # extract date of most recent review from MerchantCircle tree sub mc_checker { my $tree = shift; my $xpath = '//span[@itemprop="datePublished"][1]'; my $dates = $tree->findnodes($xpath); # dates returned as 'Month dd, yyyy at hh:mm PM' my $date; $date = $$dates[0]->as_trimmed_text() if ( $$dates[0] ); if ( $date =~ /\s*(\w{3,9})\s*(\d{1,2})\s*\,\s*(\d{4})\s+at\s+\d{1,2}\:\d{2} +\s+[AP]M/ ) { $date = $3 . '-' . $month{$1} . '-' . $2; } return ( $date || $default_date ); }
Install missing modules with Module::Extract::Install's cpanm-missing/cpanm-missing-deep
1 direct reply — Read more / Contribute
by frozenwithjoy
on Jul 24, 2014 at 12:07

    The other day I got a new laptop and tried to run a couple scripts on it. I quickly grew tired of the tedious cycle of 'Module::X not found' errors/installing Module::X. I decided to make a tool to improve the situation.

    The result, Module::Extract::Install, can be used to analyze perl scripts and modules to identify and install their dependencies in an automated, pain-free manner. You can use this module's methods to write your own script (e.g., to pipe missing modules to your favorite installer) or take advantage of the included command-line tools cpanm-missing (checks a list of Perl files) and cpanm-missing-deep (checks all the Perl files within a directory).

    Feel free to give me last minute comments/suggestions before I put it on CPAN (currently it is only available through GitHub). Thanks!

SysV shared memory (Look-Alike) -- pure perl
3 direct replies — Read more / Contribute
by flexvault
on Jul 20, 2014 at 16:42

    Dear Monks,

    I have stayed away from using shared memory because of the statement: "This function is available only on machines supporting System V IPC." in the documentation for use. I decided I had a good use and did a Super Search and found zentara's excellent work which I used as a starting point for this discussion. I re-read the documentation and looked at the books 'Programming Perl' and the 'Perl Cookbook', and wondered if I could do something similar with a RAM disk and not have a dependency on System V IPC support. So taking the code provided by zentara, and using it as a benchmark for my requirements, I started testing on a 8GB RAM disk on a Debian 64bit Linux box using a 32-bit 5.14.2 Perl. I found that I could get approximately 216K System V IPC writes per second(wps). WOW!

    Since I only needed 20-25K writes per second, I started working on my "shared memory look-alike". What I found was that I could do better than 349K wps. Actually the 1st run produced 800K wps, but I realized I didn't follow the format of zentara's script, so I modified the script to call a subroutine, flock the file, test return codes, etc. Currently, 349K wps is the worse case on a RAM disk, 291K wps on a 7,200 rpm hard disk, and 221K wps on a 5,400 rpm disk. (Note: I didn't have a SSD on the test system.) The code follows, and if I did something to make my numbers look better, I'd like to know.

    Update: Do not use this code as it mixes buffered and unbuffered I/O. See later for a sample that I believe works correctly!

    ####### shmem-init.pl ############################ #!/usr/bin/perl use warnings; use strict; use Time::HiRes qw( gettimeofday usleep ); use Fcntl qw( :DEFAULT :flock ); ## Part of core perl use IPC::SysV qw(IPC_STAT IPC_PRIVATE IPC_CREAT IPC_EXCL S_IRUSR S_IWU +SR IPC_RMID); # see "perldoc perlfunc /shmget" and "perldoc perlipc /SysV" # big difference from c is attach and detach is automatic in Perl # it attaches to read or write, then detaches my $go = 1; $SIG{INT} = sub{ $go = 0; &close_m(); #close up the shared mem exit; }; my $segment_hbytes = 0x640; # hex bytes, a multiple of 4k my ($segment_id, $segment_size) = &init_m($segment_hbytes); print "shmid-> $segment_id\tsize-> $segment_size\n"; # Counter Elap +sed time Writes/second # ------------- +---------------------------- my $stime = gettimeofday; my $i = 0; # Result: 2000000 9.27 +134203910828 215718/second while($go) { &write_m($i); $i++; if ( $i >= 2_000_000 ) { $stime = gettimeofday - $stime; my $rpm = int( 2_000_000 / + $stime ); print "$i\t$stime\t$rpm/second\n\n"; last; } #select(undef,undef,undef,.001); last if ! $go; } our $indexdb; # Counter Ela +psed time Writes/second # ------------ +----------------------------- my $file = "/dev/shm/FlexBase/__env.index"; # Result: 2000000 5.7 +3024797439575 349025/second # my $file = "/__env.index"; # Result: 2000000 6.8 +8051080703735 290676/second # my $file = "/flexvault/__env.index"; # Result: 2000000 9.0 +2671384811401 221564/second open( $indexdb,"+<", $file ) or die "Not open: $!"; $stime = gettimeofday; $i = 0; while( 1 ) { &write_mem($i); $i++; if ( $i >= 2_000_000 ) { $stime = gettimeofday - $stime; my $rpm = int( 2_000_000 / + $stime ); print "$i\t$stime\t$rpm/second\n"; last; } } close $indexdb; exit; sub write_mem() { our $indexdb; # Write a string to the shared file. my $message = shift; if ( flock( $indexdb, LOCK_EX ) ) { my $ret = sysseek( $indexdb, 0, 0); # move to beginning of fil +e if ( ! defined $ret ) { die "O04. sysseek failed: $!"; } $ret = syswrite ( $indexdb, $i, length($i) ); if ( $ret != length($i) ) { die "O05. syswrite failed! $!"; } } ## ## Make test ( 1==1 ) to verify syswrite worked correctly. ## Make test ( 1==2 ) to test speed of syswrite to filesystem. ## if ( ( 1==2 )&&( flock( $indexdb, LOCK_SH ) ) ) { my $ret = sysseek( $indexdb, 0, 0); # move to beginning of fil +e if ( ! defined $ret ) { die "O06. sysseek failed: $!"; } $ret = sysread ( $indexdb, my $ni, length($i) ); if ( $ni != $i ) { die "O07. |$ni|$i| $!"; } } return 0; } ################################################################# sub init_m(){ my $segment_hbytes = shift; # Allocate a shared memory segment. my $segment_id = shmget (IPC_PRIVATE, $segment_hbytes, IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR); # Verify the segment's size. my $shmbuffer = ''; shmctl ($segment_id, IPC_STAT, $shmbuffer); my @mdata = unpack("i*",$shmbuffer); #not sure if that is right unp +ack? works :-) return($segment_id, $mdata[9] ); } sub write_m() { # Write a string to the shared memory segment. my $message = shift; shmwrite($segment_id, $message, 0, $segment_size) || die "$!"; #the 0, $segment_size can be broke up into substrings like 0,60 # or 61,195, etc return 0; } sub close_m(){ # Deallocate the shared memory segment. shmctl ($segment_id, IPC_RMID, 0); return 0; } 1; __END__

    Regards...Ed

    "Well done is better than well said." - Benjamin Franklin

Yahoo Content Analyzer
No replies — Read more | Post response
by Your Mother
on Jul 20, 2014 at 16:34

    Inspired by How to transmit text to Yahoo Content Analysis. Not sure how complete or correct it is, just threw it together for fun. Seems to work and Iíll make amendments as necessary or sanely suggested.

    Requires: strictures, LWP::UserAgent, Getopt::Long, Pod::Usage, Path::Tiny.

    #!/usr/bin/env perl use 5.010; use strictures; no warnings "uninitialized"; use LWP::UserAgent; use Getopt::Long; use Pod::Usage; use open qw( :encoding(UTF-8) :std ); use Path::Tiny; # use XML::LibXML; # For expansion... or XML::Rabbit my $service = "http://query.yahooapis.com/v1/public/yql"; my %opt = ( text => undef, url => undef, max => 100 ); # These are, luckily, false by default for Yahoo, so we only care abou +t true. my %boolean = map {; $_ => 1 } qw/ related_entities show_metadata enable_categorizer /; # What we compose to query, e.g. not "verbose" or "file." my %sql = ( %opt, %boolean ); my $ok = GetOptions( \%opt, "text=s", "file=s", "url=s", "max=i", "verbose", "help", keys %boolean ); pod2usage( -verbose => 0, -exitval => 1, -message => "Options were not recognized." ) unless $ok; pod2usage( -verbose => 2 ) if $opt{help}; pod2usage( -verbose => 0, -exitval => 1, -message => "One of these, at most, allowed: text, url, fil +e." ) if 1 < grep defined, @opt{qw/ text url file /}; # Only one, text|file, is allowed by Getopt::Long. $opt{text} ||= path($opt{file})->slurp if $opt{file}; unless ( $opt{url} || $opt{text} ) # Accept from STDIN. { say "Type away. ^D to execute (on *nix anyway)."; chomp( my @input = <> ); $opt{text} = join " ", @input; die "Give some input!\n" unless $opt{text} =~ /\w/; } my @where; for my $key ( keys %opt ) { next unless defined $opt{$key} and exists $sql{$key}; $opt{$key} = "true" if $boolean{$key}; $opt{$key} =~ s/([\\"'\0])/\\$1/g; push @where, sprintf "%s = '%s'", $key, $opt{$key}; } my $q = sprintf "SELECT * FROM contentanalysis.analyze WHERE %s", join " AND ", @where; say "SQL >> $q\n" if $opt{verbose}; my $ua = LWP::UserAgent->new; my $response = $ua->post( $service, [ q => $q ] ); say $response->request->as_string if $opt{verbose}; say $opt{verbose} ? $response->as_string : $response->decoded_content(); exit ! $response->is_success; __END__ =pod =encoding utf8 =head1 Name yahoo-content-analyzer - command-line to query it. =head1 Synopsis yahoo-content-analyzer -text "Perl is a programming language." -text "{command line string}" -file (slurp and submit as text) -url -max [100 is default] -related_entities -show_metadata -enable_categorizer -verbose -help =head1 Description L<https://developer.yahoo.com/search/content/V2/contentAnalysis.html> =head1 Code Repository L<http://perlmonks.org/?node_id=1094394> =head1 See Also L<https://metacpan.org/search?q=YQL>. =head1 Author and License Your Mother, L<http://perlmonks.org/?node_id=248054>. You may redistribute and modify this code under the same terms as Perl itself. =head1 Disclaimer of Warranty No warranty. No means no. =cut

    Updates/Changelog

    • Removed URI, only first draft used it.
commandline ftpssl client with Perl
1 direct reply — Read more / Contribute
by zentara
on Jul 05, 2014 at 12:37
    Recently, all my c-based ftpssl programs stopped working with ssl, namely gftp and lftp. I found that Net::FTPSSL still works great, but it isn't interactive, it allows just automated scripting. So, how to make an interactive session? I first thought of using a gui, but there was no real advantage to the gui, over the commandline, ( not without a huge amount of work ;-) ), so a simple commandline program fit the bill. Here it is. There is a second program below it, which runs it from a pty, in anticipation of channeling it into a Tk or GTk gui; but the gui's seems to have difficulty capturing the tty. If anyone can show how to get the ftpssl tty pty output into a textbox, I would be grateful.

    If you want to experiment on your own machine, Proftd works good when configured with --enable-tls, you can google for instructions.

    I used a little eval trick to pass the commands into the pty.

    Some common commands : list pwd cwd noop nlst mkdir('foo') rmdir('foo') put('somelocalfile', 'remotefile')

    The method set that comes with Net::FTPSSL is simple and easy.

    ftps-z: runs standalone or thru a pty as shown below

    #!/usr/bin/perl use strict; use warnings; use Net::FTPSSL; my $server = "127.0.0.1"; my $username = "someuser"; my $passwd = "somepass"; my @ret; my $ftps = Net::FTPSSL->new($server, Encryption => EXP_CRYPT, Debug => 1, # Croak => 1, ) or die "Can't open $server\n$Net::FTPSSL::ERRSTR"; $ftps->login($username, $passwd) or error("Credential error, $ftps->last_message"); # get default listing and pwd @ret = $ftps->list() or error("Command error, $ftps->last_message"); print "####################\n"; print join "\n", @ret,"\n"; print "####################\n"; # get default pwd @ret = $ftps->pwd or error("Command error, $ftps->last_message"); print "####################\n"; print join "\n", @ret,"\n"; print "####################\n"; if( -t STDIN ) { print "tty\n"; } while(1){ print "Hit Control-C to exit ... otherwise:\n"; print "Enter command: \n"; my $com = <STDIN>; chomp $com; if ($com =~ m/quit/){ print "exiting\n";} # needed this eval to get ftps methods to work with pty my @ret = eval "\$ftps->$com"; if($@) { print "@_\n"; } print "####################\n"; print join "\n", @ret,"\n"; print "####################\n"; if ($com =~ m/quit/){ print "exit command received, ftpssl exiting\n"; + print "Control-C to exit pty, or Shift-PageUp to + view log\n"; last; } } print "at end\n"; exit;
    IO-Pty-driver for above ftps-z
    #!/usr/bin/perl -w # Description: Fool a process into # thinking that STDOUT is a terminal, when in fact # basic PTY code from etcshadow use warnings; use strict; use IO::Pty; $SIG{CHLD} = 'IGNORE'; # for when we quit the ftpssl session my $pty = IO::Pty->new; my $slave = $pty->slave; open TTY,"/dev/tty" or die "not connected to a terminal\n"; $pty->clone_winsize_from(\*TTY); close TTY; my $pid = fork(); die "bad fork: $!\n" unless defined $pid; if (!$pid) { open STDOUT,">&=".$pty->fileno() or die $!; exec "./ftps-z"; }else{ $pty->close(); while (defined (my $line = <$slave>)) { print $line; } } while(1){ my $command = <>; print $slave "$command\n"; }

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Vim: Auto highlight of variables
No replies — Read more | Post response
by Loops
on Jun 27, 2014 at 18:49

    So Ovid made this blog post that gave an example of editing Perl in Vim -- when you move your cursor over a Perl variable it is highlighted in the rest of the document automatically. Quite handy.

    Paul Johnson then made some improvements and put the code in a Git repo so that it's very easy to install with Pathogen in Vim

    After cloning that repo into your Pathogen bundle directory, it pretty much just works as advertised. For some reason it does not work with Tim Popes "vim-sensible" plugin however.

    The highlighting is delayed until you haven't moved your cursor for the number of milliseconds set in the Vim "updatetime" variable. By default this is set to 4000 which is pretty slow. Doing "set ut=50" in your vimrc makes it much snappier.

    Enjoy.

    P.S. Anyone have an updated syntax file for 5.20.0 sub-signatures and other new features?

WWW::Mechanize inventory update for musicstack.com
No replies — Read more | Post response
by maruhige
on Jun 24, 2014 at 11:21

    Spent a bit longer on this than needs be with hindsight, so I figure i'd share the product of that time

    Music stack doesn't have much in the way of an API, instead it requires manual navigation to upload change files. The change file also has an enormous amount of saved customisations on the upload form which would be numbingly tedious to fill out manually in the $mech itself

    So! here's a short and effective means of putting inventory files up on Music Stack, using WWW::Mechanize to navigate from the login screen to the upload screen.

    use WWW::Mechanize; my $username = 'x@example.com'; my $password = 'mypw'; my $upfile = '/path/to/additions.csv'; my $mech = WWW::Mechanize->new() or die $!; $mech->cookie_jar(HTTP::Cookies->new()); $mech->get(q#http://www.musicstack.com/login.cgi#) or die $!; #need to use sequential identifiers when forms are either nameless + or share the same name $mech->form_number(3); $mech->field ('user' => $username); $mech->field ('pw' => $password); $mech->click_button(name => "login"); #now on the user account page $mech->follow_link( text => 'Upload' );#case sensitive #now on the inventory management page $mech->form_name('form'); # 2 forms on page - other is 'search' # 3 options here - add is incremental $mech->set_fields('delete' => 'add'); $mech->field('upfile' => $upfile ); $mech->click_button(value => 'Upload File'); #the file is uploaded and the status screen displayed here print $mech-> content();

Add your CUFP
Title:
CUFP:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":


  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others exploiting the Monastery: (8)
    As of 2014-09-16 23:12 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      How do you remember the number of days in each month?











      Results (52 votes), past polls