PerlMonks

Meditations

If you've discovered something amazing about Perl that you just need to share with everyone, this is the right place.

This section is also used for non-question discussions about Perl, and for any discussions that are not specifically programming related. For example, if you want to share or discuss opinions on hacker culture, the job market, or Perl 6 development, this is the place. (Note, however, that discussions about the PerlMonks web site belong in PerlMonks Discussion.)

Meditations is sometimes used as a sounding-board — a place to post initial drafts of perl tutorials, code modules, book reviews, articles, quizzes, etc. — so that the author can benefit from the collective insight of the monks before publishing the finished item to its proper place (be it Tutorials, Cool Uses for Perl, Reviews, or whatever). If you do this, it is generally considered appropriate to prefix your node title with "RFC:" (for "request for comments").

User Meditations
RFC: An on-disk NFS-safe key-value store database (NFSdb)
3 direct replies — Read more / Contribute
by RecursionBane
on Oct 12, 2014 at 13:29
    Greetings, Monks!

    It has been too long since I have solicited your opinion.

    After looking at the dozens upon dozens of database mechanisms available, I see that there are two major types:

    1. On-disk, serverless, "low"-concurrency database as file(s); Examples include:
    2. Remote (even if via localhost/), server/client, "high"-concurrency databases; Examples include:
    I had a specific requirement for a database that was:
    • Multi-process safe
    • Multi-host safe
    • Network File System (NFS) safe
    • Multi-master enabled (potentially to thousands of master processes concurrently)
    • Easy to back up at regular intervals
    • Lacking a single point of failure, assuming IT-managed storage filers

    None of the local databases I have found claim to be both multi-process safe and NFS-safe:

    • Some of them are averse to NFS (see: SQLite, BerkeleyDB, LMDB),
    • Others do not allow multiple processes accessing the database at the same time (see: TokyoCabinet, LevelDB), and,
    • Still others perform coarse-locking for multi-process access (see: MLDBM::Sync).

    Remote databases require one or more server hosts, or else the program will have to open and maintain one (and only one!) local server-process and have all other processes connect to it via localhost. Additionally, having managed to choke a MySQL server with unoptimized long-running queries early on while developing a complex project, I tend to shy away from remote databases.

    Despite the risk of link rot, it is hoped that the extensive collection of links above helps users find a database binding in Perl that works for their needs. A description of NFSdb begins below.

    Let's start with how NFSdb benchmarks against SQLite with multiple writers and readers across a network file system.

    # Benchmarks with 100000 sequential keys with random record values
    # across four concurrent readers/writers
    #
    # NFSdb settings:
    #
    #   atomic_read:       0
    #   atomic_write:      1
    #   db_root:           ./nfsdb
    #   debug:             0
    #   depth:             0
    #   lock_read:         0
    #   lock_write:        0
    #   nonblocking_write: 1
    #   profile:           0
    #
    # Benchmark            :   Avg (us)   Max (us)   Min (us)
    # =========                ========   ========   ========
    # SQLite fresh writes  :   12921.69  978057.00   1379.00
    # NFSdb fresh writes   :    3337.90  117746.00   1893.00
    # SQLite repeat writes :   11329.72  880585.00   3419.00
    # NFSdb repeat writes  :    3952.88  159310.00   2121.00
    # SQLite fresh reads   :    2379.53  509153.00   1536.00
    # NFSdb fresh reads    :    1139.35   12749.00    533.00
    # SQLite repeat reads  :    2471.39   40543.00   1518.00
    # NFSdb repeat reads   :    1101.33   13373.00    311.00

    Note that the average times for writes are up to 4x better, and max times are up to 8x better; this is because of table-level locking in SQLite.
    Of course, this isn't an entirely fair comparison because SQLite provides a relational layer, whereas NFSdb is simply a key-value store. There are many situations, however, where a key-value store would suffice, but programmers code up a solution around SQLite anyway. There is a better way!

    Now, let's talk about the implementation.

    While perusing CPAN, I found File::SharedNFSLock to make locking across NFS feasible by exploiting hardlinks (Update: A kind Anonymous Monk points out that this module warns of potential race conditions if hardlinking is not a viable locking solution on NFS). Inspired by CHI::Driver::File's automatic hashing and deep-directory creation, I then proceeded to naively whip up a simple key-value store that I call NFSdb, with the following features:

    • Low-overhead (no server/client, but it does have a few non-core dependencies)
    • Object-oriented (my first OO module!)
    • NFS-safe locking available
    • Atomic (lockless) write supported
    • Indexless (so searching is not possible; the exact key is required for retrieval)
    • Benchmarks favorably compared to SQLite

    Since every "record" is a file on-disk, even with locking enabled, individual "cells" can be locked, leading to high concurrency when compared to SQLite's table-locking mechanism. With lockless writes, it is possible to achieve even higher performance with the tradeoff that your read_key() may not see the absolute newest data (I suppose this could be labeled "eventual consistency").
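    As an illustration of the approach (a hedged sketch with invented function names and layout, not NFSdb's actual API), a key can be hashed into a deep directory path, and a lockless write can be made atomic by writing to a unique temporary file and then rename()ing it into place; rename within one filesystem is atomic, including over NFS, so a reader sees either the old record or the new one, never a partial write:

    ```perl
    use strict;
    use warnings;
    use File::Path qw(make_path);
    use File::Temp qw(tempfile);
    use Digest::MD5 qw(md5_hex);

    # Hash the key into $depth levels of two-character directories,
    # then publish the record atomically via rename().
    sub write_key {
        my ($db_root, $depth, $key, $value) = @_;
        my $hash = md5_hex($key);
        my @dirs = map { substr($hash, $_ * 2, 2) } 0 .. $depth - 1;
        my $dir  = join('/', $db_root, @dirs);
        make_path($dir);
        my ($fh, $tmp) = tempfile(DIR => $dir);  # unique name: writers never clash
        print {$fh} $value;
        close $fh;
        rename $tmp, "$dir/$hash" or die "rename failed: $!";
    }

    sub read_key {
        my ($db_root, $depth, $key) = @_;
        my $hash = md5_hex($key);
        my @dirs = map { substr($hash, $_ * 2, 2) } 0 .. $depth - 1;
        my $file = join('/', $db_root, @dirs, $hash);
        open my $fh, '<', $file or return undef;  # exact key required: no index
        local $/;
        return <$fh>;
    }
    ```

    The tradeoff is exactly the one described above: a reader racing a writer may get the previous value, but never a torn one.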

Installing wxPerl 0.9923 with wxWidgets 3.0.1 on Ubuntu 14.04 LTS 64-bit
1 direct reply — Read more / Contribute
by jmlynesjr
on Oct 11, 2014 at 21:19

    I'm in the process of replacing my old 32bit Thinkpad with a new 64bit HP 15. As these things go, MPIDE, Fritzing, Eagle, and wxPerl all required libraries that weren't included in 14.04. After a lot of searching, all have been successfully installed. Below is the script I used for the wxWidgets/wxPerl installation. Hope it can be of some use to someone. Also cross posted to the wxPerl Wiki.

    Update1:

    Based on comments here at the Monastery and discussions with the original author, listed below is an updated version of the script.

    James

    There's never enough time to do it right, but always enough time to do it over...

Perl Success Stories
6 direct replies — Read more / Contribute
by aartist
on Oct 08, 2014 at 14:16
    I was visiting Success Stories and found them very old. The latest story is dated September 2001. Is another version being written by somebody? Do any blogs or websites reflect the current status?
A port of "Dukedom" to Perl
1 direct reply — Read more / Contribute
by boftx
on Oct 07, 2014 at 02:36

    I was bored the other day, so I decided to port the game "Dukedom" from C to Perl. Here is the result; I'd greatly appreciate comments/feedback on how to make it display-agnostic so it can be used for websites, Tk, etc., besides command-line scripts. I have code refs now that can be changed out, but I'm pretty sure I need to do more. I am toying with using exception objects to signal the need for display/input and to provide callbacks to re-enter the state machine at the proper point.

    Please keep in mind that this is only the first draft and no docs or tests have been written yet. However, the command line script will work and allow you to play the game.

    https://github.com/boftx/Games-Dukedom

    You can find the original code that I ported from here: https://github.com/caryo/Dukedom/blob/master/imports/dukedom.c

    You must always remember that the primary goal is to drain the swamp even when you are hip-deep in alligators.
The importance of avoiding the shell
5 direct replies — Read more / Contribute
by jhourcle
on Sep 25, 2014 at 07:34

    For those who haven't heard, there was a Bash exploit announced yesterday. Although a patch did come out (4.3.25), there are reports that it does not fully fix the problem.

    Using variations of the test string that was posted to Slashdot, it looks at first as if perl makes your system invulnerable:

    sh-3.2$ env x='() { :;}; echo vulnerable' sh -c "echo this is a test"
    vulnerable
    this is a test
    sh-3.2$ env x='() { :;}; echo vulnerable & echo' perl -e 'system "echo test"'
    test
    sh-3.2$ env x='() { :;}; echo vulnerable' perl -e 'print `echo test`'
    test

    ... but unfortunately, perl only protects you when you pass system a list. In other cases, if it sees a shell metacharacter in your string, you're still vulnerable:

    sh-3.2$ env x='() { :;}; echo vulnerable' perl -e 'print `echo test;`'
    vulnerable
    test
    sh-3.2$ env x='() { :;}; echo vulnerable' perl -e 'system "echo test;"'
    vulnerable
    test
    sh-3.2$ env x='() { :;}; echo vulnerable' perl -e 'system qw(echo test +;)'
    test;

    Your main attack vector is CGIs -- anyone can set their user-agent or pass in a query string, and the webserver will set environment variables automatically. Should your scripts shell out, they're exploitable.

    So, the moral of the story: always use the list form of system, and avoid backticks if you can. If you have to do strange things w/ redirecting output, look at IPC::Open2 and IPC::Open3 which can also take list inputs.
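    The difference between the two forms can be sketched as follows (the strings are illustrative); in the list forms, perl fork/execs the program directly, so no shell ever runs and metacharacters are just data:

    ```perl
    use strict;
    use warnings;

    # String form: the ';' metacharacter makes perl hand the whole string
    # to /bin/sh, so the shell (and any hostile env vars) get involved.
    system("echo hello; echo world");

    # List form: 'echo' is exec'd directly with literal arguments;
    # the ';' is passed verbatim instead of being interpreted.
    system('echo', 'hello; echo world');

    # For captured output without backticks, a pipe open also takes a list:
    open my $fh, '-|', 'echo', 'hello; still one argument'
        or die "cannot run echo: $!";
    my $out = <$fh>;
    close $fh;
    print $out;   # the ';' arrives verbatim, unexecuted
    ```

    The same list-based pattern is what IPC::Open2 and IPC::Open3 accept for more involved redirection.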

SNTP Client/Server V2 RFC
1 direct reply — Read more / Contribute
by thanos1983
on Sep 23, 2014 at 10:38

    Hello Monks,

    A few days ago, I finished my task of creating a running SNTP Client/Server. I had a similar post on my previous version, UDP SNTP Client/Server RFC, where I was introduced to the idea that this could be submitted as a module on CPAN, since nothing similar exists. The closest module that I could find is Net::NTP, but in reality the two bear little resemblance to each other. That module fetches and displays data from the NTP server; the module that I propose uses the local clock of the server to calculate the roundtrip delay d and local clock offset t.
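    As a sketch of the arithmetic involved (the function and variable names here are illustrative, not the module's actual code), the roundtrip delay and clock offset in SNTP (RFC 4330) are computed from the four timestamps T1 (client transmit), T2 (server receive), T3 (server transmit), and T4 (client receive):

    ```perl
    use strict;
    use warnings;

    # RFC 4330 arithmetic: d = (T4 - T1) - (T3 - T2),
    #                      t = ((T2 - T1) + (T3 - T4)) / 2
    sub sntp_delay_offset {
        my ($t1, $t2, $t3, $t4) = @_;
        my $d = ($t4 - $t1) - ($t3 - $t2);        # roundtrip delay
        my $t = (($t2 - $t1) + ($t3 - $t4)) / 2;  # local clock offset
        return ($d, $t);
    }

    # Example: client clock 0.5 s behind the server, 0.1 s each way on the wire.
    my ($d, $t) = sntp_delay_offset(10.0, 10.6, 10.6, 10.2);
    printf "delay=%.1f offset=%.1f\n", $d, $t;   # delay=0.2 offset=0.5
    ```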

    So I thought that if people are interested in running a script like this, it should have higher accuracy and actually support most, if not all, features of the Simple Network Time Protocol (SNTP).

    Accuracy down to microseconds can only be achieved on Linux, not on Windows, where Time::HiRes does not provide that resolution.

    I would like to ask for possible improvements or suggestions for my code. Please take into consideration that I am not an expert programmer and this is my first module submission.

    Also, could Windows and Mac users test the code and provide feedback on the script execution, noting any problems or possible faults? I have developed the scripts on Linux, so they have only been tested in a Linux environment.

    Update: client.pl and server.pl include use POSIX qw(CLOCKS_PER_SEC); for compatibility reasons with Windows OS. Thanks to the help of VinsWorldcom for the proposed solution.

    Client.pl

    Server.pl

    A possible future improvement is to apply threading on the server so it can reply to multiple requests (clients) simultaneously.

    Thank you all for your time and effort to review and comment on my request.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Building an Open Source Perl ERP
No replies — Read more | Post response
by einhverfr
on Sep 14, 2014 at 21:21

    LedgerSMB 1.4.0 has been released after about three years of development. The system is written in Perl, SQL, and PL/PGSQL, and supports Perl 5.10 and higher, as well as PostgreSQL 9.0 and higher. Click Read More for our press release.

    We are already going into this release with a fair bit of discussion as to where to go from here. We've already been building a framework for database access based on our experiences (PGObject). And now we are looking at moving to a web framework (Dancer is the one we are looking at most heavily right now, but Mojolicious and Catalyst have also been discussed).

    While we chose Perl because it was used by the software we forked (SQL-Ledger), as we have moved into modernizing and improving the code, we have become very happy with the power and flexibility of the language. 1.4.0 moves a fair bit of code onto Moose. And we expect this trend to continue.

    I won't vouch for the quality of the code we inherited. But I think the quality of the code that is being written for 1.5 is now something I am pretty happy with.

    I would be interested in any feedback on this process that other large enterprise application developers have.

The Case for Macros in Perl
5 direct replies — Read more / Contribute
by einhverfr
on Sep 12, 2014 at 23:07

    In some of my work I have started doing a lot more with higher order and functional Perl programming. A good example is PGObject::Util::DBMethod which provides a way to declaratively map stored procedures in Postgres to object methods. I have linked to the source code on github above because it is a good example of where macros would be very helpful.

    Now I will be the first to admit that in these cases, macros are not 100% necessary. The module above can accomplish what it needs to do without them. However, the alternative, which means effectively creating a highly generalized anonymous coderef, setting up a custom execution environment for that coderef, and then installing the generalized coderef with the specific execution environment as a method, has some significant drawbacks.

    Here's the particular section that does the main work:
    sub dbmethod {
        my $name        = shift;
        my %defaultargs = @_;
        my ($target)    = caller;
        my $coderef = sub {
            my $self = shift @_;
            my %args;
            if ($defaultargs{arg_list}) {
                %args = ( args => _process_args($defaultargs{arg_list}, @_) );
            } else {
                %args = @_;
            }
            for my $key (keys %{$defaultargs{args}}) {
                $args{args}->{$key} = $defaultargs{args}->{$key}
                    unless $args{args}->{$key} or $defaultargs{strict_args};
                $args{args}->{$key} = $defaultargs{args}->{$key}
                    if $defaultargs{strict_args};
            }
            for my $key (keys %defaultargs) {
                next if grep(/^$key$/, qw(strict_args args returns_objects));
                $args{$key} = $defaultargs{$key} if $defaultargs{$key};
            }
            my @results = $self->call_dbmethod(%args);
            if ($defaultargs{returns_objects}) {
                for my $ref (@results) {
                    $ref = "$target"->new(%$ref);
                }
            }
            if ($defaultargs{merge_back}) {
                _merge($self, shift @results);
                return $self;
            }
            return shift @results unless wantarray;
            return @results;
        };
        no strict 'refs';
        *{"${target}::${name}"} = $coderef;
    }

    Now that is 40 lines of code, 30 of which go into the coderef that is executed when the method is actually run. This doesn't seem like much, but it does the work of 5-10 lines of code written in an imperative style. In other words, it is 5-6 times as long and involved as it needs to be.

    With macros, it would be quite possible to generate only the code needed for the specific function rather than creating a generalized case which has to handle many non-applicable inputs, and then create a context where it only gets what it needs.
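    To make the contrast concrete, here is a hedged sketch (the class, the stub call_dbmethod, and the declaration are invented for illustration, not PGObject's real API) of the kind of specialized, straight-line method a macro could emit for one declaration, with none of the generalized option handling:

    ```perl
    use strict;
    use warnings;

    package My::User;

    # Stubs standing in for the PGObject machinery, just to make this runnable.
    sub new           { my ($class, %a) = @_; return bless {%a}, $class }
    sub call_dbmethod { my ($self, %args) = @_;
                        return ({ name => 'bob', via => $args{funcname} }) }

    # What a macro could expand a hypothetical declaration like
    #   dbmethod save => (funcname => 'save_user', returns_objects => 1);
    # into at compile time: code specialized to exactly that declaration.
    sub save {
        my ($self, %args) = @_;
        $args{funcname} = 'save_user';
        my @results = $self->call_dbmethod(%args);
        $_ = __PACKAGE__->new(%$_) for @results;   # returns_objects => 1
        return wantarray ? @results : shift @results;
    }

    package main;
    my ($obj) = My::User->new->save;
    print ref $obj, " via ", $obj->{via}, "\n";   # My::User via save_user
    ```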

Almost 28 new names for 32 old marks
6 direct replies — Read more / Contribute
by tye
on Sep 06, 2014 at 01:42

    We were discussing a software bug and somebody mentioned "vertical pipe" and I thought, "Then it should be called 'bong'". It took several days after that, but I eventually settled on my new names for all of the ASCII punctuation marks:

    !  bang      |  bong      @  bung      &  dung
    $  bling     ^  sting     <  bring     >  brung
    (  sling     )  slung     [  cling     ]  clung
    {  fling     }  flung     :  sing      ;  sung
    "  string    '  strong    `  strang    ~  swing
    =  rung      ?  rang      .  ding      ,  dang
    /  slash     \  sash      -  dash      _  lash
    #  bash      *  splash    %  rash      +  crash

    Each is mnemonic but I'll leave divining etymologies as an exercise; some of them might be entertaining to realize (some I find entertaining while obvious, YMMV).

    - tye        

RFC Using PERL HEREDOC script within bash
4 direct replies — Read more / Contribute
by dcronin135
on Aug 26, 2014 at 23:29

    This submission is in response to others asking how to embed a Perl script within a bash or ksh script. Though it may not be a common practice, the examples below illustrate how this can be accomplished.

    #!/bin/sh
    # If you are not passing bash vars into the Perl HEREDOC,
    # then single-quote the HEREDOC tag
    perl -le "$(cat <<'MYPL'
    # Best to build your output vars rather than writing directly
    # to the pipe until the end.
    my ($STDERRdata, $STDOUTdata) = ("", "");
    while (my $i = <STDIN>) {
        chomp $i;
        $STDOUTdata .= "To stdout\n";
        $STDERRdata .= "Write from within the heredoc\n";
    }
    print $STDOUTdata;  # Doing the pipe write at the end will save you
    warn $STDERRdata;   # a lot of frustration.
    MYPL
    )" <myInputFile 1>prints.txt 2>warns.txt


    or

    #!/bin/sh
    WRITEWHAT="bash vars"    # If you want to include your bash vars,
    # escape the $'s that are not bash vars.
    perl -le "$(cat <<MYPL
    my (\$STDERRdata, \$STDOUTdata) = ("", "");
    while (my \$i = <STDIN>) {
        chomp \$i;
        \$STDOUTdata .= "To stdout\n";
        \$STDERRdata .= "Write $WRITEWHAT from within the heredoc\n";
    }
    print \$STDOUTdata;  # Doing the pipe write at the end will save you
    warn \$STDERRdata;   # a lot of frustration.
    MYPL
    )" <myInputFile 1>prints.txt 2>warns.txt

    If you want to pass command-line arguments, insert them before the < redirection for STDIN.
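    For instance (a hedged sketch, not from the original post), arguments placed after the generated "$(...)" code string land in the embedded script's @ARGV, while STDIN still comes from the redirection or pipe:

    ```shell
    # Arguments go after the generated code string; STDIN is piped in.
    echo data | perl -le "$(cat <<'MYPL'
    print "args: @ARGV";
    while (<STDIN>) { chomp; print "line: $_" }
    MYPL
    )" first second
    ```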

How realistic is an extended absence?
13 direct replies — Read more / Contribute
by ksublondie
on Aug 15, 2014 at 13:17
    I've been working for the same small, local company since college (12 years -- CS degree) and the sole programmer for the last 7...5 of which have been almost exclusively from home. I love my job, the company is great, can't ask for a better boss, I'm able to work independently and come up with my own projects. But lately, I've been contemplating staying home* to watch the kiddos (currently 3 all <=5). I'm flat out burned out and my priorities have shifted.

    How realistic is it to quit my job for an extended absence (5+ years) and later return to a programming/IT position? Am I going to be pigeonholed into the baby-track? Will I be untouchable and irrelevant?

    * EDIT: "staying at home" = quitting my job/programming. For clarification, I have been working at home full-time with the kiddos from day one. Always in the past, it worked rather well. It was all they ever knew. My parenting style is rather "hands off" (not to say I neglect my children, but I make sure their needs are met while teaching them to be independent and doing things for themselves if it's within their capability). As a result, they have amazing attention spans and are capable of entertaining themselves. Plus a fortune invested in baby gates helps. Toddlers running around are less distracting than my coworkers and all the drama, politics, meetings about the next meeting, etc.

    I don't know if it's the addition of #3, or their ages requiring more mental stimulation, or #2 being a yet-to-be-potty-trained holy terror...or a combination thereof...but it's not working so smoothly anymore. I'm debating about quitting completely. I can tell myself to "stay in the loop" independently, but realistically, I know I won't. I already feel irrelevant since I'm not physically in the office.

RFC: interface for a DBD::Proxy-like module client-side DSN
No replies — Read more | Post response
by MidLifeXis
on Aug 14, 2014 at 09:22

    I made mention of this in the CB the other day, but didn't get many responses, so I thought I would ask it here to perhaps get a wider audience and set of responses.

    I am modifying a copy of DBD::Proxy/DBI::ProxyServer so that, instead of specifying the entire server-side DSN on the client side, you specify a known name of a handle to a configured DSN on the server side. Using this, and implementing the sql section of the configuration as another set of known queries, would allow the client to use a DBI-compliant data source without needing to have the server-side implementation details available. I am also looking to update the connection / user / query definition sections to make them easier to isolate from one another.

    • Does a client-side DSN along the lines of dbi:Router:hostname=$hostname;port=$port;dsn=dbi:router:$remotename seem like a reasonable interface? [clarification: $hostname and $port are for connecting to the proxy / routing server, not the database -- that is fully configured on the routing server] Is there something (currently) better to base this on than DBD::Proxy/DBI::ProxyServer?
    • Does the name seem sensible?
    • Should I just try to incorporate this directly into the DBD::Proxy core itself?
    • Any other thoughts / previously invented wheels / ideas?

    The major use case I have for this is to standardize access to all of the little bits of information I have to use for my applications which currently exist in different data stores (CSV/TSV, SQLite, ldap, ...) in order to migrate them into a more manageable setup without impacting the application code. This type of configuration would also allow for the mockup of a testing set of data, migration to different platforms, ...
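    As a sketch of how the proposed client-side DSN could be parsed (the hostname, port, and remote name below are invented examples, and this is not an existing DBD::Router API):

    ```perl
    use strict;
    use warnings;

    # Split the proposed dbi:Router DSN into its fields. The dsn= field is
    # taken last and greedily, since its value may itself contain ';' or '='.
    sub parse_router_dsn {
        my ($dsn) = @_;
        my ($attrs) = $dsn =~ /^dbi:Router:(.*)$/
            or die "not a dbi:Router DSN: $dsn";
        my %opt;
        while ($attrs =~ s/^(hostname|port)=([^;]*);?//) {
            $opt{$1} = $2;
        }
        ($opt{dsn}) = $attrs =~ /^dsn=(.*)$/;
        return \%opt;
    }

    my $opt = parse_router_dsn(
        'dbi:Router:hostname=db.example.com;port=7015;dsn=dbi:router:sales'
    );
    print "$opt->{hostname} $opt->{port} $opt->{dsn}\n";
    # db.example.com 7015 dbi:router:sales
    ```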

    Updates

    • pgbouncer was mentioned as a similar tool
    • Added description of my use case
    • Added a clarification of what the host/port refer to

    --MidLifeXis

RFC: pianobar event example
2 direct replies — Read more / Contribute
by ulterior_modem
on Aug 10, 2014 at 22:02
    Hello monks,

    I know enough perl to be bad; however I live in a unix userland and some things are more universally accepted than others, one of them being perl. I usually play around in php, but writing this sort of script in php seems wrong.

    My goal was for it to be understandable and to log songs played via pianobar to CSV for use with other things. What ways could this be improved? Any feedback is appreciated.

    //ulterior_modem

    Script.

    use strict;
    use warnings;

    # this holds all of the lines output by pianobar.
    my @input = <STDIN>;

    # lines parsed into hash.
    my %data;

    # file we want to write output to
    my $file = '/home/ulterior/pandora.log';

    # assembled CSV line.
    my $line;

    # last line of logfile.
    my $lastline;

    # remove newlines from end of all values in array.
    chomp @input;

    # build hash from contents of array; limit the split to 2 fields so
    # values containing '=' (e.g. detailUrl) are kept intact.
    foreach my $var (@input) {
        my ($key, $value) = split(/=/, $var, 2);
        $data{$key} = $value;
    }

    # check to see if all the fields we want are defined.
    if (   defined($data{title})
        && defined($data{artist})
        && defined($data{album})
        && defined($data{songStationName})) {

        # compose csv line with/without album art.
        if (defined($data{coverArt})) {
            $line = qq{"$data{title}","$data{album}","$data{artist}","$data{songStationName}","$data{coverArt}"\n};
        }
        else {
            $line = qq{"$data{title}","$data{album}","$data{artist}","$data{songStationName}"\n};
        }
    }

    # nothing to log if the required fields were missing.
    exit(0) unless defined $line;

    # if the log file exists, check whether the last line is the
    # same, to avoid duplication.
    if (-e $file) {
        $lastline = qx/tail -n 1 $file/;
        exit(0) if $line eq $lastline;
    }

    # write csv line to file (three-arg open with a lexical handle).
    open(my $handle, '>>', $file) or die "Cannot open $file: $!";
    print {$handle} $line;
    close($handle);

    Sample data.

    artist=Bastille
    title=Pompeii
    album=Pompeii (Remixes)
    coverArt=http://cont-2.p-cdn.com/images/public/amz/9/5/3/6/800026359_500W_500H.jpg
    stationName=QuickMix
    songStationName=Major Tom Radio
    pRet=1
    pRetStr=Everything is fine :)
    wRet=1
    wRetStr=Everything's fine :)
    songDuration=214
    songPlayed=214
    rating=0
    detailUrl=http://www.pandora.com/bastille/pompeii-remixes/pompeii?dc=232&ad=1:23:1:47805::0:msn:0:0:581:307:IN:18167:0:0:0:0:6:0
    stationCount=74
    station0=28 Days Radio
    station1=And so on...
Private & Protected Objects
3 direct replies — Read more / Contribute
by Sixes
on Aug 10, 2014 at 13:46

    Some time ago (nearly 15 years, actually) in this thread, btrott was talking about various ways of protecting a blessed object and quite a lot of discussion came from it.

    I haven't seen anyone suggest using Variable::Magic to achieve this. I'm thinking of writing a base class with a class method along these lines:

    sub new {
        my $class  = shift;
        my %params = @_;

        my $protected = sub {
            croak qq{Attempt to access protected data "$_[2]"}
                unless caller->isa(__PACKAGE__);
        };

        my $wiz = wizard(
            store  => $protected,
            fetch  => $protected,
            exists => $protected,
            delete => $protected,
        );

        my %self;
        cast %self, $wiz;
        my $self = \%self;
        bless $self, $class;
        $self->$_($params{$_}) foreach keys %params;
        return $self;
    }

    Does anyone have any views on whether this (a) will work correctly and (b) will be useful? The intention is to make the underlying hash inaccessible other than to subclasses of a class using this as a parent.

    The main problem I'm trying to solve is the programmer who accidentally types $obj->{field} when he meant $obj->field, thereby inadvertently bypassing any clever stuff in the getter.

Contemplating some set comparison tasks
8 direct replies — Read more / Contribute
by dwhite20899
on Aug 08, 2014 at 14:32

    I'm stewing on a particular task that is likely to reappear from time to time. I'd like to find an efficient way to do this work so it can scale up in future.

    In summary, I have Keys and Sources. A Key may come from one or many Sources, a Source may generate one or more Keys. What is the minimal list of Sources which cover the Keys?

    I have data in the format "Key|Source", /^[0-9A-F]{40}\|[0-9a-f]{40}$/

    0000002D9D62AEBE1E0E9DB6C4C4C7C16A163D2C|2f214516cdcab089e83f3e5094928fe9611f2f51
    000000A9E47BD385A0A3685AA12C2DB6FD727A20|2adeac692d450c54f8830014ee6cbe3a958c1e60
    00000142988AFA836117B1B572FAE4713F200567|04bb7bbed62376f9aaec15fe6f18b89b27a4c3d8
    00000142988AFA836117B1B572FAE4713F200567|6935a8fc967a6ffc20be0f07f2bb4a46072a397e
    00000142988AFA836117B1B572FAE4713F200567|8c88f4f3c4b1aff760a026759ae807af6c40e015
    00000142988AFA836117B1B572FAE4713F200567|974c820f53aded6d6e57ca8de2c33206e2b5f439
    00000142988AFA836117B1B572FAE4713F200567|b05be3e17bb9987ffb368696ee916dd9b9c2f9b3
    000001BCBC3B7C8C6E5FC59B686D3568132D218C|0d4c09539f42165bb8b1ab890fe6dc3d3ca838b3
    000001BCBC3B7C8C6E5FC59B686D3568132D218C|9fd421d4e020788100c289d21e4b9297acaaff62
    000001BCBC3B7C8C6E5FC59B686D3568132D218C|d09565280ebae0a37ca9385bc39c0a777a446554
    000001E4975FA18878DF5C0989024327FBE1F4DF|55b8ece03f4935f9be667e332d52f7db3e17b809
    000001EF1880189B7DE7C15E971105EB6707DE83|cd15550344b5b9c2785a13ef9583015f267ad667
    000002F2D7CB4D4B548ADC623F559683D6F59258|36bed8bdb6d66fb67f409166f5db64b02199812f
    0000034C9033333F8F58D9C7A64800F509962F3A|3c4b0a3c1acf6e03111805a0d8b4e879df112b7a
    000003682106A4CB4F9D3B1B6E5C08820FCFD1B2|cd15550344b5b9c2785a13ef9583015f267ad667
    00000368B9CFE1B4CF9F3D38F3EFD82840BA280D|50edd315b9217345b1728c38b02657df42043197
    000003A16A5A1C6CCDDBE548E85261422489A458|691845459c0ad35b28cce4dffc0e3ee8912fb0f5
    0000046FD530A338D03422C7D0D16A9EE087ECD9|13e213f346ce624e9be99b356ab9125af563a375
    0000046FD530A338D03422C7D0D16A9EE087ECD9|67c0da2da88a23a803733cea951e84974b34d029
    00000472E2B96A6CD0CBE8614779C5C8197BB42D|0c5e6cdb06c52160ded398d17392246269165e0a

    I am now dealing with a 190,000,000+ set of Key|Source pairs. There are 30,000,000 unique Key values and 20,000 unique Source values. There are 23,800,000 Keys that appear only once, so I know I must have at least their Sources in the final set I want. I need to find the smallest set of Sources that cover the 6,200,000 remaining Keys.

    I can think of a brute-force iteration method to do this, but there should be a more elegant (and hopefully more efficient) way to find the smallest coverage set of Sources over the 6,200,000 Keys.
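    For what it's worth, minimum set cover is NP-hard in general, so the usual tractable approach is the greedy approximation: repeatedly pick the source that covers the most still-uncovered keys. A sketch on a toy data set (the structure, not the real 190M-pair input):

    ```perl
    use strict;
    use warnings;

    # %$by_source maps source => arrayref of the keys it generates.
    # Returns a (not necessarily minimal) covering list of sources.
    sub greedy_cover {
        my ($by_source) = @_;
        my %uncovered = map { $_ => 1 } map { @$_ } values %$by_source;
        my @picked;
        while (%uncovered) {
            # pick the source covering the most uncovered keys
            my ($best, $best_n) = (undef, 0);
            for my $src (keys %$by_source) {
                my $n = grep { $uncovered{$_} } @{ $by_source->{$src} };
                ($best, $best_n) = ($src, $n) if $n > $best_n;
            }
            last unless $best;   # nothing left is coverable
            push @picked, $best;
            delete @uncovered{ @{ $by_source->{$best} } };
        }
        return @picked;
    }

    my %by_source = (
        s1 => [qw(k1 k2 k3)],
        s2 => [qw(k2 k3)],
        s3 => [qw(k4)],
    );
    my @cover = sort( greedy_cover(\%by_source) );
    print "@cover\n";   # s1 s3
    ```

    Running the per-source counts against a hash of uncovered keys keeps each pass linear in the number of remaining pairs, which matters at the scale described above.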

    My data is usually sorted by Key value, and how I'm used to thinking of it. If I sort on Source value, I might have an inspiration.

    So I'm stewing on this...

    UPDATE 2014-08-12

    I have shrunk the problem set by identifying the keys which come from only a single source. I am now left to consider only the set of many-to-many key-source relationships: 9,197,129 relations between 1,890 sources and 3,692,089 keys. I was able to reduce the key signatures from 40 chars to 11 chars and the source signatures from 40 chars to 6 chars, to buy myself some space.

    Off to bang on this a while...

    Complete 170 MB data in a 50 MB zip file: data_locks_keys.zip
    format: key|source

