PerlMonks  

Dump JudyHS

by diotalevi (Canon)
on Dec 29, 2008 at 22:57 UTC (#733140, snippet)
Description:

This dumps the contents of a Judy::HS/JudyHS(3) array. I had to violate its API to do this. JudyHS is constructed as nested Judy::L/JudyL(3) arrays. The top level is keyed by string length. The next level is keyed by a hash of the string. Each additional level is keyed by another 4 or 8 bytes of the input string (one machine word) until no more are needed, and the tree terminates in a C struct which contains the key and value.
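A rough way to picture that nesting, using plain Perl hashes rather than Judy itself. Everything here (chunks, toy_hash, toy_set, toy_walk) is made up for illustration. In particular, toy_hash is a simple byte checksum and not the hash JudyHS uses, and this sketch always consumes the whole string, whereas the real structure only descends as far as it needs to.

```perl
use strict;
use warnings;
use Config '%Config';

# Word size in bytes: each string level consumes one word (4 or 8 bytes).
my $WORD = 0 + $Config{longsize};

# Hypothetical stand-in hash: a 32-bit byte checksum, NOT what JudyHS uses.
sub toy_hash {
    my ($str) = @_;
    return unpack( '%32C*', $str );
}

# Split a string into word-sized chunks (the last one may be shorter).
sub chunks {
    my ($str) = @_;
    return unpack( "(a$WORD)*", $str );
}

# Insert along the path: length -> hash -> chunk -> ... -> leaf.
sub toy_set {
    my ( $root, $str, $value ) = @_;
    my @path = ( length($str), toy_hash($str), chunks($str) );
    my $last = pop @path;
    my $node = $root;
    $node = $node->{$_} ||= {} for @path;
    $node->{$last} = [ $value, $str ];    # leaf: value first, then the key
    return;
}

# Walk every level and collect the leaves as [ value, string ] pairs.
sub toy_walk {
    my ($node) = @_;
    my @leaves;
    for my $child ( values %$node ) {
        if ( ref($child) eq 'ARRAY' ) { push @leaves, $child }
        else                          { push @leaves, toy_walk($child) }
    }
    return @leaves;
}

my %root;
toy_set( \%root, "hello world\n", 1 );
toy_set( \%root, "hi\n",          2 );
```

Leaves are array refs and inner levels are hash refs, so the walker can tell them apart by type. The real structure does the same job with a flag bit on the pointer.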

The example below loads Judy::HS with a map from each input line to its line number. The data is completely arbitrary; I did it just to demonstrate to myself that I could enumerate the contents of Judy::HS if I needed to.

Judy.h in the Judy C library has a nice, readable description of the structure that's being dumped here.

#!perl
use strict;
use warnings;
use Config '%Config';
use Judy::HS qw( Set );
use Judy::L qw( First Next );
use Judy::Mem qw( Peek Ptr2String2 );

use constant LONGSIZE => 0+$Config{longsize};

# Load $hs with a pile of data.
my $hs;
@ARGV = "$ENV{HOME}/Documents/Political Data/Secretary of state/Statewidevoters13102.txt";
while (<>) {
  Set( $hs, $_, $. );
}


# Nested printing.
our $P = -1;
sub p { print ' ' x ( 4 * $P ), @_ }


# Loop over JudyL array, each entry contains all strings of length $lengthKey.
my ( undef, $lengthL, $lengthKey ) = First( $hs, 0 );
while ( defined $lengthKey ) {
  local $P = 1+$P;
  p( "LENGTH: $lengthKey\n" );


  # Loop over JudyL array, each entry contains all strings that map to the same $hashKey.
  my $hashCount = 0;
  my ( undef, $hashL, $hashKey ) = First( $lengthL, 0 );
  while ( defined $hashKey ) {
    local $P = 1+$P;
    p( sprintf "HASH @{[ ++ $hashCount ]}: 0x%x\n", $hashKey );


    # Recurse down through JudyL until I find the key/value.
    dumpLTree( $hashL );

    ( undef, $hashL, $hashKey ) = Next( $lengthL, $hashKey );
  }

  ( undef, $lengthL, $lengthKey ) = Next( $hs, $lengthKey );
}


sub dumpLTree {
  my ( $l ) = @_;

  # Find the stored key/values.
  if ( Judy::JLAP_INVALID & $l ) {
    $l &= ~Judy::JLAP_INVALID;
    local $P = 1+$P;


    # Unpack the C struct containing my key and value. The value is the first
    # word; the string bytes follow it.
    my $value = Peek( $l );
    my $str   = Ptr2String2( LONGSIZE + $l, $lengthKey );
    p( "{Value: $value, String: $str}\n" );
  }
  else {

    # Go deeper.
    my ( undef, $innerL, $key ) = First( $l, 0 );
    while ( defined $key ) {
      local $P = 1+$P;
      p( "str: $key\n" );

      dumpLTree( $innerL );
      ( undef, $innerL, $key ) = Next( $l, $key );
    }
  }
}
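
For anyone wondering what the leaf handling in dumpLTree amounts to, here is a sketch in pure Perl with no Judy involved. It assumes the leaf is one native long followed by the key bytes (which is how the script above reads it) and that JLAP_INVALID is the low bit, 0x1. TOY_INVALID, toy_unpack_leaf and the address are hypothetical names and values, not part of any Judy API.

```perl
use strict;
use warnings;

# Assumption: the "not a JudyL array" flag is the low bit of the pointer.
use constant TOY_INVALID => 0x1;

# Build a fake leaf in a Perl string: one native long, then the key bytes.
my $key  = "some line of text\n";
my $leaf = pack( 'l! a*', 42, $key );

# Rough equivalent of Peek + Ptr2String2: read the long, then $len bytes.
sub toy_unpack_leaf {
    my ( $buf, $len ) = @_;
    my ( $value, $str ) = unpack( "l! a$len", $buf );
    return ( $value, $str );
}

# Pointer tagging: word-aligned addresses have a free low bit to use as a flag.
my $addr    = 0x1000;                          # made-up aligned address
my $tagged  = $addr | TOY_INVALID;             # mark it as a leaf
my $is_leaf = ( $tagged & TOY_INVALID ) ? 1 : 0;
my $clean   = $tagged & ~TOY_INVALID;          # strip the flag before use

my ( $value, $str ) = toy_unpack_leaf( $leaf, length $key );
print "{Value: $value, String: $str}\n" if $is_leaf;
```

The string length is not stored in the leaf, which is why the script has to carry $lengthKey down from the top level.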