DeductionPro is a software package that is used by many U.S. residents to help determine the value of items donated to charity. Unfortunately, the UI does not provide a good search function and finding items within the hierarchical tree of categories can be tedious. It did not take me long to get frustrated enough to take matters into my own hands (with a bit of Perl, of course).

The data file used by the program is a simple, albeit awkward, XML file. I sprinkled a bit of XML::Twig over it and produced a tab-separated text file that can be searched more easily.

I'm quite sure there is a more efficient, or at least more Perl-ish, way of doing this. I'd be interested in other approaches, especially since I'm not very familiar with Twig.

use strict;
use warnings;
use XML::Twig;

# Could specify $infile in @ARGV, but this is a specialized use case
my $infile = 'DPNoncashDetails.xml';

#*************************************************

open( my $outfh, '>', $infile . '.txt' ) or die $!;

my %data;    # holds item and pricing data

# Call item() for every <Item> start tag encountered during the parse
my $twig = XML::Twig->new( start_tag_handlers => { Item => \&item } );
$twig->parsefile( $infile );

my @fields = ( 'name', 'Like New', 'Minor Wear', 'Average Wear' );
print $outfh '# ', join( "\t", 'Category', @fields ), "\n";

# One output row per item: category path, name, then one price per quality
foreach my $treestr ( sort { $a cmp $b } keys %data ) {
    my $h = $data{$treestr};
    foreach my $id ( sort { $h->{$a}{name} cmp $h->{$b}{name} } keys %$h ) {
        print $outfh join( "\t", $treestr, @{ $h->{$id} }{ @fields } ), "\n";
    }
}

close( $outfh ) or die $!;

#*************************************************

# Record the name and the fair market value (fmv) for one <Item> element
sub item {
    my ( $twig, $elt ) = @_;

    my $tree = get_category_tree( $elt );
    $tree = join( " => ", @$tree );

    my $href = $elt->atts;
    verify_id( $tree, $href );

    $data{$tree}{ $href->{itemNum} }{name} = $href->{name};
    $data{$tree}{ $href->{itemNum} }{ $href->{quality} } = $href->{fmv};
}

# Collect the names of the enclosing <Category> elements, outermost first
sub get_category_tree {
    my ( $elt ) = @_;

    my @tree;
    while( my $parent = $elt->parent ) {
        last if $parent->tag eq 'NonCashDetails';
        # Always step up to the parent, even past a non-Category element;
        # skipping the step with 'next' would loop forever
        unshift( @tree, $parent->att('name') ) if $parent->tag eq 'Category';
        $elt = $parent;
    }

    return \@tree;
}

# Warn if an item id is about to be reused for a differently named item
sub verify_id {
    my ( $tree, $href ) = @_;

    my $id   = $href->{itemNum};
    my $name = $href->{name};

    if( exists $data{$tree} && exists $data{$tree}{$id} ) {
        if( $name ne $data{$tree}{$id}{name} ) {
            print "Warning: about to overwrite data due to record mismatch\n";
            print "  $tree, item id = $id:\n";
            print "  [existing]: $data{$tree}{$id}{name}\n";
            print "  [new]: $name\n";
        }
    }
}
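Once the text file exists, searching it needs nothing beyond core Perl. The following is a minimal sketch, assuming the column layout written by the script above (category path, name, then the three prices, with a header line starting with '#'). The search_items helper is hypothetical, and the rows are taken from the fictitious sample data rather than the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the rows whose item name contains $text
# (case-insensitive). Each row is [ category, name, price, price, price ].
sub search_items {
    my ( $fh, $text ) = @_;

    my @rows;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^#/;    # skip the header line
        my ( $category, $name, @prices ) = split /\t/, $line, -1;
        push @rows, [ $category, $name, @prices ]
            if defined $name && $name =~ /\Q$text\E/i;
    }
    return @rows;
}

# Stand-in for the generated file, built from the fictitious sample values
my $sample = join '', map { "$_\n" } (
    join( "\t", '# Category', 'name', 'Like New', 'Minor Wear', 'Average Wear' ),
    join( "\t", 'Appliances => Household (Small)', 'Air Purifier', '11.11', '10.01', '9.09' ),
    join( "\t", 'Clothing => Baby => Accessories', 'Belt', '1.25', '0.95', '0.73' ),
);

# Read from the in-memory string; a real run would open the .txt file instead
open( my $fh, '<', \$sample ) or die $!;
my @found = search_items( $fh, 'belt' );
close( $fh );

print join( "\t", @$_ ), "\n" for @found;
```

Swapping the in-memory string for open( my $fh, '<', 'DPNoncashDetails.xml.txt' ) would run the same search against the real output file.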

Re: Reformat DeductionPro 2008 Data File
by dHarry (Abbot) on Mar 30, 2009 at 06:40 UTC

    Unfortunately, you don't show any of the XML, so what exactly does it look like? You say the XML file is simple, so XML::Simple comes to mind. But given that you have a working XML::Twig solution, why change? On the other hand, if you want to search in XML files you might consider using XPath expressions. XML::Twig itself supports a subset of XPath, and XML::Twig::XPath adds fuller XPath support. Also see XML::XPath.

      You are right - an example of the input XML would have made this more interesting. I didn't include it because the program is not free and I didn't want to get into a situation where I could be accused of publishing proprietary (or at least fee-based) data in a public forum. Nonetheless, I created an example based on the original input file using fictitious data. The original XML file is much larger of course, but hopefully this will suffice.

      <?xml version="1.0" encoding="UTF-8"?>
      <NonCashDetails verNo="" year="">
        <Category name="Appliances">
          <Category name="Household (Small)">
            <Item desc="Air Purifier" fmv="11.11" itemNum="120" name="Air Purifier" quality="Like New" />
            <Item desc="Air Purifier" fmv="10.01" itemNum="120" name="Air Purifier" quality="Minor Wear" />
            <Item desc="Air Purifier" fmv="9.09" itemNum="120" name="Air Purifier" quality="Average Wear" />
            <Item desc="Bathroom Scale" fmv="9.01" itemNum="116" name="Bathroom Scale" quality="Like New" />
            <Item desc="Bathroom Scale" fmv="7.65" itemNum="116" name="Bathroom Scale" quality="Minor Wear" />
            <Item desc="Bathroom Scale" fmv="4.32" itemNum="116" name="Bathroom Scale" quality="Average Wear" />
          </Category>
        </Category>
        <Category name="Clothing">
          <Category name="Baby">
            <Category name="Accessories">
              <Item desc="Belt" fmv="1.25" itemNum="238" name="Belt" quality="Like New" />
              <Item desc="Belt" fmv="0.95" itemNum="238" name="Belt" quality="Minor Wear" />
              <Item desc="Belt" fmv="0.73" itemNum="238" name="Belt" quality="Average Wear" />
            </Category>
            <Category name="Activewear">
              <Item desc="Shorts" fmv="2.35" itemNum="249" name="Shorts" quality="Like New" />
              <Item desc="Shorts" fmv="1.58" itemNum="249" name="Shorts" quality="Minor Wear" />
              <Item desc="Shorts" fmv="1.04" itemNum="249" name="Shorts" quality="Average Wear" />
            </Category>
          </Category>
          <Category name="Boy's">
            <Category name="Accessories">
              <Item desc="Backpack/Book Bag" fmv="7.34" itemNum="533" name="Backpack/Book Bag" quality="Like New" />
              <Item desc="Backpack/Book Bag" fmv="6.52" itemNum="533" name="Backpack/Book Bag" quality="Minor Wear" />
              <Item desc="Backpack/Book Bag" fmv="5.41" itemNum="533" name="Backpack/Book Bag" quality="Average Wear" />
            </Category>
          </Category>
        </Category>
      </NonCashDetails>
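The XPath suggestion from the reply above can be tried against this sample without leaving XML::Twig. What follows is only a sketch, assuming XML::Twig is installed and relying on its built-in get_xpath method, which understands a subset of XPath including attribute tests. The embedded XML is a cut-down copy of the fictitious sample, and the query for "Belt" is just an illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Cut-down copy of the fictitious sample data
my $xml = <<'END_XML';
<NonCashDetails verNo="" year="">
  <Category name="Clothing">
    <Category name="Baby">
      <Category name="Accessories">
        <Item desc="Belt" fmv="1.25" itemNum="238" name="Belt" quality="Like New" />
        <Item desc="Belt" fmv="0.95" itemNum="238" name="Belt" quality="Minor Wear" />
      </Category>
      <Category name="Activewear">
        <Item desc="Shorts" fmv="2.35" itemNum="249" name="Shorts" quality="Like New" />
      </Category>
    </Category>
  </Category>
</NonCashDetails>
END_XML

my $twig = XML::Twig->new;
$twig->parse( $xml );

# Select every Item element whose name attribute is exactly "Belt"
my @hits = $twig->get_xpath( '//Item[@name="Belt"]' );

my @lines;
for my $item ( @hits ) {
    # ancestors() lists the innermost ancestor first, so reverse it
    # to build the category path from the outside in
    my @path = map { $_->att('name') } reverse $item->ancestors('Category');
    push @lines, join( "\t",
        join( ' => ', @path ),
        map { $item->att($_) } qw( name quality fmv ) );
}

print "$_\n" for @lines;
```

This loads the whole document into memory before querying, which is fine for a file of this kind but differs from the handler-based parse in the original script.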

Re: Reformat DeductionPro 2008 Data File
by Anonymous Monk on Apr 02, 2009 at 04:07 UTC
    Thank you! The DeductionPro UI is a travesty.