http://www.perlmonks.org?node_id=754073

koti688 has asked for the wisdom of the Perl Monks concerning the following question:

As I am working on a project involving backup/recovery solutions, I need to stress test the application, so I need to send huge amounts of data to the back-end server to find its breaking points.

Processing huge data (more than 1 GB) through XML is taking too much time.

My question is: is textpad processing better than XML in Perl? Or can you suggest which type of file processing is better for huge loads in Perl?

Re: Which is Best ? XML or Textpad
by pemungkah (Priest) on Mar 30, 2009 at 06:00 UTC
    I'm not quite sure I understand your question. You haven't really given us quite enough detail.

    Are you asking, "Is there a better way to generate XML data in Perl than the one I'm using?" - I don't know; I don't know what you're doing yet.

    By "textpad processing" do you mean text manipulation? Yes, that's probably going to be faster than XML manipulation in Perl - but I don't know what text you want, so I can't be sure.

    Or do you mean, "I need to generate big dummy data files for my program. How can I do this with Perl?" - another question that needs more data.

    What are you actually doing? What data files do you need? What format are they in? Do their contents matter at all? If so, what do they need to contain? Have you got any sample code we can see?

    We'd love to help - we just need a better explanation of what you're doing.

      >> Or do you mean, "I need to generate big dummy data files for my program. How can I do this with Perl?"

      Yes, I am generating big dummy XML files of huge data (more than 1 GB). Please check the following programs.

      Program to generate a big dummy XML file:

      use Utils::XMLUtils qw(loadXMLFile getNodesFromPath getNodeFromPath
          getChildren getNodeName getValueFromPath getValuesFromPath
          checkIfExists createXMLDocument createNewElement assignChild
          createTextElement wrtiteTofile );
      use Digest::SHA1 qw(sha1);

      sub getHexData {
          my ($key)= unpack("H*",shift);
          return $key;
      }

      print "Enter the file path :";
      my $file = <STDIN>;
      chomp($file);
      print "Enter chunk size :";
      my $size = <STDIN>;
      chomp($size);

      open(FILE, $file) || die "Unable to open the file";
      my $doc = createXMLDocument();
      my $SigData = createNewElement($doc, "SigData");
      $len = read(FILE, $data, $size);
      while ( $len != 0) {
          $datalen = $size == $len ? $size: $len;
          $digest = sha1($data);
          $keydata = getHexData($digest);
          $data = getHexData($data);
          $kvpair = createNewElement($doc, "KVPair");
          $key = createNewElement($doc, "Key");
          $keytext = createTextElement($doc, $keydata);
          assignChild($key, $keytext);
          $Value = createNewElement($doc, "Value");
          $Valuetext = createTextElement($doc, $data);
          assignChild($Value, $Valuetext);
          assignChild($kvpair, $key);
          assignChild($kvpair, $Value);
          assignChild($SigData, $kvpair);
          $len = read(FILE, $data, $size);
      }
      wrtiteTofile($SigData,"sample.xml",0);
      close(FILE);


      The utility functions used in the above program are in this module:
      package Utils::XMLUtils;

      use strict;
      use XML::DOM;
      use XML::DOM::XPath;
      use XML::Writer;
      use IO::File;    # needed by wrtiteTofile
      require Exporter;
      use vars qw(@ISA @EXPORT_OK $VERSION);

      @ISA = ('Exporter');
      @EXPORT_OK = qw(loadXMLFile checkIfExists getParentNode getValueFromPath
          getNodesFromPath getValuesFromPath getNodeValue getNodeWithValue
          createNewElement createTextElement assignChild getNodeFromPath
          createXMLDocument cloneNode replaceNode getChildren getXMLNodes
          wrtiteTofile getNodeName );

      # This function checks the existence of a given path under a specified node.
      # @param $node Node in the XML tree from where the search should start.
      # @param $npath Path to be searched for existence. Follows XPath format.
      sub checkIfExists {
          my ($node, $nPath) = @_;
          my $bStatus;
          $bStatus = $node->exists($nPath);
          return $bStatus;
      }

      # This function retrieves the parent node of the specified node.
      #
      # @param $node Node in the XML tree.
      sub getParentNode {
          my ($node) = @_;
          my $pNode = $node->getParentNode;
          return $pNode;
      }

      # This function retrieves text value from the given path under a specified node.
      # If the path matches to multiple nodes then the text value from the
      # first node will be retrieved.
      #
      # @param $node Node in the XML tree from where the search should start.
      # @param $npath Path to be used to retrieve the text value.
      # Follows XPath format.
      sub getValueFromPath {
          my ($node, $nPath) = @_;
          my $strValue = "";
          my @nodes = $node->findnodes($nPath);
          my $length = @nodes;
          if ($length > 0) {
              my @chlds = $nodes[0]->getChildNodes;
              foreach my $tNode (@chlds) {
                  if ($tNode->getNodeTypeName eq "TEXT_NODE") {
                      $strValue = $tNode->getNodeValue;
                      last;
                  }
              }
          }
          return $strValue;
      }

      # This function retrieves all the node references matching to the given path
      # under a specified node.
      #
      # @param $node Node in the XML tree from where the search should start.
      # @param $npath Path to be used to retrieve the nodes. Follows XPath format.
      sub getNodesFromPath {
          my ($node, $nPath) = @_;
          my @nodes = $node->findnodes($nPath);
          return @nodes;
      }

      # This function returns the first node references matching to the given path
      # under a specified node.
      #
      # @param $node Node in the XML tree from where the search should start.
      # @param $npath Path to be used to retrieve the nodes. Follows XPath format.
      sub getNodeFromPath {
          my ($node, $nPath) = @_;
          my @nodes = $node->findnodes($nPath);
          return $#nodes >= 0 ? $nodes[0] : undef;
      }

      # This function retrieves text values from all the nodes matching to the given
      # path under a specified node.
      #
      # @param $node Node in the XML tree from where the search should start.
      # @param $npath Path to be used to retrieve the text values.
      # Follows XPath format.
      sub getValuesFromPath {
          my ($node, $nPath) = @_;
          my $strValue = "";
          my @values;
          my @nodes = $node->findnodes($nPath);
          my $length = @nodes;
          foreach my $mNode (@nodes) {
              my @chlds = $mNode->getChildNodes;
              foreach my $tNode (@chlds) {
                  if ($tNode->getNodeTypeName eq "TEXT_NODE") {
                      push (@values, $tNode->getNodeValue);
                      last;
                  }
              }
          }
          return @values;
      }

      # This function retrieves the text value under a specified node.
      #
      # @param $node Node in the XML tree.
      sub getNodeValue {
          my ($node) = @_;
          my $strValue = "";
          my @chlds = $node->getChildNodes;
          foreach my $tNode (@chlds) {
              if ($tNode->getNodeTypeName eq "TEXT_NODE") {
                  $strValue = $tNode->getNodeValue;
                  last;
              }
          }
          return $strValue;
      }

      # This function retrieves the node reference whose text matches to
      # the specified text.
      #
      # @param $node Node in the XML tree from where the search should start.
      # @param $npath Path to be used to match the text values.
      # Follows XPath format.
      # @param $strExpValue Expected text value.
      sub getNodeWithValue {
          my ($node, $nPath, $strExpValue) = @_;
          my $strValue = "";
          my $resNode;
          my $bFound = 0;
          my @nodes = $node->findnodes($nPath);
          foreach my $mNode (@nodes) {
              my @chlds = $mNode->getChildNodes;
              foreach my $tNode (@chlds) {
                  if ($tNode->getNodeTypeName eq "TEXT_NODE") {
                      $strValue = $tNode->getNodeValue;
                      if ($strValue eq $strExpValue) {
                          $resNode = $mNode;
                          $bFound = 1;
                      }
                      last;
                  }
              }
              last if ($bFound);
          }
          return $resNode;
      }

      # This function loads the specified XML file and returns the XML tree.
      #
      # @param $filePath Path to the XML file.
      sub loadXMLFile {
          my ($filePath) = @_;
          my $parser = new XML::DOM::Parser;
          my $doc = $parser->parsefile ($filePath);
          return $doc;
      }

      # This function creates a new xml element.
      #
      # @param $node Node to be used for creating XML element.
      # @param $strElmtName Name of the element to be created.
      sub createNewElement {
          my ($node, $strElmtName) = @_;
          return $node->createElement($strElmtName);
      }

      # This function creates a new text node.
      #
      # @param $node Node to be used for creating text node.
      # @param $strText Text to be added to the node.
      sub createTextElement {
          my ($node, $strText) = @_;
          return $node->createTextNode($strText);
      }

      # This function adds a given node to another node.
      #
      # @param $parentNode Node to which the child node is to be assigned.
      # @param $chldNode Node to assigned.
      sub assignChild {
          my ($parentNode, $chldNode) = @_;
          $parentNode->appendChild($chldNode);
      }

      sub createXMLDocument {
          return new XML::DOM::Document;
      }

      sub replaceNode {
          my ($parentNode, $oldChildNode, $newChildNode) = @_;
          $parentNode->replaceChild($newChildNode, $oldChildNode);
      }

      sub cloneNode {
          my ($node) = @_;
          return $node->cloneNode(1);
      }

      sub getChildren {
          my ($node) = @_;
          return $node->getChildNodes;
      }

      sub getNodeName {
          my ($node) = @_;
          return $node->getNodeName;
      }

      sub wrtiteTofile {
          my ($node, $file, $appendFile) = @_;
          my $output;
          if ($appendFile) {
              $output = new IO::File(">>$file");
          } else {
              $output = new IO::File(">$file");
          }
          my $writer = new XML::Writer(OUTPUT => $output, DATA_MODE=>1,DATA_INDENT => 4);
          getXMLNodes($node, $writer);
          $writer->end();
          $output->close();
      }

      sub getXMLNodes {
          my ($node, $writer) = @_;
          my $name = $node->getNodeName;
          if (not ($name eq "#text")) {
              $writer->startTag($name);
              foreach my $chldNode ($node->getChildNodes) {
                  getXMLNodes($chldNode, $writer);
              }
              $writer->endTag($name);
          } else {
              my $val = trim($node->getNodeValue);
              if (not ($val eq "") ){
                  $writer->characters($val);
              }
          }
      }

      sub trim($) {
          my $string = shift;
          $string =~ s/^\s+//;
          $string =~ s/\s+$//;
          return $string;
      }

      1;


      With the above programs, I create huge dummy XML files, which look like the sample below and contain multiple SigData blocks.

      <SigData>
          <KVPair>
              <Key>...</Key>
              <Value>..</Value>
          </KVPair>
      </SigData>
      <SigData>
          <KVPair>
              <Key>....</Key>
              <Value>..</Value>
          </KVPair>
      </SigData>



      I need to read these <Key> and <Value> elements. Please check this program; here my @keys and @values read the <Key> and <Value> contents shown above.

      sub putData {
          my ($event, $BENodeMap, $testID,$eventName, $obj) = @_;
          my $eventID = getValueFromPath($event, "ID");
          my $eventName = uc(getValueFromPath($event, "Name"));
          {
              lock %eventStatus;
              if (exists $eventStatus{$testID."_".$eventID}) {
                  print "Event ID \"$eventID\" is defined more than once in the test case $testID";
                  return 0;
              }
              $eventStatus{$testID."_".$eventID} = EVENT_START;
          }
          my @BENodes = getValuesFromPath($event ,"Node/ID");
          my $port = getValueFromPath($event, "Settings/Port");
          my $sigDataFile = getValueFromPath($event ,"data/file");
          my $sigData = loadXMLFile($sigDataFile);
          my @dataNodes = getNodesFromPath($sigData, "/SigData/KVPair");
          my @FakeFEObjects;
          foreach my $BENode (@BENodes) {
              my $ip = $BENodeMap->{$BENode};
              my $obj1 = new FakeFEClient::FakeFEClient("192.168.106.57",9001,"1","default_encrypt1");
              push @FakeFEObjects, $obj1;
          }
          my @failedKeys;
          my $s_bts= $obj->start_timestamp();
          my @keys = getValuesFromPath($sigData ,"/SigData/KVPair/Key");
          my @values = getValuesFromPath($sigData ,"/SigData/KVPair/Value");
          $obj->beginTransaction("ID:".$eventID."_".$eventName);
          my $idx = 0;
          print " No of Keys: $#keys ";
          foreach my $key (@keys) {
              my $length = length($values[$idx]);
              foreach my $FakeFEObject (@FakeFEObjects) {
                  my $out = $FakeFEObject->putData($key, $length, $values[$idx]);
                  if ($out == 0) {
                      push @failedKeys, $key;
                  }
              }
              $idx++;
          }
          if ($#failedKeys != -1) {
              print "Put failed for keys:".(join(", ", @failedKeys))."\n";
              {
                  lock %eventStatus;
                  $eventStatus{$testID."_".$eventID} = EVENT_END_FAILURE;
              }
              return 0;
          }
          print "\nBackUp Successful ";
          {
              lock %eventStatus;
              $eventStatus{$testID."_".$eventID} = EVENT_END_SUCCESS;
          }
          return 1;
      }


      I need to read these files in another program, and reading them takes a long time. My concern is time.

      So I am looking for other file types that take less time for huge loads. Their contents do not matter. I just need to read the content and send it to my back end. I don't do any manipulation on the content either.



      Is there any other way to communicate with you, such as over IM? My gtalk ID is koti688; my YM ID is koti_dl.

        I am also not sure what you are trying to do, but if you just want to exercise the back end and do not care about the contents you are putting in, perhaps you do not need to create and read large XML files. Would a small routine that just hammers data pairs into the back end and warns of any errors be enough?
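        A minimal sketch of such a routine, assuming the client exposes putData($key, $length, $value) and returns 0 on failure, as in the posted code. StubClient, random_pair and hammer are made-up names for illustration; the stub only stands in for the real FakeFEClient object.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical stand-in for the real client: accepts every put and counts
# the calls. Swap in the real FakeFEClient object for an actual test run.
package StubClient;
sub new     { my ($class) = @_; return bless { puts => 0 }, $class }
sub putData { my ($self, $key, $length, $value) = @_; $self->{puts}++; return 1 }

package main;

# Build one random pair: $bytes of random data, hex-encoded as the value,
# keyed by the SHA-1 of the raw bytes (the same scheme as the posted generator).
sub random_pair {
    my ($bytes) = @_;
    my $raw = join('', map { chr(int(rand(256))) } 1 .. $bytes);
    return (sha1_hex($raw), unpack('H*', $raw));
}

# Push $count random pairs into $client, warning on each failure.
# Returns the list of keys that could not be stored.
sub hammer {
    my ($client, $count, $bytes) = @_;
    my @failed;
    for my $i (1 .. $count) {
        my ($key, $value) = random_pair($bytes);
        unless ($client->putData($key, length($value), $value)) {
            warn "Put failed for key $key (pair $i)\n";
            push @failed, $key;
        }
    }
    return @failed;
}
```

        Called as hammer($client, 100_000, 4096), this never holds more than one pair in memory, so the size of the test is limited only by how long you let it run.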

        Are you trying to test the FakeFEClient that I see in this line? my $obj1 = new FakeFEClient::FakeFEClient("192.168.106.57",9001,"1","default_encrypt1");

        Your code looks like it creates a lot of large data sets. This will take time and possibly use a lot of memory. If you can take your key/value pairs one at a time and do the required work on each, you may find it faster. It may be enough to generate random key/value pairs in a loop and then test the insertion of these into the back end.
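        If the pairs do have to come from a real input file, a rough sketch of the one-at-a-time idea is to read the file in chunks and hand each pair to a callback immediately, skipping the intermediate XML entirely. stream_pairs and the $sink callback are hypothetical names; the key/value scheme (SHA-1 of the chunk, hex-encoded chunk) is copied from the posted generator, and Digest::SHA is used here in place of Digest::SHA1.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Read $path in $chunk_size-byte pieces and pass each (key, value) pair to
# $sink as soon as it is produced, so only one chunk is in memory at a time.
# Returns the number of pairs delivered.
sub stream_pairs {
    my ($path, $chunk_size, $sink) = @_;
    open(my $fh, '<:raw', $path) or die "Unable to open $path: $!";
    my $count = 0;
    while (1) {
        my $len = read($fh, my $data, $chunk_size);
        die "Read error on $path: $!" unless defined $len;
        last if $len == 0;                  # end of file
        my $key   = sha1_hex($data);        # hex SHA-1 of the raw chunk
        my $value = unpack('H*', $data);    # hex-encoded chunk
        $sink->($key, $value);
        $count++;
    }
    close($fh);
    return $count;
}

# Example use (the callback body is where the back-end call would go):
#   stream_pairs($file, $size, sub {
#       my ($key, $value) = @_;
#       $client->putData($key, length($value), $value)
#           or warn "Put failed for key $key\n";
#   });
```

        The callback keeps the file reading separate from the back-end call, so the same loop can feed one client, several clients, or just a counter while you measure read speed.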

        Perhaps you can explain in a few steps what you are trying to do and why you decided to do this by first building a massive XML file.

        There are also two lines missing from the code you have posted:

        use strict; use warnings;

        They will save you a lot of time in the long run.
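        To illustrate why: the posted generator script uses variables such as $len and $data without declaring them, so a misspelled name would quietly become a new, empty variable. A small sketch showing that strict turns such a typo into a compile-time error (strict_error is a made-up helper, and the snippet it compiles is invented for the demonstration):

```perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet under strict and warnings,
# returning the error message, or an empty string if it ran cleanly.
sub strict_error {
    my ($code) = @_;
    my $ok = eval "use strict; use warnings; $code; 1";
    return $ok ? '' : $@;
}

# '$lenght' is a deliberate misspelling of '$length'.
my $err = strict_error('my $length = 5; my $total = $lenght + 1');
print "strict caught: $err" if $err;
```

        Without strict the same snippet would run and treat $lenght as an undefined value, which is the kind of bug that is hard to spot in a long script.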

        Cheers,
        R.

        Pereant, qui ante nos nostra dixerunt!