http://www.perlmonks.org?node_id=995813


in reply to Re^2: dynamic number of threads based on CPU utilization
in thread dynamic number of threads based on CPU utilization

My apologies... I thought that since the procXml sub worked just fine, it would not be relevant to the discussion or potential solution. Within the procXml sub, I simply slurp the file, parse it into a hash, then operate on the hash.

I was under the impression that because I was operating on the file contents in memory (i.e. the hash), it was a mostly CPU-bound process (minus slurping the input file and printing to the output file).
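One way to check that impression would be to time the read phase and the processing phase separately. The following is only a minimal sketch under stated assumptions: the temporary file, the `timed` helper, and the regex-based "processing" step are hypothetical stand-ins, not the real input or the real procXml logic.

```perl
use strict;
use warnings;
use Time::HiRes ();
use File::Temp qw(tempfile);

# Hypothetical helper: time a code block and return
# (elapsed seconds, the block's return value).
sub timed {
    my ($code) = @_;
    my $start  = Time::HiRes::time();
    my $result = $code->();
    return (Time::HiRes::time() - $start, $result);
}

# Hypothetical sample input standing in for one of the real XML files.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "<URL><IP>host-$_</IP></URL>\n" for 1 .. 50_000;
close $fh or die "close failed: $!";

# Phase 1: I/O only (slurp the whole file into memory).
my ($io_secs, $content) = timed(sub {
    open my $in, '<', $path or die "open failed: $!";
    local $/;                       # slurp mode
    my $data = <$in>;
    close $in;
    return $data;
});

# Phase 2: CPU only (stand-in for parsing and building the triples).
my ($cpu_secs, $matches) = timed(sub {
    my $n = () = $content =~ m{<IP>([^<]+)</IP>}g;
    return $n;
});

printf "read: %.4fs, process: %.4fs, records: %d\n",
    $io_secs, $cpu_secs, $matches;
```

If the processing time dominates for the real files, the CPU-bound assumption holds; if the read time dominates, adding threads is less likely to help.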

sub procXml {
    my($inFile)=@_;
    my $triples;
    my (%countries,%avDetails,%avFiles);
    my $fsize = -s $inFile;
    my $fmb=$fsize/1048576;
    print "PROCESSING: $inFile (".sprintf("%.4f",$fmb)." MB)\n";
    my $INFILE;
    open($INFILE,'<',$inFile);
    my $xmlString=read_file($inFile);
    close($INFILE);
    my $xml_converter = XML::Hash->new();
    my $xml_hash;
    eval { $xml_hash = $xml_converter->fromXMLStringtoHash($xmlString); };
    if($@) { print "BAD XML: $inFile\n"; return; }
    $xmlString = undef;
    foreach my $outer (@{$xml_hash->{'cyveillance'}->{'inspected_url'}->{'URL'}}) {
        my $domainName=$outer->{'Domain_Name'}->{'text'};
        my $exploitType=$outer->{'Exploit_Type'}->{'text'};
        my $inspectedTime=$outer->{'InspectedTime'}->{'text'};
        my ($ss,$mm,$hh,$day,$month,$year,$zone) = strptime($inspectedTime);
        $year+=1900;
        $month+=1;
        $month=sprintf("%02d",$month);
        $hh='00' unless defined $hh;
        $mm='00' unless defined $mm;
        $ss='01' unless defined $ss;
        $inspectedTime="$year-$month-$day"."T"."$hh:$mm:$ss";
        my $ip=$outer->{'IP'}->{'text'};
        my $exploitDescription=$outer->{'Exploit_Description'}->{'text'};
        my $hostName=$outer->{'Host_Name'}->{'text'};
        my $referenceUrl=$outer->{'reference_url'};
        $ip=defined $ip?$ip eq ''?undef:$ip=~m/^-$|^unknown$/i?undef:$ip:undef;
        $exploitDescription=defined $exploitDescription?$exploitDescription eq ''?undef:$exploitDescription=~m/^-$|^unknown$/i?undef:$exploitDescription:undef;
        $hostName=defined $hostName?$hostName eq ''?undef:$hostName=~m/^-$|^unknown$/i?undef:$hostName:undef;
        $referenceUrl=defined $referenceUrl?$referenceUrl eq ''?undef:$referenceUrl=~m/^-$|^unknown$/i?undef:$referenceUrl:undef;
        if(ref($outer->{'Binary'}) eq 'ARRAY') {
            foreach my $binary (@{$outer->{'Binary'}}) {
                my $fileName=$binary->{'File_Name'}->{'text'};
                my $fileURL=$binary->{'Binary_Path'}->{'text'};
                my $pestName=$binary->{'Pest_Name'}->{'text'};
                my $md5=$binary->{'Hash'}->{'MD5'}->{'text'};
                my $fileSize=$binary->{'File_Size'}->{'text'};
                $fileName=defined $fileName?$fileName eq ''?undef:$fileName=~m/^-$|^unknown$|^Unidentified Threat$/i?undef:$fileName:undef;
                $fileURL=defined $fileURL?$fileURL eq ''?undef:$fileURL=~m/^-$|^unknown$|^Unidentified Threat$/i?undef:$fileURL:undef;
                $pestName=defined $pestName?$pestName eq ''?undef:$pestName=~m/^-$|^unknown$|^Unidentified Threat$/i?undef:$pestName:undef;
                $pestName=$1 if defined $pestName && $pestName =~ m/Found potentially unwanted program (.*)\./;
                $md5=defined $md5?$md5 eq ''?undef:$md5=~m/^-$|^unknown$|^Unidentified Threat$/i?undef:$md5:undef;
                $fileSize=defined $fileSize?$fileSize eq ''?undef:$fileSize=~m/^-$|^unknown$|^Unidentified Threat$/i?undef:$fileSize=~m/^.[0-9]+$/?$fileSize:undef:undef;
                my $server_domainName=$binary->{'Server_Properties'}->{'Domain_Name'}->{'text'};
                my $server_hostName=$binary->{'Server_Properties'}->{'Host_Name'}->{'text'};
                my $server_ip=$binary->{'Server_Properties'}->{'IP'}->{'text'};
                my $server_ISP=$binary->{'Server_Properties'}->{'ISP_Data'}->{'ISP'}->{'text'};
                my $server_numBinaries=$binary->{'Server_Properties'}->{'ISP_Data'}->{'Number_Hosted_Binaries'}->{'text'};
                my $server_zipCode=$binary->{'Server_Properties'}->{'ISP_Data'}->{'Zip_Code'}->{'text'} if exists $binary->{'Server_Properties'}->{'ISP_Data'}->{'Zip_Code'}->{'text'};
                my $server_city=$binary->{'Server_Properties'}->{'ISP_Data'}->{'City'}->{'text'} if exists $binary->{'Server_Properties'}->{'ISP_Data'}->{'City'}->{'text'};
                my $server_region=$binary->{'Server_Properties'}->{'ISP_Data'}->{'Region'}->{'text'} if exists $binary->{'Server_Properties'}->{'ISP_Data'}->{'Region'}->{'text'};
                my $server_country=$binary->{'Server_Properties'}->{'ISP_Data'}->{'Country'}->{'text'} if exists $binary->{'Server_Properties'}->{'ISP_Data'}->{'Country'}->{'text'};
                my $server_numSitesHosted=$binary->{'Server_Properties'}->{'ISP_Data'}->{'Number_Hosted_Sites'}->{'text'} if exists $binary->{'Server_Properties'}->{'ISP_Data'}->{'Number_Hosted_Sites'}->{'text'};
                my $webServer=$binary->{'Server_Properties'}->{'ISP_Data'}->{'Web_Server_info'}->{'text'};
                $server_domainName=defined $server_domainName?$server_domainName eq ''?undef:$server_domainName=~m/^-$|^unknown$/i?undef:$server_domainName:undef;
                $server_hostName=defined $server_hostName?$server_hostName eq ''?undef:$server_hostName=~m/^-$|^unknown$/i?undef:$server_hostName:undef;
                $server_ip=defined $server_ip?$server_ip eq ''?undef:$server_ip=~m/^-$|^unknown$/i?undef:$server_ip:undef;
                $server_ISP=defined $server_ISP?$server_ISP eq ''?undef:$server_ISP=~m/^-$|^unknown$/i?undef:$server_ISP:undef;
                $server_numBinaries=defined $server_numBinaries?$server_numBinaries eq ''?'1':$server_numBinaries=~m/^-$|^unknown$/i?'1':$server_numBinaries=~m/^.[0-9]+$/?$server_numBinaries:'1':'1';
                $server_zipCode=defined $server_zipCode?$server_zipCode eq ''?undef:$server_zipCode=~m/^-$|^unknown$/i?undef:$server_zipCode:undef;
                $server_city=defined $server_city?$server_city eq ''?undef:$server_city=~m/^-$|^unknown$/i?undef:$server_city:undef;
                $server_region=defined $server_region?$server_region eq ''?undef:$server_region=~m/^-$|^unknown$/i?undef:$server_region:undef;
                $server_country=defined $server_country?$server_country eq ''?undef:$server_country=~m/^-$|^unknown$/i?undef:$server_country:undef;
                $server_numSitesHosted=defined $server_numSitesHosted?$server_numSitesHosted eq ''?'1':$server_numSitesHosted=~m/^-$|^unknown$/i?'1':$server_numSitesHosted=~m/^.[0-9]+$/?$server_numSitesHosted:'1':'1';
                $webServer=defined $webServer?$webServer eq ''?'unknown':$webServer=~m/^-$|^unknown$/i?'unknown':$webServer:'unknown';
                $server_country =~ s/\s/_/g if defined $server_country;
                my (%avDetections,%threatTypes,%classes);
                next if !defined $binary->{'Class'};
                foreach(keys $binary->{'Class'}) {
                    $classes{$_}=1 if $binary->{'Class'}->{$_}->{'text'} == 1;
                }
                foreach(keys $binary->{'Anti-Virus'}) {
                    $avDetections{$_}->{'Signature_Version'}=$binary->{'Anti-Virus'}->{$_}->{'Signature_Version'} unless $binary->{'Anti-Virus'}->{$_}->{'Signature_Version'} eq '';
                    $avDetections{$_}->{'Engine_Version'}=$binary->{'Anti-Virus'}->{$_}->{'Engine_Version'} unless $binary->{'Anti-Virus'}->{$_}->{'Engine_Version'} eq '';
                    $avDetections{$_}->{'Threat_Name'}=$binary->{'Anti-Virus'}->{$_}->{'Threat_Name'} unless $binary->{'Anti-Virus'}->{$_}->{'Threat_Name'} eq '';
                }
                foreach(keys $binary->{'Type'}) {
                    $threatTypes{$_}=1 if $binary->{'Type'}->{$_}->{'text'} == 1;
                }
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasDomainName> <http://cs.org/domain#$domainName> .\n| if defined $domainName;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasExploitType> <http://cs.org/exploitAttempted#$exploitType> .\n| if defined $exploitType;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasExploitDescription> "$exploitDescription" .\n| if defined $exploitDescription;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/DTGstart> "$inspectedTime"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n| if defined $inspectedTime;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasIpAddr> <http://cs.org/ipv4#$ip> .\n| if defined $ip;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasHostName> <http://cs.org/host#$hostName> .\n| if defined $hostName;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasURL> <http://cs.org/url#$referenceUrl> .\n| if defined $referenceUrl;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasFile> <http://cs.org/file#$fileName> .\n| if defined $fileName;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasFileURL> <http://cs.org/url#$fileURL> .\n| if defined $fileURL;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasFileMD5> <http://cs.org/MD5#$md5> .\n| if defined $md5;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasFileSize> "$fileSize"^^<http://www.w3.org/2001/XMLSchema#integer> .\n| if defined $fileSize;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasPestName> <http://cs.org/pest_name#$pestName> .\n| if defined $pestName;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasWebServer> "$webServer" .\n| if defined $webServer;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerDomain> <http://cs.org/domain#$server_domainName> .\n| if defined $server_domainName;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerHostName> <http://cs.org/host#$server_hostName> .\n| if defined $server_hostName;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerIpAddr> <http://cs.org/ipv4#$server_ip> .\n| if defined $server_ip;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerNumSites> "$server_numSitesHosted"^^<http://www.w3.org/2001/XMLSchema#integer> .\n| if defined $server_numSitesHosted;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerNumBinaries> "$server_numBinaries"^^<http://www.w3.org/2001/XMLSchema#integer> .\n| if defined $server_numBinaries;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasISP> "$server_ISP" .\n| if defined $server_ISP;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerZipCode> "$server_zipCode" .\n| if defined $server_zipCode;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerCity> <http://cs.org/city#$server_city> .\n| if defined $server_city;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerRegion> <http://cs.org/city#$server_region> .\n| if defined $server_region;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerCountry> <http://cs.org/country#$server_country> .\n| if defined $server_country;
                $triples .= qq|<http://cs.org/country#$server_country> <http://cs.org/p/hasServerRegion> <http://cs.org/city#$server_region> .\n| unless (!defined $server_country || !defined $server_region) || exists $countries{$server_country}->{'regions'}->{$server_region};
                $triples .= qq|<http://cs.org/region#$server_region> <http://cs.org/p/hasServerCity> <http://cs.org/city#$server_city> .\n| unless (!defined $server_region || !defined $server_city) || exists $countries{$server_country}->{'cities'}->{$server_city};
                $triples .= qq|<http://cs.org/city#$server_city> <http://cs.org/p/hasServerZipCode> <http://cs.org/city#$server_zipCode> .\n| unless (!defined $server_city || !defined $server_zipCode) || exists $countries{$server_country}->{'zipcodes'}->{$server_zipCode};
                $countries{$server_country}->{'regions'}->{$server_region}=1 if defined $server_region && defined $server_country;
                $countries{$server_country}->{'cities'}->{$server_city}=1 if defined $server_city && defined $server_country;
                $countries{$server_country}->{'zipcodes'}->{$server_zipCode}=1 if defined $server_zipCode && defined $server_country;
                $triples .= qq|<http:cs.org/file#$fileName> <http://cs.org/p/detectedAs> <http://cs.org/pest_name#$pestName> .\n| if (defined $fileName && defined $pestName) && (!exists $avFiles{$pestName} || $avFiles{$pestName} ne $pestName);
                $avFiles{$pestName}=$fileName if defined $fileName && defined $pestName;
                foreach(keys %avDetections) {
                    my $sig=$avDetections{$_}->{'Signature_Version'};
                    my $eng=$avDetections{$_}->{'Engine_Version'};
                    my $tn=$avDetections{$_}->{'Threat_Name'};
                    $tn =~ s/\s/_/g if defined $tn;
                    $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/detectedBy> <http://cs.org/AV#$_> .\n|;
                    $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasAvEngineVersion> "$eng" .\n| if defined $eng;
                    $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasAvSigVersion> "$sig" .\n| if defined $sig;
                    $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/detectedAs> <http://cs.org/avThreat_name#$tn> .\n| if defined $tn;
                    $triples .= qq|<http://cs.org/file#$fileName> <http://cs.org/p/detectedBy> <http://cs.org/AV#$_> .\n| unless !defined $fileName || exists $avDetails{$fileName}->{'avDetection'}->{$_};
                    $triples .= qq|<http://cs.org/file#$fileName> <http://cs.org/p/detectedAs> <http://cs.org/avThreat_name#$tn> .\n| unless (!defined $tn || !defined $fileName) || exists $avDetails{$fileName}->{'avThreatName'}->{$tn};
                    $avDetails{$fileName}->{'avDetection'}->{$_}=1 if defined $fileName;
                    $avDetails{$fileName}->{'avThreatName'}->{$tn}=1 if defined $tn && defined $fileName;
                    $avFiles{$tn}=$fileName if defined $fileName && defined $tn;
                }
                foreach(keys %threatTypes) {
                    $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasThreatType> <http://cs.org/threatType#$_> .\n|;
                    $triples .= qq|<http://cs.org/file#$fileName> <http://cs.org/p/hasThreatType> <http://cs.org/threatType#$_> .\n| unless !defined $fileName || exists $avDetails{$fileName}->{'avThreatType'}->{$_};
                    $avDetails{$fileName}->{'avThreatType'}->{$_}=1 if defined $fileName;
                }
                foreach(keys %classes) {
                    $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasThreatClass> <http://cs.org/threatClass#$_> .\n|;
                    $triples .= qq|<http://cs.org/file#$fileName> <http://cs.org/p/hasThreatClass> <http://cs.org/threatClass#$_> .\n| unless !defined $fileName || exists $avDetails{$fileName}->{'avThreatClass'}->{$_};
                    $avDetails{$fileName}->{'avThreatClass'}->{$_}=1 if defined $fileName;
                }
                $similar{$domainName}='domain' if defined $domainName;
                $similar{$hostName}='host' if defined $hostName;
                $similar{$fileName}='file' if defined $fileName;
                $similar{$pestName}='pest_name' if defined $pestName;
                $similar{$server_domainName}='domain' if defined $server_domainName;
                $similar{$server_hostName}='host' if defined $server_hostName;
                $recordCount++;
            }
        } else {
            my $fileName=$outer->{'Binary'}->{'File_Name'}->{'text'};
            my $fileURL=$outer->{'Binary'}->{'Binary_Path'}->{'text'};
            my $pestName=$outer->{'Binary'}->{'Pest_Name'}->{'text'};
            my $md5=$outer->{'Binary'}->{'Hash'}->{'MD5'}->{'text'};
            my $fileSize=$outer->{'Binary'}->{'File_Size'}->{'text'};
            $fileName=defined $fileName?$fileName eq ''?undef:$fileName=~m/^-$|^unknown$|^Unidentified Threat$/i?undef:$fileName:undef;
            $fileURL=defined $fileURL?$fileURL eq ''?undef:$fileURL=~m/^-$|^unknown$|^Unidentified Threat$/i?undef:$fileURL:undef;
            $pestName=defined $pestName?$pestName eq ''?undef:$pestName=~m/^-$|^unknown$|^Unidentified Threat$/i?undef:$pestName:undef;
            $pestName=$1 if defined $pestName && $pestName =~ m/Found potentially unwanted program (.*)\./;
            $md5=defined $md5?$md5 eq ''?undef:$md5=~m/^-$|^unknown$|^Unidentified Threat$/i?undef:$md5:undef;
            $fileSize=defined $fileSize?$fileSize eq ''?undef:$fileSize=~m/^-$|^unknown$|^Unidentified Threat$/i?undef:$fileSize=~m/^.[0-9]+$/?$fileSize:undef:undef;
            my $server_domainName=$outer->{'Binary'}->{'Server_Properties'}->{'Domain_Name'}->{'text'};
            my $server_hostName=$outer->{'Binary'}->{'Server_Properties'}->{'Host_Name'}->{'text'};
            my $server_ip=$outer->{'Binary'}->{'Server_Properties'}->{'IP'}->{'text'};
            my $server_ISP=$outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'ISP'}->{'text'};
            my $server_numBinaries=$outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'Number_Hosted_Binaries'}->{'text'};
            my $server_city=$outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'City'}->{'text'} if exists $outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'City'}->{'text'};
            my $server_country=$outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'Country'}->{'text'} if exists $outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'Country'}->{'text'};
            my $server_zipCode=$outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'Zip_Code'}->{'text'} if exists $outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'Zip_Code'}->{'text'};
            my $server_region=$outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'Region'}->{'text'} if exists $outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'Region'}->{'text'};
            my $server_numSitesHosted=$outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'Number_Hosted_Sites'}->{'text'} if exists $outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'Number_Hosted_Sites'}->{'text'};
            my $webServer=$outer->{'Binary'}->{'Server_Properties'}->{'ISP_Data'}->{'Web_Server_Info'}->{'text'};
            $server_domainName=defined $server_domainName?$server_domainName eq ''?undef:$server_domainName=~m/^-$|^unknown$/i?undef:$server_domainName:undef;
            $server_hostName=defined $server_hostName?$server_hostName eq ''?undef:$server_hostName=~m/^-$|^unknown$/i?undef:$server_hostName:undef;
            $server_ip=defined $server_ip?$server_ip eq ''?undef:$server_ip=~m/^-$|^unknown$/i?undef:$server_ip:undef;
            $server_ISP=defined $server_ISP?$server_ISP eq ''?undef:$server_ISP=~m/^-$|^unknown$/i?undef:$server_ISP:undef;
            $server_numBinaries=defined $server_numBinaries?$server_numBinaries eq ''?'1':$server_numBinaries=~m/^-$|^unknown$/i?'1':$server_numBinaries=~m/^.[0-9]+$/?$server_numBinaries:'1':'1';
            $server_zipCode=defined $server_zipCode?$server_zipCode eq ''?undef:$server_zipCode=~m/^-$|^unknown$/i?undef:$server_zipCode:undef;
            $server_city=defined $server_city?$server_city eq ''?undef:$server_city=~m/^-$|^unknown$/i?undef:$server_city:undef;
            $server_region=defined $server_region?$server_region eq ''?undef:$server_region=~m/^-$|^unknown$/i?undef:$server_region:undef;
            $server_country=defined $server_country?$server_country eq ''?undef:$server_country=~m/^-$|^unknown$/i?undef:$server_country:undef;
            $server_numSitesHosted=defined $server_numSitesHosted?$server_numSitesHosted eq ''?'1':$server_numSitesHosted=~m/^-$|^unknown$/i?'1':$server_numSitesHosted=~m/^.[0-9]+$/?$server_numSitesHosted:'1':'1';
            $webServer=defined $webServer?$webServer eq ''?'unknown':$webServer=~m/^-$|^unknown$/i?'unknown':$webServer:'unknown';
            $server_country =~ s/\s/_/g if defined $server_country;
            my (%avDetections,%threatTypes,%classes);
            next if !defined $outer->{'Binary'}->{'Class'};
            foreach(keys $outer->{'Binary'}->{'Class'}) {
                $classes{$_}=1 if $outer->{'Binary'}->{'Class'}->{$_}->{'text'} == 1;
            }
            foreach(keys $outer->{'Binary'}->{'Anti-Virus'}) {
                $avDetections{$_}->{'Signature_Version'}=$outer->{'Binary'}->{'Anti-Virus'}->{$_}->{'Signature_Version'} unless $outer->{'Binary'}->{'Anti-Virus'}->{$_}->{'Signature_Version'} eq '';
                $avDetections{$_}->{'Engine_Version'}=$outer->{'Binary'}->{'Anti-Virus'}->{$_}->{'Engine_Version'} unless $outer->{'Binary'}->{'Anti-Virus'}->{$_}->{'Engine_Version'} eq '';
                $avDetections{$_}->{'Threat_Name'}=$outer->{'Binary'}->{'Anti-Virus'}->{$_}->{'Threat_Name'} unless $outer->{'Binary'}->{'Anti-Virus'}->{$_}->{'Threat_Name'} eq '';
            }
            foreach(keys $outer->{'Binary'}->{'Type'}) {
                $threatTypes{$_}=1 if $outer->{'Binary'}->{'Type'}->{$_}->{'text'} == 1;
            }
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasDomainName> <http://cs.org/domain#$domainName> .\n| if defined $domainName;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasExploitType> <http://cs.org/exploitAttempted#$exploitType> .\n| if defined $exploitType;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasExploitDescription> "$exploitDescription" .\n| if defined $exploitDescription;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/DTGstart> "$inspectedTime"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n| if defined $inspectedTime;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasIpAddr> <http://cs.org/ipv4#$ip> .\n| if defined $ip;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasHostName> <http://cs.org/host#$hostName> .\n| if defined $hostName;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasURL> <http://cs.org/url#$referenceUrl> .\n| if defined $referenceUrl;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasFile> <http://cs.org/file#$fileName> .\n| if defined $fileName;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasFileURL> <http://cs.org/url#$fileURL> .\n| if defined $fileURL;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasFileMD5> <http://cs.org/MD5#$md5> .\n| if defined $md5;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasFileSize> "$fileSize"^^<http://www.w3.org/2001/XMLSchema#integer> .\n| if defined $fileSize;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasPestName> <http://cs.org/pest_name#$pestName> .\n| if defined $pestName;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasWebServer> "$webServer" .\n| if defined $webServer;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerDomain> <http://cs.org/domain#$server_domainName> .\n| if defined $server_domainName;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerHostName> <http://cs.org/host#$server_hostName> .\n| if defined $server_hostName;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerIpAddr> <http://cs.org/ipv4#$server_ip> .\n| if defined $server_ip;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerNumSites> "$server_numSitesHosted"^^<http://www.w3.org/2001/XMLSchema#integer> .\n| if defined $server_numSitesHosted;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerNumBinaries> "$server_numBinaries"^^<http://www.w3.org/2001/XMLSchema#integer> .\n| if defined $server_numBinaries;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasISP> "$server_ISP" .\n| if defined $server_ISP;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerZipCode> "$server_zipCode" .\n| if defined $server_zipCode;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerCity> <http://cs.org/city#$server_city> .\n| if defined $server_city;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerRegion> <http://cs.org/city#$server_region> .\n| if defined $server_region;
            $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasServerCountry> <http://cs.org/country#$server_country> .\n| if defined $server_country;
            $triples .= qq|<http://cs.org/country#$server_country> <http://cs.org/p/hasServerRegion> <http://cs.org/city#$server_region> .\n| unless (!defined $server_country || !defined $server_region) || exists $countries{$server_country}->{'regions'}->{$server_region};
            $triples .= qq|<http://cs.org/region#$server_region> <http://cs.org/p/hasServerCity> <http://cs.org/city#$server_city> .\n| unless (!defined $server_region || !defined $server_city) || exists $countries{$server_country}->{'cities'}->{$server_city};
            $triples .= qq|<http://cs.org/city#$server_city> <http://cs.org/p/hasServerZipCode> <http://cs.org/city#$server_zipCode> .\n| unless (!defined $server_city || !defined $server_zipCode) || exists $countries{$server_country}->{'zipcodes'}->{$server_zipCode};
            $countries{$server_country}->{'regions'}->{$server_region}=1 if defined $server_region && defined $server_country;
            $countries{$server_country}->{'cities'}->{$server_city}=1 if defined $server_city && defined $server_country;
            $countries{$server_country}->{'zipcodes'}->{$server_zipCode}=1 if defined $server_zipCode && defined $server_country;
            $triples .= qq|<http:cs.org/file#$fileName> <http://cs.org/p/detectedAs> <http://cs.org/pest_name#$pestName> .\n| if (defined $fileName && defined $pestName) && (!exists $avFiles{$pestName} || $avFiles{$pestName} ne $pestName);
            $avFiles{$pestName}=$fileName if defined $fileName && defined $pestName;
            foreach(keys %avDetections) {
                my $sig=$avDetections{$_}->{'Signature_Version'};
                my $eng=$avDetections{$_}->{'Engine_Version'};
                my $tn=$avDetections{$_}->{'Threat_Name'};
                $tn =~ s/\s/_/g if defined $tn;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/detectedBy> <http://cs.org/AV#$_> .\n|;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasAvEngineVersion> "$eng" .\n| if defined $eng;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasAvSigVersion> "$sig" .\n| if defined $sig;
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/detectedAs> <http://cs.org/avThreat_name#$tn> .\n| if defined $tn;
                $triples .= qq|<http://cs.org/file#$fileName> <http://cs.org/p/detectedBy> <http://cs.org/AV#$_> .\n| unless !defined $fileName || exists $avDetails{$fileName}->{'avDetection'}->{$_};
                $triples .= qq|<http://cs.org/file#$fileName> <http://cs.org/p/detectedAs> <http://cs.org/avThreat_name#$tn> .\n| unless (!defined $tn || !defined $fileName) || exists $avDetails{$fileName}->{'avThreatName'}->{$tn};
                $avDetails{$fileName}->{'avDetection'}->{$_}=1 if defined $fileName;
                $avDetails{$fileName}->{'avThreatName'}->{$tn}=1 if defined $tn && defined $fileName;
                $avFiles{$tn}=$fileName if defined $fileName && defined $tn;
            }
            foreach(keys %threatTypes) {
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasThreatType> <http://cs.org/threatType#$_> .\n|;
                $triples .= qq|<http://cs.org/file#$fileName> <http://cs.org/p/hasThreatType> <http://cs.org/threatType#$_> .\n| unless !defined $fileName || exists $avDetails{$fileName}->{'avThreatType'}->{$_};
                $avDetails{$fileName}->{'avThreatType'}->{$_}=1 if defined $fileName;
            }
            foreach(keys %classes) {
                $triples .= qq|<http://cs.org/record#$recordCount> <http://cs.org/p/hasThreatClass> <http://cs.org/threatClass#$_> .\n|;
                $triples .= qq|<http://cs.org/file#$fileName> <http://cs.org/p/hasThreatClass> <http://cs.org/threatClass#$_> .\n| unless !defined $fileName || exists $avDetails{$fileName}->{'avThreatClass'}->{$_};
                $avDetails{$fileName}->{'avThreatClass'}->{$_}=1 if defined $fileName;
            }
            $similar{$domainName}='domain' if defined $domainName;
            $similar{$hostName}='host' if defined $hostName;
            $similar{$fileName}='file' if defined $fileName;
            $similar{$pestName}='pest_name' if defined $pestName;
            $similar{$server_domainName}='domain' if defined $server_domainName;
            $similar{$server_hostName}='host' if defined $server_hostName;
            $recordCount++;
        }
    }
    $xml_converter = undef;
    print "FINISHED: $inFile\n";
    return($triples);
}

Re^4: dynamic number of threads based on CPU utilization
by BrowserUk (Patriarch) on Sep 26, 2012 at 16:42 UTC
    I thought that since the procXml sub worked just fine, it would not be relevant to the discussion or potential solution.

    You were mostly right. The only relevance it has is that nowhere in that code do I see any sign of locking (the keyword 'lock' does not appear), which means that multiple threads are writing to a shared hash and there is nothing to prevent them from corrupting data through collisions.

    You may 'get away with it', but I wouldn't want to be responsible for when things go wrong.
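    For illustration only, here is a minimal sketch of what explicit locking could look like with threads::shared. It assumes that %similar and $recordCount are the variables shared between threads; their declarations are not shown in the posted sub, so the declarations, the helper subs (record_similar, next_record, run_workers) and the worker loop below are all hypothetical. It also requires a perl built with ithreads.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical stand-ins for the globals that procXml updates.
my %similar :shared;
my $recordCount :shared = 0;

# Store one entry while holding the hash's lock; the lock is
# released automatically when the enclosing scope ends.
sub record_similar {
    my ($name, $kind) = @_;
    lock(%similar);
    $similar{$name} = $kind;
    return;
}

# Increment the shared counter under a lock so that the
# read-modify-write cannot interleave with another thread's.
sub next_record {
    lock($recordCount);
    return ++$recordCount;
}

# Hypothetical driver: start $nthreads workers that each
# record $per_thread items, then wait for all of them.
sub run_workers {
    my ($nthreads, $per_thread) = @_;
    my @workers = map {
        my $id = $_;
        threads->create(sub {
            for my $i (1 .. $per_thread) {
                next_record();
                record_similar("host$id-$i", 'host');
            }
        });
    } 1 .. $nthreads;
    $_->join for @workers;
    return ($recordCount, scalar keys %similar);
}

my ($count, $keys) = run_workers(4, 1000);
print "records: $count, distinct keys: $keys\n";
```

    With the locks in place, every increment is counted. The cost is that threads queue up on the locks, so the less each thread touches the shared data, the better.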


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong