I think this problem is a little advanced for someone "very new" to Perl. Do you understand Perl's basic data structures (perldoc perldata)? Do you have a fair understanding of regular expressions (perldoc perlretut)? How about references (perldoc perlreftut)? If you have a good grasp of those concepts, you should be able to take a stab at this problem.
Generally, it's recommended that you use a module to parse XML-type data files, but a quick-and-dirty solution might go like this:
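use strict;
use warnings;
use Data::Dumper;

my %info;        # user ID => { tag name => value }
my $thisuser;    # the user whose fields are currently being read
while (<DATA>) {
    # Capture the tag name and the text up to the next '<'; skip lines that don't match
    my ($var, $val) = /<([^>]+)>([^<]+)/ or next;
    if (lc $var eq 'userid') {    # lc() copes with the <USerID> typos
        $thisuser = $val;
    }
    else {
        $info{$thisuser}{$var} = $val;
    }
}
print Dumper(\%info), "\n";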
__DATA__
<UserID>46786<UserID>
<start>2004-10-21TO09:57:25Z</start>
<dev>Some Text</dev>
<var1>some string</var1>
<var2>some string</var2>
<USerID>57864</UserID>
<start>2004-10-25TO09:57:25Z</start>
<dev>Some Text</dev>
<var1>some string</var1>
<UserID>46786<UserID>
<var3>some string</var3>
<var4>some string</var4>
<UserID>98766</UserID>
<start>2004-10-21TO09:57:25Z</start>
<dev>Some Text</dev>
<var1>some string</var1>
<var2>some string</var2>
<var5>some string</var5>
<var6>some string</var6>
<USerID>57864</UserID>
<var4>some string</var4>
<var6>some string</var6>
Caution: Contents may have been coded under pressure.
This format is maddeningly close to XML. I would suggest running a regex to turn it into real XML and then using XML tools to parse it. Try something like s~<UserID>(\d+)</UserID>~</record><record UserID="$1">~g; (this assumes the UserID tags are properly closed, which not all of yours are). Then lop off the first </record> and add an enclosing tag around the entire set, and you should have real XML.
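A minimal sketch of that conversion, assuming every line already has matching opening and closing tags (as in the corrected sample further down the thread; the original sample does not). The inline sample and the root element name users are hypothetical. The sketch stops at building the XML string; handing that string to an XML parsing module would be the next step.

```perl
use strict;
use warnings;

# Hypothetical sample, assumed well-formed line by line (matching tags).
my $xml = <<'END_DATA';
<UserID>46786</UserID>
<start>2004-10-21TO09:57:25Z</start>
<var1>some string</var1>
<UserID>57864</UserID>
<var4>some string</var4>
END_DATA

# Each <UserID> line closes the previous record and opens a new one.
$xml =~ s~<UserID>(\d+)</UserID>~</record><record UserID="$1">~g;

# Lop off the first </record>: there is no earlier record for it to close.
$xml =~ s~^</record>~~;

# Close the last record and wrap everything in a (hypothetical) root element.
$xml = "<users>\n$xml</record>\n</users>\n";

print $xml;
```

Each record then carries its user ID as an attribute, and the variables that followed it become child elements.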
Your data looks a little suspect (a few closing tags are missing their /, and the case is inconsistent), but this parses it and builds a nested hash from it.
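use strict;
use warnings;
use Data::Dumper;

my $user;             # current user ID
my %uservariables;    # user ID => { lowercased tag => value }
while (<DATA>) {
    chomp;
    # Tag name, then everything up to the last '<' on the line
    my ($key, $value) = m{^<([^>]+)>(.*)<}i or die "Can't parse line $.: $_\n";
    $key = lc $key;    # to adjust for case differences
    if ($key eq "userid") {
        $user = $value;
        next;
    }
    $uservariables{$user}->{$key} = $value;
}
print Dumper(\%uservariables);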
__DATA__
<UserID>46786<UserID>
<start>2004-10-21TO09:57:25Z</start>
<dev>Some Text</dev>
<var1>some string</var1>
<var2>some string</var2>
<USerID>57864</UserID>
<start>2004-10-25TO09:57:25Z</start>
<dev>Some Text</dev>
<var1>some string</var1>
<UserID>46786<UserID>
<var3>some string</var3>
<var4>some string</var4>
<UserID>98766</UserID>
<start>2004-10-21TO09:57:25Z</start>
<dev>Some Text</dev>
<var1>some string</var1>
<var2>some string</var2>
<var5>some string</var5>
<var6>some string</var6>
<USerID>57864</UserID>
<var4>some string</var4>
<var6>some string</var6>
Hi, thanks a lot for the response; it is working great.
What do I need to change if there are spaces before the strings and the number of spaces is not constant?
Thanks again :)
Aida
Hmmm, you might try changing the regexp to something like
my ($key, $value) = m{^\s*<([^>]+)>\s*(.*)<}i;
Although I'm starting to agree with the others that a real XML parser might be the way to go.
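To check the suggested pattern, here is a small sketch run against a few hypothetical indented lines; the @lines array and %parsed hash are illustrative only. The leading ^\s* absorbs any amount of indentation, and the \s* after the tag trims spaces at the start of the value.

```perl
use strict;
use warnings;

# Hypothetical indented input: the amount of leading whitespace varies by line.
my @lines = (
    "<UserID>46786</UserID>\n",
    "    <start>2004-10-21TO09:57:25Z</start>\n",
    "\t  <var1>   some string</var1>\n",
);

my %parsed;
for my $line (@lines) {
    # ^\s* skips the indentation before the tag; the \s* after the tag
    # trims spaces at the start of the value.
    my ($key, $value) = $line =~ m{^\s*<([^>]+)>\s*(.*)<} or next;
    $parsed{lc $key} = $value;
}

print "$_ => $parsed{$_}\n" for sort keys %parsed;
```

Trailing spaces just before the closing tag would still end up in the value, since (.*) runs up to the last '<'.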
Hi,
how would I access these variables?
For each user ID, I would like to print only var1 and its value, and var2 and its value.
foreach $key (keys %uservariables) {
???????
}
Thanks
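# Sorted so the output order is stable; // '(none)' avoids "uninitialized"
# warnings for users that lack var1 or var2 (needs Perl 5.10 or later)
for my $key (sort keys %uservariables) {
    print "User $key has var1: ", $uservariables{$key}->{var1} // '(none)',
          ", var2: ", $uservariables{$key}->{var2} // '(none)', "\n";
}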
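Because each value in %uservariables is a reference to another hash, the same idea extends to printing every variable a user has. A sketch with a small hypothetical hash standing in for the parsed data (the @report array is also just for illustration):

```perl
use strict;
use warnings;

# Hypothetical data shaped like the %uservariables hash built above.
my %uservariables = (
    46786 => { start => '2004-10-21TO09:57:25Z', var1 => 'some string', var3 => 'x' },
    57864 => { start => '2004-10-25TO09:57:25Z', var4 => 'y' },
);

my @report;
for my $user (sort keys %uservariables) {
    # $uservariables{$user} is a hash reference; %$vars dereferences it.
    my $vars = $uservariables{$user};
    for my $name (sort keys %$vars) {
        push @report, "$user $name=$vars->{$name}";
    }
    # exists checks for the key, so a missing var1 is reported
    # instead of printing undef (which would warn under use warnings).
    push @report, "$user has no var1" unless exists $vars->{var1};
}
print "$_\n" for @report;
```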
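#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my %user_data;       # user ID => { element name => content }
my $current_user;    # set by each <UserID> line
while (<DATA>) {
    # \1 requires the closing tag to repeat the opening tag exactly,
    # so mismatched pairs such as <USerID>...</UserID> get reported.
    if (my ($elem, $content) = m|^ <([^>]+)> (.*) </\1> |x) {
        $current_user = $content if $elem eq "UserID";
        $user_data{$current_user}{$elem} = $content;
    }
    else {
        print "Bad line $.: $_";
    }
}
print Dumper(\%user_data);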
# $VAR1 = {
# '98766' => {
# 'var6' => 'some string',
# 'var1' => 'some string',
# 'dev' => 'Some Text',
# 'UserID' => '98766',
# 'var2' => 'some string',
# 'var5' => 'some string',
# 'start' => '2004-10-21TO09:57:25Z'
# },
# '57864' => {
# 'var6' => 'some string',
# 'var1' => 'some string',
# 'dev' => 'Some Text',
# 'var4' => 'some string',
# 'UserID' => '57864',
# 'start' => '2004-10-25TO09:57:25Z'
# },
# '46786' => {
# 'var3' => 'some string',
# 'var1' => 'some string',
# 'dev' => 'Some Text',
# 'var4' => 'some string',
# 'UserID' => '46786',
# 'var2' => 'some string',
# 'start' => '2004-10-21TO09:57:25Z'
# }
# };
__DATA__
<UserID>46786</UserID>
<start>2004-10-21TO09:57:25Z</start>
<dev>Some Text</dev>
<var1>some string</var1>
<var2>some string</var2>
<UserID>57864</UserID>
<start>2004-10-25TO09:57:25Z</start>
<dev>Some Text</dev>
<var1>some string</var1>
<UserID>46786</UserID>
<var3>some string</var3>
<var4>some string</var4>
<UserID>98766</UserID>
<start>2004-10-21TO09:57:25Z</start>
<dev>Some Text</dev>
<var1>some string</var1>
<var2>some string</var2>
<var5>some string</var5>
<var6>some string</var6>
<UserID>57864</UserID>
<var4>some string</var4>
<var6>some string</var6>
When parsing files, it's a good idea to detect and report errors. A
few of your sample lines, for example, had opening and closing tags
that didn't match. I fixed them in my example data, but only after the
error-reporting code caught them.
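Tom's loop prints each bad line as it goes. A minimal sketch of a variant that collects the problems, reports them together, and refuses to file data under an undefined user might look like this; the sample input, the @errors array, and the message wording are hypothetical, and the matching pattern is Tom's unchanged.

```perl
use strict;
use warnings;

# Hypothetical input: two of the mismatched lines from the original post.
my $text = <<'END';
<UserID>46786</UserID>
<start>2004-10-21TO09:57:25Z</start>
<USerID>57864</UserID>
<var1>some string<var1>
END

# An in-memory filehandle keeps the sketch self-contained; $. still counts lines.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my (%user_data, $current_user, @errors);
while (my $line = <$fh>) {
    chomp $line;
    # Tom's pattern: \1 insists the closing tag repeats the opening tag.
    if (my ($elem, $content) = $line =~ m|^ <([^>]+)> (.*) </\1> |x) {
        $current_user = $content if $elem eq 'UserID';
        if (defined $current_user) {
            $user_data{$current_user}{$elem} = $content;
        }
        else {
            # Data before the first UserID line has no owner; report it.
            push @errors, "line $.: <$elem> appears before any UserID";
        }
    }
    else {
        push @errors, "line $.: mismatched or malformed tags: $line";
    }
}
close $fh;

# Report every problem together instead of interleaving them with the output.
warn "$_\n" for @errors;
printf "%d bad line(s)\n", scalar @errors;
```

Note that after the bad UserID on line 3, $current_user is still 46786, so any valid lines that followed would be filed under the wrong user; that is exactly why the errors are worth surfacing rather than skipping silently.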
Cheers, Tom
Hi,
Thank you so much for the prompt response :)
I copied and pasted your code, and it does not work for me. It reports every line as bad, and the Dumper output is just an empty VAR1{}.
Any idea why?
Thanks once more.
perl -i.bak -pe's/^ //' the-script.pl # unindent the-script.pl
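That ought to do it. The likely cause is that the copy left a leading space on every line, so the ^ < anchor never matches the __DATA__ lines; the one-liner strips one leading space per line and keeps the original as the-script.pl.bak.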
Cheers, Tom