Help:getting parts of the strings from a file into managable variables

my_perl has asked for the wisdom of the Perl Monks concerning the following question:

Re: getting parts of the strings from a file into managable variables by Roy Johnson (Monsignor) on Nov 11, 2004 at 17:24 UTC
I think this problem is a little advanced for someone "very new" to Perl. Do you understand the basic data structures of Perl (`perldoc perldata`)? Do you have a fair understanding of regular expressions (`perldoc perlretut`)? How about references (`perldoc perlreftut`)? If you have a pretty good grasp of those concepts, you should be able to take a stab at this problem. Generally, it's recommended that you use a module to parse XML-type data files, but a quick and dirty solution might go like this:

    use strict;
    use warnings;

    my %info;
    my $thisuser;
    while (<DATA>) {
        # Skip any line that does not look like <tag>value
        my ($var, $val) = /<([^>]+)>([^<]+)/ or next;
        # Compare case-insensitively: the sample data also spells the tag "USerID"
        if (lc($var) eq 'userid') {
            $thisuser = $val;
        }
        else {
            $info{$thisuser}{$var} = $val;
        }
    }
    use Data::Dumper;
    print Dumper(\%info), "\n";
    __DATA__
    <UserID>46786<UserID>
    <start>2004-10-21TO09:57:25Z</start>
    <dev>Some Text</dev>
    <var1>some string</var1>
    <var2>some string</var2>
    <USerID>57864</UserID>
    <start>2004-10-25TO09:57:25Z</start>
    <dev>Some Text</dev>
    <var1>some string</var1>
    <UserID>46786<UserID>
    <var3>some string</var3>
    <var4>some string</var4>
    <UserID>98766</UserID>
    <start>2004-10-21TO09:57:25Z</start>
    <dev>Some Text</dev>
    <var1>some string</var1>
    <var2>some string</var2>
    <var5>some string</var5>
    <var6>some string</var6>
    <USerID>57864</UserID>
    <var4>some string</var4>
    <var6>some string</var6>

Caution: Contents may have been coded under pressure.
Re: getting parts of the strings from a file into managable variables by jZed (Prior) on Nov 11, 2004 at 17:19 UTC
This format is maddeningly close to XML. I would suggest running a regex to turn it into real XML and then using XML tools to parse it. Try something like `s~<UserID>(\d+)</UserID>~</record><record UserID="$1">~g;` Then lop off the first `</record>` and add an enclosing tag for the entire set and you should have real XML.
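The conversion suggested above can be sketched roughly as follows. This is only an illustration, assuming every `UserID` line is well formed (which the sample data in this thread is not). The helper name `to_xml` and the root element name `records` are made up for the example. Note that the last record also needs a closing `</record>`, which the sketch appends before closing the root element.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn the tag-per-line format
# into well-formed XML, assuming each UserID line reads <UserID>digits</UserID>.
sub to_xml {
    my ($text) = @_;
    # Each UserID line closes the previous record and opens a new one.
    $text =~ s~<UserID>(\d+)</UserID>~</record><record UserID="$1">~g;
    # Lop off the first </record>, which has no record to close.
    $text =~ s~</record>~~;
    # Close the final record and wrap everything in an assumed root element.
    return "<records>$text</record></records>";
}

my $sample = <<'END';
<UserID>46786</UserID>
<var1>some string</var1>
<UserID>57864</UserID>
<var4>some string</var4>
END

print to_xml($sample), "\n";
```

The result could then be handed to whichever XML module you prefer. Repeated UserIDs will show up as separate `record` elements, so merging them is left to the code that reads the XML.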
Re: getting parts of the strings from a file into managable variables by Eimi Metamorphoumai (Deacon) on Nov 11, 2004 at 17:29 UTC
Your data looks a little suspect (lacking a few /, and inconsistent case), but this parses it and builds a nested hash from it.

    my $user;
    my %uservariables;
    while (<DATA>) {
        chomp;
        my ($key, $value) = m{^<([^>]+)>(.*)<}i or die $_;
        $key = lc $key;    # to adjust for case differences
        if ($key eq "userid") {
            $user = $value;
            next;
        }
        $uservariables{$user}->{$key} = $value;
    }
    use Data::Dumper;
    print Dumper(\%uservariables);
    __DATA__
    <UserID>46786<UserID>
    <start>2004-10-21TO09:57:25Z</start>
    <dev>Some Text</dev>
    <var1>some string</var1>
    <var2>some string</var2>
    <USerID>57864</UserID>
    <start>2004-10-25TO09:57:25Z</start>
    <dev>Some Text</dev>
    <var1>some string</var1>
    <UserID>46786<UserID>
    <var3>some string</var3>
    <var4>some string</var4>
    <UserID>98766</UserID>
    <start>2004-10-21TO09:57:25Z</start>
    <dev>Some Text</dev>
    <var1>some string</var1>
    <var2>some string</var2>
    <var5>some string</var5>
    <var6>some string</var6>
    <USerID>57864</UserID>
    <var4>some string</var4>
    <var6>some string</var6>
Re^2: getting parts of the strings from a file into managable variables by my_perl (Initiate) on Nov 12, 2004 at 18:01 UTC
Hi, thanks a lot for the response; it is working great. What do I need to change if I have spaces before the strings, and the number of spaces is not constant? Thanks again :) Aida
Re^3: getting parts of the strings from a file into managable variables by Eimi Metamorphoumai (Deacon) on Nov 12, 2004 at 18:26 UTC
Hmmm, you might try changing the regexp to something like `my ($key, $value) = m{^\s*<([^>]+)>\s*(.*)<}i;` Although I'm starting to agree with the others that a real XML parser might be the way to go.
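As a minimal sketch of whitespace-tolerant matching, assuming `\s*` (zero or more whitespace characters) before the tag and before the value: `parse_line` is a hypothetical helper invented for this example, not something from the thread. It returns an empty list when a line does not match, so the caller decides whether to skip or report it.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one line, tolerating any amount of whitespace
# before the opening tag and before the value.
sub parse_line {
    my ($line) = @_;
    my ($key, $value) = $line =~ m{^\s*<([^>]+)>\s*(.*)<}i
        or return;              # empty list when the line does not match
    return (lc $key, $value);   # lowercase the key to even out case differences
}

for my $line ("<var1>some string</var1>", "   <var2>   padded</var2>") {
    my ($key, $value) = parse_line($line);
    print "$key => '$value'\n";
}
```

Whitespace after the value (just before the closing tag) is not trimmed here; that would need an extra substitution on `$value`.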
Re^2: getting parts of the strings from a file into managable variables by my_perl (Initiate) on Nov 19, 2004 at 17:42 UTC
Hi, how would I access these variables? For each UserID I would like to print only var1 and its value, and var2 and its value. `foreach $key (keys %uservariables) { ??????? }` Thanks
Re^3: getting parts of the strings from a file into managable variables by Eimi Metamorphoumai (Deacon) on Nov 19, 2004 at 17:54 UTC
Something like

    for my $key (keys %uservariables) {
        print "User $key has var1: $uservariables{$key}->{var1}, var2: $uservariables{$key}->{var2}\n";
    }
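In the sample data not every user has every variable (user 57864 has no var2, for instance), so interpolating a missing value would print an empty string and, under `use warnings`, emit an "uninitialized value" warning. A rough sketch of one way to handle that follows. The helper `report_lines`, the `(none)` placeholder, and the small sample hash are all made up for illustration.

```perl
use strict;
use warnings;

# Sample data shaped like the hash of hashes built earlier in the thread
# (keys lowercased); the values here are invented for the example.
my %uservariables = (
    46786 => { var1 => 'some string', var2 => 'other string' },
    57864 => { var1 => 'some string' },    # no var2 for this user
);

# Hypothetical helper: one report line per user, in sorted order,
# with a placeholder for any variable the user lacks.
sub report_lines {
    my ($data, @wanted) = @_;
    my @lines;
    for my $user (sort keys %$data) {
        my @parts;
        for my $name (@wanted) {
            my $value = $data->{$user}{$name};
            push @parts, "$name: " . (defined $value ? $value : '(none)');
        }
        push @lines, "User $user has " . join(', ', @parts);
    }
    return @lines;
}

print "$_\n" for report_lines(\%uservariables, qw(var1 var2));
```

Sorting the keys is optional; it only makes the output order predictable, since `keys` returns hash keys in no particular order.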
Re^4: getting parts of the strings from a file into managable variables by my_perl (Initiate) on Nov 19, 2004 at 19:01 UTC
Re: Help:getting parts of the strings from a file into managable variables by tmoertel (Chaplain) on Nov 11, 2004 at 17:37 UTC
(Update: Noticed that UserIDs could repeat; changed code to merge values for duplicate UserIDs.)

Here's one way of doing it that stores the data as a hash of hashes:

    #!/usr/bin/perl

    use warnings;
    use strict;

    my %user_data;
    my $current_user;

    while (<DATA>) {
        if (my ($elem, $content) = m|^ <([^>]+)> (.*) </\1> |x) {
            $current_user = $content if $elem eq "UserID";
            $user_data{$current_user}{$elem} = $content;
        }
        else {
            print "Bad line $.: $_";
        }
    }

    use Data::Dumper;
    print Dumper(\%user_data);

    # $VAR1 = {
    #     '98766' => {
    #         'var6' => 'some string',
    #         'var1' => 'some string',
    #         'dev' => 'Some Text',
    #         'UserID' => '98766',
    #         'var2' => 'some string',
    #         'var5' => 'some string',
    #         'start' => '2004-10-21TO09:57:25Z'
    #     },
    #     '57864' => {
    #         'var6' => 'some string',
    #         'var1' => 'some string',
    #         'dev' => 'Some Text',
    #         'var4' => 'some string',
    #         'UserID' => '57864',
    #         'start' => '2004-10-25TO09:57:25Z'
    #     },
    #     '46786' => {
    #         'var3' => 'some string',
    #         'var1' => 'some string',
    #         'dev' => 'Some Text',
    #         'var4' => 'some string',
    #         'UserID' => '46786',
    #         'var2' => 'some string',
    #         'start' => '2004-10-21TO09:57:25Z'
    #     }
    # };

    __DATA__
    <UserID>46786</UserID>
    <start>2004-10-21TO09:57:25Z</start>
    <dev>Some Text</dev>
    <var1>some string</var1>
    <var2>some string</var2>
    <UserID>57864</UserID>
    <start>2004-10-25TO09:57:25Z</start>
    <dev>Some Text</dev>
    <var1>some string</var1>
    <UserID>46786</UserID>
    <var3>some string</var3>
    <var4>some string</var4>
    <UserID>98766</UserID>
    <start>2004-10-21TO09:57:25Z</start>
    <dev>Some Text</dev>
    <var1>some string</var1>
    <var2>some string</var2>
    <var5>some string</var5>
    <var6>some string</var6>
    <UserID>57864</UserID>
    <var4>some string</var4>
    <var6>some string</var6>

When parsing files, it's a good idea to detect and report errors. A few of your sample lines, for example, had opening and closing tags that didn't match. I fixed them in my example data, but only after the error-reporting code caught them.

Cheers, Tom
Re^2: Help:getting parts of the strings from a file into managable variables by my_perl (Initiate) on Nov 11, 2004 at 22:41 UTC
Hi, thank you so much for the prompt response :) I cut and pasted your code, and it does not work for me. It reports every line as bad and prints an empty `$VAR1 = {}`. Any idea why? Thanks once more.
Re^3: Help:getting parts of the strings from a file into managable variables by tmoertel (Chaplain) on Nov 11, 2004 at 23:27 UTC
To make my code easier to read, I indent it by four spaces when I quote it. (That way, it doesn't get lost in the surrounding flow of text.) As a result, you'll need to unindent it (or at least the `__DATA__` portion) before running it. Try processing the script through this one-liner to remove the leading four spaces: `perl -i.bak -pe 's/^ {4}//' the-script.pl   # unindent the-script.pl` That ought to do it.

Cheers, Tom
Re^4: Help:getting parts of the strings from a file into managable variables by my_perl (Initiate) on Nov 12, 2004 at 17:53 UTC
Re^5: Help:getting parts of the strings from a file into managable variables by tmoertel (Chaplain) on Nov 12, 2004 at 18:03 UTC

