Yeah, they're right that you're probably best off with a parser. However, if you really want to stick with regexes, you can do the job, more slowly and less elegantly, with a line-by-line solution like this (treat it as a starting point and improve it before actual use):
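    use strict;
    use warnings;

    # INPUT is assumed to be an already-opened filehandle on your XML file.
    my $id;    # id of the most recently seen <level1> tag
    while (<INPUT>) {
        # remember the id of each <level1> tag as we pass it
        if (m#<level1 id="([^"]*)"#) {
            $id = $1;
        }
        # give every bare <level2> tag the current level1 id, keeping
        # the indentation and anything else on the line intact
        s#<level2>#<level2 id="$id">#g if defined $id;
        print;
    }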
Your regex doesn't work because, by the time it reaches the second level2 tag under a level1 tag, the id from that level1 tag is no longer captured. This approach works by storing the most recent level1 id in a variable that persists from line to line, then rewriting each level2 tag as it is encountered.
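To see the behaviour end to end, here is a self-contained sketch of the same line-by-line idea run over a small made-up document. The sample XML and the `add_level2_ids` sub are hypothetical, and it reads from a string through an in-memory filehandle instead of `INPUT`, collecting the result rather than printing line by line.

```perl
use strict;
use warnings;

# Hypothetical sample input; your real file's layout may differ.
my $xml = <<'XML';
<root>
  <level1 id="a">
    <level2>first</level2>
    <level2>second</level2>
  </level1>
  <level1 id="b">
    <level2>third</level2>
  </level1>
</root>
XML

# Line-by-line rewrite: read from an in-memory filehandle and
# collect the rewritten lines into a string.
sub add_level2_ids {
    my ($text) = @_;
    open my $in, '<', \$text or die "cannot open in-memory handle: $!";
    my ($id, $out) = (undef, '');
    while (my $line = <$in>) {
        # remember the id of the most recent <level1> tag
        $id = $1 if $line =~ m#<level1 id="([^"]*)"#;
        # copy that id onto every bare <level2> tag on this line
        $line =~ s#<level2>#<level2 id="$id">#g if defined $id;
        $out .= $line;
    }
    close $in;
    return $out;
}

print add_level2_ids($xml);
```

Because the tag is rewritten with a substitution instead of a fresh `print`, indentation and content sharing the line with `<level2>` survive. It still assumes each `<level1 ...>` opening tag sits on one line with `id` as its first attribute, which is exactly the kind of layout dependence a real parser avoids.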
Hays