I tried putting the forking higher in the loop structure: it does run more processes but it only returns the last combination of the loop.
Well of course it does, because you overwrite $temp_out every time. Seen another way, each child only finishes once, so you can only expect one value out of it. One way to make this work is to have the children return arrays (passed as references) instead of strings.
And by the way, your $temp_out isn't declared anywhere, so under strict your program doesn't compile. Without strict, it is a global(ish) variable, and since every child writes to it, this makes your script a little confusing. You could check whether each child gets its own copy of the variable under ForkManager (it does, since each child is a separate process), but it's better practice to make that explicit. So create a new array for each child:
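for my $outer (1..10) {
    $fork_manager->start() and next;      # parent moves on to the next $outer
    my @output;                           # fresh array, private to this child
    for my $inner (1..10) {
        push @output, "[$outer] [$inner]";
    }
    $fork_manager->finish(0, \@output);   # send the array back to the parent
}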
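For completeness, here is a minimal sketch of how the parent side could collect those arrays, using a run_on_finish callback. The limit of 4 processes, the %results hash and the use of $outer as the process identifier are my own assumptions for illustration, not something from your script.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Assumption: 4 workers is an arbitrary limit; tune it for your machine.
my $fork_manager = Parallel::ForkManager->new(4);

my %results;    # parent-side store: outer value => array ref from that child

# Runs in the parent each time a child exits. The sixth argument is the
# reference the child passed to finish().
$fork_manager->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    $results{$ident} = $data if defined $data;
});

for my $outer (1..10) {
    # $outer is passed as the identifier so the callback knows which
    # child the data came from.
    $fork_manager->start($outer) and next;

    my @output;                           # private to this child
    for my $inner (1..10) {
        push @output, "[$outer] [$inner]";
    }
    $fork_manager->finish(0, \@output);   # child exits, array goes to parent
}
$fork_manager->wait_all_children;

# Print everything in loop order, whatever order the children finished in.
for my $outer (sort { $a <=> $b } keys %results) {
    print "$_\n" for @{ $results{$outer} };
}
```

Keying the results by $outer means the output order no longer depends on which child happens to finish first.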
Edit: also, do think about benchmarking the result. The data sent from child to parent goes through the hard drive (where accesses are slower than CPU operations), so depending on how heavy the work in the inner loop is, using all available cores might not be the best solution, if only because the children all access the same drive, so one may have to wait for another to finish before writing.
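A rough way to check this is to time the same loop with different process limits. The sketch below is only an illustration: time_run is a hypothetical helper, the limits 1, 2, 4 and 8 are arbitrary, and the inner loop here is far too light to be representative, so put your real work in its place before trusting the numbers.

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use Time::HiRes ();

# Hypothetical helper: runs the 10x10 loop with at most $max_procs
# children and returns the elapsed wall-clock time in seconds.
sub time_run {
    my ($max_procs) = @_;
    my $fork_manager = Parallel::ForkManager->new($max_procs);

    my $collected = 0;    # number of strings received by the parent
    $fork_manager->run_on_finish(sub {
        my $data = $_[5];                 # reference passed to finish()
        $collected += @$data if $data;
    });

    my $t0 = Time::HiRes::time();
    for my $outer (1..10) {
        $fork_manager->start and next;
        my @output = map { "[$outer] [$_]" } 1..10;
        $fork_manager->finish(0, \@output);
    }
    $fork_manager->wait_all_children;
    return Time::HiRes::time() - $t0;
}

my %elapsed;    # process limit => seconds
for my $max_procs (1, 2, 4, 8) {
    $elapsed{$max_procs} = time_run($max_procs);
    printf "%d processes: %.3f s\n", $max_procs, $elapsed{$max_procs};
}
```

If the times stop improving (or get worse) as the limit goes up, the overhead of forking and passing the data back is probably dominating the actual work.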