Re^2: Regex Extraction Help
by Flexx (Pilgrim)
on Aug 09, 2012 at 17:02 UTC
Well, I guess all the fields are variable, and what invaderzard meant was to get that second field.
So I'd suggest this:
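Pieced together from the explanation that follows, the pattern would look something like this. Take it as a sketch: the sample content of $line and the variable name $second are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample record; the real $line comes from your input.
my $line = 'alpha;  second field  ;gamma;delta';

my $second;
if ( $line =~ /^[^;]*;\s*([^;]*?)\s*;/ ) {
    $second = $1;    # second field, surrounding whitespace left out
}

print defined $second ? "[$second]\n" : "no match\n";
```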
Now, what am I doing here?
First I say: Let's start at the beginning (^). This is important, since we can't exclude the possibility that the pattern repeats in one instance of $line.
Next, I say: give me zero or more non-semicolon characters ([^;]*), followed by exactly one semicolon (;).
Now our "cursor" would be in the second field, so to speak. We say, well, there might or might not be some leading space (\s*). Then comes the data we want, that's why we use parentheses to capture it. What do we wanna capture? Well, again, anything not a semicolon ([^;]*?), but this time non-greedily (using the *? quantifier). That's because we want any trailing space to go into the \s* that follows, instead of it being captured. Lastly, we require that the field is terminated by exactly one semicolon (;).
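To see what the non-greedy quantifier buys us, here's a small comparison. Again just a sketch with a made-up $line; the names $lazy and $greedy are only for illustration.

```perl
use strict;
use warnings;

my $line = 'alpha;  second field  ;gamma';    # made-up sample

# Non-greedy capture: the trailing spaces fall into the \s* after the group.
my ($lazy) = $line =~ /^[^;]*;\s*([^;]*?)\s*;/;

# Greedy capture: [^;]* grabs the trailing spaces first, leaving \s* empty.
my ($greedy) = $line =~ /^[^;]*;\s*([^;]*)\s*;/;

print "lazy:   [$lazy]\n";      # [second field]
print "greedy: [$greedy]\n";    # [second field  ]
```

Both patterns match; the only difference is whether the trailing whitespace ends up inside $1.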
If you want to capture other fields as well, then a solution using split, as has been suggested below, is a more efficient way of doing it. If you want just a few fields of a long CSV record (which this seems to be, only delimited by semicolons instead of commas), then you could also expand on the regexp above, which might be a bit more performant than split. But I didn't really check that with benchmarks. It's just an inkling I have, and it depends very much on the length of the input and the number of fields in it.
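For comparison, here's roughly what the two routes could look like side by side. This is an unbenchmarked sketch; the sample $line and the names @fields, $f2 and $f4 are assumptions for illustration only.

```perl
use strict;
use warnings;

my $line = 'alpha;  second field  ;gamma ; delta ;epsilon';    # made-up sample

# Route 1: split on semicolons, swallowing the whitespace around each one.
# The limit of -1 keeps trailing empty fields, which split drops by default.
my @fields = split /\s*;\s*/, $line, -1;
print "split: [$fields[1]] [$fields[3]]\n";

# Route 2: expand the regexp to pick out fields two and four,
# skipping the third with an uncaptured [^;]*; in between.
my ( $f2, $f4 ) =
    $line =~ /^[^;]*;\s*([^;]*?)\s*;[^;]*;\s*([^;]*?)\s*;/;
print "regex: [$f2] [$f4]\n";
```

Note that the regexp route, as written, needs a semicolon after the fourth field, so it won't match when that field is the last one on the line.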