
The patterns (...) must have the second part ((?:\.\w+)? resp. (?:\.\d+)?)

That's easy, just add the 'second part':

my $pat1 = qr '(\d-\d\w{2}(?:\.\w+)?)';
my $pat2 = qr '([A-Z]\d{2}(?:\.\d+)?)';
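As a quick sanity check, here is a minimal sketch that runs both patterns against a few strings with and without the 'second part'. The sample strings are made up for illustration; I'm only guessing at what your real data looks like.

```perl
use strict;
use warnings;

# The two patterns from above, each with its optional 'second part'.
my $pat1 = qr '(\d-\d\w{2}(?:\.\w+)?)';
my $pat2 = qr '([A-Z]\d{2}(?:\.\d+)?)';

# Hypothetical sample strings, not taken from the original data.
my @samples = ('1-2ab', '1-2ab.xyz', 'A12', 'A12.34');

my @found;
for my $s (@samples) {
    # Only one alternative can match, so take whichever group is defined.
    if ($s =~ /^(?:$pat1|$pat2)$/) {
        push @found, defined $1 ? $1 : $2;
    }
}
print "$_\n" for @found;
```

Since the '(?:...)?' part is optional, each pattern should accept both the short and the long form of its code.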

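Concerning grep length, its purpose is simply to filter out empty (and undef) items from the list created by splitting the line on the /$pat1|$pat2/ regex. Both patterns contain capturing parentheses, so split returns the captured text as well, and whichever group did not take part in a given match comes back as undef.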
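To see what the filter actually removes, here is a small sketch on a made-up input line (an assumption on my part, not your data). It keeps the raw split result alongside the filtered one.

```perl
use strict;
use warnings;

my $pat1 = qr '(\d-\d\w{2}(?:\.\w+)?)';
my $pat2 = qr '([A-Z]\d{2}(?:\.\d+)?)';

# Hypothetical input line, made up for illustration.
my $line = '1-2ab.x foo A12.3 bar';

# Raw result: a leading empty field (the line starts with a match),
# plus an undef for whichever capture group did not match each time.
my @raw = split /$pat1|$pat2/, $line;

# Same effect as 'grep length'; 'defined' just makes the undef case explicit.
my @items = grep { defined && length } @raw;

print scalar(@raw), " raw items, ", scalar(@items), " kept\n";
print "[$_]\n" for @items;
```

The surviving items are the two codes and the text between them, in their original order.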

hth, dave

Update: Concerning the title line, you say that it 'can be distinguished as it has parenthesis part at the end only'. My tentative regex provides for this. But you also say 'There are some "trash lines" with parenthesis'. Well, what if one of these "trash lines" actually ends with a parenthesised item, e.g.:

Rabbit rabbit rabbit (rabbit!)

What I meant by 'Better regex' was something to replace the second .+ by something that matches the ID code(?) of your titles. The only two examples you give are B23-9 and A12-3, so perhaps /[A-Z]\d{2}-\d/ would work. Otherwise, adjust accordingly.
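Something along these lines, perhaps. This is only a sketch: I'm assuming the ID code sits inside the trailing parentheses, and the two title lines are invented around your example IDs.

```perl
use strict;
use warnings;

# Assumed shape of a title line: some text, then the ID code
# in parentheses at the very end of the line.
my $title_re = qr/^.+\([A-Z]\d{2}-\d\)\s*$/;

my @lines = (
    'Some title (B23-9)',              # hypothetical title line
    'Another title (A12-3)',           # hypothetical title line
    'Rabbit rabbit rabbit (rabbit!)',  # trash line ending in parentheses
);

# Keep only the lines whose trailing parentheses hold an ID code.
my @titles = grep { /$title_re/ } @lines;
print "$_\n" for @titles;
```

With the stricter pattern the rabbit line is no longer mistaken for a title, which a bare .+ inside the parentheses would allow.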

In reply to Re^3: Insert newline by Not_a_Number
in thread Insert newline by Anonymous Monk
