Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Research ideas

by Wodin (Acolyte)
on Feb 20, 2001 at 03:24 UTC ( #59526=perlquestion: print w/replies, xml ) Need Help??
Wodin has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks, I am met with something of a quandary. In just a few short days, my submission for a undergraduate summer research topic must be completed. Normally, this would not have been a problem, as I had a research project figured out, and was quite jazzed about working on it. However, I discovered yesterday that this work had already been done, and as such was not a valid topic for my proposal.

My question is the following -- is there some Perlish computer science topic which would be interesting and exciting? My former idea, creating a parser generator in <whisper>Python</whisper> similar to Parse::RecDescent was looking really interesting, but since there's already a module out there which implements that functionality, it's unlikely that I would get funding to build another.

Some background: I have an undergraduate junior's worth of theory, am fairly good at practical implementation, and know several different programming languages, of which VHLLs like Perl and Scheme are favorites. I have somewhat of an affinity towards projects dealing with parsing and parsers, but I'm one of those people who enjoys just about any type of computer science.

Any comments or help would be greatly appreciated.

Replies are listed 'Best First'.
Re (tilly) 1: Research ideas
by tilly (Archbishop) on Feb 20, 2001 at 04:49 UTC
    Well I had an idea for how to optimize NFA RE engines that I think has potential, but I am not planning on doing any time soon. OK, it isn't directly Perl, but it does involve parsing and if it works well it would be appropriate for including in the RE engines of languages like Perl and Python. Take a look at this proposed optimization, and see if trying to implement and do basic tests on that interests you.

    If it does, feel free to contact me privately. (Not that I have done anything with it other than some preliminary analysis...)

    For the record the point is that, at the cost of extra work at compilation, it should allow any RE which can be solved by a DFA to be solved by an NFA without the exponential backtracking disasters that are traditionally problems for NFA engines.

      You mean solved by the DFA right? Like egrep?

      Sounds like a good idea :)


        No, I meant solved by an NFA.

        Today NFAs can solve all problems that DFAs can solve, but it is possible to by accident write an NFA that will take several years to finish. My suggestion allows you to convert an NFA into an equivalent NFA which backtracks at fewer points. However it remains an NFA, and where an NFA would find a different match than a DFA, it will find the match that an NFA would find, and not the one that a DFA would. (Thereby allowing it to be used in an NFA engine without changing the behaviour.)

        Now if the RE is one that you could use a DFA to solve, then you can convert the NFA into an NFA with no backtracking anywhere. However it still retains NFA characteristics. For instance with alternation it will prefer to match the first alternative in the sequence, and won't match the second if that allows a longer overall match.

        Unfortunately optimizing the NFA that far may result in a combinatorial explosion from n states to n!, but since the optimization goes in steps, you can just optimize until you hit some threshold, and then stop. (Or you could run, profile, incrementally optimize, wash, rinse, and repeat.)

      Are you sure you know the theory?
        I'm positive that I know the theory, and that my suggestion will work. Figuring things like this out is what I studied math for. :-)
Re: Research ideas
by chromatic (Archbishop) on Feb 20, 2001 at 04:59 UTC
    I'd like to see a module that can take a code reference and dump out a rough approximation of the code that created it.

    It would end up something like Data::Dumper or Storable, but it would allow me to serialize highly complex data structures.

    I guess the tricky bits are closures and code with eval EXPR bits in them. It's still worth some research, in my opinion.

      I believe Damian Conway is working on something like this. Doing incredibly weird things with closures and such that are far beyond my understanding. Definately worth the effort, however.


Re: Research ideas
by Trinary (Pilgrim) on Feb 20, 2001 at 03:34 UTC
    one of my favorites was the transitions from directed graphs interpreted as state machines to regular expressions. I did some translation back and forth in some undergrad theory classes, but never got to deal with doing things like traversal problems while the data is in regexp form. From a pure theory standpoint it offers an interesting way to use the solution of a known set of problems (graph coverage, etc) to be effective in a new space (operating on regexps). I wish I'd explored them further.



A Research idea
by gryng (Hermit) on Feb 20, 2001 at 03:28 UTC

    My slant is towards AI -- so my suggestion is genetic programming. That is, a program that creates programs to solve problems. I'm not sure if this is the kind of thing you are looking for, or allowed to do, but it's very interesting (at least to me :) ).

    Good luck :)


      If "originality" is your goal, lots of experiments with GP have already been done. There should be some room for more, however. Some resources to examine first (that don't necessarily stay within the bounds of Perl) are:

      As an interesting and strange coincidence, I just started reading a book titled The Monk in the Garden by Robin Marantz Henig. If you didn't guess from the title, it's about Gregor Mendel.

      I also find AI very interesting. During an expert systems uni course a while ago we looked at several very 'budget' expert system generation utilities such as 'First Class', 'Level 5' and probably a better one 'Prolog' etc.. they each cost tens of thousands of dollars in most cases, but are very primitive for tackling REAL issues.

      Perhaps expert system development could be an option. There are several AI modules around but I doubt any from what I have seen can provide full expert system generation or perhaps trained and untrained neural network creation, jNeural looks like it might one day - but not yet.

Re: Research ideas
by chorg (Monk) on Feb 20, 2001 at 04:00 UTC
    You like parsers and the like? How about working on the B::CC backend, and making it work? That would be great!!
    "Intelligence is a tool used achieve goals, however goals are not always chosen wisely..."

      I like them, but I don't have a lot of experience with them. A non-grammar based parser for Quake 3 logfiles is the extent of my working parsers so far. Now, I've come a decent way since then, but I'm not the guy to fix B::CC -- I don't know C, for one thing.

      Thanks for the interesting tip though -- reading the POD for it was fun.

        How about a pure perl XML parser? This is from Simon Cozens weekly summary of the p5p list for the week of 2/13 to 2/19. You can find a copy of this at

        Parsing XML

        One of the targets for Perl 5.8 is to speed up XML parsing, but nobody really has any idea how to do that. The current XML::Parser uses an external library, which means that a lot of speed is lost in flapping around in XS. The idea was mentioned of a pure-Perl version, which we would then be able to ship in core. Jarkko says: Okay, I lied. I do have an opinion: relying on an external library to do XML parsing is weird. expat is nice and is a de facto standard, and reinventing the wheel that has already been extensively invented and debugged is silly in the extreme -- but we are, after all, supposed to be The Text Processing Language.

        Matt Sergeant, as ever, had good XML suggestions:

        If you do that, I suggest/recommend at least doing it the Python way - by letting XML experts (i.e. a SIG) discuss what would be the best way to do it. Note also that ActivePerl ships with XML::Parser, though in a few months it may not necessarily be the best option any more.

        He also pointed out that:

        The speed problem is that expat is basically a callback/event based parser, so you have a storm of events crossing the XS/Perl barrier, meaning that you're constantly building SV's. Orchard can get around this by doing the parsing to a tree structure in C. (Note that Orchard is also based on expat). Or it can also do SAX based event passing, but again that's about as slow as XML::Parser.

        Doing it all in Perl is possible, but not entirely trivial to get exactly right. XML has a lot of annoying nuances that were left in from SGML, mostly to do with DTDs. And while I don't use most of the annoying features, I think I'd be upset if the core Perl started shipping an "XML" parser that wasn't fully XML compliant.

        I wonder if the perl-xml mailing list could go chew on that and let us know the best way to proceed.

        End of quote...


Re: Research ideas
by BrentDax (Hermit) on Feb 20, 2001 at 14:07 UTC
    I've always thought it would be interesting to see a language with hybrid Perl/C++ish syntax. For example, you actually have specifiers like class, but you keep Perl things like (roughly) typeless variables and different funny symbols for different ways of organizing data. Maybe you could even make the funny symbols extensible, so you could, for example, define + to mean a tree (so +foo is a tree named foo). The whole thing would be parsed and converted into Perl for execution, so you wouldn't have to re-write everything. I'm probably just insane, but it's an idea...

    --Brent Dax

    @HPAJ=split("", "rekcaH lreP rentonA tsuJ"); print reverse @HPAJ; #sucky but who cares?
Re: Research ideas
by 2501 (Pilgrim) on Feb 20, 2001 at 04:38 UTC
    How about work with natural language?
    Even with all sorts of modules, handling natural language is something huge. You could set a small goal to improve a piece of functionality in that area. For example, search engines. I put in the phrase "Is there a site where I can obtain pearl modules?"
    work on code which through a bit of intelligence and a bit of brute force realizes I typo'd and what I want is computer related PERL modules and not a module for a pearl or a module made of pearl.
    There is so much that can be done with natural languages it can make ones head spin just trying to grok it all. The only problem you might run into is biting off more then you can chew for the scope of the project.

Re: Research ideas
by Wodin (Acolyte) on Apr 08, 2001 at 13:15 UTC

    Well, thank you for all of the nice suggestions -- I corresponded with tilly about his, and wrote up a proposal about it. Sadly, it came to a rather untimely end, as I was denied funding -- I guess they want to refocus the summer research grants on harder sciences this year. Thanks to those of you who put time and effort into helping come up with ideas. It was greatly appreciated, and there's the possibility I'll still tackle some of the stuff sometime in here.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://59526]
Approved by root
[Corion]: I just found out that in $current_top_prior ity_project , the part I am in is not even in the top three worries. That's somewhat bad, not because I'm happy with being a top worry but because that means that I don't even know how bad the rest ...
[Corion]: ... of the situation is :-)
[marto]: it's good to know that things can always get worse :P
[hippo]: Ignorance can be bliss
[Corion]: hippo: Yeah - I'll just avoid the project lead :)
[Corion]: marto: Yeah, it helps with the perspective :-D

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (9)
As of 2017-07-26 08:11 GMT
Find Nodes?
    Voting Booth?
    I came, I saw, I ...

    Results (385 votes). Check out past polls.