Hi

Source filters are notorious because only Perl can parse Perl, so attempts to manipulate embedded Perl code are doomed.

BUT

Did anyone ever attempt to parse and translate a different language with an easily defined, consistently specified syntax?

Like JS, LISP, or a pedantic Perl sub-dialect?
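A minimal sketch of the idea, with made-up names: the heart of such a source filter is just a rewriting pass over the source text, whose output is then handed to Perl's eval. Regexes stand in for a real tokenizer here, which is exactly the shortcut a serious translator could not take.

```perl
#!/usr/bin/perl
# Toy translation pass: rewrite a JS-flavoured snippet into Perl, then
# eval it.  A sketch only -- regexes cannot parse a real language, and
# the names here are invented for illustration.
use strict;
use warnings;

sub jsish_to_perl {
    my ($src) = @_;
    $src =~ s/\bvar\b/my/g;              # var          -> my
    $src =~ s/\bfunction\s*\(\)/sub/g;   # function ()  -> sub
    return $src;
}

my $js     = 'var $add = function () { return 2 + 3 }; $add->();';
my $result = eval jsish_to_perl($js);
print "$result\n";    # 5
```

Wrapped in something like Filter::Simple, the same rewriting function would run transparently over the source of any module that uses the filter.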

Cheers Rolf

( addicted to the Perl Programming Language)

(Or Perl 6, Ruby, or Python, provided they have such a pedantic syntax?)

updates

  • maybe better forget I mentioned Ruby -> how-to-parse-ruby

  • corrected typo hdb++
  • Comment on sourcefilter with complete parser?
    Re: sourcefilter with complete parser?
    by roboticus (Canon) on Dec 10, 2013 at 20:10 UTC
        Hey Roboticus,

        Talking about translation

        Thanks for your reply, and I know perfectly well what you are talking about. Little details can produce an unmanageable overhead.

        I spent much time meditating over JS, and there are indeed many problems that are not easily solved.

        BUT I learned not only a lot about JS but also about Perl.

        That said, I have a personal toy project patching B::Deparse to generate JS and eLisp from a limited Perl dialect. Avoiding the mental overhead of constantly switching between different keywords and syntax already pays off for me (like sub {} <-> function () {...} for lambdas, or my <-> var for lexicals).
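        For readers who have not played with it: unmodified B::Deparse already hands you normalized Perl source for any code ref, which is a much friendlier starting point for emitting another language than the raw source text. A minimal sketch:

```perl
# B::Deparse turns a compiled code ref back into (normalized) Perl
# source -- the natural hook for a toy translator to emit JS or eLisp
# from instead.
use strict;
use warnings;
use B::Deparse;

my $deparse = B::Deparse->new;
my $body    = $deparse->coderef2text(sub { my $x = shift; $x * 2 });
print "sub $body\n";
```

        Patching the deparser means you only ever see code that Perl itself has already parsed and compiled, which sidesteps the "only Perl can parse Perl" problem entirely.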

        This is far away from a fully automated translation of the whole language, but it is good enough for me, since I am aware of the different scoping rules for lexicals and can check for possible conflicts. =)

        But the intention of my question was different

        When digging into Ruby I'm always surprised to be surrounded by Perl idioms hiding behind a pretty syntax, to the extent that Ruby feels like a Perl dialect plus a prebuilt object system.

        (It's somehow fascinating and disturbing to hear people praising Ruby idioms taken from Perl while simultaneously bashing the source of those features.)

        This led me to the question of whether anyone has ever tried to use source filters to completely tokenize and translate a different language (not necessarily more than 90% compatible with an existing one) and then evaluate the generated Perl code.

        Cheers Rolf

        ( addicted to the Perl Programming Language)

        Hi,

        Rolf, to start with, I should say that I have never done anything with source filters and that I barely know what they are all about (having read just a couple of articles on the subject). So my answer might be off-topic; sorry if it is.

        I just want to say that I have had an experience similar to Roboticus's, although certainly much less extensive. We have a major application (35 million customers, many thousands of programs) running under VMS. We were studying the possibility of migrating it to Unix/Oracle (because support for VMS is likely to end within a few years, perhaps before 2020). The main language used for this application (especially for all the functional parts and the database access part) runs under both VMS and Unix, so most of the work would be recompiling the sources plus whatever adaptations are needed: a large project, but it looks feasible. But probably a quarter to a third of the programs are scripts written in DCL, the VMS scripting language, more or less the equivalent of shell scripting under Unix. These are used to launch multiple processes in parallel, to synchronize processes, to transfer, copy or sort files, etc. The immense majority of these DCL scripts would have to be translated into shell scripts (or possibly even Perl scripts in some cases).

        I participated in a "phase-0" pilot proof-of-concept automatic DCL-to-shell translation effort, and we were able to produce shell equivalents of our DCL programs automatically within a few weeks. But we knew that we had selected relatively easy cases. Therefore, a second phase (still proof-of-concept) was launched to get into the more complicated things. I was not directly involved in this second phase, so I can only relay what was reported to me: once you get into the gory details, it becomes really very complicated. There are a number of things that just can't be processed automatically and need complete refactoring. And that was still only phase 2 of the proof-of-concept study.

        The cost of the project, if it were to be launched, was estimated to be in the order of 15 to 20 million euros. A very large amount, indeed, but probably much less than migrating to a completely different system (which would probably cost 3 to 5 times as much). We have some extra time before deciding whether to go for it, but at least we have an idea of how difficult and costly it would be.

        My point was just to broadly confirm the general idea of Roboticus's post: translating 80% of the code is relatively easy, the next 15% get really hairy, and the last 5% might take more time than all the rest together.

        OK, I am not talking about a simple program, but about a very complex application with thousands of programs. What you are trying to do might be simpler (hopefully it is), but it is definitely not a simple task.

          Thanks, even if you replied to the wrong person. =)

          Please let me point out that 100% translation is always possible if you don't care about performance.

          The most brute-force approach (if that is the word) is emulating the CPU of a processor that supports the language.

          I'm not interested in emulating or translating more than 80% of a language, even if that might be quite easy with some LISP dialects, where most of the complicated constructs are just implemented on top of a small core.

          Let me give you an example: there is a very subtle difference in how JS and Perl numify strings.

          Frankly, I don't care to support software which relies on such differences. Perl even throws warnings about them.

          DB<116> use warnings; 0+"2"
          => 2
          DB<117> 0+"ss2"
          => 0
          DB<118> 0+"3ss2"
          => 3
          DB<119> use warnings; 0+"3ss2"
          Argument "3ss2" isn't numeric in addition

          >>> a="ss2"; 0+parseInt(a)
          NaN
          >>> a="3ss2"; 0+parseInt(a)
          3

          Of course it's possible to wrap every variable in numeric context in a function which numifies the Perl way.

          Such code would be incredibly slow.
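          Such a wrapper, sketched here in Perl for illustration (a translator would emit the equivalent JS; the helper name is made up): Perl's implicit numification takes the leading numeric prefix of a string, or 0 if there is none.

```perl
# perl_numify: what a Perl-to-JS translator would have to emit around
# every numeric context to preserve Perl semantics (sketch; the name is
# invented for illustration).
use strict;
use warnings;

sub perl_numify {
    my ($s) = @_;
    # Take the leading numeric prefix, as Perl's numification does.
    return $s =~ /\A\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?)/
        ? $1 + 0
        : 0;
}

print perl_numify("2"),    "\n";   # 2
print perl_numify("ss2"),  "\n";   # 0
print perl_numify("3ss2"), "\n";   # 3
```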

          But only supporting the "normal" scalar case, where strings are to be treated like numbers, is far faster.

          Just subtract zero:

          >>> a="2"; 0+a
          "02"
          >>> a="2"; 0+a-0
          2

          My theory is that it's pretty feasible to create a new language which is a subset of Perl 5 or Perl 6 and produces acceptable JS.

          "Acceptable" doesn't mean 100% compatible. Neither different versions of Perl nor of JS are ever 100% compatible with each other.

          perlito is already quite good at this.

          Cheers Rolf

          ( addicted to the Perl Programming Language)

    Re: sourcefilter with complete parser?
    by choroba (Abbot) on Dec 10, 2013 at 21:14 UTC
      XML::XSH2 defines its own syntax and translates it into Perl (see also XML::XSH2::Compile). Also, PML-TQ has a language (two of them, in fact: one for queries and one for reports); both are parsed and translated either to Perl or to SQL.
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Thanks for the links very interesting.

        But these are rather external DSLs for embedded scripts, not source filters (or am I missing something?!?)

        I'm sure they could somehow be used as source filters, but I'm interested to know if anyone ever tried to do this.

        Cheers Rolf

        ( addicted to the Perl Programming Language)

          external DSLs
          True. I probably did not understand (or read) your question properly.
          لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ