Automated Translation of Java to Python
Posted by troy Thu, 15 Feb 2007 07:58:48 GMT
I've written a tool that automatically translates Java source code into Python source code. It's useful, and it already works for me as I intended. It's called java2python (clever, no?) and you can download it here.
Let me back up a bit and explain the motivation behind this. I'm the author and sole maintainer of the Python port of the Interactive Brokers API (IbPy). IB provides a default/reference Java implementation for UNIX and MacOS. This reference implementation is straightforward: it contains a Thread subclass that reads from a socket, an associated class for writing to the socket, plus a few other support classes. Conceptually it's pretty simple, and the initial port was actually easy (once I figured out the difference between writing data to a socket in Java and writing data to a socket in Python).
That was 5 years ago. In those 5 years, I've refactored the Python code quite a bit, and IB has enhanced their code and their network protocol significantly as well. I dropped out of trading for a while, and Life, the Universe, and Everything have conspired to keep me from tending to the port as much as I should. I have kept up with the code in spare moments, but haven't been able to put together a tested release in quite a while. I recently devoted some time to working on the port and ran into trouble. I downloaded the latest release from IB and started to go through their source, line by line, matching it against my code. In reading their source, I realized how complex and hopeless the whole thing had become. Complex is bad, but even worse, it was clear that I would only ever finish through sheer force of will. That spells doom for a project, and it's about as low as a software engineering effort can rank.
So I stepped back for a bit and thought about the problem. What was consuming so much time was translating the code by hand, using my (increasingly older) noodle. Maintaining the port the way I had been doing it held no promise of ever getting better.
Now, the IB Java code is pretty reasonable. It's not tricky and it's fairly consistent, but it is a bit verbose for what it does (it's Java, after all). And there's lots and lots and lots of conditional logic that, if not translated perfectly, breaks the communication between the trading application (the client) and the trading platform (the server). I toyed with the idea of regex-ing the snot out of their source to produce something not unlike Python code, but ultimately rejected the idea as untenable and fraught with complexity. And complexity is what's to be avoided.
Then I took a serious look at ANTLR. I knew it had strong support for both Java and Python, and I believed that most (if not all) of the bits I needed would already be there. I read the documentation, the examples, and the various articles I could find. As luck (or actually, the hard work of many other developers) would have it, the ANTLR distribution includes an example grammar for lexing and parsing Java source. Given this grammar and the Python script examples, I was able to print out an abstract syntax tree of all the IB Java source. Half the work done, and I didn't have to lift a finger!
But I had no earthly idea what to do with an AST. I could walk it recursively, printing out its content, and not much else. So I did what any good programmer does when he or she doesn't know how to solve a problem: I tried to ignore it. It wouldn't go away, of course, so I sought help from my good friend Bob. We chatted a bit about it, and he pointed me to an article about another feature of ANTLR, tree walker grammars. I had already skimmed the article, but reread it for comprehension and found the answers for which I was looking.
Like input (lexer and parser) grammars, a tree grammar is used to describe and generate a class for processing some input. The difference is that tree grammars are used to generate code to walk an AST. The ANTLR implementation allows code to be specified directly in the tree grammar, which provides a way to hook into the AST walk and do interesting things.
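To make "hooking into the AST walk" a little more concrete, here is a minimal Python sketch of the idea. It is not ANTLR's tree-grammar API and not java2python's actual code: the `Node` class, the node kinds (`IF`, `EQ`, `IDENT`, `NULL`, `CALL`, `BLOCK`) and the `PythonEmitter` walker are all hypothetical stand-ins. The walker recurses over the tree and, for each node kind, runs a small piece of action code that emits Python source, which is roughly the role the embedded actions play in a tree grammar.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical, simplified AST node: a kind, optional text, and children."""
    kind: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class PythonEmitter:
    """Recursive walker; each expr_* method (and each branch of stmt)
    is the 'action' code for one node kind."""

    def expr(self, node: Node) -> str:
        # Dispatch on the node kind, e.g. "EQ" -> self.expr_eq(node).
        return getattr(self, "expr_" + node.kind.lower())(node)

    def expr_ident(self, node: Node) -> str:
        return node.text

    def expr_null(self, node: Node) -> str:
        return "None"

    def expr_call(self, node: Node) -> str:
        return node.text + "()"

    def expr_eq(self, node: Node) -> str:
        left, right = (self.expr(child) for child in node.children)
        # Java's "x == null" reads better in Python as "x is None".
        op = "is" if right == "None" else "=="
        return f"{left} {op} {right}"

    def stmt(self, node: Node, indent: int = 0) -> List[str]:
        pad = "    " * indent
        if node.kind == "IF":
            cond, body = node.children
            return [f"{pad}if {self.expr(cond)}:"] + self.stmt(body, indent + 1)
        if node.kind == "BLOCK":
            lines: List[str] = []
            for child in node.children:
                lines.extend(self.stmt(child, indent))
            # An empty Java block still needs a body in Python.
            return lines or [pad + "pass"]
        # Anything else is treated as an expression statement.
        return [pad + self.expr(node)]


# Java: if (sock == null) { connect(); }
tree = Node("IF", children=[
    Node("EQ", children=[Node("IDENT", "sock"), Node("NULL")]),
    Node("BLOCK", children=[Node("CALL", "connect")]),
])
print("\n".join(PythonEmitter().stmt(tree)))
```

Running it prints `if sock is None:` followed by an indented `connect()`. A real tree grammar generates the walking code for you; the point of the sketch is only that translation happens as a side effect of visiting each node.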
That was two weeks ago, and now I have a tool that works. It's not perfect, but it already translates the entire IB reference implementation without syntax errors. I still have to tackle the problem of matching semantics between the two languages, but I think the more difficult problem has been solved. Most importantly, I have something that is repeatable and no more difficult to use than typing "make". Take that, demons of late nights past!
I've written java2python with the idea that it should provide a high degree of customization to the generation process. It allows for multiple, cumulative configuration modules, which means you can have a configuration for an entire translation project, and also have configurations for individual modules.
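One plausible way to make several configuration layers cumulative is sketched below. This is an assumption about the general approach, not java2python's actual merge logic: plain dicts stand in for configuration modules, and the keys (`typeMap`, `outputSubs`, `indentWidth`) are hypothetical. Later layers win for simple values, while dicts and lists accumulate, so a per-module configuration can add to a project-wide one without repeating it.

```python
from typing import Any, Dict


def merge_configs(*configs: Dict[str, Any]) -> Dict[str, Any]:
    """Merge configuration layers in order; later layers take precedence.

    Dict values are merged key by key, list values are concatenated, and
    any other value is simply replaced by the later layer's value.
    """
    merged: Dict[str, Any] = {}
    for config in configs:
        for key, value in config.items():
            if isinstance(value, dict):
                merged.setdefault(key, {}).update(value)
            elif isinstance(value, list):
                merged.setdefault(key, []).extend(value)
            else:
                merged[key] = value
    return merged


# Hypothetical project-wide settings, then settings for one module.
project = {
    "typeMap": {"String": "str", "boolean": "bool"},
    "outputSubs": [(r"\bnull\b", "None")],
    "indentWidth": 4,
}
module = {
    "typeMap": {"Vector": "list"},
    "outputSubs": [(r"\bthis\.", "self.")],
    "indentWidth": 2,
}
config = merge_configs(project, module)
print(config["typeMap"])  # {'String': 'str', 'boolean': 'bool', 'Vector': 'list'}
```

Copying into fresh containers, rather than reusing the first layer's dict or list, keeps the project-wide configuration unchanged when a module layer is merged on top of it.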
Let me add a few more waffles before concluding. Yes, I know Python is not Java. Yes, I know that this tool doesn't translate the meaning of the input source code. Yes, the tool does not produce idiomatic Python. And yes, I know the tool isn't even close to perfect. But even with all of those problems, I know this is better than what I was doing.
As always, I'm interested in your feedback if you use my code, or in this case, even if you don't use my code. Feel free to drop me a note with any comments you have. You can reach me at troy@gci.net.
Your Comments.
Any plans to make a Python2Java equivalent?
I think that having the translation in both directions would increase the correctness and testability of each version, not to mention the "visibility" of the project :).
Thank you,
D.
no plans for python2java. i don't need such a thing, and i'm not sure it would even be possible. if you're inclined to write it, please keep me posted!
A video tutorial of python2java might be quite neat, especially to show interested users just how easy it is to get going. If anyone is interested in the idea, I'd help to make the video and we'd host it for free (and the tools are open-source) over at ShowMeDo. Just throwing this in as an idea for some extra publicity for you :-) Regards, Ian Ozsvald (co-founder showmedo.com)
Given Python's runtime typing, direct conversion of Python source to Java source would be difficult if not impossible. Given the availability of Jython, why bother anyway?
Pretty neat. Python is sort of a superset of Java so it seems like it should translate reasonably well. I wonder how it'll handle integer overflow.
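On the integer-overflow question: a Java `int` is a 32-bit two's-complement value that silently wraps around, while a Python `int` grows without bound, so translated arithmetic can give different answers near the limits. Whether java2python does anything about this isn't stated here; the helper below, `java_int` (a hypothetical name), is one minimal way translated or hand-edited output could emulate Java's wrap-around.

```python
def java_int(value: int) -> int:
    """Wrap an arbitrary Python int into Java's 32-bit signed int range."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    # If the sign bit is set, reinterpret as a negative two's-complement value.
    return value - 0x100000000 if value & 0x80000000 else value


MAX_INT = 2**31 - 1
print(MAX_INT + 1)            # Python: 2147483648
print(java_int(MAX_INT + 1))  # Java semantics: -2147483648
```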
How far do you think you can go with this?
Do you think it might be possible to convert any Java library to Python with little or no manual intervention?
Don,
i don't think that complex translations could be totally automated -- not without a lot more work on the translator.
but i'm not really after lexical analysis; the tool is working for me exactly like i need it to work. there's support for mapping java types to python types, there's support for changing some common java statements into the equivalent python. most important to me, there's support for per-file imports and per-file regex substitutions applied to the output.
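The type mapping and the per-file imports and regex substitutions mentioned in this reply might look roughly like the sketch below. The function names `map_type` and `postprocess` and the sample mappings are hypothetical, chosen for illustration; only the general idea (map types, prepend imports, run ordered regex substitutions over the generated text) comes from the reply.

```python
import re
from typing import Dict, Iterable, Tuple

# Hypothetical Java-to-Python type mapping; unknown names pass through unchanged.
TYPE_MAP: Dict[str, str] = {
    "String": "str",
    "boolean": "bool",
    "double": "float",
    "Vector": "list",
}


def map_type(java_type: str, type_map: Dict[str, str] = TYPE_MAP) -> str:
    """Return the Python name for a Java type, or the name itself if unmapped."""
    return type_map.get(java_type, java_type)


def postprocess(source: str,
                imports: Iterable[str],
                subs: Iterable[Tuple[str, str]]) -> str:
    """Apply per-file regex substitutions in order, then prepend per-file imports."""
    for pattern, replacement in subs:
        source = re.sub(pattern, replacement, source)
    header = "\n".join(imports)
    return f"{header}\n\n{source}" if header else source


generated = "x = this.reader\nif x == null:\n    pass\n"
print(postprocess(generated,
                  imports=["import socket"],
                  subs=[(r"\bthis\.", "self."), (r"== null\b", "is None")]))
```

Because the substitutions run on the generated text, they are a blunt but repeatable escape hatch for the cases the translator itself doesn't handle.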
what did you have in mind? do you need to translate a java library without intervention?
Seems the python2java translation could possibly be done with some of the cool stuff the folks at PyPy are doing with type inference and how they translate to C or CLI. Maybe one could have a backend to java code. Just a thought.
Troy:
'what did you have in mind? do you need to translate a java library without intervention?'
None in particular. I just briefly fantasized about being able to use any Java library from Python without using Jython.