C to Java Translation. Automatic, Complete, Correct. Free for Open-Source.
mtsystems.ch
Main author here. Let me know if you have any questions. I’d be happy to answer.
I tried so hard to break main arguments (char **argv). On my last try I passed char **argv to a function, and it wrapped main, inserted the program name at the beginning, and called the wrapped main with the new arguments. I give up.
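A minimal sketch of the wrapping described above. The names `cMain`, `toCArgv` and the program name "prog" are hypothetical; the translator's real output is not shown in this thread. The point is only that C code expects `argv[0]` to be the program name, while Java's `args` starts at the first real argument.

```java
final class MainWrapperSketch {
    // Hypothetical stand-in for the translated C main(int argc, char **argv).
    static int cMain(int argc, String[] argv) {
        // C code may rely on argv[0] being the program name.
        System.out.println("program: " + argv[0] + ", argc: " + argc);
        return 0;
    }

    // Builds a C-style argv: program name first, then the Java arguments.
    static String[] toCArgv(String programName, String[] javaArgs) {
        String[] argv = new String[javaArgs.length + 1];
        argv[0] = programName;
        System.arraycopy(javaArgs, 0, argv, 1, javaArgs.length);
        return argv;
    }

    // The Java entry point wraps the translated main.
    public static void main(String[] args) {
        String[] argv = toCArgv("prog", args); // "prog" is an assumed name
        int status = cMain(argv.length, argv);
        if (status != 0) {
            System.exit(status);
        }
    }
}
```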
Hahaha, cool!
If it really works - very impressive! Have you considered using sun.misc.Unsafe for things like pointer arithmetic?
Yes, but it's way too limited for replicating what you can do with C pointers. Therefore we wrote our own classes.
It's very kind of you to translate open source projects for free. I personally think you should give that more attention rather than just an italic sentence at the bottom of the page.
Thank you!
But of course a lot of development and money went into our translation framework. So the main aim has to be to make money with it. But as long as we have capacity, we're happy to translate open-source software and improve our translation while doing so.
Interesting how gotos are translated to switches
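Java has no goto, so one common way to emulate it is a loop around a switch on a "next label" variable. The following is a sketch of that general technique, not the translator's actual output; the class, method and label names are made up for illustration.

```java
final class GotoAsSwitchSketch {
    private static final int LOOP = 0;
    private static final int DONE = 1;

    // C original being emulated:
    //   int i = 0, sum = 0;
    // loop:
    //   if (i >= n) goto done;
    //   sum += i; i++;
    //   goto loop;
    // done:
    //   return sum;
    static int sumBelow(int n) {
        int i = 0;
        int sum = 0;
        int label = LOOP; // the label to "jump" to next
        while (true) {
            switch (label) {
                case LOOP:
                    if (i >= n) {
                        label = DONE; // goto done
                        continue;
                    }
                    sum += i;
                    i++;
                    label = LOOP; // goto loop
                    continue;
                case DONE:
                    return sum;
                default:
                    throw new IllegalStateException("unknown label " + label);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sumBelow(5));
    }
}
```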
90% of C should be pretty easy to translate, but of course, the devil (and a lot of functionality in existing libraries) is in the details.
There would be probably money in translating COBOL to Java, but maybe there are solutions already?
Yep, supporting 90% of C took 10% of the time. Or even much less time.
Native C libraries (libc, libmath, ...) are just directly used from Java, not translated.
Yes, there is a lot of money in Cobol/Fortran to Java. Many tried, none successful (I know many stories). We'll look into those two languages in the future. But creating real translators takes years.
> There would be probably money in translating COBOL to Java, but maybe there are solutions already?
There is, Project NACA (http://developers.slashdot.org/story/09/06/24/1915205/Automa...).
Have a look at the generated code if you want a good laugh: it translates COBOL code to Java line by line.
The main problem being undefined behaviour that is actually undefined and varies from compiler to compiler!
Yeah, you might need a config switch to configure what you want to do with the most common undefined behaviours (or "just change" your C code).
(Because "just change" usually ends up creating other problems. Been there, done that, "code is wrong but works" and when you try to fix stuff breaks)
You are free to assume that undefined behavior never happens. Unless you want to support invalid C programs.
This looks pretty nice. Pointer/array arithmetic seems to be handled nicely; (double*)malloc(sizeof(double)*100) looks pretty ugly, but it's hard to tell what's going on under the DoubleContainer hood.
For OSS (or other projects) that just need running JVM bytecode, check out the GCC Bridge component of Renjin, which uses a combination of GCC and Soot to compile C and Fortran code to bytecode: https://github.com/bedatadriven/renjin/tree/master/tools/gcc...
Thanks!
malloc() we only optimized for char* so far (there are endless possible optimizations when translating C the way we do).
"(double*)malloc(sizeof(double) * 100)" should be translated as "new DoubleContainer(100, true)". We'll add that at some point.
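To give a rough idea of how a pointer-like wrapper over a Java array could work, here is a minimal sketch. It is not the real DoubleContainer: the method names `shift`, `get` and `set` are borrowed from the translated snippet quoted elsewhere in this thread, but their signatures, the class name and everything else here are assumptions, and the meaning of the boolean constructor argument is unknown.

```java
final class DoublePointerSketch {
    private final double[] memory; // the allocation this "pointer" points into
    private final int offset;      // current position within the allocation

    // Roughly: (double*)malloc(sizeof(double) * count)
    DoublePointerSketch(int count) {
        this(new double[count], 0);
    }

    private DoublePointerSketch(double[] memory, int offset) {
        this.memory = memory;
        this.offset = offset;
    }

    // Roughly: p + n (a new pointer into the same allocation)
    DoublePointerSketch shift(int n) {
        return new DoublePointerSketch(memory, offset + n);
    }

    // Roughly: p[index]; Java's array bounds check applies
    double get(int index) {
        return memory[offset + index];
    }

    // Roughly: p[index] = value
    void set(int index, double value) {
        memory[offset + index] = value;
    }

    // Roughly: *p
    double get() {
        return get(0);
    }

    // Roughly: *p = value
    void set(double value) {
        set(0, value);
    }

    public static void main(String[] args) {
        DoublePointerSketch from = new DoublePointerSketch(3);
        from.set(0, 1.5);
        DoublePointerSketch to = new DoublePointerSketch(1);
        to.set((from = from.shift(1)).get(-1)); // *to = *from++
        System.out.println(to.get());
    }
}
```

Making `shift` return a new object keeps two pointers into the same allocation independent of each other, at the cost of an extra allocation per pointer operation.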
NestedVM was able to do this over a decade ago.
http://git.megacz.com/?p=nestedvm.git;a=summary
And is actually open source, not "free for open source".
Kudos to the developers though, I'm not trying to bash your skills or diminish the quality of your work... I'm sure many enterprises can/will benefit from this. Just wanted to let the free/open source community know that you don't have to chomp on the "free for open source" carrot.
I think I saw this the other day in the VIM discussion. Great job! Are the other examples, e.g. micro http, also available for download?
We translated dozens of open-source projects and decided to list only the interesting ones on the website and upload only the most interesting ones: the ones that are very well known and have a nice GUI.
Feel free to send us an email and we'll be happy to send you the other programs you're interested in.
You can also ask for translations of open-source software we didn't translate yet if you want to see the translation of a specific project.
It says the translation is correct. Is there any proof of that?
Only in the form of translating and running dozens of C applications (programs and libraries) and running their test suites. E.g. libcurl comes with a great, extensive test suite (a Perl script running against the binary).
Translated applications still need to be thoroughly tested and usually some bugs are still found.
So we didn't formalize and verify our translation. Interestingly enough, we run into bugs in javac and ecj (Eclipse Java Compiler) surprisingly often. So verifying our translation would still lead to translations with bugs ;-)
Another fun fact: Since our translation knows the limits of allocated memory (and many other things), we found many illegal memory accesses in C programs that were unknown before (libgmp, micro httpd, vim, ...) since they didn't (or only very seldom) lead to segfaults.
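As a small illustration of why such bugs surface after translation, the sketch below mirrors a C loop with an off-by-one error. It uses a plain Java array rather than the translator's own container classes (which are not shown in this thread), and the class and method names are made up. In C the extra read is undefined behavior and often goes unnoticed; in Java the array bounds check reports it.

```java
final class OutOfBoundsSketch {
    // Mirrors the C loop: for (i = 0; i <= n; i++) sum += buf[i];
    // Returns true if the loop tried to read past the end of buf.
    static boolean readsPastEnd(int[] buf) {
        int sum = 0;
        try {
            for (int i = 0; i <= buf.length; i++) { // bug: should be i < buf.length
                sum += buf[i];
            }
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            // The JVM knows the size of the allocation and rejects the access.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readsPastEnd(new int[] {1, 2, 3}));
    }
}
```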
This sounds like a great spin off - bug finding in C code. Have you put much thought into pursuing this?
Didn't think too much about it since there are many C-specific analyzers and tools that do the same. Well, they do it way better. E.g. Valgrind.
I would guess the value would be if you find bugs that other tools don't. If you just find the same bugs as Valgrind then I agree that there would not be too much value, but if you find unique bugs then it would be useful.
What do I need to do to get open-source software translated? I would like to have this http://stjarnhimlen.se/comp/sunriset.c as Java code.
Send us an email. We're currently getting overrun with requests but will handle it as fast as possible.
Please note that the software needs to be in some public repository (github, bitbucket, sourceforge, ...).
Pretty neat. I did find a bug while I was playing around with it though: it doesn't correctly translate Duff's device.
Can you elaborate on that? What's the C code, the translated Java code and your expected Java code?
Thanks!
I saw that ;-) The translation is:
  do { to.set((from = from.shift(1)).get(-1)); } while(--count > 0);
which looks correct to me. Hence my question what orodley thinks is wrong.
I think you might be looking at the wrong thing on the Wikipedia page. The core feature of Duff's Device is interleaved switch and do statements:
  n = (count + 7) / 8;
  switch (count % 8) {
  case 0: do { *to = *from++;
  case 7:      *to = *from++;
  case 6:      *to = *from++;
  case 5:      *to = *from++;
  case 4:      *to = *from++;
  case 3:      *to = *from++;
  case 2:      *to = *from++;
  case 1:      *to = *from++;
          } while (--n > 0);
  }
(extraneous register statements removed for conciseness)
That would be a bug in the translation. We'll investigate. Thanks!
Yes indeed. We handled goto into do-while statements wrong. Fixed now. Thanks!
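For readers wondering what a jump into a do-while can look like in Java, here is one possible hand-written emulation of Duff's device. It is a sketch and not the translator's fix: Java does not allow case labels inside a nested loop body, so the switch is moved inside the loop and only the first pass enters at `count % 8`. The class name, the `send` method and the use of a list as a stand-in for the output register `*to` are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class DuffsDeviceSketch {
    // Copies the first `count` values of `from` to the output, eight per pass.
    // As in the C original, count must be positive.
    static List<Integer> send(int[] from, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        List<Integer> to = new ArrayList<>(); // stand-in for the register *to
        int src = 0;                          // stand-in for the pointer `from`
        int n = (count + 7) / 8;              // number of passes
        int entry = count % 8;                // where the first pass jumps in
        do {
            switch (entry) { // every case deliberately falls through
                case 0: to.add(from[src++]);
                case 7: to.add(from[src++]);
                case 6: to.add(from[src++]);
                case 5: to.add(from[src++]);
                case 4: to.add(from[src++]);
                case 3: to.add(from[src++]);
                case 2: to.add(from[src++]);
                case 1: to.add(from[src++]);
            }
            entry = 0; // later passes run the whole body
        } while (--n > 0);
        return to;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110};
        System.out.println(send(data, data.length));
    }
}
```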
Is there any tool that does the opposite?
I've seen many research papers on Java -> C translation during my PhD. Some of them came with a prototype tool. But as with previous work on C -> Java translation, none of the tools actually really worked completely.
I actually wrote a Java -> Eiffel translator:
http://se.inf.ethz.ch/research/j2eif
Based on that experience I can say it would be quite a big effort to write a Java -> C translator. But not impossible.
Thanks for the post. What was the big sticking point with the Java -> C translators?
It's easy to do a minimal prototype when doing research. But writing real translators means getting all the details right. In research we usually don't have the time for that.
Translating Java - in my experience - is very hard because of the extensive runtime system (reflection, base classes, synchronization, ...). E.g. if your application does System.out.println("Hello"); you already need the System and PrintWriter classes. They in turn depend (among many other things) on AWT, which needs the security classes. And so on. A HelloWorld pulls in 1208 classes of the base library. They in turn depend on java.dll, which you have to re-implement from scratch. Or you rewrite all base library classes, which is even harder.
I hope this gives you a basic idea of the problem.
It certainly does. I can't imagine even starting on a project like this.
One million Dollars for a couple of days of work?
You can always format your whole source code into 1 line, so it will cost only $1.
Do you translate open source free of charge?
Yes. As long as it's non-commercial software.
C++ to Java would be more interesting.
True. But it also adds a lot of complexity on top of an already very complex translation. But it's certainly something we'll look at in the future (together with supporting Cobol and Fortran).
But I heard there are C++ to C translators. I don't know how good they are and what the resulting code looks like. But if they're decent, you could do C++ -> C -> Java :)
On the C++ to C translators: The first actual C++ compiler, Cfront, was a trans-compiler to C (the language was still known as "C with Classes" back then, but basically it's the first version of C++[0]).
So in that case, the C++ to C compiler was there before the first real C++ compiler which appeared some time later.
I want a Java-to-Java translator.
I wonder what the performance impact is.
Why do young students no longer learn C? Don't you want to be closer to the underlying OS?
>Why do young students no longer learn C?
Who said they don't? This project is about porting existing stuff, for interoperability, etc. Not as a way to "avoid learning C".
>Don't you want to be closer to the underlying OS?
No, why would I want to do that unless I have a specific need for that?
> No, why would I want to do that unless I have a specific need for that?
At least if you're using a Unix-like system, then understanding POSIX is essential for knowing how things work starting on an intermediate level.
And why would I want to understand how things work down in the OS?
Anyway, C is a horrible language. Between Fortran and Ada it's unclear to me why C should be chosen for anything, inertia and herdthink aside.
Because it intersects with most of what you're doing. Why is that process supervisor I'm using crashing? Oh, it's still using select(2) instead of a modern I/O multiplexing mechanism, and exceeding its FD_SETSIZE limit. What do all these calls mean in my tracing output?
All the infrastructure you're using goes through C/POSIX in one way or another. When things go haywire, you can only get by fixing it on the higher levels or just blindly restarting for so long.
Have you used Fortran or Ada?
Have used Fortran. It's pretty nice, actually. I mean, it does have unstructured programming, but its structured alternatives feel pretty natural. Fortran code, like C code, feels a bit fragile but unlike C, Fortran doesn't seem to be actively malevolent. I only know Ada by its (excellent) reputation.
EDIT: Only true if you use the implicit-none flag when compiling Fortran.
Performance is of course lower. But it's impossible to put a number on it. Certainly native calls (printf, scanf, fopen, fread) add a lot of overhead. Also whenever the C memory layout has to be used, performance suffers (e.g. complex pointer arithmetic).
But the approach is to get the correct translation first, and then - if needed - bottlenecks can be removed manually. This should lead to a better migration/upgrading experience.
Most universities still teach a mandatory OS course where students build their own shell in C. Just because we start people off in block-based languages and Python doesn't mean that Computer Scientists don't get the full tour of languages.