C to Java Translation. Automatic, Complete, Correct. Free for Open-Source.

mtsystems.ch

55 points by marco2357 11 years ago · 54 comments

marco2357OP 11 years ago

Main author here. Let me know if you have any questions. I’d be happy to answer.

  • shultays 11 years ago

    I tried so hard to break main's arguments (char **argv). On my last try I passed argv to a function, and it wrapped main, inserted the program name at the beginning, and called the wrapped main with the new arguments. I give up.

  • MrBuddyCasino 11 years ago

    If it really works - very impressive! Have you considered using sun.misc.Unsafe for things like pointer arithmetic?

    • marco2357OP 11 years ago

      Yes, but it's way too limited for replicating what you can do with C pointers. Therefore we wrote our own classes.

  • rip747 11 years ago

    It's very kind of you to translate open source projects for free. I personally think you should give that more attention rather than just an italic sentence at the bottom of the page.

    • marco2357OP 11 years ago

      Thank you!

      But of course a lot of development and money went into our translation framework. So the main aim has to be to make money with it. But as long as we have capacity, we're happy to translate open-source software and improve our translation while doing so.

raverbashing 11 years ago

Interesting how gotos are translated to switches
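
For readers wondering what a goto-to-switch translation can look like, here is a minimal sketch of the general technique, not the output of the tool discussed here. The class name `GotoAsSwitch`, the method `sumTo`, and the label constants are hypothetical: each C label becomes a case, and each `goto` becomes an assignment to a label variable followed by `break`.

```java
// Hypothetical illustration of emulating C gotos with a switch inside a loop.
// C original (for comparison):
//     int i = 0, sum = 0;
//   loop: if (i > n) goto done;
//         sum += i; i++; goto loop;
//   done: return sum;
final class GotoAsSwitch {
    private static final int LOOP = 0;
    private static final int DONE = 1;

    static int sumTo(int n) {
        int i = 0;
        int sum = 0;
        int label = LOOP;               // the "program counter" for labels
        while (true) {
            switch (label) {
                case LOOP:
                    if (i > n) {        // goto done;
                        label = DONE;
                        break;
                    }
                    sum += i;
                    i++;
                    label = LOOP;       // goto loop;
                    break;
                case DONE:
                    return sum;
                default:
                    throw new IllegalStateException("unknown label " + label);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10));
    }
}
```

The loop keeps dispatching on `label` until a case returns, so forward and backward jumps are handled the same way.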

90% of C should be pretty easy to translate, but of course, the devil (and a lot of functionality in existing libraries) is in the details.

There would be probably money in translating COBOL to Java, but maybe there are solutions already?

  • marco2357OP 11 years ago

    Yep, supporting 90% of C took 10% of the time. Or even much less time.

    Native C libraries (libc, libmath, ...) are just directly used from Java, not translated.

    Yes, there is a lot of money in Cobol/Fortran to Java. Many tried, none successful (I know many stories). We'll look into those two languages in the future. But creating real translators takes years.

  • loopbit 11 years ago

    > There would be probably money in translating COBOL to Java, but maybe there are solutions already?

    There is: Project NACA (http://developers.slashdot.org/story/09/06/24/1915205/Automa...).

    Have a look at the generated code if you want a good laugh; it translated COBOL code to Java line by line.

  • emsy 11 years ago

    The main problem being undefined behaviour that is actually undefined and varies from compiler to compiler!

    • raverbashing 11 years ago

      Yeah, you might need a config switch to choose what you want to do with the most common undefined behaviours (or "just change" your C code)

      (Because "just change" usually ends up creating other problems. Been there, done that: "code is wrong but works", and when you try to fix it, stuff breaks)
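
A rough sketch of what such a config switch could mean for one common case, signed integer overflow, which is undefined in C. Everything here is an assumption for illustration: `OverflowPolicyDemo`, the `OverflowPolicy` enum, and the `add` helper are hypothetical names and say nothing about how the translator in this thread actually behaves.

```java
// Hypothetical policy switch for C signed overflow when emitting Java.
final class OverflowPolicyDemo {
    enum OverflowPolicy {
        WRAP,   // two's-complement wrap-around, what most compilers do in practice
        TRAP    // treat overflow as an error and fail loudly
    }

    static int add(int a, int b, OverflowPolicy policy) {
        switch (policy) {
            case WRAP:
                return a + b;                // Java int addition wraps by definition
            case TRAP:
                return Math.addExact(a, b);  // throws ArithmeticException on overflow
            default:
                throw new AssertionError(policy);
        }
    }

    public static void main(String[] args) {
        System.out.println(add(Integer.MAX_VALUE, 1, OverflowPolicy.WRAP));
    }
}
```

With `WRAP` the program keeps the "wrong but works" behaviour; with `TRAP` the same call throws, which is closer to what a bug-finding mode would want.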

    • TorKlingberg 11 years ago

      You are free to assume that undefined behavior never happens. Unless you want to support invalid C programs.

bedatadriven 11 years ago

This looks pretty nice. Pointer/array arithmetic seems to be handled nicely. The translation of (double*)malloc(sizeof(double)*100) looks pretty ugly, but it's hard to tell what's going on under the DoubleContainer hood.

For OSS (or other projects) that just need running JVM byte code, check out the GCC Bridge component of Renjin, which uses a combination of GCC and Soot to compile C and Fortran code to bytecode: https://github.com/bedatadriven/renjin/tree/master/tools/gcc...

  • marco2357OP 11 years ago

    Thanks!

    malloc() we only optimized for char* so far (there are endless possible optimizations when translating C the way we do).

    "(double*)malloc(sizeof(double) * 100)" should be translated as "new DoubleContainer(100, true)". We'll add that at some point.

bcg1 11 years ago

NestedVM was able to do this over a decade ago.

http://git.megacz.com/?p=nestedvm.git;a=summary

And is actually open source, not "free for open source".

Kudos to the developers though, I'm not trying to bash your skills or diminish the quality of your work... I'm sure many enterprises can/will benefit from this. Just wanted to let the free/open source community know that you don't have to chomp on the "free for open source" carrot.

haches 11 years ago

I think I saw this the other day in the VIM discussion. Great job! Are the other examples, e.g. micro http, also available for download?

  • marco2357OP 11 years ago

    We translated dozens of open-source projects and decided to list only the interesting ones on the website and upload only the most interesting ones: the ones that are very well known and have a nice GUI.

    Feel free to send us an email and we'll be happy to send you the other programs you're interested in.

    You can also ask for translations of open-source software we didn't translate yet if you want to see the translation of a specific project.

fuklief 11 years ago

It says the translation is correct. Is there any proof of that?

  • marco2357OP 11 years ago

    Only in the form of translating and running dozens of C applications (programs and libraries) and running their testsuites. E.g. libcurl comes with a great extensive testsuite (a perl script running against the binary).

    Translated applications still need to be thoroughly tested and usually some bugs are still found.

    So we didn't formalize and verify our translation. Interestingly enough, we run into bugs in javac and ecj (Eclipse Java Compiler) surprisingly often. So verifying our translation would still lead to translations with bugs ;-)

    Another fun fact: Since our translation knows the limits of allocated memory (and many other things), we found many illegal memory accesses in C programs that were unknown before (libgmp, micro httpd, vim, ...) since they didn't (or only very seldom) lead to segfaults.

    • danieltillett 11 years ago

      This sounds like a great spin off - bug finding in C code. Have you put much thought into pursuing this?

      • marco2357OP 11 years ago

        Didn't think too much about it since there are many C specific analyzers and tools that do the same. Well, they do it way better. E.g. Valgrind.

        • danieltillett 11 years ago

          I would guess the value would be if you find bugs that other tools don't. If you just find the same bugs as Valgrind then I agree that there would not be too much value, but if you find unique bugs then it would be useful.

sputnik27 11 years ago

What do I need to do to get open-source software translated? I would like to have this http://stjarnhimlen.se/comp/sunriset.c as Java code.

  • marco2357OP 11 years ago

    Send us an email. We're currently getting overrun with requests but will handle it as fast as possible.

    Please note that the software needs to be in some public repository (github, bitbucket, sourceforge, ...).

orodley 11 years ago

Pretty neat. I did find a bug while I was playing around with it though: it doesn't correctly translate Duff's device.

  • marco2357OP 11 years ago

    Can you elaborate on that? What's the C code, the translated Java code and your expected Java code?

    Thanks!

    • snnw 11 years ago
      • marco2357OP 11 years ago

        I saw that ;-) The translation is:

          do {
            to.set((from = from.shift(1)).get(-1));
          } while(--count > 0);

        which looks correct to me. Hence my question: what does orodley think is wrong?
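
To make the `shift`/`get`/`set` expression above easier to follow, here is a small sketch of a pointer stand-in that supports exactly those three calls. `IntPointer` and `PostIncrementDemo` are hypothetical names, and the real container classes of the translator are not shown anywhere in this thread, so treat this purely as an assumption about how `*to = *from++` can map onto such an API.

```java
// Hypothetical pointer stand-in: a backing array plus an offset into it.
final class IntPointer {
    private final int[] memory;
    private final int offset;

    IntPointer(int[] memory, int offset) {
        this.memory = memory;
        this.offset = offset;
    }

    // p + n in C: a new pointer into the same memory.
    IntPointer shift(int n) {
        return new IntPointer(memory, offset + n);
    }

    // p[n] in C; Java's array bounds check rejects out-of-range reads.
    int get(int n) {
        return memory[offset + n];
    }

    // *p = value in C.
    void set(int value) {
        memory[offset] = value;
    }
}

final class PostIncrementDemo {
    // Mirrors: do { *to = *from++; } while (--count > 0);
    // "to" is a single cell, like the output register in Duff's original.
    // Returns the last value written to that cell.
    static int send(int[] source, int count) {
        int[] register = new int[1];
        IntPointer to = new IntPointer(register, 0);
        IntPointer from = new IntPointer(source, 0);
        do {
            // Advance "from" first, then read the element it just left.
            to.set((from = from.shift(1)).get(-1));
        } while (--count > 0);
        return register[0];
    }
}
```

Because every read goes through a Java array, asking for more elements than `source` holds throws an `ArrayIndexOutOfBoundsException` instead of silently reading past the buffer.
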
        • jmuhlich 11 years ago

          I think you might be looking at the wrong thing on the Wikipedia page. The core feature of Duff's Device is interleaved switch and do statements:

            n = (count + 7) / 8;
            switch (count % 8) {
            case 0: do { *to = *from++;
            case 7:      *to = *from++;
            case 6:      *to = *from++;
            case 5:      *to = *from++;
            case 4:      *to = *from++;
            case 3:      *to = *from++;
            case 2:      *to = *from++;
            case 1:      *to = *from++;
                    } while (--n > 0);
            }
          
          (extraneous register statements removed for conciseness)
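
Java does not allow a case label inside a nested do body, so the interleaving above cannot be written directly. One way to get the same behaviour, offered as a sketch and not as what the translator in this thread emits, is to put the switch inside the loop and reset the entry point after the first pass. `DuffsDevice`, `send`, and the `IntConsumer` sink standing in for the `*to` register are hypothetical.

```java
import java.util.function.IntConsumer;

final class DuffsDevice {
    // Sends from[0..count) to the sink, up to eight writes per loop pass.
    // Assumes count > 0, like the C original.
    static void send(int[] from, int count, IntConsumer to) {
        int i = 0;
        int n = (count + 7) / 8;    // number of passes through the loop
        int entry = count % 8;      // where the first pass enters the body
        do {
            switch (entry) {        // every case falls through on purpose
                case 0: to.accept(from[i++]);
                case 7: to.accept(from[i++]);
                case 6: to.accept(from[i++]);
                case 5: to.accept(from[i++]);
                case 4: to.accept(from[i++]);
                case 3: to.accept(from[i++]);
                case 2: to.accept(from[i++]);
                case 1: to.accept(from[i++]);
            }
            entry = 0;              // later passes run the whole body
        } while (--n > 0);
    }
}
```

A call such as `DuffsDevice.send(data, data.length, v -> System.out.println(v))` writes each element once and in order; the first pass handles the `count % 8` leftovers and every later pass handles eight elements.
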
danieltillett 11 years ago

Is there any tool that does the opposite?

  • marco2357OP 11 years ago

    I've seen many research papers on Java -> C translation during my PhD. Some of them came with a prototype tool. But as with previous work on C -> Java translation, none of the tools actually really worked completely.

    I actually wrote a Java -> Eiffel translator:

    http://se.inf.ethz.ch/research/j2eif

    Based on that experience I can say it would be quite a big effort to write a Java -> C translator. But not impossible.

    • danieltillett 11 years ago

      Thanks for the post. What was the big sticking point with the Java -> C translators?

      • marco2357OP 11 years ago

        It's easy to do a minimal prototype when doing research. But writing real translators means getting all the details right. In research we usually don't have the time for that.

        Translating Java - in my experience - is very hard because of the extensive runtime system (reflection, base classes, synchronization, ...). E.g. if your application does System.out.println("Hello"); you already need the System and PrintWriter classes. They in turn depend (among many other things) on AWT, which needs the security classes. And so on. A HelloWorld pulls in 1208 classes of the base library. They in turn depend on java.dll, which you have to re-implement from scratch. Or you rewrite all base library classes, which is even harder.

        I hope this gives you a basic idea of the problem.

danbruc 11 years ago

One million Dollars for a couple of days of work?

  • shultays 11 years ago

    You can always format your whole source code into 1 line, so it will cost only $1.

marvel_boy 11 years ago

Do you translate open source free of charge?

ExpiredLink 11 years ago

C++ to Java would be more interesting.

  • marco2357OP 11 years ago

    True. But also adds a lot of complexity on top of an already very complex translation. But it's certainly something we'll look at in the future (together with supporting Cobol and Fortran).

    But I heard there are C++ to C translators. I don't know how good they are and what the resulting code looks like. But if they're decent, you could do C++ -> C -> Java :)

    • SCHiM 11 years ago

      On the C++ to C translators: The first actual C++ compiler was actually a trans-compiler to C (Cfront was the name of that compiler, not of the language; the language itself began as "C with Classes" before becoming C++[0]).

      So in that case, the C++ to C compiler was there before the first real C++ compiler which appeared some time later.

      [0] http://www.cplusplus.com/info/history/

TheLoneWolfling 11 years ago

I want a Java-to-Java translator.

lessthunk 11 years ago

I wonder what the performance impact is.

Why do young students no longer learn C? Don't you want to be closer to the underlying OS?

  • coldtea 11 years ago

    >Why do young students no longer learn C?

    Who said they don't? This project is about porting existing stuff, for interoperability, etc. Not as a way to "avoid learning C".

    >Don't you want to be closer to the underlying OS?

    No, why would I want to do that unless I have a specific need for that?

    • vezzy-fnord 11 years ago

      > No, why would I want to do that unless I have a specific need for that?

      At least if you're using a Unix-like system, then understanding POSIX is essential for knowing how things work starting on an intermediate level.

      • tormeh 11 years ago

        And why would I want to understand how things work down in the OS?

        Anyway, C is a horrible language. Between Fortran and Ada it's unclear to me why C should be chosen for anything, inertia and herdthink aside.

        • vezzy-fnord 11 years ago

          Because it intersects with most of what you're doing. Why is that process supervisor I'm using crashing? Oh, it's still using select(2) instead of a modern I/O multiplexing mechanism, and exceeding its FD_SETSIZE limit. What do all these calls mean in my tracing output?

          All the infrastructure you're using goes through C/POSIX in one way or another. When things go haywire, you can only get by fixing it on the higher levels or just blindly restarting for so long.

        • TorKlingberg 11 years ago

          Have you used Fortran or Ada?

          • tormeh 11 years ago

            Have used Fortran. It's pretty nice, actually. I mean, it does have unstructured programming, but its structured alternatives feel pretty natural. Fortran code, like C code, feels a bit fragile but unlike C, Fortran doesn't seem to be actively malevolent. I only know Ada by its (excellent) reputation.

            EDIT: Only true if you use the implicit-none flag when compiling Fortran.

  • marco2357OP 11 years ago

    Performance is of course lower. But it's impossible to put a number on it. Certainly native calls (printf, scanf, fopen, fread) add a lot of overhead. Also whenever the C memory layout has to be used, performance suffers (e.g. complex pointer arithmetic).

    But the approach is to get the correct translation first, and then - if needed - bottlenecks can be removed manually. This should lead to a better migration/upgrading experience.

  • acbart 11 years ago

    Most universities still teach a mandatory OS course where students build their own shell in C. Just because we start people off in block-based languages and Python doesn't mean that Computer Scientists don't get the full tour of languages.
