Exception Safety, Garbage Collection etc. (a.k.a. Why Does Java Suck So Bad?)
It’s very likely that you’ve been writing totally incorrect code without realizing it. Once you do realize it, it’s usually not too hard to fix the problem, depending on the language you're using.

Interesting how java is the only language included in the title, but the slides have the opinion that C#, Ruby and Python all suck as well. Seems like a cheap way to get upvotes.

And there actually is an idiomatic way to avoid the problem in Ruby: Which counts as the sugar which he mentions. See also "using" in C#.

These are pretty clearly design warts. WPF (C# UI library) jumps through some interesting hoops to make it look like all your resources can be properly garbage collected, but even then it tends to come back and bite you for any non-trivial application.

One man's sugar is another man's carbohydrates. The argument for try...finally, using, lisp's (with-open-file...), etc. is that it makes it clear that there is 'invisible' code at the end of the block. I suspect that is the reason for the 'scope' sugar in D. Its advantage is that it is more light-weight; its disadvantage is that it does not stand out as much. I guess which is better for your case depends on what you think your audience can handle. I do not think there are many programmers that need java's very explicit try-finally because they cannot grasp e.g. what D provides, but I also have been surprised at times by the, let's say, intelligence, of programmers I met.

It doesn't really taste like syntactic sugar though; it's a special functionality of the standard library, and you could easily get away without knowing about the Java/C-style API of keeping a file handle that has to be passed around and controlled.

Out of the three languages he mentions as having syntactic sugar, he was wrong only in Ruby's case. Like you said, it's not specific syntactic sugar for the Dispose pattern, unlike C#'s "using" and Python's "with".

Honest question: what's the difference between Ruby and Python/C# here?
Let's look at rst's example: In Python, you would do: The difference is that the syntax used in the Ruby example is not syntactic sugar for the Dispose pattern; it's part of Ruby's syntax for working with blocks in general, whereas the syntax used in the Python example is syntactic sugar meant for the Dispose pattern (but can be used for other stuff too).

To be fair, the author mentions that his is a Java school and (presumably, therefore) gives most attention to Java. He sort of glosses over the "syntactic sugar" in C#. Which is simply:

Java finally (pun intended) gets a similar feature in Java 7 later this year: try-with-resources: http://download.java.net/jdk7/docs/technotes/guides/language...

It's unclear to me why he says that the JDK7 addition doesn't really solve the problem. It seems to me that it is exactly this problem that ARM blocks are trying to solve. It's still not as clean as leveraging destructors when an object goes out of scope, but it's much better than before.

Well, it's an awkward solution, because exceptions have this general problem of blowing away everything you're doing, and it's only a solution for this one case. The solution to this problem that makes sense to me is either conditions and restarts a la Common Lisp, or a type system that can handle multiple return values of different types in a sane way (e.g. return either a result or an error code and then pattern match against them) so that you don't always need to throw an exception in order to deal with an error.

Exceptions are little more than a formalization of a particular pattern of what Haskell might call the Either monad for return values, combined with pattern matching on the error type, and automatic unwinding. A condition system has positive value; but multiple return types with error codes alongside, like Go, are a regression from exceptions, IMO.

And in python you'd use the with construct. To expand on this a bit: What was his other example? DB connections?
Well, in Python, the DB API requires that DB connections not easily leak, but if you're stuck with a crappy driver, you can still auto-close your connections: I think it's unfair of him to pick on Ruby and Python just because their syntax is more oriented towards assuming the garbage collector is non-sucky and exceptions aren't expensive. Edit: Fixed formatting.

The majority of the comments seem to be negative, yet many of them reflect a misunderstanding of RAII. I've argued the exact argument the author does many times before and got similar responses; it seems to be hard to convey the power of RAII to people who haven't practiced it before. The meaning of RAII is that you can tie a resource to the lifetime of an object, in an environment where the object gets destroyed deterministically. There are two practical uses: 1) You can hide the fact an object is holding a resource from users of the object. 2) You can leverage the power of objects within the language and apply it to resources. The part regarding languages allowing something that might look similar with syntactic sugar didn't convey its message very well. The syntactic sugar other languages are introducing is great on its own, but it's inferior to the object based approach since it doesn't allow (1) or (2). It's annoying that languages with garbage collection support have gained a lot of attention solely due to the fact that you don't have to worry about freeing memory, while languages with RAII support, in which you basically don't have to worry about freeing any resource, got none.

Amen. RAII will be in the next great language, as it is a useful tool. This is not an academic concern, it is something that happens all the time due to rushed deadlines, stressed developers, or simple naivete. People love to slag off C++ but the higher-level devs have done some serious thinking about how to engineer robust programs.
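To make point (1) concrete, here is a rough Python sketch (the Journal class and its internals are invented for illustration): the class owns its resource and guarantees release, so callers never manage it. Python lacks deterministic destruction, so a context manager stands in for the C++ destructor here.

```python
# Hypothetical RAII-style class: the "resource" is acquired in __init__
# and released in __exit__; users of Journal never see or manage it.
class Journal:
    def __init__(self):
        self._entries = []   # stands in for an acquired resource (file, lock, ...)
        self._open = True

    def write(self, line):
        if not self._open:
            raise ValueError("journal already closed")
        self._entries.append(line)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._open = False   # release happens here, exception or not
        return False         # do not swallow exceptions

with Journal() as j:
    j.write("hello")
# j._open is now False: the resource was released without the caller
# writing any cleanup code.
```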
Sutter's 'Exceptional C++' is eye-opening the first time around, and the concepts are applicable to any language that has exception handling. Programming in a transactional manner has visibly improved my designs -- mostly through the paranoia that almost any statement could throw an exception.

Python's "with" statement handles this scenario: http://effbot.org/zone/python-with-statement.htm

The author is aware of that, but you have to remember to use "with". If you forget to use it, you can still have unclaimed resources.

You also need to remember to use RAII, and that always requires creating a class. You also need to get RAII right, whereas Python's with is implemented for you. Also there's no mention of the fact that Java forces you to check for exceptions (well, not for unchecked ones, but those should not be fatal, and I never really got their idea). You can't forget the try-catch.

The library author needs to remember to use RAII, but the client doesn't need to remember anything (as in his examples). Whereas with 'with' it's the opposite - the client has to remember to use it.

Likewise for auto_ptr<T>, but he glosses over that.

How do you figure? There isn't an implicit conversion from auto_ptr<T> to T *.

There isn't. There are two explicit methods to get the raw pointer value, with different semantics: - get(): auto_ptr retains the ownership - release(): auto_ptr loses the ownership

Then implement '__del__' which will e.g. release the lock on garbage collection, and use 'with' when you want to be deterministic about it.

But that brings up the non-deterministic point the author makes. You can't know when the object will be garbage collected. It could be a long time from now, or never if you get an unexpected deadlock.

__del__ is actually perfectly deterministic in the moment it is called: as soon as the reference count reaches zero. See http://docs.python.org/reference/datamodel.html#object.__del...

So when does it get decremented?
When the variable goes out of scope, or voluntarily when you call 'del', just as in C++ or D. The difference lies in the notion of scope, which is just different from that of C/C++/D/Java/C#. Example: So you can write the exact same code that he wrote in C++ in python and it would work the exact same way, and it doesn't create the Java dispose() mess of slide 12.

There is a twist though. Exceptions. Say you have an uncaught exception raised in bar(); then the Foo object created in bar() will be referenced in the stack frame of the exception, which belongs to the caller of bar(), so the Foo object reference count will drop to zero only after the caller scope closes. Try it for yourself by adding this function: If you want to override this behavior you can just as well call del when you want to get rid of the object, which is just as forgettable as adding 'scope' to variable declarations in the author's almighty D.

An interesting task is to examine (and why not, control) the gc behavior with the gc module (http://docs.python.org/library/gc.html). However, to be outrageously precise about it, this behavior is the one of CPython's GC, and should not be relied on, as specified by the documentation (beginning of http://docs.python.org/reference/datamodel.html):

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable. CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references.
See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).

As the Zen of Python says: explicit is better than implicit, and in the face of ambiguity, refuse the temptation to guess. Hence the decision Python (IMHO rightfully) made about being explicit about it, allowing whatever current or future memory management scheme to be appropriate and forward compatible. This is already happening with PyPy (http://pypy.org/compat.html).

...am I reading correctly that his argument for using C++/D (!) is that it's hard to remember to say this:

The second example should actually be:

1. The author's point could be made in far fewer slides. As in, like, two slides. I hate presentations that are disrespectful of the audience's time for the sake of being cute. 2. I am generally unimpressed by arguments of the form:
(a) Language X has flaw Y.
(b) Therefore language X is unsuitable for development.
This can be instantiated for every language X for some flaw Y and is not an argument against any language. You need to additionally make the argument:
(c) Y is so serious that it outweighs the considerations in favor of X and against the other languages one might reasonably use.
Of course (c) is an incredibly high bar, which is why most anti-language zealot arguments do not even attempt to make it, and also why most anti-language zealotry is silly. In order to do (c) in this case, you would have to make the case that writing finally clauses is worse overall than, e.g., debugging memory corruption errors, and writing copy constructors and overloaded assignment operators, and all the other baggage of C++, rather than handwaving them away with "the downsides have been exaggerated". Which, by the way, is ironically the biggest flaw of this slide deck: the author vastly exaggerates the downsides of finally. try/finally is no worse than checking the result codes of C syscalls, and it is sometimes better. I don't recall the C style of programming stopping Ken Thompson from building great things and I doubt that try/finally is actually what stops Java programmers from building great things.

You seem to have failed to grasp that the slides are arguing for RAII. While admittedly mildly trollish, it disparages java along with C#/ruby/python implicitly in favour of C++ only because C++ has the most 'correct' implementation of RAII. There is no argument against try/catch/finally; the argument is that the most common usage of try blocks is for dealing with resource management, not actual exceptions in program state. Given exceptions should be used to express 'exceptional' program states, that is a significant downside to code readability. Syntactic sugar like using/yield/with blocks improves the signal-to-noise ratio of try block usage but still relies on programmer acceptance of that idiom. Ideally, the responsibility of cleanup would be moved entirely to the class implementation rather than the consumer. C++ did this with destructors. In a managed world where maybe you don't always want an eager dispose, rather than syntactic sugar in the caller, move it to the signature of the dispose. Something along the lines of:

public scoped void Dispose() {
....
}

Alternatively, @Scoped or <Scoped> if you don't want more keywords. The topic is partly to blame, but RAII is orthogonal to whether a language is garbage collected or not. It's sad that in an age where PLs are undergoing a sort of renaissance period, mention of C++ causes everyone to circle their respective wagons.

And RAII in C++ relies on programmer acceptance of that idiom.

Arguably it relies on the acceptance of fewer programmers, the library writers.

No, you're just oversimplifying his argument. It's not only "hard to remember", it also makes your code overly verbose. That, in turn, degrades its readability, making everyone else spend more time on it and making you, as the author, avoid using the pattern, which leads to writing "prettier", yet unsafe code. Then again, if "hard to remember" is meant as a euphemism for "I know I should write things this way but I really don't want to", then you're completely right. All in all, he's arguing for RAII, which is impossible in Java and a bunch of other popular languages.

Bjarne Stroustrup's FAQ has an entry on why C++ doesn't have a "finally" construct: http://www2.research.att.com/~bs/bs_faq2.html#finally

I cannot remember where I saw this (which is a giant problem in itself because I can't remember the details, just that there was a gotcha...) but I read someplace that it is actually pretty easy to introduce disastrous bugs into try/finally blocks. Perhaps it had something to do with managing locks. It could have been the Go guys who said it when talking about why Go doesn't support exceptions, or perhaps it was in multicore literature (perhaps TBB talking about its RAII locking mechanism?). If anybody has any ideas what it is I'm trying to remember here, please comment. If not, well, ignore.

That seems like a pretty serious argument to me.
Not only is it a pain in the ass to remember and write that every time you consume a resource that needs to be released somehow, it also introduces all kinds of scoping headaches in Java. If mightFail() returns a value that you want to use after cleanup (i.e. all the time), then you have to declare the variable to store that value outside of the try {} block.

> [...] then you have to declare the variable to store that value outside of the try {} block

That's kind of the point. The variable will only be assigned a value if mightFail returns normally. The scoping rules make it impossible to accidentally use an uninitialized variable (i.e. a variable whose first assignment was not reached due to an exception). I take it that you're imagining a case like this:

Sure. Another alternative is to pull the try/catch/finally into a separate function that just returns the value. But that's not always convenient, YMMV, etc. Either way, I'd consider it just an occasional annoyance (occasional because frequently print() is quick, or the resource is not expensive or contended).

Honestly, I'm not clear what he's saying, as I don't know D well. It looks like he was saying that SCOPE variables get cleaned up when leaving scope. Although if doCleanup() also does some additional cleanup, like cleaning up some associated resources that aren't scoped to this block, then I think you're still screwed. Maybe someone can clarify if I'm wrong.

Any variables created in the method are cleaned up when the thread leaves the method. I've never heard of clean up being done after execution leaves a try-catch block. Does anyone know the answer to this?
if PHP gets something right before your language does, you should reassess your life goals :)

I am not super familiar with all the details of Java inner classes, but why can't you get most of the way there by doing something like: Basically, I'm just stealing the JavaScript-y way of doing this that uses closures: Answer: the inner class can only access constant locals :(. So you have to put any mutable state in members. Blech. This might work for some cases, but would be a huge pain in the ass in others.

Also, Go gets this right with "defer", and with mostly deterministic error handling (panicking is not the normal way to signal an error, returning a status is).

That just pushes the problem around. You don't have to worry about forgetting to catch an exception, you have to worry about ignoring an error code. Go's multiple return at least makes that simpler than C: you don't have to remember to check a global error code, and there are no magic return codes multiplexed with the expected response. But the fundamental difference is that if you have code that does:

Yes, but it's much easier to recognize ignored error codes than ignored exceptions: http://blogs.msdn.com/b/oldnewthing/archive/2005/01/14/35294...

In Go, it's even easier to recognize ignored error codes because you tend to have fewer levels of indentation and sometimes an explicit "_" when you're discarding a return value.

High-performance, world-class C++ is fairly well understood (afaik) to only use a very limited subset of C++'s features (e.g., see the JSF Coding Standard or Google Style Guide, or go work for a hedge fund where low latency and high reliability are important). The same for Java. I honor all the exceptions in the standard/platform libraries. But I see past the hype for my own interfaces. Imho, 9/10 exception classes clutter the interface.
9/10 (again for high reliability, high quality code that you want to work reliably but also be flexible enough to extend), what you want to return is a boolean (and log) for stateful methods. Meanwhile prefer stateless methods wherever possible. And generally speaking, treat the JVM and Java as basically a really really high-performance scripting engine (i.e., closer to JS than C, though the syntax is somewhere in between). Imho, if you can't do RAII, and you're not deterministic, you are basically a scripting language (or are in the GC family of languages, if you don't like the term 'scripting' -- I think it's cool...). Anyway, that's how I approach it. But I don't buy into the hype of exceptions most of the time (though of course I honor whatever contract other libraries use).

Imho, 9/10 exception classes clutter the interface. That's why checked exceptions are such a bad idea...

Personally, I really like Go's defer, panic, recover mechanism[1] for handling exceptional circumstances. [1] http://blog.golang.org/2010/08/defer-panic-and-recover.html

What you really want is something along the lines of: You'd probably also need to add checking for references to the object by still-living objects (i.e. objects not eligible for gc). If any live object has a reference to the disposable object, its "disposable" status gets removed. Similarly, returning the disposable object from the method also strips its "disposable" status. It would add extra processing at the end of the scope level, but generally methods that create/use resources don't need to be lightning fast anyway. You could even add the "disposable" modifier to class definitions, making all instances of that class disposable by default (and thus destroyed unless referenced or returned).

It's not so easy, because what happens if you have code like this: The only reason RAII works in C++ is because you can refer to an object by value and separately by reference.
You can create stack-based objects that have a defined scope. You really can't have this in a language that always treats objects only by reference.

My answer would be to create a shortcut in the existing reference system of the gc. Invoke a subset of the gc which checks a reduced list of object references made from that scope or deeper.

Reference counted systems only work if you don't create cyclic (strong) references, so that argument is moot. In fact, reference counted systems can deal with resource objects easily so long as the compiler/interpreter ensures that pending autoreleases are executed when unwinding the stack during an exception.

Isn't that exactly what reference counting a la shared_ptr does?

I think you can avoid nesting finally thus:

what happens if x.dispose() throws an exception? if (x != null) { try { x.dipose(); } catch (Exception e) {} } it's why the using keyword is so nice. Still, this is all hoops languages force us to deal with when they shouldn't (which is the OP's point)

You're lucky Java is checked at compile time. Ruby would eat that "no such method dipose" and silently leak x.

You're lucky Java is checked at compile time. Not lucky enough, because x.dispose() could throw an unchecked exception.

I was thinking the same thing, but replace your finally block with, I've taken Groovy's with... syntax and built utils that are, i.e.:

public void withConnection(Callback<Connection> callback) {
    Connection connection = createConnection();
    try {
        callback.call(connection);
    } finally {
        connection.dispose();
    }
}

Once Java actually gets closures it'll make this soooo much nicer.

The problem with this whole argument is that the author assumes that deterministic memory performance is completely necessary. It's certainly nice, but there are so many times when it just doesn't matter. While I agree that Java sucks because it makes certain very common things require extreme verbosity, worrying about garbage collection isn't all that important except in systems-level programming (which isn't done in Java really), and large GUIs that need tons of memory and still need responsiveness. But many people wouldn't even think to use Java in those cases anyways, so I'm not really sure what this guy's point is.

Read the slides again. The author isn't concerned about deterministic memory performance. He's concerned about the fact that you can't do RAII in Java and therefore any method that allocates resources, performs an action that could throw an exception and then deallocates the resources must wrap the action in a try...finally block. This is overly verbose and the compiler won't tell you if you forget to do it.

This fact has led to the popularity of the Spring framework in Java. They use the template design pattern to hide all of the resource acquisition and release. This makes it much easier to code as you don't have to "remember" to close your db connections.

The remember argument is somewhat weak because you still need to remember to write your destructor. I do buy that it's easier to remember it in one place than all over the code.

Non-slideshare link: http://docs.google.com/viewer?url=https%3A%2F%2Fs3.amazonaws... Edit: Warning. Actually seems to cut off some of the slides.

In the Java example with three try/finally's, he calls 'dispose()' on a File. I've never seen this before, what does it do? Or did he just hallucinate that to make the example look more dramatic?
If the "stuff" throws an exception, the file gets closed automatically. And while File is a library class, it's getting no special favors here --- any pure ruby library can easily implement similar APIs, and ActiveRecord's connection pool, for example, actually does.

File.open("...") do |f|
  firstline = f.readline
  ... stuff that might throw an exception ...
end
The syntax you see here is for passing a block to a method. In this case, you're passing a block to File.open, which opens the file, executes your block with it and then makes sure to close the file no matter what.

File.open("...") do |f|
  ... stuff that might throw an exception ...
end
What this does is evaluate open("x.txt"), call the __enter__ method on the resulting value (called the context guard), assign the result of the __enter__ method to f, execute the body of the with statement and make sure to call the __exit__ method of the guard.

with open("x.txt") as f:
    ... stuff that might throw an exception ...
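The protocol just described is open to any class, not only files. A minimal sketch (the Guard class is made up) showing that __exit__ runs even when the body raises:

```python
class Guard:
    """Toy context manager implementing __enter__/__exit__ by hand."""
    def __init__(self):
        self.closed = False
    def __enter__(self):
        return self           # this is what gets bound by "as"
    def __exit__(self, exc_type, exc, tb):
        self.closed = True    # runs on normal exit *and* on exception
        return False          # False means: let the exception propagate

g = Guard()
try:
    with g:
        raise RuntimeError("boom")
except RuntimeError:
    pass
# g.closed is True even though the body raised
```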
using (var resource = new Resource())
{
    // potential code that throws exception
}
f is an opened file which is automatically closed. This isn't strictly necessary in Python, since files are flushed and closed when reaped, including in case of exception, but it's useful. You can also do this with all of the threading primitives:

with open("some.filename") as f:
    ...
Useless syntactic sugar? Maybe. It's an explicit scope which makes certain guarantees, though, so it's not just fluff.

with lock:  # a lock shared between threads, e.g. lock = threading.Lock()
    do_that_one_contentious_thing()
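For locks, the with form is shorthand for acquire plus a try/finally release. A self-contained sketch; note the lock must be a single shared object (a fresh threading.Lock() per use would guard nothing):

```python
import threading

lock = threading.Lock()   # one lock shared by all threads
counter = 0

def bump():
    global counter
    with lock:            # ~ lock.acquire(); try: ... finally: lock.release()
        counter += 1

threads = [threading.Thread(target=bump) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter == 8: every increment ran under the lock
```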
So it's definitely possible.

with contextlib.closing(dbapi.Connection(...)) as handle:
    cursor = handle.cursor()
    ...
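contextlib.closing works for anything with a close() method. A runnable sketch with a stand-in connection (FakeConn is invented for the demo) showing that close() runs even on failure:

```python
import contextlib

class FakeConn:
    """Stand-in for a DB-API connection object."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

conn = FakeConn()
try:
    with contextlib.closing(conn) as handle:
        raise RuntimeError("query failed")   # simulate a failing query
except RuntimeError:
    pass
# conn.close() ran despite the exception
```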
from __future__ import print_function
class Foo:
    def __init__(self, text):
        self.text = text
    def __del__(self):
        print("deleting %s" % self)
    def __str__(self):
        return ("%r(%s)" % (self, self.text))

def bar():
    if True:
        print("+scope 1 in bar")
        f = Foo("in bar")
        print("-scope 1 in bar")

if True:
    print("+scope 1")
    bar()
    print("bar quit")
    if True:
        print("+scope 2")
        f = Foo("global")
        print("-scope 2")
    print(f)
    print("-scope 1")
The example will output:

+scope 1
+scope 1 in bar
-scope 1 in bar
deleting <__main__.Foo instance at 0x1004d4a70>(in bar)
bar quit
+scope 2
-scope 2
<__main__.Foo instance at 0x1004d4a70>(global)
-scope 1
deleting <__main__.Foo instance at 0x1004d4a70>(global)

By now you have noticed that the scope is function-wide (or module-wide for global code).
and calling it in place of bar() after "+scope 1", while raising any exception in bar().

def baz():
    try:
        bar()
    except:
        print("caught")
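If you want cleanup that does not depend on when the traceback's references are dropped, the safe move is to release in a try/finally (or with) inside bar() itself. A sketch (Res and the cleaned list are illustrative):

```python
cleaned = []

class Res:
    """Stand-in resource; close() records that release happened."""
    def close(self):
        cleaned.append("closed")

def bar():
    r = Res()
    try:
        raise RuntimeError("boom")   # the traceback references this frame
    finally:
        r.close()                    # runs before the exception propagates

try:
    bar()
except RuntimeError:
    pass
# cleanup ran immediately, regardless of how long the traceback lives
```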
instead of this:

try {
    mightFail();
} finally {
    doCleanup();
}

mightFail();
doCleanup();
Anyway, I hate this slide deck so much.

class C {
    void mightFail() { ... }
    ~C() { doCleanup(); }
};

C c;
c.mightFail();
That's true, the scoping protects you from making that mistake. However, the more common case is this:

try {
    var result = mightFail();
}
catch {
    handleError();
}
finally {
    cleanup();
}
print(result) // we don't want to allow this
There are two things I can do to fix the latter case. I can either declare the variable outside the block, adding a silly-looking extra line of code, or I can move the print() within the block, which can be problematic -- it means that if print() were something time-consuming, I would be holding onto the resource during print() for no reason. I think this is a crappy thing to force onto the programmer.

try {
    var result = mightFail();
}
finally { // let the exception bubble up
    cleanup();
}
print(result); // this must be OK, but I can't do it in Java
You can even design your resource layers such that they can only be used this way (or are easiest to use this way).

DB.open(new Runnable() {
    public void run() {
        // ... do stuff here ...
    }
});

DB.open(function() {
    ...
});
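The same loan pattern in Python needs no anonymous-class ceremony, since functions are first class. A sketch with invented names (open_db and its dict stand-in for a connection):

```python
log = []

def open_db(callback):
    """Hypothetical resource layer: acquires, lends to the callback, releases."""
    conn = {"open": True}        # stand-in for a real connection
    try:
        return callback(conn)
    finally:
        conn["open"] = False     # guaranteed release, exception or not
        log.append("closed")

result = open_db(lambda conn: "did stuff" if conn["open"] else None)
```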
then you can guarantee that a() will be executed, then b(), then c(). And if there are any branches in case of errors, they will be explicit. Exceptions surround every statement with the possibility of an unannounced exit.

a()
b()
c()
An idiom designed specifically for the purpose of resource management would make for a far cleaner implementation than shoehorning an existing mechanism.

void myMethod()
{
    disposable File myFile = new File(somePath);
    // ... do stuff with the file
    // "disposable" modifier causes myFile to be
    // forcibly destroyed upon leaving scope for any
    // reason (except if the disposable object itself
    // is returned from the method).
}
If you answer, "add reference counting", reference counting isn't perfect because you can create cyclic references.

static File globalFile;

void register_file(File aFile)
{
    globalFile = aFile;
}

void myMethod()
{
    disposable File myFile = new File(somePath);
    register_file(myFile);
}
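The cycle problem itself is easy to demonstrate. This sketch is CPython-specific (refcounting plus a cycle detector), per the datamodel caveats quoted upthread:

```python
import gc
import weakref

class Node:
    pass

a, b = Node(), Node()
a.peer = b
b.peer = a               # a reference cycle: refcounts can never reach zero

probe = weakref.ref(a)   # lets us observe whether the object is gone
del a, b                 # unreachable now, but the cycle keeps refcounts > 0
gc.collect()             # the cycle detector has to reclaim them
collected = probe() is None
```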
X x = null;
Y y = null;
try {
    x = foo();
    y = bar();
    yadda(x, y);
} finally {
    if (x != null) x.dispose();
    if (y != null) y.dispose();
}
...and let that handle all the possible issues.

Util.dispose(x,y);
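A sketch of such a helper in Python (dispose_all and the R class are made up): attempt every dispose(), even if some throw, then re-raise the first failure so nothing is silently swallowed:

```python
def dispose_all(*resources):
    """Call dispose() on each non-None resource; attempt all of them
    even if some raise, then re-raise the first failure."""
    first_error = None
    for r in resources:
        if r is None:
            continue
        try:
            r.dispose()
        except Exception as e:
            if first_error is None:
                first_error = e
    if first_error is not None:
        raise first_error

class R:
    """Toy resource whose dispose() can be made to fail."""
    def __init__(self, fail=False):
        self.fail = fail
        self.disposed = False
    def dispose(self):
        if self.fail:
            raise RuntimeError("dispose failed")
        self.disposed = True

a, b = R(fail=True), R()
try:
    dispose_all(a, None, b)
except RuntimeError:
    pass
# b was still disposed even though a's dispose() raised
```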