Scala: Consider syntax with significant indentation
github.com
Having just moved to my first Scala team, I would really really not like this to be in the language. Given 5 ways to do something, a large organization is going to do something all 5 ways.
Scala is a fun language, but consistency is not one of its ecosystem's strengths.
Grain of salt: again, I'm new to Scala, but I'd hate to think of it as a language that can only be good once you're Stockholm'd.
> Having just moved to my first Scala team, I would really really not like this to be in the language. Given 5 ways to do something, a large organization is going to do something all 5 ways.
I agree with you on one level. But taking a step back, and looking at it from a much longer time perspective than is practical in business - is the problem really the programming languages giving us too much choice?
Constraining how a programming language does something because we're not yet "there" in terms of software engineering does seem a bit backwards.
Programming language design is a part of software engineering. The easier a language makes it for you to do something, the more likely you are to do it, regardless of whether it's considered good software engineering.
That said, I don't see anything wrong with this proposal. It makes it more likely that code structure mirrors its layout, which is a good thing. Even if it's not always used, it's an improvement where it is.
There should be a styleguide that is enforced by CI tools, regardless of language, so that's not an issue.
However, special cases in the syntax increase the complexity and I would always argue against that.
Yeah, and there are certainly style tools in place at some level (formatting, implicit types, etc). But I don't know if I agree that there should be global language rules about what defines semantic style across an org with 20+ relatively independent teams.
And that says nothing about third party libs that one can't control. Again, I am new to the ecosystem, but with languages like C#, it seems that the comparatively fewer language constructs streamline API patterns.
I don't know how your organization works, but any development team with a shared codebase should enforce coding styles. This means indentation, but also how common constructs are used, especially if someone external might work on the code. Some standards are just necessary to achieve uniform-looking code, but some might prevent bugs (for example, mandatory curly braces after ifs). It's also just a simple tool to prevent friction and create a sense of shared code ownership.
External libraries are usually black boxes and you don't work on the code so I don't understand the argument.
Like Scalastyle?
A combination of scalastyle, hairyfotr's linter, wartremover, scalafmt.
In a previous project I spent a lot of time thinking about significant indentation (http://akkartik.name/post/wart), so I'm glad to see it getting more mind-share. However, the end comments are absolutely insane. Here's a counter-proposal: if you want optional end delimiters, make them not look like comments. Then the relevant examples in OP would look like this:
    def f =
      def g =
        ...
      (long code sequence)
      ...
    end f // optional

    while
      println("z")
      true
    do
      println("x")
      println("y")
    end while // optional

    package p with
      object o with
        class C extends Object
                   with Serializable with
          val x = new C with
            def y = 3
          val result =
            if x == x then
              println("yes")
              true
            else
              println("no")
              false
        end C // optional
      end o // you guessed it: optional
Edit: somebody already brought this up in the comments on Github: https://github.com/lampepfl/dotty/issues/2491#issuecomment-3...
Folks commenting on the willingness to make breaking changes in 'Scala' should note that this is Dotty, an upcoming very eventual breaking replacement for Scala (think Python 3 or perhaps even Perl 6 but less radical than the latter).
For now, to quote the authors, it's a: 'Research platform for new language concepts and compiler technologies for Scala'
So this is exactly the right place to test this sort of concept.
Yes, but they're unusually willing to make significant breaking changes in point releases as well — how many times now (including, IIRC, one or more ongoing efforts) have they rewritten the collections API? Yes, that's a standard library change, not a language change, but still.
And this is a huge breaking change: if they make it you won't be able to convert your Scala 2.1X code to Dotty/Scala 3 by simply fixing all the compiler errors; you'll either need a foolproof automated conversion tool or to hand-audit your entire codebase.
Well they last rewrote the collections API in Scala 2.8 (I think), which was in 2010. If they redo it in 2.13 that'll have been about 8 years. That's certainly not C++ levels of backwards compatibility, but it's hardly frequent.
They do make some other breaking changes in 2.xx releases (which come out about every 1.5 - 2 years!), but I wouldn't really call them point releases, given that the 2.x hasn't changed in over a decade - that's like saying Python shouldn't make breaking changes in Python 2.6 -> 2.7. They don't generally make breaking changes in actual point releases (2.XX.yy).
Also, they're building an automated conversion tool (https://www.scala-lang.org/blog/2016/10/24/scalafix.html) for Dotty. As I said, compare this to Python 3. The Scala -> Dotty rewriter should be able to be more complete than 2to3 was however, mostly because they're not fixing many ambiguities like 2to3 was with encoding. Their rewriter is also based on a full sophisticated framework that can parse or unparse multiple versions of Scala, including Dotty, in one build.
Hopefully being able to rewrite a much, much higher percentage of code and the backporting of changes into Scala 2.13+ will make Dotty adoption happen faster (than Python 3) when it comes.
What kind of version number scheme does Scala use where breakage is allowed in point releases?
Epoch.major.minor
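To make the Epoch.major.minor scheme concrete, here is a minimal sketch. It assumes version strings with exactly three numeric parts such as "2.12.4"; the names `ScalaVersion`, `parse_version` and `may_break` are made up for illustration, and the rule it encodes is only the one described in the comments above (breaking changes can arrive in 2.XX releases, generally not in 2.XX.yy releases).

```python
from typing import NamedTuple


class ScalaVersion(NamedTuple):
    """Hypothetical holder for an Epoch.major.minor version."""
    epoch: int
    major: int
    minor: int


def parse_version(text: str) -> ScalaVersion:
    # Assumes exactly three dot-separated integers, e.g. "2.12.4".
    epoch, major, minor = (int(part) for part in text.split("."))
    return ScalaVersion(epoch, major, minor)


def may_break(old: str, new: str) -> bool:
    # Under this reading, only a change of epoch or major (the "2.XX" part)
    # is allowed to break code; a minor ("yy") bump is not.
    a, b = parse_version(old), parse_version(new)
    return (a.epoch, a.major) != (b.epoch, b.major)
```

Under these assumptions, `may_break("2.12.1", "2.13.0")` is true while `may_break("2.12.1", "2.12.4")` is false, which is the opposite of what someone used to semantic versioning would expect from a "point release".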
I've been programming Scala almost 4 years, and I only know one collections library. I think it's pretty excellent to work with, but I'm even more excited about the upcoming changes.
How is it a breaking change? It is pretty straightforward to write an automated tool to convert one syntax into another one.
It's even easier to write a converter considering the strong typing in Scala; a lot less room for ambiguity than with Python's 2to3.
I have used Scala as my primary development language since 2010. My first reaction to this change was "well, this is the end of it" -- as I can't read or write anything in Python because of that.
But I looked more carefully and the proposal is more balanced. It looks more like Haskell than Python -- and Haskell's syntax is one of the best ever invented, in my opinion (too bad I am not yet smart enough to casually emit production-grade Haskell code).
So, I'm fine with that.
Lovely thread to read, these guys are so analytical, well mannered and mutually respectful. Each comment is well thought out, constructive, detailed, clear and to the point. But most of all they are proofreading before submitting. I don't do it all the time.
I spent two or three years heavily using Scala. During that time, I more than once got to the point where I literally couldn't casually read code I'd written less than a week before.
This seems like yet another great way to make that worse.
1. If you're going to make indentation significant, implement it in a way that makes tab/space confusion impossible. Python 3 does this. The check for indent ambiguity between two strings is:
- Remove the common leading whitespace of both strings.
- The remaining parts of both strings must be all tabs, all spaces, or empty.
This is the least restrictive rule which catches all indent ambiguities.
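The two-step check above can be sketched in a few lines. This is one reading of the rule, not CPython's tokenizer: it assumes the "remaining parts" are judged together, so a leftover tab on one line against leftover spaces on the other counts as ambiguous. The function name `indents_are_unambiguous` is invented for the example.

```python
import os.path


def indents_are_unambiguous(a: str, b: str) -> bool:
    """Return True if comparing the two indent strings does not depend on tab width."""
    # Step 1: remove the common leading whitespace of both strings.
    # os.path.commonprefix compares character by character, so it works on any strings.
    common = os.path.commonprefix([a, b])
    rest_a, rest_b = a[len(common):], b[len(common):]
    # Step 2: what remains, taken together, must be all tabs, all spaces, or empty.
    leftover = set(rest_a + rest_b)
    return len(leftover) <= 1
```

For example, four spaces against eight spaces passes, and so does one tab against two tabs, but one tab against four spaces is rejected because which line is deeper depends on the tab width.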
2. Indent-based syntax is great for imperative languages, but not so good for functional languages with very long expressions. It's not clear how to indent stuff like "a.b(x).c(y).d.e(z)", where a-e x-z may be long expressions. In LISP, the all-parenthesis syntax was so simple, and so hard for humans to parse without help, that indentation was nailed into EMACS and everybody did it that way. The indentation wasn't significant, but it was standardized.
Here's word wrap in Rust:
    s.lines()
        .map(|bline| UnicodeSegmentation::graphemes(bline, true) // yields vec of graphemes (&str)
            .collect::<Vec<&str>>())
        .map(|line| wordwrapline(&line, maxline, maxword))
        .collect::<Vec<String>>()
        .join("\n")
Note that the first "collect" is one level deeper in parentheses than the second one.
How would you do that with indentation only?
You can get rid of the second map using itertools, which has join for iterators, incidentally.
I feel like you can get rid of the inner one too but forget the right combinator.
Off topic - If only that worked.
    use self::itertools::join;
    use self::itertools::Itertools;
    ....
    s.lines()
        .map(|bline| UnicodeSegmentation::graphemes(bline, true)
            .collect::<Vec<&str>>())
        .join(|line| wordwrapline(&line, maxline, maxword),"\n")
error[E0061]: this function takes 1 parameter but 2 parameters were supplied.
Rust is picking the wrong version of "join". There's one in Iter with one parameter and one in Itertools with two parameters. Haven't figured out how to get the one from Itertools yet. The obvious syntax, ".Itertools::join(...)" gets "error: expected `<`, found `join`".
Without either function overloading or member function qualification, how do you do this?
    use self::itertools::join;
This is the free function version; leave it out. I haven't tried this myself, but I'd bet that's what's going on; https://docs.rs/itertools/0.5.6/itertools/trait.Itertools.ht... is different from https://docs.rs/itertools/0.5.6/itertools/fn.join.html, though.
You are getting the join from itertools. There's no join on iterators in the standard library, only on &[T] (which is why the collect is needed).
A quick search of itertools docs indicates there's no join method in it that takes two parameters, only a method that takes a &str like the slice one in std, and a freestanding version of it that takes the iterator as the first argument (instead of as the method's self) and the &str as the second.
Based on reading that documentation, I think steve's original comment was meant to say "second collect", not "second map".
One advantage of significant whitespace is that it may enable a very concise (G)ADT notation:
    enum Tree[T]
      Branch(t: Tree[T])
      Leaf(t: T)
Compare that with present-day Scala:
    sealed trait Tree[T]
    case class Branch(t: Tree[T]) extends Tree[T]
    case class Leaf(t: T) extends Tree[T]
In general with this proposal the `case` keyword could be implied in any pattern matching block. That alone would be a big win with respect to reducing keyword noise, something the MLs have enjoyed for decades.
You wouldn't need significant whitespace to do this; any block syntax would work. The following, for instance, would still drastically reduce the noise:
    enum Tree[T] {
      Branch(t: Tree[T])
      Leaf(t: T)
    }
The main benefit here involves using a block instead of "extends", and using "enum" instead of the odd use of a class hierarchy.
Indeed, there is already a proposal for an enum syntax in dotty that looks almost exactly like that:
    enum Tree[T] {
      case Branch(t: Tree[T])
      case Leaf(t: T)
    }
https://github.com/lampepfl/dotty/issues/1970
Not sure, all of the proposed syntax changes in Dotty require `case` in pattern matching blocks. With braces how would you parse this (contrived) example?
    foo match {
      x: Bar => (y: Int) => x.num + y
      x: Baz => ...
    }
With significant whitespace the first block of indented code would mark a `case` pattern, with subsequent indents belonging to the matched pattern.
With braces I suspect the `case`less version becomes more difficult to parse. Otherwise why require `case` in pattern matches?
I have to say I don't know of any other language where the maintainers have such a cavalier attitude towards making breaking changes. It makes the common analogy between Scala and C++ a bit ironic.
Not that that's necessarily a bad thing; it's good to have some popular languages that follow a less conservative approach, if only to get some real-world experience with different strategies for dealing with the tradeoffs between keeping a language modern and maintaining backwards compatibility with legacy code.
I thought Scala was supposed to be following a route where the amount of "stuff" they include is getting reduced?
The language is idiosyncratic and multiparadigm enough as it is, I'd say.
Reminds me of the difference between JavaScript and CoffeeScript.
CoffeeScript attempts to become more concise by removing delimiters and making things more implicit. I don't think that actually adds much value.
In a sense, it was the equivalent of trying to simplify traffic by removing lane delimiters and street signaling. You could somehow imagine they're there, but it's better if you can see them.
This is a relatively small change to how blocks are defined syntactically in a language that already brings quite a bit to the table over Java. You say it reminds you of CoffeeScript, which is fair, but I'd say it's also quite different.
I'm a huge proponent of TypeScript and a critic of using CoffeeScript in 2017, so while you may not (but may!) agree that TypeScript brings significant value over raw ES6+, I definitely relate to CoffeeScript not bringing enough value to the table to warrant such a divergence in syntax. I will say though that pre-ES6+ it really tidied up a few things like classes, this binding, etc.
I've argued for and successfully migrated CoffeeScript projects to TypeScript, but I'll be the first to admit it is "ugly" compared to CoffeeScript. F# is a pretty elegant language IMHO, and its success using significant white space is cited early in the post. If we can have all the added VALUE of Scala AND a tidier, potentially optional syntax then why not?
"Tidy" is very relative. We could go back to the traffic lanes example and say that streets would be tidier without lines drawn on them. But those lines look better than crashed cars and dead people. Minimalism is about removing redundant stuff; those delimiters aren't redundant unless made redundant through whitespace. I do not think whitespace is the way to go.
The one significant effect of indentation-based syntax is that it makes copy-pasting code from places like books or Stack Overflow somewhat harder and more error-prone. I've been hit by this a few times when learning Python, and I can imagine it would cause some frustration for Scala users too.
Here's my straw-man proposal:
Allow both indentation-based and bracket-based syntax. Have a tool like scalafmt/gofmt that freely converts between the two on a per-file and per-project basis.
When you're writing your own code, use the indentation-based syntax. You can paste in bracket-based code anywhere, hit the auto-format key on your IDE or run the commandline formatter and everything becomes nice and indentation-based.
When you're writing books, libraries, example projects, SO posts, use the bracket-based syntax. That way people who read your book/library source/example/SO post can freely copy and paste into their own projects.
This has minimal impact on bracket-based syntax diehards; everything they read and write stays the same. They won't even see the new syntax if they don't want to.
I hear people say this but I don't really get it.
This has been a problem for me so infrequently that I can't even recall an incident where it wasn't so trivial to fix that I did it unconsciously.
Maybe if the website/ebook has messed up formatting or you're pasting from some monstrosity of a PDF - but in the real world - not an issue.
Maybe my editor (PyCharm) is doing clever things to protect me from this - who knows...
The proposal is carefully balanced to minimize this effect (unlike Python).
And copy-pasting is essential to writing code; if anybody tells you otherwise, they are lying. :)
I'm working on a text editor that automatically formats the code. It will have trouble with significant whitespace if the code is valid both with and without the whitespace, compiles and even runs either way, but possibly with a hidden bug. I've never worked with a language with significant whitespace, so I'm wondering: does it create bugs? In my experience, bad formatting can cause bugs, or is at least annoying, e.g. syntax errors like "missing bracket" where you have no idea where it's missing, which is why my editor does the indentation automatically and enforces it (you can't change it).
Well, significant whitespace will play the role of braces, right? You can't have your editor "fix" the braces, either -- the author has to put them in. In the same way, you can't have your editor "fix" the indentation -- the author has to put it in.
But with braces, you can do consistency checks, such as that they are properly nested. In a similar way, you can do consistency checks with indentation. For example:
    line 1
        line 2
      line 3
The third line is wrong: the transition from line 1 to line 2 introduced an indentation step, and the third line is neither left nor right.
I find adding braces much easier than managing white space though. For example commenting out code, adding branches, removing branches etc.
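The consistency check for indentation described above (a dedent has to land on a level that is already open) can be sketched with a stack of indentation widths. This is only an illustration under simplifying assumptions: indentation is spaces only, blank lines are skipped, and the function name `find_bad_dedents` is made up.

```python
from typing import List


def find_bad_dedents(lines: List[str]) -> List[int]:
    """Return 1-based numbers of lines whose indentation matches no open level."""
    stack = [0]   # indentation widths of the currently open blocks
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue                     # blank lines carry no indentation information
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)          # a new, deeper block opens
            continue
        while stack[-1] > width:
            stack.pop()                  # close every block deeper than this line
        if stack[-1] != width:
            bad.append(number)           # the line sits between two known levels
            stack.append(width)          # resynchronise so later lines are still checked
    return bad
```

Run on the three-line example, with "line 2" indented by four spaces and "line 3" by two, it reports line 3, the same way a brace checker would report an unbalanced closing brace.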
A non-backwards compatible syntax would lower the barrier to all breaking changes.
Language spec [1], replacing the standard library [2], etc.
[1] Disallow implicit conversions between unrelated types https://github.com/lampepfl/dotty/pull/2060
[2] Remove parallel collections from scala-library https://github.com/scala/scala/pull/5603
This is a big change to make to an existing language. Even if you really, really, really like indentation-based syntax, I doubt that the benefit of the new syntax would match the cost of change.
The benefit being...? Making it more complex?
I love scala, but scala already has some of the more confusing syntactic decisions, not sure if this is gonna help.
What does the literature say?
Is it known (as in quantitatively) if significant indentation is a good thing, or a bad thing?
I think it's a bit of a bikeshed-type issue.
Subjectively, languages with significant indentation (Python, F#, Nim, CoffeeScript, Haskell, etc) are often thought to have nicer-looking syntax.
If I had the patience I would try and measure something about the productivity of those languages vs. opensource projects (not at all sure what) and control a lot of things and do lots of stats and then write it all up and then realize that no one else cares.
There is very little literature about objectively comparing programming languages like you are asking here.
I know.... Computer Science research has some huge issues!
The key flaw with indentation comes when you try to use it with printing. If your function extends across a page-break, such as in a book or a printout, it is not possible to resolve the indentation by eye.
Just no.
Sigh. Why are the people with the worst ideas most vocal about it?
If you like significant invisible characters, use Python. No need to introduce this into other languages.
> Sigh. Why are the people with the worst ideas most vocal about it?
Given who created this proposal, it feels very odd to take this stance. He created the language the proposal was made for. However vocal Odersky is or isn't in the community, it's very likely exactly as vocal as he should or can be.
Also, it's a proposal to explore a topic. Odersky is a language designer and he wants to explore what his creation can be. You're being awfully dismissive of someone exploring their own project.