Thoughts on Designing a new Web Apps Language

heapified.com

31 points by jhferris3 15 years ago · 42 comments

blasdel 15 years ago

Ur/Web hits every single one of your bullet points out of the park: http://www.impredicative.com/ur/

  • jhferris3OP 15 years ago

    I had a friend mention this to me before I wrote this, and it does seem pretty neat.

    That said, I think its big shortcoming, for me (I haven't spent much time with it; only ~10 minutes going through some of the demos), is that it's a functional language. Personally, I don't mind Haskell/SML/their ilk, but since I was trying to conceive of a language that people would actually use and adopt, it needs to be imperative and probably resemble something along the lines of C/PHP/Python.

  • gdp 15 years ago

    Upvoted! I was reading those bullets and mentally ticking them off against Ur/Web as well.

andybak 15 years ago

He rather lost me at the first two points.

I don't hear a lot of complaints from the Python or Ruby camps about how much they desperately miss static typing. It would have to be via a seriously 'get out of my way' type-inference for me to want to allow all that ugly back into the language.

And point 2, performance? Again, I don't hear much complaining about this for the vast majority of applications. I rarely find the bottlenecks to be in my web language constructs. Most people aren't writing another Twitter.

  • Chris_Newton 15 years ago

    > I don't hear a lot of complaints from the Python or Ruby camps about how much they desperately miss static typing. It would have to be via a seriously 'get out of my way' type-inference for me to want to allow all that ugly back into the language.

    I can't speak for anyone else, but I do miss static typing when I'm using languages like Python and JavaScript for web work. However, I think it would take more than just type inference to make a strongly/statically typed language that was good for the same jobs. You also need powerful tools for parsing freeform input such as JSON or XML, both to bring valid input into your static type system with little effort and to give as much control as you need to recover from unexpected input.

    The first problem is solved by many languages; the second one, not so much. I think that is part of why dynamic languages are so popular for web development today.

    As an aside, there is also a design/architectural question here. The theoretician in me says of course I should parse and validate all incoming data as close to the point where it comes into my server-side code as possible, so everything internal is clean. This fits nicely with the whole static typing thing. On the other hand, the pragmatist in me says that sometimes, particularly while prototyping, it's useful to keep the parsing and error recovery logic close to where the data will be used. That's much easier if you can just dump all the input into a nested array of hashes of objects of dictionaries of widgets when it arrives and worry about the details if and when you get to the code that cares.
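    The two timings described above can be sketched in TypeScript (not the original discussion's language; all names below are invented for illustration):

```typescript
// Hypothetical incoming payload -- shape invented for illustration.
const raw: unknown = JSON.parse('{"user": {"forename": "Chris", "surname": "Newton"}}');

// Approach 1: validate at the boundary, so everything downstream is typed.
interface User { forename: string; surname: string; }

function parseUser(input: unknown): User {
  const obj = (input as any)?.user;
  if (typeof obj?.forename !== "string" || typeof obj?.surname !== "string") {
    throw new Error("malformed user payload");
  }
  return { forename: obj.forename, surname: obj.surname };
}

const user = parseUser(raw);  // a clean, typed value from here on

// Approach 2: carry the freeform value around and check at the point of use,
// which keeps prototyping cheap but scatters the error handling.
const lazy = raw as any;
const surname = typeof lazy?.user?.surname === "string" ? lazy.user.surname : null;

console.log(user.surname, surname);
```

    The first approach front-loads the error handling; the second defers it to each use site, which is the trade-off described above.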

    • nxn 15 years ago

      Regarding your point that we're not there yet when it comes to tools for parsing JSON/XML:

      Perhaps I'm not sure what you mean exactly, but I've been using frameworks and tools that handle JSON de/serialization for years now. For example, whenever building applications in ASP.NET MVC and ExtJS on the client side, I would use a project called Ext.Direct.MVC. All that does is set up a handler that automatically grabs specific types of requests, converts the JSON data in the request to an instance of a model, or any data structure that you expect to receive, and passes that created instance over to your controller automatically for you. So on the client you just call the controller with some JSON, and on the server you declare a controller that receives an instance of one of your defined models. That's it, you're done. The only time you'd have to so much as interact with the JSON serializer is when you wanted to return some data like a model -- but all this means is wrapping the model you're returning in a call to the JSON serializer and you're done.

      Also, if your selection of languages/frameworks does not offer a tool like this, it likely does offer enough sub-components for you to be able to create a system like that in a day or two. EDIT: (or perhaps I am just making a bold claim here assuming that all language communities have at least one JSON serializer/parser as awesome as Json.NET).

      In my personal opinion, I'd actually say the opposite and claim that oftentimes a language's type inference could be better. It's certainly not perfect in C# (not when compared to Haskell, or even F#), and I'm not even sure it exists in Java.

      • Chris_Newton 15 years ago

        Of course just about every language has a freely available library for working with JSON, and that is one of the main reasons to use JSON in the first place. However, the rendering or parsing in a dynamic language like JavaScript or Python typically takes one line of code. The equivalent in something like Java can be horrible.

        Even if you have a library that can parse JSON according to some known format and give you back a nice object of some known type in your static type system, you still have the problems of how to describe that format and how to handle errors where the incoming JSON doesn't match your expected format in some way. I suppose you could simplify common cases by determining the expected format using reflection if your language supports it, but that's not going to be powerful enough to cope with the general case without providing some sort of metadata as well.

        In short, I'm still waiting to find a library that can parse arbitrary incoming JSON within a statically typed language without at best requiring the programmer to repeat structural information that is already implicit in the code that uses the resulting object. It's just that architectural issue I mentioned before, where converting from a freeform format to a known object type in a static language essentially requires you to do all the parsing and error recovery up-front whether you want to or not. Perhaps someone has come up with a clever approach I haven't yet encountered, but I don't see any sign of it in the documentation I looked up quickly for the libraries you mentioned; they look downright painful to use compared to dynamic languages from the code snippets I saw!

        • nxn 15 years ago

          > However, the rendering or parsing in a dynamic language like JavaScript or Python typically takes one line of code. The equivalent in something like Java can be horrible. [...] you still have the problems of how to describe that format.

          I'm not familiar with Java, but in the scenario I illustrated it took 0 lines of your own code to receive a JSON structure, and one line to return one. That's outside of defining the structure itself (creating the class; what I think you mean by "the problem of describing the format"), but generally you'd want to do the exact same thing in a dynamic language to make working with the structure easier. Example: you still define model classes in a Django app.

          > ... and how to handle errors where the incoming JSON doesn't match your expected format in some way.

          I don't see how you wouldn't have the exact same problem in a dynamic language. You can't just receive input and magically know what to do with it; you need the input to be in a defined/expected form in order to process it. To me this is outside the dilemma of dynamic vs. static, because it is a problem in both approaches.

          > I suppose you could simplify common cases by determining the expected format using reflection if your language supports it, but that's not going to be powerful enough to cope with the general case without providing some sort of metadata as well.

          Right, in the scenario I gave, the framework looked at the signature of the controller, saw that it expected to receive an object of such and such type, and told the serializer to use the JSON data to create an instance of that type. I'm not sure what more metadata would be needed to make that work. I suppose if you get a parsing error you can just use the exception handler in your client code, because the client submitted data in the wrong format, etc.

          > In short, I'm still waiting to find a library that can parse arbitrary incoming JSON within a statically typed language without at best requiring the programmer to repeat structural information that is already implicit in the code that uses the resulting object.

          Defining a class once and saying you expect to receive an instance of it is not "repeating structural information" in my opinion. I can only see it as being "repeated" if you look at the initial JSON data structure as sort of the type itself. I personally look at the role of JSON to be a "data container" and not a "structure descriptor", but if that's the way you like looking at it, well then yeah, dynamic typing is going to be your best bet right now. The closest I can think of is Haskell's type inference which constrains types based on how they're used in the function, but even then the types have to be defined at compile time, and it will not just create one for you that matches what you're trying to do at run time -- it just accepts already defined ones based on whether they meet the constraints gathered from the code.

          • Chris_Newton 15 years ago

            Sorry, I think we're talking across each other slightly here. I'll try to explain again.

            I think the most awkward thing about working with freeform data in a statically typed language is a timing issue. It's not that you don't need to do things like error handling in a dynamic language but you do in a static system; it's that in the dynamic language, you can typically choose when and where to do it, while static typing effectively forces you to do some of the heavy lifting up-front to convert the freeform data into types within your design in the first place.

            You seem to be considering as your main example the serialisation of data between two sides of an app where you maintain both sides. Fair enough, that's one use case for something like JSON. But consider what happens if you want to use it as a simple interchange format so that your code on one side of an HTTP link can communicate with someone else's code on the other side using a well-specified protocol.

            Maybe part of the incoming JSON says

                { "forename": "Chris", "surname": "Newton" }
            
            but what you really want internally is to look up something from a database using those values as a key.

            In a dynamic language, you can typically just pull out the strings when you want them, plug them straight into your database API, and get the result you care about. If the values were missing or invalid, this is going to fail, but maybe it was going to fail anyway if the database didn't contain a matching record and so you've already got all the error recovery code you need in place.

            With a static type system, in contrast, you probably need to parse the JSON into some type within your system as soon as it arrives. You can basically do that in one of two ways. One is to convert the JSON into some sort of general JSONObject/JSONArray/etc. classes, as for example the basic Java JSON library from json.org does. In that case, you retain the structural flexibility, but you also haven't really gained anything from using a static type system because you still have the hassle of manually navigating the resulting tree and doing all your type-checking later, which is a chore. The alternative is to parse the JSON into a more semantically meaningful type right from the start:

                public class Person
                {
                    String forename;
                    String surname;
                }
            
            That's nice, it gets the data into a format we understand, and if the parser can use reflection to figure out what kinds of fields to look for and what types they should be, so much the better. But now I'm stuck with this other class to maintain, tied to the external interface of my code rather than however I model things internally. If I want to handle errors gracefully, such as receiving text data where I expected an integer, I have to specify how to do it at that stage (which is where the metadata issue comes in if you're trying to use reflection to generate your parsers automatically). If I later change my external JSON protocol definition to add another field or (worse) change the structure a bit, I have to reconfigure my corresponding set of classes to match.
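            In TypeScript (standing in for the Java sketch above; the field names come from the earlier JSON snippet, everything else is invented), a lenient parser that records per-field problems instead of failing outright might look like:

```typescript
interface Person { forename: string; surname: string; }

// Parse leniently: wrong-typed or missing fields fall back to a default,
// and the problems are collected so the caller decides how to recover.
function parsePerson(input: unknown): { value: Person; problems: string[] } {
  const obj = (input ?? {}) as Record<string, unknown>;
  const problems: string[] = [];
  const field = (name: string): string => {
    const v = obj[name];
    if (typeof v === "string") return v;
    problems.push(`expected string for "${name}", got ${typeof v}`);
    return "";
  };
  return { value: { forename: field("forename"), surname: field("surname") }, problems };
}

const ok = parsePerson({ forename: "Chris", surname: "Newton" });
const bad = parsePerson({ forename: 42 });
```

            The caller gets both a usable value and the list of problems, so the decision about how to recover is deferred rather than forced at the boundary.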

            Now, I'm not saying this is necessarily a bad design. As I mentioned before, I think as a general principle it is usually best to validate and convert data as soon as it comes into a system anyway. However, for the kind of rapid prototyping development process that is widespread in web development, that kind of formality can get in the way in the early stages, and I think that is one reason that dynamic languages are popular for this type of work.

            • koper 15 years ago

              I don't want to start (continue?) another static-vs-dynamic-typing war, but I feel like I need to add my 2 cents.

              For me the biggest pragmatic gain from using static typing is that your programs contain fewer bugs right from the start -- simply because many types of errors are detected by the compiler based on type information. Just to clarify, I'm talking about real strong, static typing as in OCaml, Haskell and the like, not Java-style typing, which is not very powerful and imposes the cost of adding type information on the developer; in the aforementioned languages, type inference pretty much does away with this problem.

              That said, I'd suggest taking a look at the new programming language Opa (http://opalang.org). It satisfies most of the requests of the article's author. It's compiled (#2) and statically typed (#1). It allows easy interfacing with JS, C and OCaml (#3). First-class XML elements: yup (#4); case sensitivity: check (#7); and JS is automatically generated from Opa sources (#8). I happen to be writing a blog about Opa (http://blog.opalang.org) and would be very interested to hear what you think about it.

  • njs12345 15 years ago

    Have you ever used anything in the ML family of languages (e.g. OCaml/Standard ML/Haskell)? You barely ever have to write types in these, as long as your program makes sense.
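    TypeScript's local inference is far weaker than ML-style Hindley-Milner inference, but a rough illustration of "no annotations, still statically checked" looks like:

```typescript
// No type annotations anywhere below; the compiler infers them all.
const xs = [1, 2, 3];                              // inferred: number[]
const doubled = xs.map(x => x * 2);                // x inferred as number
const total = doubled.reduce((a, b) => a + b, 0);  // inferred: number

// Something like `doubled.map(x => x.length)` would be rejected at
// compile time, because `length` does not exist on `number`.
console.log(total);
```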

    • jhferris3OP 15 years ago

      Right on. I'm not sure I'd go so far as wanting the language to be as strict/typesafe as those (also calling Haskell an ML derivative is kind of funny, but I know what you mean)

    • true_religion 15 years ago

      It'd be better to simply have a JIT with type inference for performance.

      • jhferris3OP 15 years ago

        Perhaps? It's unclear that a JIT with type inference will give you better performance than compiling, especially if you were to do profile-guided optimization. (Side thought: not sure if LLVM has support for PGO.)

        Also, if most of your types are static anyway (which, in my middling amount of experience, they tend to be), I personally would rather get compile-time errors than runtime errors. But maybe that's just me.

    • shawndumas 15 years ago

      F#

  • jhferris3OP 15 years ago

    So, like I said in the post (but perhaps didn't make clear enough), there are plenty of options for when scaling isn't too much of an issue. But take a look at Twitter (Ruby/Rails and now Scala, I think? which kinda goes toward my point) and Facebook (hacked-up PHP made to compile). My thoughts were focused on a language that you could start something small in (aka easy enough to use), but that could scale up reasonably well.

  • nxn 15 years ago

    > I don't hear a lot of complaints from the Python or Ruby camps about how much they desperately miss static typing.

    Perhaps you would if Python or Ruby were used more often for, and I hate to use this term, "Enterprise"-type applications -- especially ones with teams of 50-some developers. They're not, though, and probably because they're dynamically typed: that often causes chaos when that many people are involved, and communicating with them becomes a job in and of itself. The more you have to spell things out in your code, the less likely another person will misunderstand its meaning.

    Anyway, I personally would still prefer a statically typed language just for the added compile time safety. To me the "save + refresh browser + manually check if change works" process of development is way more tedious than having to work with type constraints that fail to compile if they don't make sense. That and decent type inference really helps in 90% of the cases where static typing seemed tedious to me.

BerislavLopac 15 years ago

What are "Web apps", actually? Each Web app is either a) a bunch of more or less tightly connected marked-up text documents, delivered by an app running on some server somewhere, or b) two applications, one running on a server and one delivered to be executed in the visitors' browsers (or, most often, some combination of the two). There is no single thing called a "Web app", even though many vendors have been trying (and failing) to make it look so.

What we need is a better way to use the Internet as a platform on a large scale; sadly, none has been widely accepted so far.

ventu 15 years ago

I would suggest Opa: http://opalang.org

thurn 15 years ago

Facebook's XHP is a magnificent tool that you need to use to appreciate. It lets Facebook build a website out of reusable components that know how to load their own data. It's very different from the MVC paradigm, but very light-weight. The XML components are full PHP classes, including allowing for methods, subclassing, etc.

The emphasis on components instead of pages is not strong enough in many other web frameworks like Rails.

  • thumper 15 years ago

    I had the same experience -- I didn't fully grasp how useful XHP was until I had to use it. Not only are they reusable components, but the type system they create and can enforce is a powerful aid to helping me suss out how they were meant to be used. (Now if only XHP::render could detect when it's already been called, to avoid weird validation bugs from the side-effects!)

    And when I had to include some multi-line javascript in my code, I found myself feeling a huge loss. First, heredocs seemed to be the only way to make it readable. Second, I'd have to actually run the code and interact with the page to find out if I got the syntax right. It would be awesome if there was a way to make JS into an object in PHP, the way that XHP is done, and have it support some simple sanity checks and easily import JS components (which I suppose Javelin tries to do).

  • jhferris3OP 15 years ago

    Couldn't agree more. It's been the use of XHP, and looking at the ... contraption HPHP is, that started me thinking about this in the first place.

lucianof 15 years ago

For web apps I would like a language that works both compiled and not compiled. Either I just copy my scripts to the web server when I'm lazy (or developing), or I compile it (and run unit and integration tests and whatever) when I'm done developing. Is there any language/platform that works like that?

  • tmhedberg 15 years ago

    The Snap framework (http://snapframework.com/) for Haskell works this way. Your application (including the web server itself) gets compiled down to a single binary, but during development you can make changes and see them reflected on the fly without manually recompiling. This is my understanding based on an interview with one of the framework's developers; I haven't used it myself for any significant project, though I plan to.

    Haskell in general meets a lot (though probably not all) of the author's requirements. Static type inference is great, and Haskell's system makes explicit type declarations completely optional except for in a handful of rare cases, though you will find yourself wanting to use them on most functions anyway because of how they clarify and improve the readability of your code. Testing is also dead simple with tools like QuickCheck--it essentially manufactures test cases for you based on invariants that you specify about your code.
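    QuickCheck itself is a Haskell library; as a very rough sketch of the idea in TypeScript (hand-rolled, not any real library), you state an invariant and fire random inputs at it:

```typescript
// A miniature property check: generate random inputs and verify an
// invariant holds for all of them, QuickCheck-style.
function forAll<T>(gen: () => T, prop: (x: T) => boolean, runs = 100): void {
  for (let i = 0; i < runs; i++) {
    const x = gen();
    if (!prop(x)) throw new Error(`property failed for ${JSON.stringify(x)}`);
  }
}

const randomList = (): number[] =>
  Array.from({ length: Math.floor(Math.random() * 20) },
             () => Math.floor(Math.random() * 1000) - 500);

// Invariant: reversing a list twice gives back the original list.
forAll(randomList, xs => {
  const twice = [...xs].reverse().reverse();
  return twice.length === xs.length && twice.every((v, i) => v === xs[i]);
});
```

    Real QuickCheck also shrinks failing inputs to minimal counterexamples, which this sketch omits.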

    • microtonal 15 years ago

      I use Snap on one small project; it is pretty awesome. Snap applications have the terseness and simplicity of, say, Sinatra or Rails, but with type checking. As is often the case in Haskell, if your program compiles it is usually correct.

      What I also like about Snap (and Yesod) is that it integrates well with the enumerator package. Simply said, the enumerator package allows you to implement composable data sources, manipulators, and sinks. Since many web applications consist of extracting data from a source, manipulating it, and sending it on, this lets you write applications that are short and simple.
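      The enumerator package is Haskell-specific, but the shape of the idea -- composable sources, manipulators, and sinks -- can be sketched with TypeScript async generators (all names here invented):

```typescript
// Source: yields items one at a time.
async function* source(items: number[]): AsyncGenerator<number> {
  for (const x of items) yield x;
}

// Manipulator: transforms one stream into another stream.
async function* double(input: AsyncGenerator<number>): AsyncGenerator<number> {
  for await (const x of input) yield x * 2;
}

// Sink: consumes the stream and produces a final result.
async function sum(input: AsyncGenerator<number>): Promise<number> {
  let total = 0;
  for await (const x of input) total += x;
  return total;
}

// Compose source -> manipulator -> sink, pipeline-style.
const result = sum(double(source([1, 2, 3])));
```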

    • lobster_johnson 15 years ago

      Looked briefly at Snap, but without something like HAML (http://haml-lang.com/) and SASS (http://sass-lang.com/) it's not something I will use. Not going back to the stone age of writing XML-with-expansions templates ever again.

    • lucianof 15 years ago

      Thanks for the link, I'll have to check this out some time. For some reason I always find it hard to imagine how I could write a whole app in a functional language. Probably that's because I've never actually done it.

micrypt 15 years ago

Scala ticks all those boxes. http://www.scala-lang.org/

scriptproof 15 years ago

Looks like a description of the Scriptol programming language, except for the last sentence about JavaScript.

angerman 15 years ago

How close would clojure/clojurescript and a webframework like noir be to what he wants?

radious 15 years ago

Sounds like Go with web framework to me.

  • franksalim 15 years ago

    Go differs on points 3, 4, 6, and 8. That's only 50%. I think it doesn't sound much like Go.

nirvana 15 years ago

I'm dealing with this very question right now, only I'm coming from a different angle. I've already picked the language, but I'm attempting to build a framework in it that makes it work in a very different domain.

What I'm working on is sort of an answer to node.js. It is a coffeescript platform (use js if you prefer) built on top of erlang, so the coffeescript runs in an erlang environment. This means that when you call into the DB, it spawns off an erlang process. Your collection of coffeescript functions can be executed on any number of cores, or any number of hosts. In fact, your handler for a web request can be spread out all over a cluster, with each function running on the node that has the data... or it can all run on a single node, but across many processes. (To some extent the amount of distribution will be controllable as a configuration parameter -- so if you're doing processing that analyzes big data, you can move your code to the data and run it there, lowering the cluster communications load; but if the data is small, it may make sense to keep the handling of a request constrained to a single node where everything is conveniently in RAM.)

This is accomplished by compiling the coffeescript into javascript and running the javascript on a vm -- specifically erlang_js, though I'm looking at going with V8 via erlv8. Your code and the libraries are all rendered into a single ball of javascript that we'll call the "application", which is handed off to various nodes.

How do I plan to get sequential code to work in a fundamentally distributed environment? That's the $64,000 question and why I'm bringing this up here-- I could be doing it wrong.

The plan is simple:

1. The developer needs to know that their application is not running in a single environment, and account for that.

2. Each entry point provided by the developer to the platform's API is assumed to possibly be running in isolation in a separate process.

3. There's a shared context that all the processes have access to (an in-RAM Riak database where the bucket is unique to a given request, but the keys are up to the developer).

4. The APIs let the developer give callback functions which will be called when the data is available. (E.g.: "Go fetch a list of blog posts" could have a callback that is invoked when the list is returned from Riak.)

5. There's a set of known phases that each request goes through, in a known sequence, and we don't move on to the next phase until the processes spawned by the previous phase are finished. All of the phases are optional, so the developer can implement as many as they want, or only a single one. The phases are: init, setup, start, query, evaluate, render, composite, and finish. The assumption is that you can get your app to work with these opportunities to do a bunch of DB queries and get the results.

6. Init will be called when the request comes in. Init can cause any number of processes to be started (DB queries, map reduce, etc.). They will all be finished, and their callbacks called (if any), before setup is called. Setup can also spin up any number of processes, and so on. All of these are optional, and a hello-world app might implement just one (it doesn't matter which).
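A rough sketch of that phase discipline in TypeScript (nothing here is the actual platform API; it just illustrates "run each phase, wait for everything it spawned, then move on"):

```typescript
type Context = Map<string, unknown>;
type Phase = (ctx: Context, spawn: (task: Promise<void>) => void) => void;

// Run phases strictly in order; a phase may spawn any number of async
// tasks, and the next phase does not start until they have all finished.
async function runRequest(phases: Phase[]): Promise<Context> {
  const ctx: Context = new Map();  // the shared per-request context
  for (const phase of phases) {
    const pending: Promise<void>[] = [];
    phase(ctx, task => pending.push(task));
    await Promise.all(pending);    // barrier between phases
  }
  return ctx;
}

// Example: an "init" phase kicks off a fake DB query; a later "render"
// phase can rely on its result being in the context.
const demo = runRequest([
  (ctx, spawn) => spawn(
    Promise.resolve(["post one", "post two"])
      .then(posts => { ctx.set("posts", posts); })),
  (ctx) => { ctx.set("page", (ctx.get("posts") as string[]).join(", ")); },
]);
```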

So, the developer can write in a sequential style, they are called regularly in sequence and know for each phase the previous phase's queries will have data. Each phase can cause more queries, or even spin up other apps, that will be rendered before the next phase. And they get the results from a context that is always available.

This way, init, start, query and render could all run on different nodes, though they would run in sequence and each one would have access to the shared context for the query.

Another way of looking at this, and the way it might be implemented, is that each of those phases is a long running process that lives on, and is invoked with different contexts each time to handle its part of handling a query. (So this lets us, or the developer, experiment with the right way to arrange things for best resource utilization, since the results can be dramatically different depending on the kind of work the application needs to do.)

That's how I'm running a sequential language in a genuinely distributed manner...you can think in callbacks, or in phases, or both, and your coffeescript really can run in parallel.

A downside of this, though, is that you couldn't write a request handler that, say, generated a random key, did a lookup on the database, and then looped and did that again until it got a result it liked. You have your 9 phases, and that's it, for a given request. However, there is an API to invoke another application (e.g.: you could have a login application that is responsible for part of your page; so, rather than implement a login/logged-in area on each page, you write it once and include it as a sub-application). Conceivably you could do recursion, but I haven't thought about the consequences of that yet. This does sort of lock you into a specific way of doing things, which is why there are 9 phases: if you only need 3, only implement 3... but if you need all 9, you have them.

I'm sure I've managed to make something that is not so complicated sound muddy... This works for me, since coffeescript is convenient, and it is easy for me to think in terms of erlang concurrency... but it might be an adjustment for js programmers who are used to setting variables and expecting them to be there later on... (you'd just have an API that stores the values under a key.)

If you're interested in this project, you can find periodic announcements on twitter @nirvanacore. I expect to have an alpha sometime in late September, and a beta sometime after Riak 1.0 (on which this is based) ships.

Apologies if it seems like I'm hijacking a thread here... obviously my thoughts are about concurrency, but I differ from the author in assuming json for common data structures, and in programming directly in coffeescript/javascript. I'm not too worried about compiled speed -- I'm more interested in concurrency than performance. I'd rather add an additional node and have a homogeneous server infrastructure, with no thinking about server architecture, than try to optimize for single-CPU performance, etc.

  • justincormack 15 years ago

    Are you going to be able to pass data between the phases other than through the DB? It doesn't sound like it from your description, but living without a closure equivalent would be painful. Maybe some way to add some data that gets message-passed to the next phase?

    Sounds interesting.

    • nirvana 15 years ago

      My "through the DB" solution is not as good as a heap or stack would be, but it's not as bad as it might sound, because the DB lives in memory. If, in a given phase, you have some data, you add it to the context, and it will be there in the next phase.

      It would be easy to have an API along the lines of "in the next phase, call this function and pass it this data". I could make an API that does that, or you could put the data under a key in the context and then call that function at the beginning of the next phase. If the set of functions you'd like to have called that way varies from request to request, they could be stuffed in a list under a key, and you just process each of the functions in that list.

      I think it will be quite possible to provide something equivalent to closures via an API, though I can't yet say how syntactically convenient they will be -- not too bad, I don't think.

      On further thought, I think it would be quite possible to do actor style message passing... I'm focusing a bit much on the mechanics of implementation right now, and not making this transparent, but the context could easily be used to manage a set of mailboxes and "processes", where, in each phase, or even between phases, whenever a message is available in a mailbox, the function that it was sent to gets woken up and executed. In fact, not function, but process.

      So, I can add an API that provides an actor-model interface. The actors can be identified by a process ID, they can send messages to each other (addressed by PID) and include arbitrary data, and this can happen concurrently in coffeescript.
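      A toy, single-threaded version of that actor API in TypeScript (the spawn/send names follow the comment; everything else is invented, and a real implementation would queue into mailboxes and drain them asynchronously):

```typescript
type Pid = number;
type Behavior = (state: unknown, message: unknown) => void;

const actors = new Map<Pid, { behavior: Behavior; state: unknown }>();
let nextPid: Pid = 1;

// "Spawning" just registers a function plus some state under a fresh pid,
// so the pid is allocated by the caller, as described above.
function spawn(behavior: Behavior, state: unknown): Pid {
  const pid = nextPid++;
  actors.set(pid, { behavior, state });
  return pid;
}

// Sending invokes the function with its state and the message; a real
// implementation would enqueue and deliver asynchronously instead.
function send(pid: Pid, message: unknown): void {
  const actor = actors.get(pid);
  if (actor) actor.behavior(actor.state, message);
}

// Usage: an actor that accumulates the messages it receives.
const received: unknown[] = [];
const logger = spawn((_state, msg) => { received.push(msg); }, null);
send(logger, "hello");
send(logger, "world");
```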

      • justincormack 15 years ago

        Wouldn't it be cleaner if you sent messages to a computation state (this request in a future phase) as an indirection, since the pid might not be allocated yet?

        • nirvana 15 years ago

          I think the pids are getting confused. When I say pid, I mean an id for a combination of a given function and some data -- an instance, a fake sort of process that is facilitated by my code invoking the function with the data from its mailbox whenever a message is sent to the function by another "process". I'm not talking about erlang processes or "real" processes. So you wouldn't have the problem of "the pid might not be allocated yet", because you would have allocated it.

          example in pseudo coffeerlangscript:

              init ->
                  pidOne = spawn(functionA, argumentlist),
                  pidTwo = spawn(functionA, differentarguments),
                  contextSet("pidOne", pidOne),
                  contextSet("pidTwo", pidTwo),
                  lookupData(bucket, key, pidOne),
                  lookupData(bucket, key, functionB).

              functionA(message) -> doStuff().

          So, here you're "spawning" two processes. For a function to act like a process, it is written such that it takes any messages it gets as arguments. I could set up their own contexts too, so "contextSet" in pidOne and pidTwo would be unique namespaces. lookupData, instead of taking a function to invoke, takes a process, and sends a message when it has retrieved the data off of the disk.

          FunctionB could send a message to pidOne and pidTwo (which it can find in the context).

          So, the init phase is here, and later the start phase will be called. But the thread of execution would be: init, then the database queries happen in parallel; when they are successful, pidOne gets a message and functionB is called (possibly running in different environments). FunctionB sends a message to pidOne and pidTwo, both of which are invoked with these new messages. When there are no more messages waiting for any of these pseudo-processes, and no more database queries or other long-running processes running in parallel, the next phase is called.

          If you're saying there's a better way to do this, my ears are open, I just need a little more explanation.

          • justincormack 15 years ago

            Ah ok by pid I took it to mean a unixy pid or an Erlang mailbox. What you are saying is what I was thinking...

snorkel 15 years ago

Static typing isn't worth all of the visual noise it adds to code.
