Why HTML is a strategic dead end for business transactions and e-commerce (1999)
jimgray.azurewebsites.net

Disregarding the emphasis on XML (read it as JSON today), isn't this actually mostly spot-on re the popularity of Single-Page Apps (SPAs)? There is very little, if any, HTML in the form of text-based markup involved in a modern SPA - the actual display logic is all DOM manipulation in JavaScript, operating on separate APIs that provide just the raw data as JSON.
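Concretely, a minimal sketch of that pattern (the /api/products endpoint and its fields are made up for illustration): the page ships essentially no markup, and JavaScript pulls JSON from an API and builds the DOM itself.

    async function renderProducts() {
      // fetch raw data from a JSON API (hypothetical endpoint)
      const res = await fetch('/api/products');
      const products = await res.json();
      // the "HTML" only ever exists as DOM nodes built in JavaScript
      const list = document.createElement('ul');
      for (const p of products) {
        const li = document.createElement('li');
        li.textContent = `${p.name}: ${p.price}`;
        list.appendChild(li);
      }
      document.body.replaceChildren(list);
    }
    renderProducts();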
> Truly powerful applications can be built using combinations of JavaScript and XML. Not only can the data and its format (XML) be shipped to another system, but the associated processing logic to validate data entered into the record or form can be shipped along as JavaScript as well.
The main thrust in this article is that scraping HTML display markup is a terrible form of data interchange between systems. Think in terms of modern banking systems that are based on screen-scraping terminal UIs of that era.
> isn't this actually mostly spot-on
Only if you accept the initial premise, which is nonsense. The article is based on the idea of HTML as an interchange format, something that's only the case in dysfunctional situations (scraping data illegally, or a horrific breakdown in communication/collaboration between two business entities).
Sure, there was for a time a big focus on XHTML as a "hybrid" format - interweaving Microformats/RDF & other machine-readable metadata into display documents to give them a dual purpose - but even with that, the primary purpose was always human display, not machine-readability.
HTML isn't designed as, nor primarily intended to be, a machine interchange format. And espousing such hyperbole as "HTML is a strategic dead end" based on it not meeting a use-case for which it was never designed, is harmful.
> The main thrust in this article is that scraping HTML display markup is a terrible form of data interchange between systems.
This is a perfect summary. Would it be nice if more HTML websites were more machine-readable? Sure. For me, a hacker, it would make life nicer. But should it be a prerequisite for business transactions & e-commerce to function? Absolutely not.
This article predates XHTML, and was almost certainly not informed by the then-current work on RDF/Dublin Core. At the time, terminal screen-scraping was ... not an uncommon method of data interchange between systems (hence the context of 3270 terminals); using BMS maps was an improvement over the naive approach, basically letting you hook into the screen formatter. The article's point is that these approaches were legacy baggage, and the attempt to "modernize" BMS maps by letting them output HTML in addition to green-screen was doomed to fail. Instead, Duquaine advocates for where we landed: SOAP (and successor) services making data available, instead of forcing integrations to go through human-readable display functions. (It's probably worth noting that this is what he focused on at Sybase, specifically his work on their RPC gateway that hooked into legacy mainframe transports such as CICS.)
This article was written at a time when the idea of a semantic web was still bright and strong. Scrapable HTML websites would have been at the forefront of interchange ideas then.
He didn't realize that gardens need walls, and HTML is the perfect building block for walls.
> the initial premise, which is nonsense.
The initial premise is: "HTML is a strategic dead end _for business transactions and e-commerce_". That premise is absolutely spot-on.
> That premise is absolutely spot-on.
So... if you feel like it, we can split hairs and say that HTML is a dead end as a data interchange format, in the same way that orange juice is a dead end as motor fuel. That's not what's really being discussed here though.
The pertinent quote in the article is:
> HTML is ultimately as strategically dead as 3270 is. HTML suffers from the same ultimate fatal weaknesses that doomed 3270
The implication is that HTML is a dead end in general because it doesn't act as an interchange format, not that it's specifically & narrowly a dead end within that use-case.
As for "business transactions & ecommerce", that's a vaguer phrase. If you mean using HTML to exchange transaction data between business application APIs, then of course it's not appropriate. If you mean using HTML to provide human interfaces to ecommerce & business transactions, then that's a different debate (not at all touched on by this article).
> The article is based on the idea of HTML as an interchange format, something that's only the case in dysfunctional situations (scraping data illegally, or a horrific breakdown in communication/collaboration between two business entities).
Think again:
> Think again
I mentioned microformats & RDF in my comment...
> The main thrust in this article is that scraping HTML display markup is a terrible form of data interchange between systems.
Agree. Many of the comments seem to come from reading "Why HTML Is a Strategic Dead End" leaving off "for Business Transactions and E-Commerce." Screen scraping 3270 protocol blocks was a thing for integrating IBM applications (and not a good thing IMO).
I don't think Wayne was really focused on behavior outside of using HTML for integration purposes. This is partly from reading the article. It's also from knowing Wayne Duquaine, the author. We met in the mid-1990s at Sybase, where he made all the IBM system integrations work. Wayne viewed most problems through a mainframe lens. His note seems pretty typical of his point of view.
This adds great context to the article - thanks.
> I don't think Wayne was really focused on behavior outside of using HTML for integration purposes.
This makes me more curious about his work now - I'll have to look into it more. It sounds like he was having to do something he really shouldn't have needed to do, but perhaps I'm (again) missing context.
> The main thrust in this article is that scraping HTML display markup is a terrible form of data interchange between systems. Think in terms of modern banking systems that are based on screen-scraping terminal UIs of that era.
As a side note, this is intentional on the banks' part. They do not want to provide APIs or anything similar, so they can lock their customers into their own platform. This is because competition is toxic for companies that do not invest in R&D. Sometimes user-hostile banks do not even provide CSV downloads of your transactions.

The situation is so bad that the EU rolled out regulation called PSD2 to address it. By law, you must be able to access your bank account via an API. However, the bureaucrats made this extra political and complicated: you need to sign up with a third-party company and request access to your own bank account through it. The solution, PSD2, has not been the great success that takes down Visa/Mastercard/etc., as its proponents claimed it would be.
JSON didn’t exist when this article was written. Crockford did his prototype the following year.
The history of science and industry includes a bunch of people seeing the same set of problems and answering them in their own way. The scene is set, the tools are there, someone just needs to articulate it and offer a solution.
SPAs don't solve data exchange problems. JSON and XML APIs do.

You can have a templated HTML multi-page site that still has a JSON or XML API for other UIs that need it, like Android or iOS. SPAs just try to morph HTML into some kind of more responsive, dynamic app, at the cost of complexity and initial load times.
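As a sketch of that split, assuming plain Node and an invented /product route: the same data can be served as HTML for browsers and as JSON for native apps or other services, driven by the Accept header.

    const http = require('http');

    const product = { name: 'Widget', price: 9.99 };   // stand-in for real application data

    http.createServer((req, res) => {
      if (req.url !== '/product') { res.writeHead(404); return res.end(); }
      if ((req.headers.accept || '').includes('application/json')) {
        // machine-readable representation for mobile apps or other systems
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(product));
      } else {
        // human-readable representation for browsers
        res.writeHead(200, { 'Content-Type': 'text/html' });
        res.end(`<h1>${product.name}</h1><p>$${product.price}</p>`);
      }
    }).listen(3000);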
Funny.
Among other things, XML is self-describing. Yeah, sure. We all know how that turned out. That was the day when everybody was crazy about the Semantic Web, and for some reason XML / some convoluted format built on XML was viewed by many as "semantic", apparently because of the many intractable URLs the format called for. Y'know, URLs, or was it URIs or URNs, those little pieces of text that, even when they point to nothing in particular, somehow have the power to bring "semantics" to the data.
If you replace XML with JSON and PDAs with phone apps he's spot on.
"Truly powerful applications can be built using combinations of JavaScript and XML"
"XML can be used to mediate between Database row-based output, asynchronous message records, and other types of business forms and data."
"Business-based TXP systems of the future will need to incorporate XML as part of their backbone services technology. The ability to read and understand an XML description of an incoming record or object, and the ability to easily generate XML to describe an outbound record or object, will be a major requirement for system-to-system, business-to-business transaction processing. Intelligent browsers will have XML built in, allowing XML to be used bi-directionally all the way up and down the business hierarchy: from PDAs to PCs to Web/App Servers to Transaction processors."
The difference being that JSON doesn't have the pretentiousness, the mental, tooling, and processing overhead, or the marketing-imposed solution-to-a-non-problem quality that XML was sold as...
Oh absolutely. XML and HTML are the epitome of trying to solve a problem that's not very hard, and not solving it very well.
S-expressions reinvented badly.
JSON does have the same problem at its core, but at least it's simple and the data structures it offers (dicts and lists) are semi-reasonable primitives.
XML was just a few steps in the wrong direction, and it diverted all of its users' energy into nothing.
When I see s-expressions used as a serialization format I miss very little, same for JSON... but XML wasn't that different. Just too much noise (namespaces, attributes) and not enough genericity (no generic list or map) made it useless. Surprising.
Turns out every language has lists and unordered maps. That's pretty much all you need.
When you start expecting ordered maps or add arbitrary attributes to container types, things become exotic very quickly.
References might be OK (some JSON APIs make them work), but they complicate parsing.
And that doesn't even get into the hell of xml that actually executes code.
Not surprising to me that a smaller standard won out.
The Semantic Web is not dependent on XML in any way. You can use JSON-LD.
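For example, a minimal JSON-LD document using the schema.org vocabulary (the product values here are made up) carries the same machine-readable semantics with no XML anywhere:

    {
      "@context": "https://schema.org",
      "@type": "Product",
      "name": "Acme Widget",
      "offers": {
        "@type": "Offer",
        "price": "9.99",
        "priceCurrency": "USD"
      }
    }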
This article's prediction about the fate of HTML would have come true, and the idolized XML-with-JS combination would have met the same doom, except for one piece of technology that saved them all: AJAX.
> except for one piece of technology that saved them all: AJAX.
We have to thank the MS Office guys for that. Also interesting how RPCs have become cool again, albeit under different names; one would have thought that the whole REST thing would have gotten the better of them.
The way everyone seems to demand finished API libraries like spoiled children, it would seem we are building RPC right on the back of REST.
> it would seem we are building RPC right on the back of REST
Yeah, that, most definitely, but as I stopped following that space many years ago (I used to be a REST groupie) I didn't know how best to put it.
https://en.wikipedia.org/wiki/NeWS
NeWS was architecturally similar to what is now called AJAX, except that NeWS coherently:
- used PostScript code instead of JavaScript for programming.
- used PostScript graphics instead of DHTML and CSS for rendering.
- used PostScript data instead of XML and JSON for data representation.
http://www.chilton-computing.org.uk/inf/literature/books/wm/...
Methodology of Window Management: 29 April 1985: 5. SunDew [nee NeWS] - A Distributed and Extensible Window System: James Gosling
5.1 INTRODUCTION
SunDew is a distributed, extensible window system that is currently being developed at SUN. It has arisen out of an effort to step back and examine various window system issues without the usual product development constraints. It should really be viewed as speculative research into the right way to build a window system. We started out by looking at a number of window systems and clients of window systems, and came up with a set of goals. From those goals, and a little bit of inspiration, we came up with a design.
GOALS
A clean programmer interface: simple things should be simple to do, and hard things, such as changing the shape of the cursor, should not require taking pliers to the internals of the beast. There should be a smooth slope from what is needed to do easy things, up to what is needed to do hard things. This implies a conceptual organization of coordinated, independent components that can be layered. This also enables being able to improve or replace various parts of the system with minimal impact on the other components or clients.
Similarly, the program interface probably should be procedural, rather than simply exposing a data structure that the client then interrogates or modifies. This is important for portability, as well as hiding implementation details, thereby making it easier for subsequent changes or enhancements not to render existing code incompatible.
Retained windows: a clean programmer interface should completely hide window damage from the programmer. His model of a window should be just that it is a surface on which he can write, and that it persists. All overlap issues should be completely hidden from the client. I believe that the amount of extra storage required to maintain the hidden bitmaps on a black and white display is negligible - based on the observation that people generally do not stack windows very deeply. The situation is somewhat different with colour, but there are games to be played. Retained windows is one way of hiding window damage, but we do not want to commit to a particular solution to this problem at this time.
Flexibility: users need to attach devices, change menu behaviours and generally modify almost all components of the system. For example, the menu package ought to be independent of the particular format or contents of the menu, thereby allowing the user to develop his own idioms without having to reimplement the entire system. [This paragraph is what caught my attention and excited me during the time I was experimenting with pie menus on X10, because of SunDew's/NeWS's ability to redefine all menus in the system to use pie menus. See: "Just the Pie Menus from All the Widgets" -DonHopkins]
https://www.youtube.com/watch?v=mOLS9I_tdKE
Part of flexibility is device independence: SUN provides a spectrum of display devices to which clients need consistent and transparent interfaces. This leads directly to portability, which we also need to achieve.
Users should be able to make various tradeoffs differently than in the standard system, because of either particular hardware or performance requirements. For example, if the system provides retained windows because we believe that the cost in terms of memory usage is worth the performance improvements, a user should be able to make this tradeoff differently, for example if he has less memory.
This extreme flexibility might appear to be at odds with having a clean, simple, well-abstracted programmer interface, but we do not believe that it is.
Remote access to windows: in the kind of distributed networked environment that SUN promotes, it is natural to want to be able to access windows on another machine as naturally as the NFS promises to support accessing remote files. We believe that this will fall out of any reasonably designed system.
Powerful graphical primitives: the primitives that the Macintosh provides should be considered as a lower bound. Curves and colour need to be well integrated. Attention should also be paid to what CGI [30], GKS [28], CORE [24] and PHIGS [6] need. A consequence of an emphasis on power and flexibility is the ability to emulate other window systems, eg it would be very valuable to be able to provide an emulation of the Macintosh toolbox.
Exploit the hardware: in particular, none of the systems mentioned above deal well with colour. In the future, colour is going to play an even larger part in display design. One can view black and white as a temporary technological stopgap, just as happened with television. Besides, SUN makes some pretty good colour displays, so the window system should exploit them. One implication of this is that the font file format must completely hide the details of the representation of characters, since we might eventually want to support antialiased text, and even illuminated monastic typefaces.
Perform well: the performance of the current window system should be considered as the minimum acceptable level. Performance in the common cases is especially critical. The new system should perform faster than the current system on such common operations as repainting and scrolling of text.
DESIGN SKETCH
The work on a language called PostScript by John Warnock and Charles Geschke at Adobe Systems provided a key inspiration for a path to a solution that meets these goals. PostScript is a Forth-like language, but has data types such as integers, reals, canvases, dictionaries and arrays.
Inter process communication is usually accomplished by sending messages from one process to another via some communication medium. They usually contain a stream of commands and parameters. One can view these streams of commands as a program in a very simple language. What happens if this simple language is extended to being Turing-equivalent? Now, programs do not communicate by sending messages back and forth, they communicate by sending programs which are elaborated by the receiver. This has interesting implications on data compression, performance and flexibility.
What Warnock and Geschke were trying to do was communicate with a printer. They transmit programs in the PostScript language to the printer which are elaborated by a processor in the printer, and this elaboration causes an image to appear on the page. The ability to define a function allows the extension and alteration of the capabilities of the printer.
This idea has very powerful implications within the context of window systems: it provides a graceful way to make the system much more flexible, and it provides some interesting solutions to performance and synchronization problems. SunDew contains a complete implementation of PostScript. The messages that client programs send to SunDew are really PostScript programs.
Two pieces of work were done at SUN which provide other key components of the solution to the imaging problems. One is Vaughan Pratt's Conix, a package for quickly manipulating curve bounded regions, and the other is Craig Taylor's Pixscene, a package for performing graphics operations in overlapped layers of bitmaps.
Out of these goals and pieces grew a design, which will be sketched here. The window system is considered in four parts. The imaging model, window management, user interaction, and client interaction. The imaging model refers to the capabilities of the graphic system - the manipulation of the contents of a window. Window management refers to the manipulation of windows as objects themselves. User interaction refers to the way a user at a workstation will interact with the window system: how keystrokes and mouse actions will be handled. Client interaction refers to the way in which clients (programs) will interact with the window system: how programs make requests to the window system.
What is usually thought of as the user interface of the window system is explicitly outside the design of the window system. User interface includes such things as how menu title bars are drawn and what the desktop background looks like and whether or not the user can stretch a window by clicking the left button in the upper right hand corner of the window outline. All these issues are addressed by implementing appropriate procedures in the PostScript.
HTML is one of my favourite technologies. Functional, resilient, progressively enhanced.
You really like HTML for system-to-system communication? Or we read two different blog posts?
Sorry, my mistake
My prediction: HTML will die way before HTTP.
We will replace that language eventually, HTTP is here to stay.
Which version of HTTP?
Does it really matter? It will likely remain backwards compatible forever.
HTTP is kinda dead already, with browsers insisting on HTTPS.
HTTPS is HTTP wrapped in TLS. The protocol is exactly the same, but with another layer on top.
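A quick way to see that, as a sketch using Node's built-in tls module (example.com is just a placeholder host): the bytes written over the TLS socket are ordinary HTTP/1.1.

    const tls = require('tls');

    const socket = tls.connect({ host: 'example.com', port: 443, servername: 'example.com' }, () => {
      // once the TLS handshake is done, speak plain HTTP/1.1 over the encrypted socket
      socket.write('GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n');
    });
    socket.on('data', (chunk) => process.stdout.write(chunk));
    socket.on('end', () => process.stdout.write('\n'));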
HTTP is still the best anywhere I want simplicity and resilience.
HTTPS has way too many failure modes for me to use it anywhere where accessibility and dependability are the priority.
All this talk of XML has given me some bad XSLT flashbacks, not to mention XSL:FO.
Nonsense! XSLT was cool, maybe not that useful ultimately but very cool to use. I remember a few XSLs I wrote tidying a very complex piece of XML cleanly, whereas the corresponding Java or JavaScript would have looked absolutely awful.
> a few XSLs I wrote tidying a very complex piece of XML cleanly
Me too! But then I saw other people's XSLs... yes, XSL should be buried, or highly restricted to a very few people.
Having done some XSLT coding I agree; it's a language that was both unreadable and unwritable.
I liked to call it steganographic programming. It was near impossible to spot the business logic among all the markup and content.
https://news.ycombinator.com/item?id=16226209
[...]
It may be 15 years old, but I'd joyfully use Genshi again to transform XML to XML, long before I ever considered coming within a mile of XSLT (yech).
My impression of XSLT is that there were representatives from every different programming language paradigm on the XSLT standard committee, and each one of them was able to get just enough of what was special about their own paradigm into the standard to showcase it while sabotaging the others and making them all look foolish, but not enough to actually get any work done or lord forbid synergistically dovetail together into a unified whole.
The only way I was ever able to get anything done with XSLT was to use Microsoft's script extensions to drop down into JavaScript and just solve the problem with a few lines of code. And that raises the question of why I am not just solving this problem with a few lines of JavaScript code instead?
[...]
https://news.ycombinator.com/item?id=16226209
[...]
When you try to design something from the start without a scripting language, like a hypermedia browser or authoring tool, or even a window system or user interface toolkit, you end up getting fucked by Greenspun's Tenth Rule [6]
[6] Greenspun's Tenth Rule: Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp. https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule
But when you start from day one with a scripting language, you can relegate all the flexible scripty stuff to that language, and don't have to implement a bunch of incoherent lobotomized almost-but-not-quite-turing-complete kludgy mechanisms (like using X Resources for event handler bindings and state machines, or the abomination that is XSLT, etc).
TCL/Tk really hit the nail on the head in that respect. TCL isn't a great language design (although it does have its virtues: clean simple C API, excellent for string processing, and a well written implementation of a mediocre language design), but its ubiquitous presence made the design of the Tk user interface toolkit MUCH simpler yet MUCH more extensible, by orders of magnitude compared to all existing X11 toolkits of the time, since it can just seamlessly call back into TCL with strings as event handlers and data, and there is no need for any of the ridiculous useless brittle contraptions that the X Toolkit Intrinsics tried to provide.
[... tannhaeuser replied:]
> the abomination that is XSLT
Not trying to defend XSLT (which I find to be a mixed bag), but you're aware that its precursor was DSSSL (Scheme), with pretty much a one-to-one correspondence of language constructs and symbol names, aren't you?
[... DonHopkins replied:]
In the ideal world we would all be using s-expressions and Lisp, but now XML and JSON fill the need of language-independent data formats.
>Not trying to defend XSLT (which I find to be a mixed bag), but you're aware that its precursor was DSSSL (Scheme), with pretty much a one-to-one correspondence of language constructs and symbol names, aren't you?
The mighty programmer James Clark wrote the de-facto reference SGML parser and DSSSL implementation, was the technical lead of the XML working group, and also helped design and implement XSLT and XPath (not to mention expat, Trex / RELAX NG, etc)! It was totally flexible and incredibly powerful, but massively complicated, and you had to know Scheme, which blew a lot of people's minds. But the major factor that killed SGML and DSSSL was the emergence of HTML, XML and XSLT, which were orders of magnitude simpler.
James Clark: http://www.jclark.com/ https://en.wikipedia.org/wiki/James_Clark_(programmer)
There's a wonderful DDJ interview with James Clark called "A Triumph of Simplicity: James Clark on Markup Languages and XML" where he explains how a standard has failed if everyone just uses the reference implementation, because the point of a standard is to be crisp and simple enough that many different implementations can interoperate perfectly.
A Triumph of Simplicity: James Clark on Markup Languages and XML: http://www.drdobbs.com/a-triumph-of-simplicity-james-clark-o...
I think it's safe to say that SGML and DSSSL fell short of that sought-after simplicity, and XML and XSLT were the answer to that.
"The standard has to be sufficiently simple that it makes sense to have multiple implementations." -James Clark
My (completely imaginary) impression of the XSLT committee is that there must have been representatives of several different programming languages (Lisp, Prolog, C++, RPG, Brainfuck, etc) sitting around the conference table facing off with each other, and each managed to get a caricature of their language's cliche cool programming technique hammered into XSLT, but without the other context and support it needed to actually be useful. So nobody was happy!
Then Microsoft came out with MSXML, with an XSL processor that let you include <script> tags in your XSLT documents to do all kinds of magic stuff by dynamically accessing the DOM and performing arbitrary computation (in VBScript, JavaScript, C#, or any IScriptingEngine compatible language). Once you hit a wall with XSLT you could drop down to JavaScript and actually get some work done. But after you got used to manipulating the DOM in JavaScript with XPath, you begin to wonder what you ever needed XSLT for in the first place, and why you don't just write a nice flexible XML transformation library in JavaScript, and forget about XSLT.
XSLT Stylesheet Scripting Using <msxsl:script>: https://docs.microsoft.com/en-us/dotnet/standard/data/xml/xs...
Excerpts from the DDJ interview (it's fascinating -- read the whole thing!):
>DDJ: You're well known for writing very good reference implementations for SGML and XML Standards. How important is it for these reference implementations to be good implementations as opposed to just something that works?
>JC: Having a reference implementation that's too good can actually be a negative in some ways.
>DDJ: Why is that?
>JC: Well, because it discourages other people from implementing it. If you've got a standard, and you have only one real implementation, then you might as well not have bothered having a standard. You could have just defined the language by its implementation. The point of standards is that you can have multiple implementations, and they can all interoperate.
>You want to make the standard sufficiently easy to implement so that it's not so much work to do an implementation that people are discouraged by the presence of a good reference implementation from doing their own implementation.
>DDJ: Is that necessarily a bad thing? If you have a single implementation that's good enough so that other people don't feel like they have to write another implementation, don't you achieve what you want with a standard in that all implementations — in this case, there's only one of them — work the same?
>JC: For any standard that's really useful, there are different kinds of usage scenarios and different classes of users, and you can't have one implementation that fits all. Take SGML, for example. Sometimes you want a really heavy-weight implementation that does validation and provides lots of information about a document. Sometimes you'd like a much lighter weight implementation that just runs as fast as possible, doesn't validate, and doesn't provide much information about a document apart from elements and attributes and data. But because it's so much work to write an SGML parser, you end up having one SGML parser that supports everything needed for a huge variety of applications, which makes it a lot more complicated. It would be much nicer if you had one SGML parser that is perfect for this application, and another SGML parser that is perfect for this other application. To make that possible, the standard has to be sufficiently simple that it makes sense to have multiple implementations.
>DDJ: Is there any markup software out there that you like to use and that you haven't written yourself?
>JC: The software I probably use most often that I haven't written myself is Microsoft's XML parser and XSLT implementation. Their current version does a pretty credible job of doing both XML and XSLT. It's remarkable, really. If you said, back when I was doing SGML and DSSSL, that one day, you'd find as a standard part of Windows this DLL that did pretty much the same thing as SGML and DSSSL, I'd think you were dreaming. That's one thing I feel very happy about, that this formerly niche thing is now available to everybody.
[... tannhaeuser replied:]
> But the major factor that killed SGML and DSSSL was the emergence of HTML, XML and XSLT, which were orders of magnitude simpler.

That interview is wonderful, but in 2018, while XML has been successful in lots of fields, it has failed on the Web. SGML remains the only standardized and broadly applicable technique to parse HTML (short of ad-hoc HTML parser libraries) [1]. HTML isn't really simple; it requires full SGML tag inference (as in, you can leave out many tags, and HTML or SGML will infer their presence), SGML attribute minimization (as in `<option selected>`), and other forms of minimization only possible in the presence of a DTD (e.g. declarations for the markup to parse).
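As a small illustration of that tag inference and attribute minimization, a browser-side sketch (DOMParser with 'text/html' runs the standard HTML parser; the markup below is made up):

    const doc = new DOMParser().parseFromString(
      '<ul><li>one<li>two</ul> <select><option selected>a<option>b</select>',
      'text/html');
    // the parser inferred the missing </li> and </option> end tags
    console.log(doc.querySelectorAll('li').length);      // 2
    console.log(doc.querySelectorAll('option').length);  // 2
    // and accepted the minimized attribute
    console.log(doc.querySelector('option').hasAttribute('selected')); // true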
> JC: [...] But because it's so much work to write an SGML parser, you end up having one SGML parser that supports everything needed for a huge variety of applications.
Well, I've got news: there's a new implementation of SGML (mine) at [2].
> But after you got used to manipulating the DOM in JavaScript with XPath, you begin to wonder what you ever needed XSLT for in the first place, and why you don't just write a nice flexible XML transformation library in JavaScript, and forget about XSLT
My thoughts exactly. Though I've done pretty complicated XSLTs (and occasionally still do), JavaScript was designed for DOM manipulation, and given that XSLT is Turing-complete anyway, there's not that much benefit in using it over JavaScript except for XML literals and, if we're being generous, maybe as a target language for code generation, it being itself based on XML. Ironically, the newest Web frameworks have all invented their own HTML-in-JavaScript notation, e.g. React's JSX to drive virtual DOM creation, even though JavaScript started from day one with the principal design goal of being a DOM manipulation language.
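For illustration, a tiny sketch of that XSLT-free style, using DOMParser and the DOM's standard XPath API (document.evaluate); the order/item markup is invented:

    const xml = new DOMParser().parseFromString(
      '<order><item sku="a1">Widget</item><item sku="b2">Gadget</item></order>',
      'application/xml');

    // select nodes with XPath, then build whatever output structure you like in plain JS
    const items = xml.evaluate('//item', xml, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    const rows = [];
    for (let i = 0; i < items.snapshotLength; i++) {
      const el = items.snapshotItem(i);
      rows.push({ sku: el.getAttribute('sku'), name: el.textContent });
    }
    console.log(JSON.stringify(rows)); // [{"sku":"a1","name":"Widget"},{"sku":"b2","name":"Gadget"}]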
> My (completely imaginary) impression of the XSLT committee is that there must have been representatives of several different programming languages (Lisp, Prolog, C++, RPG, Brainfuck, etc) sitting around the conference table facing off with each other, and each managed to get a caricature of their language's cliche cool programming technique hammered into XSLT
+1. Though to be fair, XSLT has worked well for the things I did with it, and version 1 at least is very portable. These days XSLT at the W3C seems more like a one-man show where Michael Kay is both the language specification lead and the provider of the only implementation (I'm wondering what has happened to W3C's stance on at least two interoperable implementations). The user audience (publishing houses, mostly), however, seems OK with it, as I witnessed at a conference last year; and there's no doubt Michael really provides tons of benefit to the community.
One would never use HTML to transfer data between two systems*; this author really missed the forest for the trees.
*unless of course one system is scraping the other and has no control of what it outputs.
You have to put it in context, though. This was back in 1999. Plenty of people even today push for semantic HTML. HTML is said to represent a document, and documents have structure. So I'm sure it wasn't all that uncommon to take it to mean that this structure was machine readable.
>One would never use HTML to transfer data between two systems
And yet it happened all the time back in the day, and there was even a big push to XML-ize HTML into XHTML and make it stricter, parsable, and "semantic".
I can agree with this statement: "HTML provides a pretty face, but a lousy system-to-system transaction environment."
Imagine if we had something like JSON, but with a more standardized schema to describe, e.g., something as simple as a table of data. There is so much glue code being written that does nothing other than reformat data into something an organization can actually use. Think what it would mean if every table on the web were fully addressable with a URI, such that you could load it directly into whatever you want - pandas dataframes, Tableau, a C++ vector, Excel, your next fancy executive board PowerPoint chart, whatever. Make it n-dimensional from the start and give the shape plus each column a type definition. I guess one can dream...
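Something along these lines, say - every field name below is invented for the sake of the sketch (the W3C's CSV on the Web / tabular-data metadata work heads in a roughly similar direction):

    {
      "uri": "https://example.org/tables/sales-2019",
      "shape": [3, 2],
      "columns": [
        { "name": "region",  "type": "string"  },
        { "name": "revenue", "type": "decimal", "unit": "USD" }
      ],
      "rows": [
        ["EMEA", "1200.50"],
        ["APAC",  "980.00"],
        ["AMER", "2100.75"]
      ]
    }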
There is a lot of glue code written because we are only transporting data through HTTP; that is the problem.
A concept that has been buzzing around my head a lot lately: what if we could model objects/actors, so that I don't just go to google.com to display some HTML, but there is a standardised RPC language in which I can tell it to "search this query" and it returns a structured object? The same RPC I could use to talk to my Hue lamp and tell it to "turn red."
In fact, our current HTML model can nicely map to a "render a thing to HTML" method call.
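Sketching that with an existing envelope, JSON-RPC 2.0 (the method names, parameters, and targets below are all invented), the search engine, the lamp, and a "render to HTML" call could all speak the same shape of message:

    // to a search engine
    --> {"jsonrpc": "2.0", "method": "search", "params": {"query": "cheap flights"}, "id": 1}
    <-- {"jsonrpc": "2.0", "result": {"hits": ["..."]}, "id": 1}

    // to a Hue-style lamp
    --> {"jsonrpc": "2.0", "method": "set-color", "params": {"color": "red"}, "id": 2}
    <-- {"jsonrpc": "2.0", "result": "ok", "id": 2}

    // to anything that can render itself for a human
    --> {"jsonrpc": "2.0", "method": "render-html", "params": {}, "id": 3}
    <-- {"jsonrpc": "2.0", "result": "<!doctype html>...", "id": 3}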
We spend too much time building complex systems by either scraping HTML, or gluing together incompatible APIs from vendors.
I want the Internet to be like the Erlang virtual machine. Each server is an independent actor that holds state, can send and receive messages, but they are not very trusted.
> I want the Internet to be like the Erlang virtual machine. Each server is an independent actor that holds state, can send and receive messages, but they are not very trusted.
you sound quite a bit like Alan Kay there (and that's not a bad thing IMO).
I know, my extended idea is borne out of Alan Kay and his vision. It is very nebulous at this stage to expand further, but I do strongly agree with him now that "the computer revolution hasn't happened yet."
Are you trusted? The reason g-search and others don't have an API is not a technical one. They won't give you either JSON or HTML without a captcha module injected into your browser, because this is the way they earn money.
So? That can still be modelled in an RPC manner.
Instead of sending a "query <string>" command, I have to do:

    -> get-captcha
    <- returns a captcha object
    -> solve-captcha <id> <solution>
    <- returns a token string
    -> query <token> <string>
    <- returns a list of results

Very simplified. If HTTP is the transport, the current Authentication header is very good at encapsulating these details without having to repeat them every command.

Point is, HTTP is still too low level and we're paying people $150k a year to write glue code and reimplement API clients until they quit and do the same thing at another company.
Could work for the traditional image check, but not at all for the user-behavior-analysis magic they do, which seems to be taking over.
I agree with you.
The problems that computers are trying to solve are partly the expression problem.
Given this state, do this calculation.
Object orientation for me means creating relationships between arbitrary groups of objects and sending messages between objects to do things.
CSV, for all its faults, is probably the closest thing we have to a "universal 2D table format you can import anywhere."
What would you want that's significantly better than an n-dimensional JSON array? A sparse table format?
You'll still need to handle the core cell-type parsing. You'll still need to deal with what level of normalization the table used, i.e. whether cells are primitives, objects, or different objects with conflicting structures.
An n-dimensional JSON array still doesn't come with strong type information (other than str/null/decimal/int) for the fields, and it doesn't give you a guaranteed shape either. Also, there is no standard way to declare the field header labels. Also, it's not binary, and thus very inefficient for numerical data. There are various data formats that help in those regards, but none of them is standardized, hence all our fingers being sticky from the glue.
The web will never catch on. It's just the CB Radio of the 90's.
(I actually remember people saying this.)
From the iXBRL site [0]: "iXBRL is used by millions of companies around the world to prepare financial statements in a format that provides the structured data that regulators and analysts require, whilst allowing preparers to retain full control over the layout and presentation of their report. iXBRL takes the HTML standard that is used to power the world’s web pages, and embeds extra “tags” into it that give meaning to the figures and statements in a format that can be understood by a computer."
Googling "Grandview Systems" now leads to a system for handling Masonic lodges.
The only data-exchange format I can respect is Clojure's extensible data notation (EDN).
Fight me.
This aged well