Fixed-Point HTML

61 points by tahoupt 3 years ago · 32 comments


I’m surprised to see the highlights don’t include another common detail of the parsing algorithm that often trips people up: table rows and cells (tr/th/td) must be in one of thead/tbody/tfoot. If they’re not, they’re implicitly nested into a tbody. As in:

  <table>
    <!-- <tbody> -->
      <tr>
        <th>Column one</th>
        <th>Column two</th>
      </tr>
      <tr>
        <td>Row one col one</td>
        <td>Row one col two</td>
      </tr>
    <!-- </tbody> -->
  </table>

I’ve frequently seen it cause a variety of issues with VDOM libraries, and even plain DOM libraries with a notion of declarative templates, ranging from hydration mismatch logs (meh) to actual logic errors (corruption of the real DOM when nodes aren’t where they’re expected to be).

Other implied/omitted tags like body can cause similar issues too, but I think that’s become a far less common “mistake” (all of these are totally valid since at least HTML5) in recent years.

samwillis 3 years ago

Another interesting table one: tr/td/th outside of a <table> will never appear in the DOM. You can make up your own tags and they appear anywhere, but those three are magic and can only exist inside a table.
Forms are also weird: if you leave off the closing tag, an implicit one is included in the DOM. However, inputs further down the page, technically outside the form, are still included in the submitted form data.
(Don’t ask how I discovered that…)
- Doxin 3 years ago
  
  Also fun stuff: you can't have a form inside a form, but if you stick a form inside a form inside a form, you end up with a form in a form in the DOM anyway, at least when I last ran into this.

skybrian 3 years ago

Perhaps a more intuitive name would be "round-trip serialization HTML". That is, if you use the browser to parse and print some HTML, it matches the source code.

Or in other words, it's formatted the same way that the browser would do it. So, you use the browser to pretty-print the HTML page, and save the code as the source. It's not hard at all and could be done automatically.

Round-trip tests are often used to check that a deserialization routine outputs data that can be serialized again and no data is lost. It even lets you change the serialization format, provided that you change the parser and printer to match.

I expect that these sorts of tests are a lot more useful with fuzzing, though. Finding one example that works mostly just tells you that the browser's HTML printing code isn't completely broken. A single test of that sort is only useful for catching stupid bugs quickly.
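To make the fuzzing point concrete, here is a minimal Python sketch of a fuzzed round-trip test. It uses the standard `json` module as a stand-in for the parser/printer pair (it does not parse HTML at all), and the helper names `random_value`, `round_trips`, and `fuzz_round_trip` are made up for this example.

```python
import json
import random
import string


def random_key(rng):
    """Random short ASCII key (hypothetical helper for this sketch)."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(rng.randint(1, 5)))


def random_value(rng, depth=0):
    """Build a random JSON-compatible value, nesting at most three levels deep."""
    kinds = ["int", "str", "bool", "none"]
    if depth < 3:
        kinds += ["list", "dict"]
    kind = rng.choice(kinds)
    if kind == "int":
        return rng.randint(-1000, 1000)
    if kind == "str":
        return "".join(rng.choice(string.printable) for _ in range(rng.randint(0, 8)))
    if kind == "bool":
        return rng.choice([True, False])
    if kind == "none":
        return None
    if kind == "list":
        return [random_value(rng, depth + 1) for _ in range(rng.randint(0, 4))]
    return {random_key(rng): random_value(rng, depth + 1) for _ in range(rng.randint(0, 4))}


def round_trips(value):
    """True if printing and then parsing gives back an equal value."""
    return json.loads(json.dumps(value)) == value


def fuzz_round_trip(trials=500, seed=0):
    """Return every generated value that fails the round trip."""
    rng = random.Random(seed)
    candidates = (random_value(rng) for _ in range(trials))
    return [value for value in candidates if not round_trips(value)]
```

Values outside what the format can represent are where data gets lost: `round_trips((1, 2))` is false because the tuple comes back as a list, and `round_trips({1: "a"})` is false because the integer key comes back as the string `"1"`.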

kazinator 3 years ago

This is called print-read consistency in the Lisp world: an object is printed in such a way that the syntax can be read to produce a similar object, or else is given a deliberately unreadable notation like #<...>, where the #< combination is required to produce a read error.

https://stackoverflow.com/questions/70797208/what-is-print-r...

gavinray 3 years ago

Thanks for this
I've had this notion in my head of making variables capable of echoing out their own definition when printed, for an easier time writing tests/debugging
Didn't know it had a name
- nerdponx 3 years ago
  
  In Python, there is a distinction between the representation of an object (`repr()`, implemented by `__repr__`) and the result of converting the object to a string (`str()`, implemented by `__str__`). Classes can implement both methods independently, and it's not uncommon to have a `__repr__` method that returns something that you could (at least in theory) evaluate as literal Python code. This is very useful for debugging and logging, although not nearly as cool or powerful as the Lisp equivalent.
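A small sketch of that distinction, assuming a made-up `Point` class (nothing here comes from a real library): `__repr__` returns text that can be evaluated back into an equal object, while `__str__` returns a friendlier display form.

```python
class Point:
    """Hypothetical example class with separate repr and str forms."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Written so the output is a valid constructor call.
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self):
        # Human-friendly form; not meant to be evaluated.
        return f"({self.x}, {self.y})"

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
# Evaluate the repr in a namespace that only knows about Point.
rebuilt = eval(repr(p), {"Point": Point})
```

Here `repr(p)` is `Point(1, 2)`, `str(p)` is `(1, 2)`, and `rebuilt == p` holds. The round trip only works because `__repr__` was deliberately written as a constructor call; Python does not enforce that.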

PaulStatezny 3 years ago

> Why write Fixed-Point HTML?

> simply the satisfaction of knowing that you and the browser are in total agreement

So, just to clarify: there's no technical benefit, correct?

nyanpasu64 3 years ago

My favorite example of the technical failings of HTML: https://research.securitum.com/mutation-xss-via-mathml-mutat... is an HTML sanitizing vulnerability that came about because some HTML not only doesn't survive a parse-stringify cycle, but the generated DOM tree does not survive a stringify-parse cycle!

eyelidlessness 3 years ago

There actually may be! Depending on what you’re trying to do and what’s inconsistent between your markup and the actual DOM. As noted in my earlier comment, implicit insertion/wrapping of certain elements can cause structural changes which lead to actual code errors or unexpected behavior.
- kevincox 3 years ago
  
  CSS errors are also common here. Especially with the child selector.

tomxor 3 years ago

> the real reason to code in Fixed-Point HTML is simply the satisfaction of knowing that you and the browser are in total agreement about the HTML.

Interesting idea, I've been trying to achieve something similar but in reverse... rather than make my source match the browser, make the browser match my source by making it not ignore spacing.

The basics are `white-space: pre;` on the body element, plus fixed-width, fixed-size fonts. But I still want an HTML document so I can opt in to HTML where it matters. My reasons: A) to avoid a pre-processor and build-toolchain complexity and stick to nice, simple static files; B) I get something similar to WYSIWYG, but as source code; C) I like fixed-width fonts and plain-text formatting (reducing decisions is helpful for focus).

tfsh 3 years ago

Before now I've explicitly reduced the size of my HTML docs (nothing critical/production facing, all passion projects) by removing certain HTML tags (e.g. DOCTYPE, closing tags, etc.) because I know modern browsers will still render them correctly.

This means there are minuscule bandwidth savings when serving them. I wonder what the trade-off is between the HTTP call and document parse/paint.

E.g., is it correct to assume the browser will parse/paint the HTML content, fixing incorrectly closed tags on the fly, faster than the few extra milliseconds it would take to serve fixed-point HTML from the server?

pushedx 3 years ago

Interesting concept.

On latest Chrome, the "Check Fixed-Point" button appears to fail.

chuckhoupt 3 years ago

Thanks [I'm the author]. I tested with Chrome 105 on macOS and it succeeded. Possibly there are OS/plugin/etc issues?
Of course, I know there is no guarantee that every browser's innerHTML implementation will produce exactly the same result, but so far I haven't found any variation (Chrome, FF, Safari, Edge).
- hrunt 3 years ago
  
  In Firefox 105.0.1 on MacOS, the button also always fails when I click it.
  EDIT: In my case, it appears to be some extra "<div style=\"position: static !important;\"></div>" text added before the closing </body> tag. I suspect this is introduced by a plugin, probably LastPass.
- Tijdreiziger 3 years ago
  
  Fails on Safari on iPadOS 15.6.1 too.
  edit: another commenter says ad blockers are the culprit.
- thayne 3 years ago
  
  It also fails on Firefox 104 on Linux.

Georgelemental 3 years ago

It might be a browser extension; on my Firefox install I have browser extensions that add HTML to every webpage, making it fail.

yazzku 3 years ago

HTML wouldn't be a Web standard if it were consistent across browsers.

WirelessGigabit 3 years ago

> XML-flavored self-closing elements are banished (use <br> instead of <br />)

God I hate that. It just doesn’t make sense. Where is the <br> closed?

alexaholic 3 years ago

br is an empty (void) element, and void elements are self-closing: they never take an end tag.
- WirelessGigabit 3 years ago
  
  Doesn’t make sense. What’s wrong with <br />? It’s a hell of a lot easier to parse than having an exception for <br> which is then transformed into <br></br>.

bhedgeoser 3 years ago

Fixed-point check failed on Chrome 104.0.5112.101

MatmaRex 3 years ago

It's because you have an ad-blocker of some sort enabled. They inject stuff into pages.
- tromp 3 years ago
  
  That would explain why it fails on my Brave browser...

exabrial 3 years ago

Basically: XHTML is fast and verifiable?

mirekrusin 3 years ago

They say not to use <br /> but <br> instead.
- exabrial 3 years ago
  
  Correct, that surprised me
