Write HTML Right

lofi.limo

263 points by aparks517 4 years ago · 211 comments (209 loaded)

shakna 4 years ago

Whilst the spec certainly allows you to ignore closing of a whole range of elements, it's not necessarily the wisest of choices to make. The parser does actually get slower when you fail to close your tags in my experience.

Unscientific stats from a recent project where I noticed it:

+ Document is about 50,000 words in size. About 150 words to a paragraph element, on average.

+ Converting the entire thing to p elements without end tags added an overhead of about 120ms in Firefox on Linux, before initial render.

+ Converting the entire thing to p elements without end tags added an overhead of about 480ms in Chrome on Linux, before initial render.

+ Converting the entire thing to p elements without end tags added an overhead of about 400ms in Firefox on Android, before initial render.

+ Converting the entire thing to p elements without end tags added an overhead of about 560ms in Chrome on Android, before initial render.

+ The time differences appeared to be linearly increasing, as the document grew from 20,000 to 50,000 words.

+ Curiously, Quirks Mode also increased the load times by about 250ms on Firefox and 150ms on Chrome. (Tried it just because I was surprised at the massive overhead of removing/adding the tag endings.)

The most common place this was going to be opened was Chrome on Android, and a whopping half-second slower to first render is going to be noticeable to the end user. For some prettier mark up.

Whilst you can debate whether that increased latency actually affects the user, a decreased latency will always make people smile more. So including the end tags is a no-brainer. Feel free to write it without them - but you _might_ consider whether your target is appropriate for you to generate them before you serve up the content.

  • niconii 4 years ago

    I can't verify your numbers. As far as I can tell, loading a ~900,000 word document with no other differences than including or excluding </p> has about the same load time, though there's too much variance from load to load for me to really give definitive numbers.

    Are you sure you converted it properly? I'd expect those kinds of numbers if your elements were very deeply nested by mistake (e.g. omitting tags where it's not valid to do so), but I don't see why leaving out </p> should be so slow.

    Try these two pages:

    https://niconii.github.io/lorem-unclosed.html

    https://niconii.github.io/lorem-closed.html

    • shakna 4 years ago

      For five runs, on the same hardware with the same load:

      + Unclosed: 4.00s, 3.91s, 3.59s, 4.45s, 3.93s

      + Closed: 3.90s, 2.74s, 3.9s, 2.05s, 3.39s

      Though I'd note that the newline you have immediately following the paragraph, even when closing, would probably reduce the backtracking effect. And having no explicit body or head element would probably cause some different rendering patterns as well.

      • paulirish 4 years ago

        I don't know what you're measuring (onload?), but it's not giving you enough precision to make a conclusion about the performance of the HTML parser. If you profile the page w/ devtools Performance panel, you'll see that just 5% of the CPU cost used to load & render the page is spent parsing the HTML. At that level I'm seeing costs of 22-36ms per load.

        And, spoiler alert: after repeated runs I'm not seeing any substantial difference between these test pages. And based on how the HTML parser works, I wouldn't expect it.

        (I work on web performance on the Chrome team)

      • niconii 4 years ago

        Were the five unclosed runs before the five closed runs? I could see that making a difference vs. interleaving them, if the hardware needs to "warm up" first.

        For me, on Firefox on Linux (I know it's the one with the smallest difference, but I don't have the others on hand, sorry), using the "load" time at the bottom of the Network tab, with cache disabled and refreshing with Ctrl+F5, interleaving the tests:

        - Unclosed: 1.38s, 1.49s, 1.45s, 1.52s, 1.48s

        - Closed: 1.47s, 1.37s, 1.48s, 1.49s, 1.35s

        The one with </p> omitted takes about 0.032s longer on average going by these numbers, but that's about 2 frames of extra latency for a page almost twice the length of The Lord of the Rings.
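
        The 0.032s figure can be re-derived from the two lists above. This is only a sketch: the helper names `mean` and `sampleStdDev` are made up here, and five runs per page are too few for a firm conclusion either way.

        ```javascript
        // Load times in seconds, copied from the two lists above.
        const unclosed = [1.38, 1.49, 1.45, 1.52, 1.48];
        const closed = [1.47, 1.37, 1.48, 1.49, 1.35];

        // Arithmetic mean of a list of numbers.
        function mean(xs) {
          return xs.reduce((sum, x) => sum + x, 0) / xs.length;
        }

        // Sample standard deviation (n - 1 in the denominator).
        function sampleStdDev(xs) {
          const m = mean(xs);
          const sumOfSquares = xs.reduce((sum, x) => sum + (x - m) ** 2, 0);
          return Math.sqrt(sumOfSquares / (xs.length - 1));
        }

        const difference = mean(unclosed) - mean(closed);
        console.log(difference.toFixed(3)); // "0.032"
        console.log(sampleStdDev(unclosed).toFixed(3), sampleStdDev(closed).toFixed(3));
        ```

        The spread within each list comes out larger than the gap between the two means, which fits the earlier remark about there being too much variance from load to load.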

        Regarding the page itself, I tried to keep everything else as identical between the two versions as possible, including the DOM, hence why I wrote the </p> immediately before each <p>. As for backtracking, I'm not sure what you mean. The rule for the parser is simply "If the start tag is one from this list, and there's an open <p> element on the stack, close the <p> element before handling the start tag."
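
        The quoted rule can be sketched as a toy tree builder. This is not the real HTML parsing algorithm: it ignores attributes, drops surrounding whitespace, and `CLOSES_P` lists only a few of the start tags that close an open <p>. The names `parse` and `serialize` are invented for this illustration.

        ```javascript
        // Assumption: a small subset of the start tags that close an open <p>.
        const CLOSES_P = new Set(["p", "div", "ul", "ol", "table", "h1", "h2", "blockquote"]);

        function parse(html) {
          const root = { tag: "#root", children: [] };
          const stack = [root];
          const top = () => stack[stack.length - 1];
          const isOpen = (tag) => stack.some((el) => el.tag === tag);

          // Pop the stack until the named element has been removed.
          const closeElement = (tag) => {
            while (stack.length > 1 && stack.pop().tag !== tag) {}
          };
          const openElement = (tag) => {
            const el = { tag, children: [] };
            top().children.push(el);
            stack.push(el);
          };

          // Tokens are <tag>, </tag>, or a run of text (attributes are not supported).
          for (const [, slash, tag, text] of html.matchAll(/<(\/?)([a-z0-9]+)>|([^<]+)/g)) {
            if (text !== undefined) {
              if (text.trim() !== "") top().children.push(text.trim());
            } else if (slash === "") {
              // The rule itself: close an open <p> before handling the start tag.
              if (CLOSES_P.has(tag) && isOpen("p")) closeElement("p");
              openElement(tag);
            } else if (isOpen(tag)) {
              closeElement(tag);
            } else if (tag === "p") {
              // A stray </p> produces an empty paragraph, as browsers do.
              openElement("p");
              closeElement("p");
            }
          }
          return root;
        }

        // Write the tree back out with every end tag made explicit.
        function serialize(node) {
          if (typeof node === "string") return node;
          const inner = node.children.map(serialize).join("");
          return node.tag === "#root" ? inner : `<${node.tag}>${inner}</${node.tag}>`;
        }

        console.log(serialize(parse("<p>Content<p>Content")));
        // <p>Content</p><p>Content</p>
        ```

        Both the closed and the unclosed input produce the same tree here, and no step ever revisits earlier input: the builder only pops the stack when the next start tag arrives.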

  • myfonj 4 years ago

    Well, this sounds like a really interesting observation. May I ask where exactly the original closing tags were located and what the stripped source looked like? I can imagine there _might_ be some differences among differently formatted code: e.g. I'd expect

        <p>Content<p>Content[EOF fig1]
    
    to be (slightly) slower, than

        <p>Content</p><p>Content</p>[EOF fig2]
    
    (most likely because of some "backtracking" when hitting `<p[>]`), or

        <p>Content</p>
        <p>Content</p>[EOF fig3]
    
    (with that small insignificant `\n` text node between paragraph nodes), which should possibly be faster than "the worst scenarios":

        <p>Content
        <p>Content[EOF fig4a]
    
    or even

        <p>
        Content
        <p>
        Content
        [EOF fig4b]
    
    with paragraph text nodes `["Content\n","Content"]` / `["\nContent\n","\nContent\n"]`, where the "\n" must also be preserved in the DOM but, due to white-space collapsing rules, is not present in the render tree (if not overridden by some non-default CSS), but still with backtracking, that

        <p>Content
        </p>
        <p>Content
        </p>[EOF fig5]
    
    should eliminate (again, similarly to fig2 vs fig1).

    (Sorry for wildly biased guesswork, worthless without measurements.)

    • shakna 4 years ago

      It was just paragraphs of text. p, strong, em, and q mingled at most. No figures or images or anything of the like to radically shift DOM computations. That the effect can even be seen is probably due to the scale of the document, as I noted it's a little larger than most things.

      All paragraphs had a blank line between them, both with and without the p end tag. The p opening tag was always at the top-left, with no gap between it and the content.

      So, for example:

          <p>Cheats open the doorway for casual play. They make it easier for disabled players to enjoy the same things as their peers, and allow people to skip parts of a game that <em>they bought</em> that they find too difficult.</p>
      
          <p>Unfortunately, cheats are going away, because of extensive online play, and a more corporate approach to developing games that despises anything hidden.</p>
      
      Versus:

          <p>Cheats open the doorway for casual play. They make it easier for disabled players to enjoy the same things as their peers, and allow people to skip parts of a game that <em>they bought</em> that they find too difficult.
      
          <p>Unfortunately, cheats are going away, because of extensive online play, and a more corporate approach to developing games that despises anything hidden.
      
      (You can also discount CSS from having a major effect. Less than a hundred lines of styles, where most rules are no more complicated than: `p { font-family: sans-serif; }`. No whitespace rules.)

      However, if you wanted to look at this in a more scientific way - it should be entirely possible to generate test cases fairly easily, given the simplicity of the text data I saw my results with.
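
      A generator for such paired test pages could look roughly like this. Everything in it is an assumption for illustration: the filler vocabulary, the function names, and the sizes, which are only chosen to resemble the 50,000-word, 150-words-per-paragraph document described earlier.

      ```javascript
      // Hypothetical filler vocabulary; any plain words would do.
      const WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit"];

      // Build one paragraph of `wordCount` filler words, cycling through WORDS.
      function paragraphText(wordCount, offset) {
        const words = [];
        for (let i = 0; i < wordCount; i++) {
          words.push(WORDS[(offset + i) % WORDS.length]);
        }
        return words.join(" ") + ".";
      }

      // Build a whole page; `closeTags` decides whether each <p> gets a </p>.
      function buildDocument({ totalWords, wordsPerParagraph, closeTags }) {
        const paragraphs = [];
        for (let done = 0; done < totalWords; done += wordsPerParagraph) {
          const count = Math.min(wordsPerParagraph, totalWords - done);
          paragraphs.push("<p>" + paragraphText(count, done) + (closeTags ? "</p>" : ""));
        }
        // A blank line separates paragraphs, matching the layout described above.
        return "<!doctype html>\n<title>Parse test</title>\n\n" + paragraphs.join("\n\n") + "\n";
      }

      const options = { totalWords: 50000, wordsPerParagraph: 150 };
      const closedPage = buildDocument({ ...options, closeTags: true });
      const unclosedPage = buildDocument({ ...options, closeTags: false });
      console.log(closedPage.length - unclosedPage.length); // characters saved by dropping </p>
      ```

      Because the two pages differ only in the end tags, any timing difference between them can be attributed to that one change. Writing each string to a file (for example with Node's `fs.writeFileSync`) gives two pages to load side by side.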

      • myfonj 4 years ago

        Yay, thanks for info and inspiration, sure it seems like fun weekend project.

        (BTW your snippet's content sounds interesting and feels relatable, definitely intrigued.)

      • myfonj 4 years ago

        Finally did some synthetic measurements of (hopefully) parse times (not render nor CSSOM or anything like that). Differences seem microscopic but overall aligned with my initial expectations (omitting the closing tag actually shaves a bit of yak's hair), so I suspect that the real overhead you observed is caused by something happening after parse, where absence of trailing white-space in DOM nodes (ensued by closing tags) helps in some way. I guess something around that white-space or text layout. (Speaking of insignificant white-space, you could probably gain some more microseconds if you stuck the paragraphs together (`..</p>\n\n<p>..` -> `..</p><p>..`), however such minification seems like a nuisance.)

        Tested only on Windows, in browser consoles.

        Numbers:

        Firefox (Nightly) (performance.now is clamped to milliseconds)

            total; median; average; snippet
            2279.0; 4.0; 4.558; '<p>_'
            2652.0; 4.0; 5.304; '<p>_</p>'
            2471.0; 4.0; 4.942; '<p>_abcd'
            2387.0; 4.0; 4.774; '<p>_\n'
            3615.0; 5.0; 7.230; '<p>_</p>\n'
            2380.0; 4.0; 4.760; '<p>_abcd\n'
            3093.0; 5.0; 6.186; '<p>_\n</p>\n'
            3107.0; 5.0; 6.214; '<p>_</p>\n\n'
            2317.0; 4.0; 4.634; '<p>_abcd\n\n'
            2344.0; 4.0; 4.688; '<p>_\n\n'
        
        Google Chrome (performance.now is sub-millisecond)

            total; median; average; snippet
            2870.4; 5.2; 5.741; '<p>_'
            2895.2; 5.4; 5.790; '<p>_</p>'
            2684.7; 5.2; 5.369; '<p>_abcd'
            2845.4; 5.2; 5.690; '<p>_\n'
            3836.7; 7.3; 7.673; '<p>_</p>\n'
            2837.8; 5.2; 5.676; '<p>_abcd\n'
            4022.5; 7.4; 8.045; '<p>_\n</p>\n'
            4044.3; 7.3; 8.089; '<p>_</p>\n\n'
            2928.4; 5.2; 5.857; '<p>_abcd\n\n'
            2805.3; 5.2; 5.611; '<p>_\n\n'
        
        Test config

            Snippets per document: 5000
            Rounds: 500
            Wrap: '<!doctype html>(items-paragraphs)'
            Content each item (_): bunch of random digits chunks, something like '1943965927 52 27 5 51664138859173 5161 7226 5 15 2 55679 6553712585'
        
        Code: https://gist.github.com/myfonj/57a6a8fcb1c5686527412543a897c...

        (Before realizing I could use a synthetic DOMParser, I made something that measures document load time in an iframe (http://myfonj.github.io/tst/html-parsing-times.html), but it gives quite unconvincing results, although probably closer to the real world. Understandably, a synthetic DOMParser can crunch much more code than a visible iframe.)
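
        For readers who want to reproduce the three columns of those tables, here is a rough sketch of a timing harness. It is not the code from the gist; `measure` and `summarize` are invented names, and the snippet text is a shortened placeholder. The average column is simply the total divided by the 500 rounds (for example 2279.0 / 500 = 4.558).

        ```javascript
        // Run `work` once per round and record each duration in milliseconds.
        function measure(work, rounds) {
          const times = [];
          for (let i = 0; i < rounds; i++) {
            const start = performance.now();
            work();
            times.push(performance.now() - start);
          }
          return times;
        }

        // Reduce a list of timings to total, median, and average.
        function summarize(times) {
          const sorted = [...times].sort((a, b) => a - b);
          const mid = Math.floor(sorted.length / 2);
          const median = sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
          const total = times.reduce((sum, t) => sum + t, 0);
          return { total, median, average: total / times.length };
        }

        // 5000 snippets per document, as in the test config above.
        const snippet = "<p>1943965927 52 27 5</p>\n";
        const html = "<!doctype html>" + snippet.repeat(5000);

        // In a browser console the workload would be the real parser:
        //   () => new DOMParser().parseFromString(html, "text/html")
        // Outside a browser there is no DOMParser, so this stand-in only scans the string.
        const standIn = () => html.split("<p>").length;

        console.log(summarize(measure(standIn, 50)));
        ```

        Reporting the median next to the average helps because a single slow round (garbage collection, a background task) shifts the average far more than the median.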

  • toqy 4 years ago

    > For some prettier mark up.

    But then if you run it through Prettier it'll add all the closing tags for you :)

    • throwaway894345 4 years ago

      If you’re running it through a processor, why not just write Markdown and call it a day?

      • galaxyLogic 4 years ago

        Is there a standard definition for the "Markdown" -language?

        There are several for different HTML versions, and it is standardized that you can omit some closing tags and some tags altogether.

        The benefit of writing in a standardized language is that later you or anybody can run tools against your sources that check for conformity.

        So that is why I prefer HTML. But I would like to hear your opinion on what is the best mark-down dialect currently?

        • throwaway894345 4 years ago

          Yes, CommonMark is a standard with implementations in many different languages.

          • galaxyLogic 4 years ago

            That is an interesting development.

            From their Github page I read: "The spec contains over 500 embedded examples which serve as conformance tests."

            So it's not so simple any more, is it?

            (https://github.com/commonmark/commonmark-spec)

            • jmalicki 4 years ago

              Fewer than 1,000 conformance tests for a standard? Sounds pretty simple to me; there's no way you could make an HTML compliance suite that small.

            • throwaway894345 4 years ago

              > So it's not so simple any more, is it?

              I claimed the specification existed, I didn’t claim it was a simple specification.

              • galaxyLogic 4 years ago

                I'm not claiming you claimed it was a simple specification :-)

                I just find it interesting. This would indicate to me that there are 500 "features" in the language. I thought mark-down languages just provided a few shortcuts for producing the most commonly needed HTML features and then provide a fallback to HTML. So if you cannot do it in the markdown language, use HTML instead.

                • Thiez 4 years ago

                  I can't really be bothered to take a look at the tests, but I strongly doubt there are actually 500 features. A large part of those tests are probably trying combinations of features. E.g. suppose markdown only had tables as a feature, and nothing else. That feature alone deserves several tests (for tables of various sizes, edge cases such as having only the header, having rows with an incorrect number of columns, etc.).

                  But let's assume we can get away with just a single test for tables. And then we introduce the features "section headers" and "bold" and "underline". All these features can interact (e.g. underlined bold section headers), so we want to test combinations of all those features, and have a nice combinatorial explosion.
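
                  To put a number on that explosion, the sketch below enumerates every non-empty combination of the four features named above. The `combinations` helper is made up for this illustration, and a real test suite would not necessarily cover every subset.

                  ```javascript
                  const features = ["tables", "section headers", "bold", "underline"];

                  // Return every non-empty subset of `items`.
                  function combinations(items) {
                    const result = [];
                    // Each bit of `mask` says whether the item at that index is included.
                    for (let mask = 1; mask < 2 ** items.length; mask++) {
                      result.push(items.filter((_, i) => mask & (1 << i)));
                    }
                    return result;
                  }

                  console.log(combinations(features).length); // 15, i.e. 2^4 - 1
                  ```

                  The count doubles with each added feature, so even one test per combination outgrows the feature list very quickly.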

                  • galaxyLogic 4 years ago

                    I see, combinations. But also the ability to use different combinations of "basic" features in a sense are a specific feature too. Like you can mark text bold and you can mark text as representing a table. But can you mark text within tables bold? If you can that would to me be a "feature" too. If you can not, then that "feature" is missing.

      • hombre_fatal 4 years ago

        Well, one simply formats the source file as you write it. The other requires a infile -> outfile build step that's more complex.

        Whether the latter is worth it tends to depend on other things than parse time.

        • throwaway894345 4 years ago

          Why would I care if one is merely “formatting” or not? If I have to run a tool either way, I would prefer one that accepts a user-friendly input language and decouples content from presentation.

          • hombre_fatal 4 years ago

            Because transforming an .md file into an .html file is a lot more invasive (though taken for granted here I think) than just writing the .html file. It's a build step where there wasn't one before.

            I'm not saying it's never worth it.

          • anjbe 4 years ago

            How does Markdown decouple content from presentation?

  • jokoon 4 years ago

    Are more strict html parsers/renderers, and aren't they faster?

    • hombre_fatal 4 years ago

      Lenient parsers still benefit from strict input because it lets them avoid lookaround/backtracking.

      • vbezhenar 4 years ago

        What do you mean by lookaround/backtracking? You're inside <p>. You encounter another <p>. You can't nest one <p> inside another <p>, so you close current <p> and open new <p>. That's about it. I fail to see where do you need any kind of backtracking.

        • hombre_fatal 4 years ago

          Well, even in this one example, imagine parser combinators, which often mean backtracking the inner <p> so that you can commit to the `openTag('p')` parser. Or your logic may be `consume all tags that aren't <p>`, which is a lookahead.

          A better example here is whether you are lenient and accept unescaped html entities like "<" vs "&lt;". If you require it to be escaped "&lt;" or if all entities in your inputs are always escaped, then your text parser never has to backtrack. But if you are lenient, your text parser can do catastrophic levels of backtracking if there is a single "<" somewhere (unless you are careful). Imagine input that starts off "<a small mouse once said". It could be quite a while before your parser knows it's not an anchor open tag.

    • shakna 4 years ago

      > Are more strict html parsers/renderers, and aren't they faster?

      Are what more strict? You're missing a subject there.

      At a guess, you're referencing the differences between Chrome/Firefox rendering times? And are surprised that Chrome is always slower?

      In the same completely unscientific stat taking, I found that Chrome was significantly faster at parsing the HTML head element of a document than Firefox, and that difference was enough for Chrome to pull ahead of Firefox in overall rendering times for smaller pages. (Chrome was about 30% of Firefox's time spent in the head.)

      However, Firefox was faster at parsing the body, and as I had a larger-than-usual body (50k words is not your average webpage), Firefox was overall faster.

    • chrismorgan 4 years ago

      To you and all that have responded: there is no variation in HTML parsing between browsers. All engines are using precisely the same exhaustively-defined algorithm. There is no leniency or strictness. Their performance characteristics may differ outside of parsing, which includes what they do with the result of parsing, but in the parsing itself there should be basically no difference between engines or parsers.

  • hsbauauvhabzb 4 years ago

    That’s interesting, but surely relying on user agent to ‘fill in the gaps’ is error prone? Surely transpiling prior or during render would be more resilient than trusting browser behaviour

    • lolinder 4 years ago

      If you're in a situation where resilience against odd browser quirks matters, you probably shouldn't be writing HTML like this anyway. This style is fine for writing HTML for a blog. For any kind of application, it would be a nightmare to try to maintain.

      Every time the author introduced a shorthand, they had to clarify that it works only in specific situations. The result of those qualifiers is that you will have to have some code written in the more verbose style anyway. Context switching between those styles and having to decide whether the shorthand works in any given case just isn't worth it on a large project that you'll be making changes to over time.

    • chrismorgan 4 years ago

      HTML parsing is exhaustively defined, so there’s not any filling of gaps, but only rules to be aware of. If you don’t know those rules, this may be error-prone, but if you do, it’s not, and things like the start and end tag omissions discussed in the article are quite straightforward rules to learn.

myfonj 4 years ago

Although, as the article correctly points out, omitting the HTML tag is technically fine, there is one rather important argument for its inclusion: it can and should have a LANG attribute:

    <html lang=en-GB>

It's not verbose after all, and IIUC may be omitted if and only if the document is served with corresponding information in `Content-Language:` HTTP header, but nasty (or rather annoying) things may happen if that fails [1], so when it comes to "right HTML", following this advice sounds reasonable.

[1] https://adrianroselli.com/2015/01/on-use-of-lang-attribute.h...

timw4mail 4 years ago

No thanks. With the full markup you can see where things end, not just where they start.

I think this is similar to semicolons in Javascript: with semicolons at the end of each statement there is no ambiguity, but if you do not have semicolons, you have to know about edge cases, like if a line starts with a square bracket or paren.

  • exyi 4 years ago

    You can't disable this "feature", so you still don't know where things end / begin. Some tags can't be nested in <p> while you could expect that they can:

      <p>
         Paragraph with a list won't work as you could think
         <ul> <li> Test </li> </ul>
         Something else
      </p>
    
    Parses to:

      <p>
        Paragraph with a list won't work as you could think
      </p>
      <ul> <li> Test </li> </ul>
      Something else
      <p></p>
    
    
    Similarly, in JS you are paying the price for optional semicolons even if you decide to use them.

       return
       {
          x: 1
       };
    
    Will still not work even if you use semicolons elsewhere. So I don't see any advantage to actually using semicolons. JS is not worse than Python with it's basic inference, and yet in Python people will almost yell at you if you attempt to use a semicolon :)
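
    The `return` case can be checked directly. In this sketch the function names are invented; the first function is the snippet above wrapped so it can run.

    ```javascript
    // A semicolon is inserted right after `return`, so the braces below become a
    // block statement containing a label (`x:`) and the expression `1`.
    function brokenReturn() {
      return
      {
        x: 1
      };
    }

    // Keeping the opening brace on the same line as `return` yields the object.
    function fixedReturn() {
      return {
        x: 1
      };
    }

    console.log(brokenReturn()); // undefined
    console.log(fixedReturn()); // { x: 1 }
    ```

    The trailing semicolon in `brokenReturn` changes nothing, since the statement has already ended on the `return` line.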

    I'd much prefer these features to be opt-in (yea, give me XHTML back for generated content). But when I can't disable them, why not embrace them ;)

    • minitech 4 years ago

      > JS is not worse than Python with [its] basic inference

      JS semicolon insertion is worse, because it depends on the following line. In Python, an unescaped newline outside of brackets always ends the statement, but in JavaScript, parentheses, brackets, binary operators, and template literals on the following line change that. The Python rule also makes a dangling operator outside of brackets a syntax error, which is a potential source of unintentional introduction of ASI when making changes to code in JavaScript.
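
      A small illustration of the "depends on the following line" point, with function names invented for the example: a line that starts with a square bracket is joined to the statement before it.

      ```javascript
      // Written without semicolons. The line starting with `[` continues the previous
      // statement, so it is parsed as `const count = 3[1, 2].forEach(...)`.
      function withoutSemicolons() {
        const seen = []
        const count = 3
        [1, 2].forEach((n) => seen.push(n))
        return seen
      }

      // The same code with explicit semicolons: two separate statements.
      function withSemicolons() {
        const seen = [];
        const count = 3;
        [1, 2].forEach((n) => seen.push(n));
        return seen;
      }

      try {
        withoutSemicolons();
      } catch (error) {
        console.log(error instanceof TypeError); // true
      }
      console.log(withSemicolons()); // [ 1, 2 ]
      ```

      In the first function `3[1, 2]` evaluates to `undefined`, so the failure only shows up at run time, when `.forEach` is looked up on it.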

    • Ontonator 4 years ago

      On the point about semicolons in JavaScript, the logic I’ve heard is that if you consistently use semicolons, you can have a linter warn you if there is an inferred semicolon, so you know if you have made a mistake. If you don’t use semicolons and accidentally produce code with an inferred semicolon that should not be there, then there is no way for any tool to warn you. (Well, no general way; in your example with the return, many linters would warn you about unreachable code.)

      • epolanski 4 years ago

        I never use semicolons and I never have these issues.

        Even in the rarest cases I maybe had them like when copy pasting in the wrong place they were so rare that I don't think it's worth the additional noise of semicolons.

        • leonsegal 4 years ago

          There are 3 major footguns with automatic semicolon insertion iirc (one involves having the return statement on its own line). As long as you know them all it's fine I guess, but not my taste.

    • progval 4 years ago

      > give me XHTML

      You can still use XHTML; just send "Content-Type: application/xhtml+xml". You can express the same things as an HTML document, but with a saner parser mode.

      • chrismorgan 4 years ago

        > You can express the same things as an HTML document

        This is not quite true. There are a number of mutual incompatibilities between the XML and HTML syntaxes at both parse and run time.

        At parse time, it’s mostly in the direction of XML syntax making things possible (e.g. nesting paragraphs or links, which the HTML parser prevents), but also in the other direction (e.g. <noscript> has no effect in XML syntax since it’s essentially an HTML parser instruction); you’ve also got case sensitivity which matters for SVG; and there’s the matter of the contents of <script> and <style> elements and their handling of <>&, where the best but still imperfect solution is a crazy mix of XML comments, JavaScript/CSS comments and XML CDATA markers. (See https://www.w3.org/TR/html-polyglot/ for more details of all this kind of stuff.)

        At run time, behaviour changes in such a way that it will break some JavaScript libraries, due to differences like .tagName being lowercase instead of uppercase, and .innerHTML requiring and producing XML syntax.

      • epolanski 4 years ago

        What is saner parser mode?

        • anjbe 4 years ago

          In this context (although I would dispute calling it “saner,” as someone who was fully on board the XHTML train a decade ago), an XML parser, which among other things enforces that the markup is “well‐formed” by the XML definition, thus prohibiting implicit closing tags and unquoted attributes.

  • iamben 4 years ago

    Agree 100%. It's also about a thousand times easier for people with a very basic HTML understanding to parse (if you open something, with pretty much the exception of an image, you gotta close it).

    Periodically I have to send code to people who then make some of their own changes inline. God forbid trying to explain "yeah, they don't need to be closed, but that does because it's nested and..." Disaster (/hours of extra support) waiting to happen.

  • currysausage 4 years ago

    You have to know HTML in order to know where things end. Otherwise, you will see nested paragraphs here:

      <p>Hello <p>World</p>!</p>
    
    when it’s actually two consecutive paragraphs, an exclamation mark outside of any paragraph, and a closing p tag without an opening counterpart.

    And when you do know HTML, you might as well omit optional tags.

    If you think that HTML syntax is crazy, I won’t blame you, and you might consider XHTML instead, but you should be prepared for different woes.

  • mst 4 years ago

    I have a tendency to forget ASI in JS exists when I've only been looking at my own code rather than other people's for a while.

    I remain unconvinced it was a wise idea.

robgibbons 4 years ago

This works for blog posts, where the body of the document is one long block of paragraphs, but I suspect this style would quickly become untenable for complex apps. Indentation _is_ information, which is lost here.

  • clairity 4 years ago

    it doesn't work for even slightly complex documents either. there's been a little meme-fad lately around minimalistic html like this, but to claim it's the "right" way to write html is pompous at best.

    not closing tags for instance is really asking for future headaches. sure, it works for a simple text list, but not when it gets even a little complicated (add links, images, buttons, etc.). even worse are p tags, where you have to memorize a whole matrix of what it can contain and what breaks out implicitly. with every insertion/deletion, you need to check the list. it's needless mental drag.

    • niconii 4 years ago

      You have to know about what breaks out of <p> tags regardless of whether or not you leave off the end tag, though.

      <p><div></div></p> is invalid HTML because <div> ends the paragraph, resulting in an unpaired </p>.

      • anjbe 4 years ago

        And not just because of that. In XHTML‐as‐XML, where <div> does not implicitly end the paragraph, what you posted is still invalid because <p> cannot contain <div>.

  • aparks517OP 4 years ago

    I've been using this style - with some tweaks - for web apps too. I don't think I have it completely figured out yet, but it's promising so far. You can view the source of http://lofi.limo/ to see how it's working out.

    • jaywalk 4 years ago

      I feel like this style just makes it harder to read and understand the HTML. But hey, if it works for you, great.

      • sph 4 years ago

        This is the output of an app/templating system, i.e. not a single HTML page. Have you ever read the HTML of any dynamically generated page? It's unreadable.

        • jaywalk 4 years ago

          > This is the output of an app/templating system, i.e. not a single HTML page.

          I don't think that's correct. The article is literally talking about how to write HTML, and explaining the benefits of writing it in this style.

        • julianlam 4 years ago

          > Have you ever read the HTML of any dynamically generated page? It's unreadable.

          Not with that attitude... if you write consistently and with intention, it turns out just fine.

          Check out the source for https://try.nodebb.org, for example. Dynamically generated, (mostly) syntactically correct, (mostly) human readable.

        • ryanbrunner 4 years ago

          All the HTML code in the app I maintain is pretty readable. At some point of complexity any HTML is difficult to parse, but if I hand-wrote a page in my app I think the HTML would be largely the same.

  • egeozcan 4 years ago

    > Indentation _is_ information, which is lost here.

    Isn't it a "view" of information? Any sufficiently advanced text editor can recreate it with a simple key combination.

    • robgibbons 4 years ago

      Sure, but the author is advocating that you compose HTML this way. It would quickly become a mess of nested elements with zero visual indication of hierarchy.

      The DOM is a tree, with nested elements. Losing that information doesn't get you anything but tag soup (which is, oddly, what the author suggests this style is supposed to avoid)

      • SkeuomorphicBee 4 years ago

        First and foremost, the author advocates for organising documents in a much flatter DOM tree. In this style all major page elements sit at the same hierarchical level, so there is no "mess of nested elements"; there is no need for visual indication of hierarchy if there is no hierarchy to begin with.

        I think that is a very compelling format for a text-first web page, like a blog post or news article. Of course it is a coding style not well suited for complex web apps with deep hierarchy.

      • LeonB 4 years ago

        In a tree you have branches off branches off branches etc.

        You can’t orient yourself - you can’t tell where you are - unless you count the branches. And indenting makes that visible.

        In the examples for TFA, you can tell your location from the names of the elements. Eg <td> is enough for you to know you’re probably inside a tr inside a table.

        And that is the more common case than the general tree example.

        But a method of describing html does have to answer the question of how it represents arbitrarily deep nesting. But I like the answers it’s given for the more common case of structures that are not arbitrarily deep.

        • Jcowell 4 years ago

          What’s a TFA?

          • dec0dedab0de 4 years ago

            The Fine (or Fucking) Article

            • shkkmo 4 years ago

              I always read the 'F' as "Featured"

            • LinuxBender 4 years ago

              Urbandictionary agrees. I reprogrammed myself to read it as "Fabulous" using a text replacement addon.

            • nayuki 4 years ago

              And this terminology (TFA) comes from at least the Slashdot days, about 20 years ago.

              • anjbe 4 years ago

                In turn, probably descended from “RTFM”—on Slashdot people who commented despite obviously not having read the article were told to “RTFA,” which eventually led to “TFA” as a general term to refer to the original article.

            • mlok 4 years ago

              The "F" has always been "Forementioned" for me.

      • nerdponx 4 years ago

        The problem is that HTML has multiple uses. The author is describing the case of authoring content, with HTML used as a markup language. However a lot of websites and web applications use HTML more like a layout and templating engine for a GUI framework.

      • falcolas 4 years ago

        Only if the formatter is unaware of HTML. If it can't handle automatically closing <p> tags, then it's unaware and is trying to treat HTML like XML.

        Or, to put another way, HTML != DOM, even though HTML can be rendered into a DOM.

    • Something1234 4 years ago

      Sometimes I indent in a way that my text editor doesn't exactly understand to better state where complex expressions begin and end.

    • jahewson 4 years ago

      I don’t think the author means the information-theory kind of information. I could gzip the file without a loss in that kind of information.

  • jahewson 4 years ago

    Incorrect indentation is therefore misinformation.

  • pwdisswordfish9 4 years ago

    That’s why HTML is not a language for ‘apps’.

    • Spivak 4 years ago

      Except for the fact that native apps also use SGML or XML inspired markup for their layout engines. A tree of heterogeneous objects maps extremely well to how people think about UI.

      • WillusFredus 4 years ago

        I agree that a tree structure can work well for mapping UIs, but HTML does not. It was specifically designed as a textual markup language. Its role has been expanded, but it has been done so poorly.

        What really needs to happen is a separation of HTML from UI markup elements. HTML will be used solely for textual markup and a new markup language can be used for UIs. This would allow us to return to a proper separation of concerns.

        • ryanbrunner 4 years ago

          Sure, but that's an argument for creating new paradigms for having instantly available non-downloaded "apps". Right now, if you want a lot of what a webapp offers (100% cross-compatibility with any platform, instant updates, online syncing for free), you're basically stuck with HTML / Javascript.

        • TedDoesntTalk 4 years ago

          > What really needs to happen is a separation of HTML from UI markup elements

          Do you mean CSS? Using <b>, <I>, <strong>, etc has been “bad form” for a while (maybe not strong though)

          • frosted-flakes 4 years ago

            <strong> and <em> are the recommended ways to semantically bold and italicize text. <b> and <i> don't have any semantics and can still be used where it makes sense.

btrettel 4 years ago

Regarding writing "one-sentence-per-line", I've noticed that style before in LaTeX. While I don't use that style, one advantage that I like is the ability to include comments on the sentence level in LaTeX.

So instead of this:

  First sentence. Second sentence. % Comment on first sentence.

I can write:

  First sentence. % Comment on first sentence.
  Second sentence.

(Of course, one could define a new TeX macro that doesn't display anything to add comments anywhere in-line. That's not as readable, though.)

I've also read that one-sentence-per-line works better with diff programs, but I haven't had any problems with the program meld, so this isn't convincing to me. The advantage the linked article mentions in terms of rearranging sentences also is worth considering, though I haven't found the normal way to be that bad so I'm not convinced by that either.
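The diff claim can be illustrated with a small sketch. It uses Python's standard difflib as a stand-in for a diff program (an assumption; meld and git diff are not involved), and the helper name changed_lines is hypothetical. The same one-word edit is applied to a paragraph stored on one line and to the same paragraph stored one sentence per line.

```python
import difflib


def changed_lines(before, after):
    """Return only the removed ("- ") and added ("+ ") lines of a line diff."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# The same two-sentence paragraph, stored two ways.
paragraph_before = "First sentence. Second sentence."
paragraph_after = "First sentence. Second sentence, revised."

sentences_before = "First sentence.\nSecond sentence."
sentences_after = "First sentence.\nSecond sentence, revised."

# Paragraph-per-line: the untouched first sentence is part of the changed line.
for line in changed_lines(paragraph_before, paragraph_after):
    print(line)

# Sentence-per-line: only the edited sentence is reported.
for line in changed_lines(sentences_before, sentences_after):
    print(line)
```

Both layouts report one removed and one added line, but only the sentence-per-line layout leaves the untouched sentence out of the diff. Word-level diff modes narrow this gap, so treat the sketch as an illustration of line-based diffing only.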

Some other links on this coding/writing style:

https://rhodesmill.org/brandon/2012/one-sentence-per-line/

https://news.ycombinator.com/item?id=4642395

http://www.uvm.edu/pdodds/writings/2015-05-13better-writing-...

  • pronoiac 4 years ago

    I've been working on turning a pretty massive scanned book into a git repo of markdown files, with multiple collaborators. Using sentence-per-line has been useful (compared to line-per-paragraph) because, even with --word-diff, PRs are far more concise, and merge conflicts are rarer. From memory, with paragraph-per-line, I think a series of paragraphs, each changed, even with minor changes, kinda breaks git diff and GitHub diff.

    • aparks517OP 4 years ago

      Oh, wow... I hadn't even thought of the diff angle, but it makes all the sense in the world. I've heard some authors even start each clause on its own line. I'm not sure I'm ready for that yet.

buzzy_hacker 4 years ago

> A few years ago, I found out I'd been tying my shoes wrong for my entire life. I thought laces came undone easily and didn't usually look very good. At least that's how mine were, and I never paid much attention to anyone else's. It took a couple of weeks to re-train my hands but now I have bows in my laces that look good and rarely come undone.

I’m equally interested in this as the HTML. Any clue what the author is referring to?

kuschku 4 years ago

I appreciate that this blog post itself is written in the exact same style! I really miss being able to read the view-source: version of websites easily, but this blog post does it well :)

  • Legion 4 years ago

    Certainly beats the "you don't need so much JavaScript!" blog posts that load 10 external scripts.

    • account42 4 years ago

      Or the articles about tracking and the ad industry with consent popups asking for permissions to let their ad "partners" track you.

sivers 4 years ago

Thanks to Aaron for posting this. Such a great reminder.

Anyone interested in this subject, check out a series of three very tiny books called “UPGRADE YOUR HTML” by Jens Oliver Meiert.

They give great step-by-step examples for eliminating optional tags and attributes, reducing HTML to its cleanest simplest valid form. The author is a super-expert in this specific subject, working with Google and W3C on this. His bio here: https://meiert.com/en/biography/

From LeanPub: https://leanpub.com/b/upgrade-your-html-123

From Amazon: https://www.amazon.com/gp/product/B08NP4GXY2/

  • xaduha 4 years ago

    > Such a great reminder.

    Reminder of what? To me this reads like satire, even if it wasn't intended as such.

gildas 4 years ago

This is how SingleFile writes HTML by default :). However, it is also the most duplicated issue in the tracker.

  • isp 4 years ago

    Example "issue" (feature) from the tracker: https://github.com/gildas-lormeau/SingleFile/issues/967

    (Also: a huge thank you for creating SingleFile. One of my favourite extensions of all time.)

  • sph 4 years ago

    You can link it: https://github.com/gildas-lormeau/SingleFile

    Pretty neat extension!

    • gildas 4 years ago

      I was hesitating, thanks!

      • pwdisswordfish9 4 years ago

        The remarks by the person who opened #967 are beyond frustrating—and it's frustrating to see your responses to them. People putting stuff into the bugtracker that aren't bugs deserve a harsher response. Don't enable "putting stuff into the bugtracker without clearly articulating a defect [in the form of observed behavior versus expected behavior‡]" to be a viable way to interact with a project. Indulging these kinds of persons' requests for support and freeform banter is harmful in the long run. Giving them the answers that they're looking for even though their questions/comments are out of scope is way too forgiving, and it ends up causing problems for other maintainers when these numbskulls inevitably pop up around other projects and expect the same standard of treatment because they take it as a given that their fripperies are kosher.

        ‡ including sound, solid reasoning for why the former is incorrect and the latter is correct

        • gildas 4 years ago

          At first, I thought people would respect the issue template. In practice, very few do, even when a proper bug is reported. I completely agree with you but it seems to be a losing battle. So I just deal with these kinds of cases according to my mood. Concerning the bug #967, maybe I was not angry enough. Overall, the atmosphere on the bug tracker is fortunately very positive.

iostream24 4 years ago

At this point you are better off making a DSL that compiles to html.

- it will be possible to be consistent with closing tags or not

- you can do other arbitrary things to improve your working experience with it

Ever tried Slang styled templates?

zeven7 4 years ago

I like this idea. As someone who argued vehemently for XHTML a couple decades ago (even wrote a fair amount of XSLT in those XML-crazed days), who's been wandering between different levels of "how strict should I be?" since that time, this article marks the step of my journey where I feel like I can really embrace the goodness that SGML has to offer for the first time. So thank you. This article has changed me.

sylware 4 years ago

Regarding tables, there is one trick: size of borders are actually weighted semantic separators, and should be in HTML, not in CSS.

  • Julesman 4 years ago

    Regarding tables, don't use tables. :)

    • egypturnash 4 years ago

      …for non-tabular data such as “your pretty design elements that frame and organize the text because it is 1995 and CSS doesn’t exist yet and this is the only tool at your disposal for aligning stuff across the page”. Or because it is 2000 and putting stuff where you want it is a hell of CSS2 floats and box models and eventually you just say “fuck it” and assign table-like behavior to a bunch of divs because Tables For Layout Are Considered Harmful.

      If you’ve got stuff that would look good as a table, use a table.

      • temporallobe 4 years ago

        It’s funny you bring this up because while I have joined the Tables For Layout Are Considered Harmful club, I never really have heard a completely convincing argument on why tables have this bad rap. I think it’s mostly because, semantically, tables don’t make sense for layout, but back in the days before frameworks such as Foundation and Bootstrap (and more recently native CSS3 mechanisms), tables with invisible borders were nearly perfect for layout containers.

        • jerf 4 years ago

          The "Tables are Harmful" club largely came from the crew who thinks HTML carries lots of semantics and that if you don't use the Blessed Tags that carry those semantics you're doing Bad Design.

          The rational evidence in favor of this claim has always been weak. The "div" tag basically finished it off. The people who use HTML "semantically" have always been dwarfed by the people just making it look good on the screen, and the number of applications that use those semantics has always been small and on the fringe for something so putatively important.

          However, the idea persists to this day despite its near complete failure to pay off significantly in nearly twenty years, and I'm sure someone will angrily reply to this and list the incredibly useful semantic HTML features that they and fifteen other people have found to be just incredible. Perhaps we'll also get the traditional citation of the Google info boxes, which have nothing to do with the semantic web and everything to do with Google throwing a crapton of machine learning and humans at the problem of parsing distinctly non-semantic HTML until they cracked the problem.

          (An honorable mention to screen readers, which sorta do benefit, but still nowhere near as much as you might casually expect.)

          Today the reason not to use tables is more just that it's inconvenient to do things like have a mobile and desktop layout. I believe they've got all the tools nowadays to tear into a table-based layout, break the tables apart, and treat it like any other CSS-styled content, but that's relatively recent, and still a silly way to operate when you could just use normal layout elements ("div" if nothing else) like a sane person and not have to undo the table layout before you can manipulate them properly.

          • spread_love 4 years ago

            > honorable mention to screen readers

            Too limited, and deserving of much more than an honorable mention.

            Accessibility should be a fundamental consideration of any reasonably sized app, using <table>s to markup tables is part of that.

            Assistive devices are not limited to screen readers, and it's just good practice to use tables for tables.

            CSS Grid has landed in all major browsers, if you want a grid layout, use grids for layout.

          • WillusFredus 4 years ago

            > the idea persists to this day despite its near complete failure to pay off significantly in nearly twenty years

            I am not clear on exactly what "the idea" refers to, perhaps you could clarify. Also, how has the idea "completely failed"? And what would complete success look like?

            • jerf 4 years ago

              The idea is the "semantic web". Success would look like almost everyone here having to know a lot more about the "semantic web" to do their jobs, such that I wouldn't have to explain to anyone what it was because it would just be how things worked, because it would be that important, and they couldn't operate without it because they wouldn't be able to compete against other websites without the staggering benefits that super-careful, expert semantic design brings them. Rather than just learning the layout and adding a few extra accessibility tags as needed.

              As it stands now, it's very practical to just slap some <div>s down and do some CSS and be done.

          • pwdisswordfish9 4 years ago

            Obnoxiously bad take.

            > Google info boxes[...] have nothing to do with the semantic web and everything to do with Google throwing a crapton of machine learning and humans at the problem of parsing distinctly non-semantic HTML until they cracked the problem

            This is verging on /r/SelfAwarewolves material.

            • jerf 4 years ago

              I'm pretty sure you're misinterpreting it. Google did not simply write a web scraper that pulls a <business_hours> or a <dc:business_hours> tag out of the web. They wrote a web scraper that super, super intelligently examines the HTML and looks for "anything that looks like business hours"; maybe it's in a table, maybe it's days of the week separated by &nbsp; and <br>, maybe it's in <div>s or <span>s with suggestive CSS class names, maybe it's just in a pile of other HTML. The exact promise of the Semantic Web was that we could just load up a page and get a <business_hours> out of it. Google had to extract the "semantics" with everything but the "semantic web", because the "semantic web" is a no-show. Throwing a crapton of machine learning and humans at extracting semantically useful information from a page is precisely what the Semantic Web isn't.

              Which is why it is bizarrely unselfaware when Semantic Web advocates almost inevitably cite that as their biggest success. It isn't. It's their biggest failure.

              • pwdisswordfish9 4 years ago

                > I'm pretty sure you're misinterpreting it

                You should be more sure of the things you're pretty sure of before saying you're sure of them.

                There was no misinterpretation—from this end, that is. Your comment wasn't particularly sophisticated. It didn't require explanation.

                > Google did not simply write a web scraper that pulls a <business_hours> or a <dc:business_hours> tag out of the web. They wrote a web scraper that super, super intelligently examines the HTML and[...]

                No shit. The value proposition of the semantic web follows from how the world would be much better off if that weren't necessary. It has always been the case that, without the "semantic" half of "semantic web", attaining Google-level mastery over the Web's messy inputs is really, really difficult and requires Google-level resources. This isn't news. Yet you presented it as if it were an insightful observation wrapped in sage wisdom.

                In your attempt to "prove" by counterexample what's Wrong with the semantic web, you just end up undergirding its very premise.

                > Which is why it is bizarrely unselfaware when Semantic Web advocates almost inevitably cite that as their biggest success.

                You cited them. You are literally the only person who mentioned them here, at all. You brought them up.

                Saddling someone who advocates for X with the burden of defending position Y that you yourself have pulled from thin air is a textbook example of a bad argument. If you defeat some easily take-downable opponent (a 6-year-old, let's say—and one who is made of straw, for good measure) and then plan to enter the ring in subsequent matches having only bothered yourself with the thought that you will face the threat of another strawchild, that's not wise. It's stupid.

        • eyelidlessness 4 years ago

          The semantics being wrong is convincing enough to me. That’s fundamental to accessibility. But if that doesn’t convince you: they’re not even good at layout. It’s much harder to make a page even minimally responsive with a table than with more semantically appropriate markup; you’d effectively have to revert all of the table styling, at which point why bother?

        • kixiQu 4 years ago

          https://eev.ee/blog/2020/02/01/old-css-new-css/ <-- Sounds like it depends on how complicated you want things to get within those containers; Eevee here mentions nesting three levels of tables, which... Ew.

          Also, not being able to rearrange blocks for different size displays is kind of a non-starter relative to the mobile internet, which I'd guess was more important than frameworks.

          In my own hobbyist stuff, though, there's just something a little gross about putting layout in the HTML -- I want HTML to represent semantic structure, in ways I'd be okay with Lynx displaying, and I want CSS to do all the lunatic nonsense to make me happy with how it looks on a modern browser. I wonder how much it's this aesthetic principle of separation that motivates others as well.

        • giantrobot 4 years ago

          Tables don't lend to responsive layouts. If you've got a stereotypical layout with the middle row having a main content and sidebar column you can't really reflow that sidebar below the main content on mobile. With block elements (divs or semantic blocks) and CSS it's super simple to collapse multiple columns down to a single column for mobile. It's also simple to redo the same layout to handle super wide displays as well.

          Tables for layout were fine back when everyone was browsing the web on SVGA, XGA, or even SXGA screens at 96dpi (72dpi on the Mac). Now a visitor might be on a high DPI display in portrait orientation, full screen on a 4K monitor, or anywhere in between. I think it's a bit disrespectful to visitors not to have a responsive page layout. Tables are a liability for responsiveness and should only be used for tabular data.

        • marcosdumay 4 years ago

          Doing layout with tables creates a mess of non-semantic cells, with spans everywhere. That is hard to read, hard to write, brittle on changes, and obscures the actual content. If you take a random page from the tables era, the odds are great that you won't be able to tell what text goes next to what other text.

          Divs that follow the document's semantic hierarchy and are positioned on your CSS have none of those issues.

          Anyway, a lot of ways to use Bootstrap and other grid-based frameworks reintroduce the same problems. And if you really want to display things in a table, well, a table fits your requirements quite well.

        • clairity 4 years ago

          you can still use table layout, just not tables for layout, via `display: table` and its many cousins (table-row, table-cell, etc.). it's a bit cumbersome, so not something to use everywhere like the old days of tables plus spacer.gif's.

          https://developer.mozilla.org/en-US/docs/Web/CSS/display#int...

        • frosted-flakes 4 years ago

          ...and absolutely unreadable for anyone using a screen reader.

          • layer8 4 years ago

            This was never true, I believe, and only a theoretical issue invented by the semantic-HTML obsessed.

            From [1]: “It is sometimes suggested, even by some accessibility advocates, that layout tables are bad for accessibility. In reality, layout tables do not pose inherent accessibility issues.”

            [1] https://webaim.org/techniques/tables/

        • layer8 4 years ago

          There was never a well-substantiated argument for the alleged harm of layout tables. Demonizing them mostly just stemmed from the cult of wanting to completely confine layout to CSS vs. expressing semantics with HTML. In the end that “CSS zen” was never really achieved, because the dependencies between HTML structure and styling are just too many and too strong.

          • WillusFredus 4 years ago

            > There was never a well-substantiated argument for the alleged harm of layout tables.

            What was the best argument that you can recall? What were some of the bad ones? What does "harm" mean in this context?

            > because the dependencies between HTML structure and styling are just too many and too strong.

            Which dependencies? What would a structure/styling language combination look like that lacked or had weak dependencies?

            • layer8 4 years ago

              > What was the best argument that you can recall?

              I mostly recall “layout belongs in CSS files” (so a matter of principle) and “layout tables are bad for accessibility”. The latter could in theory be an important point, but in practice screenreaders had already adapted, and had (still have) very practical heuristics to distinguish layout tables from data tables (see e.g. https://webaim.org/techniques/tables/).

              The thing is, at the time, using CSS to achieve the equivalent of layout tables was an exercise in frustration and futility, in that the results were exceedingly brittle and very often broke when either the table content, the surrounding elements or the browser window size changed too much.

              Nowadays we have CSS grid and flexbox of course, but I imagine that in some cases a layout table could still be the most straightforward solution today.

              > Which dependencies?

              The fantasy back then was that it would be possible to define the HTML content and structure completely independently from layout and styling considerations, and then a separate CSS file could be used to specify any conceivable styling and layout for that content. While that is true to a certain extent, it usually breaks down as soon as you need HTML elements to be in a different order or nesting relation, or when you need additional intermediate nesting DIVs, etc.

              In reality the HTML structure and the CSS structure (bound to each other by IDs, class names, hierarchical selectors etc.) are so closely intertwined, and the mapping points (i.e. IDs, class name combinations, etc.) are so many that, for the most part, only superficial changes can be made to one side without having to also make some adjustment on the other side. Ideally, it would be possible for an HTML author and a CSS author for the same web page to mostly work independently from each other. In reality this is almost impossible, except for the case where the HTML remains basically unchanged and the CSS can change within the constraints of the existing HTML structure.

              Banning layout tables was never going to be a major factor in coming substantially closer to the imagined ideal here.

              > What would a structure/styling language combination look like that lacked or had weak dependencies?

              I think it’s inherently difficult, because you will always need to specify which styles/classes should apply to which element in a rather fine-grained manner, which just means there will always be a lot of dependencies between the two sides.

              One thing you’d need in order to realize arbitrary layout is a way of mapping structured content into a different structure. That basically means having a functional programming language to define the mapping, if you want to have full flexibility.

    • temporallobe 4 years ago

      Joking aside, tables are perfectly acceptable and actually the most appropriate markup for tabular data; in addition, accessibility tools know how to read them (IF they are coded correctly, but that goes for any HTML). I use tables where needed, but of course never for layout.

    • Rygian 4 years ago

      Except, you know, for actual tables. :-)

dasil003 4 years ago

I like the aesthetic though I'm not sure how sustainable it is beyond basic content documents. On a side note though, I clicked around and big props to Aaron on the lofi.limo project, this is very cool.

  • aparks517OP 4 years ago

    Thank you for the kind words! I've been working on adapting this style for web apps, but I haven't got it figured out well enough to write an article about. Yet...

    I wouldn't mind if we had a bunch more basic content documents on the web.

mekster 4 years ago

HTML can't be fixed with a small trick like that.

Just use a templating engine like Pug and get away from most of the annoyances.

Forced indentation makes it clear which part of the text a given tag covers, you never need to close a tag, and you never write "class=" because classes and IDs are written in CSS selector notation, among many other tricks.

https://github.com/pugjs/pug#syntax

Unless the HTML I'm composing will be touched by people like designers who would get scared of new syntax, in which case I'll use Twig or Nunjucks, I'll never write plain HTML for myself.

There's also a very solid implementation in PHP as well.

https://github.com/pug-php/pug

You can either let server side (node.js or PHP) compile that on demand or let your editors compile them as you edit if you're working on a static file.

I really think the language humans write should deviate from the language the runtimes understand to get all the convenience while never breaking how runtimes/crawlers interpret your output. Same goes for Stylus against CSS.

account42 4 years ago

> However, any content which cannot go in a p element (most other block-display elements, for example) implies the end of its content, so we can usually leave off the end tag.

Note however that this means that the whitespace between paragraphs will be part of the paragraph which can be annoying if someone tries to copy the text on your website and gets an additional space after each paragraph which wouldn't have happened if you explicitly closed the </p> directly after the text.
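A rough sketch of the whitespace effect follows. It uses Python's standard html.parser, which is a tokenizer and not a full HTML5 tree builder, so the one implied-end-tag rule it models (a new <p> closes the open <p>) is hand-written here as an assumption. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of each <p>, closing an open <p> when another <p> starts.

    Only that single implied-end-tag rule is modelled; a real HTML parser
    handles many more cases.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = None  # None means "not inside a <p>"

    def _close_paragraph(self):
        if self._current is not None:
            self.paragraphs.append("".join(self._current))
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._close_paragraph()  # a new <p> implies the end of the open one
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._close_paragraph()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def close(self):
        super().close()
        self._close_paragraph()  # end of input also ends an open <p>


def paragraph_texts(markup):
    parser = ParagraphText()
    parser.feed(markup)
    parser.close()
    return parser.paragraphs


print(paragraph_texts("<p>One.</p>\n<p>Two.</p>\n"))
print(paragraph_texts("<p>One.\n<p>Two.\n"))
```

With explicit end tags each collected paragraph is just its sentence. With the end tags omitted, each paragraph also carries the newline that follows it, which is the extra whitespace that can show up when the text is copied.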

Also, you should keep the opening <html> and specify the language of your document even for english since e.g. automatic hyphenation does not work if you don't specify a language.

Otherwise really like this condensed HTML style and have recently converted my personal website to it.

andrew_ 4 years ago

The lack of closing tags is giving me severe anxiety. I know it's valid non-xml syntax but all the hairs on my neck are at attention.

  • pineconewarrior 4 years ago

    I agree, and unless someone has a better reason than the ones I have seen (saving a tiny number of bytes, fewer keystrokes, dx), I am convinced it's a bad idea to omit the end tags.

    It causes way more trouble than those benefits are worth

    • MrVandemar 4 years ago

      I use an aggressively minimal set of (valid) HTML because I prefer to write in HTML rather than Markdown-flavour-x.

      Omitting the closing tags where possible is less about saving keystrokes than minimising interruptions to my writing flow.

      But I wouldn't advocate it for published documents, just my local scribblings.

  • nayuki 4 years ago

    To solve your anxiety, may I suggest XHTML? I use it on my website in practice and it works really well.

jacobsenscott 4 years ago

If you must write html by hand this seems nice. But I would never actually write html by hand anymore. For most web apps you write more tags than text. I love slim because it was designed with that in mind. There is no overhead to writing tags, and just a little for writing text. Which is the right way to go for web apps.

spread_love 4 years ago

omitting <html> works fine in browsers but breaks a lot of other developer tooling in my experience. It's nice to save 6B I guess, but compared to the behemoth webapp it's wrapping it's not much of an optimization.

JasonFruit 4 years ago

Why does it matter? A good HTML editor ought to be able to take in HTML, display it and edit it according to the user's preferences, and save it in a size-minimizing way. Why should we have to choose only one way?

movedx 4 years ago

Author: write HTML right

Me: this green on black background is terrible to read, I'll use reader mode

Chrome: this author did not write their HTML correctly, so there is no reader mode available

How ironic.

  • exyi 4 years ago

    Firefox's reader mode works just fine. You need a right browser for the right HTML.

    ... anyway, it bothers me sometimes that I'm not aware of any spec for "reader mode compatibility", did anyone see anything like that?

  • zzo38computer 4 years ago

    I use (an old version of) Firefox and can select "View > Page Style > No Style" to disable CSS, and this works OK for me (better than on some web pages, where disabling styles does not work very well).

    I do not know what criteria are needed for the reader mode in Chrome. (The HTML code looks OK to me?)

  • frosted-flakes 4 years ago

    I think Reader mode looks for a <main> section. When it's not present it either guesses or doesn't work at all.

moreati 4 years ago

Is there a tool to convert an existing HTML document into this style? E.g. strip out optional closing tags, without doing full minimisation/whitespace stripping.
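As a very rough sketch of what such a conversion could look like (a hypothetical helper, not an existing tool): the regex below drops </p> and </li> only when the next tag is another <p> or <li>, one of the cases where HTML allows the end tag to be omitted. It is an assumption-laden illustration. It does not understand comments, scripts, or <pre> content, and the whitespace that sat between elements ends up inside the preceding one.

```python
import re

# Drop </p> or </li> only when the next tag (after optional whitespace)
# is a start tag of the same kind. "<p[\s>]" avoids matching <pre>, <param>, etc.
_OPTIONAL_END = re.compile(r"</(p|li)\s*>(?=\s*<\1[\s>])", re.IGNORECASE)


def strip_optional_end_tags(markup):
    """Remove a conservative subset of optional end tags, keeping whitespace."""
    return _OPTIONAL_END.sub("", markup)


print(strip_optional_end_tags("<p>One.</p>\n<p>Two.</p>\n<pre>x</pre>"))
print(strip_optional_end_tags("<ul><li>a</li><li>b</li></ul>"))
```

End tags that are not followed by a sibling of the same kind are left alone, even where the spec would allow dropping them, which keeps the sketch on the cautious side.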

DustinBrett 4 years ago

I've been using https://github.com/terser/html-minifier-terser to get this kind of HTML for my personal site for a while. It passes W3C so I'm happy.

After reading the connected blog post http://perfectionkills.com/experimenting-with-html-minifier/

epolanski 4 years ago

Slightly off topic but I'd like to point out that paragraphs in HTML are grouping not textual elements. They are like divs or headers, not like span or b.

They are mistakenly and traditionally associated with literature-type paragraphs but that is not correct. You generally use them in forms to split different groups or inputs, which has nothing to do with paragraphs in written prose and even less with textual paragraphs.

I think there is really a lot of confusion about them in this whole thread.

  • niconii 4 years ago

    Although there are some other uses for <p>, it is perfectly valid to use <p> tags for textual paragraphs and that has been the main use for <p> for as long as HTML has existed. I'm not sure why you believe otherwise.

    Take a look at the source code for http://info.cern.ch/hypertext/WWW/MarkUp/Future.html for instance, which was written by the creator of HTML, Tim Berners-Lee.

    You can also look at the source code for any page of the current HTML spec (e.g. https://html.spec.whatwg.org/multipage/introduction.html) where, again, <p> is used for each paragraph in the text.

    • epolanski 4 years ago

      I didn't say it's not a valid use, I said that it's not its primary use.

      Paragraphs relate to grouping content[1], not textual one. There's no logic in paragraphs.

      I quote here the official spec, which makes various examples of how paragraphs are not related to logical paragraphs:

      > The solution is to realize that a paragraph, in HTML terms, is not a logical concept, but a structural one. In the fantastic example above, there are actually five paragraphs as defined by this specification: one before the list, one for each bullet, and one after the list.

      And I'll quote also the definition on MDN:

      > The <p> HTML element represents a paragraph. Paragraphs are usually represented in visual media as blocks of text separated from adjacent blocks by blank lines and/or first-line indentation, but HTML paragraphs can be any structural grouping of related content, such as images or form fields.

      Failing to realize that paragraphs are grouping rather than logical content leads to frequent misuses of paragraphs, and this comment section is literally filled with bad paragraph examples, which suggests the community is largely ignorant of HTML.

      [1]https://html.spec.whatwg.org/multipage/grouping-content.html...

      • niconii 4 years ago

        In this comment section? Are you talking about stuff like the example I used earlier?

            <p><div></div></p>
        
        Yes, obviously this is bad and nonsensical HTML. Under no circumstances does it make sense to have a div inside a p. In fact, the above doesn't even work, being parsed as

            <p></p><div></div></p>
        
        But the intention of this example is not to show good HTML. The point is that many people have only a very basic understanding of HTML syntax, under the impression that

            <foo><bar></bar></foo>
        
        works for any elements, because there's a <foo> and a </foo> so clearly anything inside it must be inside the foo element, right? But this is not the case for all elements. HTML's syntax is more complicated than that. My example was only intended to correct this misconception, not to demonstrate semantically-correct HTML, and that goes for other similar examples made by other people in the comments too.
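        A minimal sketch of that rule, assuming a toy tokenizer and a hand-picked subset of the start tags that close an open p (the names CLOSES_P, last_index and outline are made up for illustration; a real HTML parser handles far more cases). It also models one extra detail: a stray </p> with no open p produces an empty p element.

```python
import re

# Start tags that implicitly close an open <p>. This is a small,
# illustrative subset of the list in the HTML spec (assumption).
CLOSES_P = {"div", "p", "ul", "ol", "table", "h1", "h2", "h3", "pre", "blockquote"}

TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def last_index(stack, name):
    """Index of the innermost open element called `name`."""
    return len(stack) - 1 - stack[::-1].index(name)


def outline(html):
    """Return (depth, tag) pairs for the elements this toy parser creates."""
    stack, result = [], []
    for closing, name in TAG.findall(html):
        name = name.lower()
        if not closing:
            if name in CLOSES_P and "p" in stack:
                # Implied end tag: the open p ends before this element starts.
                del stack[last_index(stack, "p"):]
            result.append((len(stack), name))
            stack.append(name)
        elif name in stack:
            # Ordinary end tag: pop back to the matching open element.
            del stack[last_index(stack, name):]
        elif name == "p":
            # Stray </p> with no open p: an empty p element is created.
            result.append((len(stack), "p"))
    return result


print(outline("<p><div></div></p>"))
print(outline("<foo><bar></bar></foo>"))
```

        In the first call the div ends up as a sibling of the p rather than a child, while the foo/bar case nests the way the tags suggest.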
layer8 4 years ago

> gator

What do they mean here?

nhooyr 4 years ago

> It used to be the case that URL parsers would remove newlines and tabs, so we could split long URLs across lines and even format their query parameters nicely with tabs. Unfortunately, this was taken advantage of for data exfiltration via HTML injection and we no longer have this nice thing as URL parsers have been made more strict to prevent this kind of attack.

Does anyone have a source/reference for this?
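
This is not a source for the exfiltration claim, but the stripping behaviour itself is easy to illustrate. As far as I know, the WHATWG URL Standard's parser includes a step that removes ASCII tabs and newlines from the input before parsing. A minimal sketch of just that step (the helper name and the example URL are made up for illustration):

```python
def strip_tabs_and_newlines(url: str) -> str:
    """Remove ASCII tab, line feed and carriage return characters from a URL string.

    This models only the whitespace-stripping step described above,
    not a full URL parser.
    """
    return url.translate({ord(c): None for c in "\t\n\r"})


# A long URL split across lines, with query parameters indented by tabs.
pretty = "https://example.com/search\n\t?q=html\n\t&lang=en"
print(strip_tabs_and_newlines(pretty))
```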

Kazkans 4 years ago

Why not just use groff/troff and output to HTML?

Voeid 4 years ago

Figure 2. showing the "common style" is something I've never used or seen before.

What is the "right" way? Perhaps it is to use style from both of these extreme examples and write code that is easy to read and edit for the person that is working with it.

Or perhaps the right way is to never imply the way you are doing things is the only correct way and then try to pass it on as facts?

JJMcJ 4 years ago

I too found out I'd been doing my shoelaces wrong. YouTube set me straight.

For HTML, these are good recommendations.

Sometimes, like for technical writing where there are

    various
distinct and important formatting choices, it's just hard work to get it the way you want it even with a WYSIWYG editor.
exodust 4 years ago

Closing li tags is the right thing to do! I always close the kitchen drawer too after putting the scissors back. But I rarely write HTML as content anyway, it's mostly templates for the CMS, where it's best to close the tags.

  • recursive 4 years ago

    I too close my kitchen drawers. But not my li tags. Unless I'm using the bastardization known as jsx. The next li closes it automatically, as it's specified to do.

    • hinkley 4 years ago

      "everybody knows" doesn't scale, because

      1) not everybody knows and

      2) you're relying on memorization for people to read your code, which means you're smashing the ladder rungs behind you

      Software on a team is a performance art. People are either watching you and copying your behavior, or watching you and getting confused.

      And if you've ever felt overbooked on a project while other people are idle? It's stuff like that that put you into that situation. And since you're the one who did the 'stuff like that', it's at least partly your fault you're in this situation. Stop being a ball hog, and you'll get fewer bruises.

      • recursive 4 years ago

        > "everybody knows" doesn't scale

        Agreed. That's why I prefer to have things written down. In this case, WHATWG and W3C already did the work for us.

        > And if you've ever felt overbooked on a project while other people are idle?

        I've seen what you're talking about, but I'm not the one getting overbooked. I'm not generally the one fighting over this stuff. If I get feedback on a PR telling me to add li close tags, I'll probably just do it.

        If you're using a technology on a daily basis, it will pay big dividends to spend a little time learning how it actually works.

martin_a 4 years ago

While I'm no big fan of SEO and all that surrounds it: Will this open-tag-thing here influence how crawlers handle your site and index/rank it?

  • exyi 4 years ago

    I have no idea what Google does, but expect their parsers to be quite robust. I tried doing some web scraping, and so many pages are not even valid HTML (most often invalid nested tags, like a table inside span, missing closing tags even when required, random unopened closing tags, ...). Not closing <p> and <td> tags is quite common, I have not seen omitted <html> <head> and <body> yet.

  • aparks517OP 4 years ago

    I don’t expect it to as long as the mark-up is valid. Perhaps someone with more SEO knowledge will stop by to correct me.

    • martin_a 4 years ago

      You're right. HTML5 does not work with DTDs anymore, so unclosed tags are not a violation of the document schema and therefore probably not "punishable" by search engines.

      • anjbe 4 years ago

        Implicit end tags as described in the article have been allowed by every HTML DTD not named XHTML.

tiffanyh 4 years ago

One of the easiest ways to improve SEO is to just properly use existing HTML tags (instead of using a custom DIV for everything).

jokoon 4 years ago

I'm curious if a more strict html parser would actually be faster.

Browsers are not really fast on my Android, and I wish they were fast.

  • exyi 4 years ago

    I have yet to see a slow HTML-only website ;) (other than a 10MB single-file spec or an entire book). Really, I don't think HTML parsing is a huge bottleneck, and these few parser exceptions don't seem that hard to implement: just close a tag when opening one from a predefined list, with no backtracking or anything expensive.
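
    A rough sketch of that idea, assuming a toy tokenizer and a small hand-written table (IMPLIED_CLOSE, VOID and add_end_tags are hypothetical names, and the table is only a subset of what the spec defines). It walks the markup once and writes out the end tags that were omitted:

```python
import re

# For each start tag, the open elements it implicitly closes.
# Illustrative subset only (assumption), not the spec's full rules.
IMPLIED_CLOSE = {
    "li": {"li"},
    "dt": {"dt", "dd"},
    "dd": {"dt", "dd"},
    "tr": {"tr", "td", "th"},
    "td": {"td", "th"},
    "th": {"td", "th"},
}
VOID = {"br", "hr", "img", "input", "meta", "link"}  # elements with no end tag

TOKEN = re.compile(r"(<[^>]+>)")
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def add_end_tags(html):
    """Single pass over the tokens; each element is pushed and popped once."""
    out, stack = [], []
    for tok in TOKEN.split(html):
        m = TAG.fullmatch(tok)
        if not m:
            out.append(tok)  # text, comments, doctype: copied through
            continue
        closing, name = m.group(1), m.group(2).lower()
        if closing:
            if name in stack:
                # Close anything left open inside the element being ended.
                while stack[-1] != name:
                    out.append(f"</{stack.pop()}>")
                stack.pop()
            out.append(tok)
        else:
            # Close open elements that this start tag implicitly ends.
            while stack and stack[-1] in IMPLIED_CLOSE.get(name, ()):
                out.append(f"</{stack.pop()}>")
            if name not in VOID:
                stack.append(name)
            out.append(tok)
    return "".join(out)


print(add_end_tags("<ul><li>a<li>b</ul>"))
```

    The same table handles rows and cells, so `<table><tr><td>1<td>2<tr><td>3</table>` comes out with every td and tr closed.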

  • mst 4 years ago

    Depending on what sites you're mostly accessing, it may be worth experimenting with Firefox Mobile plus uBlock Origin plus perhaps one or more of the extra anti-(ad|bloat)ware extensions. Chrome is definitely faster in a straight line but once I've got Firefox configured it's (to me) significantly more pleasant to use (and I like the current UI better than Chrome's though that's -definitely- not a universal opinion, mileage may vary as ever).

_glass 4 years ago

love the troff reference. I wrote my first CV in troff. mostly because it was available on my linux machine, and working.

iLoveOncall 4 years ago

In 2022, how often do you actually write text by hand in your HTML files? I find that besides the few buttons here and there (and that's if you don't have i18n), text is always going to be served by a server.

In 2022 we also all use text editors or IDEs that can collapse entire blocks of tags, to improve readability.

I'm not sure I can see a clear benefit here outside of very few edge cases, and I am sure it comes with its lot of disadvantages.

  • nojs 4 years ago

    Static site generators (Jekyll, Hugo) are one example. Sometimes you can get away with markdown but often you end up marking up pages of text.

    • pineconewarrior 4 years ago

      Even when you need to write actual HTML, you should still use shorthand tools like Emmet to write your markup faster and with fewer mistakes.

EugeneOZ 4 years ago

XML is beautiful and clean, and I prefer to write full closing tags.

  • anjbe 4 years ago

    It’s funny how people’s aesthetic sensibilities can differ. Making use of HTML’s standard features to drop unnecessary elements and closing tags is very much in line with my own idea of “beautiful” and “clean.”

    Do you consider any table that doesn’t explicitly declare <tbody> “unclean”? That’s an implicit element in every <table>, according to the spec.
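
    To make the implicit element concrete, here is a toy sketch, assuming a simplified tokenizer and markup whose other end tags are written out (TAG and table_outline are hypothetical names). It inserts a tbody whenever a tr appears directly inside a table:

```python
import re

TAG = re.compile(r"<(/?)([a-z][a-z0-9]*)[^>]*>")


def table_outline(html):
    """Return (depth, tag) pairs, adding the tbody implied by a bare <tr>."""
    stack, result = [], []
    for closing, name in TAG.findall(html.lower()):
        if closing:
            if name in stack:
                # Pop back to the matching element (including an implied tbody).
                while stack.pop() != name:
                    pass
            continue
        if name == "tr" and stack and stack[-1] == "table":
            # Implied start tag: a tr directly inside table gets a tbody parent.
            result.append((len(stack), "tbody"))
            stack.append("tbody")
        result.append((len(stack), name))
        stack.append(name)
    return result


print(table_outline("<table><tr><td>x</td></tr></table>"))
```

    With or without an explicit <tbody> in the source, the outline comes out the same.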

  • enriquto 4 years ago

    Of course, of course; but here they are talking about HTML (i.e., about HTML5), not about XML.

    • tannhaeuser 4 years ago

      I've given up trying to educate XML heads that XML is just a proper subset of SGML, just as HTML was originally, and mostly still is, an SGML vocabulary. Idk what people are talking about in this thread (it seems to be about each person's preferences and wildly speculative assumptions about backtracking, when in reality both SGML and WHATWG are deterministic), while there is exactly one reference to WHATWG at this time.

    • nayuki 4 years ago

      HTML has a dialect in XML called XHTML. It is obscure but actually works. My website is a living example.
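
      The practical difference is that an XML parser rejects omitted end tags outright. A minimal sketch using Python's standard xml.etree.ElementTree (the helper name is_well_formed_xml is made up for illustration):

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """True if the markup parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Fine as XML (and as XHTML): every li is closed explicitly.
print(is_well_formed_xml("<ul><li>a</li><li>b</li></ul>"))

# Valid HTML with omitted </li> tags, but not well-formed XML.
print(is_well_formed_xml("<ul><li>a<li>b</ul>"))
```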

irrational 4 years ago

This post needs an OCD trigger warning.

  • ThatIsntOCD 4 years ago

    "OCD" as in "I don't like clutter" or real OCD as in "if I don't clear away the clutter my family will die in a car crash, I know that's illogical, and yet I'm still encumbered with the intrusive thought?"

    • irrational 4 years ago

      OCD as in “not having closing tags matching open tags is driving me insane”. Maybe OCD isn’t the proper term, but I don’t know of a better one.

      • ThatIsntOCD 4 years ago

        Respectfully, please try to refrain from using OCD casually. It's not like you're the only one, but it's a debilitating disease.

        • irrational 4 years ago

          What is a better term to use that means when things are not perfectly matched it drives me so insane that I can't function until I go in there and fix it so that everything is exactly right?

eatsyourtacos 4 years ago

Rite HTML Wright

(sorry)

math_dandy 4 years ago

Keep calm and Prettier on.
