What is the issue with the HTML Standard?
Blink and WebKit (the engines behind Chrome and Safari) do not honor the distinction between missing and empty public and system identifiers in DOCTYPE tokens.
An issue was filed with the Chromium project on 21 August 2024. The issue was verified, and one response was:
(It's also possible we should just change the spec and ask Firefox to change; that might be a less risky path forward.)
I believe this issue has always been present in Chrome and Safari, and it still appears to be present in Blink and WebKit. Given that it has gone virtually unnoticed for the lifetime of the Chromium project, the suggestion to remove this distinction from the specification seems reasonable.
DOCTYPE tokens have a name, a public identifier, a system identifier, and a force-quirks flag. When a DOCTYPE token is created, its name, public identifier, and system identifier must be marked as missing (which is a distinct state from the empty string).
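To make the missing/empty distinction concrete, here is a minimal TypeScript sketch of a DOCTYPE token. It assumes `null` stands for "missing" and `""` for a present but empty value; the interface, its field names (`publicId`, `systemId`), and the helpers `createDoctypeToken` and `isMissing` are hypothetical illustrations, not taken from the specification or any browser engine.

```typescript
// Hypothetical model of an HTML parser DOCTYPE token.
// Assumption: `null` represents "missing"; "" is a present, empty value.
interface DoctypeToken {
  name: string | null;
  publicId: string | null;
  systemId: string | null;
  forceQuirks: boolean;
}

// A newly created token: every identifier is missing (null), not empty,
// and the force-quirks flag starts off.
function createDoctypeToken(): DoctypeToken {
  return { name: null, publicId: null, systemId: null, forceQuirks: false };
}

function isMissing(value: string | null): boolean {
  return value === null;
}

// <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//" "">
const emptySystemId: DoctypeToken = {
  ...createDoctypeToken(),
  name: "html",
  publicId: "-//W3C//DTD HTML 4.01 Frameset//",
  systemId: "", // present but empty
};

// <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//">
const missingSystemId: DoctypeToken = {
  ...createDoctypeToken(),
  name: "html",
  publicId: "-//W3C//DTD HTML 4.01 Frameset//",
  // systemId keeps its initial null: missing
};

console.log(isMissing(emptySystemId.systemId)); // false
console.log(isMissing(missingSystemId.systemId)); // true
```

Both tokens carry the same public identifier; only the state of the system identifier differs, and that state is what the rules below inspect.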
The parsing rules for DOCTYPE tokens in the "initial" insertion mode include the following:
- The system identifier is missing and the public identifier starts with: "-//W3C//DTD HTML 4.01 Frameset//"
- The system identifier is missing and the public identifier starts with: "-//W3C//DTD HTML 4.01 Transitional//"
- The system identifier is not missing and the public identifier starts with: "-//W3C//DTD HTML 4.01 Frameset//"
- The system identifier is not missing and the public identifier starts with: "-//W3C//DTD HTML 4.01 Transitional//"
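These rules set the document's mode: the first two conditions (system identifier missing) trigger quirks mode, and the last two (system identifier not missing) trigger limited-quirks mode. Because document.compatMode returns 'BackCompat' only in quirks mode and 'CSS1Compat' otherwise, the difference is observable from script.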
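The four rules above can be sketched as a single function. This is a minimal illustration, not the full algorithm: it covers only those HTML 4.01 rules (the real "initial" insertion mode checks many more public identifiers), it assumes `null` models a missing identifier, and the names `modeFromHtml401Rules`, `startsWithAsciiCI`, and `DoctypeIds` are hypothetical. The prefix comparison is ASCII case-insensitive, as the specification requires for these lists.

```typescript
type QuirksMode = "no-quirks" | "quirks" | "limited-quirks";

// Assumption: null models a missing identifier, "" an empty one.
interface DoctypeIds {
  publicId: string | null;
  systemId: string | null;
}

const HTML401_FRAMESET = "-//W3C//DTD HTML 4.01 Frameset//";
const HTML401_TRANSITIONAL = "-//W3C//DTD HTML 4.01 Transitional//";

// Lowercase only ASCII letters, leaving other characters untouched.
function asciiLowercase(s: string): string {
  return s.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// ASCII case-insensitive "starts with".
function startsWithAsciiCI(value: string, prefix: string): boolean {
  const head = value.slice(0, prefix.length);
  return asciiLowercase(head) === asciiLowercase(prefix);
}

// Covers only the four HTML 4.01 rules listed above; the spec's other
// quirks and limited-quirks rules are deliberately left out.
function modeFromHtml401Rules(ids: DoctypeIds): QuirksMode {
  if (ids.publicId === null) {
    return "no-quirks"; // none of these four rules can match
  }
  const isHtml401 =
    startsWithAsciiCI(ids.publicId, HTML401_FRAMESET) ||
    startsWithAsciiCI(ids.publicId, HTML401_TRANSITIONAL);
  if (!isHtml401) {
    return "no-quirks"; // other rules, omitted here, might still apply
  }
  // Missing system identifier -> quirks; present (even "") -> limited-quirks.
  return ids.systemId === null ? "quirks" : "limited-quirks";
}

console.log(modeFromHtml401Rules({ publicId: HTML401_FRAMESET, systemId: "" })); // limited-quirks
console.log(modeFromHtml401Rules({ publicId: HTML401_FRAMESET, systemId: null })); // quirks
```

The only input that separates the two outcomes is whether `systemId` is `null`; an empty string deliberately falls into the limited-quirks branch.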
The discrepancy in behavior can be seen in the following example:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//" "">
The public identifier is "-//W3C//DTD HTML 4.01 Frameset//" and the system identifier is the empty string (not missing), so the limited-quirks rule applies.
Firefox (correctly) reports document.compatMode === 'CSS1Compat'.
Chrome and Safari both report document.compatMode === 'BackCompat', which does not comply with the specification. They treat the empty system identifier "" the same as a missing one, as in the following example:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//">
Notice the difference:
    -<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//" "">
    +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//">
All major browsers correctly handle the missing system identifier: the document is put in quirks mode and document.compatMode === 'BackCompat'.
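The discrepancy can be summarized with a small comparison sketch. It assumes the public identifier already matches "-//W3C//DTD HTML 4.01 Frameset//", so only the system identifier matters. `specMode` models what the specification requires (and Firefox does), while `emptyAsMissingMode` models the behavior described above for Chrome and Safari; both are hypothetical models, not engine code.

```typescript
// Only two modes are reachable once the public identifier matches.
type QuirksMode = "quirks" | "limited-quirks";
type SystemId = string | null; // null = missing, "" = present but empty

const FRAMESET = "-//W3C//DTD HTML 4.01 Frameset//";

// Spec behavior (observed in Firefox): only a missing system identifier
// selects quirks mode for this public identifier.
function specMode(systemId: SystemId): QuirksMode {
  return systemId === null ? "quirks" : "limited-quirks";
}

// Behavior described above for Chrome and Safari: "" counts as missing.
function emptyAsMissingMode(systemId: SystemId): QuirksMode {
  return systemId === null || systemId === "" ? "quirks" : "limited-quirks";
}

// document.compatMode is "BackCompat" only in quirks mode.
function compatMode(mode: QuirksMode): "BackCompat" | "CSS1Compat" {
  return mode === "quirks" ? "BackCompat" : "CSS1Compat";
}

const cases: Array<[string, SystemId]> = [
  [`<!DOCTYPE html PUBLIC "${FRAMESET}" "">`, ""],
  [`<!DOCTYPE html PUBLIC "${FRAMESET}">`, null],
];

for (const [doctype, systemId] of cases) {
  const spec = compatMode(specMode(systemId));
  const other = compatMode(emptyAsMissingMode(systemId));
  console.log(`${doctype}\n  spec: ${spec}\n  empty-as-missing: ${other}`);
}
```

The two models disagree only for the DOCTYPE with the empty system identifier (CSS1Compat versus BackCompat); for the missing system identifier both yield BackCompat, which is why that case is handled identically everywhere.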