I was recently working on better support for .properties files in Jar.Tools. Probably every Java developer has worked with this format and knows that it's a simple one. So did I, until I started implementing it. This is a list of the quirks and interesting cases I ran into along the way.
There are three separators (and one of them is whitespace)
Most people think .properties means key=value. In reality:
- key=value
- key:value
- key␠value (one or more spaces or tabs)
All three are valid. That means the following are different lines with the same meaning:
    server.port=8080
    server.port:8080
    server.port 8080
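For reference, java.util.Properties itself accepts all three forms. Here is a minimal sketch that loads each variant on its own and reads back the same value. The class and method names (`SeparatorDemo`, `portFrom`) are made up for illustration and are not part of Jar.Tools.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative helper: parses a single line with the JDK's own parser.
final class SeparatorDemo {
    static String portFrom(String line) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("server.port");
    }

    public static void main(String[] args) {
        String[] variants = {"server.port=8080", "server.port:8080", "server.port 8080"};
        for (String line : variants) {
            // Each variant yields the value 8080.
            System.out.println(line + " -> " + portFrom(line));
        }
    }
}
```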
What I validate
- Missing separator: if a non‑comment, non‑blank line has no =, :, or whitespace separator, that’s an error.
- Empty key: a line that’s just = or : (or just whitespace before value) is an error for an empty key.

    =value   # ⟵ error: empty key
    :value   # ⟵ error: empty key
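The two checks above could be sketched roughly like this. It is a hypothetical helper, not the actual validator: the names `LineCheck` and `Issue` are invented, and it assumes leading whitespace is ignored the way java.util.Properties ignores it.

```java
// Hypothetical sketch of the "missing separator" and "empty key" checks.
final class LineCheck {
    enum Issue { NONE, MISSING_SEPARATOR, EMPTY_KEY }

    static Issue check(String rawLine) {
        // Assumption: leading whitespace is ignored before the key starts.
        String line = rawLine.stripLeading();
        if (line.isEmpty() || line.startsWith("#") || line.startsWith("!")) {
            return Issue.NONE; // blank lines and comments are not validated here
        }
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '\\') {
                i += 2; // an escaped character belongs to the key
                continue;
            }
            if (c == '=' || c == ':' || c == ' ' || c == '\t' || c == '\f') {
                break; // first unescaped separator ends the key
            }
            i++;
        }
        if (i >= line.length()) {
            return Issue.MISSING_SEPARATOR;
        }
        return i == 0 ? Issue.EMPTY_KEY : Issue.NONE;
    }
}
```

With this sketch, `check("=value")` and `check(":value")` report `EMPTY_KEY`, a bare `justakey` reports `MISSING_SEPARATOR`, and `empty.key=` passes.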
What is allowed
Explicit empty values are fine with any separator:
    empty.key=
    empty.key:
    empty.key␠
All three parse as empty.key with an empty string value.
Continuations: odd vs even backslashes, and trailing whitespace
A line ending with a continuation backslash \ joins with the next line. This is where bugs hide:
- Odd number of trailing backslashes → continuation.
- Even number → the last backslash is escaped, so no continuation.
    # Continues (odd backslashes at EOL)
    sql.query=SELECT * FROM users \

    # Does NOT continue (even backslashes at EOL)
    literal.backslash=path ends with \\
    # value ends with a single '\'
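The odd/even rule boils down to counting backslashes from the end of the line. A minimal sketch, with an invented class name and no handling of trailing whitespace:

```java
// Illustrative only: decides whether a physical line continues onto the next one.
final class Continuation {
    // True when the line ends with an odd number of backslashes.
    static boolean continues(String line) {
        int count = 0;
        for (int i = line.length() - 1; i >= 0 && line.charAt(i) == '\\'; i--) {
            count++;
        }
        return count % 2 == 1;
    }
}
```

One trailing backslash (or three) means the line continues. Two (or four) means the value ends in literal backslashes and the line stops there.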
Trailing whitespace matters
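In my validator, a backslash followed by trailing spaces or tabs is still treated as a continuation marker. (Note that java.util.Properties itself reads a backslash followed by a space as an escaped space and does not continue the line.) If the file ends right after that whitespace (no next line), it’s a broken continuation error.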
    broken.continuation=this ends with a backslash \␠␠␠
    # EOF here → error: “Line ends with continuation backslash but file ended.”
Multiline values done right
    sql.query=SELECT id, name, email \
        FROM users \
        WHERE active = true \
        ORDER BY name
When parsed, this becomes a single value:
SELECT id, name, email FROM users WHERE active = true ORDER BY name
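Joining physical lines into logical ones can be sketched as below. This is an illustration under assumptions, not the Jar.Tools code: the names `LogicalLines` and `Logical` are invented, it needs Java 16+ for records, it applies the lenient "whitespace after the backslash" rule described earlier, and it drops leading whitespace on continuation lines.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: merges continued lines and remembers where each logical line starts and ends.
final class LogicalLines {
    record Logical(String text, int startLine, int endLine) {}

    // Lenient rule: whitespace after the final backslash is ignored.
    static boolean endsWithContinuation(String line) {
        String s = line.stripTrailing();
        int count = 0;
        for (int i = s.length() - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            count++;
        }
        return count % 2 == 1;
    }

    static boolean isComment(String line) {
        String s = line.stripLeading();
        return s.startsWith("#") || s.startsWith("!");
    }

    static List<Logical> join(List<String> physical) {
        List<Logical> out = new ArrayList<>();
        int i = 0;
        while (i < physical.size()) {
            int start = i + 1; // 1-based line numbers
            String line = physical.get(i);
            StringBuilder text = new StringBuilder();
            if (!isComment(line)) { // comment lines never continue
                while (endsWithContinuation(line)) {
                    String head = line.stripTrailing();
                    text.append(head, 0, head.length() - 1); // drop the backslash
                    i++;
                    if (i >= physical.size()) {
                        throw new IllegalStateException(
                            "Line ends with continuation backslash but file ended.");
                    }
                    line = physical.get(i).stripLeading(); // leading whitespace is dropped
                }
            }
            text.append(line);
            out.add(new Logical(text.toString(), start, i + 1));
            i++;
        }
        return out;
    }
}
```

Feeding it the four lines of the SQL example yields one logical line spanning lines 1 to 4. A file that ends on a continuation backslash triggers the exception.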
Duplicates are subtle (case‑sensitive keys)
I treat keys as case‑sensitive and flag all occurrences when the same key appears multiple times:
    duplicate.key=first
    duplicate.key=second
    duplicate.key=third
All three lines receive a warning that includes the line number of every occurrence (e.g., “Duplicate key ‘duplicate.key’ found at: line 2, line 5, line 8”). By contrast:
    myKey=one
    MyKey=two
    myKey=three
Only the two myKey entries get flagged; MyKey is distinct.
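A case‑sensitive duplicate check is essentially a map from key to the list of line numbers where it occurs. A rough sketch, with invented names (`DuplicateKeys`, `KeyAt`) and straight quotes in the message, assuming keys have already been extracted along with their line numbers (Java 16+ for records):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: groups keys by exact (case-sensitive) spelling and keeps only repeats.
final class DuplicateKeys {
    record KeyAt(String key, int line) {}

    // Returns key -> all line numbers, only for keys seen more than once.
    static Map<String, List<Integer>> find(List<KeyAt> keys) {
        Map<String, List<Integer>> seen = new LinkedHashMap<>();
        for (KeyAt k : keys) {
            seen.computeIfAbsent(k.key(), unused -> new ArrayList<>()).add(k.line());
        }
        seen.values().removeIf(lines -> lines.size() < 2);
        return seen;
    }

    // Builds one warning text that lists every occurrence.
    static String message(String key, List<Integer> lines) {
        StringBuilder sb = new StringBuilder("Duplicate key '" + key + "' found at: ");
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("line ").append(lines.get(i));
        }
        return sb.toString();
    }
}
```

For the myKey / MyKey / myKey example on lines 1 to 3, `find` returns a single entry, myKey at lines 1 and 3.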
Why warn and not error? Real configs sometimes end up depending on “last one wins,” even though the duplication is almost never intentional. A warning keeps you honest without breaking builds.
Unicode: \uXXXX escapes, surrogate pairs, and “garbage‑in” behavior
Properties files support \uXXXX escapes. That opens a whole can of Unicode worms: invalid lengths, non‑hex digits, surrogate pairs for emoji, and “unknown” escapes.
Invalid escape sequences
Things like \u123 or \u12G4 show up in the wild. I parse them gracefully, without throwing, and keep values as close as possible to what the user typed. The validator focuses on not crashing; it doesn’t over‑correct malformed text.
Surrogate pairs for emoji
Escaped emoji like \uD83D\uDE80 (🚀) decode correctly. In UTF‑8 mode I emit a warning (“Unicode escape sequence detected”) because direct Unicode is usually clearer. In ISO‑8859‑1 mode, escapes are often necessary, so I emit no warning.
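The surrogate pair behaviour can be checked against the JDK parser. A small sketch with an invented class name: it loads the two escapes through java.util.Properties and inspects the resulting string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative check: two escaped surrogates decode to a single code point.
final class SurrogateDemo {
    static String load(String text, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes as plain text, as in a real file.
        String value = load("rocket=\\uD83D\\uDE80", "rocket");
        System.out.println(value.length());                            // 2 UTF-16 code units
        System.out.println(value.codePointCount(0, value.length()));   // 1 code point
        System.out.println(Integer.toHexString(value.codePointAt(0))); // 1f680
    }
}
```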
Standard escapes “just work”
The usual suspects decode as expected:
- \t, \n, \r, \f, \\
- escaped separators and specials: \␠, \:, \=, \#, \!
Unknown single‑letter escapes like \q or \z are treated literally (the backslash disappears, the letter stays). Again: avoid surprising the user.
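Put together, a tolerant decoder might look like the sketch below. It is one possible reading of the rules in this section, not the actual implementation. The names `Escapes`, `decode` and `isHex4` are invented, and "keep malformed \uXXXX as typed" is my interpretation of "as close as possible to what the user typed". It uses Java 14+ switch syntax.

```java
// Sketch of a decoder that never throws on malformed input.
final class Escapes {
    static boolean isHex4(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int i = from; i < from + 4; i++) {
            char c = s.charAt(i);
            boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c != '\\' || i == s.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char e = s.charAt(i++);
            switch (e) {
                case 't' -> out.append('\t');
                case 'n' -> out.append('\n');
                case 'r' -> out.append('\r');
                case 'f' -> out.append('\f');
                case 'u' -> {
                    if (isHex4(s, i)) {
                        out.append((char) Integer.parseInt(s.substring(i, i + 4), 16));
                        i += 4;
                    } else {
                        out.append('\\').append('u'); // malformed: keep as typed
                    }
                }
                // Escaped separators, specials and unknown letters: drop the backslash.
                default -> out.append(e);
            }
        }
        return out.toString();
    }
}
```

Surrogate pairs need no special case here: each escape contributes one UTF‑16 code unit, and two consecutive units form the emoji.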
Encoding modes: UTF‑8 vs ISO‑8859‑1
Historically, Java treated .properties as Latin‑1 (ISO‑8859‑1), with \uXXXX for anything beyond that range. Many modern tools use UTF‑8. To make intent explicit, I let the validator run in either mode.
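To see why the mode matters, the sketch below loads the same UTF‑8 bytes twice with java.util.Properties: once through the InputStream overload, which decodes as ISO‑8859‑1, and once through a Reader with an explicit charset. The class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Illustrative only: same bytes, two decodings.
final class EncodingDemo {
    // "caf" plus e-acute, written as a Java escape so the source encoding does not matter.
    static final byte[] UTF8_BYTES = "greeting=caf\u00e9".getBytes(StandardCharsets.UTF_8);

    static String loadAsLatin1() {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(UTF8_BYTES)); // InputStream overload: ISO-8859-1
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }

    static String loadAsUtf8() {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(UTF8_BYTES), StandardCharsets.UTF_8)) {
            props.load(reader); // Reader overload: charset chosen by the caller
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }
}
```

`loadAsUtf8()` returns café, while `loadAsLatin1()` returns the familiar mojibake cafÃ©, because the two UTF‑8 bytes of é are read as two Latin‑1 characters.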
ISO‑8859‑1 mode
- Error on characters outside Latin‑1.

    unicode.chinese=你好世界   # error (outside ISO-8859-1)
    unicode.emoji=🎉🚀         # error
    valid.iso=café             # fine (é is Latin‑1)

- \uXXXX for Latin‑1 letters like \u00e9 (é) is allowed and not warned.
UTF‑8 mode
- Direct Unicode is preferred and not warned.
- \uXXXX escapes are warned as unnecessary (but still decoded). That includes escapes for ASCII: \u0041 → “A” with a warning.
Pick the mode that matches your runtime, and you’ll get the right balance of errors vs. guidance.
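The two modes can be sketched as one function that looks at the raw, still‑escaped value. The names (`EncodingRules`, `Mode`, `check`) and the message strings are illustrative assumptions, not the real validator's output.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the per-mode rules applied to a raw (undecoded) value.
final class EncodingRules {
    enum Mode { UTF_8, ISO_8859_1 }

    // True if the raw text contains an unescaped backslash directly followed by 'u'.
    static boolean hasUnicodeEscape(String raw) {
        for (int i = 0; i < raw.length(); i++) {
            if (raw.charAt(i) == '\\') {
                if (i + 1 < raw.length() && raw.charAt(i + 1) == 'u') {
                    return true;
                }
                i++; // skip the escaped character
            }
        }
        return false;
    }

    static List<String> check(Mode mode, String rawValue) {
        List<String> findings = new ArrayList<>();
        if (mode == Mode.ISO_8859_1) {
            // Latin-1 covers exactly the code units 0x00 to 0xFF.
            boolean fits = rawValue.chars().allMatch(c -> c <= 0xFF);
            if (!fits) {
                findings.add("error: character outside ISO-8859-1");
            }
        } else if (hasUnicodeEscape(rawValue)) {
            findings.add("warning: Unicode escape sequence detected");
        }
        return findings;
    }
}
```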
Comments and structure: preserve intent, don’t rewrite history
Lines starting with # or ! are comments. During validation, I:
- Attach leading comments to the next property as leadingComments.
- Keep raw text for each entry exactly as read.
- Do not escape or normalize anything during validation.
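As a rough picture of that read‑only model, here is a sketch that attaches comments to the entry that follows them. Only `leadingComments` and the idea of keeping the raw text come from the description above. The class and record names are invented, continuation lines are ignored for brevity, and it assumes blank lines neither attach to an entry nor detach pending comments (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: collects comments and hands them to the next property line, untouched.
final class CommentAttacher {
    record Entry(String raw, List<String> leadingComments, int line) {}

    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String raw = lines.get(i);
            String trimmed = raw.stripLeading();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines are skipped
            }
            if (trimmed.startsWith("#") || trimmed.startsWith("!")) {
                pending.add(raw); // keep the comment exactly as read
            } else {
                entries.add(new Entry(raw, List.copyOf(pending), i + 1));
                pending.clear();
            }
        }
        return entries;
    }
}
```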
During formatting, I:
- Preserve comments as‑is.
- Add a consistent key = value spacing.
- Escape =, :, and spaces inside values so the output remains parsable:

    # original
    key=value with = and : chars
    # formatted
    key = value with \= and \: chars
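The formatting step for a single entry could be sketched like this. It follows the example output above, so = and : are escaped while inner spaces are left alone. It additionally doubles backslashes, which is my assumption so that a decoded value survives a round trip. The names are invented.

```java
// Sketch of the formatting step for one already-decoded key/value pair.
final class PropertyFormatter {
    static String escapeValue(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' || c == '=' || c == ':') {
                out.append('\\'); // prefix the character with a backslash
            }
            out.append(c);
        }
        return out.toString();
    }

    static String format(String key, String value) {
        return key + " = " + escapeValue(value);
    }
}
```

Because this lives in a separate step, running the validator alone never changes a byte of the input.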
This “no touching during validation” rule prevents a whole class of “the linter changed my config” surprises.
Lines that look empty… but aren’t
A sneaky category:
- A line that’s only = or : → empty key error.
- A line that’s key␠␠␠ → a valid key with an explicit empty value (whitespace is the separator).
- Whitespace around separators with empty values is fine, for example key␠=␠.
A practical checklist (aka mini‑linter rules)
- Flag lines with no =, :, or whitespace separator (error).
- Flag empty keys (error) but allow explicit empty values.
- Handle continuation logic: odd vs even trailing backslashes; treat trailing whitespace after a continuation backslash as continuation; error if EOF cuts it off.
- Treat keys as case‑sensitive; warn on duplicates and list all occurrences.
- Decode standard escapes; treat unknown escapes literally without crashing.
- Support UTF‑8 and ISO‑8859‑1 modes:
  - UTF‑8: warn on \uXXXX as unnecessary.
  - ISO‑8859‑1: error on out‑of‑range chars; allow \uXXXX freely.
- Keep validation read‑only; do formatting in a separate step.
- Preserve comments and attach them to following entries for context.
- Represent multiline values as a single logical value; track start/end lines for tooling.
Closing thoughts
I was planning to be done with .properties validation in a few days, tops. After a week of debugging I realized that, even though the format looks simple, real‑world files mix legacy encoding rules, permissive separators, escape sequences, and multiline values. I will not touch this format again :)