Temporenc, comprehensive binary encoding format for dates and times

This section describes how the components and types of the temporenc model are encoded into a byte string. Encoding is done in two stages: encoding individual components, followed by packing the encoded components together to construct the encoded value as a byte string.

Encoding individual components

In the first stage, each component is encoded separately, resulting in an array of bits. The rules for encoding components are the same for all types. For representing numbers as bit strings, temporenc always uses unsigned big-endian notation, e.g. encoding the number 13 into 5 bits results in the bit string 01101 (8 + 4 + 1).
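The number-to-bits rule can be illustrated with a minimal sketch. The helper name `to_bits` is an assumption made for this example and is not defined by temporenc.

```python
def to_bits(value: int, width: int) -> str:
    """Return value as an unsigned big-endian bit string of exactly `width` bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} unsigned bits")
    # The "b" format type writes the most significant bit first;
    # "0{width}" left-pads the result with zeroes.
    return format(value, f"0{width}b")


print(to_bits(13, 5))  # 01101
```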

Date component (`D`)

The date component (D) always uses 21 bits, divided in three groups:

Year (12 bits)

An integer in the range 0–4094 (both inclusive); the special value 4095 means no value is set.

Month (4 bits)

An integer in the range 0–11 (both inclusive); the special value 15 means no value is set. January is encoded as 0, February as 1, and so on, so the encoded value is one less than the conventional month number.

Day (5 bits)

An integer in the range 0–30 (both inclusive); the special value 31 means no value is set. The first day of the month is encoded as 0, the second as 1, and so on, so the encoded value is one less than the conventional day number.

Examples:

Format	Value	Year	Month	Day
year, month, day	1983-01-15	`011110111111`	`0000`	`01110`
year, month	1983-01	`011110111111`	`0000`	`11111`
year	1983	`011110111111`	`1111`	`11111`
month, day	01-15	`111111111111`	`0000`	`01110`
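The following sketch reproduces the rows of the table above. The function name `encode_date` and the use of `None` for a missing value are assumptions made for illustration; the function takes conventional 1-based month and day numbers and applies the shift by one itself.

```python
from typing import Optional


def encode_date(year: Optional[int] = None,
                month: Optional[int] = None,
                day: Optional[int] = None) -> str:
    """Encode the D component as a 21-bit string (12 + 4 + 5 bits)."""
    if year is not None and not 0 <= year <= 4094:
        raise ValueError("year out of range")
    if month is not None and not 1 <= month <= 12:
        raise ValueError("month out of range")
    if day is not None and not 1 <= day <= 31:
        raise ValueError("day out of range")
    y = 4095 if year is None else year      # 4095: no year set
    m = 15 if month is None else month - 1  # 15: no month set; January is 0
    d = 31 if day is None else day - 1      # 31: no day set; first day is 0
    return f"{y:012b}{m:04b}{d:05b}"


print(encode_date(1983, 1, 15))  # 011110111111000001110
```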

Time component (`T`)

The time component (T) always uses 17 bits, divided in three groups:

Hour (5 bits)

An integer in the range 0–23 (both inclusive); the special value 31 means no value is set.

Minute (6 bits)

An integer in the range 0–59 (both inclusive); the special value 63 means no value is set.

Second (6 bits)

An integer in the range 0–60 (both inclusive); the special value 63 means no value is set. Note that the value 60 is supported because it is required to correctly represent leap seconds.

Examples:

Format	Value	Hour	Minute	Second
hour, minute, second	18:25:12	`10010`	`011001`	`001100`
hour, minute	18:25	`10010`	`011001`	`111111`
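As a sketch of the rules for the time component, the hypothetical helper `encode_time` below (the name and the `None` convention are assumptions for this example) reproduces both rows of the table above.

```python
from typing import Optional


def encode_time(hour: Optional[int] = None,
                minute: Optional[int] = None,
                second: Optional[int] = None) -> str:
    """Encode the T component as a 17-bit string (5 + 6 + 6 bits)."""
    if hour is not None and not 0 <= hour <= 23:
        raise ValueError("hour out of range")
    if minute is not None and not 0 <= minute <= 59:
        raise ValueError("minute out of range")
    if second is not None and not 0 <= second <= 60:  # 60 allows leap seconds
        raise ValueError("second out of range")
    h = 31 if hour is None else hour      # 31: no hour set
    m = 63 if minute is None else minute  # 63: no minute set
    s = 63 if second is None else second  # 63: no second set
    return f"{h:05b}{m:06b}{s:06b}"


print(encode_time(18, 25, 12))  # 10010011001001100
```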

Sub-second precision time component (`S`)

The sub-second precision time component (S) is expressed as either milliseconds (ms), microseconds (µs), or nanoseconds (ns). Each precision requires a different number of bits, so unlike the other components, this component has a variable size. The size is indicated by a 2-bit precision tag, referred to as P.

Milliseconds (10 bits value, 2 bits tag, 12 bits in total)

An integer in the range 0–999 (both inclusive) represented as 10 bits. The precision tag P is 00.

Microseconds (20 bits value, 2 bits tag, 22 bits in total)

An integer in the range 0–999999 (both inclusive) represented as 20 bits. The precision tag P is 01.

Nanoseconds (30 bits value, 2 bits tag, 32 bits in total)

An integer in the range 0–999999999 (both inclusive) represented as 30 bits. The precision tag P is 10.

Empty sub-second precision (0 bits value, 2 bits tag, 2 bits in total)

The precision tag P is 11, and no additional information is encoded. Note that if no sub-second precision time component is required, using a type that does not include this component at all is more space efficient, e.g. DTZ instead of DTSZ.

Examples:

Precision	Value	Precision tag	ms/µs/ns
milliseconds	123 ms	`00`	`0001111011`
microseconds	123456 µs	`01`	`00011110001001000000`
nanoseconds	123456789 ns	`10`	`000111010110111100110100010101`
none	(not set)	`11`	(nothing)
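The relation between the precision tag and the width of the value can be sketched as a small lookup table. The names `PRECISIONS` and `encode_subsecond`, and the unit strings `"ms"`, `"us"` and `"ns"`, are assumptions made for this example. The tag and the value are returned separately because they are not adjacent in the packed byte string.

```python
from typing import Optional, Tuple

# unit -> (precision tag P, number of value bits, largest allowed value)
PRECISIONS = {
    "ms": ("00", 10, 999),
    "us": ("01", 20, 999_999),
    "ns": ("10", 30, 999_999_999),
}


def encode_subsecond(value: Optional[int] = None,
                     unit: Optional[str] = None) -> Tuple[str, str]:
    """Return (P, S) as bit strings; S is empty when no value is set."""
    if value is None:
        return "11", ""
    tag, width, maximum = PRECISIONS[unit]
    if not 0 <= value <= maximum:
        raise ValueError(f"value out of range for {unit}")
    return tag, format(value, f"0{width}b")


print(encode_subsecond(123, "ms"))  # ('00', '0001111011')
```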

Time zone offset component (`Z`)

The time zone offset component (Z) always uses 7 bits. When a temporenc type with a time zone offset component is used, the date (D) and time (T) components are stored in UTC. This means that implementations must convert a date/time value to its UTC equivalent first. This ensures that the encoded values can be sorted properly, regardless of their time zone.

Temporenc uses UTC offsets (usually written as ±HH:MM) to represent time zone information. The UTC offset is expressed as the number of 15 minute increments from UTC, with the constant 64 added to it to produce a positive integer, i.e. (offset_in_minutes / 15) + 64. The resulting number must be in the range 0–125 (both inclusive). The special value 127 means no value is set.

The special value 126 means that this value does carry time zone information, but that it is not expressed as an embedded UTC offset. This makes it possible to use more elaborate time zone handling with temporenc values, for example using geographical identifiers from the tzdata project. The actual inclusion of additional time zone information is outside the scope of temporenc; the value 126 is just an indicator that time zone information is handled externally.

Examples:

Offset (±hh:mm)	Offset (15m increments)	Encoded value (decimal)	Encoded value (bits)
+00:00	0	64	`1000000`
+01:00	4	68	`1000100`
−06:00	−24	40	`0101000`
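The offset formula can be sketched as follows. The function name `encode_offset` and the constant names are assumptions for this example; rejecting offsets that are not a multiple of 15 minutes is a choice made here, since such offsets cannot be represented by the formula.

```python
OFFSET_EXTERNAL = 126  # time zone information is handled outside temporenc
OFFSET_NOT_SET = 127   # no value is set


def encode_offset(offset_minutes: int) -> str:
    """Encode a UTC offset given in minutes as the 7-bit Z component."""
    increments, remainder = divmod(offset_minutes, 15)
    if remainder:
        raise ValueError("offset must be a multiple of 15 minutes")
    value = increments + 64  # shift so that the result is non-negative
    if not 0 <= value <= 125:
        raise ValueError("offset out of range")
    return format(value, "07b")


print(encode_offset(60))    # 1000100
print(encode_offset(-360))  # 0101000
```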

Packing encoded components

The second encoding stage packs the encoded components into the final byte string. An encoded temporenc value is a type tag followed by the concatenated bit strings of its components. The exact packing format depends on the type, so each type has its own bit packing rules. Each type is assigned a unique type tag, a short identifying bit string at the start of the first byte of the encoded value. The advantages of this approach are:

Encoded values are self-describing.
Encoded values are compact: between 3 and 10 bytes, depending on the type and precision.
Encoded values of the same type (and precision) can be sorted lexicographically.
A decoder needs only the first byte to determine the total size and layout of the complete value, which allows for decoding values from a stream without the need for framing (specifying the length).

The table below specifies the type tag for each type, and the order used for the concatenation of the encoded components:

Type	Type tag	`P`	`D`	`T`	`S`	`Z`	Padding
`D`	`100`		✓
`T`	`1010000`			✓
`DT`	`00`		✓	✓
`DTZ`	`110`		✓	✓		✓
`DTS`	`01`	✓	✓	✓	✓		✓ (if needed)
`DTSZ`	`111`	✓	✓	✓	✓	✓	✓ (if needed)
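Because no type tag is a prefix of another, a decoder can identify the type, and with it the total size, from the first byte alone. The sketch below assumes the sizes given by the per-type layouts in the remainder of this section; the function name `encoded_length` is hypothetical.

```python
def encoded_length(first_byte: int) -> int:
    """Return the total size in bytes of the value starting with first_byte."""
    if first_byte >> 6 == 0b00:        # DT
        return 5
    if first_byte >> 6 == 0b01:        # DTS: P follows the 2-bit type tag
        precision = (first_byte >> 4) & 0b11
        return (7, 8, 9, 6)[precision]  # ms, µs, ns, none
    if first_byte >> 5 == 0b100:       # D
        return 3
    if first_byte >> 1 == 0b1010000:   # T
        return 3
    if first_byte >> 5 == 0b110:       # DTZ
        return 6
    if first_byte >> 5 == 0b111:       # DTSZ: P follows the 3-bit type tag
        precision = (first_byte >> 3) & 0b11
        return (8, 9, 10, 7)[precision]  # ms, µs, ns, none
    raise ValueError("first byte does not start with a known type tag")


print(encoded_length(0x8F))  # 3
print(encoded_length(0xF3))  # 10
```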

The general approach for creating the final byte strings, as detailed in the next subsection, is as follows:

Start with an empty bit array.
Concatenate the type tag.
Concatenate each included component, including the sub-second precision tag P (if any).
Pad the bit array with zeroes to align it to the next multiple of 8, i.e. to the next byte boundary (only for types with sub-second precision, and only if needed).
Return the bit array as a byte string.
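The steps above can be sketched with bit strings as the intermediate representation. The function name `pack` is an assumption for this example, and the caller is expected to pass the already encoded components in the order given by the table. Padding is applied whenever the length is not a multiple of 8, which only happens for types with sub-second precision.

```python
def pack(type_tag: str, *components: str) -> bytes:
    """Concatenate the type tag and component bit strings into a byte string."""
    bits = type_tag + "".join(components)
    # Pad with zeroes on the right up to the next byte boundary (if needed).
    bits += "0" * (-len(bits) % 8)
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


# Type D, 1983-01-15: tag 100, then year, month, day.
print(pack("100", "011110111111", "0000", "01110").hex())  # 8f7e0e

# Type DTS, 18:25:12.123 on the same date: tag 01, P = 00, then D, T, S.
print(pack("01", "00", "011110111111000001110",
           "10010011001001100", "0001111011").hex())  # 47bf07499307b0
```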

The remainder of this section specifies the exact byte layout for each encoded temporenc type, including examples showing both bit strings and bytes (hexadecimal notation).

Type `D` (date)

The type tag is 100. Encoded values use 3 bytes in this format:

100DDDDD DDDDDDDD DDDDDDDD

Example: 1983-01-15 is encoded as 10001111 01111110 00001110 (bits) or 8f 7e 0e (hex bytes).

Type `T` (time)

The type tag is 1010000. Encoded values use 3 bytes in this format:

1010000T TTTTTTTT TTTTTTTT

Example: 18:25:12 is encoded as 10100001 00100110 01001100 (bits) or a1 26 4c (hex bytes).

Type `DT` (date + time)

The type tag is 00. Encoded values use 5 bytes in this format:

00DDDDDD DDDDDDDD DDDDDDDT TTTTTTTT
TTTTTTTT

Example: 1983-01-15T18:25:12 is encoded as 00011110 11111100 00011101 00100110 01001100 (bits) or 1e fc 1d 26 4c (hex bytes).
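Decoding reverses the packing. As a sketch, the hypothetical function `decode_dt` below treats the 5 bytes as one 40-bit integer and extracts the fields with shifts and masks. It returns conventional 1-based month and day numbers, with `None` for fields that are not set; both conventions are assumptions of this example.

```python
from typing import Optional, Tuple


def decode_dt(data: bytes) -> Tuple[Optional[int], ...]:
    """Decode a DT value into (year, month, day, hour, minute, second)."""
    if len(data) != 5 or data[0] >> 6 != 0b00:
        raise ValueError("not a DT value")
    n = int.from_bytes(data, "big")
    # Fields are read from the right: second, minute, hour, day, month, year.
    second = n & 0x3F
    minute = (n >> 6) & 0x3F
    hour = (n >> 12) & 0x1F
    day = (n >> 17) & 0x1F
    month = (n >> 22) & 0x0F
    year = (n >> 26) & 0xFFF
    return (
        None if year == 4095 else year,
        None if month == 15 else month + 1,  # undo the 0-based encoding
        None if day == 31 else day + 1,      # undo the 0-based encoding
        None if hour == 31 else hour,
        None if minute == 63 else minute,
        None if second == 63 else second,
    )


print(decode_dt(bytes.fromhex("1efc1d264c")))  # (1983, 1, 15, 18, 25, 12)
```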

Type `DTZ` (date + time + time zone offset)

The type tag is 110. Encoded values use 6 bytes in this format:

110DDDDD DDDDDDDD DDDDDDDD TTTTTTTT
TTTTTTTT TZZZZZZZ

Note that the D and T components must be stored as UTC.

Example: 1983-01-15T18:25:12+01:00 is encoded as 11001111 01111110 00001110 10001011 00100110 01000100 (bits) or cf 7e 0e 8b 26 44 (hex bytes).
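The conversion to UTC is easy to overlook: in the example above the local time is 18:25:12, but the encoded hour is 17. A minimal sketch using Python's standard `datetime` module follows; the function name `encode_dtz` is an assumption, and the sketch only handles fully specified values.

```python
from datetime import datetime, timedelta, timezone


def encode_dtz(value: datetime) -> bytes:
    """Encode a time zone aware datetime as a 6-byte DTZ value."""
    offset = value.utcoffset()
    if offset is None:
        raise ValueError("a time zone aware datetime is required")
    increments, remainder = divmod(offset, timedelta(minutes=15))
    if remainder or not 0 <= increments + 64 <= 125:
        raise ValueError("offset cannot be represented")
    utc = value.astimezone(timezone.utc)  # D and T are stored as UTC
    if utc.year > 4094:
        raise ValueError("year out of range")
    bits = (
        "110"
        + f"{utc.year:012b}{utc.month - 1:04b}{utc.day - 1:05b}"
        + f"{utc.hour:05b}{utc.minute:06b}{utc.second:06b}"
        + f"{increments + 64:07b}"
    )
    return int(bits, 2).to_bytes(6, "big")


local = datetime(1983, 1, 15, 18, 25, 12, tzinfo=timezone(timedelta(hours=1)))
print(encode_dtz(local).hex())  # cf7e0e8b2644
```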

Type `DTS` (date + time with sub-second precision)

The type tag is 01, followed by the precision tag P. Values are zero-padded on the right up to the next byte boundary.

For millisecond (ms) precision, encoded values use 7 bytes in this format:
```
01PPDDDD DDDDDDDD DDDDDDDD DTTTTTTT
TTTTTTTT TTSSSSSS SSSS0000
```
Example: 1983-01-15T18:25:12.123 (millisecond precision) is encoded as 01000111 10111111 00000111 01001001 10010011 00000111 10110000 (bits) or 47 bf 07 49 93 07 b0 (hex bytes).

For microsecond (µs) precision, encoded values use 8 bytes in this format:
```
01PPDDDD DDDDDDDD DDDDDDDD DTTTTTTT
TTTTTTTT TTSSSSSS SSSSSSSS SSSSSS00
```
Example: 1983-01-15T18:25:12.123456 (microsecond precision) is encoded as 01010111 10111111 00000111 01001001 10010011 00000111 10001001 00000000 (bits) or 57 bf 07 49 93 07 89 00 (hex bytes).

For nanosecond (ns) precision, encoded values use 9 bytes in this format:
```
01PPDDDD DDDDDDDD DDDDDDDD DTTTTTTT
TTTTTTTT TTSSSSSS SSSSSSSS SSSSSSSS
SSSSSSSS
```
Example: 1983-01-15T18:25:12.123456789 (nanosecond precision) is encoded as 01100111 10111111 00000111 01001001 10010011 00000111 01011011 11001101 00010101 (bits) or 67 bf 07 49 93 07 5b cd 15 (hex bytes).

If the sub-second precision component has no value, encoded values use 6 bytes in this format:
```
01PPDDDD DDDDDDDD DDDDDDDD DTTTTTTT
TTTTTTTT TT000000
```
Example: 1983-01-15T18:25:12 (no precision) is encoded as 01110111 10111111 00000111 01001001 10010011 00000000 (bits) or 77 bf 07 49 93 00 (hex bytes).
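The four DTS layouts differ only in the precision tag, the width of S and the amount of padding, so one routine can cover them all. The sketch below reproduces the four examples above; the function name `encode_dts`, its parameters and the unit strings are assumptions for this example, and only fully specified dates and times are handled.

```python
from typing import Optional

# unit -> (precision tag P, number of bits in S); None means no sub-second value
_PRECISION = {"ms": ("00", 10), "us": ("01", 20), "ns": ("10", 30), None: ("11", 0)}


def encode_dts(year: int, month: int, day: int,
               hour: int, minute: int, second: int,
               fraction: int = 0, unit: Optional[str] = None) -> bytes:
    """Encode a DTS value; month and day are conventional 1-based numbers."""
    tag, width = _PRECISION[unit]
    bits = "01" + tag
    bits += f"{year:012b}{month - 1:04b}{day - 1:05b}"
    bits += f"{hour:05b}{minute:06b}{second:06b}"
    if width:
        bits += format(fraction, f"0{width}b")
    bits += "0" * (-len(bits) % 8)  # zero-pad to the next byte boundary
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


print(encode_dts(1983, 1, 15, 18, 25, 12, 123, "ms").hex())  # 47bf07499307b0
print(encode_dts(1983, 1, 15, 18, 25, 12).hex())             # 77bf07499300
```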

Type `DTSZ` (date + time with sub-second precision + time zone offset)

The type tag is 111, followed by the precision tag P. Values are zero-padded on the right up to the next byte boundary. Note that the D and T components must be stored as UTC.

For millisecond (ms) precision, encoded values use 8 bytes in this format:
```
111PPDDD DDDDDDDD DDDDDDDD DDTTTTTT
TTTTTTTT TTTSSSSS SSSSSZZZ ZZZZ0000
```
Example: 1983-01-15T18:25:12.123+01:00 (millisecond precision) is encoded as 11100011 11011111 10000011 10100010 11001001 10000011 11011100 01000000 (bits) or e3 df 83 a2 c9 83 dc 40 (hex bytes).

For microsecond (µs) precision, encoded values use 9 bytes in this format:
```
111PPDDD DDDDDDDD DDDDDDDD DDTTTTTT
TTTTTTTT TTTSSSSS SSSSSSSS SSSSSSSZ
ZZZZZZ00
```
Example: 1983-01-15T18:25:12.123456+01:00 (microsecond precision) is encoded as 11101011 11011111 10000011 10100010 11001001 10000011 11000100 10000001 00010000 (bits) or eb df 83 a2 c9 83 c4 81 10 (hex bytes).

For nanosecond (ns) precision, encoded values use 10 bytes in this format:
```
111PPDDD DDDDDDDD DDDDDDDD DDTTTTTT
TTTTTTTT TTTSSSSS SSSSSSSS SSSSSSSS
SSSSSSSS SZZZZZZZ
```
Example: 1983-01-15T18:25:12.123456789+01:00 (nanosecond precision) is encoded as 11110011 11011111 10000011 10100010 11001001 10000011 10101101 11100110 10001010 11000100 (bits) or f3 df 83 a2 c9 83 ad e6 8a c4 (hex bytes).

If the sub-second precision component has no value, encoded values use 7 bytes in this format:
```
111PPDDD DDDDDDDD DDDDDDDD DDTTTTTT
TTTTTTTT TTTZZZZZ ZZ000000
```
Example: 1983-01-15T18:25:12+01:00 (no precision) is encoded as 11111011 11011111 10000011 10100010 11001001 10010001 00000000 (bits) or fb df 83 a2 c9 91 00 (hex bytes).
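A decoder for this type first reads the precision tag, since it determines both the width of S and the position of Z. The sketch below walks the bit string field by field and returns the raw encoded values, i.e. a 0-based month and day, UTC time, and the offset with the constant 64 still added. The function name `decode_dtsz` and the dictionary keys are assumptions for this example.

```python
from typing import Dict, Optional

_S_WIDTH = {"00": 10, "01": 20, "10": 30, "11": 0}    # precision tag -> bits in S
_TOTAL_BYTES = {"00": 8, "01": 9, "10": 10, "11": 7}  # precision tag -> encoded size


def decode_dtsz(data: bytes) -> Dict[str, Optional[int]]:
    """Split a DTSZ value into its raw component values."""
    bits = "".join(f"{byte:08b}" for byte in data)
    if bits[:3] != "111":
        raise ValueError("not a DTSZ value")
    precision = bits[3:5]
    if len(data) != _TOTAL_BYTES[precision]:
        raise ValueError("unexpected length for this precision")
    layout = [("year", 12), ("month", 4), ("day", 5),
              ("hour", 5), ("minute", 6), ("second", 6),
              ("subsecond", _S_WIDTH[precision]), ("offset", 7)]
    fields: Dict[str, Optional[int]] = {}
    pos = 5  # skip the type tag and the precision tag
    for name, width in layout:
        # A zero-width S means no sub-second value is present.
        fields[name] = int(bits[pos:pos + width], 2) if width else None
        pos += width
    return fields


print(decode_dtsz(bytes.fromhex("fbdf83a2c99100"))["offset"])  # 68
```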
