JXL Art is the practice of using
JPEG XL’s prediction tree to
generate art. If you have questions, you can join the #jxl-art channel on
the JPEG XL Discord.
The oversimplified summary is that JPEG XL has a modular mode that divides the image it encodes into squares called groups, up to 1024x1024 each. JPEG XL uses a prediction tree to predict the value of each pixel in such a square, based on neighboring pixels and their gradients. As a result, only the difference (or error) between the actual image and the prediction needs to be encoded. The better the predictions, the smaller the error, the more compressible the data, the smaller the file size. Profit.
In the context of JXL art, however, the error is always assumed to be zero, which makes the image consist of only the prediction tree. That means the predictions effectively generate the image. The process of creating JXL art is writing that prediction tree. A prediction tree is a tree of if-else statements, branching on different properties and selecting a predictor for each leaf of the tree. The flexibility of these prediction trees is intentionally limited because they need to be small and execution needs to be fast, as it is part of the image decoding process. The prediction tree is run for every channel (red, green and blue) and for every pixel. Some predictors incorporate the values of the current pixel’s neighbors, specifically to the left, top-left, top and top-right, which dictates in which order the pixels have to be predicted: Row-by-row, top-to-bottom, left-to-right — just like Westerners read text.
A program starts with an optional header that specifies image properties and transformations to apply. The default header looks like this (everything is optional):
- `Width 1024`: Width of the image (frame).
- `Height 1024`: Height of the image (frame).
- `RCT 0`: Reversible Color Transform.
  - 0 is no transform, i.e. RGB.
  - 6 is YCoCg.
  - Higher numbers can be used (up to, but excluding, 42).
- `Orientation 0`: Image rotation/flip as specified by EXIF.
  - 0 = 0 degrees
  - 1 = 0 degrees, mirrored
  - 2 = 180 degrees
  - 3 = 180 degrees, mirrored
  - 4 = 90 degrees
  - 5 = 90 degrees, mirrored
  - 6 = 270 degrees
  - 7 = 270 degrees, mirrored
- `XYB`: Use the XYB color space instead of RGB (not enabled by default).
- `CbYCr`: Use the YCbCr color space. Channel 0 becomes Cb, channel 1 is Y, channel 2 is Cr (not enabled by default).
- `GroupShift 3`: Set the group size to `128 << GroupShift`. Values 0-3 are valid.
- `Bitdepth 8`: Self-explanatory. Other bit depths can be used, from 1 to 31.
- `FloatExpBits 3`: Numbers are interpreted as IEEE floats with this many bits for the exponent (not enabled by default).
- `Alpha`: Add a fourth channel (`c == 3`) for alpha (not enabled by default).
- `Squeeze`: Apply the Squeeze transform (not enabled by default). Weird things will happen and the image gets many channels; keep predictor values low to avoid blocky images.
- `FramePos 0 0`: The frame position is set to this (x0, y0) position. The image canvas size also gets adjusted so the bottom-right corner of the frame remains in the bottom-right corner of the image. This is mostly useful with negative values, e.g. `FramePos -100 -200`, which has the effect of hiding the first 100 columns and the first 200 rows.
- `NotLast`: This is not the last frame/layer (not enabled by default). This flag can be used to create multi-layer images. After encoding this layer, another layer will be encoded, which gets alpha-blended over the first layer and which can itself also have the `NotLast` option (there is no limit on the number of layers you can create this way). Every layer gets its own tree, so when this flag is used, you should specify not just one tree, but (at least) two. You can change the `RCT` between layers (they are local transforms), but not the `Bitdepth`, `XYB`, `Orientation` or presence of `Alpha`, since those are file/global properties/transforms. (Without `Alpha`, layering is not very useful at the moment, since only alpha-blending can currently be done, though this will change when we add support for other blend modes and/or animation.)
- `Spline [4 * 32 coefficients] x0 y0 x1 y1 ... xn yn EndSpline`: Draws a spline that goes through the points (x0,y0), (x1,y1), ..., (xn,yn) and that has a color and thickness given by 4 times 32 numbers. These 32 numbers are 1D-DCT32 coefficients and floating-point numbers, where the first number is the DC (i.e. the average value) and the next numbers correspond to increasing frequencies. The first series of 32 numbers corresponds to the color of the first channel (e.g. red), where 1.0 is the maximum value. The second series corresponds to the second channel (e.g. green), the third to the third channel (e.g. blue). The final series of 32 numbers defines the "thickness" in pixels (note that it is more like a blur radius than the thickness of a solid line). The spline gets "added" to the frame, so negative numbers (both in colors and in thickness) correspond to darkening, while positive numbers correspond to brightening.
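As a sketch of how these options combine (every line here is optional), a header for a 256x256 RGB image that uses the YCoCg transform could look like this:

```
Width 256
Height 256
RCT 6
Bitdepth 8
```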
The header is then followed by a tree description, which starts with a decision node, an if-else-like statement. Technically you can also just give a single predictor, but that’s rarely interesting. A decision node looks like this:
if [property] > [value:int] (THEN BRANCH) (ELSE BRANCH)
Both the THEN branch and the ELSE branch can
either be another decision node or a leaf node.
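As a minimal sketch of this structure, the following node paints everything right of x = 100 with one value and everything else with another (the first branch is the THEN branch, the second the ELSE branch):

```
if x > 100
  - Set +10
  - Set +250
```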
The following properties can be used in a decision node:
- `c`: the channel number, where 0=R, 1=G, 2=B, 3=A (if `Alpha` was enabled in the header)
- `g`: the group number (useful in case the image is larger than one group). Modular group numbers usually start with 21.
- `x`, `y`: coordinates
- `N`: value of the pixel above (north)
- `W`: value of the pixel to the left (west)
- `|N|`: absolute value of the pixel above (north)
- `|W|`: absolute value of the pixel to the left (west)
- `W-WW-NW+NWW`: basically the error of the gradient predictor for the pixel on the left
- `W+N-NW`: value of the gradient predictor (before clamping)
- `W-NW`: left minus topleft, i.e. the error of the `N` predictor for the pixel on the left
- `NW-N`: topleft minus top, i.e. the error of the `W` predictor for the pixel above
- `N-NE`: top minus topright, i.e. the error of the `W` predictor for the pixel on the top right
- `N-NN`: top minus toptop, i.e. the error of the `N` predictor for the pixel above
- `W-WW`: left minus leftleft, i.e. the error of the `W` predictor for the pixel on the left
- `WGH`: signed max-absval-error of the weighted predictor
- `Prev`: the pixel value at this position in the previous channel
- `PPrev`: the pixel value at this position in the channel before the previous channel
- `PrevErr`: the difference between the pixel value and the `Gradient`-predicted value at this position in the previous channel
- `PPrevErr`: like `PrevErr`, but for the channel before that
- `PrevAbs`, `PPrevAbs`, `PrevAbsErr`, `PPrevAbsErr`: same as the above, but the absolute value (not the signed value)
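Combining a few of these properties, here is a sketch of a small tree: the `c > 0` test sends the green and blue channels to one leaf, while the red channel is additionally split at row 128 using `y`:

```
if c > 0
  - W +0
  if y > 128
    - N +1
    - Set +200
```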
Leaf nodes are of the following form:
- [predictor] +-[offset:int]
The following predictors are supported in leaf nodes:
- `Set`: always predicts zero, so effectively sets the pixel value to `[offset]`
- `W`: value of the pixel on the left
- `N`: value of the pixel above
- `NW`: value of the topleft pixel
- `NE`: value of the topright pixel
- `WW`: value of the pixel to the left of the pixel on the left
- `Select`: the predictor from lossless WebP
- `Gradient`: `W+N-NW`, clamped to `min(W,N)..max(W,N)`
- `Weighted`: weighted sum of 4 self-correcting subpredictors based on their past performance (warning: not clamped, so it can get out of range)
- `AvgW+N`, `AvgW+NW`, `AvgN+NW`, `AvgN+NE`: average of two pixel values
- `AvgAll`: weighted sum of various pixels: `(6 * top - 2 * toptop + 7 * left + 1 * leftleft + 1 * toprightright + 3 * topright + 8) / 16`
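Putting header and tree together, here is a sketch of a complete program. Since the error is zero, each pixel becomes its predicted value plus the offset, so a `Gradient` leaf with a small nonzero offset accumulates across the image; flipping the sign of the offset at x = 128 tends to brighten one half and darken the other:

```
Width 256
Height 256
if x > 128
  - Gradient +2
  - Gradient -2
```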
Edge cases:
- If `x=y=0`, `W` is set to 0. Otherwise, if `x=0`, `W` is set to `N`.
- If `y=0`, `N` is set to `W`.
- If `x=0` or `y=0`, `NW` is set to `W`.
- Similarly, `NE` and `NN` fall back to `N`, and `WW` falls back to `W`.
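These fallbacks can be exploited directly. As a sketch, take the single leaf below as the entire tree: at (0,0), `W` falls back to 0, so that pixel becomes 1; in the rest of the first column `W` falls back to `N` (the pixel above); everywhere else `W` is the actual left neighbor. Every pixel thus ends up one more than its predecessor, producing a diagonal ramp (until values clamp at the maximum of the chosen bit depth):

```
- W +1
```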