Hi,
HAProxy 2.8.0 was released on 2023/05/31. It added 27 new commits
after version 2.8-dev13.
Only a few minor issues were addressed this time, the rest was
mostly doc polishing and cleanups. 2.8 is entering LTS status and will
be supported till 2028-Q2, and 2.9-dev0 was just created to pursue the
development, with an expected release around end of November this year.
Let's try to summarize the changes from 37 participants in the 1382
commits that were merged since 2.7.0 from a high level perspective:
- Lua/Mailers: there's now a full-Lua implementation of the mailers
subsystem. It's provided as a Lua script (examples/lua/mailers.lua)
which relies on the new internal event notification API. As such the
script subscribes to server state change events and emits mails when
the defined criteria are matched. It continues to rely on the
"mailers" section, but being a Lua script, it's totally customizable.
You can imagine changing the contents, changing the notification
conditions, sending to multiple destinations etc. With this change, the
internal Lua view of the servers was made fully dynamic so that added
or removed servers are always seen in their current state. In fact the
new event notification API goes way beyond this but better read the Lua
API documentation to know more. The next step will be to completely
deprecate the old Mailers subsystem in 2.9 and 3.0 and to remove it in
3.1.
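As a rough illustration of the above, a setup relying on the Lua mailers
could look like the sketch below. The script location, section names and
addresses are made up for the example, and it assumes that loading the
script with "lua-load" is enough to activate it; the header of
examples/lua/mailers.lua remains the reference for the exact instructions.

```haproxy
global
    # Hypothetical install path: point it at wherever the script lives
    lua-load /usr/share/haproxy/examples/lua/mailers.lua

# The script keeps relying on the regular "mailers" section
mailers alerts
    mailer smtp1 192.0.2.25:25

backend app
    mode http
    # Usual email-alert settings, read by the Lua script
    email-alert mailers alerts
    email-alert from haproxy@example.com
    email-alert to ops@example.com
    server s1 192.0.2.10:8080 check
```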
- HTTP/2 is advertised by default in ALPN on TLS listeners. It was about
time, 5 years have passed since it was introduced, it's been enabled by
default in clear text as an HTTP/1 upgrade for 4 years, yet some users
do not know how to enable it. From now on, ALPN defaults to "h2,http/1.1"
on TCP and "h3" on QUIC so that these protocol versions work by default.
It's still possible to set/reset the ALPN to disable them of course. The
old concern some users were having about window sizes was addressed by
having a setting for each side (front vs back).
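To make this concrete, here is a small sketch of a TLS frontend that keeps
the new default on one listener and restricts ALPN on another one, together
with the per-side HTTP/2 window settings. Certificate paths, ports and
values are only illustrative.

```haproxy
global
    # Separate HTTP/2 initial window sizes for each side (example values)
    tune.h2.fe.initial-window-size 65536
    tune.h2.be.initial-window-size 1048576

frontend fe_tls
    mode http
    # No "alpn" keyword: 2.8 advertises "h2,http/1.1" by default here
    bind :443 ssl crt /etc/haproxy/certs/site.pem
    # Explicit ALPN: only HTTP/1.1 is offered on this listener
    bind :8443 ssl crt /etc/haproxy/certs/site.pem alpn http/1.1
```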
- Threading: thread groups are now usable by default by "bind" lines
without requiring to replicate these lines once per thread group. This
means that by default a bind line is bound to all threads, regardless
of the number of groups (up to 64 groups of 64 threads or 4096 threads
total). As such it becomes possible to enable multiple groups on a large
system to benefit from all the processing power available if you're
running heavy rules, Lua, compression, SSL or whatever. We still default
to a single NUMA node because the cases where it brings solid benefits
are not frequent enough, compared to the cost of having more listening
sockets. Note that on systems with non-uniform L3 caches like AMD EPYC,
this can bring important performance gains with only one setting in the
config. We noticed a doubling of the request rate on a 24-core EPYC 74F3
by enabling 8 groups instead of the default 1, to map to the L3 cache
topology. The maximum tested so far was 224 threads with 4 & 8 groups on
a dual-socket Intel Sapphire Rapids system. That was blazingly fast :-)
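For example, the "one setting" mentioned for the EPYC case could be as
simple as the sketch below. The right number of groups depends on the
machine's cache topology, so the value 8 is only the one from the test
above, not a recommendation.

```haproxy
global
    # Split the threads into 8 groups instead of the default single group.
    # "bind" lines need no change: they are bound to all threads of all
    # groups by default.
    thread-groups 8
    # Optional: force the total number of threads (example value)
    # nbthread 48
```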
- SSL: there are quite a bunch of updates on the SSL front in this release:
- it's possible to adjust the signature algorithms to improve
interoperability with some other TLSv1.2/1.3 clients. These
algorithms are used to sign the ephemeral keys used during the
handshake. Changing these algorithms is useful for buggy clients
that negotiate algorithms they don't support, though the usage is
very specific. It's also possible to adjust this parameter for
Client Authentication.
- SSL handshake failure logs now dump the OpenSSL error string by
default. No need to configure an error-log-format anymore to show
details on the handshake error. It can be helpful to debug SSL
problems (e.g. you'll now see "tlsv1 alert unknown ca" instead
of just "SSL handshake failure").
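As an illustration of the signature algorithms setting, a sketch using the
"sigalgs" and "client-sigalgs" bind keywords could look like this. The
algorithm lists, paths and ports are examples only; the accepted names are
those understood by the SSL library in use.

```haproxy
frontend fe_tls
    mode http
    # Restrict the signature algorithms negotiated during the handshake
    bind :443 ssl crt /etc/haproxy/certs/site.pem sigalgs ECDSA+SHA256:RSA+SHA256
    # Same idea for client authentication (hypothetical CA file)
    bind :8443 ssl crt /etc/haproxy/certs/site.pem verify required ca-file /etc/haproxy/ca.pem client-sigalgs ECDSA+SHA256:RSA+SHA256
```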
- OCSP: in 2.8 the OCSP responses for certificates can be automatically
updated by a background task (by default every 5 minutes) so that it is
no longer necessary to feed them over the CLI from an external script.
Of course, this requires that your load balancers have outgoing HTTP
access. This is enabled in crt-list files by adding "ocsp-update on" on
the certificate's line. All this is observable on the CLI via
"show ssl ocsp-updates" and "show ssl ocsp-response".
- LetsEncrypt: there's an acme.sh script in admin/acme.sh that can be used
with your existing deployments (pull request for upstream still pending).
It makes it possible to handle the renewal of LE certificates in
stateless mode with no hassle (no need to proxy to a local port anymore).
- OpenSSL: version 3.1 is now supported. It's less slow than 3.0 but still
significantly slower than 1.1.1, though it might be usable for most users
with low enough traffic.
- wolfSSL: we've worked quite a bit with the wolfSSL team to make sure
their latest version works well with HAProxy. As expected with this
type of integration, there have been some rough edges at the beginning
but we've now reached a point where their current release (5.6.0) works
for simple setups, and their latest development branch (some PRs still
under review) covers most of HAProxy's features. We're sufficiently
confident in the fact that the last adjustments to be made will be in
the lib (we're still working hand-in-hand with them to polish everything)
and that the HAProxy side will not change for this. That's particularly
important because it means that as new wolfSSL releases appear
in the next few weeks/months, stable HAProxy 2.8 releases will continue
to work with them, or maybe even work better. From our testing, there are
two nice aspects of this lib compared to OpenSSL:
- it's fast and scales really well on multi-processor machines
(2.5 times OpenSSL 3.1's performance on a 24-core machine)
- it natively supports QUIC
For these two reasons alone we do expect to encounter it increasingly
frequently as users start to migrate from distros based on OpenSSL
1.1.1 to distros based on 3.0 with no option to roll back to 1.1.1
after they discover they need to multiply the number of LBs by 4 just
to compensate for design flaws in a security library.
- QUIC: it has been running almost flawlessly for a year on haproxy.org,
and totally flawlessly over the last 6 months. We also owe @Tristan971
a huge kudos for deploying it live on significantly more traffic, and
reporting countless issues. The internal architecture experienced the
last few changes that we estimated were necessary, and we're confident
that it's in a totally maintainable form now. Does it mean it's totally
free of bugs? Of course not, but in my opinion it has reached the same
level of stability as H2 had in 2.0 or 2.2, which is already pretty
good. At this point we're only aware of a case which affects a small
but non-negligible percentage of users' response time for Tristan,
without being able to reproduce it out of his infrastructure. We're
still on it of course, but despite this minor glitch we now consider
it production-ready, which means that we're not seeing a good reason
to stay away from it now if it brings benefits to your web site (e.g.
visitors over lossy networks etc). For sure the SSL dependencies are
still a constraint for the vast majority of those relying on OpenSSL,
but with 3.0's performance ruined, even non-QUIC users have to rebuild
anyway, so OpenSSL is no longer a QUIC-only problem nowadays. What 2.8
brings to QUIC is a lot of stuff (mostly backported to 2.7), the
support for reloads by default, and a global kill-switch to disable
it entirely in case of doubt or issue, or just to confirm whether or not
an observed issue comes from it.
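For those who want to try it, a minimal sketch of a frontend serving both
TCP and QUIC is shown below, with the global kill-switch left commented
out. It assumes a build with QUIC support and a compatible SSL library;
paths and the alt-svc lifetime are only examples.

```haproxy
global
    # Kill-switch: uncomment to disable QUIC entirely without touching
    # the "bind" lines below
    # no-quic

frontend fe_web
    mode http
    # Regular TCP+TLS listener (HTTP/1.1 and HTTP/2)
    bind :443 ssl crt /etc/haproxy/certs/site.pem
    # QUIC listener on the same port, ALPN defaults to "h3"
    bind quic4@:443 ssl crt /etc/haproxy/certs/site.pem
    # Let clients know that HTTP/3 is available on UDP port 443
    http-after-response add-header alt-svc 'h3=":443"; ma=3600'
```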
- Stick-tables: the maximum number of parallel stick-counters used to be
set at build time (default 3). Now it can be changed in the configuration
using global.tune.stick-counters.
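In the configuration this corresponds to a single line in the global
section, for example (the value 8 is arbitrary):

```haproxy
global
    # Allow up to 8 stick-counters to be tracked in parallel instead of
    # the build-time default
    tune.stick-counters 8
```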
- HTTP compression: now HTTP request body can be compressed. This is
useful when you deal with many POSTs and your origin servers are on
a different hosting area that makes your traffic pass over paid links!
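A possible way to enable it is sketched below, using the per-direction
compression keywords. The backend name, server address and MIME types are
assumptions for the example.

```haproxy
backend remote_origin
    mode http
    # Compress in both directions: request bodies and responses
    compression direction both
    # Request side: algorithm and eligible content types
    compression algo-req gzip
    compression type-req application/json application/x-www-form-urlencoded
    # Response side
    compression algo-res gzip
    compression type-res text/html text/plain application/json
    server origin1 198.51.100.10:443 ssl verify none
```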
- HTTP "Forwarded" header field (RFC7239): this header that aims at
replacing X-forwarded-for and friends is now supported, in input and
output. It means we can complement it with certain parts (host, by,
by_port, for, for_port). The benefit of using this one instead of the
other is not always obvious, until you start to mix different products
in your edge access and figure that they don't all add the same set of
headers, and that for the application to figure which instance goes
with which one, it's a nightmare. "Forwarded" conveys an ordered list
of items so the ordering becomes as easy as it was when dealing with
X-forwarded-for alone.
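On the output side, a sketch of a backend emitting the header with the
parts listed above could look like this (names and addresses are
illustrative):

```haproxy
backend app
    mode http
    # Add a "Forwarded" header to requests sent to the servers, with the
    # protocol, host, and both the "by" and "for" parts including ports
    option forwarded proto host by by_port for for_port
    server s1 192.0.2.10:8080
```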
- JWT now supports the RSA-PSS algorithm
- There are a few reliability improvements:
- Lua now has a burst-timeout setting which controls how long it can
run a loop in non-yieldable context (e.g. converter function) and it
will abort past this delay
- binding errors faced during a reload could sometimes fail to resume
on the old process (e.g. UNIX sockets). Now the mechanism was made
more reliable, with the new process taking more care of old sockets
until it manages to bind everything, and being able to roll them back
entirely on error.
- new metrics in show info to report the number of config warnings,
the boot time and the number of times the global maxconn was reached.
- the internal clock now wraps 20s after the boot, and not just every
49.7 days. This makes sure that developers have a better chance of
facing clock-wrapping related bugs before they hit your production.
And it worked, we found something like 8 of them, most likely all
in fact.
- the internal connection handling was revisited so that low-level
errors are more accurately reported through the layers. There should
be fewer cases where some termination codes will be reported for a
different condition when errors arrive together.
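As an example for the Lua item above, the burst timeout is set in the
global section. The value below is arbitrary and expressed in
milliseconds:

```haproxy
global
    # Abort Lua code that runs for more than 500ms without yielding
    # (e.g. a loop in a converter function)
    tune.lua.burst-timeout 500
```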
- There were some performance improvements as well:
- those mixing short and long connections might end up with unequal thread
loads because incoming connections assigned to the least loaded thread
could be off after short connections are gone and long ones are left on
only some threads. A new queue load balancing algorithm "fair" resolves
this by applying a round-robin to the threads.
- rings used by traces are being used increasingly as a debugging aid by
both users and developers. They're now much faster (2-3x). The support
for the "trace" keyword in the global section is still marked
experimental because some forthcoming changes are envisioned for 2.9
to almost completely remove the locking, and it may slightly affect the
on-disk format for file-backed maps.
- sometimes an old stopping process making heavy use of stick-tables
could consume insane amounts of CPU almost entirely spent in the libc's
malloc_trim() function (or in free/malloc due to locking contention).
This was addressed and stick-table memory releasing on stopping will
now happen in small, almost unnoticeable batches.
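For the first item above, the "fair" algorithm is assumed here to be
selected through the existing tune.listener.multi-queue setting in the
global section, as in this sketch:

```haproxy
global
    # Distribute incoming connections to threads in a round-robin manner
    # instead of picking the least loaded thread
    tune.listener.multi-queue fair
```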
- We know that users love troubleshooting tools (developers do as well),
so here's some new stuff to play with:
- "show quic" is to QUIC what "netstat" or "ss" are to TCP. It also
supports a detailed format.
- "show fd" can now filter on certain types (e.g. dump front sockets
only, or UNIX sockets only)
- H2 traces can at last show the received HTTP headers!
- the CLI supports the process' uptime in the prompt. There's little
use for this except for those who want to instantly spot when their
LBs have rebooted (or failed to).
- thread dumps in the panic output and "show activity" are now unlimited
in length. That was becoming critical with buffers filling around 60
threads...
- crashes when facing a bogus condition ("BUG_ON") will now produce an
"illegal instruction" instead of "segmentation fault" on architectures
supporting this (i386, x86_64, arm64 for now). This will improve the
ability to diagnose what happened and the quality of bug reports.
- There were a few updates to the configuration (cpu-map now supports
commas, http-after-response supports more actions, sc-add-gpc() to
increment a GPC by a fixed value, ability to ignore case when fetching
a request parameter, httpclient supports disabling resolvers, enabled()
preprocessor macro to enable config blocks only when features are
supported)
- There are also a few unlikely but possibly breaking changes:
- option httpclose in the frontend no longer triggers a close in the
backend and conversely.
- fixed typo in "show info" ("TotalSplicdedBytesOut" is now properly
spelled "TotalSplicedBytesOut"). Only affects the CLI, not Prometheus.
- ALPN as mentioned above is now presented by default in HTTP to enable
HTTP/2 over TCP+SSL and HTTP/3 over QUIC.
- For packagers, the build system is more flexible now with every single
build option supporting its own CFLAGS and LDFLAGS (e.g. convenient when
trying to force to use a static version of a lib).
And as usual, this summary doesn't do justice to all those having worked
hard on invisible things to make all this possible, nor those who spend a
lot of time helping users who report issues and ask for help, and those
who take the time to report cleanly documented issues as well! Thanks to
them for their efforts!
Please find the usual URLs below :
Site index : https://www.haproxy.org/
Documentation : https://docs.haproxy.org/
Wiki : https://github.com/haproxy/wiki/wiki
Discourse : https://discourse.haproxy.org/
Slack channel : https://slack.haproxy.org/
Issue tracker : https://github.com/haproxy/haproxy/issues
Sources : https://www.haproxy.org/download/2.8/src/
Git repository : https://git.haproxy.org/git/haproxy-2.8.git/
Git Web browsing : https://git.haproxy.org/?p=haproxy-2.8.git
Changelog : https://www.haproxy.org/download/2.8/src/CHANGELOG
Dataplane API : https://github.com/haproxytech/dataplaneapi/releases/latest
Pending bugs : https://www.haproxy.org/l/pending-bugs
Reviewed bugs : https://www.haproxy.org/l/reviewed-bugs
Code reports : https://www.haproxy.org/l/code-reports
Latest builds : https://www.haproxy.org/l/dev-packages
Willy
---
Complete changelog since 2.8-dev13:
Amaury Denoyelle (7):
CLEANUP: mux-quic: remove unneeded fields in qcc
MINOR: mux-quic: remove nb_streams from qcc
MINOR: quic: fix stats naming for flow control BLOCKED frames
BUG/MEDIUM: mux-quic: only set EOI on FIN
DOC: quic: remove experimental status for QUIC
CLEANUP: mux-quic: rename functions for mux_ops
CLEANUP: mux-quic: rename internal functions
Aurelien DARRAGON (2):
BUILD: init: print rlim_cur as regular integer
DOC: config: fix rfc7239 converter examples
Christopher Faulet (2):
MINOR: compression: Improve the way Vary header is added
DOC: config: Fix bind/server/peer documentation in the peers section
Frédéric Lécaille (1):
MINOR: quic: Add QUIC connection statistical counters values to "show quic"
Patrick Hemmer (1):
MINOR: init: pre-allocate kernel data structures on init
William Lallemand (2):
DOC: install: add details about WolfSSL
DOC: install: specify the minimum openssl version recommended
Willy Tarreau (10):
BUILD: makefile: search for SSL_INC/wolfssl before SSL_INC
BUG/MEDIUM: threads: fix a tiny race in thread_isolate()
BUG/MINOR: mux-h2: refresh the idle_timer when the mux is empty
BUILD: Makefile: use -pthread not -lpthread when threads are enabled
CLEANUP: doc: remove 21 totally obsolete docs
DOC: install: mention the common strict-aliasing warning on older compilers
DOC: install: clarify a few points on the wolfSSL build method
EXAMPLES: update the basic-config-edge file for 2.8
MINOR: quic/cli: clarify the "show quic" help message
MINOR: version: mention that it's LTS now.
eaglegai (2):
BUG/MINOR: ssl_sock: add check for ha_meth
BUG/MINOR: thread: add a check for pthread_create
---