blog - git - desktop - contact
2026-05-05
Two services running on the same machine. One of them opens a listening
TCP socket bound to localhost, the other one connects to that. They
exchange data. Every now and then, the service that initiated the
connection gets an ECONNRESET while reading data from the socket -- but
no other errors show up in the logs, no crashes, nothing. What's going
on?
Go on to part 2.
A reproducer in the "lab"
Let's start with the "server", i.e. the service that opens the listening socket.
The following program does just that: Create a new TCP socket, wait for
connections, fork into a new process for each request. There's not much
of a "request", though: The server simply dumps 600'000 x bytes to the
client upon connection.
The number 600'000 bears some meaning: It needs to be large enough to trigger the behavior that I want to show. 600 bytes, for example, probably won't work.
Now on to the client: It connects to port 8125 on localhost where our
server is waiting, and then it calls recv() until EOF or error. We'll
get to the --spam flag in a second.
Also:
Let's run the two programs:
[terminal1]$ ./server
[terminal2]$ ./client
Read 600000 bytes, final return value was 0, errno was 0
Nothing spectacular.
But let's use the --spam flag:
$ ./client --spam
Read 600000 bytes, final return value was -1, errno was 104
Connection reset by peer
$ ./client --spam
Read 256000 bytes, final return value was -1, errno was 104
Connection reset by peer
$ ./client --spam
Read 351232 bytes, final return value was -1, errno was 104
Connection reset by peer
$ ./client --spam
Read 351232 bytes, final return value was -1, errno was 104
Connection reset by peer
$ ./client --spam
Read 351232 bytes, final return value was -1, errno was 104
Connection reset by peer
$ ./client --spam
Read 256000 bytes, final return value was -1, errno was 104
Connection reset by peer
$ ./client --spam
Read 600000 bytes, final return value was -1, errno was 104
Connection reset by peer
--spam causes the client to first send some data to the server before
it tries to receive data. And appararently this causes the connection to
break at some point: The client's recv() sees a return value of -1
and errno gets set to 104 = Connection reset by peer.
What tcpdump sees
First, what's "on the wire"?
Okay. So there actually is a TCP RST. Could have also been a programming error or misinterpretation on my part.
What strace ./server sees
The RST originates from the server, so let's attach strace and see what we get:
19:59:03.420432 accept(3, NULL, NULL) = 4
19:59:05.652715 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe43484fa10) = 239546
[pid 239546] 19:59:05.652831 ...
[pid 239546] 19:59:05.652959 mmap(NULL, 602112, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 239546] 19:59:05.652980 <... mmap resumed>) = 0x7fe43456d000
[pid 239546] 19:59:05.653235 sendto(4, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 600000, 0, NULL, 0) = 600000
[pid 239546] 19:59:05.653474 close(4) = 0
[pid 239546] 19:59:05.653553 exit_group(0) = ?
[pid 239546] 19:59:05.653667 +++ exited with 0 +++
No crash. It forked and used sendto() to dump all the data to the
client. Then it quit.
Also note that sendto() returned the full 600000, so from the
perspective of this program, "all data got sent" (there's a footnote,
obviously, as the manpage explains: "Successful completion of a call to
sendto() does not guarantee delivery of the message. A return value of
-1 indicates only locally-detected errors.").
In fact, there is no difference here whether you use --spam on the
client or not.
What strace ./client --spam sees
19:59:05.652518 connect(3, {sa_family=AF_INET, sin_port=htons(8125), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
19:59:05.652649 sendto(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 100, 0, NULL, 0) = 100
19:59:05.652805 recvfrom(3, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 4096, 0, NULL, NULL) = 4096
19:59:05.653382 recvfrom(3, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 4096, 0, NULL, NULL) = 4096
...
19:59:05.654440 recvfrom(3, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 4096, 0, NULL, NULL) = 4096
19:59:05.654473 recvfrom(3, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 4096, 0, NULL, NULL) = 1024
19:59:05.654506 recvfrom(3, 0x55d23c5b5010, 4096, 0, NULL, NULL) = -1 ECONNRESET (Connection reset by peer)
19:59:05.654575 write(1, "Read 128000 bytes, final return "..., 60) = 60
19:59:05.654694 write(4, "Connection reset by peer\n", 25) = 25
19:59:05.654725 close(4) = 0
19:59:05.654750 close(3) = 0
19:59:05.654783 exit_group(0) = ?
19:59:05.654864 +++ exited with 0 +++
Nothing out of the ordinary here, either. We ran recvfrom() until one
of them returned with -1 / ECONNRESET.
A first hypothesis
Let's test when we get the ECONNRESET. Because, if you scroll back
up, some of the invocations read the full 600'000 bytes and some
returned other values. So there's probably some timing issue here.
Let's do what is most obvious in hindsight: Delay the close() in the
server. Because if anything legitimately creates a RST, it'll probably
be that.
In serve_client(), simply add a sleep(1) before the call to
close():
sleep(1);
if (close(s) == -1)
{
perror("close");
return 1;
}
And there you go, we can clearly see a one second delay now. A tcpdump shows it best:
So, without having dug deeper, here is some speculation:
- The server sees the incoming data on its socket, but it doesn't read them.
- When we call
close()in the server, the socket is "dirty" (because of pending data), so a RST gets fired to (hopefully) tell the client that not all data has been read.
This makes sense to me at the moment, although I haven't found any definitive sources yet to confirm this behavior.
(Initially, one of the hypotheses was that some buffer was filling up,
hence spamlen is set to 100 and it originally was 100'000 -- but that
doesn't matter, a single byte is enough.)
The real-life scenario
gunicorn
serving a
flask
app was running behind
nginx
as a reverse proxy. Sometimes, nginx got these ECONNRESET from
gunicorn. There were also cloud hosters involved and firewalls and lots
of routers and parallel requests and what not, getting derailed by
ioctls and non-blocking sockets, so boiling this down took a while.
Essentially, what nginx does is pass the HTTP request on to gunicorn, but it does so with two syscalls:
09:11:31.254489 writev(29, [{iov_base="POST /reveal/d48z/iha4A9MOMuLW40"..., iov_len=392}], 1) = 392
09:11:31.255435 writev(29, [{iov_base="compat=lynx+needs+this", iov_len=22}], 1) = 22
392 bytes HTTP headers, 22 bytes HTTP body, 414 bytes in total.
gunicorn then reads this data from the socket:
09:11:31.593968 recvfrom(6, "POST /reveal/gEJh/bIoAUdWrSV47mI"..., 8192, 0, NULL, NULL) = 414
However, every now and then, it only got the data from the first
writev() call:
09:11:31.251229 recvfrom(6, "POST /reveal/d48z/iha4A9MOMuLW40"..., 8192, 0, NULL, NULL) = 392
gunicorn (and/or the application running inside of it) was happy to just
get the headers, it didn't care about the body. I assume that this part
of the software stack is "lazy": When there's nothing accessing the body
in the application, it won't bother even recv()-ing it.
Problem is, gunicorn ends these transactions like that:
09:11:31.583979 sendto(6, "HTTP/1.1 200 OK\r\nServer: gunicor"..., 212, 0, NULL, 0) = 212
09:11:31.584225 sendto(6, "\312\205]"..., 614400, 0, NULL, 0) = 614400
09:11:31.590869 close(6) = 0
Send off the data and close the socket. If there's data still pending to be read, this will cause a RST, I think.
The workaround was to have the Python app do some dummy operation on the
HTTP body, to make sure that it got fully read from the socket. So
far, I haven't seen any more ECONNRESET. (Depending on your
application, this can open up the possibility of a DoS, though: If
someone POSTs 10 GB of data on your server and you only have 1 GB of
memory, then you might be in trouble. client_max_body_size in nginx
can probably prevent that.)
Next steps
-
Verify that the
close()call really is the cause of the TCP RST and find a good source for that.-
It might be because of RFC 1122:
A host MAY implement a "half-duplex" TCP close sequence, so that an application that has called CLOSE cannot continue to read data from the connection. If such a host issues a CLOSE call while received data is still pending in TCP, or if new data is received after CLOSE is called, its TCP SHOULD send a RST to show that data was lost.
-
-
Figure out who's to "blame" on the Python side: gunicorn, flask, or the actual flask app. Report upstream.
- It might be gunicorn, where it already had been reported.
To be continued ...
Go on to part 2.

