Troubleshooting slow writes to a Samba share

I host my research/test Windows VMs on Linux, using Samba to share files between the systems. One day, while debugging a problem in WinDbg, the debugger froze on loading symbols for combase.dll. I knew that combase.pdb is a big file (it contains private symbols), but downloading it usually wasn’t that slow. To make things worse, when I tried to stop the loading, the whole VM hung. I initially suspected the Microsoft symbol servers, but opening the combase.pdb URL directly in the browser worked flawlessly. So the next suspect was my symbols folder, which is a symbolic link to a Samba share of the same name (my _NT_SYMBOL_PATH is set to SRV*C:\symbols\dbg*https://msdl.microsoft.com/download/symbols). And indeed, copying any larger file from my Windows 11 machine to the Samba share was taking ages. As usual in such cases, I collected a Wireshark trace of a copy operation and, to my horror, that’s what I saw in the scrollbar:
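
For reference, that setup looks more or less like this; \\samba-host\symbols below is just a placeholder for the actual share, and the commands need an elevated PowerShell prompt:

PS> # make C:\symbols a directory symbolic link pointing at the Samba share
PS> New-Item -ItemType SymbolicLink -Path C:\symbols -Target \\samba-host\symbols
PS> # point the debuggers at the Microsoft symbol server, caching downloads under that folder
PS> setx _NT_SYMBOL_PATH "SRV*C:\symbols\dbg*https://msdl.microsoft.com/download/symbols"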

So many black bars in the timeline are never a good sign. On the other hand, they explained the slow write operations. The first failures appeared immediately after the SMB2 Write Request calls (you can see that I was trying to copy a comon database :)):
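
If you want to pull those packets out of a capture yourself, a tshark filter along these lines does the job (the capture file name is just an example):

$ # show SMB2 Write requests plus TCP retransmissions and lost segments
$ tshark -r slow-copy.pcapng \
    -Y 'smb2.cmd == 9 || tcp.analysis.retransmission || tcp.analysis.lost_segment'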

Interestingly, there were no issues with reads from the share. I should also add that I use QEMU/KVM and Virtual Machine Manager to run my VMs. The VMs use a network bridge to connect to my home network and the Internet. This bridge was my next suspect, so during the next copy operation I collected a Wireshark trace on both Windows 11 (as before) and Linux, capturing on the bridge interface. It was time to meticulously analyze the TCP traffic. After disabling the NBSS (NetBIOS Session Service) dissector in Wireshark, the first erroneous packets on Windows 11 (IP: 192.168.88.199) looked as follows:
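
Collecting the Linux-side trace boils down to capturing on the bridge; the NBSS dissector can also be switched off from the command line instead of through Analyze → Enabled Protocols (the file names are examples):

$ # capture the SMB traffic from the Windows VM on the bridge interface
$ sudo tcpdump -i br0 -s 0 -w bridge-copy.pcap 'host 192.168.88.199 and port 445'
$ # open it with the NBSS dissector disabled
$ wireshark --disable-protocol nbss -r bridge-copy.pcap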

In the first successful packet (length 162), the bridge (IP: 192.168.88.200) acknowledges receiving a TCP segment with sequence number 2554, and in the second packet (length 124), Windows 11 acknowledges receiving a TCP segment with sequence number 1596. Then we can see the VM sending two more packets with sequence numbers 1596 and 19116. The latter is 1460 bytes long, so the expected Ack would be 20576 (19116 + 1460). Instead, the bridge resent a packet with Ack set to 1596. If we now look at the network trace collected on the bridge, we can immediately spot the problem:
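
If you prefer not to click through the packets, the same numbers can be dumped with tshark (relative sequence numbers, as Wireshark shows them by default; the file name is an example):

$ # print sequence/ack numbers and segment lengths for the SMB connection
$ tshark -r win11-copy.pcapng -Y 'tcp.port == 445' \
    -T fields -e frame.number -e ip.src -e tcp.seq -e tcp.ack -e tcp.len -e _ws.col.Info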

Apparently, the bridge never received the packet with sequence number 1596. The “TCP Previous segment not captured” message tells us about the dropped packets. The bridge tried to recover the TCP connection by resending an Ack for the last successfully received packet, and Windows then resent the missing bytes in smaller packets. However, after a second or two, the same problem reappeared. Seasoned network engineers probably already know what’s happening, but it took me a while to realize that the dropped packets had lengths much larger than the MTU (usually 1500 bytes on Ethernet). With this information, I could narrow my search for a solution and finally stumbled upon this post on serverfault.com. As suggested there, I disabled Large Send Offload in PowerShell (Disable-NetAdapterLso) and that was it! The writes to the share were instantaneous again.
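
For reference, checking and changing the setting takes two cmdlets in an elevated PowerShell session; the adapter name is whatever Get-NetAdapter reports, “Ethernet” here is just an example:

PS> # show the current Large Send Offload state of the adapters
PS> Get-NetAdapterLso
PS> # turn LSO off for both IPv4 and IPv6 on the chosen adapter
PS> Disable-NetAdapterLso -Name 'Ethernet' -IPv4 -IPv6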

As I’m happy with the network transfer speed, I haven’t dug further, but I believe the problem on the Linux side is that the large receive offloads are off on my bridge:

$ ethtool -k br0
Features for br0:
...
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
...

I couldn’t easily turn them on with ethtool (maybe because I use NetworkManager to configure this interface?), so I’m unable to verify this hypothesis. Please leave a comment if you have any insights or suggestions.
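
For anyone who wants to experiment, this is the kind of change I mean; the plain ethtool command is not persistent (and refuses to touch the [fixed] large-receive-offload), while the nmcli variant assumes the NetworkManager connection is also named br0:

$ # try to enable the offloads directly on the bridge
$ sudo ethtool -K br0 gso on gro on
$ # or persist the settable features through NetworkManager and re-activate
$ sudo nmcli connection modify br0 ethtool.feature-gso on ethtool.feature-gro on
$ sudo nmcli connection up br0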

Another thing I can’t explain is why those dropped packets were not accounted for in the bridge’s network statistics (ip -s -s link show br0). Finally, it’s also strange that Windows kept retrying to send those oversized packets even though the other side clearly had trouble handling them. Unfortunately, my networking knowledge is quite limited, but thanks to issues like this one, I always learn something new 🙂
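
One thing that might be worth checking (I haven’t) is whether the drops show up on the individual bridge ports rather than on br0 itself, for example on the VM’s tap device; vnet0 below is just the usual libvirt default name:

$ # list the interfaces enslaved to the bridge (the VM's tap device is among them)
$ ip link show master br0
$ # then inspect the detailed counters of that port
$ ip -s -s link show vnet0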

I hope you enjoyed reading this post, and until next time!