The Windows file system supports scatter/gather I/O (as do other platforms, of course).
But I don't know when I should use this I/O mechanism.
Could you explain a proper use case?
And what benefit do we get from using it? (Just fewer I/O requests?)
You use Scatter/Gather IO when you are doing lots of random (i.e. non-sequential) reads / writes, and you want to save on context switches / syscalls - Scatter/Gather is a form of batching in this sense. However, unless you've got a very fast disk (or more likely, a large array of disks), the syscall cost is negligible.
If you were writing a Database server, you might care about this, but anything less than a big-iron machine handling thousands or millions of requests a second won't see any benefit.
4 Comments
Now in 2017 it’s not uncommon to see 100k IOPS SSD in a mid-range laptop. Does it mean we’re effectively using the big machines you’re talking about, and should therefore implement vectorized IO for random reads?
It only seems appropriate to answer a comment posted 7 years after the original answer, with a comment posted 7 years after the original comment :-) The answer is NO. Scatter/Gather largely took advantage of the mechanics of HDDs (rotating discs). SSDs work differently and don't generally benefit from this technique.
Actually it does, because while you are right that there is no seek time, SSDs are so fast that keeping them saturated requires you to keep their command queue full and the only way to do that is via asynchronous I/O, and syscall costs do start to add up here. In 2024, we indeed do have incredibly fast I/O.
Paul -- one extra note: one additional advantage is that you hand multiple requests to the disk driver at the same time. The driver then can sort the requests and issue them in the optimal order. While syscall time is small, seek time (many milliseconds) can be punitive (that's less than 1000 I/O's/sec).
Chris's comment about demonstrating the efficiency is pragmatic. Mother nature never lies. Well, almost never.
2 Comments
Currently scattered I/O in NT doesn't actually do anything special besides map in different pages in one contiguous segment, and drivers don't know about it. So no, the drivers don't "sort the requests and issue them in the optimal order".
I would imagine that you would use scatter/gather I/O when you (a) suspected your application had a performance bottleneck, and (b) had built a performance analysis framework that could show a significant improvement from using it.
Unless you can show a provable improvement, the additional code complexity is just a risk, and there's no magic recipe that says that, when some condition is met, an application will automatically benefit in a significant way from some programming cleverness.
Or - to put it another way - don't base major architectural decisions on the statements of 'some guy on an internet forum'. Create a test, and find out.
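As a starting point for such a test, here is a minimal sketch in Python on a POSIX system (an assumption on my part; on Windows the equivalent calls would be `ReadFileScatter` / `WriteFileGather`). It times many small `os.write` calls against batched `os.writev` calls on a temporary file. The helper names (`write_individually`, `write_gathered`, `time_writer`, `compare`) and the batch size of 512 are made up for illustration. Note that this mostly measures syscall overhead into the page cache, not real disk behaviour, so treat the numbers as a rough hint only.

```python
import os
import tempfile
import time


def write_individually(fd, chunks):
    """One write() syscall per chunk."""
    for chunk in chunks:
        written = os.write(fd, chunk)
        if written != len(chunk):
            raise OSError("short write")  # not retried in this sketch


def write_gathered(fd, chunks, batch=512):
    """One writev() syscall per batch of chunks.

    The batch size is an assumption chosen to stay below IOV_MAX,
    which is commonly 1024 on Linux.
    """
    for i in range(0, len(chunks), batch):
        group = chunks[i:i + batch]
        written = os.writev(fd, group)
        if written != sum(len(c) for c in group):
            raise OSError("short write")  # not retried in this sketch


def time_writer(writer, chunks):
    """Run writer against a fresh temp file; return (seconds, file size)."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        writer(fd, chunks)
        elapsed = time.perf_counter() - start
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed, size


def compare(n_chunks=2000, chunk_size=64):
    """Time both strategies on the same list of small buffers."""
    chunks = [bytes([i % 256]) * chunk_size for i in range(n_chunks)]
    return {
        "write": time_writer(write_individually, chunks),
        "writev": time_writer(write_gathered, chunks),
    }


if __name__ == "__main__":
    for name, (seconds, size) in compare().items():
        print(f"{name}: {seconds * 1000:.2f} ms for {size} bytes")
```

Run it several times and with chunk sizes that match your real workload before drawing any conclusion; a single run on a warm cache proves very little.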
In POSIX, readv and writev read from or write to noncontiguous memory, but to read and write noncontiguous file ranges from noncontiguous memory in one go you want readx and writex, which were among the proposed POSIX additions.
Doing a readx is faster than doing a lot of reads, as it's only one system call and it gives the disk scheduler the largest possible set of I/Os to reorder. I remember someone saying that they wanted this for the ext2/3/.. fsck program, as it knows in advance which ranges it wants.
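To illustrate the readv / writev half of this, here is a small sketch using Python's `os.writev` and `os.readv`, which wrap the POSIX calls. The function names (`gather_write`, `scatter_read`, `demo`) and the sample buffers are invented for the example, and it assumes a POSIX system. It does not show readx / writex, since those were only proposals.

```python
import os
import tempfile


def gather_write(fd, chunks):
    """Write several separate buffers with a single writev() call."""
    return os.writev(fd, chunks)


def scatter_read(fd, sizes):
    """Fill several separate buffers with a single readv() call."""
    buffers = [bytearray(n) for n in sizes]
    nread = os.readv(fd, buffers)
    return nread, buffers


def demo():
    # Three buffers that live in different places in memory.
    header, body, trailer = b"HDR:", b"payload bytes", b":END"
    fd, path = tempfile.mkstemp()
    try:
        # One syscall writes all three buffers back to back in the file.
        written = gather_write(fd, [header, body, trailer])
        os.lseek(fd, 0, os.SEEK_SET)
        # One syscall splits the file contents across three buffers.
        nread, parts = scatter_read(
            fd, [len(header), len(body), len(trailer)]
        )
    finally:
        os.close(fd)
        os.remove(path)
    return written, nread, [bytes(p) for p in parts]


if __name__ == "__main__":
    print(demo())
```

Both calls still operate on one contiguous range of the file; only the memory side is scattered, which is exactly the limitation described above.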
