On Thu, Apr 03, 2008 at 09:20:37PM +0100, Peter Grandi wrote:
> >>> On Wed, 2 Apr 2008 23:13:15 +0200, Keld Jørn Simonsen
> >>> <keld@dkuu...> said:
> [ ... slow RAID reading ... ]
> >> That could be the usual issue with apparent pauses in the
> >> stream of IO requests to the array component devices, with
> >> the usual workaround of trying 'blockdev --setra 65536
> >> /dev/mdN' and see if sequential reads improve.
> keld> Yes, that did it!
> But that's as usual very wrong. Such a large read-ahead has
> negative consequences, and most likely is the result of both
> some terrible misdesign in the Linux block IO subsystem (from
> some further experiments it is most likely related to "plugging")
> and integration of MD into it.
> However I have found that on relatively fast machines (I think)
> much lower values of read-ahead still give reasonable speed,
> with some values being much better than others. For example with
> another RAID10 I get pretty decent speed with a read-ahead of
> 128 on '/dev/md0' (but much worse with say 64 or 256). On others
> 1000 sectors read-ahead is good.
> The read-ahead needed also depends a bit on the file system
> type; don't trust tests done on the block device itself.
> So please experiment a bit to try and reduce it, at least until
> I find the time to figure out the (surely embarrassing) reason
> why it is needed and how to avoid it, or the Linux block IO and
> MD maintainers confess (they almost surely already know why)
> and/or fix it already.
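For reference when experimenting: 'blockdev --setra' counts in 512-byte
sectors, not bytes, so it is easy to set a much larger read-ahead than
intended. A minimal sketch of the conversion (the device name below is
just an example):

```shell
# blockdev --setra takes a count of 512-byte sectors, not bytes.
# Convert a desired read-ahead in KiB to sectors:
ra_kib=16384                      # 16 MiB
ra_sectors=$(( ra_kib * 1024 / 512 ))
echo "$ra_sectors"                # prints 32768

# Then, as root (device name is an example):
#   blockdev --setra "$ra_sectors" /dev/md0
#   blockdev --getra /dev/md0     # verify the setting took effect
```

Note that the oft-suggested 65536 therefore means 32 MiB of read-ahead.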
I did experiment and I noted that a 16 MiB read-ahead was sufficient.
And then I was wondering if this had negative consequences, e.g. on random
I/O. I then ran a test reading 1000 files concurrently, and some strange
things happened. Each drive was doing about 2000 transactions per
second (tps). Why? I thought a drive could only do about 150 tps, given
that it is a 7200 rpm drive.
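The back-of-the-envelope limit for random accesses can be computed from
the rotation speed; the 8 ms average seek below is an assumed typical
figure, not a measured one:

```shell
# Rough upper bound on random IOPS for a 7200 rpm disk:
# one rotation takes 60000/7200 ms; a random access costs on average
# half a rotation of latency plus a seek (assumed ~8 ms here).
awk 'BEGIN {
  rpm = 7200
  rot_ms = 60000 / rpm            # ~8.33 ms per full rotation
  avg_ms = rot_ms / 2 + 8         # half rotation + assumed 8 ms seek
  printf "%.0f IOPS\n", 1000 / avg_ms
}'
```

That lands in the 80-150 range depending on the seek assumption, so
2000 tps cannot all be independent random seeks.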
What is tps measuring?
Why is the fs not reading the chunk size for every IO operation?
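One quick cross-check on what tps is counting: divide the observed
per-drive throughput by the tps figure to get the average size per
transaction. If that comes out well below the array chunk size, the
requests are being issued smaller than a chunk. The 16 MB/s below is an
assumed number purely for illustration:

```shell
# Average transfer size per transaction = throughput / tps.
awk 'BEGIN {
  tps = 2000
  throughput_kb = 16384           # assumed 16 MB/s per drive, for illustration
  printf "%.1f KiB per transaction\n", throughput_kb / tps
}'
```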