On 04/11/2012 01:07, Andrey Zonov wrote:
> On 10.04.2012 20:19, Alan Cox wrote:
>> On 04/09/2012 10:26, John Baldwin wrote:
>>> On Thursday, April 05, 2012 11:54:31 am Alan Cox wrote:
>>>> On 04/04/2012 02:17, Konstantin Belousov wrote:
>>>>> On Tue, Apr 03, 2012 at 11:02:53PM +0400, Andrey Zonov wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I open the file, then call mmap() on the whole file and get a pointer,
>>>>>> then I work with this pointer. I expect that a page needs to be
>>>>>> touched only once to bring it into memory (disk cache?), but this
>>>>>> doesn't work!
>>>>>>
>>>>>> I wrote a test (attached) and ran it on a 1G file generated from
>>>>>> /dev/random; the result is the following:
>>>>>>
>>>>>> Prepare file:
>>>>>> # swapoff -a
>>>>>> # newfs /dev/ada0b
>>>>>> # mount /dev/ada0b /mnt
>>>>>> # dd if=/dev/random of=/mnt/random-1024 bs=1m count=1024
>>>>>>
>>>>>> Purge cache:
>>>>>> # umount /mnt
>>>>>> # mount /dev/ada0b /mnt
>>>>>>
>>>>>> Run test:
>>>>>> $ ./mmap /mnt/random-1024 30
>>>>>> mmap: 1 pass took: 7.431046 (none: 262112; res: 32; super: 0; other: 0)
>>>>>> mmap: 2 pass took: 7.356670 (none: 261648; res: 496; super: 0; other: 0)
>>>>>> mmap: 3 pass took: 7.307094 (none: 260521; res: 1623; super: 0; other: 0)
>>>>>> mmap: 4 pass took: 7.350239 (none: 258904; res: 3240; super: 0; other: 0)
>>>>>> mmap: 5 pass took: 7.392480 (none: 257286; res: 4858; super: 0; other: 0)
>>>>>> mmap: 6 pass took: 7.292069 (none: 255584; res: 6560; super: 0; other: 0)
>>>>>> mmap: 7 pass took: 7.048980 (none: 251142; res: 11002; super: 0; other: 0)
>>>>>> mmap: 8 pass took: 6.899387 (none: 247584; res: 14560; super: 0; other: 0)
>>>>>> mmap: 9 pass took: 7.190579 (none: 242992; res: 19152; super: 0; other: 0)
>>>>>> mmap: 10 pass took: 6.915482 (none: 239308; res: 22836; super: 0; other: 0)
>>>>>> mmap: 11 pass took: 6.565909 (none: 232835; res: 29309; super: 0; other: 0)
>>>>>> mmap: 12 pass took: 6.423945 (none: 226160; res: 35984; super: 0; other: 0)
>>>>>> mmap: 13 pass took: 6.315385 (none: 208555; res: 53589; super: 0; other: 0)
>>>>>> mmap: 14 pass took: 6.760780 (none: 192805; res: 69339; super: 0; other: 0)
>>>>>> mmap: 15 pass took: 5.721513 (none: 174497; res: 87647; super: 0; other: 0)
>>>>>> mmap: 16 pass took: 5.004424 (none: 155938; res: 106206; super: 0; other: 0)
>>>>>> mmap: 17 pass took: 4.224926 (none: 135639; res: 126505; super: 0; other: 0)
>>>>>> mmap: 18 pass took: 3.749608 (none: 117952; res: 144192; super: 0; other: 0)
>>>>>> mmap: 19 pass took: 3.398084 (none: 99066; res: 163078; super: 0; other: 0)
>>>>>> mmap: 20 pass took: 3.029557 (none: 74994; res: 187150; super: 0; other: 0)
>>>>>> mmap: 21 pass took: 2.379430 (none: 55231; res: 206913; super: 0; other: 0)
>>>>>> mmap: 22 pass took: 2.046521 (none: 40786; res: 221358; super: 0; other: 0)
>>>>>> mmap: 23 pass took: 1.152797 (none: 30311; res: 231833; super: 0; other: 0)
>>>>>> mmap: 24 pass took: 0.972617 (none: 16196; res: 245948; super: 0; other: 0)
>>>>>> mmap: 25 pass took: 0.577515 (none: 8286; res: 253858; super: 0; other: 0)
>>>>>> mmap: 26 pass took: 0.380738 (none: 3712; res: 258432; super: 0; other: 0)
>>>>>> mmap: 27 pass took: 0.253583 (none: 1193; res: 260951; super: 0; other: 0)
>>>>>> mmap: 28 pass took: 0.157508 (none: 0; res: 262144; super: 0; other: 0)
>>>>>> mmap: 29 pass took: 0.156169 (none: 0; res: 262144; super: 0; other: 0)
>>>>>> mmap: 30 pass took: 0.156550 (none: 0; res: 262144; super: 0; other: 0)
>>>>>>
>>>>>> If I run this:
>>>>>> $ cat /mnt/random-1024 > /dev/null
>>>>>> before the test, then the result is the following:
>>>>>>
>>>>>> $ ./mmap /mnt/random-1024 5
>>>>>> mmap: 1 pass took: 0.337657 (none: 0; res: 262144; super: 0; other: 0)
>>>>>> mmap: 2 pass took: 0.186137 (none: 0; res: 262144; super: 0; other: 0)
>>>>>> mmap: 3 pass took: 0.186132 (none: 0; res: 262144; super: 0; other: 0)
>>>>>> mmap: 4 pass took: 0.186535 (none: 0; res: 262144; super: 0; other: 0)
>>>>>> mmap: 5 pass took: 0.190353 (none: 0; res: 262144; super: 0; other: 0)
>>>>>>
>>>>>> This is what I expect. But why doesn't this work without reading
>>>>>> the file manually?
>>>>> The issue seems to be some change in the behaviour of the reserv or
>>>>> phys allocator. I Cc:ed Alan.
>>>> I'm pretty sure that the behavior here hasn't significantly changed in
>>>> about twelve years. Otherwise, I agree with your analysis.
>>>>
>>>> On more than one occasion, I've been tempted to change:
>>>>
>>>>     pmap_remove_all(mt);
>>>>     if (mt->dirty != 0)
>>>>         vm_page_deactivate(mt);
>>>>     else
>>>>         vm_page_cache(mt);
>>>>
>>>> to:
>>>>
>>>>     vm_page_dontneed(mt);
>>>>
>>>> because I suspect that the current code does more harm than good. In
>>>> theory, it saves activations of the page daemon. However, more often
>>>> than not, I suspect that we are spending more on page reactivations than
>>>> we are saving on page daemon activations. The sequential access
>>>> detection heuristic is just too easily triggered. For example, I've
>>>> seen it triggered by demand paging of the gcc text segment. Also, I
>>>> think that pmap_remove_all() and especially vm_page_cache() are too
>>>> severe for a detection heuristic that is so easily triggered.
>>> Are you planning to commit this?
>>>
>>
>> Not yet. I did some tests with a file that was several times larger than
>> DRAM, and I didn't like what I saw. Initially, everything behaved as
>> expected, but about halfway through the test the bulk of the pages were
>> active. Despite the call to pmap_clear_reference() in
>> vm_page_dontneed(), the page daemon is finding the pages to be
>> referenced and reactivating them. The net result is that the time it
>> takes to read the file (from a relatively fast SSD) goes up by about
>> 12%. So, this still needs work.
>>
>
> Hi Alan,
>
> What do you think about the attached patch?
>
>
Sorry for the slow reply, I've been rather busy for the past couple of
weeks. What you propose is clearly good for sequential accesses, but
not so good for random accesses. Keep in mind, the potential costs of
unconditionally increasing the read window include not only wasted I/O
but also increased memory pressure. Rather than argue about which is
more important, sequential or random access, I think it's more
productive to replace the sequential access heuristic. The current
heuristic is just not that sophisticated. It's easy to do better.

The attached patch implements a new heuristic, which starts with the
same initial read window as the current heuristic, but arithmetically
grows the window on sequential page faults. From a stylistic
standpoint, this patch also cleanly separates the "read ahead" logic
from the "cache behind" logic.

At the same time, this new heuristic is more selective about performing
cache behind. It requires three or four sequential page faults before
cache behind is enabled. More precisely, it requires the read ahead
window to reach its maximum size before cache behind is enabled.
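
To make the shape of the heuristic concrete, here is a minimal sketch in C of a read window that grows arithmetically on sequential faults and turns on cache behind only once the window is at its maximum. This is not the code from the patch: the names ra_state, ra_decision and ra_on_fault, and the values of RA_INITIAL, RA_STEP and RA_MAX, are assumptions chosen purely for illustration, and "sequential" is modeled simply as a fault landing exactly where the previous window ended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes, in pages; the real patch may use different values. */
#define RA_INITIAL 8   /* initial read window (assumed) */
#define RA_STEP    8   /* arithmetic growth per sequential fault (assumed) */
#define RA_MAX     32  /* maximum read window (assumed) */

/* Per-mapping state for the simplified model. */
struct ra_state {
    size_t window;        /* current read window; 0 means no history yet */
    size_t next_expected; /* page index where a sequential fault would land */
};

/* What the fault handler should do for this fault. */
struct ra_decision {
    size_t ahead;         /* number of pages to read, starting at the fault */
    bool   cache_behind;  /* whether to deprioritize pages behind the fault */
};

struct ra_decision
ra_on_fault(struct ra_state *st, size_t pindex)
{
    struct ra_decision d;

    assert(st != NULL);
    if (st->window != 0 && pindex == st->next_expected) {
        /* Sequential fault: grow the window by a fixed step, up to the cap. */
        st->window += RA_STEP;
        if (st->window > RA_MAX)
            st->window = RA_MAX;
    } else {
        /* First or non-sequential fault: fall back to the initial window. */
        st->window = RA_INITIAL;
    }
    st->next_expected = pindex + st->window;

    d.ahead = st->window;
    /* Cache behind is enabled only once the window has reached its maximum. */
    d.cache_behind = (st->window == RA_MAX);
    return d;
}
```

With these assumed values, faults at page indices 0, 8, 24 and 48 yield windows of 8, 16, 24 and 32 pages, so cache behind is first enabled on the fourth fault, and a single non-sequential fault drops the window back to 8 and disables it again.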

For long, sequential accesses, the results of my performance tests are
just as good as unconditionally increasing the window size. I'm also
seeing fewer pages needlessly cached by the cache behind heuristic.

That said, there is still room for improvement. We are still not
achieving the same sequential performance as "dd", and there are still
more pages being cached than I would like.
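
For anyone who wants to reproduce the per-pass residency counts quoted at the top of the thread, the following is a rough sketch of one pass of such a test. The original attachment is not reproduced in this message, so this is only an assumption about its general shape: it maps the file, touches one byte per page, and then uses mincore(2) to count resident pages. The function name touch_and_count is hypothetical, and only the low "in core" bit of each mincore() entry is examined.

```c
#define _DEFAULT_SOURCE /* exposes mincore() and mkstemp() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map the whole file, read one byte from every page, then count how many
 * pages mincore(2) reports as resident. Returns 0 on success, -1 on error.
 */
int
touch_and_count(const char *path, size_t *resident, size_t *total)
{
    struct stat sb;
    long ps = sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDONLY);

    if (fd == -1 || ps <= 0 || fstat(fd, &sb) == -1 || sb.st_size <= 0) {
        if (fd != -1)
            close(fd);
        return -1;
    }
    size_t pagesz = (size_t)ps;
    size_t len = (size_t)sb.st_size;
    size_t npages = (len + pagesz - 1) / pagesz;

    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    /* Touch each page once; volatile keeps the reads from being elided. */
    volatile unsigned char sink = 0;
    for (size_t off = 0; off < len; off += pagesz)
        sink ^= p[off];
    (void)sink;

    unsigned char *vec = malloc(npages);
    /* The vector type differs between systems, hence the void * cast. */
    if (vec == NULL || mincore(p, len, (void *)vec) == -1) {
        free(vec);
        munmap(p, len);
        return -1;
    }

    size_t res = 0;
    for (size_t i = 0; i < npages; i++)
        if (vec[i] & 1) /* low bit set: page is in core */
            res++;
    assert(res <= npages);

    free(vec);
    munmap(p, len);
    *resident = res;
    *total = npages;
    return 0;
}
```

Calling this in a loop and timing each call would give one line per pass; with 4 KB pages a 1G file has 262144 pages in total, which matches the sum of the counts in the quoted output.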