
f : freebsd-hackers@freebsd.org 7 February 2012 • 12:01AM -0500

Re: [RFT][patch] Scheduling for HTT and not only
by Alexander Best

On Mon Feb  6 12, Alexander Motin wrote:
> Hi.
>
> I've analyzed scheduler behavior and think I have found the problem with
> HTT. SCHED_ULE knows about HTT, and when it does load balancing once a
> second, it does the right thing. Unfortunately, if some other thread gets
> in the way, a process can easily be pushed out to another CPU, where it
> will stay for another second because of CPU affinity, possibly sharing a
> physical core with something else needlessly.
>
> I've made a patch, reworking SCHED_ULE affinity code, to fix that:
> http://people.freebsd.org/~mav/sched.htt.patch
>
> This patch does three things:
>  - Disables the strict affinity optimization when HTT is detected, to let
> more sophisticated code take into account the load of the other logical
> core(s).
>  - Adds affinity support to the sched_lowest() function so that it prefers
> the specified (last used) CPU (and the CPU groups it belongs to) in case
> of equal load. The previous code always selected the first valid CPU among
> equals, which caused needless thread migration to lower-numbered CPUs.
>  - If the current CPU group has no CPU where the process with its priority
> can run now, sequentially checks the parent CPU groups before doing a
> global search. That should improve affinity for the next cache levels.
>
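[Editorial illustration, not part of the patch.] The tie-breaking idea in the second item above can be sketched in a few lines of C. This is a simplified assumption of how "prefer the last used CPU on equal load" might look over a flat array of per-CPU loads; the name pick_lowest() and its parameters are hypothetical, and the real sched_lowest() walks the CPU group topology rather than a flat array.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT the kernel's sched_lowest(): return the index
 * of the least-loaded CPU, preferring `prefer` (the last used CPU) when
 * it ties for the lowest load. Pass prefer = -1 for no preference.
 */
static int
pick_lowest(const int *load, size_t ncpu, int prefer)
{
    size_t best = 0;

    assert(load != NULL && ncpu > 0);

    /* Plain scan: keeps the first CPU among those with minimal load. */
    for (size_t i = 1; i < ncpu; i++) {
        if (load[i] < load[best])
            best = i;
    }

    /* On equal load, stay on the preferred CPU instead of the first one. */
    if (prefer >= 0 && (size_t)prefer < ncpu && load[prefer] == load[best])
        return prefer;

    return (int)best;
}
```

With four equally loaded CPUs and prefer = 2, this returns 2; without a preference it falls back to CPU 0, which corresponds to the migration toward lower-numbered CPUs described above.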
> I've run several different benchmarks to test it, and so far the results
> look promising:
>  - On an Atom D525 (2 physical cores + HTT) I've tested HTTP receive with
> fetch and FTP transmit with ftpd. On receive I got 103MB/s on the
> interface; on transmit somewhat less -- about 85MB/s. In both cases the
> scheduler kept the interrupt thread and the application on different
> physical cores. Without the patch, speed fluctuates between about 80 and
> 103MB/s on receive and is about 85MB/s on transmit.
>  - On the same Atom I've tested TCP speed with iperf and got mostly the
> same results:
>    - receive to Atom with patch -- 755-765Mbit/s, without patch --
> 531-765Mbit/s.
>    - transmit from Atom in both cases 679Mbit/s.
> The fluctuating receive behavior in both tests can, I think, be explained
> by some heavy callout handled by the swi4:clock process, called on receive
> (seen in top and schedgraph) but not on transmit. Maybe it is specific to
> the Realtek NIC driver.
>
>  - On the same Atom I tested the number of 512-byte reads from an SSD with
> dd in 1 and 32 streams. I found no regressions, but no benefits either:
> with one stream there is no congestion, and with multiple streams all
> cores are congested.
>
>  - On a Core i7-2600K (4 physical cores + HTT) I've run more than 20
> `make buildworld`s with different -j values (1,2,4,6,8,12,16) for both the
> original and the patched kernel. I've found no performance regressions,
> while for -j4 I've got a 10% improvement:
> # ministat -w 65 res4A res4B
> x res4A
> + res4B
> +-----------------------------------------------------------------+
> |+                                                                |
> |++                                          x    x              x|
> |A|                                        |______M__A__________| |
> +-----------------------------------------------------------------+
>     N        Min        Max      Median           Avg        Stddev
> x   3    1554.86    1617.43     1571.62     1581.3033     32.389449
> +   3    1420.69     1423.1     1421.36     1421.7167     1.2439587
> Difference at 95.0% confidence
>         -159.587 ± 51.9496
>         -10.0921% ± 3.28524%
>         (Student's t, pooled s = 22.9197)
> , and for -j6 -- 3.6% improvement:
> # ministat -w 65 res6A res6B
> x res6A
> + res6B
> +-----------------------------------------------------------------+
> |  +                                                              |
> |  +       +                             x                 x x    |
> ||_M__A___|                                |__________A____M_____||
> +-----------------------------------------------------------------+
>     N        Min        Max     Median           Avg        Stddev
> x   3    1381.17    1402.94     1400.3     1394.8033     11.880372
> +   3     1340.4    1349.34    1341.23     1343.6567     4.9393758
> Difference at 95.0% confidence
>         -51.1467 ± 20.6211
>         -3.66694% ± 1.47842%
>         (Student's t, pooled s = 9.09782)
>
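[Editorial illustration.] The percentage lines in the two ministat reports above appear consistent with dividing the difference of the averages by the average of the first (unpatched) data set. A minimal sketch of that arithmetic, with a hypothetical function name and using only the averages quoted above:

```c
#include <assert.h>

/*
 * Hypothetical helper: relative change of new_avg versus base_avg, in
 * percent. Negative values mean the new average is lower (a faster build).
 */
static double
rel_diff_percent(double base_avg, double new_avg)
{
    assert(base_avg != 0.0);
    return (new_avg - base_avg) / base_avg * 100.0;
}
```

Feeding in the -j4 averages (1581.3033 and 1421.7167) gives roughly -10.09%, and the -j6 averages (1394.8033 and 1343.6567) give roughly -3.67%, which agrees with the reported figures.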
> Who wants to do independent testing to verify my results or do some more
> interesting benchmarks? :)

i don't have any benchmarks to offer, but i'm seeing a massive increase in
responsiveness with your patch. with an unpatched kernel, opening xterm while
unrar'ing some huge archive could take up to 3 minutes!!! with your patch the
time it takes for xterm to start is never > 10 seconds!!!

well done. :) really looking forward to seeing this committed.

cheers.
alex

btw: i couldn't verify a decrease in my mouse's input rate. nothing was lagging!
however i'm not running moused(8). i can only advise anyone to turn it off in
connection with usb mice. i was having massive problems with moused(8) and
hald(8) (i.e. input rates < 1 Hz during heavy disk i/o). disabling moused(8)
and relying on hald(8) completely (removing any mouse specific entry from my
xorg.conf and disabling moused(8) in my rc.conf) solved the issue entirely.

>
> PS: Sponsored by iXsystems, Inc.
>
> --
> Alexander Motin
_______________________________________________
freebsd-hackers@free... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@free..."
