micmac l33t
Joined: 28 Nov 2003 Posts: 996
|
Posted: Sat Mar 31, 2007 11:51 am Post subject: Tickless system/High Res Timers - care to explain? |
|
|
Hi there,
in the upcoming 2.6.21 kernel there will be changes that I don't really get yet.
There's a new option called NO_HZ for the tickless system. It's about timers
(wake up calls) being generated only when needed instead of a static wake up
time. Here are my questions so far:
1. Will CONFIG_HZ still be used when NO_HZ is enabled? The name NO_HZ
kind of implies CONFIG_HZ won't be used, but on the other hand CONFIG_HZ
isn't hidden when NO_HZ is enabled.
2. In case CONFIG_HZ is still used when NO_HZ is enabled can a high value
like 1000 HZ damage desktop hardware (i386/amd64/ia64)? What's the point
of a high HZ value? Is it about response time only (like 1 ms vs. 10 ms)? I
tried both 100HZ and 1000HZ and I wasn't able to tell the difference.
3. High Resolution Timers: Say your machine has one (on my nforce2 board
it says 'Switched to high resolution mode on CPU 0' so I guess this one does).
Has this an impact on the tickless system? Or on anything else?
I guess that's it for now. I know these are pretty vague questions, but I read
about tickless systems and high res timers and the articles were a little over
my head. I guess I would like to know what's in it for the users.
Thanks! |
|
|
|
rmh3093 Advocate
Joined: 06 Aug 2003 Posts: 2138 Location: Albany, NY
|
Posted: Sat Mar 31, 2007 12:25 pm Post subject: |
|
|
High-res timers are exactly what they sound like; I'm pretty sure they allow nanosecond resolution for timeslices.
On a system without dynticks, the kernel hands out fixed timeslices so that info can get processed. Many times the info that needs to get processed only requires a fraction of its timeslice to complete its task... On a system without dynticks, the rest of the timeslice is wasted but still consumed. On a system with dynticks, however, the kernel can give out smaller timeslices to tasks that only need small timeslices. This allows you to process more info without impacting system performance too much. It won't make your computer run cooler because the computer still has to do the same amount of work to keep your computer idle. It is designed to help out really bogged-down systems (e.g. multiple virtual machines, heavy multitasking). _________________ Do not meddle in the affairs of wizards, for they are subtle and quick to anger. |
|
|
|
aysther n00b
Joined: 05 Sep 2005 Posts: 68 Location: Charlotte, NC
|
Posted: Sun Jun 03, 2007 3:15 pm Post subject: |
|
|
I'm interested in hearing specific answers to the above questions, also. I'm trying to configure this, but I'm not sure if it's as simple as turning on CONFIG_NO_HZ, and that's that, or if you need to configure or disable several other options to get the "full effect" as well. I know this probably sounds like a rather uninformed question, but that's exactly why I'm asking. Thanks! _________________ Microsoft Ceo "I'm Going to F'ing Kill Google." |
|
|
|
Vlad.Sharp Guru
Joined: 08 Dec 2004 Posts: 337 Location: Cambridgeshire, UK
|
Posted: Sun Jun 03, 2007 8:43 pm Post subject: |
|
|
Anyone care to elaborate further? I've read about the benefits of having dynamic ticks; however, are there any **special** options needed to get the full extent of the benefits (as asked above), or is enabling it enough? |
|
|
|
micmac l33t
Joined: 28 Nov 2003 Posts: 996
|
Posted: Thu Jun 07, 2007 1:07 pm Post subject: Re: Tickless system/High Res Timers - care to explain? |
|
|
Hello all,
after playing around a bit I think I'm fit to answer _some_ of my questions myself.
micmac wrote: |
1. Will CONFIG_HZ still be used when NO_HZ is enabled? The name NO_HZ
kind of implies CONFIG_HZ won't be used, but on the other hand CONFIG_HZ
isn't hidden when NO_HZ is enabled.
|
CONFIG_HZ is still used when NO_HZ=y is selected. I enabled NO_HZ and
CONFIG_HZ=100. Afterwards I changed CONFIG_HZ to 300. I started tvtime
and used its debug screen ('d') to check for timing problems. With 300 there
were none whereas with 100 tvtime had its issues displaying frames at the right
time.
micmac wrote: |
2. In case CONFIG_HZ is still used when NO_HZ is enabled can a high value
like 1000 HZ damage desktop hardware (i386/amd64/ia64)? What's the point
of a high HZ value? Is it about response time only (like 1 ms vs. 10 ms)? I
tried both 100HZ and 1000HZ and I wasn't able to tell the difference.
|
I don't know if it can damage hardware. But as you can see from what I just
wrote above, there are certain benefits to using higher values for CONFIG_HZ. For video
and audio playback 300 seems perfect.
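For what it's worth, the arithmetic behind the 100 vs. 300 vs. 1000 question can be sketched in a few lines of Python. The function names here are made up for illustration. The 50 Hz (PAL) and nominal 60 Hz (NTSC) field rates are an assumption about what tvtime is trying to hit, and whether this actually explains the tvtime result above is a guess, not something measured.

```python
from fractions import Fraction

def tick_period_ms(hz):
    """Length of one timer tick in milliseconds for a given CONFIG_HZ value."""
    return Fraction(1000, hz)

def ticks_per_field(hz, fields_per_second):
    """Ticks per video field; a whole number means field boundaries line up with ticks."""
    return Fraction(hz, fields_per_second)

for hz in (100, 250, 300, 1000):
    period = float(tick_period_ms(hz))
    pal = ticks_per_field(hz, 50)    # assumed PAL field rate
    ntsc = ticks_per_field(hz, 60)   # assumed nominal NTSC field rate
    print(f"HZ={hz:4d}  tick={period:6.3f} ms  "
          f"ticks per PAL field={pal}  ticks per NTSC field={ntsc}")
```

With HZ=300 both ratios come out as whole numbers (6 and 5), while HZ=100 and HZ=1000 do not divide evenly by 60. That may be part of why 300 behaves well for video.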
micmac wrote: |
3. High Resolution Timers: Say your machine has one (on my nforce2 board
it says 'Switched to high resolution mode on CPU 0' so I guess this one does).
Has this an impact on the tickless system? Or on anything else?
|
I don't know. I guess it's wise to enable High Res Timers in case your box
has one of them. Why not put it to use when it's there anyway?
mic |
|
|
|
depontius Advocate
Joined: 05 May 2004 Posts: 3525
|
Posted: Thu Jul 12, 2007 6:38 pm Post subject: |
|
|
I've been playing with tickless systems for a bit now, and I'll throw in a bit of conjecture about CONFIG_HZ and NO_HZ. I might remember reading something to the effect that the kernel is only tickless when there's nothing to do. But when it's busy, it still ticks along at CONFIG_HZ, using that as a scheduling interval. Then when all of the work is done, and it's in wait-for-keypress or wait-for-packet mode, the ticks turn off and it waits for the next timer to expire, a keypress, or a packet.
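To put rough numbers on the idle case described above, here is a toy Python model. It is not kernel code. The function names and the example timer list are invented, and it assumes the CPU stays idle for the whole interval and that nothing other than timers (no keypress, no packet) wakes it.

```python
import math

def periodic_wakeups(idle_seconds, hz):
    """Wakeups during an idle stretch when the tick fires at a fixed CONFIG_HZ rate."""
    return math.floor(idle_seconds * hz)

def tickless_wakeups(idle_seconds, timer_expiries):
    """Wakeups during the same stretch when the CPU sleeps until the next pending timer.

    timer_expiries are offsets in seconds from the start of the idle stretch;
    timers that expire at the same instant share one wakeup.
    """
    due = {t for t in timer_expiries if 0 < t <= idle_seconds}
    return len(due)

# Hypothetical workload: 2 seconds of idle with three timers pending inside it.
timers = [0.25, 1.0, 1.5, 3.0]
print(periodic_wakeups(2, 1000))    # 2000
print(periodic_wakeups(2, 100))     # 200
print(tickless_wakeups(2, timers))  # 3
```

The point of the sketch is only that the periodic count scales with CONFIG_HZ, while the tickless count depends on how many timers are actually pending.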
Does anyone here understand the whole "deferred timers" thing? My impression is that there are various places with kernel timers used as timeouts, and by the time you've got a real-world quantity of them the kernel is waking too often. My impression is that since most of the timeouts really aren't critical, the deferred timers allow the value to be specified as non-critical, and they get fudged together. Then the system wakes once, services a bunch of timeouts, then quits ticking again.
Is special attention required to get a deferred timer instead of the old-style? I note on my laptop that at the moment the single largest cause of wake-ups is "afs_rxevent : schedule_timeout (process_timeout)", almost twice as often as the number 2 cause of wakeups, cpufreq-set. (No, I haven't gotten around to slowing the sampling time on this in /etc/conf.d/local.start.) In other words, after deferred timers get added, will we have to hammer on projects individually to get them used, or will they kind of slip in automagically? _________________ .sigs waste space and bandwidth |
|
|
|
Rob1n l33t
Joined: 29 Nov 2003 Posts: 714 Location: Cambridge, UK
|
Posted: Thu Jul 12, 2007 8:33 pm Post subject: |
|
|
depontius wrote: | Does anyone here understand the whole "deferred timers" thing? My impression is that there are various places with kernel timers used as timeouts, and by the time you've got a real-world quantity of them the kernel is waking too often. My impression is that since most of the timeouts really aren't critical, the deferred timers allow the value to be specified as non-critical, and they get fudged together. Then the system wakes once, services a bunch of timeouts, then quits ticking again. |
That's basically right, yes. Normally timers are scheduled to occur after a given time (e.g. in 250ms time). In most cases this accuracy isn't needed, if it's actually woken in 240ms or 260ms then there's no issues. So, everything asking for a timer within a given range can be woken at the same time (so the system only has to wake up once instead of 4 or 5 times).
Quote: | Is special attention required to get a deferred timer instead of the old-style? I note on my laptop that at the moment the single largest cause of wake-ups is "afs_rxevent : schedule_timeout (process_timeout)", almost twice as often as the number 2 cause of wakeups, cpufreq-set. (No, I haven't gotten around to slowing the sampling time on this in /etc/conf.d/local.start.) In other words, after deferred timers get added, will we have to hammer on projects individually to get them used, or will they kind of slip in automagically? |
I believe these are purely internal timers, so it's only the kernel which needs to be updated to use them where applicable. |
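The "wake once for everything in range" idea can be sketched in Python as a plain interval-grouping exercise. To be clear, this is an illustration of the concept and not how the kernel implements deferrable timers. The function name, the (earliest, latest) window representation and the +/-10 ms tolerance are all assumptions made up for the example.

```python
def coalesce(windows):
    """Pick wakeup times so that every (earliest_ms, latest_ms) window contains one.

    Greedy approach: handle windows in order of their latest allowed time and
    only schedule a new wakeup when the previous one falls before the window.
    """
    wakeups = []
    for earliest, latest in sorted(windows, key=lambda w: w[1]):
        if not wakeups or wakeups[-1] < earliest:
            wakeups.append(latest)
    return wakeups

# Four timers requested around 250 ms and one at 500 ms, each tolerating +/-10 ms.
requested = [245, 250, 255, 260, 500]
windows = [(t - 10, t + 10) for t in requested]
print(coalesce(windows))  # [255, 510]
```

Five timers end up served by two wakeups here. With zero tolerance the same function would return one wakeup per distinct expiry time.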
|
|
|
depontius Advocate
Joined: 05 May 2004 Posts: 3525
|
Posted: Thu Jul 12, 2007 9:10 pm Post subject: |
|
|
Rob1n wrote: | depontius wrote: | Does anyone here understand the whole "deferred timers" thing? My impression is that there are various places with kernel timers used as timeouts, and by the time you've got a real-world quantity of them the kernel is waking too often. My impression is that since most of the timeouts really aren't critical, the deferred timers allow the value to be specified as non-critical, and they get fudged together. Then the system wakes once, services a bunch of timeouts, then quits ticking again. |
That's basically right, yes. Normally timers are scheduled to occur after a given time (e.g. in 250ms time). In most cases this accuracy isn't needed, if it's actually woken in 240ms or 260ms then there's no issues. So, everything asking for a timer within a given range can be woken at the same time (so the system only has to wake up once instead of 4 or 5 times).
Quote: | Is special attention required to get a deferred timer instead of the old-style? I note on my laptop that at the moment the single largest cause of wake-ups is "afs_rxevent : schedule_timeout (process_timeout)", almost twice as often as the number 2 cause of wakeups, cpufreq-set. (No, I haven't gotten around to slowing the sampling time on this in /etc/conf.d/local.start.) In other words, after deferred timers get added, will we have to hammer on projects individually to get them used, or will they kind of slip in automagically? |
I believe these are purely internal timers, so it's only the kernel which needs to be updated to use them where applicable. |
In the case of AFS, it wouldn't surprise me if the kernel module needs updating too, and it's out-of-tree. But I guess that's better than having to update userspace. _________________ .sigs waste space and bandwidth |
|
|
|
Paapaa l33t
Joined: 14 Aug 2005 Posts: 955 Location: Finland
|
|
|
|
depontius Advocate
Joined: 05 May 2004 Posts: 3525
|
Posted: Fri Jul 13, 2007 12:39 pm Post subject: |
|
|
Thanks for the pointers. I'd already read the more recent one, and may have read the earlier one, but that was quite a while back. But the more recent one contains a link to deferrable timers that might be of interest, here: http://lwn.net/Articles/228143/ Still looks as if it isn't picked up for free, though.
As a side note, my laptop and deskside have been running 2.6.20 ~x86, and today they're running 2.6.20-r4 x86. In the process I discovered that my deskside wasn't running tickless, and neither machine had hi-res timers enabled. Both of those situations are corrected, now.
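A quick way to catch that kind of surprise is to look at the kernel config text directly. Below is a small Python sketch that pulls out the three options discussed in this thread (CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS, CONFIG_HZ). The function name and the sample text are made up. Where the config text comes from (for example /usr/src/linux/.config) is an assumption and depends on the setup.

```python
def timer_config(config_text):
    """Extract the timer-related options from kernel .config text.

    Options that are absent or commented out ("# CONFIG_FOO is not set")
    are reported as None.
    """
    wanted = ("CONFIG_NO_HZ", "CONFIG_HIGH_RES_TIMERS", "CONFIG_HZ")
    found = {name: None for name in wanted}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and blank lines
        name, value = line.split("=", 1)
        if name in found:
            found[name] = value
    return found

# Hypothetical config excerpt: tickless on, hi-res timers off, HZ=300.
sample = """\
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_HZ_300=y
CONFIG_HZ=300
"""
print(timer_config(sample))
```

Matching on the exact option name keeps CONFIG_HZ_300 from being mistaken for CONFIG_HZ.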
But things are a little odd on my laptop. Sometimes all is well and it spends most of its time in C3, but sometimes (like today) it never gets to C3.
Code: | PowerTOP version 1.2 (C) 2007 Intel Corporation
Cn Avg residency (10s) Long term residency avg
C0 (cpu running) ( 0.2%)
C1 0.0ms ( 0.0%) 0.0ms
C2 24.5ms (99.8%) 6.5ms
C3 0.0ms ( 0.0%) 0.0ms
Wakeups-from-idle per second : 40.8
Top causes for wakeups:
24.4% ( 2.0) afs_rxevent : schedule_timeout (process_timeout)
15.9% ( 1.3) cpufreq-set : queue_delayed_work_on (delayed_work_timer_
12.2% ( 1.0) ifplugd : schedule_timeout (process_timeout)
11.0% ( 0.9) <interrupt> : ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4,
9.8% ( 0.8) xfsbufd : schedule_timeout (process_timeout)
6.1% ( 0.5) <kernel core> : queue_delayed_work_on (delayed_work_timer_
6.1% ( 0.5) : e1000_intr (e1000_watchdog)
2.4% ( 0.2) runscript.sh : __netdev_watchdog_up (dev_watchdog)
2.4% ( 0.2) <kernel core> : page_writeback_init (wb_timer_fn)
2.4% ( 0.2) automount : do_setitimer (it_real_fn) |
Incidentally, the "Top causes for wakeups" are pretty much the same as they were yesterday, when it was spending about 90% of the time in C3. Any ideas?
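For comparing two runs like yesterday's and today's, it can help to turn the "Top causes for wakeups" lines into data instead of eyeballing them. The Python sketch below does that. The line format is inferred only from the PowerTOP 1.2 output pasted in this post, not from any documented format, and the function and field names are invented.

```python
import re

# percent, then "(rate)", then "source : detail"; the source part may be empty.
LINE = re.compile(r"^\s*([\d.]+)%\s*\(\s*([\d.]+)\)\s*(.*?)\s*:\s*(.*)$")

def parse_wakeups(text):
    """Parse 'Top causes for wakeups' lines into a list of dicts."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            percent, rate, source, detail = m.groups()
            rows.append({
                "percent": float(percent),
                "per_second": float(rate),
                "source": source,
                "detail": detail.strip(),
            })
    return rows

sample = """\
Wakeups-from-idle per second : 40.8
24.4% ( 2.0) afs_rxevent : schedule_timeout (process_timeout)
 6.1% ( 0.5) : e1000_intr (e1000_watchdog)
"""
for row in parse_wakeups(sample):
    print(row)
```

Lines that do not start with a percentage, such as the summary line, are skipped. Diffing the parsed rows from a "good" and a "bad" run would show whether the wakeup sources really are the same.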
[EDIT] Next hunk of oddity... I've now got the laptop doing something. I find it mildly annoying that it's supposed to be doing something straightforward, and can only seem to apply 50% CPU to it, but that's another matter. What's odd and relevant to the current topic is what PowerTop says while it's "working":
Code: | PowerTOP version 1.2 (C) 2007 Intel Corporation
Cn Avg residency (5s) Long term residency avg
C0 (cpu running) (41.7%)
C1 0.0ms ( 0.0%) 0.0ms
C2 0.5ms (58.3%) 11.7ms
C3 0.0ms ( 0.0%) 0.0ms
Wakeups-from-idle per second : 1059.4
Top causes for wakeups:
97.5% (1065.4) <interrupt> : ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4,
1.5% (16.6) <interrupt> : libata
0.2% ( 2.0) afs_rxevent : schedule_timeout (process_timeout)
0.1% ( 1.2) cpufreq-set : queue_delayed_work_on (delayed_work_timer_
0.1% ( 1.0) ifplugd : schedule_timeout (process_timeout)
0.1% ( 1.0) tail : do_nanosleep (hrtimer_wakeup)
0.1% ( 1.0) clsbd : schedule_timeout (process_timeout)
0.1% ( 0.8) xfsbufd : schedule_timeout (process_timeout)
0.1% ( 0.6) : e1000_intr (e1000_watchdog)
0.0% ( 0.4) top : schedule_timeout (process_timeout) |
For some reason, the fact that the CPU is actually doing work, which of course puts it in C0, is being chalked up as USB wakeups! By the way, this laptop is in a port replicator, cover closed, and all access is over the network. The network card is a mini-PCI hardwired, not USB, and the wireless is currently not being used. _________________ .sigs waste space and bandwidth |
|
|
|
Bones McCracker Veteran
Joined: 14 Mar 2006 Posts: 1611 Location: U.S.A.
|
Posted: Tue Jul 17, 2007 9:45 pm Post subject: |
|
|
rmh3093 wrote: | It won't make your computer run cooler because the computer still has to do the same amount of work to keep your computer idle. |
I don't understand the above. I keep reading that Tickless/Highres operation dramatically reduces power consumption. Very nearly all Power-In is converted to Heat-Out. Therefore, it should make the machine run cooler.
I have a couple other questions to toss into the mix. (Yes, I'm googling, but the answers to these might benefit others as well.)
Also, what have people seen in terms of benchmarks? Here's at least one test that shows virtually zero effect.
http://www.phoronix.com/scan.php?page=article&item=651&num=1
Without running PowerTOP, are there some common-sense rules that apply? For example, are there certain daemons that are so demanding in timer access that they negate any potential benefit (e.g., ntpd or whatever)? (There's one list at http://www.linuxpowertop.org/known.php.)
What acpi versions and features are required to take advantage of it?
Thanks. |
|
|
|
|
|
|
|