By default, the timers and clocks available on Windows offer precision only at the 10 ms level. If you need a high-resolution clock or timer with 1 ms resolution or better, special techniques are required.

This article describes how to use timeBeginPeriod / timeEndPeriod to obtain a clock with 1 ms (millisecond) precision, and how QueryPerformanceFrequency & QueryPerformanceCounter can improve precision to roughly 10 us (microseconds).
Results of some quick research on timing in Win32
by Ryan Geiss - 16 August 2002 (...with updates since then)

You might be thinking to yourself: this is a pretty simple thing to be posting; what's the big deal? The deal is that somehow, good timing code has eluded me for years. Finally, frustrated, I dug in and did some formal experiments on a few different computers, testing the timing precision they could offer, using various win32 functions. I was fairly surprised by the results!

I tested on three computers; here are their specs:

    Gemini: 933 MHz desktop, Win2k
    Vaio:   333 MHz laptop, Win98
    HP:     733 MHz laptop, Win2k

Also, abbreviations to be used hereafter:

    ms: milliseconds, or 1/1,000 of a second
    us: microseconds, or 1/1,000,000 of a second

timeGetTime - what they don't tell you

First, I tried to determine the precision of timeGetTime(). In order to do this, I simply ran a loop, constantly polling timeGetTime() until the time changed, and then printing the delta (between the previous time and the new time). I then looked at the output and, for each computer, took the minimum of all the deltas that occurred. (Usually, the minimum was very solid, occurring about 90% of the time.) The results:

    Resolution of timeGetTime()
    Gemini: 10 ms
    Vaio:    1 ms
    HP:     10 ms

For now, I am assuming that it was the OS kernel that made the difference: Win2k offers a maximum precision of 10 ms for timeGetTime(), while Win98 is much better, at 1 ms. I assume that WinXP would also have a precision of 10 ms, and that Win95 would be ~1 ms, like Win98. (If anyone tests this out, please let me know either way!)

(Note that using timeGetTime() unfortunately requires linking to winmm.lib, which slightly increases your file size. You could use GetTickCount() instead, which doesn't require linking to winmm.lib, but it tends not to have as good a timer resolution... so I would recommend sticking with timeGetTime().)

Next, I tested Sleep(). A while back I noticed that when you call Sleep(1), it doesn't really sleep for 1 ms; it usually sleeps for longer than that. I verified this by calling Sleep(1) ten times in a row, and taking the difference in timeGetTime() readings from the beginning to the end. Whatever delta there was for these ten sleeps, I just divided by 10 to get the average duration of Sleep(1). This turned out to be:

    Average duration of Sleep(1)
    Gemini: 10 ms (10 calls to Sleep(1) took exactly 100 ms)
    Vaio:   ~4 ms (10 calls to Sleep(1) took 35-45 ms)
    HP:     10 ms (10 calls to Sleep(1) took exactly 100 ms)

Now, this was disturbing, because it meant that if you call Sleep(1) or Sleep(9) on a Win2k machine, there is no difference - it still sleeps for 10 ms! "So *this* is the reason all my timing code sucks," I sighed to myself. Given that, I decided to give up on Sleep() and timeGetTime(). The application I was working on required really good fps limiting, and 10 ms Sleeps were not precise enough to do a good job. So I looked elsewhere.

UPDATE: Matthijs de Boer points out that the timeGetTime function returns a DWORD value, which wraps around to 0 every 2^32 milliseconds (about 49.71 days), so you should write your code to be aware of this possibility.
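One simple way to stay safe here: compute elapsed time with unsigned DWORD subtraction, which is modulo 2^32 and therefore comes out correct even across the wrap point. A minimal sketch (the variable names are just for illustration):

    // Wrap-safe elapsed-time measurement with timeGetTime().
    // DWORD arithmetic is modulo 2^32, so (now - start) yields the
    // correct elapsed milliseconds even if the counter wrapped in
    // between, as long as the interval itself is under ~49.7 days.
    DWORD start = timeGetTime();
    // ... do some work ...
    DWORD now = timeGetTime();
    DWORD elapsed_ms = now - start;  // correct across a wrap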
timeBeginPeriod / timeEndPeriod

HOWEVER, I should not have given up so fast! It turns out that there is a win32 command, timeBeginPeriod(), which solves our problem: it lowers the granularity of Sleep() to whatever parameter you give it. So if you're on Windows 2000 and you call timeBeginPeriod(1) and then Sleep(1), it will truly sleep for just 1 millisecond, rather than the default 10! timeBeginPeriod() only affects the granularity of Sleep() for the application that calls it, so don't worry about messing up the system with it. Also, be sure you call timeEndPeriod() when your program exits, with the same parameter you fed into timeBeginPeriod() when your program started (presumably 1). Both of these functions are in winmm.lib, so you'll have to link to it if you want to lower your Sleep() granularity down to 1 ms.

How reliable is it? I have yet to find a system for which timeBeginPeriod(1) does not drop the granularity of Sleep(1) to 1 or, at most, 2 milliseconds. If anyone out there does, please let me know (e-mail: ); I'd like to hear about it, and I will post a warning here. Note also that calling timeBeginPeriod() also affects the granularity of some other timing calls, such as CreateWaitableTimer() and WaitForSingleObject(); however, some functions are still unaffected, such as _ftime(). (Special thanks to Mark Epstein for pointing this out to me!)

some convenient test code

The following code will tell you:

1. What the granularity, or minimum resolution, of calls to timeGetTime() is on your system. In other words, if you sit in a tight loop and call timeGetTime(), only noting when the value returned changes, what value do you get? This granularity tells you, more or less, what kind of potential error to expect in the result when calling timeGetTime().

2. It also tests how long your machine really sleeps when you call Sleep(1). Often this is actually 2 or more milliseconds, so be careful!

NOTE that these tests are performed after calling timeBeginPeriod(1), so if you forget to call timeBeginPeriod(1) in your own init code, you might not get as good a granularity as you see from this test!

    #include <stdio.h>
    #include <windows.h>

    int main(int argc, char **argv)
    {
        const int count = 64;

        timeBeginPeriod(1);

        printf("1. testing granularity of timeGetTime()...\n");
        int its = 0;
        long cur = 0, last = timeGetTime();
        while (its < count)
        {
            cur = timeGetTime();
            if (cur != last)
            {
                printf("%ld ", cur - last);
                last = cur;
                its++;
            }
        }

        printf("\n\n2. testing granularity of Sleep(1)...\n");
        long first = timeGetTime();
        cur = first;
        last = first;
        for (int n = 0; n < count; n++)
        {
            Sleep(1);
            cur = timeGetTime();
            printf("%ld ", cur - last);
            last = cur;
        }

        printf("\n");
        timeEndPeriod(1);
        return 0;
    }

RDTSC: Eh, no thanks

On the web, I found several references to the "RDTSC" Pentium instruction, which stands for "Read Time Stamp Counter." This assembly instruction returns an unsigned 64-bit integer reading on the processor's internal high-precision timer.
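(Side note: on newer compilers you don't have to hand-emit the opcode bytes the way the code below does; MSVC's <intrin.h>, for example, provides a __rdtsc() intrinsic that returns the same 64-bit counter. A minimal sketch:)

    #include <intrin.h>  // MSVC: declares the __rdtsc() intrinsic

    unsigned __int64 ReadTimestampCounter(void)
    {
        // Same effect as the hand-assembled 'rdtsc' used below, but it
        // also works in x64 builds, where MSVC disallows inline __asm.
        return __rdtsc();
    }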
In order to get the frequency of the timer (how much the timer's return value will increment in 1 second), you can read the registry for the machine's speed (in MHz - millions of cycles per second), like this:

    // WARNING: YOU DON'T REALLY WANT TO USE THIS FUNCTION
    bool GetPentiumClockEstimateFromRegistry(unsigned __int64 *frequency)
    {
        HKEY  hKey;
        DWORD cbBuffer;
        LONG  rc;

        *frequency = 0;

        rc = RegOpenKeyEx(HKEY_LOCAL_MACHINE,
                          "Hardware\\Description\\System\\CentralProcessor\\0",
                          0, KEY_READ, &hKey);

        if (rc == ERROR_SUCCESS)
        {
            cbBuffer = sizeof(DWORD);
            DWORD freq_mhz;
            rc = RegQueryValueEx(hKey, "~MHz", NULL, NULL,
                                 (LPBYTE)(&freq_mhz), &cbBuffer);
            if (rc == ERROR_SUCCESS)
                *frequency = freq_mhz*1024*1024;
            RegCloseKey(hKey);
        }

        return (*frequency > 0);
    }

    Result of GetPentiumClockEstimateFromRegistry()
    Gemini: 975,175,680 Hz
    Vaio:   FAILED.
    HP:     573,571,072 Hz  <-- strange...

    Empirical tests: RDTSC delta after Sleep(1000)
    Gemini: 931,440,000 Hz
    Vaio:   331,500,000 Hz
    HP:      13,401,287 Hz

However, as you can see, this failed on the Vaio (the Win98 laptop). Worse yet, on the HP, the value in the registry does not match the MHz rating of the machine (733). That would be okay if the value were actually the rate at which the timer ticked; but, after doing some empirical testing, it turns out that the HP's timer frequency is really 13 MHz. Trusting the registry reading on the HP would be a big, big mistake!

So, one conclusion is: don't try to read the registry to get the timer frequency; you're asking for trouble. Instead, do it yourself. Just call Sleep(1000) to allow 1 second (plus or minus ~1%) to pass, calling GetPentiumTimeRaw() (below) at the beginning and end, and then simply subtract the two unsigned __int64's, and voila, you now know the frequency of the timer that feeds RDTSC on the current system. (*Watch out for timer wraps during that 1 second, though...) Note that you could easily do this in the background, using timeGetTime() instead of Sleep(), so there wouldn't be a 1-second pause when your program starts.

    int GetPentiumTimeRaw(unsigned __int64 *ret)
    {
        // returns 0 on failure, 1 on success
        // warning: watch out for wraparound!

        // get high-precision time:
        __try
        {
            unsigned __int64 *dest = (unsigned __int64 *)ret;
            __asm
            {
                _emit 0xf        // these two bytes form the 'rdtsc' asm instruction,
                _emit 0x31       //  available on Pentium I and later.
                mov esi, dest
                mov [esi  ], eax // lower 32 bits of tsc
                mov [esi+4], edx // upper 32 bits of tsc
            }
            return 1;
        }
        __except (EXCEPTION_EXECUTE_HANDLER)
        {
            return 0;
        }
    }
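To make that calibration concrete, here is a minimal sketch (the wrapper name CalibrateRdtscFrequency is just for illustration; it is only accurate to ~1%, and it assumes the counter doesn't wrap during the measurement):

    // Rough one-time calibration of the RDTSC tick rate: read the
    // counter, sleep ~1 second, read it again; the difference is
    // (approximately) the tick frequency in Hz.
    unsigned __int64 CalibrateRdtscFrequency(void)
    {
        unsigned __int64 t0, t1;
        if (!GetPentiumTimeRaw(&t0))
            return 0;            // rdtsc not available
        Sleep(1000);             // ~1 second, plus or minus ~1%
        if (!GetPentiumTimeRaw(&t1))
            return 0;
        return t1 - t0;          // ticks per second, approximately
    }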
Once you figure out the frequency, using this 1-second test, you can translate readings from the cpu's timestamp counter directly into a real 'time' reading, in seconds:

    double GetPentiumTimeAsDouble(unsigned __int64 frequency)
    {
        // returns < 0 on failure; otherwise, returns current cpu time, in seconds.
        // warning: watch out for wraparound!

        if (frequency == 0)
            return -1.0;

        // get high-precision time:
        __try
        {
            unsigned __int64 high_perf_time;
            unsigned __int64 *dest = &high_perf_time;
            __asm
            {
                _emit 0xf        // these two bytes form the 'rdtsc' asm instruction,
                _emit 0x31       //  available on Pentium I and later.
                mov esi, dest
                mov [esi  ], eax // lower 32 bits of tsc
                mov [esi+4], edx // upper 32 bits of tsc
            }

            __int64 time_s     = (__int64)(high_perf_time / frequency);  // unsigned->signed conversion should be safe here
            __int64 time_fract = (__int64)(high_perf_time % frequency);  // unsigned->signed conversion should be safe here

            // note: here, we wrap the timer more frequently (once per week)
            // than it otherwise would (VERY RARELY - once every 585 years on
            // a 1 GHz machine), to alleviate floating-point precision errors
            // that start to occur when you get to very high counter values.
            double ret = (time_s % (60*60*24*7)) + (double)time_fract/(double)((__int64)frequency);
            return ret;
        }
        __except (EXCEPTION_EXECUTE_HANDLER)
        {
            return -1.0;
        }
    }

This works pretty well, works on ALL Pentium I and later processors, and offers AMAZING precision. However, it can be messy, especially working that 1-second test in there with all your other code, so that it runs in the background.

UPDATE: Ross Bencina was kind enough to point out to me that rdtsc "is a per-cpu operation, so on multiprocessor systems you have to be careful that multiple calls to rdtsc are actually executing on the same cpu." (You can do that using the SetThreadAffinityMask() function.) Thanks Ross!

QueryPerformanceFrequency & QueryPerformanceCounter: Nice

There is one more item in our bag of tricks. It is simple, elegant, and as far as I can tell, extremely accurate and reliable. It is a pair of win32 functions: QueryPerformanceFrequency and QueryPerformanceCounter. QueryPerformanceFrequency returns the amount that the counter will increment over 1 second; QueryPerformanceCounter returns a LARGE_INTEGER (a 64-bit *signed* integer) that is the current value of the counter. Perhaps I am lucky, but they work flawlessly on my 3 machines. The MSDN library says that they should work on Windows 95 and later. Here are some results:

    Return value of QueryPerformanceFrequency
    Gemini: 3,579,545 Hz
    Vaio:   1,193,000 Hz
    HP:     3,579,545 Hz

    Maximum # of unique readings I could get in 1 second
    Gemini: 658,000 (-> 1.52 us resolution!)
    Vaio:   174,300 (-> 5.73 us resolution!)
    HP:     617,000 (-> 1.62 us resolution!)

I was pretty excited to see timing resolutions in the low-microsecond range. Note that for the latter test, I avoided printing any text during the 1-second interval, as it would drastically affect the outcome.

Now, here is my question to you: do these two functions work for you? What OS does the computer run, what is the MHz rating, and is it a laptop or desktop? What was the result of QueryPerformanceFrequency? What was the maximum # of unique readings you could get in 1 second? Can you find any computers that it doesn't work on? Let me know, and I'll collect & publish everyone's results here.

So, until I find some computers that QueryPerformanceFrequency & QueryPerformanceCounter don't work on, I'm sticking with them. If they fail, I've got backup code that will kick in, which uses timeGetTime(); I didn't bother to use RDTSC because of the calibration issue, and I'm hopeful that these two functions are highly reliable. I suppose only feedback from readers like you will tell... =)
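For reference, here is a minimal sketch of timing an interval with this pair of functions (the Sleep(100) just stands in for whatever work you want to time):

    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        LARGE_INTEGER freq, t0, t1;

        // Counts per second; returns FALSE only if the hardware
        // offers no high-resolution counter at all.
        if (!QueryPerformanceFrequency(&freq))
            return 1;

        QueryPerformanceCounter(&t0);
        Sleep(100);  // ... the work you want to time ...
        QueryPerformanceCounter(&t1);

        double seconds = (t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
        printf("elapsed: %f seconds\n", seconds);
        return 0;
    }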
UPDATE: a few people have written e-mail pointing me to a Microsoft Knowledge Base article which outlines some cases in which the QueryPerformanceCounter function can unexpectedly jump forward by a few seconds.

UPDATE: Matthijs de Boer points out that you can use the SetThreadAffinityMask() function to make your thread stick to one core or the other, so that 'rdtsc' and QueryPerformanceCounter() don't have timing issues on dual-core systems.

Accurate FPS Limiting / High-precision 'Sleeps'

So now, when I need to do FPS limiting (limiting the framerate to some maximum), I don't just naively call Sleep() anymore. Instead, I use QueryPerformanceCounter in a loop that runs Sleep(0). Sleep(0) simply gives up your thread's current timeslice to another waiting thread; it doesn't really sleep at all. So, if you just keep calling Sleep(0) in a loop until QueryPerformanceCounter() says you've hit the right time, you'll get ultra-accurate FPS limiting.

There is one problem with this kind of fps limiting: it will use up 100% of the CPU. Even though the computer WILL remain quite responsive, because the app sucking up the idle time is being very "nice", this will still look very bad on the CPU meter (which will stay at 100%) and, much worse, it will drain the battery quite quickly on laptops. To get around this, I use a hybrid algorithm that uses Sleep() to do the bulk of the waiting, and QueryPerformanceCounter() to do the finishing touches, making it accurate to ~10 microseconds, but still wasting very little processor.

My code for accurate FPS limiting looks something like this, and runs at the end of each frame, immediately after the page flip:

    // note: BE SURE YOU CALL timeBeginPeriod(1) at program startup!!!
    // note: BE SURE YOU CALL timeEndPeriod(1) at program exit!!!
    // note: that will require linking to winmm.lib
    // note: never use static initializers (like this) with Winamp plug-ins!
    // note: m_high_perf_timer_freq is assumed to have been filled in at
    //       startup via QueryPerformanceFrequency(&m_high_perf_timer_freq).
    static LARGE_INTEGER m_prev_end_of_frame = {0};
    int max_fps = 60;

    LARGE_INTEGER t;
    QueryPerformanceCounter(&t);

    if (m_prev_end_of_frame.QuadPart != 0)
    {
        int ticks_to_wait = (int)m_high_perf_timer_freq.QuadPart / max_fps;
        int done = 0;
        do
        {
            QueryPerformanceCounter(&t);

            int ticks_passed = (int)((__int64)t.QuadPart - (__int64)m_prev_end_of_frame.QuadPart);
            int ticks_left   = ticks_to_wait - ticks_passed;

            if (t.QuadPart < m_prev_end_of_frame.QuadPart)  // time wrap
                done = 1;
            if (ticks_passed >= ticks_to_wait)
                done = 1;

            if (!done)
            {
                // if > 0.002s left, do Sleep(1), which will actually sleep some
                //   steady amount, probably 1-2 ms, and do so in a nice way
                //   (cpu meter drops; laptop battery spared).
                // otherwise, do a few Sleep(0)'s, which just give up the
                //   timeslice, and don't really save cpu or battery, but do
                //   pass a tiny amount of time.
                if (ticks_left > (int)m_high_perf_timer_freq.QuadPart*2/1000)
                    Sleep(1);
                else
                    for (int i=0; i<10; i++)
                        Sleep(0);  // causes thread to give up its timeslice
            }
        }
        while (!done);
    }

    m_prev_end_of_frame = t;

...and this is trivial to convert into a high-precision Sleep() function.
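For example, such a conversion might look like the following minimal sketch (the name PreciseSleep is just for illustration; it assumes timeBeginPeriod(1) is already in effect, and it uses the same Sleep(1)/Sleep(0) split as the code above):

    void PreciseSleep(double seconds)
    {
        LARGE_INTEGER freq, start, now;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&start);

        __int64 ticks_to_wait = (__int64)(seconds * (double)freq.QuadPart);

        for (;;)
        {
            QueryPerformanceCounter(&now);
            __int64 ticks_left = ticks_to_wait - (now.QuadPart - start.QuadPart);
            if (ticks_left <= 0)
                break;

            if (ticks_left > freq.QuadPart*2/1000)
                Sleep(1);   // >~2 ms left: sleep cheaply (cpu meter drops)
            else
                Sleep(0);   // last stretch: just yield the timeslice
        }
    }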
Conclusions & Summary

Using regular old timeGetTime() to do timing is not reliable on many Windows-based operating systems, because the granularity of the system timer can be as high as 10-15 milliseconds, meaning that timeGetTime() is only accurate to 10-15 milliseconds. [Note that the high granularities occur on NT-based operating systems like Windows NT, 2000, and XP. Windows 95 and 98 tend to have much better granularity, around 1-5 ms.]

However, if you call timeBeginPeriod(1) at the beginning of your program (and timeEndPeriod(1) at the end), timeGetTime() will usually become accurate to 1-2 milliseconds, and will provide you with extremely accurate timing information.

Sleep() behaves similarly; the length of time that Sleep() actually sleeps for goes hand-in-hand with the granularity of timeGetTime(), so after calling timeBeginPeriod(1) once, Sleep(1) will actually sleep for 1-2 milliseconds, Sleep(2) for 2-3, and so on (instead of sleeping in increments as high as 10-15 ms).

For higher-precision timing (sub-millisecond accuracy), you'll probably want to avoid the assembly mnemonic RDTSC, because it is hard to calibrate; instead, use QueryPerformanceFrequency and QueryPerformanceCounter, which are accurate to within 10 microseconds (0.00001 seconds).

For simple timing, both timeGetTime and QueryPerformanceCounter work well, and QueryPerformanceCounter is obviously more accurate. However, if you need to do any kind of "timed pauses" (such as those necessary for framerate limiting), you need to be careful of sitting in a loop calling QueryPerformanceCounter, waiting for it to reach a certain value; this will eat up 100% of your processor. Instead, consider a hybrid scheme, where you call Sleep(1) (don't forget timeBeginPeriod(1) first!) whenever you need to pass more than 1 ms of time, and then only enter the QueryPerformanceCounter 100%-busy loop to finish off the last < 1/1000th of a second of the delay you need. This will give you ultra-accurate delays (accurate to 10 microseconds), with very minimal CPU usage. See the code above.

Please Note: Several people have written me over the years, offering additions or new developments since I first wrote this article, and I've added 'UPDATE' comments here and there. The general text of the article DOES NOT reflect the 'UPDATE' comments yet, so please keep that in mind if you see any contradictions.
UPDATE: Matthijs de Boer points out that you should watch out for variable CPU speeds, in general, when running on laptops or other power-conserving (perhaps even just eco-friendly) devices. (Thanks Matthijs!)
This document copyright (c)2002+ Ryan M. Geiss.