From its inception, a major selling point of Microsoft® Windows NT® has been reliability. The stability of Windows NT helps promote its use as a reliable enterprise solution. However, anyone who has run Windows NT for any period of time has undoubtedly been exposed to the ever-informative and system-stopping blue bug check screen.
Bug checks, for better or worse, are a fact of life with Windows NT. Bug checks are essentially unhandled exceptions, much like the familiar GPFs that occur in user-mode apps. Unfortunately, unlike user-mode faults, kernel-mode bug checks result in the system as a whole halting execution, with no code getting another chance to execute. Kernel-mode errors are handled this way because continuing system execution of some flawed piece of kernel-mode code that has run amok can result in unrecoverable logical or physical damage to the overall system and its data.
Unfortunately for the user (and even most developers), the data presented on the bug check screen is obscure enough to leave anyone wondering what went wrong. Usually an installed driver, rather than a flawed component of Windows NT itself, causes a bug check. However, readily identifying any single component as the point of failure is difficult. This is compounded by the fact that many bug checks are intermittent in nature, and usually crop up in running systems rather than at development time. Whether Windows NT itself or some installed piece of software is at fault usually remains a mystery. But for the user, it really doesn't matter; the system stopped running, and ultimately Windows NT is left with the blame.
With Windows® 2000, Microsoft has added a number of enhancements that help promote the overall stability of the operating system. These include resources to help driver developers eradicate system-stopping errors in their code at development and test time. In this column, I'll provide an overview of one of these enhancements, the Driver Verifier, which was provided in Windows 2000 Beta 3 and Release Candidates 1 and 2. The Driver Verifier is one of a number of kernel-mode additions introduced in Windows 2000 that are aimed directly at developers, providing operating-system-level support for testing, debugging, and stressing kernel-mode drivers. The Driver Verifier gives you a fighting chance to shake out driver problems and deficiencies during the development and testing cycles so they don't surprise your users.
Why is this debugging support being built into a retail operating system? A brief review of prior Microsoft operating systems might provide some insight into the benefits of providing development tools and support in retail products. Historically, each version of Microsoft Windows has included operating-system-level tools that provide stability and debugging support for developers. For example, Windows 3.0 included a protected-mode implementation. Windows 3.1 introduced parameter validation for exported user-mode APIs. Windows 95 and Windows NT provided implementations of Win32® that carved out separate process spaces for each running application, limiting an application's ability to take down other applications or the whole system.
Each enhancement forced certain levels of system stability by defining what an application and installed drivers could do in an unambiguous manner. This made each new operating system and its applications more reliable than the previous one. Kernel-mode support in Windows 2000 is intended to finally correct what has been an architectural free-for-all: kernel mode, where small developer errors can cause big problems.
What is the Driver Verifier?
The Driver Verifier is a separate environment, available within both checked and free builds of Windows 2000, in which specified target drivers are executed. However, unlike ordinary kernel mode in Windows 2000, drivers that run in the Driver Verifier see a more hostile environment. This environment does things that usually occur only within resource-taxed systems, such as failing dynamic memory allocations or periodically invalidating paged code and data. In addition, actions that a driver performs, such as memory allocations and execution priority changes, are scrutinized for accuracy. The proper use of many kernel-mode APIs is also checked to ensure that they are being employed within the appropriate conditions. However, the Driver Verifier is not an automated debugging tool! Using it will simply cause potential problems to appear more readily, and should provide the developer with more specific error information.
The functionality of the Driver Verifier represents the culmination of research on system crash data collected by Microsoft from real-world installations, largely through its support infrastructure. The features provided by the Driver Verifier are supplied automatically, without any special recompilation or use of specific code within your driver. This means that you can load any driver (even drivers that you didn't write or that you don't have source code for) to localize and identify problems. This is clearly a boon for tracking down elusive crashes in even the most complex installations.
Memory Validation
Kernel-mode development under Windows 2000 is littered with pitfalls. Many APIs and other facilities come with stipulations governing when and where they can be used. A cursory review of just about any kernel-mode API call results in an array of stipulations, such as IRQLs at which an API can be used or specific sections of code where they can (or can't) be utilized. However, improper API use doesn't always result in immediate failure. As with any modern, complex OS, subtle inaccuracies in the use of system facilities usually result in catastrophic problems at some later point, sometimes in seemingly unrelated areas. Nowhere is this more true than in the use of memory. Improper assumptions and bad behavior can lead to elaborate errors, often long after the culprit has left the scene.
The Driver Verifier performs a rigorous set of checks and periodic tests to ensure proper use of the memory facilities provided by Windows 2000. First, all driver memory allocations are performed within the confines of a special memory pool rather than a pool that's shared among all drivers. This pool surrounds each allocation with memory that carries no access permissions, so overruns and underruns fault immediately at the offending access. In addition, when allocations are freed, they are explicitly marked as invalid. This will catch any memory accesses that are not properly synchronized with allocating and freeing memory.
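The idea of bounding every allocation can be approximated in ordinary user-mode C. The sketch below is not how the special pool is implemented: it swaps no-access pages for guard bytes that are checked when the block is freed, so a stray write is detected late rather than at the faulting instruction. All names (guarded_alloc, guarded_check, guarded_free, the fill patterns) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16u   /* guard bytes on each side (arbitrary choice) */
#define GUARD_BYTE 0xA5u /* pattern written into the guard regions */
#define FREED_BYTE 0xDDu /* pattern written over data when it is freed */

/* Hypothetical header stored in front of every guarded allocation. */
typedef struct {
    size_t size; /* caller-visible size in bytes */
} guard_header;

/* Layout: [header][front guard][user data][back guard].
 * The returned pointer is only suitable for byte buffers. */
void *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(guard_header) + 2u * GUARD_SIZE + size);
    if (raw == NULL)
        return NULL;
    ((guard_header *)raw)->size = size;
    unsigned char *front = raw + sizeof(guard_header);
    memset(front, GUARD_BYTE, GUARD_SIZE);
    memset(front + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    return front + GUARD_SIZE;
}

/* Returns 0 if both guards are intact, -1 on underrun, 1 on overrun. */
int guarded_check(const void *user)
{
    const unsigned char *data = user;
    const unsigned char *front = data - GUARD_SIZE;
    const guard_header *hdr =
        (const guard_header *)(front - sizeof(guard_header));
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (front[i] != GUARD_BYTE)
            return -1;
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (data[hdr->size + i] != GUARD_BYTE)
            return 1;
    return 0;
}

/* Checks the guards, poisons the user data, then releases the block.
 * Returns the same status as guarded_check. */
int guarded_free(void *user)
{
    assert(user != NULL);
    int status = guarded_check(user);
    unsigned char *data = user;
    guard_header *hdr =
        (guard_header *)(data - GUARD_SIZE - sizeof(guard_header));
    memset(data, FREED_BYTE, hdr->size);
    free(hdr);
    return status;
}
```

Writing one byte past the end of an 8-byte block and then calling guarded_free reports an overrun; writing one byte before the start reports an underrun. The no-access approach described above is stronger, because the fault is raised at the exact instruction that strays.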
Second, the driver's pool allocations that are not marked as NonPagedPoolMustSucceed (which generates a bug check when there is no memory left in a system) are failed randomly. This helps identify improper (or nonexistent) error checking code within a driver.
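To see why injected failures are useful, consider a user-mode sketch of the same idea. The allocator wrapper here fails on a fixed schedule rather than randomly, purely so the behavior is repeatable; inject_alloc, inject_set_period, and copy_through_scratch are hypothetical names, not part of any Windows API. The routine under test has to release whatever it already holds when a later allocation fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fault injector: every Nth allocation fails.
 * A period of 0 disables injection. */
static unsigned fail_period = 0;
static unsigned alloc_calls = 0;

void inject_set_period(unsigned period)
{
    fail_period = period;
    alloc_calls = 0;
}

void *inject_alloc(size_t size)
{
    alloc_calls++;
    if (fail_period != 0 && alloc_calls % fail_period == 0)
        return NULL; /* simulated out-of-memory condition */
    return malloc(size);
}

/* Example routine that needs two scratch buffers.
 * Returns 0 on success, or -1 if either allocation fails, in which
 * case everything allocated so far has been released. */
int copy_through_scratch(const char *src, char *dst, size_t len)
{
    assert(src != NULL && dst != NULL && len > 0);

    char *first = inject_alloc(len);
    if (first == NULL)
        return -1;

    char *second = inject_alloc(len);
    if (second == NULL) {
        free(first); /* do not leak the buffer we already hold */
        return -1;
    }

    memcpy(first, src, len);
    memcpy(second, first, len);
    memcpy(dst, second, len);

    free(second);
    free(first);
    return 0;
}
```

With the period set to 2, the second allocation fails and the routine returns -1 after freeing the first buffer. A version that skipped either NULL check would dereference a null pointer under the same schedule, which is the class of bug that failing allocations are meant to expose.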
Third, any paged memory used by the driver, including its own pageable code and data sections, is periodically invalidated by the memory manager. This helps identify problems with pageable memory that is accessed at inappropriate IRQLs. Doing this immediately flushes out problems that usually won't occur in drivers unless the system is extremely taxed for memory resources. This particular issue has been a problem for driver developers because improper memory access can lead to seemingly random and hard-to-track problems in drivers.
Fourth, all of a driver's memory allocations and deallocations are checked to ensure that the driver's code is running at the proper IRQL. Paged allocations are checked to ensure operation at or below APC_LEVEL, and nonpaged allocations are checked to ensure operation at or below DISPATCH_LEVEL.
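The IRQL rule for allocations reduces to a small predicate. The following is a user-mode model, not kernel code: the SIM_ names are hypothetical stand-ins for the kernel's PASSIVE_LEVEL, APC_LEVEL, and DISPATCH_LEVEL, and the numeric values are assumed to follow the x86 ordering (0, 1, 2).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for kernel IRQLs; values assume the x86 ordering. */
enum sim_irql {
    SIM_PASSIVE_LEVEL = 0,
    SIM_APC_LEVEL = 1,
    SIM_DISPATCH_LEVEL = 2
};

/* Stand-ins for the two pool classes discussed in the text. */
enum sim_pool { SIM_PAGED_POOL, SIM_NONPAGED_POOL };

/* Returns true if allocating or freeing memory of the given pool type
 * is permitted at the given IRQL under the rule described above:
 * paged pool at or below APC level, nonpaged pool at or below
 * dispatch level. */
bool sim_alloc_irql_ok(enum sim_pool pool, int irql)
{
    assert(irql >= SIM_PASSIVE_LEVEL);
    int limit = (pool == SIM_PAGED_POOL) ? SIM_APC_LEVEL
                                         : SIM_DISPATCH_LEVEL;
    return irql <= limit;
}
```

Under this model a paged allocation at SIM_DISPATCH_LEVEL is rejected, while a nonpaged allocation at the same level is accepted.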
Finally, various elementary errors are also checked, such as double freeing of memory, zero-sized allocations, and validation of pointers to be freed. (The Driver Verifier makes sure that a pointer to be freed was obtained from a pool allocation function.) Taken together, these validations should shake out the lion's share of memory issues that plague drivers.
Parameter Validation
In prior versions of Windows, parameter checking of supplied APIs has proven to be one of the most effective additions for ensuring stability. Although, on the surface, adding this additional overhead simply because of developer sloppiness is not attractive, the payoff in eradicating improper API use is well worth the minimal processing time consumed.
With the Driver Verifier, parameter validation has been added to spinlock, IRQL manipulation, and the pool allocation and deallocation APIs. Note that unlike other API parameter validation implementations, this check is only performed within the Driver Verifier (although it probably should be in the system as a whole).
First, sanity checks are performed on all arguments passed to the aforementioned APIs to ensure that uninitialized values are not specified. Second, calls to KeRaiseIrql and KeLowerIrql are checked to ensure that a raise or lower is indeed happening based on the current IRQL. Third, spinlocks are checked to ensure that common errors such as double releases do not occur. In addition, the current IRQL is checked to make sure that acquisitions and releases are performed at the right IRQL. Finally, pool allocations are sanity checked to ensure the proper IRQL: paged memory allocations must be performed at APC_LEVEL or below, and nonpaged allocations at DISPATCH_LEVEL or below.
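The IRQL and spinlock checks can be modeled in user mode as a small state machine. This is a sketch under stated assumptions rather than the verifier's actual logic: sim_cpu, sim_lock, and the sim_ functions are hypothetical, a request that leaves the IRQL unchanged is assumed to be acceptable, and the lock model follows the documented behavior of KeAcquireSpinLock (callable at or below DISPATCH_LEVEL, raises to DISPATCH_LEVEL) and KeReleaseSpinLock (called at DISPATCH_LEVEL, restores the saved IRQL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sim_irql {
    SIM_PASSIVE_LEVEL = 0,
    SIM_APC_LEVEL = 1,
    SIM_DISPATCH_LEVEL = 2
};

typedef struct { int irql; } sim_cpu;                    /* current IRQL */
typedef struct { bool held; int saved_irql; } sim_lock;  /* modeled lock */

/* A raise is rejected if the target is below the current IRQL. */
bool sim_raise_irql(sim_cpu *cpu, int new_irql, int *old_irql)
{
    assert(cpu != NULL && old_irql != NULL);
    if (new_irql < cpu->irql)
        return false;
    *old_irql = cpu->irql;
    cpu->irql = new_irql;
    return true;
}

/* A lower is rejected if the target is above the current IRQL. */
bool sim_lower_irql(sim_cpu *cpu, int new_irql)
{
    assert(cpu != NULL);
    if (new_irql > cpu->irql)
        return false;
    cpu->irql = new_irql;
    return true;
}

/* Rejects re-acquiring a held lock or acquiring above dispatch level. */
bool sim_acquire(sim_cpu *cpu, sim_lock *lock)
{
    assert(cpu != NULL && lock != NULL);
    if (lock->held || cpu->irql > SIM_DISPATCH_LEVEL)
        return false;
    lock->saved_irql = cpu->irql;
    cpu->irql = SIM_DISPATCH_LEVEL;
    lock->held = true;
    return true;
}

/* Rejects a double release or a release at the wrong IRQL. */
bool sim_release(sim_cpu *cpu, sim_lock *lock)
{
    assert(cpu != NULL && lock != NULL);
    if (!lock->held || cpu->irql != SIM_DISPATCH_LEVEL)
        return false;
    lock->held = false;
    cpu->irql = lock->saved_irql;
    return true;
}
```

In this model, a second sim_release on the same lock returns false, as does a "raise" from SIM_DISPATCH_LEVEL to SIM_APC_LEVEL. The real checks report such conditions through a bug check rather than a return value.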
IRP Checking
The I/O Manager is the heart of kernel mode in Windows. Much as window messages drive user-mode applications, the I/O Manager, and the IRPs that it uses to direct driver actions, are critical to the operation of Windows. So it is not surprising that the Driver Verifier includes some elementary sanity checks on IRP allocation and utilization. For example, checks are performed to ensure that IRPs contain valid device objects and buffer pointers. Also, checks are performed to determine whether valid status codes are set upon calls to the IoCompleteRequest API.
Driver Unload Sanity Checks
The Driver Verifier checks whether a driver that is being unloaded has any outstanding asynchronous events or objects remaining in the system. Specifically, it checks to make sure that there are no outstanding DPCs, APCs, worker threads, timers, or undeleted queues. Leaving events and objects like this straggling around after a driver has been unloaded is obviously disastrous, and checking for these becomes more critical as the Plug and Play architecture (which brings with it dynamic driver loading) becomes more prominent in Windows 2000.
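One way to picture the unload check is as bookkeeping over the objects a driver has handed to the system. The sketch below is a user-mode model with hypothetical names (sim_driver, sim_track, sim_untrack, sim_unload_leaks); how the Driver Verifier actually tracks these objects is not described here, so treat the counters as an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of outstanding work named in the text. */
enum sim_kind {
    SIM_DPC,
    SIM_APC,
    SIM_WORKER,
    SIM_TIMER,
    SIM_QUEUE,
    SIM_KIND_COUNT
};

/* Hypothetical per-driver bookkeeping: one counter per kind. */
typedef struct {
    unsigned outstanding[SIM_KIND_COUNT];
} sim_driver;

/* Record that the driver queued or created an object of this kind. */
void sim_track(sim_driver *drv, enum sim_kind kind)
{
    assert(drv != NULL && kind < SIM_KIND_COUNT);
    drv->outstanding[kind]++;
}

/* Record that the object completed, was cancelled, or was deleted. */
void sim_untrack(sim_driver *drv, enum sim_kind kind)
{
    assert(drv != NULL && kind < SIM_KIND_COUNT);
    assert(drv->outstanding[kind] > 0);
    drv->outstanding[kind]--;
}

/* Called at unload: returns the number of objects still outstanding.
 * Anything other than 0 means the driver left work behind. */
unsigned sim_unload_leaks(const sim_driver *drv)
{
    assert(drv != NULL);
    unsigned total = 0;
    for (int kind = 0; kind < SIM_KIND_COUNT; kind++)
        total += drv->outstanding[kind];
    return total;
}
```

A driver that starts a timer and queues a DPC but cancels only the timer before unloading would show one leaked object in this model.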
Enabling the Driver Verifier
Now that you've seen many of the advantages that the Driver Verifier has to offer, you're probably dying to jump in and try it. Fortunately, enabling the Driver Verifier under Windows 2000 is easy.
Early versions of the Driver Verifier required that specific registry keys be modified manually. Now, both Windows 2000 and the DDK include an application called the Driver Verifier Manager (Verifier.exe) that does all the work for you. This application makes setup and management of the Driver Verifier and its target kernel-mode drivers a snap.
You can select any kernel-mode drivers that are installed in the system in the Settings tab (see Figure 1). In addition, verification types to be used can be toggled on this screen. These include features such as IRQL checking and periodic invalidation of pageable memory. These features are toggled for all drivers loaded within the Driver Verifier. IRP validation, which is toggled by the I/O verification checkbox, can be turned on for the system as a whole as well as for the drivers loaded within the Driver Verifier.
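The toggles on the Settings tab are stored as a bitmask. The sketch below composes such a mask in plain C. The bit assignments are an assumption: they follow the values Microsoft documents for the VerifyDriverLevel registry setting (special pool 0x01, forced IRQL checking 0x02, low-resource simulation 0x04, pool tracking 0x08, I/O verification 0x10) and should be confirmed against the documentation for the build you are running. The vf_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit assignments; verify against your build's documentation. */
enum {
    VF_SPECIAL_POOL    = 0x01, /* allocate from the special pool */
    VF_FORCE_IRQL      = 0x02, /* invalidate pageable memory at raised IRQL */
    VF_LOW_RESOURCES   = 0x04, /* fail pool allocations at random */
    VF_POOL_TRACKING   = 0x08, /* check for unfreed pool at unload */
    VF_IO_VERIFICATION = 0x10  /* IRP checking */
};

/* Hypothetical options structure mirroring the Settings tab toggles. */
typedef struct {
    bool special_pool;
    bool force_irql;
    bool low_resources;
    bool pool_tracking;
    bool io_verification;
} vf_options;

/* Combine the selected options into a single flags value. */
uint32_t vf_flags(const vf_options *opts)
{
    assert(opts != NULL);
    uint32_t flags = 0;
    if (opts->special_pool)    flags |= VF_SPECIAL_POOL;
    if (opts->force_irql)      flags |= VF_FORCE_IRQL;
    if (opts->low_resources)   flags |= VF_LOW_RESOURCES;
    if (opts->pool_tracking)   flags |= VF_POOL_TRACKING;
    if (opts->io_verification) flags |= VF_IO_VERIFICATION;
    return flags;
}
```

Under these assumed values, selecting only the special pool and I/O verification yields 0x11, and selecting everything yields 0x1F.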