Hi,
My apologies for the late reply; the forums have been going through updates today, and I had to update my Debugging Tools because the kernel debugger wasn't working.
Please note that this will be a fairly long post because *101 bugchecks are very complicated and I'd like to provide as much information as my knowledge permits.
Right, so as per usual, the attached DMP file is of the CLOCK_WATCHDOG_TIMEOUT (101) bugcheck.
BugCheck 101, {19, 0, fffff880017e5180, 6}
0x19 is the timeout interval in clock ticks. The debugger prints bugcheck parameters in hex, so that's 25 clock ticks in decimal.
fffff880017e5180 is the PRCB address of the hung processor; let's keep this address in mind.
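Because WinDbg prints the bugcheck parameters in hex, it's easy to misread them. As a minimal Python sketch (the function name decode_101 is hypothetical, and it assumes the one-line "BugCheck 101, {...}" format shown above), the parameters can be decoded like this:

```python
import re

def decode_101(line):
    """Decode a 'BugCheck 101, {p1, p2, p3, p4}' line. All values are hex."""
    m = re.match(r"BugCheck\s+([0-9a-fA-F]+),\s*\{([^}]*)\}", line.strip())
    if m is None:
        raise ValueError("not a BugCheck line")
    code = int(m.group(1), 16)
    params = [int(p.strip(), 16) for p in m.group(2).split(",")]
    if code != 0x101 or len(params) != 4:
        raise ValueError("expected CLOCK_WATCHDOG_TIMEOUT (0x101) with 4 parameters")
    return {
        "timeout_ticks": params[0],   # clock interrupt time-out interval, in ticks
        "prcb": params[2],            # PRCB address of the unresponsive processor
        "processor_hint": params[3],  # often (not always) the hung processor's index
    }

info = decode_101("BugCheck 101, {19, 0, fffff880017e5180, 6}")
print(info["timeout_ticks"])   # 25
print(hex(info["prcb"]))       # 0xfffff880017e5180
print(info["processor_hint"])  # 6
```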
Running a !prcb on processor 0:
0: kd> !prcb 0
PRCB for Processor 0 at fffff8007377a180:
Current IRQL -- 13
Threads-- Current fffff800737d4880 Next 0000000000000000 Idle fffff800737d4880
Processor Index 0 Number (0, 0) GroupSetMember 1
Interrupt Count -- 05b0f044
Times -- Dpc 00000742 Interrupt 0000020e
Kernel 000e923f User 0001143e
No match for the address, so let's try processor 1 this time:
0: kd> !prcb 1
PRCB for Processor 1 at fffff880009bf180:
Current IRQL -- 0
Threads-- Current fffffa8013675040 Next 0000000000000000 Idle fffff880009caf40
Processor Index 1 Number (0, 1) GroupSetMember 2
Interrupt Count -- 05a1e94a
Times -- Dpc 00000004 Interrupt 00000047
Kernel 000f2ca2 User 000079cc
Nope, no match either. I'll spare you the space in the post and tell you that processor #6 is the one we're looking for :+)
0: kd> !prcb 6
PRCB for Processor 6 at fffff880017e5180:
Current IRQL -- 0
Threads-- Current fffff880017f0f40 Next fffffa8010eb1b00 Idle fffff880017f0f40
Processor Index 6 Number (0, 6) GroupSetMember 40
Interrupt Count -- 06019b1f
Times -- Dpc 000017ef Interrupt 000003d8
Kernel 000ed494 User 0000cf3c
For reference, I did not run !prcb on processors 0 through 6 one by one; that would have been very tedious. Instead, you can run the !running -it command. The "i" argument causes it to display idle processors too, and "t" displays the stack trace for the thread running on each processor.
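!running -it is the easy route. If you have already saved the !prcb output to a text file, though, the matching step can be sketched in Python. This is a hypothetical helper, assuming the "PRCB for Processor N at ADDRESS:" format shown above; it collapses whitespace first because the address can wrap onto the next line.

```python
import re

PRCB_RE = re.compile(r"PRCB for Processor (\d+) at\s+([0-9a-fA-F`]+):")

def find_hung_processor(prcb_dump, target_prcb):
    """Return the processor whose PRCB address equals target_prcb, else None."""
    text = " ".join(prcb_dump.split())  # undo any line wrapping
    for number, address in PRCB_RE.findall(text):
        # Addresses may be printed with a backtick separator (fffff880`017e5180).
        if int(address.replace("`", ""), 16) == target_prcb:
            return int(number)
    return None

dump = """
PRCB for Processor 0 at fffff8007377a180:
PRCB for Processor 1 at fffff880009bf180:
PRCB for Processor 6 at
fffff880017e5180:
"""
print(find_hung_processor(dump, 0xfffff880017e5180))  # 6
```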
Hint: At times, the 4th parameter of the bugcheck will show you the responsible processor. For example, in your *101 here it was correct, as the 4th parameter was 6.
As processor #6's PRCB address matches the 3rd parameter of the bugcheck, processor #6 is the responsible processor. With the information we have so far, we know that processor #6 went 0x19 (25) clock ticks without responding, therefore the system crashed. Before we go further, what is a clock tick? A clock tick is one interval of the system clock: a hardware timer periodically raises a clock interrupt, which Windows uses to keep time and to keep all of the processors in sync. The clock interrupt is handed out to all processors and they must then report in; when one doesn't report in within the allotted interval (the 1st parameter of the bugcheck), you crash.
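To make the watchdog idea concrete, here's a simplified Python model. It is not the real kernel logic, just an assumption-labeled toy: every tick is broadcast, each responsive processor records the last tick it acknowledged, and the watchdog reports the first processor that falls behind by the timeout interval (0x19 ticks, as in this dump).

```python
def run_watchdog(num_cpus, hung_cpu, timeout_ticks, max_ticks=1000):
    """Toy clock watchdog. Returns (tick, cpu) for the 'bugcheck', or None."""
    last_ack = [0] * num_cpus
    for tick in range(1, max_ticks + 1):
        # Broadcast the clock tick; the hung processor never reports in.
        for cpu in range(num_cpus):
            if cpu != hung_cpu:
                last_ack[cpu] = tick
        # Watchdog check: has any processor missed too many ticks?
        for cpu in range(num_cpus):
            if tick - last_ack[cpu] >= timeout_ticks:
                return tick, cpu
    return None

print(run_watchdog(num_cpus=8, hung_cpu=6, timeout_ticks=0x19))  # (25, 6)
```

With hung_cpu=None every processor reports in and the function simply returns None.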
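If we look specifically at processor #6 in the !running -it output, we can see it was doing... well... nothing. Its current thread is its idle thread (fffff880017f0f40), and there is no stack to speak of: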
6 fffff880017e5180 fffff880017f0f40 ( 0) fffffa8010eb1b00 (15) fffff880017f0f40 ................
Child-SP RetAddr Call Site
00000000`00000000 00000000`00000000 0x0
Now how and why did this take place? First, let's check the IRQL of each one of the processors before the system crash:
0: kd> !irql 0
Debugger saved IRQL for processor 0x0 -- 13
0: kd> !irql 1
Debugger saved IRQL for processor 0x1 -- 0 (LOW_LEVEL)
0: kd> !irql 2
Debugger saved IRQL for processor 0x2 -- 0 (LOW_LEVEL)
0: kd> !irql 3
Debugger saved IRQL for processor 0x3 -- 0 (LOW_LEVEL)
0: kd> !irql 4
Debugger saved IRQL for processor 0x4 -- 0 (LOW_LEVEL)
0: kd> !irql 5
Debugger saved IRQL for processor 0x5 -- 0 (LOW_LEVEL)
0: kd> !irql 6
Debugger saved IRQL for processor 0x6 -- 0 (LOW_LEVEL)
As you can see, the IRQL of processor 0 is 13 (which is CLOCK_LEVEL on x64) and the rest are all 0, so only processor 0 was at CLOCK level.
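If you want to scan a long !irql listing quickly, a small Python sketch can flag the processors sitting above DISPATCH_LEVEL and name their IRQL. The name table follows the AMD64 values in the WDK's wdm.h (device IRQLs 3 to 12 are lumped together here), and the sketch assumes the IRQL after "--" is printed in decimal, which is how it's read above; the helper names are hypothetical.

```python
import re

# x64 IRQL values; several names share a value on x64.
X64_IRQL_NAMES = {
    0: "PASSIVE_LEVEL/LOW_LEVEL",
    1: "APC_LEVEL",
    2: "DISPATCH_LEVEL",
    13: "CLOCK_LEVEL",
    14: "IPI_LEVEL/POWER_LEVEL",
    15: "PROFILE_LEVEL/HIGH_LEVEL",
}

def irql_name(irql):
    if irql in X64_IRQL_NAMES:
        return X64_IRQL_NAMES[irql]
    if 3 <= irql <= 12:
        return "device IRQL"
    raise ValueError(f"invalid x64 IRQL: {irql}")

def elevated_processors(irql_output, threshold=2):
    """Return {cpu: irql} for processors whose saved IRQL is above threshold."""
    result = {}
    for cpu, irql in re.findall(r"processor 0x([0-9a-fA-F]+) -- (\d+)", irql_output):
        if int(irql) > threshold:
            result[int(cpu, 16)] = int(irql)
    return result

sample = """Debugger saved IRQL for processor 0x0 -- 13
Debugger saved IRQL for processor 0x1 -- 0 (LOW_LEVEL)
Debugger saved IRQL for processor 0x6 -- 0 (LOW_LEVEL)"""
for cpu, irql in elevated_processors(sample).items():
    print(cpu, irql, irql_name(irql))  # 0 13 CLOCK_LEVEL
```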
Now that we have the IRQL, let's look at the call stack of the different processors for more info. Let's start with Processor 0 (warning, it's large):
Child-SP RetAddr Call Site
fffff800`72396878 fffff800`7365beee nt!KeBugCheckEx
fffff800`72396880 fffff800`73520774 nt! ?? ::FNODOBFM::`string'+0x14543
fffff800`72396900 fffff800`73438eca nt!KeUpdateTime+0x2ec
fffff800`72396ae0 fffff800`734d573a hal!HalpTimerClockInterrupt+0x86
fffff800`72396b10 fffff800`73507fe9 nt!KiInterruptDispatchNoLockNoEtw+0x1aa
fffff800`72396ca0 fffff800`7353708c nt!KeFlushMultipleRangeTb+0x290
fffff800`72396ea0 fffff800`7360ec08 nt!MiFlushPteList+0x2c
fffff800`72396ed0 fffff800`736f47e9 nt!MmFreeSpecialPool+0x2ec
fffff800`72397010 fffff880`01a73b7b nt!ExFreePool+0x6d8
fffff800`723970f0 fffff880`01b701eb ndis!NdisFreeCloneNetBufferList+0x6b
fffff800`72397140 fffff880`01ca1ff6 NETIO!NetioDereferenceNetBufferList+0xcb
fffff800`723971d0 fffff880`01caa115 tcpip!WfpProcessInTransportStackIndication+0xabb
fffff800`723977e0 fffff880`01ca1198 tcpip!InetInspectReceiveDatagram+0x255
fffff800`72397900 fffff880`01c9fd4b tcpip!UdpBeginMessageIndication+0x78
fffff800`72397a50 fffff880`01c9f67e tcpip!UdpDeliverDatagrams+0x18b
fffff800`72397be0 fffff880`01c9c082 tcpip!UdpReceiveDatagrams+0x1a4
fffff800`72397cf0 fffff880`01c9c338 tcpip!IppDeliverListToProtocol+0xf2
fffff800`72397da0 fffff880`01ca03bb tcpip!IppProcessDeliverList+0x68
fffff800`72397e50 fffff880`01c9de11 tcpip!IppReceiveHeaderBatch+0x21b
fffff800`72397f80 fffff880`01c9f253 tcpip!IpFlcReceivePackets+0x641
fffff800`723981b0 fffff880`01caa2d9 tcpip!FlpReceiveNonPreValidatedNetBufferListChain+0x2ce
fffff800`72398280 fffff800`735319a6 tcpip!FlReceiveNetBufferListChainCalloutRoutine+0x119
fffff800`72398380 fffff800`73534405 nt!KeExpandKernelStackAndCalloutInternal+0xe6
fffff800`72398480 fffff880`01caa3ce nt!KeExpandKernelStackAndCalloutEx+0x25
fffff800`723984c0 fffff880`01a72b06 tcpip!FlReceiveNetBufferListChain+0xae
fffff800`72398540 fffff880`01a72560 ndis!ndisMIndicateNetBufferListsToOpen+0x126
fffff800`723985f0 fffff880`01a72843 ndis!ndisInvokeNextReceiveHandler+0x650
fffff800`723986c0 fffff880`050c23c9 ndis!NdisMIndicateReceiveNetBufferLists+0xd3
fffff800`72398770 fffff880`050b1a48 Rt630x64!MpHandleRecvIntPriVLanJumbo+0xb0d
fffff800`72398960 fffff880`01a732ff Rt630x64!MPHandleMessageInterrupt+0x35c
fffff800`723989d0 fffff880`01a7341c ndis!ndisMiniportDpc+0xff
fffff800`72398a60 fffff800`73504ca1 ndis!ndisInterruptDpc+0x9c
fffff800`72398af0 fffff800`735048e0 nt!KiExecuteAllDpcs+0x191
fffff800`72398c30 fffff800`735059ba nt!KiRetireDpcList+0xd0
fffff800`72398da0 00000000`00000000 nt!KiIdleLoop+0x5a
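^^ The important frames here are nt!KiIdleLoop, nt!KiRetireDpcList, the ndis and Rt630x64 DPC frames, hal!HalpTimerClockInterrupt, nt!KeUpdateTime and nt!KeBugCheckEx.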
Processors 0, 2, 3, and 4 all started with the idle loop routine (nt!KiIdleLoop), which is basically the System Idle Process you see in Task Manager. Essentially, all of these processors were sitting and waiting for something to do.
Reading the stack from the bottom up, we can see that Processor 0 went from:
nt!KiIdleLoop+0x5a - Waiting to do something.
to
nt!KiRetireDpcList+0xd0 - Function that sits in a loop dequeuing DPCs from the current processor's DPC queue and calling their callbacks. I will explain DPCs below.
ndis!ndisInterruptDpc+0x9c / Rt630x64!MPHandleMessageInterrupt+0x35c - One of the DPCs being run was the network adapter's, which handed received packets up through ndis and tcpip.
hal!HalpTimerClockInterrupt+0x86 - We then eventually see that Processor 0, still inside that DPC, received an interrupt. This interrupt happened to be a clock interrupt.
nt!KeUpdateTime+0x2ec - The clock interrupt then involved updating the system time. This is something that is replicated across all processors so that all the processors update their own timers and things are kept track of. Remember, everything needs to be in sync!
nt!KeBugCheckEx - Finally, we see that Processor 0 was the processor that performed the bugcheck.
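Since the debugger lists the most recent call first, it can help to flip a stack into chronological order. A minimal Python sketch (the helper name is hypothetical; it assumes the "Child-SP RetAddr Call Site" layout above), using a few of Processor 0's frames:

```python
def chronological_frames(stack_text):
    """Return call sites oldest-first from 'Child-SP RetAddr Call Site' lines."""
    frames = []
    for line in stack_text.strip().splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] == "Child-SP":
            continue  # skip the header and anything that isn't a frame
        frames.append(" ".join(parts[2:]))  # call site may contain spaces
    return list(reversed(frames))

stack = """fffff800`72396878 fffff800`7365beee nt!KeBugCheckEx
fffff800`72396900 fffff800`73438eca nt!KeUpdateTime+0x2ec
fffff800`72396ae0 fffff800`734d573a hal!HalpTimerClockInterrupt+0x86
fffff800`72398c30 fffff800`735059ba nt!KiRetireDpcList+0xd0
fffff800`72398da0 00000000`00000000 nt!KiIdleLoop+0x5a"""
for name in chronological_frames(stack):
    print(name)  # nt!KiIdleLoop+0x5a first, nt!KeBugCheckEx last
```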
----------------------------------------------------------------------------------------------------------
What is a DPC? It's a Deferred Procedure Call, a Microsoft Windows mechanism that allows high-priority tasks (e.g. an interrupt handler) to defer required but lower-priority tasks for later execution. This permits device drivers and other low-level event consumers to perform the high-priority part of their processing quickly, and schedule non-critical additional processing for execution at a lower IRQL.
DPCs are implemented by DPC objects, which are created and initialized when a device driver or some other kernel-mode program requests a DPC. The DPC request is then added to the end of a DPC queue. Each processor has its own DPC queue.
DPCs have three priority (importance) levels: low, medium and high. By default, all DPCs are set to medium. When a processor's IRQL drops to Dispatch/DPC level, Windows checks the DPC queue for any pending DPCs and executes them until the queue is empty or some other interrupt with a higher IRQL occurs.
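Here's a toy Python model of a single processor's DPC queue, to show the queue-then-drain behaviour described above. It's a sketch, not kernel code: it follows the documented rule that a high-importance DPC is placed at the head of the queue while medium (the default) and low go to the tail, and it ignores everything else (DPCs targeted at other processors, when the drain is requested, and so on).

```python
from collections import deque

class DpcQueue:
    """Simplified per-processor DPC queue."""

    def __init__(self):
        self.queue = deque()

    def insert(self, routine, importance="medium"):
        if importance == "high":
            self.queue.appendleft(routine)  # high importance: head of the queue
        else:
            self.queue.append(routine)      # medium/low: tail of the queue

    def retire(self):
        """Drain the queue, as happens once IRQL drops to DISPATCH_LEVEL."""
        ran = []
        while self.queue:
            routine = self.queue.popleft()
            ran.append(routine())
        return ran

q = DpcQueue()
q.insert(lambda: "nic")  # e.g. a network adapter's DPC
q.insert(lambda: "timer")
q.insert(lambda: "urgent", importance="high")
print(q.retire())  # ['urgent', 'nic', 'timer']
```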
For example, when the clock interrupt is generated, the clock interrupt handler generally increments the run-time counter of the current thread to calculate that thread's total execution time, and decrements its remaining quantum by 1. When the quantum drops to zero, the thread scheduler has to be invoked to choose the next thread to run on that processor, and the dispatcher has to perform a context switch. Since the clock interrupt runs at a much higher IRQL, it is desirable to perform this thread dispatching, which is a less critical task, later, when the processor's IRQL drops. So the clock interrupt handler requests a DPC and adds it to the end of the DPC queue, and that DPC performs the dispatching when the processor's IRQL drops to DPC/Dispatch level.
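The quantum example can be sketched the same way. This Python toy follows the simplified description above (one quantum unit per tick); the Processor class and its methods are hypothetical. The point it shows is that the clock handler only queues the dispatch work, and the "context switch" doesn't happen until the IRQL is lowered.

```python
from collections import deque

class Processor:
    """Toy model: the clock interrupt defers thread dispatching to a DPC."""

    def __init__(self, quantum):
        self.initial_quantum = quantum
        self.quantum = quantum
        self.run_time = 0
        self.dpc_queue = deque()
        self.switches = 0

    def clock_interrupt(self):
        self.run_time += 1   # charge the current thread for this tick
        self.quantum -= 1    # one unit per tick, as described above
        if self.quantum == 0:
            self.dpc_queue.append(self.dispatch)  # defer, don't switch now

    def dispatch(self):
        self.switches += 1   # stand-in for picking a thread + context switch
        self.quantum = self.initial_quantum

    def lower_irql_to_dispatch(self):
        while self.dpc_queue:
            self.dpc_queue.popleft()()

cpu = Processor(quantum=3)
for _ in range(3):
    cpu.clock_interrupt()
print(cpu.switches, len(cpu.dpc_queue))  # 0 1  (switch still pending)
cpu.lower_irql_to_dispatch()
print(cpu.switches, cpu.quantum)         # 1 3
```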
----------------------------------------------------------------------------------------------------------
Now, we can see that the specific driver whose DPC was running is Rt630x64.sys, which is the Realtek PCI/PCIe Adapters driver.
So, that definitely gives us somewhere to start. Now, let's go further!
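Picking the third-party driver out of a long stack is mostly a matter of looking at the module part of each "module!symbol+offset" call site. A minimal Python sketch; the helper is hypothetical, and INBOX_MODULES is an assumption that only lists the Windows modules appearing in this thread's stacks, not every inbox driver:

```python
INBOX_MODULES = {"nt", "hal", "ndis", "NETIO", "tcpip", "dxgkrnl"}

def third_party_modules(call_sites):
    """Return non-inbox module names, in first-seen order."""
    found = []
    for site in call_sites:
        # "module!symbol+offset" or "module+offset" -> "module"
        module = site.split("!", 1)[0].split("+", 1)[0]
        if module.startswith("0x"):
            continue  # raw addresses such as 0x0 have no module at all
        if module not in INBOX_MODULES and module not in found:
            found.append(module)
    return found

sites = [
    "nt!KeBugCheckEx",
    "hal!HalpTimerClockInterrupt+0x86",
    "tcpip!UdpDeliverDatagrams+0x18b",
    "ndis!NdisMIndicateReceiveNetBufferLists+0xd3",
    "Rt630x64!MpHandleRecvIntPriVLanJumbo+0xb0d",
    "Rt630x64!MPHandleMessageInterrupt+0x35c",
    "ndis!ndisInterruptDpc+0x9c",
    "nt!KiIdleLoop+0x5a",
]
print(third_party_modules(sites))  # ['Rt630x64']
```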
If we look at the call stack from Processor 5:
Child-SP RetAddr Call Site
fffff880`0e021360 fffff800`7353708c nt!KeFlushMultipleRangeTb+0x2a6
fffff880`0e021560 fffff800`7350aad0 nt!MiFlushPteList+0x2c
fffff880`0e021590 fffff800`735a3cdb nt!MiFreeWsleList+0x386
fffff880`0e0217b0 fffff800`735a3b83 nt!MiEmptyWorkingSetHelper+0xe7
fffff880`0e0217e0 fffff800`7360c0ce nt!MiEmptyWorkingSet+0xcb
fffff880`0e021890 fffff800`73ac313a nt!MiTrimAllSystemPagableMemory+0x266
fffff880`0e0218e0 fffff800`73ad61db nt!MmVerifierTrimMemory+0xca
fffff880`0e021910 fffff800`73ad583a nt!ViKeRaiseIrqlSanityChecks+0xdb
*** ERROR: Symbol file could not be found. Defaulted to export symbols for nvlddmkm.sys -
fffff880`0e021950 fffff880`045672b4 nt!VerifierKeAcquireInStackQueuedSpinLock+0xa6
fffff880`0e021990 fffff880`045bcd59 nvlddmkm+0x852b4
fffff880`0e0219e0 fffff880`04d432da nvlddmkm+0xdad59
fffff880`0e021f90 fffff880`03ce43fa nvlddmkm!nvDumpConfig+0x2396b2
fffff880`0e022040 fffff880`03ce30aa dxgkrnl!DXGCONTEXT::Render+0x41a
fffff880`0e022930 fffff800`734db453 dxgkrnl!DxgkRender+0x26a
fffff880`0e022c40 000007f9`4214118a nt!KiSystemServiceCopyEnd+0x13
000000a6`7a21df18 00000000`00000000 0x000007f9`4214118a
Reading from the bottom up, we can see two DirectX Kernel (dxgkrnl) routine calls and then calls into nvlddmkm.sys, which is the nVidia video driver. So, let's put this all together now:
- Realtek PCI/PCIe Adapters driver in the stack
- DirectX Kernel in the stack
- nVidia video driver in the stack
From this, we can say:
1. Possible corrupt / buggy video card drivers:
Ensure you have the latest video card drivers. If you are already on the latest video card drivers, uninstall them and install a version or a few versions behind the latest to ensure it's not a latest-driver-only issue. If you have already experimented with the latest video card driver and many previous versions, please give the beta driver for your card a try.
-- It's also possible that another device driver is corrupting the video card drivers, etc. As you mentioned this started happening right around the time of Silicon's installation, it wouldn't hurt to uninstall that software temporarily.
2. Faulty video card or if integrated video faulty motherboard.
3. Faulty RAM, often a culprit in regards to DirectX kernel and MMS crashes. Run a Memtest for NO LESS than ~8 passes (several hours):
Memtest86+:
Download Memtest86+ here:
http://www.memtest.org/
Which should I download?
You can either download the pre-compiled ISO that you would burn to a CD and then boot from the CD, or you can download the auto-installer for the USB key. What this will do is format your USB drive, make it a bootable device, and then install the necessary files. Both do the same job; it's just up to you which you choose, or which you have available (whether it's CD or USB).
How Memtest works:
Memtest86 writes a series of test patterns to most memory addresses, reads back the data written, and compares it for errors.
The default pass does 9 different tests, varying in access patterns and test data. A tenth test, bit fade, is selectable from the menu. It writes all memory with zeroes, then sleeps for 90 minutes before checking to see if bits have changed (perhaps because of refresh problems). This is repeated with all ones for a total time of 3 hours per pass.
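The write, read back, and compare idea can be sketched in Python against a simulated memory array. This is a toy, not Memtest86's actual algorithms (the real tests use address-dependent patterns such as moving inversions); the FaultyMemory class, its stuck bit, and the four fixed patterns are hypothetical, chosen only to show how a bad cell gets caught.

```python
import time

class FaultyMemory:
    """Simulated RAM with an optional bit stuck at 1 at one address."""

    def __init__(self, size, stuck_addr=None, stuck_mask=0):
        self.cells = bytearray(size)
        self.stuck_addr = stuck_addr
        self.stuck_mask = stuck_mask  # bits forced to 1 at stuck_addr

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.stuck_addr:
            value |= self.stuck_mask
        return value

def pattern_pass(mem, size, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern everywhere, read it back, collect mismatches."""
    errors = []
    for pattern in patterns:
        for addr in range(size):
            mem.write(addr, pattern)
        for addr in range(size):
            got = mem.read(addr)
            if got != pattern:
                errors.append((addr, pattern, got))
    return errors

def bit_fade(mem, size, wait_seconds=90 * 60):
    """Fill with zeroes, wait, check; then the same with all ones."""
    errors = []
    for pattern in (0x00, 0xFF):
        for addr in range(size):
            mem.write(addr, pattern)
        time.sleep(wait_seconds)  # Memtest86 waits 90 minutes here
        for addr in range(size):
            got = mem.read(addr)
            if got != pattern:
                errors.append((addr, pattern, got))
    return errors

good = FaultyMemory(1024)
bad = FaultyMemory(1024, stuck_addr=300, stuck_mask=0x04)
print(pattern_pass(good, 1024))  # []
print(pattern_pass(bad, 1024))   # [(300, 0, 4), (300, 170, 174)]
```

Note how the stuck bit only shows up with patterns that have that bit cleared, which is why a real tester cycles through many patterns and passes.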
Many chipsets can report RAM speeds and timings via SPD (Serial Presence Detect) or EPP (Enhanced Performance Profiles), and some even support changing the expected memory speed. If the expected memory speed is overclocked, Memtest86 can test that memory performance is error-free with these faster settings.
Some hardware is able to report the "PAT status" (PAT: enabled or PAT: disabled). This is a reference to Intel Performance acceleration technology; there may be BIOS settings which affect this aspect of memory timing.
This information, if available to the program, can be displayed via a menu option.
Any other questions can most likely be answered by reading this great guide:
http://forum.canardpc.com/threads/28864-FAQ-please-read-before-posting
Regards,
Patrick