BSOD from Idle process

My PC, previously stable, has recently started giving me a BSOD. Does anybody have any advice on how to track down what is causing it?

I'm new to analyzing minidumps, so feel free to correct me. From the analyzed minidump (attached at the end of this post), I'm guessing the problem might be one of my drivers, but I have no idea how to narrow it down further than the Idle process.

Here is some random background info in no particular order that may or may not provide additional info:
BSODs are random: I've seen two in one evening, and I've also gone weeks without noticing one (using the PC a few hours per day on average).
The BSOD cannot be triggered on command. It has happened while using Google Voice (and before that was installed), while using Firefox (which may be meaningless, since Firefox is in almost constant use), and while the PC was just sitting idle (possibly with Firefox still running unused in the background; I forget).
PC has never been overclocked.
Windows and drivers have been reinstalled from scratch more than once. The BIOS has been reflashed and set appropriately as well, from one of two possibilities (I neglected to record which one I used, and will be trying the other one too). The BSOD persists.
Memtest86+ shows no memory errors.
The installed applications and the updates to Windows, browsers, plugins, etc. change constantly, and the BIOS was reflashed and everything reinstalled perhaps a month or two before the first BSOD. Beyond that, nothing jumps out as an obvious major change immediately prior to the first BSOD, and everything came from backups that hopefully remain as they were before I noticed the first crash anyway.

Some systems specs, let me know if anything else is needed:
Microsoft Windows XP Pro SP3 + updates, 32-bit
AMD Athlon 64 X2 Dual Core 4200+
4GB RAM installed
Biostar GeForce 6100-M9 motherboard
Enermax Liberty ELT400AWT power supply

Thanks in advance for any help anybody can provide!

Here's the analyzed minidump:
-----------------------------------------------------------------------------------

Microsoft (R) Windows Debugger Version 6.12.0002.633 X86
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\WINDOWS\Minidump\Mini092710-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows XP Kernel Version 2600 (Service Pack 3) MP (2 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 2600.xpsp_sp3_gdr.100427-1636
Machine Name:
Kernel base = 0xe0ba3000 PsLoadedModuleList = 0xe0c29720
Debug session time: Mon Sep 27 18:58:48.968 2010 (UTC - 7:00)
System Uptime: 0 days 1:09:15.659
Loading Kernel Symbols
...............................................................
...............................................
Loading User Symbols
Loading unloaded module list
...............
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 9C, {4, e0c1a5f0, b2000010, 10c0f}

Probably caused by : Unknown_Image ( ANALYSIS_INCONCLUSIVE )

Followup: MachineOwner
---------

0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

MACHINE_CHECK_EXCEPTION (9c)
A fatal Machine Check Exception has occurred.
KeBugCheckEx parameters;
    x86 Processors
        If the processor has ONLY MCE feature available (For example Intel
        Pentium), the parameters are:
        1 - Low 32 bits of P5_MC_TYPE MSR
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of P5_MC_ADDR MSR
        4 - Low 32 bits of P5_MC_ADDR MSR
        If the processor also has MCA feature available (For example Intel
        Pentium Pro), the parameters are:
        1 - Bank number
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
        4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
    IA64 Processors
        1 - Bugcheck Type
            1 - MCA_ASSERT
            2 - MCA_GET_STATEINFO
                SAL returned an error for SAL_GET_STATEINFO while processing MCA.
            3 - MCA_CLEAR_STATEINFO
                SAL returned an error for SAL_CLEAR_STATEINFO while processing MCA.
            4 - MCA_FATAL
                FW reported a fatal MCA.
            5 - MCA_NONFATAL
                SAL reported a recoverable MCA and we don't support currently
                support recovery or SAL generated an MCA and then couldn't
                produce an error record.
            0xB - INIT_ASSERT
            0xC - INIT_GET_STATEINFO
                SAL returned an error for SAL_GET_STATEINFO while processing INIT event.
            0xD - INIT_CLEAR_STATEINFO
                SAL returned an error for SAL_CLEAR_STATEINFO while processing INIT event.
            0xE - INIT_FATAL
                Not used.
        2 - Address of log
        3 - Size of log
        4 - Error code in the case of x_GET_STATEINFO or x_CLEAR_STATEINFO
    AMD64 Processors
        1 - Bank number
        2 - Address of MCA_EXCEPTION structure
        3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
        4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
Arguments:
Arg1: 00000004
Arg2: e0c1a5f0
Arg3: b2000010
Arg4: 00010c0f

Debugging Details:
------------------

NOTE: This is a hardware error. This error was reported by the CPU
via Interrupt 18. This analysis will provide more information about
the specific error. Please contact the manufacturer for additional
information about this error and troubleshooting assistance.

This error is documented in the following publication:

- Bios and Kernel Developers Guid for AMD Athlon(r) 64 and AMD Opteron(r) Processors
Bit Mask:

    MA                           Model Specific        MCA
 O  ID     Other Information       Error Code      Error Code
VV  SDP ___________|____________ _______|_______ _______|______
AEUECRC|                        |               |              |
LRCNVVC|                        |               |              |
^^^^^^^|                        |               |              |
   6         5         4         3         2         1
3210987654321098765432109876543210987654321098765432109876543210
----------------------------------------------------------------
1011001000000000000000000001000000000000000000010000110000001111


VAL - MCi_STATUS register is valid
Indicates that the information contained within the IA32_MCi_STATUS
register is valid. When this flag is set, the processor follows the
rules given for the OVER flag in the IA32_MCi_STATUS register when
overwriting previously valid entries. The processor sets the VAL
flag and software is responsible for clearing it.

UC - Error Uncorrected
Indicates that the processor did not or was not able to correct the
error condition. When clear, this flag indicates that the processor
was able to correct the error condition.

EN - Error Enabled
Indicates that the error was enabled by the associated EEj bit of the
IA32_MCi_CTL register.

PCC - Processor Context Corrupt
Indicates that the state of the processor might have been corrupted
by the error condition detected and that reliable restarting of the
processor may not be possible.

BUSCONNERR - Bus and Interconnect Error BUS{LL}_{PP}_{RRRR}_{II}_{T}_err
These errors match the format 0000 1PPT RRRR IILL



Concatenated Error Code:
--------------------------
_VAL_UC_EN_PCC_BUSCONNERR_F

This error code can be reported back to the manufacturer.
They may be able to provide additional information based upon
this error. All questions regarding STOP 0x9C should be
directed to the hardware manufacturer.

BUGCHECK_STR: 0x9C_AuthenticAMD

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: DRIVER_FAULT

PROCESS_NAME: Idle

LAST_CONTROL_TRANSFER: from e0b87bfb to e0bc5f43

STACK_TEXT:
e0c1a5c8 e0b87bfb 0000009c 00000004 e0c1a5f0 nt!KeBugCheckEx+0x1b
e0c1a6f4 e0b82c52 e0042000 00000000 00000000 hal!HalpMcaExceptionHandler+0xdd
e0c1a6f4 00000000 e0042000 00000000 00000000 hal!HalpMcaExceptionHandlerWrapper+0x4a


STACK_COMMAND: kb

SYMBOL_NAME: ANALYSIS_INCONCLUSIVE

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: Unknown_Module

IMAGE_NAME: Unknown_Image

DEBUG_FLR_IMAGE_TIMESTAMP: 0

FAILURE_BUCKET_ID: 0x9C_AuthenticAMD_ANALYSIS_INCONCLUSIVE

BUCKET_ID: 0x9C_AuthenticAMD_ANALYSIS_INCONCLUSIVE

Followup: MachineOwner
---------
Answer

I looked at your dump files and they show the same thing you already saw when you looked at them:

NOTE: This is a hardware error. This error was reported by the CPU
via Interrupt 18. This analysis will provide more information about
the specific error. Please contact the manufacturer for additional
information about this error and troubleshooting assistance.

This error is documented in the following publication:

- Bios and Kernel Developers Guid for AMD Athlon(r) 64 and AMD Opteron(r) Processors
Bit Mask:

(other stuff)

BUSCONNERR - Bus and Interconnect Error BUS{LL}_{PP}_{RRRR}_{II}_{T}_err
These errors match the format 0000 1PPT RRRR IILL

I don't have that in my notes, and I Googled around a bit but could not find an exact cause or solution, just a lot of guessing and trying things that might work. The suggested causes include a defective or failing CPU, overheating, overclocking, and a bad connection, but some hardware issue does seem to be associated with error 9C. Looking back at your original message, it looks like you have done some of that searching too.
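For what it's worth, the flags WinDbg lists can be reproduced by hand from Arg3 and Arg4 of the bugcheck. Here is a minimal Python sketch, assuming the flag bit positions shown in the Bit Mask diagram of the dump and the 0000 1PPT RRRR IILL layout quoted above. The function name and the returned dictionary are my own invention, not part of any debugger API. It only splits the value into its fields; it does not tell you which component is at fault.

```python
# Hypothetical helper for illustration only; not a WinDbg or Windows API.

# Flag bits at the top of the 64-bit MCi_STATUS value, as labelled in the
# "Bit Mask" diagram printed by !analyze -v.
FLAG_BITS = {
    "VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
    "MISCV": 59, "ADDRV": 58, "PCC": 57,
}


def decode_mci_status(high, low):
    """Split MCi_STATUS, given as bugcheck 0x9C Arg3 (high) and Arg4 (low)."""
    status = (high << 32) | low
    mca_code = status & 0xFFFF             # bits 15..0: MCA error code
    model_code = (status >> 16) & 0xFFFF   # bits 31..16: model specific code
    result = {
        "flags": [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1],
        "model_code": model_code,
        "mca_code": mca_code,
        "bus": None,
    }
    # Bus and interconnect errors match the pattern 0000 1PPT RRRR IILL.
    if (mca_code & 0xF800) == 0x0800:
        result["bus"] = {
            "PP": (mca_code >> 9) & 0x3,
            "T": (mca_code >> 8) & 0x1,
            "RRRR": (mca_code >> 4) & 0xF,
            "II": (mca_code >> 2) & 0x3,
            "LL": mca_code & 0x3,
        }
    return result


if __name__ == "__main__":
    # Arg3 and Arg4 from the dump in the original post.
    print(decode_mci_status(0xB2000010, 0x00010C0F))
```

With the values from your dump this gives the flags VAL, UC, EN and PCC, which agrees with the _VAL_UC_EN_PCC_BUSCONNERR string in the analysis. What the individual PP, T, RRRR, II and LL values mean is in the AMD guide the debugger mentions.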

When I put a heat gun on my CPU to make it overheat, there is no BSOD, because the CPU just quits or my BIOS shuts things down at a temperature I set. XP would have to keep running in order to display a BSOD, so I never suspect overheating when I see a BSOD. I have overheated systems on purpose enough times to know that.

If it were me, I would undo every connection inside the system that I could undo without too much force, and reconnect them one at a time so nothing gets mixed up. Maybe it is a marginal connection. You could also replace the thermal paste under your CPU cooler and/or replace the CPU fan entirely; some people report that helps if overheating is suspected.

Think hardware...

Replacing hal.dll is certainly possible, but there are 7 possibilities on the XP installation CD. During XP installation, the correct one is determined, expanded, and installed as hal.dll, and it must match the hardware on your motherboard. (This is why "replace your hal.dll" suggestions make me want to puke.)

I know my hal.dll is the halaacpi.dl_ from the i386 folder on the installation CD because I looked at them all, expanded them all and compared the results to what was currently installed and I know works. It was obvious which one belonged to my system.

You could do that too - figure out what the right one is and (make a copy of your current one first) replace it from Recovery Console.  If you get the wrong one, you will know it right away and then you will have 6 more chances.  It should not be trial and error though - it is obvious when you look at them and there is only one that fits.
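If you would rather not compare the files by eye, here is a rough Python sketch of the same comparison. It assumes you have already expanded the candidate files from the CD into a folder of their own. The function name and the paths in the usage comment are made up for illustration. An empty result would not prove much either way, since an updated hal.dll (for example from a service pack) may no longer be byte-identical to the copy on the CD.

```python
import filecmp
from pathlib import Path


def matching_hal_candidates(installed_hal, candidates_dir):
    """Return names of files in candidates_dir byte-identical to installed_hal."""
    installed = Path(installed_hal)
    matches = []
    for candidate in sorted(Path(candidates_dir).iterdir()):
        # shallow=False forces a comparison of file contents, not just
        # size and modification time.
        if candidate.is_file() and filecmp.cmp(installed, candidate, shallow=False):
            matches.append(candidate.name)
    return matches


# Usage (hypothetical paths, adjust to your system):
#   matching_hal_candidates(r"C:\WINDOWS\system32\hal.dll", r"C:\hal_candidates")
```

This only reads the files; it does not replace anything.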

So, I regret that I don't have any good ideas other than the ones you probably already read. 

For some reason, I am not getting email alerts from these forums lately (maybe they don't like me), so I hope you will have some good discoveries and results.

 


Do, or do not. There is no try.

I need YOUR votes and points for helpful replies and Propose as Answers. I am saving up for a pony!

Last updated March 30, 2018