I recently got a new computer preloaded with Windows Vista. I must say that when I got the machine I was extremely surprised at how well it worked and how easy it was for me to transfer my data to the new machine (Lenovo T60p). With all of the comments I had been hearing which indicated that many people felt that the Vista operating system was problematic, I was rather surprised at how smooth my transition was.
Alas, this was not to last, and what is rather ironic is that I have been subject to a number of issues in the past month that have been rather disruptive and taken some time to remedy and yet are not related specifically to any flaw in the operating system.
The first sign of trouble was when I shut the computer down after doing some work and found it would not start the next morning. I believe a file was (or files were) corrupted somehow on shutdown (although I do not know for sure), and I was only able to get back to an operable state by restoring the OS from a recent Rescue and Recovery backup, which I make regularly. One nice part about this option is that one can restore the OS to the state in the backup without touching the state of user files. In other words, fix only the OS and leave my files alone. This worked like a charm, so I was pleased to see that the recovery tools worked as they should. But I was not yet through my ordeal. This was only the beginning of a flurry of further system errors.
The next thing that happened is that I started to receive visits from the Blue Screen of Death. Now, I am an intense computer user, so when this happened, I naturally assumed it was because I was doing too many things, e.g. playing music, compiling code, having too many files open, etc. Yet when this continued to occur even at times of minimal activity and began to interfere with system backups, I knew that I really needed to get to the bottom of it.
The next issue occurred, rather ironically, when I was trying to fix this problem and ran the Lenovo system update tool. When I ran the update, the tool informed me of a “critical BIOS update”. Upon reading about this update, I found it interesting that the language Lenovo used to describe what the update was addressing was “a possible processor marginality” and “a potential source of unpredictable system behavior”. Further research on the Microsoft site was more telling: “You may receive a Stop error, or you may experience unpredictable system behavior”. Oh yes, I thought I had found the issue and so proceeded to apply this update. But something odd happened when the update was complete. The computer would not start and the BIOS screen did not come up. I knew immediately that the BIOS was either erased or corrupted, and so contacted Lenovo support, who promptly sent out a technician to replace the system board.
Phew! Well, that’s over, I thought. But it wasn’t. Now, not only was I still getting blue screens, but since they had replaced the system board, all sorts of additional ramifications emerged:
- I had to re-activate Windows by calling Microsoft;
- I found that the technician had not entered the machine type or serial number into my machine, making it impossible to use certain ThinkVantage software such as System Update;
- I had previously been using the security chip on the old system board to secure the computer – now the Client Security Solution was caught in a loop and I had to uninstall and reinstall the entire subsystem;
- The antivirus program I had installed no longer worked and required reinstallation.
Then, when the technician came out to update the system board, he proceeded to tell me that he could not put the serial number in the computer because this was not supported under Windows Vista.
Surprisingly, I found that I did not get upset as I was presented with this continual assault on my computing platform. I just decided I would fix it all, piece by piece.
I called Lenovo and asked them about what the technician had told me. They informed me that he was incorrect and arranged to send another technician, one who understood that the program that enters the serial number has to be booted from a floppy rather than double-clicked from within the operating system. (How passé! One would think that in this day and age we could at least boot off of USB flash memory for this kind of operation…)
Armed with my new system board, serial number now embedded, I proceeded to troubleshoot the ongoing bluescreens. One nice feature of Vista is that it keeps a log of all system failures (separate from the event log) and associated files. So all the memory dumps were there and available for me to analyze with the Windows debugger. My research showed that the issue was with the wireless driver for the Lenovo / Atheros chipset on my system. My hypothesis is that either the Lenovo system update or Microsoft update saw the hardware and provided the wrong update to this driver, as I was able to confirm that a system update happened on the 18th of June, which is both the date of the updated driver and the date that the bluescreen problem started. I was able to resolve the issue by removing the driver and then reinstalling it from a fresh download package.
I then noticed that when I tried to back up the computer using the new Microsoft backup program, I received an informative message: “An error occurred. The following information may help you to resolve the problem: Catastrophic failure (0x8000FFFF)”. I could just imagine how an average user would respond to that message. Oh, thanks for that, very helpful!
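For what it is worth, the number in that message is not random. 0x8000FFFF is the standard Windows HRESULT value E_UNEXPECTED, whose stock message text is "Catastrophic failure", and an HRESULT packs a failure bit, a facility and a code into 32 bits. Here is a minimal Python sketch of pulling those fields apart. The function name `decode_hresult` and the tiny lookup table are my own, for illustration only, and this is not how the backup program itself reports errors.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its main fields (hypothetical helper)."""
    value &= 0xFFFFFFFF  # treat the input as an unsigned 32-bit number
    return {
        "failure": bool(value >> 31),       # bit 31: 1 means failure
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: originating facility
        "code": value & 0xFFFF,             # bits 0-15: facility-specific code
    }

# Deliberately tiny lookup table; only the value from the message is listed.
KNOWN_NAMES = {0x8000FFFF: "E_UNEXPECTED"}

fields = decode_hresult(0x8000FFFF)
name = KNOWN_NAMES.get(0x8000FFFF, "unknown")
print(name, fields)
```

The decoded fields (a failure, facility 0, code 0xFFFF) say little more than "something unexpected went wrong", which is why the event log turned out to be the more useful place to look.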
I proceeded to check the event log and found that the NTFS file system reported errors and suggested that I run chkdsk. So I did. Or, at least I tried to. But no matter what I did, on reboot, chkdsk would not run. I am apparently not the only person to experience this issue. I figured out that if I booted into the recovery partition and then to the command prompt, I could run chkdsk successfully. It found quite a lot of errors and corrected them. I now hypothesize that this was the actual source of many of these issues, as the initial problem where the machine would not boot properly may have been caused by a bad write to the drive.
Having repaired the disk to a pristine state, I rebooted. Oh dear. Now the system informed me that a Windows dll was corrupted and continuously popped up a message box to remind me. Hmm. Well, at least I was provided with the name of the dll, so back to the recovery console I went and performed a manual copy of that dll from the recovery partition to c:\windows\system32. I rebooted and saw no errors, and the event log looked clean (and still does several days later).
I have now gotten the machine back to a stable state and am back up on daily backups. Altogether, it has now taken me about one month to re-stabilize this system which started out as being so wonderfully smooth I could not believe it. Maybe that was the problem all along.
In any case, what is clear to me is that no computer is free of problems, and to expect that from anything in life is something I consider to be unrealistic. If a typical user faced these issues, I am loath to think what they would do without a full battery of technical support staff on hand to assist them. The lesson? The compelling illusion of the perfect computer is so pervasive we often forget that there are many possible responsibilities that come with operating a complex piece of machinery. It is ironic that the user interfaces we strive to build often support that illusion. And yes, we do want to make it easier for people to use computers; however, we must not ourselves be seduced by the metaphor.