
Critical Hardware, DOS, And BIOS Errors
DocumentID: 659219
Revision Date: 29-Feb-96 8:29:48 PM

The information in this document applies to:
WordPerfect® 5.1 for DOS

Problem

Solutions: Critical Hardware, DOS & BIOS Errors

Parity Error:
Typically, when a parity error occurs during printing or while using a modem, it indicates that data was corrupted on its way to the printer or across the phone line. This type of parity error will not lock the computer. However, if the user receives a parity error that locks the program or keeps it from starting, it usually points to a fault in the memory on the motherboard. Each memory bank in a standard IBM-compatible computer contains, in addition to the chips that hold the data, one extra chip called the parity chip. This chip stores one parity bit for each byte (or "field") of motherboard memory. (Some non-standard computers, such as some Tandy models, do not use parity checking.)

Whenever information is written to memory, the "memory circuitry," which connects all the memory banks to the CPU, determines whether the field contains an even or odd number of "1" bits. If the count is even, the parity bit is set to "1"; if the count is odd, the parity bit is set to "0". This scheme is called odd parity, because a field plus its parity bit always contains an odd number of "1" bits. The parity "flag" is very important in maintaining the integrity of any program running in memory.
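As a minimal sketch of the rule above, the Python function below computes the parity bit for one 8-bit field. The name `parity_bit` is hypothetical, and on a real motherboard this is done by hardware, not software.

```python
def parity_bit(field: int) -> int:
    """Return the parity bit stored alongside one 8-bit field (odd parity).

    Rule from the text: an even count of 1 bits stores 1, an odd count
    stores 0, so the nine bits together always hold an odd count.
    """
    if not 0 <= field <= 0xFF:
        raise ValueError("field must be an 8-bit value")
    ones = bin(field).count("1")
    return 1 if ones % 2 == 0 else 0


print(parity_bit(0b1011_0000))  # three 1 bits (odd)  -> 0
print(parity_bit(0b1000_0001))  # two 1 bits (even)   -> 1
```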

A typical memory chip needs what is called "refreshing," or a "refresh cycle," in order to retain the information stored in it. You can think of a refresh cycle as a recharging process, much like recharging a battery or a capacitor. Because the chips can hold their contents only while the computer is powered, RAM (Random Access Memory) is referred to as volatile memory: everything is lost when the computer is turned off. If the user performs a soft boot (Ctrl-Alt-Del), the computer does not clear memory unless the soft-boot sequence includes a step such as clearing memory or running a memory test, which it usually does not; skipping that step is also why a soft boot, otherwise known as a warm boot, is faster.

This refresh cycle happens very often, about every 15.2 µs (microseconds), depending on the requirements of the particular memory chips. During a refresh cycle the memory circuitry temporarily halts CPU (8088, 286, 386, 486, ...) activity while it quickly refreshes the chips. When a refresh cycle is performed, the refresh circuitry first checks whether each field has an even or odd number of "1" bits and then compares the result with the associated parity flag to test whether any field has changed. If it finds that such a "bit failure" has occurred, it finishes the refresh cycle, releases the CPU, and drives the CPU's NMI (Non-Maskable Interrupt) pin high. Parity can only detect the error; it cannot tell which bit changed, so the damaged data cannot be repaired. The NMI causes the CPU to stop whatever program is running and execute Interrupt 2 (NMI), usually a small routine in the computer's ROM (Read-Only Memory), which is non-volatile. This routine writes a message to the screen announcing that a parity error has occurred and then puts the CPU into an endless loop until the computer is rebooted.
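The check described above can be modeled in a few lines. This is a toy sketch, not a description of real memory hardware: the class `ParityMemory` and the exception `ParityError` (standing in for the NMI) are hypothetical. It also shows a known limit of a single parity bit: when two bits in the same field flip, the count of 1 bits stays odd or even as before, so the error goes unnoticed.

```python
class ParityError(Exception):
    """Stands in for the NMI raised when a parity mismatch is found."""


def parity_bit(field: int) -> int:
    # Odd parity, as above: store 1 when the field has an even number of 1s.
    return 1 if bin(field).count("1") % 2 == 0 else 0


class ParityMemory:
    """Toy memory that keeps one parity bit beside every 8-bit field."""

    def __init__(self, size: int):
        self.data = [0] * size
        self.parity = [parity_bit(0)] * size

    def write(self, addr: int, value: int) -> None:
        # On a write, the parity bit is generated from the new value.
        self.data[addr] = value & 0xFF
        self.parity[addr] = parity_bit(self.data[addr])

    def check(self, addr: int) -> int:
        # Recount the 1 bits and compare them with the stored parity bit.
        if parity_bit(self.data[addr]) != self.parity[addr]:
            raise ParityError(f"parity error at address {addr:#06x}")
        return self.data[addr]


mem = ParityMemory(16)
mem.write(3, 0x5A)
mem.data[3] ^= 0b0000_0100      # a failing chip flips one stored bit
try:
    mem.check(3)
except ParityError as err:
    print(err)                  # parity error at address 0x0003

mem.write(4, 0x5A)
mem.data[4] ^= 0b0000_0011      # two flipped bits: parity of the count is unchanged
print(hex(mem.check(4)))        # 0x59, returned without any error
```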


Stack Overflow:
To explain Stack Overflow, one must first review the basics of CPU stack operation. Inside the CPU are two registers used to keep track of program calls, function requests, interrupts, and so on. These registers are the Stack Segment (SS) and the Stack Pointer (SP). Combined, they can address any location in the 1 MB of memory the 8088 can reach in real mode. When a program is running in memory, several functions or tests within the program are repeated quite frequently. Because these functions are used continually, it saves space to write each one only once and give it an address, so that the routine can be called at any time no matter where the program is currently executing. Depending on the size and complexity of the program, these subroutines can become very numerous and are scattered throughout the entire program.
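To make "combine to address" concrete: in real mode the 8088 forms a 20-bit physical address by shifting the segment value left four bits (multiplying by 16) and adding the offset. The sketch below applies that rule to SS and SP; the function name and the register values are illustrative only. On the 8088, a sum past 1 MB wraps back to address 0, which the mask reproduces.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode 8088 address: segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


ss, sp = 0x1234, 0xFFFE
print(f"{physical_address(ss, sp):05X}")   # 2233E
```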

It is common for subroutines to call other subroutines, which may in turn call others, several levels deep. The program, its subroutines, and its interrupts can be thought of as a large cavern with many passages branching into more passages, leading down into the earth. Besides performing their functions, these frequently used subroutines must also be very careful to restore the CPU's registers and environment to the state they were in before the subroutine was called. This is very important. The CPU uses the stack registers to record the program's current address before calling a subroutine. When the subroutine finishes its task, the CPU again uses the stack registers to retrieve that address and resume where it left off.
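A small sketch of that bookkeeping: each CALL saves a return address on the stack, and each RET takes back the most recent one. The trace format and the addresses are hypothetical; the point is that nested calls come back out in the reverse order they went in.

```python
def replay(trace):
    """Replay CALL/RET events; return (deepest nesting, addresses resumed at).

    Each event is ("CALL", return_address) or ("RET",).
    """
    stack = []          # return addresses; the most recent is last
    resumed_at = []
    deepest = 0
    for event in trace:
        if event[0] == "CALL":
            stack.append(event[1])          # remember where to come back to
            deepest = max(deepest, len(stack))
        elif event[0] == "RET":
            resumed_at.append(stack.pop())  # go back to the latest caller
        else:
            raise ValueError(f"unknown event {event!r}")
    return deepest, resumed_at


# The main program calls A (resume at 0x0103); A calls B (resume at 0x0210).
trace = [("CALL", 0x0103), ("CALL", 0x0210), ("RET",), ("RET",)]
depth, resumed = replay(trace)
print(depth, [hex(a) for a in resumed])    # 2 ['0x210', '0x103']
```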

You can think of the CPU as a child wandering through a complex maze of dangerous tunnels in a large abandoned mine, where the mine represents the PC's memory. The stack registers act as a guide for the child. The more subroutines and interrupts that are called, the deeper into the mine the child goes.

The stack registers have an assigned working area in memory where they can store many addresses. For every subroutine or interrupt called, the CPU performs a "Push" operation, storing an address in that area and adjusting the stack pointer (on the 8088 the stack grows downward in memory, so each push of a 16-bit word subtracts 2 from SP). For every push there must also be a "Pop" operation, in which the same addresses are read back out in the reverse order from which they were pushed, so that the guide can lead the child on a careful ascent back up through the mine. However, the stack area is limited in size, and the limit can be tighter still depending on how much stack space the programmer who wrote the code set aside. The stack can also fail as the child's guide if subroutines along the way tamper with the stack registers and do not return them to the values they held before the subroutine was called.

The stack's memory area is like a spiral staircase into space. Every time an address is pushed, the stack pointer climbs a few stairs to store that address; the top of the staircase represents the limit of the stack's memory. As the stack pops the old addresses back off, it climbs back down, somewhat like a character in a video arcade game. If too many interrupts or subroutines are called before earlier ones finish, the stack pointer eventually jumps off the top of the staircase and the child becomes lost. This is usually what happens when a computer locks up without any message. Some programs and versions of DOS watch the stack: if a problem occurs, they report a stack overflow (top of the stairs) and halt the machine before the stack jumps off the top and leaves the child to wander alone. Otherwise, the result could be data loss on the user's disk.
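The sketch below models a stack with a fixed area, the way a stack-watching program might guard it. The class names and the sizes are hypothetical, and the guard is an assumption standing in for the checks some programs and DOS versions perform; the 8088 itself will happily push past the end of the area. Pushes move SP down by one 16-bit word, pops move it back up, and values come out last-in, first-out.

```python
class StackOverflow(Exception):
    """Raised by the guard below; the 8088 itself performs no such check."""


class Stack:
    """Toy 8088-style stack of 16-bit words that grows downward."""

    def __init__(self, base: int, size: int):
        self.base = base            # lowest address the stack may use
        self.top = base + size      # SP value when the stack is empty
        self.sp = self.top
        self.memory = {}

    def push(self, word: int) -> None:
        if self.sp - 2 < self.base:
            # A stack-watching program stops here instead of overwriting
            # whatever lies beyond the stack area ("top of the stairs").
            raise StackOverflow(f"stack overflow at SP={self.sp:#06x}")
        self.sp -= 2                # each push moves SP down by one word
        self.memory[self.sp] = word & 0xFFFF

    def pop(self) -> int:
        if self.sp >= self.top:
            raise IndexError("stack underflow")
        word = self.memory[self.sp]
        self.sp += 2                # each pop moves SP back up
        return word


stack = Stack(base=0x0100, size=8)  # room for four 16-bit words
for addr in (0x0103, 0x0210, 0x0345):
    stack.push(addr)
print([hex(stack.pop()) for _ in range(3)])   # ['0x345', '0x210', '0x103']
```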

Many stack overflow problems can be prevented simply by removing memory-resident programs, or TSRs (Terminate and Stay Resident programs), from memory by running "Vanilla." Running Vanilla means renaming AUTOEXEC.BAT and CONFIG.SYS, rebooting the computer, and starting the problem program from the DOS prompt. The user then copies the commands from the renamed files, one at a time, into new AUTOEXEC.BAT and CONFIG.SYS files, rebooting after each addition, until the stack overflow, stack failure, or divide overflow reappears; the command added last points to the TSR most likely causing the problem.
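The one-at-a-time search can be written as a short loop. This is a sketch under stated assumptions: `find_culprit` and the `reproduces` callback are hypothetical (the callback stands in for "reboot with these commands and run the failing program"), the startup commands are invented examples, and the two startup files are simplified into a single ordered list.

```python
def find_culprit(startup_lines, reproduces):
    """Re-add startup commands one at a time until the error returns.

    startup_lines: commands from the renamed startup files, in order.
    reproduces:    hypothetical callback standing in for "reboot with these
                   commands active and run the failing program"; it returns
                   True if the error appears.
    Returns the command whose addition brought the error back, or None.
    """
    active = []
    for line in startup_lines:
        active.append(line)
        if reproduces(active):
            return line
    return None


# Hypothetical startup commands; CLOCK.COM plays the misbehaving TSR.
lines = ["FILES=30", "PATH=C:\\DOS", "C:\\UTIL\\CLOCK.COM", "C:\\UTIL\\MOUSE.COM"]
print(find_culprit(lines, lambda active: "C:\\UTIL\\CLOCK.COM" in active))
# C:\UTIL\CLOCK.COM
```

Stopping at the first command that brings the error back matches the procedure in the text; if the problem only appears when two particular TSRs are both loaded, this finds the second of the pair, so the commands added just before it deserve a look as well.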

Shell, part of WPCorp's Library program, is a TSR. It watches the keyboard for the Clipboard hotkeys and the Shell exit hotkeys, examining each scanned key code before the current program can read it. The worst kind of TSR is one that runs every time the system clock is updated, which happens about every 55 ms (milliseconds). Such TSRs check for certain conditions in the PC on every one of these ticks; 55 ms is a long interval in computer time, but the TSRs run on every tick, whatever else the PC is doing. How many of these TSRs are loaded, and how long each one runs, are good indicators of whether they are likely to be causing a stack overflow.
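The 55 ms figure comes from the PC's timer chip. Its input clock runs at 1,193,182 Hz, and with the BIOS default count of 65,536 loaded into timer channel 0 it raises the hardware timer interrupt (INT 08h, which in turn calls the INT 1Ch user hook that such TSRs commonly attach to) about 18.2 times per second. The arithmetic, as a quick check:

```python
PIT_INPUT_HZ = 1_193_182    # input clock of the PC's 8253/8254 timer chip
BIOS_DIVISOR = 65_536       # default count the BIOS loads into timer channel 0

ticks_per_second = PIT_INPUT_HZ / BIOS_DIVISOR
ms_per_tick = 1000 / ticks_per_second

print(f"{ticks_per_second:.2f} ticks per second")   # 18.21 ticks per second
print(f"{ms_per_tick:.1f} ms between ticks")        # 54.9 ms between ticks
```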

Programs loaded into memory are quite often made up of several internal TSR-like routines that are removed when the program exits. An example is the Novell Menu System, which constantly updates the time on its windowed screen. The 55 ms TSRs will be used here to illustrate a stack failure. When a 55 ms TSR is installed in RAM, it takes the address of the existing 55 ms TSR and saves it, so that once the new TSR has finished its own instructions it can call the TSR that was originally in its place. Suppose the original TSR had also replaced a previously installed TSR, and so on. All these TSRs are stacked in memory waiting to run every 55 ms, not to mention all the subroutines that make up each TSR. Each time a TSR runs, the stack registers climb the memory stairs to save the current addresses before starting it, and the same pushing happens for each of the TSR's subroutines while it operates. If all the TSRs stacked in line have not completed before the next 55 ms tick, the first TSR starts all over again before the last one has finished. The stack registers keep climbing the stairs but can never come back down, because the last TSR's addresses are never popped before the next tick arrives, and eventually the stack pointer jumps off the top of the stairs. The child is then lost with no hope of returning from the depths of the mine, and the CPU should be halted before critical data can be lost. Occasionally, two subroutines keep calling each other in turn without end, causing a stack overflow; this is either a fluke or a sign of bad source code.
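The chaining described above can be sketched with a toy vector table. Everything here is hypothetical except the choice of INT 1Ch, the BIOS user timer-tick hook. Following the text, each newly installed TSR saves the handler it replaces and calls it after doing its own work, so a single tick runs the whole chain, newest first, and every link stays on the stack until the oldest handler returns.

```python
TIMER_HOOK = 0x1C      # INT 1Ch, the BIOS user timer-tick hook
vectors = {}           # toy interrupt vector table: number -> handler
ran = []               # names of the handlers that ran, in order
state = {"depth": 0, "deepest": 0}   # how many handlers are active at once


def install(number, name):
    """Install a toy TSR that chains to whatever handler it replaced."""
    previous = vectors.get(number)       # save the existing handler's address

    def handler():
        state["depth"] += 1              # this call now occupies stack space
        state["deepest"] = max(state["deepest"], state["depth"])
        ran.append(name)                 # the TSR's own work
        if previous is not None:
            previous()                   # nested call to the replaced handler
        state["depth"] -= 1              # released only after the chain returns

    vectors[number] = handler


install(TIMER_HOOK, "original handler")
install(TIMER_HOOK, "clock TSR")
install(TIMER_HOOK, "menu TSR")

vectors[TIMER_HOOK]()                    # simulate one timer tick
print(ran)                # ['menu TSR', 'clock TSR', 'original handler']
print(state["deepest"])   # 3
```

Each additional TSR adds one more level of nesting per tick, which is why the number of timer TSRs loaded is a useful clue when a stack overflow appears.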

Some programs rely on the stack created and used by DOS rather than setting up a stack of their own. This is why users often try to overcome a stack overflow by enlarging the DOS stacks with the STACKS= directive in CONFIG.SYS. In the problem just described, increasing the stack size only delays the error for a very short time; under normal circumstances, however, it does correct the problem for programs that rely on the DOS stacks and need a little more than the default space.
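The directive takes two numbers, STACKS=n,s: the number of stacks and the size of each in bytes (the MS-DOS 6 documentation lists 9,128 as the default on machines other than the original PC and XT). As a sketch, the hypothetical helper below parses such a line and checks it against the ranges MS-DOS 6 documents (0 or 8 to 64 stacks, 0 or 32 to 512 bytes each); other DOS versions may accept different values.

```python
import re


def parse_stacks(line):
    """Parse a CONFIG.SYS line such as "STACKS=9,256" into (count, size)."""
    match = re.fullmatch(r"\s*STACKS\s*=\s*(\d+)\s*,\s*(\d+)\s*", line, re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a STACKS= line: {line!r}")
    count, size = int(match.group(1)), int(match.group(2))
    # Ranges as documented for MS-DOS 6: count 0 or 8-64, size 0 or 32-512.
    if not (count == 0 or 8 <= count <= 64):
        raise ValueError("number of stacks must be 0 or 8-64")
    if not (size == 0 or 32 <= size <= 512):
        raise ValueError("stack size must be 0 or 32-512 bytes")
    return count, size


count, size = parse_stacks("STACKS=9,256")
print(count, size, count * size)   # 9 256 2304
```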

Operators have volunteered two other suggestions: avoid files that are non-contiguous (fragmented) or damaged. A fragmented file can be fixed by copying it to a blank diskette, where it is stored in one piece rather than scattered in segments all over the hard drive. To overcome possible file damage, block the entire document, delete it, restore it into the second document screen (or a new worksheet), and save it. This method has removed many problems, because it rebuilds the document and gives it a new prefix (the header information WordPerfect stores at the start of the file). In the case of a PlanPerfect spreadsheet, the method also strips all links, names, page formats, and column widths and, if you are lucky, the errant code as well.


Stack Failure:
Stack Failure is usually the same as Stack Overflow, except that it results from the stack pointer or stack segment register not holding the value it should have at a certain point in the program code. Some programs are written to test the validity of the stack periodically, meaning that the software keeps a record of the stack register contents for later comparison. A good software product uses these recorded values to check whether TSRs have returned the stack registers to the values they held before the TSR took a detour through the software. This check is necessary whether or not the running program knows about any resident software in the PC, because the program must also protect itself against its own code: large, complex programs can become intricate and hard to keep consistent.
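A minimal sketch of such a check, assuming a toy CPU state held in a dictionary: record SS:SP before handing control to a routine, then compare afterward. The names `call_with_stack_check`, `tidy_tsr`, and `careless_tsr` are hypothetical.

```python
class StackFailure(Exception):
    """Raised when SS:SP differ from the values recorded before a call."""


def call_with_stack_check(cpu, routine):
    """Run routine(cpu), then confirm it left SS:SP exactly as it found them."""
    saved = (cpu["SS"], cpu["SP"])            # keep tabs on the stack registers
    routine(cpu)
    if (cpu["SS"], cpu["SP"]) != saved:
        # The cause is unknown, so halting is safer than carrying on.
        raise StackFailure(
            f"expected SS:SP {saved[0]:04X}:{saved[1]:04X}, "
            f"found {cpu['SS']:04X}:{cpu['SP']:04X}"
        )


def tidy_tsr(cpu):
    cpu["SP"] -= 2     # pushes a register...
    cpu["SP"] += 2     # ...and pops it back before returning


def careless_tsr(cpu):
    cpu["SP"] -= 2     # pushes a register but never pops it


cpu = {"SS": 0x2000, "SP": 0xFFFE}
call_with_stack_check(cpu, tidy_tsr)          # passes silently
try:
    call_with_stack_check(cpu, careless_tsr)
except StackFailure as err:
    print(err)   # expected SS:SP 2000:FFFE, found 2000:FFFC
```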

When a program checks the integrity of the stack pointer and finds that it has been damaged, the checking routine has no way of knowing what caused the damage or what has happened in the computer since it occurred. It is therefore prudent to halt all operations, to prevent minor or severe damage to valuable information that may still be safe on the user's disk.


Divide Overflow:
This error occurs when the "DIV" or "IDIV" machine-language instruction is executed with a divisor of 0, or when the quotient is too large for the destination register. Either case causes an immediate INTERRUPT 0 (the Divide by Zero interrupt). For other arithmetic operations, such as addition, subtraction, or multiplication, a result too large for the destination sets the overflow flag, and a program may follow the operation with an "INTO" instruction, which performs INTERRUPT 4 (Overflow) if the overflow flag is set. This allows for error correction if the programmer took the time to install an overflow-handling routine. If the programmer did not install one, the system displays a divide overflow error and halts, just as if it were a "Divide by Zero" error. Some BIOS versions do not; they simply return to the program as if nothing had happened. Sometimes, even though the program has installed its own overflow routine for INT 4, an interrupting TSR temporarily installs its own INT 4 vector, runs, and then restores the INT 4 vector that was in place when the TSR was originally installed, rather than the current vector it overwrote. This damages the integrity of the program that the TSR interrupted. Whether the running program has installed its own handler in the interrupt vector used for a divide overflow determines the outcome: either some type of Divide Overflow message appears on the screen, or the program corrects the problem, however involved, and continues with normal operation.
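The stale-vector mistake is easiest to see side by side with the careful version. In this toy sketch the vector table is a dictionary and handlers are just strings; the class names and the `simulate` helper are hypothetical. The TSR loads first, the program installs its own INT 4 routine later, and then the TSR interrupts the program.

```python
INT_OVERFLOW = 4  # INT 4: called by the INTO instruction when OF is set


class StaleRestoreTSR:
    """Saves INT 4 once at install time and later restores that old value."""

    def __init__(self, vectors):
        self.vectors = vectors
        self.saved = vectors[INT_OVERFLOW]          # snapshot taken at install

    def activate(self):
        self.vectors[INT_OVERFLOW] = "TSR handler"  # temporary takeover
        # ... the TSR does its work here ...
        self.vectors[INT_OVERFLOW] = self.saved     # BUG: stale snapshot


class CarefulTSR:
    """Saves INT 4 at the moment it takes over and restores that value."""

    def __init__(self, vectors):
        self.vectors = vectors

    def activate(self):
        current = self.vectors[INT_OVERFLOW]        # snapshot taken now
        self.vectors[INT_OVERFLOW] = "TSR handler"
        # ... the TSR does its work here ...
        self.vectors[INT_OVERFLOW] = current


def simulate(tsr_class):
    vectors = {INT_OVERFLOW: "BIOS default"}
    tsr = tsr_class(vectors)                               # TSR loads first
    vectors[INT_OVERFLOW] = "program's overflow routine"   # program starts later
    tsr.activate()                                         # TSR interrupts it
    return vectors[INT_OVERFLOW]


print(simulate(StaleRestoreTSR))  # BIOS default
print(simulate(CarefulTSR))       # program's overflow routine
```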

Most software programs are written either to avoid this situation, by testing for and correcting it before it occurs, or to use it as a means of completing a larger division. However, if the program fails to detect the problem or provides no safeguards against it, the divide overflow interrupt is executed. It is usually not a good idea for a program to recover from this error if it was originally designed to prevent it, because the error then means something drastic has probably occurred and the program can no longer maintain its integrity. At that point, locking up the CPU with a divide overflow message is the best decision, since returning to the program may damage data both in memory and on the hard or floppy disk.
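One way to "test before it occurs," sketched for the unsigned 8-bit form of DIV, which divides AX by an 8-bit value and leaves the quotient in AL and the remainder in AH. The quotient fits in AL exactly when AH (the high byte of AX) is smaller than the divisor, because AX < 256 × divisor holds precisely in that case. The function names are hypothetical, and `DivideOverflow` stands in for INT 0.

```python
class DivideOverflow(Exception):
    """Stands in for INT 0, which the 8088 raises on a failed DIV."""


def div8(ax, divisor):
    """Model unsigned 'DIV r/m8': divide AX by an 8-bit divisor.

    Returns (quotient, remainder), i.e. the values DIV leaves in AL and AH.
    """
    if not (0 <= ax <= 0xFFFF and 0 <= divisor <= 0xFF):
        raise ValueError("AX must be 16-bit and the divisor 8-bit")
    if divisor == 0:
        raise DivideOverflow("divide by zero")
    quotient, remainder = divmod(ax, divisor)
    if quotient > 0xFF:
        raise DivideOverflow("quotient too large for AL")
    return quotient, remainder


def safe_to_divide(ax, divisor):
    """Test before dividing: the quotient fits in AL exactly when AH < divisor."""
    ah = ax >> 8
    return divisor != 0 and ah < divisor


print(div8(100, 7))                   # (14, 2)
print(safe_to_divide(0x1000, 0x10))   # False: 0x1000 / 0x10 = 0x100 needs 9 bits
```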


The manual DOS: The Complete Reference gives the following explanation of the "Divide Overflow" message:
      "Divide overflow: An application program attempted to divide a number by 0, or an internal error caused DOS to invoke the divide by 0 interrupt. Control is returned to DOS."

Programs that do not otherwise safeguard themselves against a divide overflow commonly use this error to correct the problem and then return to normal operation, and many programs incorporate both methods because of the nature or objective of the software. A divide overflow may also occur when TSRs alter code at key memory locations where resident programs should never be in the first place, or when TSRs take temporary control and fail to return certain registers to their original values afterward. Running Vanilla, as explained earlier under Stack Overflow, is a troubleshooting technique for this type of problem.



Product specifications, packaging, technical support and information (*Specifications*) refer to the United States retail English version only. Specifications for other versions may vary. All Specifications, claims, features, representations, and/or comparisons provided are correct to the best of our knowledge of the date of publication, but are subject to change without notice. OUR MAXIMUM AGGREGATE LIABILITY TO YOU AND THAT OF OUR DEALERS AND SUPPLIERS IS LIMITED. IT SHALL NOT EXCEED THE AMOUNT YOU PAID TO ACCESS THE INFORMATION. SEE LEGAL DISCLAIMER.