
|
 |
|
Michael Abrash's Graphics Programming Black Book, Special
Edition by
Michael Abrash The Coriolis Group
ISBN: 1576101746 Pub
Date: 07/01/97 |
Chapter 24 Parallel
Processing with the VGA
Taking on Graphics Memory
Four Bytes at a Time
This heading refers to the ability of the VGA chip to manipulate up to
four bytes of display memory at once. In particular, the VGA provides four
ALUs (Arithmetic Logic Units) to assist the CPU during display memory
writes, and this hardware is a tremendous resource in the task of
manipulating the VGA’s sizable frame buffer. The ALUs are actually only
one part of the surprisingly complex data flow architecture of the VGA,
but since they’re involved in almost all memory access operations, they’re
a good place to begin.
VGA Programming: ALUs and
Latches
I’m going to begin our detailed tour of the VGA at the heart of the
flow of data through the VGA: the four ALUs built into the VGA’s Graphics
Controller (GC) circuitry. The ALUs (one for each display memory plane)
are capable of ORing, ANDing, and XORing CPU data and display memory data
together, as well as masking off some or all of the bits in the data from
affecting the final result. All the ALUs perform the same logical
operation at any given time, but each ALU operates on a different display
memory byte.
Recall that the VGA has four display memory planes, with one byte in
each plane at any given display memory address. All four display memory
bytes operated on are read from and written to the same address, but each
ALU operates on a byte that was read from a different plane and writes the
result to that plane. This arrangement allows four display memory bytes to
be modified by a single CPU write (which must often be preceded by a
single CPU read, as we will see). The benefit is vastly improved
performance; if the CPU had to select each of the four planes in turn via
OUTs and perform the four logical operations itself, VGA
performance would slow to a crawl.
Figure 24.1 is a simplified depiction of data flow around the ALUs.
Each ALU has a matching latch, which holds the byte read from the
corresponding plane during the last CPU read from display memory, even if
that particular plane wasn’t the plane that the CPU actually read on the
last read access. (Only one byte can be read by the CPU with a single
display memory read; the plane supplying the byte is selected by the Read
Map register. However, the bytes at the specified address in all four
planes are always read when the CPU reads display memory, and those four
bytes are stored in their respective latches.)
Each ALU logically combines the byte written by the CPU and the byte
stored in the matching latch, according to the settings of bits 3 and 4 of
the Data Rotate register (and the Bit Mask register as well, which I’ll
cover next time), and then writes the result to display memory. It is most
important to understand that neither ALU operand comes directly from
display memory. The temptation is to think of the ALUs as combining CPU
data and the contents of the display memory address being written to, but
they actually combine CPU data and the contents of the last display memory
location read, which need not be the location being modified. The most
common application of the ALUs is indeed to modify a given display memory
location, but doing so requires a read from that location to load the
latches before the write that modifies it. Omission of the read results in
a write operation that logically combines CPU data n with whatever data
happens to be in the latches from the last read, which is normally
undesirable.
Figure 24.1 VGA ALU data
flow.
Occasionally, however, the independence of the latches from the display
memory location being written to can be used to great advantage. The
latches can be used to perform 4-byte-at-a-time (one byte from each plane)
block copying; in this application, the latches are loaded with a read
from the source area and written unmodified to the destination area. The
latches can be written unmodified in one of two ways: By selecting write
mode 1 (for an example of this, see the last chapter), or by setting the
Bit Mask register to 0 so only the latched bits are written.
The latches can also be used to draw a fairly complex area fill
pattern, with a different bit pattern used to fill each plane. The
mechanism for this is as follows: First, generate the desired pattern
across all planes at any display memory address. Generating the pattern
requires a separate write operation for each plane, so that each plane’s
byte will be unique. Next, read that memory address to store the pattern
in the latches. The contents of the latches can now be written to memory
any number of times by using either write mode 1 or the bit mask, since
they will not change until a read is performed. If the fill pattern does
not require a different bit pattern for each plane—that is, if the pattern
is black and white—filling can be performed more easily by simply fanning
the CPU byte out to all four planes with write mode 0. The set/reset
registers can be used in conjunction with fanning out the data to support
a variety of two-color patterns. More on this in Chapter 25.
The sample program in Listing 24.1 fills the screen with horizontal
bars, then illustrates the operation of each of the four ALU logical
functions by writing a vertical 80-pixel-wide box filled with solid,
empty, and vertical and horizontal bar patterns over that background using
each of the functions in turn. When observing the output of the sample
program, it is important to remember that all four vertical boxes are
drawn with exactly the same code—only the logical function that is
in effect differs from box to box.
All graphics in the sample program are done in black-and-white by
writing to all planes, in order to show the operation of the ALUs most
clearly. Selective enabling of planes via the Map Mask register and/or
set/reset would produce color effects; in that case, the operation of the
logical functions must be evaluated on a plane-by-plane basis, since only
the enabled planes would be affected by each operation.
|