by Jim Brain
Does your mind go blank when you hear about the SuperCPU? With all the
mention of it in magazines and newsletters, are you left wondering how
much of the discussion is hype and how much is true? Are you worried
that this latest attempt is just another design destined for failure
like the others before it? Well, if so, then you're not alone. With
the reputation accelerator cartridges and their manufacturers have
acquired over the years, you are wise to be concerned. Judge for
yourself, as we peer under the hood of the Creative Micro Designs
SuperCPU accelerator cartridges.
Note: The information contained in this article has been gleaned from
talks with CMD, Mr. Charlie Christianson's post to comp.sys.cbm,
responses to USENET posts by Mr. Doug Cotton, and information from Commodore
World Issue #12. While general information is not likely to change,
some details discussed in this article may differ slightly from those
incorporated in the final product.
What's An Accelerator?
Did you know a Commodore 64 CPU executes things at 1 MHz? A tiny clock
inside the 64 ticks off 1 million "cycles" per second, and instructs
the CPU to move forward one cycle at a time. The CPU, in turn,
either executes an internal operation, reads from memory, or writes
to memory during that cycle. These operations are concatenated to
form functions, which are the smallest pieces of work a programmer can
ask the CPU to perform. These functions are called instructions, and
take an average of 3 cycles each to perform. So, the typical C64
CPU does 333,333 things a second. The C128 fares a bit better, as it
can run twice as fast when in "fast" mode. In either case, there is
an upper bound on the amount of useful work each CPU can do in a
given amount of time.
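The arithmetic behind that figure can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function name is made up for this example, and the 3-cycle average is the rough figure quoted above, not a measured value.

```python
# Rough instruction throughput from the figures quoted in the text.
CLOCK_HZ = 1_000_000             # stock C64 clock: 1 MHz
AVG_CYCLES_PER_INSTRUCTION = 3   # rough average used in the article


def instructions_per_second(clock_hz, avg_cycles):
    """Whole instructions completed per second at the given clock rate."""
    return clock_hz // avg_cycles


c64 = instructions_per_second(CLOCK_HZ, AVG_CYCLES_PER_INSTRUCTION)
c128_fast = instructions_per_second(2 * CLOCK_HZ, AVG_CYCLES_PER_INSTRUCTION)
print(c64, c128_fast)
```

The second value simply doubles the clock, which is the C128 "fast" mode case mentioned above.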
An accelerator increases that amount of work done by substituting a
faster CPU and clock speed for the 1 MHz 64 CPU. The ratio of
increase should be as easy to determine as dividing the new clock
frequency by 1 MHz for a 64. If this were true, an accelerator that
runs at 4 MHz would execute things at 4 times the speed of a stock 64.
Sadly, this is not true, since not all parts of the system can be sped
up to the higher frequency. So, the accelerator runs at full speed while
it utilizes ICs designed for the faster clock speed, and slows down when
it must "talk" with ICs like the SID and VIC-II in the 64, which run only
at the slow 1 MHz clock speed.
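A simple way to picture this is to average cycle times rather than clock rates. The sketch below is a simplified model, not a description of any particular accelerator: the function name and the `slow_fraction` parameter (the share of cycles that must run at the slow clock) are assumptions made for illustration.

```python
def effective_mhz(fast_mhz, slow_mhz, slow_fraction):
    """Average clock rate when `slow_fraction` of all cycles run at the
    slow clock and the remainder run at the fast clock."""
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    # A cycle lasts 1/frequency, so average the cycle times, not the rates.
    avg_cycle_time = (1.0 - slow_fraction) / fast_mhz + slow_fraction / slow_mhz
    return 1.0 / avg_cycle_time


# Hypothetical case: a 4 MHz accelerator that spends 10% of its cycles
# talking to 1 MHz chips.
print(round(effective_mhz(4.0, 1.0, 0.10), 2))
```

Even a small share of slow cycles pulls the average down noticeably, which is why the simple "divide the clock frequencies" estimate is too optimistic.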
Most accelerators are produced as large cartridges that plug into the
expansion port of the computer system. Some require special wires be
attached to internal components, while others do not.
The New Kid on the Block
In mid 1995, Creative Micro Designs, after having evaluated the FLASH 8
accelerator from Europe with only mild success, noted that there might
possibly be a market for a speedy accelerator that would run GEOS and
other useful applications in the USA. After surveying the readership
of Commodore World, the Internet, and FIDONet, CMD decided that interest
in such a unit was forthcoming. Shortly thereafter, the SuperCPU
announcement was made.
As development work ensued, progress reports and preliminary information
about the product surfaced from CMD. The first items involved the processor
choice, which was originally the 65C02S but is now its bigger brother, the
16 bit 65C816S. Another piece of information involved the case, which is
an enclosure 6" wide by 2" deep by 3" high. This enclosure contains
a circuit board protruding from the front of the unit that will plug into
the Commodore 64 or 128 expansion port. In back, a complementary card
edge connector is provided to pass signals through the cartridge. This
will allow users to attach other expansion port cartridges to the
system. On top sit three switches, described below.
The first switch enables or disables the SuperCPU unit. The second switch
enables or disables JiffyDOS, which is built into the unit. The third
switch determines the speed of the unit. This third switch has three
positions. The first position forces the accelerator to operate at 1
MHz speed (the same speed as the stock C64). The second position allows
the programmer to change the speed via a register in the SuperCPU memory
map. The third position locks the SuperCPU into 20 MHz mode, regardless
of register settings.
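The behavior of the speed switch can be summarized as a small decision function. This is a sketch of the logic described above only; the enum names and the `register_requests_fast` flag are hypothetical, since the actual register location and bit layout are not given here.

```python
from enum import Enum


class SpeedSwitch(Enum):
    NORMAL = 1     # first position: forced to 1 MHz
    SOFTWARE = 2   # second position: a register selects the speed
    TURBO = 3      # third position: locked at 20 MHz


def cpu_speed_mhz(switch, register_requests_fast):
    """Return the CPU speed for a switch position and register setting.

    `register_requests_fast` stands in for the speed register in the
    SuperCPU memory map; its real address and format are not modeled.
    """
    if switch is SpeedSwitch.NORMAL:
        return 1
    if switch is SpeedSwitch.TURBO:
        return 20
    # SOFTWARE position: the program decides.
    return 20 if register_requests_fast else 1


print(cpu_speed_mhz(SpeedSwitch.SOFTWARE, register_requests_fast=True))
```

Only the middle position lets software influence the result; the outer two positions ignore the register entirely.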
The use of the CMD SuperCPU will be straightforward. Simply plug the
unit into the expansion port, set the appropriate switches on the top of
the unit, and power on the unit.
Technical Details
The basic system utilizes a WDC W65C816S 16 bit microprocessor running at
20 MHz. This CPU can not only fully emulate a CMOS 6502, it can be
switched into "native" mode which allows access to 16 bit registers and
16 megabytes of RAM without bank switching, DMA, or paging.
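The 16 megabyte figure follows directly from the width of the address bus: the 6502 family uses 16 address bits, while the 65C816 in native mode uses 24. A quick check in Python (the helper name is invented for this example):

```python
ADDRESS_BITS_6502 = 16    # 6502/6510 address bus width
ADDRESS_BITS_65816 = 24   # 65C816 address width in native mode


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


small = addressable_bytes(ADDRESS_BITS_6502)
large = addressable_bytes(ADDRESS_BITS_65816)
print(small // 1024, "kB versus", large // (1024 * 1024), "MB")
print("64 kB banks in the larger space:", large // small)
```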
Attached to the CPU is a bank of 64 kilobytes of Read Only Memory (ROM)
and 128 kilobytes of high speed static RAM (SRAM). The extra RAM above
64 kB is used to "mirror" the contents of the slower ROM. See below for
details.
A number of features designed to maximize the performance of the
SuperCPU are being developed into the unit. Since the late 1980's
ROM speeds have not been able to keep pace with CPU clock frequencies.
With the CMD accelerator moving into the frequency range of newer
PC systems, this becomes a problem for the SuperCPU as well. The
Commodore typically stores its KERNAL and BASIC code in ROMs, and the
SuperCPU will need to read that code. The easiest solution is to read
the stock ROMs in the computer, but those ICs can only be accessed
at 1 MHz (they are part of that set of older ICs that cannot be utilized
at 20 MHz). So, the next option is to copy that code into faster ROMs
and install those ROMs in the cartridge. Well, as stated earlier,
ROMs of sufficient speed are very expensive and not widely available.
So, the third option, which is the one CMD will use, is to copy the
KERNAL and BASIC at startup to RAM and write protect the RAM area, making
it look like ROM. Fast static RAM (SRAM) is available to meet the
20 MHz clock requirements, and is not terribly expensive, as most new
PC systems use the same memory for similar uses. This technique is
called ROM shadowing and has been utilized for a few years in the IBM
PC community.
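The idea of ROM shadowing can be modeled in a few lines. The following Python sketch is a toy simulation, not CMD's implementation: the class and method names are invented, and it simply drops writes to protected ranges, ignoring details such as the C64's habit of letting such writes fall through to the RAM underneath the ROM.

```python
class ShadowedMemory:
    """Toy model of fast RAM with write-protected 'shadow ROM' regions."""

    def __init__(self, size=0x10000):
        self.ram = bytearray(size)
        self.protected = []  # list of (start, end) inclusive address ranges

    def shadow_rom(self, start, rom_image):
        """Copy a ROM image into RAM at `start` and write protect it."""
        end = start + len(rom_image) - 1
        self.ram[start:end + 1] = rom_image
        self.protected.append((start, end))

    def read(self, addr):
        return self.ram[addr]

    def write(self, addr, value):
        """Store a byte; return False if the address is write protected."""
        if any(lo <= addr <= hi for lo, hi in self.protected):
            return False
        self.ram[addr] = value & 0xFF
        return True


# Usage: shadow a stand-in 8 kB image at $E000, where the KERNAL lives.
mem = ShadowedMemory()
fake_kernal = bytes([0xEA]) * 0x2000   # placeholder bytes, not real KERNAL code
mem.shadow_rom(0xE000, fake_kernal)
print(mem.write(0xE000, 0x00), hex(mem.read(0xE000)))
```

Once the copy is done, every read of the shadowed range comes from fast RAM, while the write protection keeps the area behaving like ROM as far as programs are concerned.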
The heart of the unit is the Altera Complex Programmable Logic Device
(CPLD). Analogous to electronic "glue", this single chip can replace
tens or hundreds of discrete ICs in circuits. This unit is responsible
for decoding the complex series of signals presented in the expansion
port, handling DMA requests to an REU unit, emulating the specialized
I/O port found at locations $00 and $01 on the 6510 CPU, and handling
the synchronization of the SuperCPU memory and C64 memory.
One item that has plagued accelerator designers for years and minimized
the widespread acceptance of accelerators involves this RAM sync operation
the Altera CPLD handles. In areas of the stock C64 memory map where
only RAM is present, like $0002 - $9FFF, the synchronization of
memory can be handled very easily. However, when dealing with areas
like $d000, where RAM AND IO can be present, the situation becomes more
complex. The SuperCPU overcomes this problem as well, which is important
since many video applications use the RAM under IO at $d000 for graphics
or text.
As the VIC-II IC in the C64 and C128 requires that screen information be
present in on-board memory, memory "mirroring" is necessary. However,
CMD has introduced two new technologies, called WriteSmart(tm) and
CacheWrite(tm) to reduce the slowdown associated with mirroring the
SuperCPU SRAM and the slower on-board DRAM. According to documentation,
WriteSmart allows the programmer to decide which portions of memory need
mirroring. The four selections include "BASIC", where only text and
color memory are mirrored, "GEOS", where GEOS foreground bitmap and color
memory are mirrored, "ALL", where all 64 kB of RAM is mirrored, and
"NONE", where the SuperCPU does not attempt to synchronize memory contents
between the two RAM areas.
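One way to picture WriteSmart is as a table that maps each mode to the address ranges that get written through to the on-board DRAM. The Python sketch below is illustrative only. The mode names come from the description above, but the address ranges are assumptions: the "BASIC" entry uses the default C64 text screen and color memory locations, and the "GEOS" entry uses commonly documented GEOS screen addresses. CMD's actual ranges are not stated here.

```python
# Assumed address ranges (inclusive) per WriteSmart mode; see caveats above.
MIRROR_RANGES = {
    "BASIC": [(0x0400, 0x07E7), (0xD800, 0xDBE7)],  # default text + color memory
    "GEOS": [(0xA000, 0xBF3F), (0x8C00, 0x8FE7)],   # assumed GEOS bitmap + color
    "ALL": [(0x0000, 0xFFFF)],                      # every write is mirrored
    "NONE": [],                                     # nothing is mirrored
}


def needs_mirroring(mode, addr):
    """True if a write to `addr` must also reach the slow on-board DRAM."""
    return any(lo <= addr <= hi for lo, hi in MIRROR_RANGES[mode])


print(needs_mirroring("BASIC", 0x0400), needs_mirroring("BASIC", 0x2000))
```

The fewer addresses a mode mirrors, the fewer writes have to wait on the 1 MHz side of the machine.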
The other technology, called CacheWrite(tm), minimizes the effect of
this mirroring. When storing a value into SuperCPU RAM in a range of
RAM that requires mirroring, the value is stored not only in SuperCPU
RAM, but also into a special cache memory location. The SuperCPU is
allowed to continue processing, while the system waits for the on board
DRAM to acknowledge readiness to store a value. When successive stores
to mirror ranges are done, the system must slow down, but can still
operate at about 4 MHz. This speed is achieved because the SuperCPU need
not wait for the value to be successfully stored before it attempts to
fetch the next opcode and operand. Since opcodes that write values to
memory average 4 cycles to complete, the SuperCPU can effectively do 4
cycles worth of processing in 1 period of the 1 MHz clock. Note that
this slowdown does not occur if the cache is not full when a store
instruction is executed.
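The 4 MHz figure can be reproduced with a small timing model. This is a sketch under stated assumptions, not a cycle-exact description of the hardware: it assumes a cache that holds a single pending write, a 50 ns cycle at 20 MHz, a 1000 ns window for the on-board DRAM to accept a value, and store instructions of exactly 4 cycles. All names are invented for this example.

```python
FAST_CYCLE_NS = 50      # one cycle at 20 MHz
SLOW_WRITE_NS = 1000    # one period of the 1 MHz clock
STORE_CYCLES = 4        # average length of a store instruction


def run_stores(count, mirrored):
    """Return the time in ns to execute `count` back-to-back stores."""
    now = 0
    cache_free_at = 0
    for _ in range(count):
        # The instruction itself is fetched and executed at full speed.
        now += STORE_CYCLES * FAST_CYCLE_NS
        if mirrored:
            # The value can enter the cache only once the previous
            # mirrored write has drained to the slow DRAM.
            now = max(now, cache_free_at)
            cache_free_at = now + SLOW_WRITE_NS
    return now


def store_loop_mhz(count, mirrored):
    """Effective clock rate over a run of consecutive stores."""
    total_cycles = count * STORE_CYCLES
    return total_cycles * 1000 / run_stores(count, mirrored)


print(round(store_loop_mhz(1000, mirrored=True), 2))
print(round(store_loop_mhz(1000, mirrored=False), 2))
```

In this model a single isolated store costs nothing extra, since the cache is empty when it arrives; only a run of consecutive mirrored stores settles at one store per 1 MHz period, or about 4 cycles of work per microsecond.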
Features
Being a CMD product, the CMD SuperCPU comes with JiffyDOS, CMD's
flagship speed enhancement routines, installed. However, JiffyDOS
can be switched out for those applications that fail to run with this
serial bus enhancement functionality.
The unit also features compatibility with RAMLink, CMD's RAM drive unit.
As the RAMLink functions by sharing the CPU with the computer system and
runs a special set of instructions called RL-DOS, the SuperCPU contains its
own version of RL-DOS optimized to take advantage of the speed and extra
features available in the 65C816S. Preliminary information suggests that
RAMLink data retrieval, typically much slower than REU data retrieval,
will now operate at speeds approaching that of the REU. In addition, the
on-board RL-DOS will handle usage of the special parallel CMD HD drive
cable available with the RAMLink.
For those with expansion in mind, CMD has incorporated a special
expansion port internal to the unit. The port, called the "Rocket
Socket", will allow access to the complete signal set from the W65C816S
CPU and possibly other support ICs. This will allow developers to
produce peripheral cards for the unit containing hardware that will run
at 20 MHz (the cartridge port will still be limited to slow speed).
Myths About the Unit
In the early phases of development, CMD hinted that possibly extra RAM
installed in the unit could be used as a fast RAM disk, a la RAMLink.
However, the inability to battery back up that RAM area, coupled with the
small increase in speed gained from doing so and the lengthy development
time needed to realize this feature, has prompted CMD to abandon this
idea for the time being. Later in the development cycle, such an idea
might resurface, but the feature is most likely never to be implemented.
Also, early information about the units noted that two speed options would
be available, but low support for the slower 10 MHz model prompted CMD to
discontinue development on that version. As of now, there is only one
speed option available: 20 MHz.
When CMD first announced the unit to the public, it was to include the
Western Design Center W65C02S microprocessor. However, in late 1995/early
1996, CMD opted to switch from that CPU to its bigger brother, the W65C816
16 bit CPU, owing to a small increase in per-item cost, more flexibility, and
more expansion options.
Although the CPU in the SuperCPU unit runs at 20 MHz,
that does not imply all operations will occur twenty times faster. Some
operations, like reads from I/O ICs, serial bus operation, and mirroring
of video memory, require the CPU to slow down temporarily. This will
reduce the effective speed to about 17-18 MHz.
Compatibility Issues
All legal 6502/6510/8502 opcodes are supported in the accelerator.
Undocumented or "illegal" opcodes are not supported and will fail.
Although not a compatibility issue, some applications that rely on the
CPU running at a certain speed to correctly time events will most likely
fail or operate too quickly to be useful. Event or interrupt driven
code should operate correctly.
The SuperCPU 64 model will operate correctly with any C64 or C64C model
of computer system, as well as with any C128 or C128D in 64 mode. However,
CMD has recently announced a 128 native version of the cartridge.
Super128CPU
In early 1996, CMD announced that interest was compelling and that it would
begin development on a 128 version of the SuperCPU. As a result of this
announcement, the ship date was moved from February to April as CMD
validated the SuperCPU design so that it could be used to manufacture
both the SuperCPU 64 and SuperCPU 128. Both units will operate at a
maximum of 20 MHz, and will most likely be packaged in the same enclosure.
The SuperCPU 128 will operate in both 64 mode and native 128 mode. It
will not enhance CP/M mode on the C128. CMD announced that the
availability of this unit would be August or September of 1996. As far
as cost is concerned, a current estimate falls at $300.00, and advance
orders are being taken with a security deposit of US$50.00 needed to
place an advance order.
As this announcement was made, some confusion has resulted in the naming
scheme. Previously called the SuperCPU or SuperCPU 64/20 (64 model at
20 MHz), the new models are alternately referred to as:
128 model          64 model
Super128CPU        Super64CPU
SuperCPU 128/20    SuperCPU 64/20
Prototype Testing and Benchmarks
As no developer units have shipped as of this date, CMD has the sole unit
available for testing and benchmarks. CMD's prototype unit consists
of a handwired unit on perfboard. At first, CMD was doubtful that the
prototype would actually run at 20 MHz, since such designs are not
"clean" and can suffer from signal degradation, signal skew, and
crosstalk, which inhibit operation at higher frequencies. So, with
that in mind, early tests were done at 4 MHz. CMD reported in late
February 1996 that the prototype had been ramped up to 20 MHz and was
operating correctly. In fact, the unit appears to run faster than it
should be able to, as illustrated by the following example.
CMD tested the following program at 1 MHz on a Commodore 64:
10 TI$="000000"
20 FORI=1TO10000:NEXT
30 PRINTTI
The result from this test was 660. After enabling the unit, the test was
rerun and the result printed out again: 31.
Quick calculations by the CMD personnel verified that the unit was
executing this program 21.29 times the normal speed. However, that
is impossible, as the CPU is only clocked 20 times the normal speed.
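For readers who want to check the numbers, the calculation is easy to repeat. The BASIC variable TI counts "jiffies" of 1/60 second, so the two results convert to elapsed time as shown in this short Python sketch (the variable names are chosen for this example):

```python
JIFFIES_PER_SECOND = 60   # the TI variable advances 60 times per second

stock_jiffies = 660       # result with the SuperCPU disabled
accel_jiffies = 31        # result with the SuperCPU enabled

stock_seconds = stock_jiffies / JIFFIES_PER_SECOND
accel_seconds = accel_jiffies / JIFFIES_PER_SECOND
speedup = stock_jiffies / accel_jiffies

print(f"stock: {stock_seconds:.2f} s, accelerated: {accel_seconds:.2f} s")
print(f"speedup: {speedup:.2f}x")
```

Note that the accelerated result is only 31 ticks long, so a difference of a single jiffy would change the computed ratio by well over half a point.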
The supposed impossibility is explained if you delve deeper into the
timing of the 64. As many know, the VIC-II "steals" cycles from the CPU
in order to refresh the VIC-II video screen. Extra cycles are "stolen"
for sprites. With the SuperCPU disabled, the above code runs at 1 MHz
minus the amount of time the VIC-II "steals" from the CPU. With the
SuperCPU enabled, the VIC-II does not "steal" cycles from the unit, as
the accelerator uses its own private memory area for operation. The VIC,
meanwhile, uses the on-board C64 memory.
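The benchmark figures also give a rough idea of how much time the stock machine loses. The Python sketch below is an estimate under two assumptions that are not confirmed here: that the accelerated run proceeds at the full 20 MHz with no slowdown, and that the whole difference is time taken away from the CPU on the stock machine (which would include interrupt servicing as well as VIC-II cycle stealing). The function name is invented for this example.

```python
CLOCK_RATIO = 20        # 20 MHz versus 1 MHz
STOCK_JIFFIES = 660     # benchmark result on the stock machine


def implied_lost_fraction(accelerated_jiffies):
    """Share of stock CPU time lost, inferred from the measured speedup."""
    measured_speedup = STOCK_JIFFIES / accelerated_jiffies
    return 1.0 - CLOCK_RATIO / measured_speedup


# The accelerated reading is only about 31 ticks, so also show the
# neighboring values to make the timer's coarseness visible.
for jiffies in (30, 31, 32):
    print(jiffies, round(implied_lost_fraction(jiffies), 3))
```

With the reported value of 31, the estimate comes out at roughly 6% of the stock machine's time, but the spread across neighboring readings shows that this benchmark is too short to pin the figure down precisely.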
CMD notes that games that use timers or are event driven function
correctly, but those that count processor cycles or utilize spin-wait
loops run so quickly as to be virtually unusable.
Of particular note to Commodore Hacking readers is the test done with the
object code for the Polygonamy (Reference: polygon) article elsewhere in
this issue. On a stock 64, the program renders approximately 12-13
frames per second. With the SuperCPU enabled, the frame rate jumped to 128
fps. CMD notes that further gains might be realized if the code were
modified to cooperate more fully with the SuperCPU memory scheme.
As for RAM Expansion Unit compatibility, CMD responds that the issues
have been tackled and that DMA operation is available on the SuperCPU
unit. In addition, CMD notes that the CPU need not be running at 1 MHz
to initiate a DMA transfer.
As stated from the beginning, the 64 model of the SuperCPU accelerator
will work on the Commodore 128 in 64 mode, and tests have confirmed that
the prototype 64 model does indeed function correctly on the C128 and
C128D.
Conclusion
While it is too early to determine the success of the CMD SuperCPU
product, the company has a reputation for delivering stable products
packed with features. While no accelerator can guarantee 100%
compatibility with all Commodore software, the CMD offering should provide
the best compatibility options thus far, due to its solutions to
RAM synchronization problems that have plagued accelerator designers for
years. The fact that CMD also owns the marketing rights to the GEOS
family of software products and manufactures a wide variety of
successful mass storage devices bodes well for compatibility with
those applications and peripherals.
For More Information
To find out more about the CMD SuperCPU family of accelerators, contact
CMD at the following address or via email:
Creative Micro Designs, Inc.
P.O. Box 646
East Longmeadow, MA 01028-0646
(413) 525-0023 (Information)
(800) 638-3263 (Ordering only)
cmd.sales@the-spa.com (Internet Contact for Sales)
Advance orders are being taken for all units, and the cost to place an
advance order is $50.00.
For programmers, CMD is planning to make available a Developer's Package,
which will help those wanting to exploit the potential of the new unit to
achieve success. A W65C816S assembler supporting all the new opcodes and
addressing modes will be provided, as will documentation pertaining to the
unit, the CPU, and its capabilities.