This document discusses how to get something on the screen with the Sega MegaDrive's Visual Display Processor (VDP) after you've initialized the MegaDrive. It tries to keep things simple and therefore covers only plane A and plane B, not the window plane or sprites. You can download either the single firststeps.S or a ZIP file with all sources and the assembled ROM image.
While I do explain a few things about the VDP, this is by no means a reference. I'm still learning, so I don't claim that everything I write here is correct. That said, let's begin by explaining what the VDP is and what it does.
The VDP is a separate processor found in the Sega MegaDrive that is responsible for drawing to your TV set. Back when the MegaDrive was designed, processors were quite slow: the MegaDrive's main processor, the Motorola M68000, runs at about 8 MHz. RAM was very expensive as well; the MegaDrive only has 64kB work RAM, 64kB VDP RAM and 8kB sound RAM. With these limited resources it's just not possible to use a framebuffer or some other pixel-based video unit like we're used to today. Well, it would be possible, but the M68000 would be busy drawing and wouldn't have much time left for real work like physics (e.g. gravity when jumping) and simulating the enemies.
To take work off the M68000 there are several coprocessors, one of which is the VDP. It has its own RAM and works happily on its own without disturbing the M68000, unless we ask it to (by letting it trigger interrupts). We tell the VDP what to do by reading from and writing to two special memory locations: the VDP data port at address 0x00C00000 and the VDP control port at 0x00C00004. To write data into VDP RAM we first tell the VDP, through the control port, which location in VDP RAM the data port should point at; whatever we then write to the data port gets written into VDP RAM at that location.
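The VDP also has some registers, which are set by writing to the control port: a word whose upper byte is 0x80 plus the register number and whose lower byte is the new value (for example, 0x8F02 sets register 15 to 2). These registers tell the VDP which resolution to use, where certain tables are located in VDP RAM, and other information.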
One of the most important registers is register 15, the autoincrement register. Imagine we want to write several pieces of data into VDP RAM one after another. We would need to tell the VDP to point the data port at position X, write to the data port, then tell the VDP to point the data port at the next position, write to the data port, and so on. This is expensive, so there's a better solution: whenever we write to the data port, the address it points to is incremented by the value in the autoincrement register. So we set the autoincrement register, point the VDP at the desired start address and then simply write to the data port in a loop.
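To make the effect of register 15 concrete, here is a small Python model of the data port's address pointer. It is only a sketch under simplifying assumptions: the class ToyVdpPort and its methods are made up for illustration, VRAM is treated as 64kB of 16-bit words, and the real control-port command format and timing are ignored.

```python
class ToyVdpPort:
    """Hypothetical, heavily simplified model of the VDP data port."""

    def __init__(self):
        self.vram = [0] * 0x8000   # 64kB of VRAM as 32768 words
        self.address = 0           # byte address the data port points at
        self.autoincrement = 0     # value of register 15

    def point_at(self, byte_address):
        # Stands in for the control-port write that sets the address.
        self.address = byte_address & 0xFFFF

    def write_word(self, value):
        # A data-port write stores the word, then advances the address
        # by the autoincrement value (wrapping within 64kB).
        self.vram[self.address // 2] = value & 0xFFFF
        self.address = (self.address + self.autoincrement) & 0xFFFF


port = ToyVdpPort()
port.autoincrement = 2
port.point_at(0x0100)
for value in (0x1111, 0x2222, 0x3333):
    port.write_word(value)
print(hex(port.address))  # 0x106
```

With the autoincrement set to 0, the same sequence would overwrite the word at 0x0100 three times, which is the behavior the main loop later relies on for the horizontal scroll table.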
Another important concept of the VDP is planes and priorities. There are three planes: plane A, plane B and the window plane. Normally plane A is drawn in front of plane B, and the window plane takes the place of plane A in the screen area assigned to it. But each pattern in a plane can have a high-priority bit set, in which case it is drawn in front of the low-priority patterns of the plane above it instead of behind them. So if we draw a pattern to plane B with the high-priority bit set, that pattern will appear in front of plane A (unless the plane A pattern at that spot has high priority as well), while all other patterns of plane B will be drawn behind plane A. I hope I didn't confuse you too much ;-)
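The priority rules for planes A and B boil down to a fixed front-to-back order: high-priority plane A, high-priority plane B, low-priority plane A, low-priority plane B, and finally the background color. The Python sketch below encodes that order for a single pixel. It deliberately ignores sprites and the window plane, and the function name visible_layer is made up for illustration. Color index 0 counts as transparent, so the next layer shows through.

```python
def visible_layer(a_color, a_high, b_color, b_high):
    """Return which layer supplies one screen pixel: 'A', 'B' or 'backdrop'.

    a_color/b_color are the color indices (0-15) of the plane pixels,
    a_high/b_high the priority bits of the patterns they belong to.
    Sprites and the window plane are not modelled.
    """
    pixels = {"A": (a_color, a_high), "B": (b_color, b_high)}
    # Layers from front to back: (plane, priority level).
    front_to_back = [("A", True), ("B", True), ("A", False), ("B", False)]
    for plane, level in front_to_back:
        color, high = pixels[plane]
        if high == level and color != 0:   # index 0 is transparent
            return plane
    return "backdrop"


print(visible_layer(5, False, 7, True))  # B: high-priority B beats low A
```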
But this document is only intended to help you with your first steps, so if you need to learn more about the VDP (and you will if you want to program the MegaDrive) you should read the Sega Technical Overview aka sega2.doc or some other VDP documentation like genvdp.txt. Just search for them.
So let's start and get our hands dirty: I'm assuming you're using the init.S discussed in my Initializing the Sega MegaDrive document, so we don't need to set anything up here except some administrative stuff and a few registers we'll be using for the rest of this source.
2: .global main
5: | MAIN
8: move.l #0x00C00000, %a4 | Throughout all my code I'll use A4
9: move.l #0x00C00004, %a5 | for the VDP data port and A5 for the
10: | VDP control port
First we include the file variables.inc, which defines the memory locations of some variables we're using, like VarVsync. In line 2 we tell the assembler to export the symbol main, which is (you guessed it) our main routine. This .global statement is needed so that the linker can do its job: init.S jumps to main once it has finished initializing the MegaDrive. As main is not defined in init.S, the linker has to look for a global symbol main, so we must tell the assembler to export our main symbol here for the linker to find it (the linker won't use local symbols to resolve references between files).
In lines 8 and 9, registers A4 and A5 are set to 0x00C00000 and 0x00C00004, the VDP data and control ports. So whenever we want to write to the VDP control port we simply write to (%a5), and to write to the data port we write to (%a4).
Next we're branching to a few subroutines that I'll explain in depth later on:
12: bsr LoadPalettes
13: bsr LoadPatterns
14: bsr FillPlaneA
15: bsr FillPlaneB
As the names of these subroutines suggest, we'll first load the color palettes, then load some patterns and finally fill planes A and B.
A plane is filled with patterns. Patterns are 8x8 pixels in size, and the color of each pixel is specified as an index into a palette. Each palette holds 16 colors, so an index needs 4 bits and each byte in a pattern represents two pixels. When we draw a pattern to a plane we specify whether to set the priority bit, whether to flip the pattern horizontally and/or vertically, which pattern to use and which palette to use. More on this in the sections on LoadPalettes, LoadPatterns and FillPlaneA.
Last preparations
We've loaded everything, so the planes should already be visible. But this is too boring; we want the ROM to do something. We want to be able to move plane A left and right with the joypad.
17: clr.l %d6 | Set D6 to 0
18: move.w #0x2000, %sr | Enable the interrupts
19: move.l #0x50000003, (%a5) | Point the VDP data port to the hori-
20: | zontal scroll table
21: move.w #0x8F00, (%a5) | Disable autoincrement
In line 17 we clear register D6. Line 18 sets the status register so that interrupts are enabled: 0x2000 keeps the CPU in supervisor mode and sets the interrupt mask to 0, so interrupts of all levels are accepted. In init.S we've set up two interrupt service routines that increase the long words at the locations VarVsync and VarHsync. More on this in the section about WaitVsync.
Line 19 points the data port to the horizontal scroll table. The very first word in this table is the number of pixels by which to scroll plane A when we're in full-screen scroll mode (which we are, as we've set it in init.S); the word right after it holds the scroll value for plane B.
Finally, in line 21, we have to disable the autoincrement (register 15), because we want every write to the data port to go to the first entry in the horizontal scroll table. If the autoincrement were non-zero, each write would advance the address, so every write after the first would miss that first entry and the plane would no longer scroll.
The main loop
The next section is the main loop. It may be a bit hard to understand at first.
We'll first read the joypad values, increase or decrease D6 if left or right is pressed, and then set the scroll value for plane A. At the end we wait for the vertical retrace and loop. The retrace happens when the electron beam of your TV set has reached the lower right corner and is on its way back to the upper left corner. During this time a game has to make all its drawing changes, because otherwise you would see flickering objects: an object could be at position X, the electron beam could draw half of it, then the object's position changes to Y and the beam draws the rest of the object at the new position. For a split second you'd see half an object at position X and half at position Y, which you would notice as flickering.
23: Loop: bsr ReadJoypad | Read joypad values into D7
25: btst #2, %d7 | Test the bit for the left button
26: beq 1f | If it's pressed branch to 1
28: btst #3, %d7 | Test the bit for the right button
29: bne 3f | If it's not pressed branch to 3
31: addq.w #1, %d6 | Increase D6 by one (if right button
32: | is pressed)
33: bra 2f | Branch to 2
35: 1: subq.w #1, %d6 | Decrease D6 by one (if left button
36: | is pressed)
37: 2: move.w %d6, (%a4) | Write D6 into horizontal scroll table
39: 3: bsr WaitVsync | Wait for vertical retrace
40: bra Loop | Loop
Line 23 branches to the ReadJoypad subroutine which returns the current joypad state in the least significant byte of D7.
We test bit 2 of D7, which is 0 if the left button is pressed. If it is zero we branch to label 1 (line 35), which decrements D6 and then writes D6 to the data port, which points to the very first entry in the horizontal scroll table as described in the Last preparations section.
If bit 2 was not zero, the left button has not been pressed and we end up in line 28, which tests bit 3, the right button. This time we branch if the bit is not zero: in that case the right button isn't pressed either, so we jump to label 3 and skip the whole scrolling part.
If bit 3 is zero, and thus the right button is pressed, we increase D6 in line 31 and then jump to label 2, where we write D6 to the horizontal scroll table.
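The branch structure of the loop body is easier to follow as straight-line code. This Python function is a hypothetical mirror of lines 25 to 37 (the name update_scroll is made up); it takes the byte returned by ReadJoypad, where a bit is 0 while its button is pressed, and returns the new 16-bit value of D6.

```python
LEFT_BIT = 2    # bit 2 of the ReadJoypad result, 0 = pressed
RIGHT_BIT = 3   # bit 3 of the ReadJoypad result, 0 = pressed


def update_scroll(d6, joypad):
    """Return the new 16-bit scroll value after one pass of the main loop."""
    if not joypad & (1 << LEFT_BIT):      # btst #2 / beq 1f
        return (d6 - 1) & 0xFFFF          # subq.w #1, %d6
    if not joypad & (1 << RIGHT_BIT):     # btst #3 / bne 3f
        return (d6 + 1) & 0xFFFF          # addq.w #1, %d6
    return d6                             # neither pressed: skip the write


print(update_scroll(5, 0xF7))  # right pressed (bit 3 clear): 6
```

When both directions are held, the left test comes first, so the plane moves left.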
Finally we wait for the vertical retrace to occur. This is the normal timing method for most games, and it's the reason why PAL versions of some MegaDrive games run slower than the NTSC versions: NTSC displays 60 fields (half-frames) per second, PAL only 50. If we synchronize with the retrace, we run through the main loop 60 times a second on NTSC but only 50 times on PAL. As soon as the retrace happens we jump back to the Loop label, looping forever.
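A quick calculation shows how large the effect is. The main loop moves plane A by one pixel per pass and makes one pass per retrace, so its speed is tied to the field rate. The sketch below uses the rounded rates of 60 and 50 fields per second.

```python
NTSC_FIELDS_PER_SECOND = 60   # rounded; the real NTSC rate is slightly lower
PAL_FIELDS_PER_SECOND = 50
PIXELS_PER_PASS = 1           # addq.w #1 / subq.w #1 per pass of the loop

ntsc_speed = PIXELS_PER_PASS * NTSC_FIELDS_PER_SECOND   # pixels per second
pal_speed = PIXELS_PER_PASS * PAL_FIELDS_PER_SECOND
slowdown = 1 - pal_speed / ntsc_speed                   # fraction lost on PAL

print(f"NTSC: {ntsc_speed} px/s, PAL: {pal_speed} px/s, "
      f"PAL is {slowdown:.1%} slower")
```

A game that wants the same speed on both systems has to move things by 6/5 as much per frame on PAL, or measure time some other way.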
LoadPalettes
This subroutine loads our palettes. We'll use two palettes. We could have used just one, but I'd like to demonstrate the use of more than one palette.
A color has the bit format 0000 BBB0 GGG0 RRR0, where B is blue, G green and R red. This means we have three bits per color component, giving a total of 8 x 8 x 8 = 512 colors. The very first color in each palette is the transparent color: if a pattern uses color index 0 for a pixel, that pixel is transparent. This means you can only use 15 colors of a palette, but we have to specify 16 colors nevertheless.
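Packing colors by hand is error-prone, so here is a small Python helper that builds a CRAM word from three levels between 0 and 7 according to the 0000 BBB0 GGG0 RRR0 layout. The function names vdp_color and unpack_color are made up for illustration; the checks reproduce colors from the palette table at the end of this document.

```python
def vdp_color(red, green, blue):
    """Pack 3-bit levels (0-7) into a CRAM word laid out as 0000 BBB0 GGG0 RRR0."""
    for level in (red, green, blue):
        if not 0 <= level <= 7:
            raise ValueError("each component must be in the range 0-7")
    # Each component sits one bit above the bottom of its nibble.
    return (blue << 9) | (green << 5) | (red << 1)


def unpack_color(word):
    """Turn a CRAM word back into its (red, green, blue) levels."""
    return ((word >> 1) & 7, (word >> 5) & 7, (word >> 9) & 7)


print(hex(vdp_color(7, 7, 0)))  # 0xee, the yellow in our first palette
```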
44: | LoadPalettes
47: move.l #0xC0000000, (%a5) | Point data port to CRAM
48: move.w #0x8F02, (%a5) | Set autoincrement (register 15) to 2
50: moveq #31, %d0 | We'll load 32 colors (2 palettes)
51: lea Palettes, %a0 | Load address of Palettes into A0
53: 1: move.w (%a0)+, (%a4) | Move word from palette into VDP data
54: | port and increment A0 by 2
55: dbra %d0, 1b | Decrement D0 and jump back to 1
56: | unless D0 is now -1
58: rts | Return to caller
First we point the data port to the start of the color RAM (CRAM) in line 47 and then set the autoincrement register to 2 in line 48. This means that every time we write to the data port, the address it points to is incremented by 2. Every palette has 16 colors and we'd like to load 2 palettes, so we have to transfer 32 colors. To do so we load 31 into D0 in line 50: the dbra loop runs its body D0 + 1 times, once each for D0 = 31, 30, ..., 0. In line 51 we load the address of our palettes data table into A0.
In line 53 we move a color value (one word) from our palette table to the VDP data port, and A0 is incremented by two, pointing to the next color. Because we have set the autoincrement to 2, the address the VDP's data port points to is incremented by 2 as well.
Line 55 then decrements D0 and jumps back to label 1 unless D0 has become -1. Once all 32 colors have been written, dbra falls through and we return to the caller, which is our main routine.
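Since dbra is easy to get off by one, here is a Python model of how many times a loop body runs for a given start value. It is a sketch of the instruction's behavior as described above (decrement the low word of the register, branch back unless it became -1); the helper name dbra_iterations is made up.

```python
def dbra_iterations(start):
    """Count how often the body of a 'body ... dbra dN, body' loop runs."""
    counter = start & 0xFFFF   # dbra only looks at the low word
    runs = 0
    while True:
        runs += 1                          # the loop body, e.g. one move.w
        counter = (counter - 1) & 0xFFFF   # dbra decrements the low word...
        if counter == 0xFFFF:              # ...and falls through at -1
            return runs


print(dbra_iterations(31))  # 32: enough for two 16-color palettes
```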
LoadPatterns
This subroutine loads a few patterns which we'll display.
A pattern is 8x8 pixels in size. For each pixel we have to specify a color index between 0 and 15. This means that every byte represents two pixels, so four bytes (one long word) represent a complete row of a pattern and we can define a pattern with eight long words. When we draw a pattern to a plane we can specify which palette to use for it, which allows one pattern to be used with different palettes.
We'll use four patterns. The first one simply uses color 15 for every pixel. Because it gets index 0 (it's the first pattern in the pattern space and counting starts at 0), we'd see a red screen if we stopped right after this subroutine. This is because color 15 is red in our palettes, and because we cleared the RAM with zeros in init.S, both plane A and plane B are filled with pattern #0 using palette #0. So the whole screen would be red.
But this is boring, so we load three additional patterns: pattern #1 is striped diagonally, pattern #2 just has a spot in the middle and pattern #3 is a box. As none of them uses red, a red square anywhere on screen tells us we've done something wrong ;-)
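To see how the long words in the pattern table turn into pixels, here is a Python sketch that splits each long word into its eight 4-bit color indices, with the leftmost pixel in the most significant nibble, and prints pattern #3 (the box) from the data table at the end of this document. The helper name pattern_rows is made up; index 0 is printed as a space because it is transparent.

```python
# Pattern #3 (the box) from the Patterns table at the end of this document.
BOX = [0x22222222, 0x21111112, 0x21222212, 0x21200212,
       0x21200212, 0x21222212, 0x21111112, 0x22222222]


def pattern_rows(longs):
    """Split 8 long words of 4bpp pattern data into 8 rows of 8 indices.

    The most significant nibble of each long word is the leftmost pixel.
    """
    return [[(value >> shift) & 0xF for shift in range(28, -4, -4)]
            for value in longs]


for row in pattern_rows(BOX):
    # Index 0 is transparent, so show it as a blank.
    print("".join(" " if index == 0 else format(index, "X") for index in row))
```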
61: | LoadPatterns
64: move.l #0x40000000, (%a5) | Point data port to start of VRAM
65: move.w #0x8F02, (%a5) | Set autoincrement (register 15) to 2
67: moveq #31, %d0 | We'll load 4 patterns, each 8 longs
68: | wide
69: lea Patterns, %a0 | Load address of Patterns into A0
71: 1: move.l (%a0)+, (%a4) | Move long word from patterns into VDP
72: | port and increment A0 by 4
73: dbra %d0, 1b | Decrement D0 and jump back to 1
74: | unless D0 is now -1
76: rts | Return to caller
First we point the VDP's data port to the start of VRAM, which is also the start of the pattern space, in line 64, and set the autoincrement to 2 in line 65. We'll load four patterns, and with each pattern being 8 long words in size, this makes 32 long words to copy. So we load 31 into D0 in line 67 and then load the address of the Patterns table into A0 in line 69.
Line 71 might look a bit unexpected if you look closely: we're copying a long word to the data port and advancing A0 by four, but the autoincrement is only set to 2. However, the long write reaches the VDP as two single word writes, so the address the data port points to is increased by 2 twice, i.e. by four. We could just as well have used moveq #63, %d0 in line 67 and move.w in line 71, with the same effect. The long-word version should be slightly faster, though, since it needs only half as many loop iterations (not much, but every cycle counts).
In line 73 we decrement D0 and loop back to label 1 unless D0 has become -1, and then return to the caller in line 76.
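The claim that one long write advances the address by four can be checked with a small Python model. It assumes, as described above, that each move.l reaches the VDP as two consecutive word writes, with the high word landing at the lower address, and that the autoincrement is applied after each word. The pattern values are dummy data, not the real Patterns table. The model also shows where each pattern ends up: at 32 bytes per pattern, pattern #n starts at VRAM address n * 32.

```python
import struct

PATTERN_SIZE = 8 * 4                              # 8 rows x 4 bytes = 32 bytes
patterns = [0x11110000 + i for i in range(32)]    # dummy data: 4 patterns

vram = bytearray(0x10000)                         # 64kB VRAM
address = 0x0000                                  # line 64: start of VRAM
autoincrement = 2                                 # line 65: register 15 = 2

for value in patterns:                            # move.l (%a0)+, (%a4)
    for word in (value >> 16, value & 0xFFFF):    # seen as two word writes
        struct.pack_into(">H", vram, address, word)   # big-endian word
        address += autoincrement

print(hex(address))                               # 32 longs = 128 bytes
for n in range(4):
    first = struct.unpack_from(">I", vram, n * PATTERN_SIZE)[0]
    print(f"pattern #{n} at {n * PATTERN_SIZE:#06x}, first long {first:#010x}")
```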
FillPlaneA
We've loaded our palettes and patterns, so we've finally reached the point where we can put something on screen! As noted before, plane A is in front of plane B (at least when all patterns have low priority). We'll fill plane A with pattern #2 using palette #1 and place a single pattern #3 using palette #0 in the middle of the screen.
So this subroutine demonstrates quite a few things: we'll see how to draw a pattern with a certain palette number, how to fill a plane and then how to place a pattern at a certain location. Because there is much to explain I'll split the source for this routine.
79: | FillPlaneA
82: move.l #0x40000003, (%a5) | Point data port to 0xC000 in VRAM,
83: | which is the start address of plane A
84: move.w #0x8F02, (%a5) | Set autoincrement (register 15) to 2
86: move.w #0x2002, %d0 | We'll use palette #1 and pattern #2,
87: | don't flip the pattern and set it to
88: | low priority.
89: moveq #27, %d1 | The screen is 28 cells high
91: 1: moveq #63, %d2 | One line is 64 cells wide
93: 2: move.w %d0, (%a4) | Move our pattern data into the plane
94: dbra %d2, 2b | Loop back to 2
96: dbra %d1, 1b | Loop back to 1
This is our biggest subroutine. But don't worry, that's only because I've written the code for placing pattern #3 in the middle of the plane very verbosely (lines 98 to 108). I could have written it in just two lines by precalculating the value, but then it would be harder to understand.
So let's start explaining this subroutine: first we point the VDP's data port to the start of plane A (which we defined to start at address 0xC000 in init.S) in line 82. We then set the autoincrement to 2 in line 84 and move the value 0x2002 into D0 in line 86. This value will be interpreted as "pattern #2 using palette #1, no flipping and low priority". Here is what the bits of this word mean:
Bit 15: pri | bits 14-13: cp1, cp0 | bit 12: vf | bit 11: hf | bits 10-0: pattern number
If bit 15 (pri) is set, the pattern is displayed with high priority. The two bits 13 and 14 (cp0 and cp1) choose the palette to use (so, as you can see, at most 4 palettes are available, shared by all planes). Bits 11 (hf) and 12 (vf) flip the pattern horizontally or vertically if set, and bits 0-10 specify the pattern number in the pattern space.
So when we want to use pattern #2 we have to set the lower byte to 0x02: 0x??02. And we want to use palette #1, so we have to set cp0, which means we set the upper byte to 0x20, giving us 0x2002.
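The same bit packing as a Python helper: it builds a plane entry from the fields described above. The function name and its keyword arguments are made up for illustration; the checks reproduce the words used in FillPlaneA and FillPlaneB.

```python
def name_table_entry(pattern, palette=0, hflip=False, vflip=False,
                     high_priority=False):
    """Pack one plane entry: pri | cp1 cp0 | vf | hf | 11-bit pattern number."""
    if not 0 <= pattern <= 0x7FF:
        raise ValueError("pattern number must fit in 11 bits")
    if not 0 <= palette <= 3:
        raise ValueError("there are only palettes 0-3")
    return (int(high_priority) << 15 | palette << 13 | int(vflip) << 12
            | int(hflip) << 11 | pattern)


print(hex(name_table_entry(2, palette=1)))  # 0x2002, as used in FillPlaneA
```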
We'd like to fill the entire visible area, and the screen is 28 cells high, which is why we set D1 to 27 in line 89. We chose a screen width of 40 cells, but that is only what appears on screen: internally a row of the plane is 64 cells wide (this assumes init.S sets the scroll plane size to 64x32 cells), which is why we set D2 to 63 in line 91.
Line 93 finally moves the content of D0 (0x2002) to the data port, and we've just drawn a pattern to plane A! A single pattern is not enough, so line 94 loops back to label 2 until all 64 cells of the row have been written, and line 96 loops back to label 1 until all 28 rows have been drawn.
98: move.l #0x40000003, %d0 | Start from the command word for 0xC000
99: moveq #13, %d1 | We want to draw in line 13...
100: mulu.w #64, %d1 | ... and a line is 64 cells wide...
101: addi.w #20, %d1 | ... and we want to draw in column 20
102: rol.l #1, %d1 | Equivalent to multiply by 2
103: swap %d1 | Swap words in D1
104: or.l %d1, %d0 | Add it to D0
106: move.l %d0, (%a5) | Point the VDP to line 13, column 20
107: | in plane A
108: move.w #0x0003, (%a4) | Display pattern #3 with palette #0
110: rts | Return to caller
Now that we've filled the plane we'd like to draw a single pattern in the middle of the (visible part of the) plane, which is at coordinate (20, 13). To do so we must point the VDP's data port at the correct point in VRAM. We first write the bit pattern that points the VDP to 0xC000 (plane A's start address) into D0 at line 98 and then modify this pattern to point to the correct offset in the plane.
In line 99 we move the value 13 into D1, because this is the row in which we'd like to draw. Because every row is 64 cells wide, we multiply D1 by 64 in line 100, which gives us the offset of the start of row 13. We then add 20 to D1 in line 101 and have the offset of cell (20, 13): 13 * 64 + 20 = 852. Well, not quite, because that number has to be multiplied by 2, since every entry in the plane is a word and thus 2 bytes long, giving 1704 (0x6A8) bytes. Shifting or rotating a register left by one bit (line 102) is equivalent to multiplying it by two (for the rotate this only holds because the top bit of D1 is 0), but it is much faster than a real multiply: on the M68000 a MULU takes at least 38 clock cycles, while rotating a data register by one bit takes 10.
So now we've got the correct offset, but we have to merge it into the bit pattern that we write to the VDP control port. To do so we first swap the two words of D1 in line 103 (we could also have rotated D1 by 16 bits, but that would take two instructions). We can then do a logical or, joining D1 and D0, which gives us the correct bit pattern in D0 in line 104. The or is enough here because the offset (0x6A8) only affects the lower 14 address bits, which sit in the upper word of the command; the two top address bits that select 0xC000 are already set in the lower word of 0x40000003.
We then move our calculated bit pattern into the VDP control port in line 106 to point the data port to the middle of the visible part of plane A. We could have precalculated the value and, instead of lines 98 - 106, just written move.l #0x46A80003, (%a5), but I wanted to show how to calculate the value dynamically.
Finally, in line 108, we write 0x0003 to the VDP data port to draw pattern #3 with palette #0, and then we return to the caller.
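All the command words in this source can be computed the same way. The Python sketch below assumes the usual layout of a VRAM write command (0x40000000, plus address bits 13-0 in bits 29-16 and address bits 15-14 in bits 1-0) and of a CRAM write command (0xC0000000 plus the address in bits 22-16, as in line 47). The helper names are made up. As a side effect it decodes the 0x50000003 from line 19: that command points at VRAM address 0xD000, so that is presumably where init.S put the horizontal scroll table.

```python
def vram_write_command(address):
    """Control-port long word pointing the data port at a VRAM address."""
    address &= 0xFFFF
    return 0x40000000 | ((address & 0x3FFF) << 16) | (address >> 14)


def cram_write_command(address):
    """Same idea with the CRAM write code; CRAM is only 128 bytes."""
    address &= 0x7F
    return 0xC0000000 | (address << 16)


PLANE_A = 0xC000   # plane A base used in this document
ROW_CELLS = 64     # cells per plane row (64x32 plane)


def cell_address(base, column, row):
    """Byte address of one plane entry; each entry is a 2-byte word."""
    return base + (row * ROW_CELLS + column) * 2


print(hex(vram_write_command(cell_address(PLANE_A, 20, 13))))  # 0x46a80003
```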
FillPlaneB
If you've understood what FillPlaneA does, there is no need to explain this subroutine in detail. It's just like FillPlaneA with only three differences: we're using plane B, which is at 0xE000 in VRAM; we draw pattern #1 with palette #0; and we don't calculate an offset and draw a single extra pattern at the end.
113: | FillPlaneB
116: move.l #0x60000003, (%a5) | Point data port to 0xE000 in VRAM,
117: | which is the start address of plane B
118: move.w #0x8F02, (%a5) | Set autoincrement (register 15) to 2
120: move.w #0x0001, %d0 | We'll use palette #0 and pattern #1,
121: | don't flip the pattern and set it to
122: | low priority.
123: moveq #27, %d1 | The screen is 28 cells high
125: 1: moveq #63, %d2 | One line is 64 cells wide
127: 2: move.w %d0, (%a4) | Move our pattern data into the plane
128: dbra %d2, 2b | Loop back to 2
130: dbra %d1, 1b | Loop back to 1
132: rts | Return to caller
ReadJoypad
Since we want plane A to scroll with the joypad, we must read its state. Unfortunately it's not enough to just read a single value, because the buttons are multiplexed. The joypad reports two different data sets (three for a six-button joypad, but we'll only support three-button joypads here to keep things simple). So we first have to tell the joypad "I'd like data set #1", read the data, tell it "now data set #2, please" and read the data again. To make things more convenient I'll combine both data sets so that we get the complete state in the lowest byte of register D7.
135: | Read joypad 1
137: | Returns the joypad values in the lowest byte of D7 (0 = pressed) with this layout:
138: | SACBRLDU (Start A C B Right Left Down Up)
141: move.l #0x00A10003, %a0 | Joypad 1 is at 0x00A10003
143: move.b #0x40, (%a0) | Set TH to high
144: nop | Wait for the bus to synchronize
145: move.b (%a0), %d7 | Read status into D7
147: andi.b #0x3F, %d7 | D7.b = 00CBRLDU
149: move.b #0x00, (%a0) | Set TH to low
150: nop | Wait for the bus to synchronize
151: move.b (%a0), %d0 | Read status into D0
152: | D0.b = ?0SA00DU
154: rol #2, %d0 | D0.b = SA00DU??
155: andi.b #0xC0, %d0 | D0.b = SA000000
156: or.b %d0, %d7 | D7.b = SACBRLDU
158: rts | Return to caller
In line 141 we move the memory address of joypad 1 into A0. Then we write 0x40 to the joypad in line 143, which sets the so-called TH line high. This tells the joypad "give me data set #1, please". I'm not sure whether the nop afterwards is necessary, but it doesn't hurt: it synchronizes the bus and gives the joypad time to react. We then read a byte from the joypad into D7 in line 145. It already contains all the information we want except for the Start and A buttons.
We clear the top two bits (6 and 7) of the lowest byte of D7 by ANDing it with 0x3F in line 147; the other bits are not affected.
Then, in line 149, we write 0x00 to the joypad, which sets TH low and tells the joypad "and now data set #2, please". After another nop for synchronizing the bus, we read another byte from the joypad into D0 in line 151. The only bits we're interested in are those for the Start and A buttons, at positions 5 and 4. So we rotate D0 two bits to the left in line 154, which moves them to positions 7 and 6. In line 155 we mask off all other bits, and in line 156 we combine D0 and D7 with a logical or, giving us a single byte that holds the state of all buttons.
Line 158 just returns to the caller.
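The bit juggling of lines 147 to 156 can be reproduced in a few lines of Python. This is a sketch with a made-up function name; it takes the two raw bytes read with TH high and TH low and assumes the three-button layout given in the comments of the listing. A plain left shift stands in for rol #2, which is fine here because the mask 0xC0 discards the bits a rotate would wrap around.

```python
def combine_joypad_reads(th_high, th_low):
    """Merge two 3-button pad reads into SACBRLDU (a bit is 0 = pressed).

    th_high: byte read with TH = 1, laid out ?1CBRLDU
    th_low:  byte read with TH = 0, laid out ?0SA00DU
    """
    d7 = th_high & 0x3F            # andi.b #0x3F  -> 00CBRLDU
    d0 = (th_low << 2) & 0xC0      # rol #2, andi.b #0xC0 -> SA000000
    return d7 | d0                 # or.b -> SACBRLDU


print(hex(combine_joypad_reads(0x7B, 0x33)))  # 0xfb: only Left is pressed
```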
WaitVsync
This is a very simple but important routine: it waits for the vertical retrace. The vertical retrace is when the electron beam of your TV set has reached the lower right corner and is then on its way back to the upper left corner. During this time a game should modify the planes and sprites to avoid flickering.
161: | WaitVsync
164: move.l (VarVsync), %d0 | Read value from VarVsync into D0
166: 1: move.l (VarVsync), %d1 | Read value from VarVsync into D1
167: cmp.l %d0, %d1 | Compare D0 and D1
168: beq 1b | If result is 0 the value has not been changed
169: | so jump back to 1
171: rts | Return to caller
We first read the long word at the memory address VarVsync (defined in variables.inc) into D0 in line 164. Then we do the same thing again in line 166, but read the long word into D1. Line 167 compares D0 and D1, and line 168 jumps back to label 1 if they are equal. This means we're looping until the value at VarVsync has changed, which happens when the M68000 calls the interrupt service routine InterruptVBlank in init.S in response to the VDP's vertical retrace interrupt.
So we simply loop until a retrace has changed VarVsync and then return to the caller.
For completeness, here are the data tables for our palettes.
175: | Data
178: .word 0x0000 | Color 0 is always transparent
179: .word 0x00EE | Yellow
180: .word 0x0E00 | Blue
181: .word 0x000E | Red
182: .word 0x000E | Red
183: .word 0x000E | Red
184: .word 0x000E | Red
185: .word 0x000E | Red
186: .word 0x000E | Red
187: .word 0x000E | Red
188: .word 0x000E | Red
189: .word 0x000E | Red
190: .word 0x000E | Red
191: .word 0x000E | Red
192: .word 0x000E | Red
193: .word 0x000E | Red
195: .word 0x0000 | Color 0 is always transparent
196: .word 0x0000 | Black
197: .word 0x0EEE | White
198: .word 0x000E | Red
199: .word 0x000E | Red
200: .word 0x000E | Red
201: .word 0x000E | Red
202: .word 0x000E | Red
203: .word 0x000E | Red
204: .word 0x000E | Red
205: .word 0x000E | Red
206: .word 0x000E | Red
207: .word 0x000E | Red
208: .word 0x000E | Red
209: .word 0x000E | Red
210: .word 0x000E | Red
And here are the patterns we're using.
213: .long 0xFFFFFFFF
214: .long 0xFFFFFFFF
215: .long 0xFFFFFFFF
216: .long 0xFFFFFFFF
217: .long 0xFFFFFFFF
218: .long 0xFFFFFFFF
219: .long 0xFFFFFFFF
220: .long 0xFFFFFFFF
222: .long 0x22111111
223: .long 0x12211111
224: .long 0x11221111
225: .long 0x11122111
226: .long 0x11112211
227: .long 0x11111221
228: .long 0x11111122
229: .long 0x21111112
231: .long 0x00000000
232: .long 0x00000000
233: .long 0x00000000
234: .long 0x00011000
235: .long 0x00011000
236: .long 0x00000000
237: .long 0x00000000
238: .long 0x00000000
240: .long 0x22222222
241: .long 0x21111112
242: .long 0x21222212
243: .long 0x21200212
244: .long 0x21200212
245: .long 0x21222212
246: .long 0x21111112
247: .long 0x22222222
I hope this little commented code gives beginners (like me) a little insight into how the VDP works. There are lots of things to experiment with here already. As I'm still a beginner at programming the MegaDrive myself, I've surely made some errors in this document. If you find any or have some questions, please mail me.