Re-hacking the 80’s-part 8

Back from long ago


What, why?

animation of in-game pirate

This is part 8 in a blog series where I retrace my short Commodore 64 (C64) hacking career from 30 years ago and try to improve upon it.
The previous part was put up 20 months ago! What happened? Mostly the Forum64 Protovision Game competition 2017 (F64PGC).

I had an idea for an intro to add to the cracked game: an intro where the screen is built by only changing the border color. While working on it I realised it could be used for making a game. It was hard work and the motivation wasn’t always there, but that all changed when the F64PGC came along. The theme was ‘sports’, which fitted the game I was making. This gave me the motivation to finish it.
But all the while the Willow Pattern Rehack Project stalled. I still wanted to use the same technique for the intro, but I didn’t want to ruin the novelty (1) of my game by blogging about the techniques before the game was released.

The game: Kung Fu Pixel

Download link

The game didn’t get very good reviews and didn’t fare well in the competition (although this nice German Let’s Play video of the compo entries doesn’t sound negative (3)), but I’m still pleased with the techniques I used.

It’s basically ‘racing the beam’: changing the border color value while the video chip is drawing the screen. If you time it right, you can draw lines of a single color that are 32 C64 pixels long. So the maximum horizontal resolution is about 400/32 = 12.5, which gives 12 pixels (the last pixel is a bit wider).

The vertical resolution became 21 pixels for several reasons.

  • It makes the screen resolution the same as a C64 multicolor sprite.
  • I had to use zero page to make the code fast enough. I use one zero page location per pixel, which means I can use 254 pixels at most (locations 0 and 1 are not usable). The largest multiple of 12 that fits is 252, so that is the maximum: 21 rows of 12 pixels.
  • Every drawn line has to alternate with a blank line during which the screen routine is rewritten. Rewriting a line has to take the same or preferably less time than drawing it. I could only make that work by writing several lines for every pixel. Making a ‘pixel’ out of more lines would mean rewriting would take longer. Realistically (meaning having enough time left to do anything for a game) that would make it almost impossible.
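The numbers in the list above can be checked with a bit of arithmetic. The sketch below uses Python purely as a calculator, not as game code. The figures of 8 pixels per CPU cycle and 4 cycles per absolute store are general C64 timing facts rather than something stated in this post, and the visible width of roughly 400 pixels is the estimate used in the text.

```python
# Back-of-the-envelope check of the 12 x 21 resolution (a sketch, not game code).
PIXELS_PER_CYCLE = 8        # the VIC-II outputs 8 pixels per CPU cycle
STORE_CYCLES = 4            # STA/STX/STY/SAX with an absolute address

VISIBLE_WIDTH = 400         # approximate visible width, as estimated in the text
ZERO_PAGE_FREE = 256 - 2    # locations 0 and 1 are not usable
SCREEN_LINES_USED = 252     # screen lines used by the game, from the text

pixel_width = PIXELS_PER_CYCLE * STORE_CYCLES      # 32 C64 pixels per store
columns = VISIBLE_WIDTH // pixel_width             # 12 pixels per row
rows = ZERO_PAGE_FREE // columns                   # 21 rows
zero_page_used = rows * columns                    # 252 zero page locations
lines_per_row = SCREEN_LINES_USED // rows          # 12 screen lines per 'pixel'
drawn_lines_per_row = lines_per_row // 2           # 6 drawn, 6 blank

print(pixel_width, columns, rows, zero_page_used, drawn_lines_per_row)
```

The last two values tie in with the alternating scheme: each game pixel spans 12 screen lines, half of them drawn and half of them blank.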

Rewriting

Usually the GPU takes care of drawing the screen; the main program ‘just’ has to point the GPU to the right part of memory. But in my game the processor is constantly telling the GPU what to do (‘RED! RED! BLUE! BLUE! RED! LIGHT-GREY!’). Those instructions have to be rewritten if you want changes on screen, but with 252 out of 304 screen lines in use there is nowhere near enough time to rewrite all those instructions every frame.
My solution to that problem was rewriting during every even line. The resulting line would be blank, but I thought that would still result in something that could pass for graphics (opinions differ on that).

The resulting drawing (pseudo) code would look something like this:

repeat 6x {   // height of 1 pixel: every drawn line is followed by a blank 'rewrite' line
  drawline:
    // the y register contains the background color during all of screen drawing and is never changed
    lda color1_for_row
    ldx color2_for_row
    repeat 12x {
      sta/stx/sty/sax $d020   // one of these four per pixel; this is the part rewritten every frame
    }
    sty $d020                 // set the border color to the blank line color

  rewrite:
    repeat 2x {
      lda zeropage_location__current_pixel
      calculate_location_to_be_rewritten()
      repeat 6x {
        sta location_to_be_rewritten
      }
    }
    // there is now enough time to rewrite 2 lines of a 6-line pixel
    calculate_part_pixel()
    calculate_part_pixel_location_to_be_rewritten()
    lda zeropage_part_pixel
    sta part_pixel_part_1
    sta part_pixel_part_2
}

The drawing routine is hard coded (speed code: no looping, to optimize execution speed). I would never have been able to do this by hand, so I used macros and other tools to create the code for me.
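To give an idea of what generating speed code looks like, here is a minimal sketch in Python that emits the unrolled stores for one drawn line. It is not the tooling used for the game. The function name and the mapping from pixel value to store instruction are assumptions for illustration, based on the register use in the pseudocode (A and X hold the two row colors, Y holds the background color, and SAX stores A AND X).

```python
# Hypothetical generator for one unrolled draw line (names are illustrative).
# Assumed register use, following the pseudocode: A = color 1, X = color 2,
# Y = background color, and SAX stores (A AND X) as a fourth color.
STORE_FOR_PIXEL = {0: "sty", 1: "sta", 2: "stx", 3: "sax"}

def draw_line(pixels):
    """Return assembly text for one screen line of 12 two-bit pixel values."""
    if len(pixels) != 12:
        raise ValueError("a row is 12 pixels wide")
    lines = ["    %s $d020" % STORE_FOR_PIXEL[p] for p in pixels]
    lines.append("    sty $d020    // back to the blank-line color")
    return "\n".join(lines)

print(draw_line([0, 0, 1, 1, 2, 2, 3, 3, 1, 1, 0, 0]))
```

Because every store has the same length and timing, changing a pixel on screen only means overwriting one opcode byte in the generated code, which is what the rewrite step does.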

I ‘wrote’ the base of the screen drawing routine above in my head and on paper during a train journey, which was one of the most satisfying programming experiences I have ever had. After several tweaks and trimmings, the rewriting part had become so efficient that it took less time to rewrite a line than to draw one.

Sounds good?

With 18 screen lines’ worth of processing time left, I decided to put the music routine ‘in those lines’.
The rationale for that decision seemed good: I knew that music was something that could be playing during all phases of the game (intro, menu, pre-game, game, post-game), so it could be firmly wedged into the rest of the code without having to be changed. Just change some pointers to change the tune, right?
But the routine would have to be split into 18 parts that would each always take exactly 70 cycles. And it would have to be flexible enough to handle all kinds of musical events, like playing modular parts, tremolo, changes of ‘instruments’, pulse sweep etc.
It gave me a lot of headaches, and without the C64 Debugger I would never have managed it.

The resulting code consisted of three mostly equal pieces of code for every voice. Every voice could do these things:

  • check time till next event and optionally set up new event
  • handle slide or vibrato event
  • handle new note event or no-note event
  • handle note-off tied-note and vibrato event (2 parts)
  • handle new module event (possibly got first module of song) (2 parts)
  • handle new instrument event (2 parts)
  • do cleanup after handling event
  • execute vibrato
  • execute pulse vibrato

Example code (I used a macro to repeat the code for every voice):

//==============================================
//				VIBRATO EXECUTE
//==============================================
	//this part takes care of vibrato or slide
	//this routine is always done, but with values of zero if there is no vibrato or slide
//  dovibrato:		
.macro vibrato(){
		
		ldx vibrcount + voice	//4
		//set code to adc/clc  or sbc/sec depending on note going up or down
dv4:	        lda vibcodetable1,x		//5 !! SHOULD NOT CROSS PAGE BOUNDARY !!
		sta dv2				//4
		sta dv3				//4
dv5:	        lda vibcodetable2,x		//5 !! SHOULD NOT CROSS PAGE BOUNDARY !!
		sta dv1				//4
		lda freqlo + voice		//4
dv1:	        clc				//2
dv2:	        adc #0				//2  
		sta $d400 + [voice*7]	        //4
		sta freqlo + voice 		//4
		lda freqhi +voice		//4
	
dv3:	        adc #0			        //2		
		sta freqhi +voice		//4
		sta $d401  +[voice*7]	        //4
// replacing one or more of these INX instructions with NOP changes the speed of the vibrato or slide;
// replacing all of them with NOP will 'lock' the routine to the current ADC or SBC value,
// thereby basically creating a slide
		inx						//2
		inx						//2    replace this inx with nop to make vibrato at half speed
		inx						//2    if this is a nop, switch it back to inx for faster vibrato
		stx vibrcount + voice  	                        //4
		nop						//2
}

I put the number of cycles in the comment on every line, to see whether I would end up with exactly the right number of cycles.
There were also restrictions on the data being read, because some instructions take a clock cycle longer if the address crosses a page boundary (a page is 256 bytes), which would screw up the absolute synchronicity I needed.
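Both checks are easy to automate. The following Python sketch is not part of the original tooling: it adds up cycle counts written as trailing `//N` comments and tests whether a table straddles a page boundary. The function names, the tiny listing and the addresses are made up for illustration.

```python
import re

def total_cycles(listing):
    """Sum the cycle counts written as a trailing '//N' comment on each line."""
    return sum(int(n) for n in re.findall(r"//\s*(\d+)\s*$", listing, flags=re.M))

def crosses_page(start, length):
    """True if a table of `length` bytes starting at `start` spans two pages."""
    return (start >> 8) != ((start + length - 1) >> 8)

# A made-up four-instruction listing, annotated in the same style as the macro.
example = """
    ldx counter      //4
    lda table,x      //4
    sta $d400        //4
    inx              //2
"""

print(total_cycles(example))          # cycles in the example listing
print(crosses_page(0x10F0, 0x20))     # this table would run into the next page
print(crosses_page(0x1000, 0x100))    # a page-aligned 256-byte table is safe
```

Running a check like this after every change is cheaper than discovering in the debugger that a part has drifted by one cycle.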

I used self modifying code to be able to make the same code do different things, like using the same part for slide (up and down) and vibrato and make the amount variable.

This made it possible to have one piece of code, the vibrato macro shown above, executed every frame, which would perform vibrato, a slide or neither.
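The effect of patching the opcodes in the vibrato macro can be modelled in a few lines of Python. This is a sketch of the arithmetic only, not of the player: it assumes the operand of the first ADC/SBC holds the amount per frame (set elsewhere in the real code), and the function names are illustrative. Patching in CLC/ADC moves the 16-bit frequency up, while SEC/SBC moves it down.

```python
def adc(a, m, carry):
    """8-bit add with carry, as the 6502 ADC does in binary mode."""
    total = a + m + carry
    return total & 0xFF, 1 if total > 0xFF else 0

def sbc(a, m, carry):
    """6502 SBC computes A - M - (1 - C), which equals ADC with M inverted."""
    return adc(a, m ^ 0xFF, carry)

def vibrato_step(freq_lo, freq_hi, amount, going_up):
    """One frame of the patched routine; returns the new (freq_lo, freq_hi)."""
    if going_up:
        lo, carry = adc(freq_lo, amount, 0)   # clc / adc #amount
        hi, _ = adc(freq_hi, 0, carry)        # adc #0 picks up the carry
    else:
        lo, carry = sbc(freq_lo, amount, 1)   # sec / sbc #amount
        hi, _ = sbc(freq_hi, 0, carry)        # sbc #0 picks up the borrow
    return lo, hi

up = vibrato_step(0xF0, 0x10, 0x20, going_up=True)
down = vibrato_step(up[0], up[1], 0x20, going_up=False)
print(up, down)
```

With an amount of zero both variants leave the frequency untouched, which matches the remark in the macro that the routine always runs, with values of zero when there is no vibrato or slide.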

The music player was the first player I have ever written, and it’s neither very pretty nor very good, but it is the part that took me the longest to write.

Gameplay

I chose a fighting game, not because I like that type of game, but because it was the type of game I thought I would be able to pull off in a 12×21 pixel resolution.
I suck at that type of game, and my usual strategy is spamming the move I find to be the most successful.
I wanted the game to prevent that kind of playing by putting some more strategy into it.

Firstly I wanted to put energy into the equation. Spamming attacking moves would deplete your energy, which you need for attacks and jumps.
Secondly I added defensive moves that would not only block an attack but with the right defensive move could freeze the opponent thereby giving a window of opportunity for a counter attack.
Thirdly I wanted to prevent very defensive playing by adding penalties when not having attacked for a certain amount of time.

AI

The competition had the requirement to have a one player option, so I had to include some kind of AI.

The easiest way is a randomly acting AI. Actually it didn’t even work that badly (maybe even better than the end result), but I wanted the computer to be a bit smarter.
That was quite a challenge, considering I have hardly any experience in writing an AI and had to do it for a C64 game with limited processing time.
I do like the system I came up with (5): if the computer is not executing a move, it will choose a random move from a pool of possible moves. Depending on the circumstances some moves would have more or fewer occurrences in the pool, thereby changing the chance of that move being executed. Filling the pool with possible moves would be done in one frame, and choosing and executing a move in the next, to minimize the time needed for this system.

It worked by having a number (starting at zero) for every possible behavior (attack, defend, retreat, advance) and then adding something to that number depending on circumstances. Adding 128 to that number would make it negative (4), which would veto that behavior, making it easy to rule it out (for instance ruling out attacking when energy is zero). Any other circumstance determined later on could add to that number, but it would remain negative and so was still ruled out.
Then the pool would be filled with occurrences based on the number (if not negative).
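As a minimal sketch of that idea, here is the pooling in Python instead of assembly. The behavior names come from the description above. The helper names and the weights in the usage part are assumptions for illustration, not the values used in the game.

```python
import random

VETO = 0x80  # bit 7 set: the 6502 treats the byte as negative

def add_weight(count, amount=1):
    """Model 'inc': add to an 8-bit counter (wraps at 256 like the CPU)."""
    return (count + amount) & 0xFF

def build_pool(counts):
    """One pool entry per point of weight; vetoed (negative) behaviors get none."""
    pool = []
    for behavior, count in counts.items():
        if count & VETO:
            continue
        pool.extend([behavior] * count)
    return pool

counts = {"attack": 0, "defend": 0, "retreat": 0, "advance": 0}
counts["attack"] = VETO                             # e.g. no energy left: veto attacking
counts["attack"] = add_weight(counts["attack"])     # later additions cannot undo the veto
counts["advance"] = add_weight(counts["advance"], 2)
counts["retreat"] = add_weight(counts["retreat"])

pool = build_pool(counts)
move = random.choice(pool)
print(pool, move)
```

Here advancing is twice as likely as retreating, and attacking can never be picked, no matter how many later rules add to its counter (short of 128 additions, which would wrap the byte).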

Example code of part of the ‘behavior pooling’:

//case: further than one pixel away		
furtherThanOnePix:		
 		mov #$80:attackBehaviorCount	//too far away for attack or defend
 		mov #$80:advanceAndAttackBehaviorCount
		mov #$80:defendBehaviorCount
		lda pos1
		bne !+	
		mov #$80:retreatBehaviorCount	//backup not possible
		lda passivityCount1		
		cmp passivityCount2		
		blt	!+	
		mov #$80:passiveBehaviorCount
		mov #$80:retreatBehaviorCount
!:		inc passiveBehaviorCount
		inc retreatBehaviorCount
		//todo don't advance if player 2 is already attacking
		//on higher levels: don't advance if penalty counter is lower
		inc advanceBehaviorCount				
		inc advanceBehaviorCount				
		lda dist				
		cmp #1				
		bne !+				
		inc advanceAndAttackBehaviorCount					
		inc advanceAndAttackBehaviorCount						
!:		jmp done				

(I used a feature of the compiler that makes it possible to replace code like ‘lda #value, sta address’ with ‘mov #value:address’.)

There was one small additional trick I used when the AI would defend: it would predict your attack based on a list of your previous attacks. The game keeps track of the last x attacks, randomly chooses one from that list and then reads the optimum defense from a table.
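A rough Python sketch of that prediction trick follows. The history length, the move names and the defense table are all hypothetical, since the post gives neither the value of x nor the real table.

```python
import random
from collections import deque

HISTORY_LENGTH = 8   # the 'x' in the text; the real value is not given

# Hypothetical attack and defense names; the game's actual table is not shown.
BEST_DEFENSE = {"high_kick": "duck", "low_kick": "jump", "punch": "block"}

recent_attacks = deque(maxlen=HISTORY_LENGTH)   # oldest entries drop off

def record_attack(attack):
    """Remember the player's attack for later predictions."""
    recent_attacks.append(attack)

def predict_defense(rng=random):
    """Pick a remembered attack at random and return the defense against it."""
    if not recent_attacks:
        return None
    guess = rng.choice(list(recent_attacks))
    return BEST_DEFENSE[guess]

for attack in ["punch", "punch", "low_kick", "punch"]:
    record_attack(attack)
print(predict_defense())
```

Because the choice is made from the raw history, an attack the player uses often is proportionally more likely to be predicted, which punishes spamming one move without making the computer fully predictable.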

The competition rules demanded a ‘cheat version’ so every judge could play from beginning till the end.
My AI system made that quite easy; the code beneath prevents any attacking from the enemy based on a compile-time variable ‘cheat’:

.var doCheat = cmdLineVars.get("cheat")	
	.if(doCheat == "true") {
		lda #$80
	}else {
		lda #$0
	}
		sta attackBehaviorCount
		sta advanceAndAttackBehaviorCount

Finally I added levels so you could advance in a one player game. I wasn’t able to make the AI more intelligent, but with every level increase the AI became faster and recovered energy more quickly.
Also, you had to win within a set amount of time, which became shorter with every level increase.

I thought the gameplay and AI ideas were quite good, but I don’t think I ever got the settings just right for all those ideas to really work. It wasn’t helped by the lack of play testing (especially with two players, preferably not including myself).
Also, playing on the emulator is not a good indication of how it plays on real hardware (getting diagonal movements on real joysticks is a lot harder than using a numpad).


OK, enough about the game; back to the Willow Pattern Rehack Project.

The intro

As I mentioned before: the technique of the intro is based on the same technique as the game, but with some differences:

  • It’s 294 screenlines high instead of 252, which makes it use all of the screen (actually a bit more).
  • It doesn’t use 252 zero page addresses, but a lot less.
  • The re-writing routine is more efficient.
  • It converts sprite data to screen data every frame, during rewriting.
  • There are no artifacts: no visible changing of the screen while it’s being written.
  • The music routine is not done during the writing or rewriting of the screen routine.

Almost all of those things are improvements, but the downside is that there is virtually no processing time (2) left. Any music routine must run in that time, which rules out any standard routine, like GoatTracker or SID-Wizard.

That is the reason the intro is not finished yet. I now have integrated the inverted version of the Willow Pattern music because that uses very little raster time, but it is very limited (no changes of sound, single length notes, no patterns). Maybe the one-raster music player is an option, although it looks like a complicated thing to write for.
I also need to think about what I want to say in the intro, but that’s not very important now.

But with a nice sprite like the whale from ‘Killerwatt’ it sure looks nice (I kept part of the loading screen in the gif for scale).

The code of the current intro is added to git, including a compiled version.
The code doesn’t compile in its current state, because stuff is missing, and the compiled version will crash after clicking the joystick button. But in time this will be fixed.
The compiled version also has all sprites of the game Wizball added.

What next?

Considering the previous 20 months I shouldn’t promise anything (and there is a 16K compo coming up to steer me off the path), but this post is a restart and already I’m getting reacquainted with the code.
I want to write a bit about combining parts with a packer/relocator, adding trainer parts, maybe add some more to the trainer selection part, maybe do a live reenactment of the cracking all those years ago.


(1): it’s pretty hard to be really novel in making anything on the C64, especially technique-wise. Building a screen by only changing the border color has been done before, and making games that use all of the screen, including the border, has also been done before.
But as far as I know, making a game by only changing the border color has never been done before.

(2): processing time is usually called ‘raster time’ in the C64 scene: it is measured in the number of raster lines the VIC-II (GPU) draws in that time.

(3) a note on the video: The games were played using an emulator. The standard (version 3+) VICE emulator has quite pale color settings, which I feel don’t really reflect how it looks on real hardware. Especially on a SX64 it looks a lot more colorful.
On VICE you can choose different color profiles. Colodore (PAL) resembles the SX64 colors the most, I feel.
Some people would prefer to see the Grey Pixel Bug as well, which makes the pixels stand out more. You would need a C64C PAL for that or run the x64sc variant of VICE.

The game on a VICE emulator with Colodore colors and running x64sc.

(4) on a 6502/6510 CPU numbers are considered ‘negative’ if they are in the range of 128-255, which means bit 7 is 1. The CPU has simple instructions to determine if a number is negative.

(5) In fact I like the solution so much I might use it in other games I’ll make.
