There's always tradeoffs to be made
Generally speaking code can have a small footprint, or be fast. Sometimes (but not as often as you'd want) you can get lucky and have both.
The level data in this case is one of those things. There are currently less than 8 types of tile on a 13x13 grid. This means that in theory we only need 3 bytes per block, and with the map being 169 tiles this would be 507 bits (64 bytes, rounded up). So what options do we have?
- 3 bits per tile - This would be the smallest representation, but as the start of each tile would start at an odd place in a byte and may even cross into the next byte along. This lands us straight into the roughest end of speed vs size territory as you'd need to find the start of the 3 bits, which may be one or 2 byte reads, shift them into the bottom 3 bits of a byte, maybe swapping and combining part-bytes. We also have the overhead of logic to figure out if these special cases happen, which takes time, and then we can use them. This is the worst case scenario.
So... we're pressed for both memory
and time, but we're going to have to choose one. Let's look at the wastage options, shall we?
- 4 bits per tile - Not quite as small in terms of representation at 676 bits (85 bytes), but we can look at it as expansion room for future tile types. Tiles still need expanding from their positions within the byte, but they alternate between the top and bottom half of each byte, with no crossings so it would be quicker and simpler at least.
- 8 bits per tile - This brings us up to the full 169 bytes for a level description. As far as reading out the data is concerned we would be wasting half of each byte for storing the tile number, but no unpacking would be needed. We can even turn the negatives into a positive and use the top half of the byte to hold other information about the tile (such as collision information) so you get everything in a single read and you can just AND-mask off the bits you don't need for drawing. Even random access is easy - just look up MAP_START_ADDRESS+X+(Y*13) and you'll get the byte. We're sorted!
oh. wait... Not quite that simple.
Y*13 now becomes our enemy. In the relative comfort of C++ it's easily done. On a modern cpu, or even a 16-bit 68000 or 32-bit ARM you can just use a MUL instruction, but on a machine with no multiply how do you even do this?
You could just have a lookup table with 13*0,13*1...13*12 in it, but in 6502 we only have 3 registers, 2 can be used for indexing into lists like this but they're probably busy doing other things anyway. The other way is to use shifts. Even if you don't code you've probably done this subconsciously by multiplying by 10,100,1000 or so by just moving everything left and adding zeroes onto the end. Binary works the same only shifting left multiples by 2,4,8,16,32... depending how far you go.
What about values that aren't nice binary values, such as 13, though? Well - you just find the binary shift values that make up the add number and add the result of doing those individual shifts together:
13 = 8 (shift left 3) + 4 (shift left 2) + 1 (original value). So we can do those 2 shifts, add them together. Oh - and do a bunch of swapping and putting stuff in and out of memory because as I'm sure I mentioned before
we only have 3 registers... my head aches just thinking about it. This brings us to one further option:
- 8 bits per tile, but with each line padded out to 16 bytes. We're now up to 208 bytes per level map, but we can find any given tile with (Y shifted left 4) + X. Fairly quick and decent. Even better is that if we have a pixel position on screen each tile is 16 pixels high, so to turn that into a block position we'd have to divide the Y position by 16, then multiply it by 16 again. Those two cancel out and can be replaced by just clearing the bottom half of the Y position by ANDing it with 0x0F. Even quicker.
The two-way split
Generally as I think about game code, whether it's 64k of assembler running at 1Mhz or thousands of lines of C++ in xcode I mentally divide code into 2 camps: gametime and non-gametime.
- Gametime is basically when you're playing, or an animation is running and you really need everything to by silky-smooth and as fast as it can go. If something happens in gametime it needs to favour speed. If it can not happen in gametime that would be even better,
- Non-gametime. This covers loading time, static screens being shown, and basically any time where responsiveness is less of an issue. Anything that happens during this time can take a size-over-speed bias.
We also need to have the game data whilst playing in RAM, as the level might change as we're playing it due to things being destroyed or moved. So this means we can choose 2 of the above options - one for storing the levels that aren't being played (in the cartridge ROM in this case), and another for what is called the 'inplay buffer' in the code, which is a working copy of the level in the RAM.
Currently the inplay buffer uses the 16x13 layout, as it is almost exclusively used during gametime so speed is of the essence however I've yet to decide on the ROM layout format. I'm tending towards the 4 bits per tile layout. This would mean stripping the collision information out of each tile byte, but I can add that back in when I unpack the level to the RAM. It takes a little time, but this is non-gametime so I'm okay with that. It also assumes that each tile type would always have the same collision information but consistency is good in gameplay so I can live with that too.
This means in a single 16K cartridge bank I can hold 192 screen definitions, which is overkill and doesn't factor in other level definition info, such as enemy placements, timers or other per-level adjustments I might come up with as I go along.
Not so important on the NES, but if I backport to other machines later on then this might gain more importance.