Fading Stars

12 Jan 2021


A demo showing the combination of the star field and the fade effect I used in PICO Space.

[UPDATE 2021-4-12: 8bit and 16bit cached modes (explanation below and in code), some other small tweaks]

[UPDATE 2021-5-6: added interleaved 8bit mode]

Stars

The stars are just simple particles that have x,y,z coordinates.

In this demo I use a couple of sin functions to give them some movement, combined with a division by the z coordinate for a bit of parallax. In the game, I feed in the player's position instead.

Then I wrap the resulting x,y values onto the screen with the modulus operator (%128) so they're always visible. It does mean that the same stars go past constantly, but otherwise I was processing a lot of particles that are rarely seen (I'm not aiming for realism here).
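For anyone who wants to poke at the maths outside PICO-8, here is a minimal Python model of that projection. The helper names `pico_sin` and `project_star` are hypothetical; the constants (1280, 2560, 550, 710) come from the demo's `_draw`. It assumes PICO-8's `sin()` convention, where the argument is in turns and the result is inverted relative to the usual sine.

```python
import math


def pico_sin(turns):
    """PICO-8 style sin(): input in turns (1.0 = full circle), output inverted."""
    return -math.sin(2 * math.pi * turns)


def project_star(x, y, z, t):
    """Screen position of a star at frame t (hypothetical helper).

    Two slow sine waves stand in for camera movement, dividing by z
    gives parallax (small z = near = moves faster), and % 128 wraps the
    result so every star always lands somewhere on the 128x128 screen.
    """
    snx = pico_sin(t / 1280) * 550
    sny = pico_sin(t / 2560) * 710
    return (x - snx) / z % 128, (y - sny) / z % 128


# a near star and a far star with the same world position
print(project_star(200, 300, 1, 0))  # (72.0, 44.0)
print(project_star(200, 300, 2, 0))  # (100.0, 22.0)
```

Wrapping rather than clamping is what keeps every particle on screen: a star that drifts off the right edge reappears on the left.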

Fade Effect

This works by mapping the colour of every pixel on the screen to another colour that tends towards a target, e.g. 1 (dark blue) -> 0 (black).

You can use a similar mapping with the pal(x,1) function (which changes the screen palette) to fade to black between screens etc., but that fades everything, including anything else drawn that frame.

In this demo I process the pixels already in screen memory so that the screen is faded by a step, then draw fresh stars on top of that.
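Modelled in Python, one fade step over screen memory looks roughly like this. It is a sketch, not the cart's code: the screen is a `bytearray` standing in for PICO-8's 8K screen at 0x6000 (128 rows of 64 bytes, two pixels per byte, low nibble = left pixel), and `DARKEN` is the demo's first palette map written out as a plain list.

```python
# The demo's first map: index = current colour, value = next (darker) colour.
DARKEN = [0, 0, 1, 1, 2, 1, 13, 10, 2, 4, 9, 3, 15, 5, 4, 1]

ROWS, ROW_BYTES = 128, 64  # PICO-8 screen: 128 rows, 2 pixels per byte


def fade_byte(b, m=DARKEN):
    """Map both 4-bit pixels stored in one screen byte."""
    return m[b >> 4] << 4 | m[b & 0x0F]


def fade_step(screen, m=DARKEN):
    """Fade every pixel already on screen by one step, in place."""
    for i, b in enumerate(screen):
        screen[i] = fade_byte(b, m)


screen = bytearray(ROWS * ROW_BYTES)  # all black
screen[0] = 0x77                      # two white pixels in the top-left corner
fade_step(screen)                     # 7 (white) -> 10 (yellow) in this map
# ...fresh stars would then be drawn on top before the next fade_step
```

Because every colour in the map eventually leads to 0, repeating the step enough times always ends on a black screen.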

It's pretty expensive to do the whole screen like this (IIRC about 90% of the CPU at 60fps), so I've set it up to do every fourth scan line, starting from a different line each frame. Effectively a quarter of the screen is faded per frame, and it takes 4 frames to fade the whole screen by one step.

I initially tried fading in quarter strips top to bottom, but the tearing on bigger objects like planets looked pretty bad.

Using the order 0,2,1,3 for the scanlines does some rough dithering to make the effect look a bit more uniform. A random value flr(rnd(4)) works quite well too, but is messier looking.
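The line scheduling is easy to check in isolation. Here is a Python sketch (function names are hypothetical) of which scanlines get faded on frame p, using the 0,2,1,3 order described above, alongside the random alternative.

```python
import random

ROW_ORDER = [0, 2, 1, 3]  # scanline order used by scr_fade


def rows_for_frame(p, rows=128):
    """Scanlines faded on frame p: every 4th line, starting at ROW_ORDER[p % 4]."""
    return range(ROW_ORDER[p % 4], rows, 4)


def rows_random(rows=128):
    """The messier alternative: pick the starting line at random each frame."""
    return range(random.randrange(4), rows, 4)


for p in range(4):
    print(p, list(rows_for_frame(p))[:4])
# 0 [0, 4, 8, 12]
# 1 [2, 6, 10, 14]
# 2 [1, 5, 9, 13]
# 3 [3, 7, 11, 15]
```

Any four consecutive frames cover all 128 lines exactly once. The random version only guarantees that on average.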

Since I found that using poke4 to work on 8 pixels at a time was fastest (not surprising really), dithering horizontally is limited and isn't in the demo. Nevertheless, I keep meaning to try a "Z" pattern, i.e.

0000000011111111
2222222233333333

I'm concerned it might cost too much more in performance/tokens for too little visual improvement.

[edit]

Of course, as soon as I write about it the old subconscious starts working away and it takes 5 minutes to implement just that - a reverse N pattern as it turns out. Same performance, same tokens. See the new cart.
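Here is a Python sketch of the byte offsets that the reverse N version (scr_fade_z in the code below) touches each frame. It models only the addressing, not the colour mapping, and the helper names are hypothetical.

```python
# Byte offsets used by scr_fade_z: start, next row, 4 bytes right, both.
N_OFFSETS = [0, 64, 4, 68]


def bytes_for_frame(p):
    """Screen bytes (relative to 0x6000) that scr_fade_z touches on frame p."""
    d = N_OFFSETS[p % 4]
    touched = []
    for pair in range(64):                             # every second row
        row_start = d + pair * 128                     # 128 bytes = 2 rows
        for a in range(row_start, row_start + 57, 8):  # every second 4-byte group
            touched.extend(range(a, a + 4))            # one poke4 = 4 bytes = 8 pixels
    return touched


def cell(offset):
    """(row, 8-pixel column group) of a byte offset, to visualise the pattern."""
    return offset // 64, offset % 64 // 4


print([cell(bytes_for_frame(p)[0]) for p in range(4)])
# [(0, 0), (1, 0), (0, 1), (1, 1)]  -> down, up-right, down: a reverse N
```

Each frame still touches exactly a quarter of the 8192 screen bytes, which is why the cost matches the plain scanline version.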

Pros of Effect

  • You can draw whatever you want and the effect essentially "just works" as a replacement for cls().
  • Cross-fading out from a scene just happens "for free".
  • Very simple particles look much more complicated than they really are.

Cons

  • You can't draw anything that moves without the fade effect "catching" it and leaving a trail. This can be mitigated by drawing around your objects (e.g. black borders), but if the view moves more than the width of the border you're out of luck. Conversely, the effect is only visible where you _don't_ draw that frame, so if your game has e.g. a full-screen scrolling background that's drawn every frame, you won't see any effect at all.
    For a space game this isn't a huge problem, but it's still visible here and there, e.g. if you fly over a large planet.
  • If nothing moves then there's no effect - try hacking the stars to be still in the demo.
  • Performance cost is approx 21% of the CPU at 60fps (fading a quarter of the screen per frame).
  • Obv costs some tokens.

Caching

The effect works fine by extracting each pixel's colour value via shifting and masking then dumping the mapped values back onto the screen, but it's still pretty performance heavy.
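For comparison with the cached versions below, here is the uncached shift-and-mask step in Python. It treats the 4 bytes as an unsigned 32-bit integer. The cart's Lua has to work on PICO-8's signed 16.16 fixed-point value instead, which is why it shifts the fractional half left before masking. The function name is hypothetical.

```python
DARKEN = [0, 0, 1, 1, 2, 1, 13, 10, 2, 4, 9, 3, 15, 5, 4, 1]


def fade_word(v, m=DARKEN):
    """Map all eight 4-bit pixels held in one 32-bit word (what peek4 reads).

    Each pixel is shifted down, masked to 4 bits, looked up and shifted
    back into place: eight shifts, masks and lookups per word.
    """
    out = 0
    for shift in range(0, 32, 4):
        out |= m[(v >> shift) & 0xF] << shift
    return out


print(hex(fade_word(0x77777777)))  # 0xaaaaaaaa
```

Those eight shift/mask/lookup operations per word are exactly the work the caching tables try to avoid.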

When I was writing PICO Space I'd read a few times that procedurally generated content used a lot of memory, so I didn't want to try anything like the following. Now that I have a much better idea of the game's memory requirements, I thought I'd give it a go.

8-bit Mapping

Each pixel on PICO-8's screen is a 4-bit value, but peek and poke work at 8-bit granularity at best, i.e. a pair of pixels or more at once. Each of my mappings contains 16 values, one for each possible pixel colour.

Considering pairs of pixels instead of single pixels, there are 16 * 16 = 256 possible combinations of colours that need to be mapped. Why not store a table with each of these values? It can't be that large, right?

Turns out it isn't, especially compared to the 2MB of memory Lua is given in PICO-8. In fact the demo seems to use only about 2K or so (still a lot more than the 256 bytes it should take, but pretty small).

This removes most of the masking and shifting from the inner loop, and it even takes fewer tokens. The performance improvement is enough that processing half, or even all, of the screen per frame isn't too bad.
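In Python the same idea maps neatly onto `bytearray.translate`, which is itself a 256-entry byte lookup table. A sketch with hypothetical names; `DARKEN` is the demo's first palette map:

```python
DARKEN = [0, 0, 1, 1, 2, 1, 13, 10, 2, 4, 9, 3, 15, 5, 4, 1]


def build_8bit_map(m=DARKEN):
    """Precompute the result for all 16 * 16 = 256 possible pixel pairs."""
    return bytes(m[b >> 4] << 4 | m[b & 0x0F] for b in range(256))


MAP8 = build_8bit_map()


def fade_rows_8bit(screen, rows, table=MAP8):
    """Fade whole 64-byte rows with one table lookup per byte."""
    for r in rows:
        start = r * 64
        screen[start:start + 64] = screen[start:start + 64].translate(table)


screen = bytearray([0x77] * 64 * 128)
fade_rows_8bit(screen, range(0, 128, 2))  # half the lines, like scr_fade_8bit_half
```

The table is built once per palette map. After that, each byte costs a single lookup with no shifting or masking.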

16-bit Mapping

The next step was obviously to try mapping 4 pixels at a time using 16-bit values.

This would need a table of 16^4 = 65536 entries, which isn't very big for a modern machine but is pushing it pretty far for PICO-8. It's possible (take a look at the code), but it takes up a lot more memory: about 1200KB, it seems. That's well over half of the total space available, and for my purposes in PICO Space it's enough to cause sporadic out-of-memory errors as things stand (PICO Space uses about 600-900KB depending on the size of the current galaxy and how much is going on in it at any particular moment). For other games it may be absolutely fine, and it's tempting: this technique gives about a 2x speed-up compared to my original implementation of the effect.
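Here is a Python sketch of the 16-bit table and of applying it to little-endian pairs of bytes, as peek2/poke2 see them. In the cart the keys are PICO-8's signed 16-bit values (hence its 0x8000..0x7fff loop). Here they are plain 0..65535. The names are hypothetical.

```python
DARKEN = [0, 0, 1, 1, 2, 1, 13, 10, 2, 4, 9, 3, 15, 5, 4, 1]


def build_16bit_map(m=DARKEN):
    """Precompute the result for all 16**4 = 65536 possible runs of 4 pixels."""
    return [
        m[v & 0xF] | m[v >> 4 & 0xF] << 4 | m[v >> 8 & 0xF] << 8 | m[v >> 12] << 12
        for v in range(1 << 16)
    ]


def fade_words_16bit(screen, start, count, table):
    """Fade `count` little-endian 16-bit words (4 pixels each) from byte `start`."""
    for a in range(start, start + 2 * count, 2):
        w = table[screen[a] | screen[a + 1] << 8]    # like peek2
        screen[a], screen[a + 1] = w & 0xFF, w >> 8  # like poke2


MAP16 = build_16bit_map()
row = bytearray([0x77] * 64)
fade_words_16bit(row, 0, 32, MAP16)  # one whole 64-byte row in 32 lookups
```

The result is identical to applying the 8-bit table to each byte. The gain comes purely from halving the number of reads and lookups, and the price is a table 256 times larger.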

A Bit Too Far

PICO-8's number format is 16bit.16bit fixed point so every value I've been storing so far is actually 32 bits in size whether I use all of those bits or not. Why not use them all?

Storing mappings for 8 pixels isn't going to work: 16^8 = 4,294,967,296 - a bit too much for PICO-8.

Instead, the last implementation that I've tried (so far) stores two 16-bit values in each number in the cache table, so the same number of mapping values as in the previous section takes half the entries and hence half the space. The upper (integer) 16 bits hold the values for odd keys; the lower (fractional) 16 bits hold the values for even keys.

This brings the memory usage down to about 600KB or so, which is fairly reasonable.

Unfortunately, the two mapping values packed into a single PICO-8 table value need to be unpacked before they can be used in the inner loop of the effect. Once the shifts and masks needed to do this are added, I couldn't get the performance to be any better than the original effect (without any caching), never mind faster than the other cached versions.
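Here is a Python sketch of the packed layout and of the unpacking it forces on every lookup. Python integers aren't limited to 32 bits, so this only illustrates the layout and the extra branch, shift and mask per lookup. It says nothing about PICO-8 memory use. The names are hypothetical.

```python
def pack_pairs(table16):
    """Halve the entry count: slot k holds keys 2k (low 16 bits) and 2k+1 (high 16 bits).

    This mirrors the layout built by update_16bp_map, where the even key
    ends up in the fractional half and the odd key in the integer half.
    """
    return [table16[k] | table16[k + 1] << 16 for k in range(0, len(table16), 2)]


def lookup_packed(packed, v):
    """The extra inner-loop work: choose a half of the slot, then shift/mask it out."""
    slot = packed[v >> 1]
    return slot >> 16 if v & 1 else slot & 0xFFFF


table16 = [(k * 7919) & 0xFFFF for k in range(1 << 16)]  # any 16-bit table will do
packed = pack_pairs(table16)
print(len(table16), len(packed))  # 65536 32768
```

The lookup is still a single table access, but the branch, shift and mask it adds are the same kind of work the cache was meant to remove.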

Yet Another Way

Up until this point I'd only considered making the effect faster, not "better". Each byte of screen memory holds two horizontally adjacent pixels, so one of the first compromises I made was to assume I couldn't fade those two pixels separately. As a result, fading the whole screen over four frames was done in chunks of at least two horizontally adjacent pixels at a time.

Since the 8bit cache version uses so little memory, is faster, and handles all combinations of two pixels that fade on the same iteration, it struck me that there wouldn't be much cost to keeping two 8bit caches, one that fades only the left pixel and one that fades only the right, and swapping which cache is used per frame. Combined with alternating which rows are processed, this allows a dither pattern that works on a block of 2x2 pixels, with no more horizontal chunking:

01
23
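Here is a Python sketch of the interleaved scheme, modelled on scr_fade_8bit_inter and its two caches g_8bit_map0 and g_8bit_map1. As in the cart, the row parity alternates every frame and the left/right cache swaps every two frames, so over four frames every pixel in each 2x2 block is faded exactly once. The bytearray screen and the names are assumptions for illustration.

```python
DARKEN = [0, 0, 1, 1, 2, 1, 13, 10, 2, 4, 9, 3, 15, 5, 4, 1]

# Two 256-entry caches: MAP0 fades only the left pixel (low nibble),
# MAP1 only the right pixel (high nibble).
MAP0 = bytes(b & 0xF0 | DARKEN[b & 0x0F] for b in range(256))
MAP1 = bytes(DARKEN[b >> 4] << 4 | b & 0x0F for b in range(256))


def fade_interleaved(screen, p):
    """Frame p: rows of parity p % 2; left pixels on frames 0-1, right on 2-3."""
    table = MAP0 if (p & 2) == 0 else MAP1
    for row in range(p % 2, 128, 2):
        start = row * 64
        screen[start:start + 64] = screen[start:start + 64].translate(table)


screen = bytearray([0x77] * 64 * 128)
fade_interleaved(screen, 0)
print(hex(screen[0]), hex(screen[64]))  # 0x7a 0x77
```

Both caches together are still only 512 bytes of data, so the finer dither costs almost nothing in memory.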

Code

-- fading stars
-- by drakeblue

function _init()
	pal(15,140,1) -- mid blue instead of light peach
	g_scpal_map={
	-- map of every colour to a darker colour
	{[0]=0,unpack(split'0,1,1,2,1,13,10,2,4,9,3,15,5,4,1')},
	
	-- could equally map to lighter colour to fade to white
	-- or redder colour etc.
	{[0]=1,unpack(split'15,8,11,8,13,7,7,9,10,7,10,6,6,7,12')},
	{[0]=2,unpack(split'2,8,5,8,4,9,10,8,8,9,10,13,2,8,13')},
	
	-- map white/black or whatever your "target" colour is to another colour
	-- and it gets a bit trippy
	{[0]=1,unpack(split'15,8,11,8,13,7,0,9,10,7,10,7,6,7,12')},
	{[0]=7,unpack(split'0,1,1,2,1,13,6,2,4,9,3,15,5,4,1')},
	}
	
	g_dith={[0]=0,2,1,3}
	
	-- generate some stars with 3d coords
	g_stars={}
	srand(1)
	for i=1,500 do
		add(g_stars,{x=rnd(4096),y=rnd(4096),z=rnd(30)+0.1,c=ceil(rnd(15))})
	end
	
	g_sys_p,g_show_ui=0,1
	
	g_fade_types={scr_fade,scr_fade_z,clear,blank,
	scr_fade_8bit,scr_fade_8bit_bytez,scr_fade_8bit_half,scr_fade_8bit_all,
	scr_fade_16bit,scr_fade_16bit_half,scr_fade_16bit_all,
	scr_fade_16bp,scr_fade_16bp_half,scr_fade_16bp_all,
	scr_fade_8bit_inter}
	g_fade_type_names={"scr_fade","scr_fade_z","cls","none",
	"scr_fade_8b","scr_fade_8b_bz","scr_fade_8b_half","scr_fade_8b_all",
	"scr_fade_16b","scr_fade_16b_half","scr_fade_16b_all",
	"scr_fade_16bp","scr_fade_16bp_half","scr_fade_16bp_all",
	"scr_fade_8bit_inter"}
	g_fade=15
	g_map=1
	
	init_maps()
end

-- does nothing
function blank() end
function clear() cls() end

---------------------------------------------------------
-- sets up a pre-computed mapping of all possible pairs
-- of pixels to mapped pixels so that a byte can be processed
-- at a time. Removes the need for masking values retrieved
-- from memory
-- takes up v little memory approx 2k
function update_8bit_map()
	g_8bit_map={}
	for i=0,255 do
		g_8bit_map[i]=g_scpal_map[g_map][i>>>4&0xf]*16+g_scpal_map[g_map][i&0xf]
	end
end

function update_8bit_map2()
	g_8bit_map0={}
	g_8bit_map1={}
	for i=0,255 do
		--	 g_8bit_map0[i]=g_scpal_map[g_map][i>>>4&0xf]*16+g_scpal_map[g_map][i&0xf]
		--	 g_8bit_map1[i]=g_scpal_map[g_map][i>>>4&0xf]*16+g_scpal_map[g_map][i&0xf]
		g_8bit_map0[i]=(i&0xf0)+g_scpal_map[g_map][i&0xf]
		g_8bit_map1[i]=g_scpal_map[g_map][i\16&0xf]*16+(i&0xf)
	end
end

---------------------------------------------------------
-- sets up a pre-computed mapping of all possible quadruplets
-- of pixels to mapped pixels so that 2 bytes can be processed
-- at a time. Removes the need for masking values retrieved
-- from memory
-- takes up a lot of memory approx 1200k
function update_16bit_map()
	g_16bit_map={}
	for i=0x8000,0x7fff do
		g_16bit_map[i]=g_scpal_map[g_map][i>>>8&0xf]*256+g_scpal_map[g_map][i>>>12&0xf]*4096+
		g_scpal_map[g_map][i>>>4&0xf]*16+g_scpal_map[g_map][i&0xf]
	end
end

------------------------------------------------------------------------------------------
-- prt with colour 0 (black) outline
function prt_out(s,x,y,c)
	print(s,x-1,y,0)
	print(s,x+1,y)
	print(s,x,y-1)
	print(s,x,y+1)
	return print(s,x,y,c)
end

function init_maps()
	if g_fade>14 then
		g_8bit_map=nil
		g_16bp_map=nil
		g_16bit_map=nil
		update_8bit_map2()
	elseif g_fade>11 then
		g_8bit_map=nil
		g_8bit_map0=nil
		g_8bit_map1=nil
		g_16bit_map=nil
		update_16bp_map()
	elseif g_fade>8 then
		g_16bp_map=nil
		g_8bit_map=nil
		g_8bit_map0=nil
		g_8bit_map1=nil
		update_16bit_map()
	elseif g_fade>4 then
		g_16bit_map=nil
		g_16bp_map=nil
		g_8bit_map0=nil
		g_8bit_map1=nil
		update_8bit_map()
	else
		g_8bit_map=nil
		g_8bit_map0=nil
		g_8bit_map1=nil
		g_16bit_map=nil
		g_16bp_map=nil
	end
end

---------------------------------------------------------
-- sets up a pre-computed mapping of all possible quadruplets
-- of pixels to mapped pixels so that 2 bytes can be processed
-- at a time. Removes the need for masking values retrieved
-- from memory, but packs values so needs to be unpacked again
-- takes up half the memory of prev: approx 600k
function update_16bp_map()
	g_16bp_map={}
	local val
	for i=0x8000,0x7fff do
		local pack=g_scpal_map[g_map][i>>>8&0xf]*256+g_scpal_map[g_map][i>>>12&0xf]*4096+
		g_scpal_map[g_map][i>>>4&0xf]*16+g_scpal_map[g_map][i&0xf]
		if i&1==0 then
			val=pack>>>16
		else
			g_16bp_map[i\2]=pack+val
		end
	end
end
function _update60()
end

function _draw()
	
	g_sys_p+=1 -- value to feed animation and fade function.
	-- in game, i use the player's position to transform
	-- the star's positions for drawing
	
	if btnp(🅾️) then
		g_fade=(g_fade%#g_fade_types)+1
		init_maps()
	end
	
	if btnp(❎) then
		g_map=(g_map%#g_scpal_map)+1
		init_maps()
	end
	
	g_fade_types[g_fade](g_sys_p)
	
	--scr_fade(flr(rnd(4))) -- fun too
	
	-- switch between single pixels exclusively and some crosses
	if btnp'1' then
		g_points=nil
	elseif btnp'0' then
		g_points=1
	end
	
	-- switch overlay on and off
	if btnp'2' then
		g_show_ui=1
	elseif btnp'3' then
		g_show_ui=nil
	end
	
	-- draw stars
	local snx,sny=sin(g_sys_p/1280)*550,sin(g_sys_p/2560)*710
	for i,s in pairs(g_stars) do
		if g_points then
			pset((s.x-snx)/s.z%128,(s.y-sny)/s.z%128,s.c)
		else
			circfill((s.x-snx)/s.z%128,(s.y-sny)/s.z%128,s.c%2,s.c)
		end
	end
	
	-- show some stats, current algorithm and mapping
	if g_show_ui then
		prt_out("mem:"..stat(0).." cpu:"..stat(1)..":"..stat(2),0,0,12)
		prt_out("🅾️change algo:"..g_fade_type_names[g_fade].."\n❎change palette map",1,116,12)
	end
end

-- fades a quarter of the lines on the screen at a time
-- scan line by scan line using mapping above.
-- which line is dictated by p.
-- takes quite a chunk of performance
-- even only doing a quarter of the screen at a time.
function scr_fade(p)
	local dith={[0]=0,2,1,3} -- try to mix up lines a bit
	
	-- local tables seem to be faster.
	-- change start line based on dith value
	local m,d=g_scpal_map[g_map],0x6000+(dith[p%4]<<6)
	
	-- for a quarter of the 128 lines on the screen
	for j=0,31 do
		local j8=j<<8 -- saves a token
		-- for every 4bytes of this line
		for a=d+j8,d+j8+60,4 do
			
			-- grab existing value
			local v=$(a)
			
			-- map every pixel's colour to another one
			-- shift and mask 4bit pixel in 32bit value to just 4bit value
			-- to allow look up in map then shift back
			-- need logical shift >>> since don't want to consider sign
			poke4(a,m[v&0xf]|m[(v>>>4)&0xf]<<4|m[(v>>>8)&0xf]<<8|m[(v>>>12)&0xf]<<12
			|m[(v<<16)&0xf]>>>16|m[(v<<12)&0xf]>>>12|m[(v<<8)&0xf]>>>8|m[(v<<4)&0xf]>>>4)
		end
	end
end

-- fades a quarter of the lines on the screen at a time
-- following a Z pattern
-- which line is dictated by p.
-- takes quite a chunk of performance
-- even only doing a quarter of the screen at a time.
function scr_fade_z(p)
	local dith={[0]=0,64,4,68} -- backwards N pattern actually
	
	-- local tables seem to be faster.
	-- change start line based on dith value
	local m,d=g_scpal_map[g_map],0x6000+(dith[p%4])
	
	-- for half of the 128 lines on the screen
	for j=0,63 do
		local j8=j<<7 -- saves a token
		-- for every second 4bytes of this line
		for a=d+j8,d+j8+56,8 do
			
			-- grab existing value
			local v=$a
			
			-- map every pixel's colour to another one
			-- shift and mask 4bit pixel in 32bit value to just 4bit value
			-- to allow look up in map then shift back
			-- need logical shift >>> since don't want to consider sign
			poke4(a,m[v&0xf]|m[(v>>>4)&0xf]<<4|m[(v>>>8)&0xf]<<8|m[(v>>>12)&0xf]<<12
			|m[(v<<16)&0xf]>>>16|m[(v<<12)&0xf]>>>12|m[(v<<8)&0xf]>>>8|m[(v<<4)&0xf]>>>4)
		end
	end
end

-------------------------------------------------------------
-- 8 bit
-- sacrifice a little bit (2-3k) of lua ram
-- for performance

-- fades a quarter of the lines on the screen at a time
-- following a Z pattern
-- which line is dictated by p.
-- uses precomputed table with pairs of values
-- takes quite a bit less performance because
-- there's no need for pixel swizzling
function scr_fade_8bit(p)
	local dith={[0]=0,64,4,68} -- backwards N pattern actually
	
	-- local tables seem to be faster.
	-- change start line based on dith value
	local m,d=g_8bit_map,0x6000+(dith[p%4])
	
	-- for half of the 128 lines on the screen
	for j=0,0x1f80,128 do
		-- for every second 4bytes of this line
		for a=d+j,d+j+56,8 do
			
			-- map every pair of pixels to a mapped pair
			-- 4 bytes at a time
			-- shorter and quicker
			poke(a,m[@a],m[@(a+1)],m[@(a+2)],m[@(a+3)])
		end
	end
end

-- fades a quarter of the lines on the screen at a time
-- following a Z pattern
-- which line is dictated by p.
-- uses precomputed table with pairs of values
-- takes quite a bit less performance because
-- there's no need for pixel swizzling
function scr_fade_8bit_bytez(p)
	local dith={[0]=0,64,1,65} -- backwards N pattern actually
	
	-- local tables seem to be faster.
	-- change start line based on dith value
	local m,d=g_8bit_map,0x6000+(dith[p%4])
	
	-- for half of the 128 lines on the screen
	for j=0,0x1f80,128 do
		-- for every 8 bytes of this line
		for a=d+j,d+j+56,8 do
			
			-- map every second byte (pair of pixels) to a mapped pair
			-- one byte per poke, skipping the byte in between
			-- (dith offset 1 picks the other bytes)
			poke(a,m[@a])
			poke(a+2,m[@(a+2)])
			poke(a+4,m[@(a+4)])
			poke(a+6,m[@(a+6)])
		end
	end
end

------------------------------------------------------------------------------------------
-- fades a quarter of the screen at a time
-- scan line by scan line, left pixel then right pixel byte by byte
-- which line, side of pair is dictated by p
function scr_fade_8bit_inter(p)
	-- local tables seem to be faster.
	-- change start line based on oddness value
	local d,m=0x6000+p%2*64,p&2==0 and g_8bit_map0 or g_8bit_map1
	
	-- for half of the 128 lines on the screen
	for j=0,0x1f80,128 do
		-- for every 4bytes of this line
		for a=d+j,j+d+60,4 do
			
			-- map every pair of pixels to a mapped pair
			-- 4 bytes at a time
			-- shorter and quicker
			poke(a,m[@a],m[@(a+1)],m[@(a+2)],m[@(a+3)])
		end
	end
end

-- fades half of the lines on the screen at a time
-- which line is dictated by p.
-- uses precomputed table with pairs of values
-- takes about the same performance as doing a quarter
-- of the screen because there's no pixel swizzling.
-- effect is less noticeable
function scr_fade_8bit_half(p)
	-- local tables seem to be faster.
	-- change start line based on oddness value
	local m,d=g_8bit_map,0x6000+(p&1)*64
	
	-- for half of the 128 lines on the screen
	for j=0,0x1f80,128 do
		-- for every 4bytes of this line
		for a=d+j,j+d+60,4 do
			
			-- map every pair of pixels to a mapped pair
			-- 4 bytes at a time
			-- shorter and quicker
			poke(a,m[@a],m[@(a+1)],m[@(a+2)],m[@(a+3)])
		end
	end
end

-- fades all of the screen at a time
-- uses precomputed table with pairs of values
-- takes about half of total perf at 60 fps.
-- might be okay for some games, e.g. if 30fps is the target.
-- effect is much less noticeable
function scr_fade_8bit_all()
	-- local tables seem to be faster.
	-- every line is processed, so there's no start offset
	local m=g_8bit_map
	
	-- for all of the 128 lines on the screen
	for j=0,0x1fc0,64 do
		-- for every 4bytes of this line
		for a=0x6000+j,j+0x603c,4 do
			
			-- map every pair of pixels to a mapped pair
			-- 4 bytes at a time
			-- shorter and quicker
			poke(a,m[@a],m[@(a+1)],m[@(a+2)],m[@(a+3)])
		end
	end
end

-------------------------------------------------------------
-- 16 bit
-- sacrifice a lot (approx 1200k) of lua ram
-- for even more performance
-- if your game doesn't use lua memory much
-- then this may be the way to go

-- fades a quarter of the lines on the screen at a time
-- following a Z pattern
-- which line is dictated by p.
-- uses precomputed table with quads of values
-- takes quite a lot less performance because
-- there's no need for pixel swizzling and half the read/writes
function scr_fade_16bit(p)
	local dith={[0]=0,64,4,68} -- backwards N pattern actually
	
	-- local tables seem to be faster.
	-- change start line based on dith value
	local m,d=g_16bit_map,0x6000+(dith[p%4])
	
	-- for half of the 128 lines on the screen
	for j=0,63 do
		local j8=j<<7 -- saves a token
		-- for every second 4bytes of this line
		for a=d+j8,d+j8+56,8 do
			
			-- map each run of 4 pixels (2 bytes) to a mapped run
			-- with one lookup, 4 bytes per iteration
			-- half the reads and lookups of the 8bit version
			poke2(a,m[%a],m[%(a+2)])
		end
	end
end

-- fades half of the lines on the screen at a time
-- which line is dictated by p.
-- uses precomputed table with quads of values
-- takes quite a bit less performance because
-- there's no need for pixel swizzling and half the read/writes
-- effect is less noticeable
function scr_fade_16bit_half(p)
	-- local tables seem to be faster.
	-- change start line based on oddness value
	local m,d=g_16bit_map,0x6000+(p&1)*64
	
	-- for half of the 128 lines on the screen
	for j=0,63 do
		local j8=j<<7 -- saves a token
		-- for every 4bytes of this line
		for a=d+j8,j8+d+60,4 do
			
			-- map each run of 4 pixels (2 bytes) to a mapped run
			-- with one lookup, 4 bytes per iteration
			-- half the reads and lookups of the 8bit version
			poke2(a,m[%a],m[%(a+2)])
		end
	end
end

-- fades all of the screen at a time
-- uses precomputed table with quads of values
-- takes a little more perf than initial version
-- effect is much less noticeable
function scr_fade_16bit_all()
	-- local tables seem to be faster.
	-- every line is processed, so there's no start offset
	local m=g_16bit_map
	
	-- for all of the 128 lines on the screen
	for j=0,127 do
		local j8=j<<6 -- saves a token
		-- for every 4bytes of this line
		for a=0x6000+j8,j8+0x603c,4 do
			
			-- map each run of 4 pixels (2 bytes) to a mapped run
			-- with one lookup, 4 bytes per iteration
			-- half the reads and lookups of the 8bit version
			poke2(a,m[%a],m[%(a+2)])
		end
	end
end

-------------------------------------------------------------
-- 16 bit packed
-- sacrifice a bit less (approx 600k) of lua ram
-- but because of the unpacking needed it's not actually v fast.
-- unless there's a way to solve that this is pretty useless

-- fades a quarter of the lines on the screen at a time
-- following a Z pattern
-- which line is dictated by p.
-- uses precomputed table with quads of values
function scr_fade_16bp(p)
	local dith={[0]=0,64,4,68} -- backwards N pattern actually
	
	-- local tables seem to be faster.
	-- change start line based on dith value
	local m,d=g_16bp_map,0x6000+(dith[p%4])
	
	-- for half of the 128 lines on the screen
	for j=0,63 do
		local j8=j<<7 -- saves a token
		-- for every second 4bytes of this line
		for a=d+j8,d+j8+56,8 do
			
			-- map each run of 4 pixels (2 bytes) to a mapped run
			-- each packed table entry holds two mapped values,
			-- so the right half has to be unpacked first
			local v1,v2=%a,%(a+2)
			
			-- anyone know how to pack/unpack values faster than this?
			--	 poke2(a,m[v1\2]<<((v1&1)<<4),m[v2\2]<<((v2&1)<<4))
			--	 poke2(a,m[v1\2]<<(v1&1==0 and 16 or 0),m[v2\2]<<(v2&1==0 and 16 or 0))
			poke2(a,v1&1==0 and m[v1\2]<<16 or m[v1\2],v2&1==0 and m[v2\2]<<16 or m[v2\2])
		end
	end
end

-- fades half of the lines on the screen at a time
-- which line is dictated by p.
-- uses precomputed table with packed quads of values
-- unpacking in the inner loop means it's not really
-- any faster than the uncached version
-- effect is less noticeable
function scr_fade_16bp_half(p)
	-- local tables seem to be faster.
	-- change start line based on oddness value
	local m,d=g_16bp_map,0x6000+(p&1)*64
	
	-- for half of the 128 lines on the screen
	for j=0,63 do
		local j8=j<<7 -- saves a token
		-- for every 4bytes of this line
		for a=d+j8,j8+d+60,4 do
			
			-- map each run of 4 pixels (2 bytes) to a mapped run
			-- each packed table entry holds two mapped values,
			-- so the right half has to be unpacked first
			local v1,v2=%a,%(a+2)
			poke2(a,v1&1==0 and m[v1\2]<<16 or m[v1\2],v2&1==0 and m[v2\2]<<16 or m[v2\2])
		end
	end
end

-- fades all of the screen at a time
-- uses precomputed table with packed quads of values
-- takes a little more perf than initial version
-- effect is much less noticeable
function scr_fade_16bp_all()
	-- local tables seem to be faster.
	-- every line is processed, so there's no start offset
	local m=g_16bp_map
	
	-- for all of the 128 lines on the screen
	for j=0,127 do
		local j8=j<<6 -- saves a token
		-- for every 4bytes of this line
		for a=0x6000+j8,j8+0x603c,4 do
			
			-- map each run of 4 pixels (2 bytes) to a mapped run
			-- each packed table entry holds two mapped values,
			-- so the right half has to be unpacked first
			local v1,v2=%a,%(a+2)
			poke2(a,v1&1==0 and m[v1\2]<<16 or m[v1\2],v2&1==0 and m[v2\2]<<16 or m[v2\2])
		end
	end
end