Saturday, October 28, 2006

Algorithm: JPEG and related

I've been researching image compression on and off for a while. I see the methods that make up JPEG forming a core standard that most modern image and video compression is based on.

I've been wanting to make a JPEG compressor in a dsPIC for a while now. What do you do when you only have 16K to 30K of RAM? You look at what you're doing!

JPEG takes a YCbCr image and compresses each channel separately. A block of 8x8 values is put through a discrete cosine transform (DCT) which, like the FFT, provides a frequency value. Next, there is a zig-zag pattern defined by JPEG that takes this 8x8 block and rearranges it into a linear 64 byte block. So far, if the DCT result is kept in a floating-point format, no data has been lost. Next up is quantization. Either a fixed multiplier sequence (simple) or one calculated from the overall image (complex) is used to scale the image data, losing resolution. After that, any data below a certain threshold is set to zero, which starts the compression. The final step is what's called Hufmann coding. It uses either a set or calculated table to assign a unique binary symbol (say 01 is one symble, and 011 is another symbol) to replace a part of data. The theory is that a symbol's length is inversely related to how often it shows up. Ideally the table should be generated beforehand from statistics, although on the fly table generation gets close. Both the Hufmann table and the quantization table need to be passed to the reader so that it may be effectively decoded.

It's a lot of information, but there are better and more in depth discussions on the net.

I have a camera that does VGA images. Standard output is 4:2:2 (I think that's the format). For every pixel, it puts out a greyscale 8 bit value and either a red or a blue 8 bit value.
So, I don't have to do any conversion. I can decimate the red and blue channels to bring it down to a 4:1:1 depth (4 greyscale per 1 of the color channels). To correctly capture this data, I need 8 lines stored in RAM. As a raw data count, that would be 10,240 bytes. After decimation that's 7,680 bytes. Might be a bit tight in there...

dsPICs have DMA, which makes it easy to get the data into buffers without CPU capture intervention. Since the 8x8 DCT is so common in image compression, people have been trying all SORTS of ways to get it fast and simple A single 8x8 DCT can be broken down into 16 1x8 DCTs. You run a DCT on each row, then one on each column. My plan is to run the DCT in place in the buffer on each row and put the results into a new buffer outside of DMA control. I can also sort the rows out to isolate the various channels at the same time. Once 8 rows have been read in for any channel buffer, the 1D column DCT can be run.

there are 40MHz TI DSPs that have DCT hardware for image processing, allowing them to do the rest of the compression in software. TI DSPs are great. I just don't want to fork over several grand for a compiler.

After that, I can immediately start making decisions. I don't necessarily need a full JPEG compatible image as long as I have a reader on the other end that can decode what I encode. One way suggested in a technical paper (find reference) is that the encoding stops at quantization and truncating the data. This is nearly as effective in compression as the Huffman table (both size and quality) and greatly reduces the additional workload required. High frequency information suffers greatly, however. I'm considering this method. given the extremely rapid data transfer that must be done for an embedded camera. I'd like to get a VGA or QVGA video compressed down small enough to stream over an RF link, but I believe that may be impossible given the horsepower I want to attempt it with. Time for a 200MHz ARM32?