18 July 2006

Lossy Logic Paradigm

Data compression algorithms fall into two categories: Lossless and Lossy. The former expects the decompressed data to be exactly the same as the original, while the latter can sacrifice a given amount of precision to achieve better compression ratios, so long as the decompressed data is Good Enough.

On the other hand, almost all other algorithms are lossless. For example, bigdatabasetable.getCount() is expected to return the exact, stable, full-precision count at that moment, no matter how many megabytes of data and seconds of work it takes to get that number. Which is annoying when it returns 1e+6 and the context of the statement is:


if (bigdatabasetable.getCount() > 1024) {
    // Did we just scan a terabyte of information? Oops.
    cout << "Database is not small!";
}


If you're using SQL, the above can be slightly "optimized" to
SELECT COUNT(*) FROM (SELECT 1 FROM bigdatabasetable LIMIT 1024) AS FOO

to avoid scanning too much information.

What'd be interesting is bigdatabasetable.getEstimatedCount(100), where 100 would mean "spend up to about 100 msec on this, then give me your best result". In this case we would be losing precision, but gaining control over the performance of the call.
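A minimal sketch of how such a budgeted call might work, here over an in-memory array rather than a real database table. getEstimatedCount, the predicate, and the extrapolation step are all my assumptions, not any existing API:

#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical: count elements matching pred, spending at most budgetMsec.
// If time runs out, extrapolate from the prefix scanned so far.
template <typename T, typename Pred>
std::size_t getEstimatedCount(const std::vector<T>& data, Pred pred, int budgetMsec) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + std::chrono::milliseconds(budgetMsec);

    std::size_t matches = 0;
    std::size_t scanned = 0;
    for (const T& item : data) {
        if (pred(item)) ++matches;
        ++scanned;
        // Check the clock every 4096 elements, not on every element.
        if ((scanned & 0xFFF) == 0 && clock::now() >= deadline) {
            // Lossy result: scale matches up by the unscanned fraction.
            return matches * data.size() / scanned;
        }
    }
    return matches; // finished within budget: exact count
}

int main() {
    std::vector<int> data(50000000);
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = static_cast<int>(i % 10);
    std::cout << getEstimatedCount(data, [](int x) { return x == 3; }, 100) << "\n";
}

The key design point is that the caller, not the callee, decides how much imprecision the context can tolerate, by choosing the budget.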

getEstimatedCount is only one example of making a given algorithm or function call lossy; the same idea applies equally to many other algorithms, e.g. bigarray.getBestMatch(".*?foobar", 100).getFirstTenItems().sort(), sketched below.
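In the same spirit, a lossy best-match scan might simply stop at its deadline and return whatever it has found so far. Again a sketch under assumptions of mine (getBestMatch is hypothetical, and ranking is omitted):

#include <chrono>
#include <regex>
#include <string>
#include <vector>

// Hypothetical lossy search: return every match found before the deadline,
// silently ignoring whatever part of the array we never reached.
std::vector<std::string> getBestMatch(const std::vector<std::string>& items,
                                      const std::string& pattern, int budgetMsec) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + std::chrono::milliseconds(budgetMsec);
    const std::regex re(pattern);

    std::vector<std::string> matches;
    for (const std::string& s : items) {
        if (std::regex_search(s, re)) matches.push_back(s);
        if (clock::now() >= deadline) break; // lossy: partial scan
    }
    return matches;
}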

Applying Lossy Logic would mean we have to build applications that expect small errors, but we do that all the time in time-sensitive applications like video decoders and VOIP, and in network-sensitive apps.

This, coupled with asynchronous calls via a message bus, might be an intriguing idea. Too sleepy to detail it here now, but it would let us easily make use of spare processor cores, either in the same computer, or nearby...
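One possible shape for that, with std::async standing in as a stand-in for posting a request onto a real message bus (the bus itself, and anything cross-machine, is left entirely as an assumption):

#include <future>
#include <iostream>

int main() {
    // Kick the lossy estimate off on another core and keep working;
    // the lambda stands in for the getEstimatedCount(100) call above.
    auto pending = std::async(std::launch::async, [] {
        return 1000000; // placeholder for a budgeted, approximate result
    });

    // ... do other useful work while the estimate is being computed ...

    std::cout << "estimate: " << pending.get() << "\n";
}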
