02 August 2006

Series generating functions in PostgreSQL

I'm not sure how much the following constitutes Functional Programming in SQL, but various aspects of SQL reminds me much of the list and array processing functionalities in perl, python and Unix pipes.


Anyway, consider the following clone of python's range() built-in function in PostgreSQL:
Aw, crud. I'm an idiot. PostgreSQL has already a built-in generate_series() to cover this. Preserving my original code here though:

-- range() to emulate python's range(start, stop, step)
-- Note: "immutable" kind of makes this a pure function with no side effects
CREATE OR REPLACE FUNCTION range(int, int, int) RETURNS SETOF INT
LANGUAGE plpgsql IMMUTABLE AS '
DECLARE
i integer;
start alias for $1;
stop alias for $2;
step alias for $3;
BEGIN
i := start;

IF step = 0 THEN
EXIT;
END IF;

IF step > 0 THEN
LOOP
IF i >= stop THEN
EXIT;
END IF;
RETURN NEXT i;
i := i + step;
END LOOP;
ELSE
LOOP
IF i < stop THEN
EXIT;
END IF;
RETURN NEXT i;
i := i + step;
END LOOP;
END IF;
END;
';


-- Overloaded range() to emulate python's range(start, stop)
-- Written in the "sql" language because pgsql inlines SQL functions (faster).
CREATE OR REPLACE FUNCTION range(int, int) RETURNS SETOF INT
IMMUTABLE LANGUAGE SQL AS 'select * from range($1, $2, 1);';

-- Overloaded range() to emulate python's range(stop)
-- Written in the "sql" language because pgsql inlines SQL functions (faster).
CREATE OR REPLACE FUNCTION range(int) RETURNS SETOF INT
IMMUTABLE LANGUAGE SQL AS 'select * from range(0, $1, 1);';


I'm losing my train of thought here, but point is that with generate_series(), and postgresql'S CREATE AGGREGATE as a starting point, you'd be able to construct some pretty functional style SQL programming.

One quick, practical use of generate_series() would be to quickly fill in your database with test values:


-- Insert 1000 random users, 20-40 years of age,
INSERT INTO users (uid, username, date_of_birth)
SELECT
nextval('uid_seq'),
generate_random_username(),
now() - ('1 year'::interval * random() * 20) - '20 years'::interval
FROM
generate_series(1, 1000);

which the method we often use at work. More later.

18 July 2006

Lossy Logic Paradigm

Data compression algorithms fall into two categories: Lossless and Lossy. The former expects the decompressed data to be exactly the same as the original data while the latter can sacrifice a given amount of precision to achieve greater compression rates, so long as the decompressed data is Good Enough.

On the other hand, almost all other algorithms are lossless. For example, bigdatabasetable.getCount() to return exactly the correct, stable, no less of precision count at that time, no matter how many megabytes of information and seconds you need to run through to get that number. Which is annoying, when it returns 1e+6 when the context of the statement is:


if (bigdatabasetable.getCount() > 1024) {
// Did we just scanned a terabyte of information? Oops.
cout << "Database is not small!";
}


If you're using SQL, the above can slightly be "optimized" to
SELECT COUNT(*) FROM (SELECT 1 FROM bigdatabasetable LIMIT 1024) AS FOO

to avoid scanning too much information.

What'd be interesting is: bigdatabasetable.getEstimatedCount(100) where 100 would mean "spend up to about 100 msec on this, then give me your best result". In this case, we would be losing precision, but gaining control over the performance of the call.

getEstimatedCount is only a example of lossying a given algorithm or function call, but it can be equally applied to many other algorithms. bigarray.getBestMatch(".*?foobar", 100).getFirstTenItems().sort()

Applying Lossy Logic would mean we have to build applications to expect a little errors, but we do it all the time in time sensitive applications like video decoders and VOIP, network sensitive apps.

This coupled with Asynchronous calls via Message Bus might be an intriguing idea. Too sleepy to detail here now, but this'll allow us to easily make use of spare processors cores, either in the same computer, or nearby...

15 July 2006

Asynchronous + Message Bus

Think Asynchronous, not Synchronous; Message Bus, not Point To Point.

It's a moment of epiphany when I realized the two related concepts are the key to many, many nagging software engineering problems of mine. Neither of them are alien to me, and I'm aware of them for quite a while. It's just that I've never realized how universal the patterns are, and how it applied to so many things. asdfe

Consider: how would you elegantly show the progress of a large file copy across several different user interfaces? (e.g. text, GUI, web?) It's tempting to get lost in low level details like Dependency Injection, when it's better to decouple the file copy process and the progress display into two asynchronous thread or process, and then send out periodic progress events to a Message Bus where interested user interfaces can display them. Caveat: this only applies to slow operations. This even allows for more than one user interface (listener) per operation. Traditionally, the file copy operation would be the same process that updates the User Interface's progress bar synchronously.

That was just the beginning as I found more and more problems can be elegantly solved by thinking asynchronously. AJAX web interfaces. dbus's interface between network events and the NetworkManager applet. Qt's Signals and Slots. Pipelining. Everyone of them are related.

Phrasing their definition to relate to each other:

Asynchronous: issue a request, but don't expect an immediate response. You'll get notified when the response is completed.

Message Bus: don't choose what component to talk to directly, instead send and forget to the bus (aka Message Queue, Channel, Slot), and other interested components themselves to choose to listen to the bus.

09 July 2006

Vernor Vinge Motherlode

Finally found the rest of Vernor Vinge's works at the local book store today (Borders, Berjaya Times Square). Slurged on 4 of them: Tatja Grimm's World, Marooned in Realtime, Collected Stories of Vernor Vinge, The Peace War. Should be fun, Vinge's one of the few writers I dare to buy out right without research.

Adding to the pile, I got 3rd Ed. of Stevens' (and Fenner and Rudoff now) Unix Network Programming Volume 1. Had the hardcover 2nd ed, but don't know where that went. This new one is the more expensive (?!) softcover "International Edition".

'tis instant noodles for the forseeable future.