Outlier

Wordpad on Windows 7 supports ODF

2009-01-14T19:11:00.002+08:00

Linkjacking myself, heh: Wordpad on Windows 7 supports ODF

Got Windows 7 Beta running on VMware, discovered that Wordpad supports ODF, as mentioned by yoonkit. Also noticed that they're using the Ribbon interface from Microsoft Office 2007.

PHP Namespaces

2008-10-29T19:25:00.003+08:00

Succinctly put. I'm a fan of worse is better, but this is pushing it way too much. PHP's like a bunch of infinite monkeys whacking on keyboards, alternately coming up with brilliant stuff like SimpleXML, Wordpress and Drupal, or brillant [sic] stuff like the namespaces fiasco (":::" is "hard to parse/easy to be confused"), magic_quotes and register_globals. Utterly tasteless decisions.

FOSS.my in the planning.

2008-09-10T21:46:00.002+08:00

Overdue for a FOSS event:

http://foss.org.my/projects/events/foss.my/FOSS.my

Tentative 8 Nov 2008.

Will have an IRC meet on it, probably #myoss, freenode, evening 16 Sep 2008. Everyone should join in if possible, even to lurk.

See you there.

nginx "bug"

2008-08-07T00:32:00.002+08:00

Encountered some odd drupal+apache2+nginx interaction bug today. Basically, "Transfer-Encoding: chunked" was applied to the content twice if drupal 404'd. Doesn't seem to trigger outside of drupal, IIRC. Don't know exactly whose fault it was, but:

http://www.ruby-forum.com/topic/152435

described it and nginx author Igor Sysoev gave a patch to fix it. His own development version of nginx doesn't seem to have that patch incorporated, but the patch made the bug go away.

Win. I had hacked nginx .deb for Ubuntu Hardy amd64 at my informal work site.
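
To make the double chunking concrete, here is a rough Python sketch of the chunked framing itself. It is not taken from nginx, Apache or the patch; `encode_chunked` and `decode_chunked` are made-up helper names, and the decoder ignores trailers and chunk extensions. The point is only that a client which strips one layer of framing from a twice-encoded body is left with chunk sizes and CRLFs mixed into the page.

```python
def encode_chunked(body: bytes, chunk_size: int = 8) -> bytes:
    """Frame body using HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        # Each chunk is: size in hex, CRLF, data, CRLF.
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # a zero-length chunk terminates the body
    return bytes(out)


def decode_chunked(framed: bytes) -> bytes:
    """Strip exactly one layer of chunked framing."""
    out = bytearray()
    pos = 0
    while True:
        line_end = framed.index(b"\r\n", pos)
        size = int(framed[pos:line_end], 16)
        if size == 0:
            return bytes(out)
        start = line_end + 2
        out += framed[start:start + size]
        pos = start + size + 2  # skip the CRLF after the chunk data


page = b"<h1>404 Not Found</h1>"
once = encode_chunked(page)
twice = encode_chunked(once)  # what the bug effectively produced
print(decode_chunked(twice))  # still framed: the browser shows garbage
```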

Patch your nameservers!

2008-07-24T14:26:00.006+08:00

Just patched a number of our nameservers. ( http://www.doxpara.com/ ). Be careful: it's not your content nameservers that matter here, but your own resolver and your upstream's nameservers. Check with the tool at www.doxpara.com. Ingenious way to check for the vulnerability, btw.

If you don't trust your upstream's dns, run your own patched nameserver, and don't forward queries upstream; send them straight to the root servers instead.

P.S. Use opendns or our own patched dns: 202.190.85.116 (temporary while upstream patches theirs. I'll remove this in a bit)

Caching queries via functional indices in PostgreSQL.

2008-06-11T20:45:00.000+08:00

Straight to the point:


foo=# select count(*) from bar;
count
--------
624569
(1 row)

foo=# explain analyze select count(*) from bar where baz ilike '%some%string%' and quux = '123';
                                                      QUERY PLAN                                                     
------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=25846.58..25846.59 rows=1 width=0) (actual time=1543.654..1543.655 rows=1 loops=1)
  ->  Seq Scan on bar  (cost=0.00..25846.53 rows=17 width=0) (actual time=288.667..1543.556 rows=32 loops=1)
        Filter: (((baz)::text ~~* '%some%string%'::text) AND (quux = '123'::bpchar))
Total runtime: 1543.798 ms

It takes 1.5 seconds on our sample data to find 32 rows we want out of 624569 rows. If this happens to be a large table, and we often need to run this query, rather than creating some external or internal trigger to cache this query, we can use PostgreSQL's partial indices to do the work for us:


foo=# create index bar_idx_some_string on bar(id) where baz ilike '%some%string%' and quux = '123';
CREATE INDEX
foo=# explain analyze select count(*) from bar where baz ilike '%some%string%' and quux = '123';
                                                          QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=12.71..12.72 rows=1 width=0) (actual time=0.470..0.473 rows=1 loops=1)
  ->  Index Scan using bar_idx_some_string on bar  (cost=0.00..12.67 rows=17 width=0) (actual time=0.122..0.406 rows=32 loops=1)
Total runtime: 0.534 ms

Now it takes 0.0005 seconds.

This also works if, in a single transaction or query session, you need to run a number of queries against a table with similar conditions. Create the index first (give it a temporary name), run the queries, then drop the index.

Caveats:
Creating the index takes about the same time as the original query, so you need to reuse the conditions to get back your initial investment.

Inserting into or updating the table takes a teeny bit more time. Works best on infrequently updated tables.

Kind of useless if the index condition matches a large percentage of the table.

You can use some rather complicated functions and conditions for your index, but they must be immutable and must not use external info (other tables, time of day).

CREATE INDEX locks the table against writes. Use CREATE INDEX CONCURRENTLY on live systems. It comes with its own caveats though.
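
For the create/query/drop pattern, a minimal Python sketch follows. Big assumption: it uses SQLite via the standard sqlite3 module as a stand-in, not PostgreSQL, so plain LIKE (case-insensitive for ASCII in SQLite by default) replaces ILIKE, and the planner and timings will differ. The table contents and the helper names `build_table` and `count_with_temporary_index` are invented for illustration.

```python
import sqlite3

QUERY = ("SELECT count(*) FROM bar "
         "WHERE baz LIKE '%some%string%' AND quux = '123'")


def build_table(conn, rows=10_000):
    """Fill a made-up 'bar' table; every 500th row matches the condition."""
    conn.execute("CREATE TABLE bar (id INTEGER, baz TEXT, quux TEXT)")
    conn.executemany(
        "INSERT INTO bar (id, baz, quux) VALUES (?, ?, ?)",
        ((i,
          "some test string" if i % 500 == 0 else "other",
          "123" if i % 2 == 0 else "456")
         for i in range(rows)),
    )


def count_with_temporary_index(conn):
    """Create the partial index, run the query, then drop the index."""
    conn.execute(
        "CREATE INDEX bar_idx_some_string ON bar(id) "
        "WHERE baz LIKE '%some%string%' AND quux = '123'"
    )
    try:
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.execute("DROP INDEX bar_idx_some_string")


conn = sqlite3.connect(":memory:")
build_table(conn)
print(count_with_temporary_index(conn))
```

In real use you would run several queries between the CREATE and the DROP, otherwise the index never pays for itself.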

Docs: CREATE INDEX

Rant: Stupid OOo hyperlinks.

2007-10-11T20:55:00.000+08:00

OpenOffice.org 2.3. Type:

The quick brown fox http://www.google.com/
over the lazy dog.

Now click on the end of line after http://www.google.com/. Type "jumps". Now try to detach "jumps" from the link. You can't.

This: http://www.laliluna.de/remove-openoffice-hyperlink.html doesn't really work properly. It just changes the style so it doesn't look like a link. In HTML speak, it becomes

http://www.google.com/jumps
Source:
<a href="http://www.google.com/">http://www.google.com/</a><a href="''" style="text-decoration: none">jumps</a>

instead of

http://www.google.com/jumps
Source:
<a href="http://www.google.com/">http://www.google.com/</a>jumps

The original problem is that OOo puts the cursor to the left of the invisible "</a>" when you try to get to the end of the line, not after.

Very, very irritating bug that destroys the mental flow.

Related problems: 4364 Nothing new. Resolution: "INVALID"? wtf?

Back! Random web optimizations.

2007-08-08T19:03:00.000+08:00

Wow, that's a long time since my last entry. Just gonna dive right in and continue.

Anyway, been using Firebug for ages, and now combined with YSlow, I've finally been looking at the performance numbers for some sites and web apps we have. A few interesting tidbits:

Apache Bench is not that useful at showing the real browser experience. Too many pages have external requests that take forever to load. Firebug's "Net" information is more useful in tracking down bottlenecks. YSlow takes that information and tells you where things suck.

Just bite the CPU load hit and enable transparent deflate compression on Apache. It's worth it. The network bandwidth is a lot more of a bottleneck than CPU processing is.
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript

Drupal 5.x can automagically concat its many .css together. No need to muck around. Admin - Site Config - Performance - Aggregate and compress CSS files. Too bad it does not handle the nonconformant themes' css.
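
On the deflate tidbit above, a quick Python sketch of why the trade is usually worth it for markup. This uses the standard zlib module as an approximation; it is not mod_deflate itself, the HTML is made up, and `deflate_savings` is a hypothetical helper. Real pages compress less dramatically than this very repetitive sample.

```python
import zlib


def deflate_savings(payload: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by deflate-compressing payload."""
    compressed = zlib.compress(payload, level)
    return 1 - len(compressed) / len(payload)


# Made-up, repetitive HTML, roughly what a long node listing looks like.
html = (
    "<ul>"
    + "".join(
        f'<li class="item"><a href="/node/{i}">Node {i}</a></li>'
        for i in range(500)
    )
    + "</ul>"
).encode()

print(f"{len(html)} bytes, {deflate_savings(html):.0%} saved")
```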

Series generating functions in PostgreSQL

2006-08-02T01:33:00.000+08:00

I'm not sure how much the following constitutes Functional Programming in SQL, but various aspects of SQL remind me very much of the list and array processing functionality in perl, python and Unix pipes.

~~Anyway, consider the following clone of python's range() built-in function in PostgreSQL:~~
Aw, crud. I'm an idiot. PostgreSQL already has a built-in generate_series() to cover this. Preserving my original code here though:


-- range() to emulate python's range(start, stop, step)
-- Note: "immutable" kind of makes this a pure function with no side effects
CREATE OR REPLACE FUNCTION range(int, int, int) RETURNS SETOF INT
LANGUAGE plpgsql IMMUTABLE AS '
DECLARE
 i     integer;
 start alias for $1;
 stop  alias for $2;
 step  alias for $3;
BEGIN
    i := start;

    IF step = 0 THEN
        RETURN;  -- EXIT is only valid inside a loop; a zero step yields no rows
    END IF;

    IF step > 0 THEN
        LOOP
            IF i >= stop THEN
                EXIT;
            END IF;
            RETURN NEXT i;
            i := i + step;
        END LOOP;
    ELSE
        LOOP
            IF i <= stop THEN  -- exclude stop itself, like python's range()
                EXIT;
            END IF;
            RETURN NEXT i;
            i := i + step;
        END LOOP;
    END IF;
END;
';


-- Overloaded range() to emulate python's range(start, stop)
-- Written in the "sql" language because pgsql inlines SQL functions (faster).
CREATE OR REPLACE FUNCTION range(int, int) RETURNS SETOF INT
IMMUTABLE LANGUAGE SQL AS 'select * from range($1, $2, 1);';

-- Overloaded range() to emulate python's range(stop)
-- Written in the "sql" language because pgsql inlines SQL functions (faster).
CREATE OR REPLACE FUNCTION range(int) RETURNS SETOF INT
IMMUTABLE LANGUAGE SQL AS 'select * from range(0, $1, 1);';

I'm losing my train of thought here, but the point is that with generate_series() and PostgreSQL's CREATE AGGREGATE as a starting point, you'd be able to construct some pretty functional-style SQL programming.

One quick, practical use of generate_series() would be to quickly fill in your database with test values:


-- Insert 1000 random users, 20-40 years of age,
INSERT INTO users (uid, username, date_of_birth)
  SELECT 
    nextval('uid_seq'), 
    generate_random_username(), 
    now() - ('1 year'::interval * random() * 20) - '20 years'::interval
  FROM
    generate_series(1, 1000);

which is the method we often use at work. More later.
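
The date arithmetic in that INSERT is easy to sanity-check outside the database. A rough Python sketch, with assumptions: a year is approximated as 365.25 days (PostgreSQL's interval math differs in detail), usernames are just random digits standing in for generate_random_username(), and `make_users` is a made-up helper.

```python
import random
from datetime import datetime, timedelta

YEAR = timedelta(days=365.25)  # approximation of '1 year'::interval


def make_users(n, now, seed=0):
    """Build n fake user rows aged between 20 and 40 years at 'now'."""
    rng = random.Random(seed)
    return [
        {
            "uid": uid,
            "username": f"user{rng.randrange(10**6):06d}",
            # now() - (1 year * random() * 20) - 20 years
            "date_of_birth": now - YEAR * rng.random() * 20 - YEAR * 20,
        }
        for uid in range(1, n + 1)
    ]


users = make_users(1000, datetime(2006, 8, 2))
print(users[0])
```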

Lossy Logic Paradigm

2006-07-18T00:18:00.000+08:00

Data compression algorithms fall into two categories: Lossless and Lossy. The former expects the decompressed data to be exactly the same as the original data while the latter can sacrifice a given amount of precision to achieve greater compression rates, so long as the decompressed data is Good Enough.

On the other hand, almost all other algorithms are lossless. For example, we expect bigdatabasetable.getCount() to return the exactly correct, stable count at that time, with no loss of precision, no matter how many megabytes of information it has to run through and how many seconds that takes. Which is annoying when it returns 1e+6 and the context of the statement is:


if (bigdatabasetable.getCount() > 1024) {
    // Did we just scan a terabyte of information? Oops.
    cout << "Database is not small!";
}

If you're using SQL, the above can be slightly "optimized" to

SELECT COUNT(*) FROM (SELECT 1 FROM bigdatabasetable LIMIT 1025) AS FOO

to avoid scanning too much information. (The limit is 1025, not 1024, so that the "> 1024" comparison can still come out true.)
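
The same trick outside SQL, as a small Python sketch: to decide a strict "more than N" question you only ever need to look at N + 1 rows. `count_exceeds` is a made-up helper, and a plain generator stands in for the big table.

```python
from itertools import islice


def count_exceeds(rows, threshold):
    """True if rows holds more than threshold items.

    Reads at most threshold + 1 items, however long rows really is.
    """
    return sum(1 for _ in islice(rows, threshold + 1)) > threshold


def big_table():
    # Stand-in for a table we do not want to scan in full.
    for i in range(1_000_000):
        yield i


if count_exceeds(big_table(), 1024):
    print("Database is not small!")
```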

What'd be interesting is: bigdatabasetable.getEstimatedCount(100) where 100 would mean "spend up to about 100 msec on this, then give me your best result". In this case, we would be losing precision, but gaining control over the performance of the call.
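
A minimal sketch of what such a call might look like, in Python. Everything here is an assumption for illustration: `estimated_count` is a made-up function, it needs a sequence whose length is known up front, and the extrapolation assumes matches are spread evenly through the data, which real tables often violate.

```python
import time


def estimated_count(items, predicate, budget_ms):
    """Count items matching predicate, spending roughly budget_ms at most.

    If the budget runs out, extrapolate from the fraction scanned so far.
    A full scan within budget gives the exact count.
    """
    deadline = time.monotonic() + budget_ms / 1000.0
    matches = 0
    scanned = 0
    for item in items:
        # Always look at one item so there is something to extrapolate from.
        if scanned and time.monotonic() >= deadline:
            break
        scanned += 1
        if predicate(item):
            matches += 1
    if scanned == 0:
        return 0
    return matches * len(items) // scanned


data = list(range(2_000_000))
print(estimated_count(data, lambda x: x % 2 == 0, 100))
```

The caller trades precision for a bound on latency; checking the clock on every item is wasteful, so a real version would check every few thousand rows.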

getEstimatedCount is only one example of making an algorithm or function call lossy; the same idea can be applied to many other algorithms, e.g. bigarray.getBestMatch(".*?foobar", 100).getFirstTenItems().sort()

Applying Lossy Logic would mean we have to build applications that expect small errors, but we do that all the time in time-sensitive and network-sensitive applications like video decoders and VOIP.

This coupled with Asynchronous calls via a Message Bus might be an intriguing idea. Too sleepy to detail here now, but this'll allow us to easily make use of spare processor cores, either in the same computer, or nearby...

Asynchronous + Message Bus

2006-07-15T02:56:00.000+08:00

Think Asynchronous, not Synchronous; Message Bus, not Point To Point.

It was a moment of epiphany when I realized the two related concepts are the key to many, many nagging software engineering problems of mine. Neither of them is alien to me, and I've been aware of them for quite a while. It's just that I'd never realized how universal the patterns are, and how they apply to so many things.

Consider: how would you elegantly show the progress of a large file copy across several different user interfaces (e.g. text, GUI, web)? It's tempting to get lost in low level details like Dependency Injection, when it's better to decouple the file copy process and the progress display into two asynchronous threads or processes, and then send out periodic progress events to a Message Bus where interested user interfaces can display them. Caveat: this only applies to slow operations. This even allows for more than one user interface (listener) per operation. Traditionally, the file copy operation would be the same process that updates the User Interface's progress bar synchronously.
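
A toy version of the file copy example in Python, using only threading and queue from the standard library. The `MessageBus` class, the event tuples and the tiny in-memory "files" are all invented for this sketch; a real bus (dbus, a message queue) would replace the list of queues.

```python
import io
import queue
import threading


class MessageBus:
    """Very small in-process bus: every subscriber gets every message."""

    def __init__(self):
        self._subscribers = []
        self._lock = threading.Lock()

    def subscribe(self):
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def publish(self, message):
        with self._lock:
            subscribers = list(self._subscribers)
        for q in subscribers:
            q.put(message)


def copy_with_progress(src, dst, total, bus, chunk_size=4):
    """Copy src to dst, publishing progress events instead of touching any UI."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        bus.publish(("progress", copied, total))
    bus.publish(("done", copied, total))


def drain(q):
    """Collect events from one listener's queue until the copy is done."""
    events = []
    while True:
        event = q.get()
        events.append(event)
        if event[0] == "done":
            return events


def demo():
    data = b"0123456789"
    bus = MessageBus()
    text_ui = bus.subscribe()  # two independent listeners
    web_ui = bus.subscribe()
    dst = io.BytesIO()
    worker = threading.Thread(
        target=copy_with_progress,
        args=(io.BytesIO(data), dst, len(data), bus),
    )
    worker.start()
    text_events = drain(text_ui)
    web_events = drain(web_ui)
    worker.join()
    return dst.getvalue(), text_events, web_events


print(demo())
```

The copy routine never learns how many user interfaces exist, which is the whole point.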

That was just the beginning, as I found that more and more problems can be elegantly solved by thinking asynchronously. AJAX web interfaces. dbus's interface between network events and the NetworkManager applet. Qt's Signals and Slots. Pipelining. Every one of them is related.

Phrasing their definition to relate to each other:

Asynchronous: issue a request, but don't expect an immediate response. You'll get notified when the response is completed.

Message Bus: don't choose which component to talk to directly; instead send-and-forget to the bus (aka Message Queue, Channel, Slot), and let other interested components choose for themselves whether to listen to the bus.
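
The "issue a request, get notified later" half can be sketched with Python's standard concurrent.futures module. `slow_square` and `run_async` are made-up names, and the squaring is a placeholder for any slow operation.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    """Placeholder for a slow operation."""
    return n * n


def run_async(values):
    """Submit every request up front; collect results via callbacks."""
    results = []

    def on_done(future):
        # Called once the response is ready, not when the request is issued.
        results.append(future.result())

    with ThreadPoolExecutor(max_workers=2) as pool:
        for v in values:
            pool.submit(slow_square, v).add_done_callback(on_done)
    # Leaving the 'with' block waits for all workers, so callbacks have run.
    return sorted(results)


print(run_async([1, 2, 3]))
```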

Vernor Vinge Motherlode

2006-07-09T23:56:00.000+08:00

Finally found the rest of Vernor Vinge's works at the local book store today (Borders, Berjaya Times Square). Splurged on 4 of them: Tatja Grimm's World, Marooned in Realtime, Collected Stories of Vernor Vinge, The Peace War. Should be fun; Vinge's one of the few writers I dare to buy outright without research.

Adding to the pile, I got 3rd Ed. of Stevens' (and Fenner and Rudoff now) Unix Network Programming Volume 1. Had the hardcover 2nd ed, but don't know where that went. This new one is the more expensive (?!) softcover "International Edition".

'tis instant noodles for the foreseeable future.