Friday, March 28, 2008

NULL should be renamed to UNKNOWN

problem:

the following has no output when it should has:

select * from src where n not in (select n from dst);



reproduce the problem:


create table src (n int);


create table dst(n int);


insert into src(n) values(1),(2),(3),(4),(5);
insert into dst(n) values(1),(2),(3);
select * from src where n not in (select n from dst); -- lists 4, 5

insert into dst(n) values(null);
select * from src where n not in (select n from dst); -- no output


should list 4 and 5?


it isn't, it didn't list anything


hmm.. seems strange



explanation:



SELECT 'SEE' WHERE 7 IN (SELECT 2 UNION SELECT 4)
is a shorthand for:
SELECT 'SEE' WHERE 7 = 2 OR 7 = 4



SELECT 'SEE' WHERE 7 NOT IN (SELECT 2 UNION SELECT 4)
is a shorthand for:
SELECT 'SEE' WHERE 7 <> 2 AND 7 <> 4




now let's inject NULL data :

SELECT 'SEE' WHERE 7 IN (SELECT 2 UNION SELECT 4 UNION SELECT NULL)
is a shorthand for:
SELECT 'SEE' WHERE 7 = 2 OR 7 = 4 or 7 = NULL



SELECT 'SEE' WHERE 7 NOT IN (SELECT 2 UNION SELECT 4 UNION SELECT NULL)
is a shorthand for:
SELECT 'SEE' WHERE 7 <> 2 AND 7 <> 4 AND 7 <> NULL



you cannot compare NULL, null means unknown. by the virtue of it being unknown, the expression 7 <> NULL is neither true nor false. the expression 7 = NULL, is also neither true nor false. 7 <> NULL: NULL, 7 = NULL: NULL.

100 <> 500 results true
100 = 500 results false
100 <> NULL results NULL
100 = NULL results NULL
100 IS NULL results false
100 IS NOT NULL results true
NULL = NULL results NULL
NULL <> NULL results NULL

NULL AND NULL results NULL
NULL AND TRUE results NULL
TRUE AND NULL results NULL
TRUE AND TRUE results TRUE

NULL OR NULL results NULL
NULL OR TRUE results TRUE
TRUE OR NULL results TRUE
TRUE OR TRUE results TRUE


NULL represents unknown, cannot be determined, not yet determined

rough analogy, think of WHERE A <> B as left hand <> right hand

WHERE 100 <> NULL

in plain english: is the left hand holding 100 dollar bill different with the bill in the closed right hand?

so what's the answer for 100 <> NULL?


let's have an example, your friend approach you, his/her left hand is open, you can see 100 bucks, right hand is close, you can't see anything. then he/she asks you if right hand holds different bill than the left hand, what would you(or any human) would answer? a human would answer, "it's different", "not different", or could answer "i don't know"


while on a computer, when you ask it, the computer's answer is always "i don't know." there is no gratification for a computer to guess correctly. perhaps, it's human nature to try to predict unknown forces, like trying to gauge if himself is capable of ESP, of capable of correctly predicting things 9 out of 10, or inclined in believing in divination(god will help me correctly guess so i can win this Deal or No Deal). humans are constantly feeding on gratifications, be it a guessing game, or trying their "pure luck"(believing in divination sometimes plays a large role here). so much for digressing :-)



so there we go, computers can't guess, it always answer "i don't know" when faced with unknown.


so the answer for the expression WHERE 100 <> NULL is unknown. neither true nor false. 100 = NULL is also unknown, neither true nor false. perhaps, RDBMS vendors should rename NULL to UNKNOWN.


let's try replacing NULL with UNKNOWN

100 <> 500 results true
100 = 500 results false
100 <> UNKNOWN results UNKNOWN
100 = UNKNOWN results UNKNOWN
100 IS UNKNOWN results false
100 IS NOT UNKNOWN results true
UNKNOWN = UNKNOWN results UNKNOWN
UNKNOWN <> UNKNOWN results UNKNOWN

UNKNOWN AND UNKNOWN results UNKNOWN
UNKNOWN AND TRUE results UNKNOWN
TRUE AND UNKNOWN results UNKNOWN
TRUE AND TRUE results TRUE


UNKNOWN OR UNKNOWN results UNKNOWN
UNKNOWN OR TRUE results TRUE
TRUE OR UNKNOWN results TRUE
TRUE OR TRUE results TRUE


there it is, more intuitive, perhaps when reading NULL, mentally read them as UNKNOWN.


these would not return a row:
SELECT * FROM employee WHERE FALSE
SELECT * FROM employee WHERE NULL

this could:
SELECT * FROM employe WHERE TRUE




let's go back on NOT IN construct:


SELECT 'SEE' WHERE 7 NOT IN (SELECT 2 UNION SELECT 4 UNION SELECT NULL)
is a shorthand for:
SELECT 'SEE' WHERE 7 <> 2 AND 7 <> 4 AND 7 <> NULL

resolved to:
SELECT 'SEE' WHERE TRUE AND TRUE AND NULL

then resolved to:
SELECT 'SEE' WHERE NULL


final result: no output


perhaps the rationale for using the word NULL is because it sounds so computer-sciencey. as for not using UNKNOWN as an identifier for er.. unknown, heck, it's hard to read double negatives, NOT UNKNOWN, oh.. humour me :-)


but... all these arguments can be thrown out of the window when mainstream programming languages like c# go the extra mile to make null comparisons not much to fuzz about:


int? a = 7;

int? b = null;


if (a != null) Console.WriteLine("indeed");

if (a != b) Console.WriteLine("still indeed");



programmer's life is sometimes difficult, impedance mismatches on our array of tools are in your face. our tools sometimes deviates from what we expected and asserted them to be. nonetheless we're constantly humbly learning to deal and get over with it.

which led me to the conclusion that i shouldn't have written these IS NULL, IS NOT NULL stuffs in the first place :-)



// Buen


Einstein argued that there must be simplified explanations of nature, because God is not capricious or arbitrary. No such faith comforts the software engineer.
- Fred Brooks, Jr.



P.S.

i'd go on a limb here to generalize VB.NET's inconsistency(for the nth time), it's the MySQL of imperative programming languages. null handling in vb.net is basically broken. but perhaps one of VB.NET's few redeeming factor is its block-structured constructs, e.g. If-EndIf, Select-EndSelect, For-Next, Do-Loop, etc, they are very programmer-friendly, i wish C# use those block delineators instead of curly brackets



string s = null;
if (s == "") MessageBox.Show("Not Good"); else MessageBox.Show("Good");
// outputs Good

Dim s As String
s = Nothing
If s = "" Then
MessageBox.Show("Broken")
Else
MessageBox.Show("Expect Nothing")
End If
' outputs Broken