Thursday, April 20, 2006

scalar function


MS SQL:

CREATE FUNCTION FormatterNumber(@i INT) RETURNS VARCHAR(100)
AS
BEGIN
RETURN CONVERT(VARCHAR, @i) + ' Michael the INT'
END
GO
SELECT DBO.FormatterNumber(1)







PostgreSQL: Note postresql parameter can be also be referenced by their parameter number, e.g. $1, $2..





-- we don't have to declare varchar(100) in returns clause,
-- it is adjusted dynamically

-- method 1, this is flexible, can have multiple statements between begin and end
CREATE FUNCTION FormatterNumber(m_I INT) RETURNS VARCHAR
AS
$$
BEGIN
RETURN m_I::VARCHAR || ' INT';
-- RETURN $1::VARCHAR || ' INT'; -- this will work too
END;
$$
LANGUAGE 'plpgsql';



-- method 2, just change the language 'plpgsql' to 'sql',
-- this is rigid, only one statement is allowed,
-- though seems this is much faster(haven't yet benchmark against method 1)

-- somehow the disadvantage of language 'sql' is you can't refer the paramater
-- by its name, only by parameter position

CREATE FUNCTION FormatterNumber(m_I INT) RETURNS VARCHAR
AS
$$
SELECT $1::VARCHAR || ' Michael the INT';
$$
LANGUAGE 'sql';



invocation sample:

SELECT FormatterNumber(1);






MS SQL:

CREATE FUNCTION FormatterDate(@d DATETIME) RETURNS VARCHAR(100)
AS
BEGIN
RETURN CONVERT(VARCHAR, @d) + ' Michael the date'
END
GO

SELECT DBO.FormatterDate('November 5 2006')


PostgreSQL:



-- method 1
CREATE OR REPLACE FUNCTION FormatterDate(m_D DATE) RETURNS VARCHAR
AS
$$
BEGIN
RETURN m_D::VARCHAR || ' Michael the date';
END;
$$
LANGUAGE 'plpgsql';



-- method 2
CREATE OR REPLACE FUNCTION FormatterDate(m_D DATE) RETURNS VARCHAR
AS
$$
SELECT $1::VARCHAR || ' Michael the date';
$$
LANGUAGE 'plpgsql';


SELECT FormatterDate('November 5 2006')




PostgreSQL is much flexible in its function, you can overload function name,
you can have two function with same name,
resulting to a more symmetrical usage of function





-- method 1
CREATE OR REPLACE FUNCTION Formatter(m_I INT) RETURNS VARCHAR
AS
$$
BEGIN
RETURN m_D::VARCHAR || ' Michael the INT';
END;
$$
LANGUAGE 'plpgsql';



-- method 2
CREATE OR REPLACE FUNCTION Formatter(m_D DATE) RETURNS VARCHAR
AS
$$
SELECT $1::VARCHAR || ' Michael the date';
$$
LANGUAGE 'plpgsql';



SELECT Formatter(1);
SELECT Formatter('nov 5 2006');


Wednesday, April 19, 2006

best practice for avoiding dirty read

for transactions or triggers that do any data manipulation routine(UPDATE,DELETE,INSERT?), follow guideline below:

for MS SQL, always put TABLOCKX in SELECT clause:
SET @PreviousQty = (SELECT Qty FROM Inventory WITH(TABLOCKX) WHERE ItemCode = @ItemCode

FOR PostgreSQL, always put FOR UPDATE in SELECT clause:
m_PreviousQty = (SELECT Qty FROM Inventory WHERE ItemCode = m_ItemCode FOR UPDATE)


always put row lock even if the SELECTED will not be the table to be written upon. this will ensure preventing dirty reads from other concurrently executing transactions that modifies table you read.

Note regarding row-level locking:
WITH(TABLOCKX) and FOR UPDATE can also be used to implement row-level pessimistic locking for LAN-based application transactions, i.e. when you open one record, no other program can read the same record, this is called pessimistic locking.

shared row locks

PostgreSQL 8.1:

Shared row locks are a very important performance improvement for PostgreSQL’s implementation of referential integrity. PostgreSQL uses triggers to implement foreign keys, and thus it acquires locks using SQL statements. Before PostgreSQL 8.1, this meant the least restrictive lock it could acquire was a row exclusive lock (using a SELECT … FOR UPDATE statement). With 8.1, the developers added a method of acquiring a shared row lock via SQL (SELECT … FOR SHARE). This greatly reduces the locking contention in databases with lots of referential integrity.

DATEADD

MSSQL
select dateadd(year,1, '2004-2-29')

PostgreSQL
select '2004-2-29'::date + cast('1 years' as interval)

MSSQL:
select dateadd(month,1,'2004-2-29')


PostgreSQL:
select '2004-2-29'::date + cast('1 months' as interval)

MSSQL:
select dateadd(month,1, '1976-11-05 6:29 AM')

PostgreSQL:
select '2006-01-06 6:29 AM'::timestamptz + cast('1 months' as interval)

MSSQL:
select dateadd(day,1,'2006-01-06 6:29 AM')
-- or --
select convert(datetime,'2006-01-06 6:29 AM') + 1


PosgtgreSQL:
select '2006-01-06 6:29 AM'::timestamp + cast('1 days' as interval);

------------------------------------------------------------------------
------------------------------------------------------------------------
MS SQL equivalent compatible DATEADD function

-- Function: dateadd(character varying, integer, date)

-- DROP FUNCTION dateadd(character varying, integer, date);

CREATE OR REPLACE FUNCTION dateadd(character varying, integer, date)
RETURNS date AS
$BODY$

DECLARE
p_Interval ALIAS FOR $1;
p_N ALIAS FOR $2;
p_Date ALIAS FOR $3;




BEGIN
if p_Interval = 'm' then
return p_Date + cast(p_N || ' months' as interval);
elseif p_Interval = 'y' then
return p_Date + cast(p_N || ' years' as interval);
else
raise exception 'dateadd interval parameter not supported';
-- raise exception 'hello';
return null;
end if;
END;
$BODY$
LANGUAGE 'plpgsql';

Labels: ,

strip the time portion of the datetime

MS SQL:

select convert(datetime,convert(varchar,getdate(),101))

PostgreSQL:

select now()::date

ANSI SQL: PostgreSQL OK, MS SQL NOT

select cast(now() as date)

current row version checking. uses for concurrency checking

MS SQL:

-- checks if row has changed since last read
CREATE TABLE Distributor
(
MemberId VARCHAR(20),
TotalSales NUMERIC(10,2),
ConcurrencyCheck TIMESTAMP -- not related to ANSI-SQL 92 timestamp
)


INSERT INTO Distributor(MemberId, TotalSales) SELECT 'Michael',1000
SELECT * FROM Distributor
UPDATE Distributor SET TotalSales = TotalSales + 7000
SELECT * FROM Distributor

Here's the result:

MemberId TotalSales ConcurrencyCheck
-------------------- ------------ ------------------
Michael 1000.00 0x0000000000007C3B

MemberId TotalSales ConcurrencyCheck
-------------------- ------------ ------------------
Michael 8000.00 0x0000000000007C3C



---------------------------------------------------------

PostgreSQL:

Note: there's a hidden field in each postgres table, xmin for example, xmin changes every time changes happen in the row

-- checks if row has changed since last read
CREATE TABLE Distributor
(
MemberId VARCHAR(20),
TotalSales NUMERIC(10,2)
);


INSERT INTO Distributor(MemberId, TotalSales) SELECT 'Michael',1000;
SELECT *, Distributor.xmin FROM Distributor;
UPDATE Distributor SET TotalSales = TotalSales + 7000;
SELECT *, Distributor.xmin FROM Distributor;


Here's the result:

memberid | totalsales | xmin
----------+------------+------
Michael | 1000.00 | 3624

memberid | totalsales | xmin
----------+------------+------
Michael | 8000.00 | 3639



to convert xmin to integer(thanks to AndrewSN, freenode #postgresql):
select int8in(xidout(xmin)) from company

row-level locking, dirty-read prevention

explicit row-level locking and its uses

----------

MS SQL: (have tested WITH(ROWLOCK) doesn't seems working)

SET @m_Hit = @m_Hit + (SELECT hit FROM topleveldomainhits WITH(TABLOCKX) WHERE tldsuffix = '.com')

SET @m_Hit = @m_Hit + 1


PostgreSQL:

SELECT hit INTO m_Hit FROM topleveldomainhits WHERE tldsuffix = '.com' FOR UPDATE;

m_Hit = m_Hit + 1;


--------------------------------------------------

Note:

Explicit row-Level Locking is not needed if you are not doing complex computation, for the sake of discussion we just make an example of complex hit computation : )

here is a non-complex hit computation:
-- statement A:
UPDATE topleveldomainhits SET hit = hit + 1 WHERE tldsuffix = '.com'

row-level locking is needed if you are doing complex computation,e.g. passing result to variables(likely scenario is doing multi-line computation) and/or the possibility of parallel processing is very high.

if the statement is is one just one line like statement A, and two users executed it in parallel and even you SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED, they won't overwrite each other's data. single statements are an atomic operation in itself. so pagehit = 2 :)



-------------------------------
-------------------------------


another example of where explicit row-level locking is necessary, (complex computations)

for the purpose of discussion let's say we have only column field which is Bonus and employeeid



--- bonus computation prodedure run once every month ---



TotalBonus = $80
YearsWork 5
MonthsCount = 3


Final TotalBonus = $1200

MSSQL:

computer A:


CREATE PROCEDURE ComputeYearBonus
@EmployeeID VARCHAR(8)
AS
SET XACT_ABORT ON

BEGIN TRAN
DECLARE @YearsWork INT

SELECT @YearsWork = YearCount FROM YearsWork WITH(TABLOCKX) WHERE Employee =
@EmployeeId

DECLARE @Bonus MONEY
-- process 1. read bonus = 80 * 5 = $400
SET @Bonus = (SELECT TotalBonus FROM Employee WITH(TABLOCKX) WHERE EmployeeId = @EmployeeId) * @YearsWork

-- process 4. write bonus = $400
UPDATE Employee SET TotalBonus = @Bonus WHERE EmployeeId = @EmployeeID

COMMIT TRAN
GO

computer B:


CREATE PROCEDURE ComputeMonthBonus
@EmployeeID VARCHAR(8)

AS
SET XACT_ABORT ON
BEGIN TRAN

DECLARE @MonthsWork INT
SELECT @MonthsWork = MonthCount FROM MonthsWorkThisYear WITH(TABLOCKX) WHERE Employee = @EmployeeId

DECLARE @Bonus MONEY

-- process 2. read bonus = 80 * 3 = $240
SET @Bonus = (SELECT TotalBonus FROM Employee WITH(TABLOCKX) WHERE EmployeeId = @EmployeeId) * @MonthsWork

-- process 4. write bonus = $240
UPDATE Employee SET TotalBonus = @Bonus WHERE EmployeeId = @EmployeeID

COMMIT TRAN

GO



without WITH(TABLOCKX), when computer A and computer B execute at the same time, there is a high probability that they will overwrite each other's bonus computation. see the steps above. last written bonus is $240 instead of $1200.

if we put WITH(TABLOCKX) in our SELECTs, process 2 will wait until computer A commits. after computer A commits, computer B will read the bonus as $400, then $400 multiply by $3 = $1200.

same thing would effect even if computer B executes first, computer B computes $240. then computer A would read the bonus, computer A can't read the bonus until computer B commits. after B commits, computer A will read the bonus of $240. then computer A will multiply this by 5. $240 * 5 = $1200.


you'll notice that in the YearsWork in MonthsWorkThisYear we also put TABLOCKX, same thing, if there are other parallel executing transaction or trigger that updates the two table at the same time with our ComputeBonus. the computebonus will be messed, there will be dirty reads if there is no row-locking, we can't afford to have our bonus miscomputed.


------------------------------------------------



PostgreSQL Version:


postgresql has implicit BEGIN TRAN AND COMMIT TRAN:



CREATE FUNCTION ComputeYearBonus() RETURNS VOID
m_EmployeeID VARCHAR(8)
AS
$$
DECLARE
m_YearsWork INT;
m_Bonus NUMERIC;
BEGIN


SELECT YearCount INTO m_YearsWork FROM YearsWork WHERE Employee = m_EmployeeId FOR UPDATE;

-- process 1. read bonus = 80 * 5 = $400
m_Bonus = (SELECT TotalBonus FROM Employee WHERE EmployeeId = m_EmployeeId FOR UPDATE) * m_YearsWork;


-- process 4. write bonus = $400
UPDATE Employee SET TotalBonus = m_Bonus WHERE EmployeeId = m_EmployeeID;

END;
$$
LANGUAGE 'plpgsql';




CREATE FUNCTION ComputeMonthBonus() RETURNS VOID
m_EmployeeID VARCHAR(8)

AS
$$
DECRLARE
m_MonthsWork INT
m_Bonus NUMERIC
BEGIN


SELECT MonthCount INTO m_MonthsWork FROM MonthsWorkThisYear WHERE Employee = m_EmployeeId FOR UPDATE;


-- process 2. read bonus = 80 * 3 = $240
m_Bonus = (SELECT TotalBonus FROM Employee WHERE EmployeeId = m_EmployeeId FOR UPDATE) * m_MonthsWork;


-- process 4. write bonus = $240
UPDATE Employee SET TotalBonus = m_Bonus WHERE EmployeeId = m_EmployeeID;

END;
$$
LANGUAGE 'plpgsql';



table-level locking

MS SQL:
SET @m_Hit = (SELECT hit FROM totalhits WITH (TABLOCKX))
SET @m_Hit = @m_Hit + 1


PostgreSQL:

method 1:
SELECT hit INTO m_Hit FROM totalhits FOR UPDATE;
m_Hit = m_Hit + 1;

method 2:
LOCK TABLE pagehit IN SHARE ROW EXCLUSIVE MODE;
SELECT hit INTO m_Hit FROM totalhits;
m_Hit = m_Hit + 1;

Sunday, April 16, 2006

mssql features not present in postgresql

When migrating from PostgreSQL to MS SQL be aware that the following isn't present in PostgreSQL version 8

1. Returning multiple recordset from a function
2. Formula (but can be simulated in trigger)
3. Case Insensitive search (have to use ILIKE or UPPER)

ms sql stored proc, views, functions, triggers migrations to postgresql