Sergio and the sigil

The new data types in SQL Server 2008

Posted by Sergio on 2008-04-06

I can't wait to use SQL 2008. I wish I could convince my DBA to jump on it as soon as it goes RTM, but the data peeps don't suffer of the same short attention span as us developers.

Reading a recent Technet Magazine article and attending the launch event in Chicago made me drool for the new features. And I don't usually care that much for database technologies.

The new features that are more within reach for developers are the new data types.

Spatial Data Types

A good part of my work involves GIS databases and overlaying business intelligence on top of it (or deriving BI from it). As such, it's with great interest that I see the addition of spatial data types in SQL 2008.

With SQL 2008 I'll be able to treat all those latitude/longitude pairs as first class citizens in my tables and stored procedures. Add to that all the new supporting functions that come with these new data types. That means I do not have to write my own functions to determine the bounding rectangles around a collection of points or shapes. Nor will I have to create my own function to calculate the distance between two coordinate points considering the Earth's shape. Believe me, this can be huge.

Suppose I have some oddly-shaped polygon that represents a geometric region like a mining field, a flood zone, or a pizza delivery service area (or anything small enough that can be considered flat, discarding the Earth's curvature).

DECLARE @shape geometry
SET @shape = geometry::STPolyFromText('POLYGON ((
                 47.653 -122.358, 
                 47.653 -122.354, 
                 47.657 -122.354, 
                 47.653 -122.350, 
                 47.645 -122.350, 
                 ... (snip)
                 47.651 -122.355, 
                 47.653 -122.358))',  0)

First of all, it's nice to have a data type to represent this polygon, and not having to use my own parsing mechanism or using other tables to store that. Another good thing is that these data types follow the OGC standards.

To determine the bounding rectangle for the shape defined above, it's as simple as calling a method on that shape object.

SELECT @shape.STEnvelope().ToString()
-- outputs something like
/*
 POLYGON (( 
	47.657 -122.358,
	47.657 -122.350,
	47.645 -122.350,
	47.645 -122.358,
	47.657 -122.358))
*/

Did I say object? Doesn't the syntax above look like plain old .Net? Exactly. The spatial data types were implemented in .Net, leveraging the capability of adding .Net user defined types, which was introduced in SQL 2005.

Spatial data types is a large topic, maybe I'll come back to it with a longer post once I have a chance to build an actual application with them. For now, I'll just point you to this nice series from Jason Follas.

The DATE type

When you don't need the time portion of a DATETIME, you can use the new 3-byte DATE type to store dates from 1/1/0001 to 12/31/9999. This coincides with the range (minus time) of the .Net System.DateTime structure.

DECLARE @d DATE
SET @d = GETDATE()
SELECT @d -- outputs '2008-04-06 00:00:00.000'

SET @d = '1234-03-22 11:25:09'
SELECT @d  -- outputs '1234-03-22  00:00:00.000'

The TIME type

There are also scenarios when we only need the time portion of a DATETIME, that's where the new TIME type comes in handy.

The TIME data type's size can be 3, 4, or 5 bytes, depending on the chosen precision. The default precision is 7 digits, but you can specify the precision you need when declaring the data.

CREATE TABLE #times(
	T 	TIME,
	T1	TIME(1),
	T5	TIME(5),
	T7	TIME(7)
	)

INSERT INTO #times VALUES (
	'01:02:03.1234567',
	'01:02:03.1234567',
	'01:02:03.1234567',
	'01:02:03.1234567')

SELECT * FROM #times

T                T1               T5               T7
---------------- ---------------- ---------------- ----------------
01:02:03.1234567 01:02:03.1000000 01:02:03.1234600 01:02:03.1234567

The DATETIME2 type

If you liked the increased precision of the TIME type and the increased date range of the DATE type, you'll be happy to learn that the new DATETIME2 stored date and time, with greater precision and range, basically combining those other 2 new data types.

DECLARE @dateA DATETIME2 = '2008-04-05 01:02:03.12345'
PRINT @dateA -- outputs 2008-04-05 01:02:03.1234500
DECLARE @dateB DATETIME2(4) = '2008-04-05 01:02:03.12345'
PRINT @dateB -- outputs 2008-04-05 01:02:03.1235

With this new data type you can have your dates range from 0001-01-01 00:00:00.0000000 to 9999-12-31 23:59:59.9999999. Again, this works well with .Net applications. You can initialize the DATETIME2 with string literals as shown above. These literal can be either ODBC format or ISO-8601 (sortable date time format, same as DateTime.ToString("s") in .Net).

The DATETIMEOFFSET type

With this new data type, SQL Server learns about time-zones. This is of particular interest to me because of the globally distributed data and users that I have to deal with.

The DATETIMEOFFSET type ranges in size from 8 to 10 bytes. It's precision is also defined at declaration time, just like the other new types shown above.

-- time stamp on Central Daylight Time
DECLARE @today DATETIMEOFFSET = '2008-04-05T01:02:03.1234567-05:00'
PRINT @today -- outputs 2008-04-05 01:02:03.1234567 -05:00
DECLARE @today2 DATETIMEOFFSET(2) = '2008-04-05T01:02:03.1234567-05:00'
PRINT @today2 -- outputs 2008-04-05 01:02:03.12 -05:00

We can initialize the DATETIMEOFFSET values using literal strings in ISO-8601 YYYY-MM-DDThh:mm:ss[.nnnnnnn][{+|-}hh:mm] or YYYY-MM-DDThh:mm:ss[.nnnnnnn]Z (for times exclusively in UTC.)

The HIERARCHYID type

About time, thank you. That's all I'm going to say. After having to endure the pain of representing, traversing, and querying hierarchical information stored in flat tables, I plan to use this data type extensively and never having to look back ever again.

You probably know what I'm talking about. All those foreign keys that point to the same table, like ParentCategoryID in a Categories table or ReportsToID in an Employees table.

Now we can simply define a column of type HIERARCHYID that will keep track of the record's position within the hierarchy being managed.

-- our Categories table
CREATE TABLE #Categories (
	CategoryID INT IDENTITY(1,1),
	CategoryNode HIERARCHYID NOT NULL,
	CategName NVARCHAR(40) NOT NULL
	)

We will need to populate the CategoyNode field with the correct hierarchy information. The first element (the root) is the odd man out. After the root node, the process is quite repetitive.

-- the root category
DECLARE @root HIERARCHYID = hierarchyid::GetRoot()
INSERT INTO #Categories (CategoryNode, CategName) 
	VALUES (@root, 'All #Categories')

-- insert the 'Electronics' category
DECLARE @electronics HIERARCHYID
SELECT @electronics = @root.GetDescendant(NULL, NULL)
INSERT INTO #Categories (CategoryNode, CategName) 
	VALUES (@electronics, 'Electronics')

-- insert the 'Music' category after 'Electronics'
DECLARE @music HIERARCHYID
SELECT @music = @root.GetDescendant(NULL, @electronics)
INSERT INTO #Categories (CategoryNode, CategName) 
	VALUES (@music, 'Music')

-- insert the 'Apparel' category between 'Electronics' and 'Music'
SELECT @music = @root.GetDescendant(@music, @electronics)
INSERT INTO #Categories (CategoryNode, CategName) 
	VALUES (@music, 'Apparel')

-- insert some children under 'Electronics'
DECLARE @video HIERARCHYID
--   We could do a simple @category.GetDescendant() but, let's
--      show something that is more likely to happen
SELECT @video = CategoryNode.GetDescendant(NULL, NULL)
  FROM #Categories WHERE CategName ='Electronics'
INSERT INTO #Categories (CategoryNode, CategName) 
	VALUES (@video, 'Video Equipment')

-- insert some children under 'Video Equipment'
DECLARE @tvs HIERARCHYID
SELECT @tvs = @video.GetDescendant(NULL, NULL)
INSERT INTO #Categories (CategoryNode, CategName) 
	VALUES (@tvs, 'Televisions')

DECLARE @players HIERARCHYID
SELECT @players = @video.GetDescendant(NULL, @tvs)
INSERT INTO #Categories (CategoryNode, CategName) 
	VALUES (@players, 'DVD - BluRay')

When we query the table, the output from the CategoryNode column reflects the position of the record in the hierarchy, similar to directory paths.

SELECT 
	CategoryID, CategName, 
	CategoryNode, 
	CategoryNode.ToString() AS Path
FROM #Categories
Output:
CategoryID  CategName         CategoryNode   Path
----------- ----------------- -------------- ---------
1           All #Categories   0x             /
2           Electronics       0x58           /1/
3           Music             0x48           /0/
4           Apparel           0x52C0         /0.1/
5           Video Equipment   0x5AC0         /1/1/
6           Televisions       0x5AD6         /1/1/1/
7           DVD - BluRay      0x5AD2         /1/1/0/

Note that the numbers that are separated by / in the Path column are not the CategoryID values. They are values that represent the sequence of the record within its siblings. The value is directly related to where we positioned the node when we called GetDescendant. The HIERARCHYID data type stores a binary value, as shown in the above output.

Now let's see how we would return all the items under a given category, recursively.

DECLARE @electronics_Categ HIERARCHYID
SELECT @electronics_Categ=CategoryNode 
	FROM #Categories WHERE CategoryID=2
SELECT CategoryID, CategName, CategoryNode.ToString() AS Path 
	FROM #Categories 
	WHERE @electronics_Categ.IsDescendant(CategoryNode)=1
Output:
CategoryID  CategName       Path
----------- --------------- --------
2           Electronics     /1/
5           Video Equipment /1/1/
6           Televisions     /1/1/1/
7           DVD - BluRay    /1/1/0/

Being able to perform these sorts of queries without having to resort to Common Table Expressions makes the queries so much simpler. I need this now.

You can get more details on the HIERARCHYID data type here.

Wrapping up

This post is by no means an extensive overview of these data types, but I hope it serves as a brief introduction to what is available in terms of data in the upcoming release of SQL 2008.