SQL SERVER TIPS: May 2009

TSQL Challenge 5

The context

You have to do some reporting on blog posts on searches by keywords.
The blog engine stores the tags defining the post in one comma separated values column.
Sample Blog post data:

id          name                 tags

----------- -------------------- --------------------

1           post 1               sql,profiler,table

2           post 2               sql,performance

3           post 3               profilter

4           post 4               view,table

Each time people make a search by keywords, you store it in a dedicated table of filtered searches. As for blog tags, the values are stored in only one comma separated column. This historical search values will be used to add an auto-complete search field on the blog.

Sample Filtered searches:

id          data

----------- --------------------------------

1           sql,performance

2           profiler

3           table,performance,view

The Challenge

You have to find for each search the relevant posts. A relevant post is a post that was tagged with at least one of the keywords of the search.
You have also to provide a relevance index with the number of corresponding tags found on the post.
Here is the result table you should have to produce regarding the previous sample data.
Posts by Filtered searches with relevance:

ID data                    name     tags                 RELEVANCE

-- ----------------------- -------- -------------------- -----------

1  sql,performance         post 2   sql,performance      2

1  sql,performance         post 1   sql,profiler,table   1

2  profiler                post 1   sql,profiler,table   1

3  table,performance,view  post 4   view,table           2

3  table,performance,view  post 1   sql,profiler,table   1

3  table,performance,view  post 2   sql,performance      1

SOLUTION:

First create two tables filter and blog having the relevent values as given in the sample data

DECLARE @filter TABLE (id INT IDENTITY, data nvarchar(32))

insert into @filter (data) values ('sql,performance')
insert into @filter (data) values ('profiler')
insert into @filter (data) values ('table,performance,view')

DECLARE @blog TABLE (id INT IDENTITY, name nvarchar(20), tags nvarchar(20))

insert into @blog (name,tags) values ('post 1','sql,profiler,table')
insert into @blog (name,tags) values ('post 2','sql,performance')
insert into @blog (name,tags) values ('post 3','profilter')
insert into @blog (name,tags) values ('post 4','view,table');
---------------------------------------------------------------

Let us start with the filter table which contains comma separated data. Our first task would be to split the data by delimiter ','.
But again then this should be done with a single query. I have looked for many options to split the data but i found splitting using xml is the best solution.

This is done by the following code:
----------------------------------------------------------------------------------

select F1.id,
F1.data,
O.filterdata
from
(select *,
cast('<(X)>'+replace(F.data,',','<(/X)><(X)>')+'<(/X)>' as XML) as xmlfilter
from @filter F
)F1
cross apply
(
select fdata.D.value('.','varchar(50)') as filterdata
from f1.xmlfilter.nodes('X') as fdata(D)) O

------------------------------------------------------------------------------------
Once this is achieved next task would be cross joining the splitted data and the blog table tags to find a match of the splitted data in each and every tags of blog table.
This can be easily done using the CHARINDEX function as shown below:

------------------------------------------------------------------------------------
select F.id,
F.data,
B.name,
B.tags,
filterdata,
case when CHARINDEX(filterdata,B.tags) >0 then 1 else 0 end as relevance
from (
select F1.id,
F1.data,
O.filterdata
from
(
select *,
cast('<(X)>'+replace(F.data,',','<(/X)><(X)>')+'<(/X)>' as XML) as xmlfilter
from @filter F
)F1
cross apply
(
select fdata.D.value('.','varchar(50)') as filterdata
from f1.xmlfilter.nodes('X') as fdata(D)) O
) F
,@blog B

--------------------------------------------------------------------------------------------------------------------
Now, finally count the matches of the word in tags to find the relevence of a filter.
--------------------------------------------------------------------------------------------------------------------

select id,data,name,tags,COUNT(relevance) as Relevance
from
(
select F.id,
F.data,
B.name,
B.tags,
filterdata,
case when CHARINDEX(filterdata,B.tags) >0 then 1 else 0 end as relevance
from (
select F1.id,
F1.data,
O.filterdata
from
(
select *,
cast('<(X)>'+replace(F.data,',','<(/X)><(X)>')+'<(/X)>' as XML) as xmlfilter
from @filter F
)F1
cross apply
(
select fdata.D.value('.','varchar(50)') as filterdata
from f1.xmlfilter.nodes('X') as fdata(D)) O
) F
,@blog B
)F1
where relevance =1
group by F1.id,F1.data,F1.name,F1.tags
order by F1.id,F1.data,F1.name,F1.tags

-----------------------------------------------------------------------------------------------------------------

Hope everything is clear in the above solution. Please contact me on divya.ce@gmail.com if you have any queries or suggestions regarding the solution.

Note: Please remove the brackets '()' from '<(/X)>' and '<(X)>' tags before executing the code.

SQL SERVER TIPS

Monday, May 11, 2009

Articles on SQL Server

Thursday, May 7, 2009

Solution of T-SQL Challenge 5

TSQL Challenge 5

The context

The Challenge

Buy me a coffee

Followers

Blog Archive

About Me