In: 未分类
2 十 2008网上搜索下tag的设计,虽然找到一个tag的设计描述,但也是跟我们正常的设计差不多
最后他认为最好tag设计要遵循三范式的设计基础设计
大概描述如下:
原文:http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
最近, 在 del.icio.us 邮件列表, 我问了个问题 “有谁知道 del.icio.us的数据库设计?” .
我得到些私人的反馈,下面跟大家分享下:
问题: 你需要设计一个数据库作为一个书签(或者是一个博客或者其他的)需要给它们建立许多的标签, 之后,你想通过SQL语句获取标签的union(并集)或者intersection(交集), 你同时需要在短时间内然会标签查询结果.
显然 有三种的解决方法 (注意::如果你做一个网站允许用户自定义标签, 确信看过我在站点里面进行的 my performance tests测试报告.)
一个表的解决方法:
Query for “search+webservice+semweb”: //查询一段tagSELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
AND tags LIKE "%webservice%"
AND tags LIKE "%semweb%"
Query for “search|webservice|semweb”:
SELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
OR tags LIKE "%webservice%"
OR tags LIKE "%semweb%"
Query for “search+webservice-semweb”SELECT *
FROM `delicious`
WHERE tags LIKE "%search%"
AND tags LIKE "%webservice%"
AND tags NOT LIKE "%semweb%"
优点:
缺点:
VARCHAR). 否则你要用到,text的字段或者相似的, 但是查询的速度会很慢, I supposeLIKE "%search" will also find tags with “websearch”. If you alter the query to LIKE " %search% " you end up having a messy solution: You have to add a space to the beginning of the tags value to make this work.使用两个表,一个外键关联在一起

Query for “bookmark+webservice+semweb”:SELECT b.*
FROM scBookmarks b, scCategories c
WHERE c.bId = b.bId
AND (c.category IN ('bookmark', 'webservice', 'semweb'))
GROUP BY b.bId
HAVING COUNT( b.bId )=3
First, all bookmark-tag combinations are searched, where the tag is “bookmark”, “webservice” or “semweb” (c.category IN ('bookmark', 'webservice', 'semweb')), then just the bookmarks that have got all three tags searched for are taken into account (HAVING COUNT(b.bId)=3).
Query for “bookmark|webservice|semweb”:
Just leave out the HAVING clause and you have union:SELECT b.*
FROM scBookmarks b, scCategories c
WHERE c.bId = b.bId
AND (c.category IN ('bookmark', 'webservice', 'semweb'))
GROUP BY b.bId
Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.SELECT b. *
FROM scBookmarks b, scCategories c
WHERE b.bId = c.bId
AND (c.category IN ('bookmark', 'webservice'))
AND b.bId NOT
IN (SELECT b.bId FROM scBookmarks b, scCategories c WHERE b.bId = c.bId AND c.category = 'semweb')
GROUP BY b.bId
HAVING COUNT( b.bId ) =2
Leaving out the HAVING COUNT leads to the Query for “bookmark|webservice-semweb”.
Credits go to Rhomboid for helping me out with this query.
I guess the main advantage of this solution is that it is more normalized than the first solution, and that you can have unlimited number of tags per bookmark.


Toxi came up with a three-table structure. Via the table “tagmap” the bookmarks and the tags are n-to-m related. Each tag can be used together with different bookmarks and vice versa. This DB-schema is also used by wordpress.
The queries are quite the same as in the “scuttle” solution.
Query for “bookmark+webservice+semweb”SELECT b.*
FROM tagmap bt, bookmark b, tag t
WHERE bt.tag_id = t.tag_id
AND (t.name IN ('bookmark', 'webservice', 'semweb'))
AND b.id = bt.bookmark_id
GROUP BY b.id
HAVING COUNT( b.id )=3
Query for “bookmark|webservice|semweb”SELECT b.*
FROM tagmap bt, bookmark b, tag t
WHERE bt.tag_id = t.tag_id
AND (t.name IN ('bookmark', 'webservice', 'semweb'))
AND b.id = bt.bookmark_id
GROUP BY b.id
Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.
SELECT b. *
FROM bookmark b, tagmap bt, tag t
WHERE b.id = bt.bookmark_id
AND bt.tag_id = t.tag_id
AND (t.name IN ('Programming', 'Algorithms'))
AND b.id NOT IN (SELECT b.id FROM bookmark b, tagmap bt, tag t WHERE b.id = bt.bookmark_id
AND bt.tag_id = t.tag_id AND t.name = 'Python')
GROUP BY b.id
HAVING COUNT( b.id ) =2
Leaving out the HAVING COUNT leads to the Query for “bookmark|webservice-semweb”.
Credits go to Rhomboid for helping me out with this query.
The advantages of this solution:
Disadvantages:
If you want to have more complicated queries like (bookmarks OR bookmark) AND (webservice or WS) AND NOT (semweb or semanticweb) the queries tend to become very complicated. In these cases I suggest the following query/computation process:
SELECT b.id FROM tagmap bt, bookmark b, tag t WHERE bt.tag_id = t.tag_id AND b.id = bt.bookmark_id AND t.name = "semweb"In this way, you can also do queries like (del.icio.us|delicious)+(semweb|semantic_web)-search. This type of queries (that is: the brackets) cannot be done by using the denormalized “MySQLicious solution”.
This is the most flexible data structure and I guess it should scale pretty good (that is: if you do some caching).
Update May, 2006. This arcticle got quite some attention. I wasn’t really prepared for that! It seems people keep referring to it and even some new sites that allow tagging give credit to my articles. I think the real credit goes to the contributers of the different schemas: MySQLicious, scuttle, Toxi and to all the contributors of the comments (be sure to read them!)
P.S. Thanks to Toxi for sending me the queries for the three-table-schema, Benjamin Reitzammer for pointing me to a loughing meme article (a good reference for tag queries) and powerlinux for pointing me to scuttle.