There is a lot of discussion around how Twitter search can be improved, including a recent post on Mashable. What follows are my thoughts on how Twitter Search could evolve.
In some ways the evolution of Twitter is not dissimilar to that of the World Wide Web. The Web first saw an ever-increasing number of sites, and then the tools and technology evolved, and indeed continue to evolve today. Twitter has likewise grown rapidly, and tools are emerging. One of the key steps for the Web was providing a powerful search mechanism to access the wealth of information contained in those websites. A similar challenge now faces Twitter: how to search effectively to find those nuggets of information and trusted sources. Google’s PageRank approach revolutionized searching across websites, as the results returned were deemed to come from more trusted and reliable sources.
So the question is: what is the equivalent of Google PageRank for Twitter? How does a user qualify as a trusted and reliable source?
A possible TwitterRank algorithm, used to index Twitter users and so facilitate more powerful search, could comprise the following:
- Parsing and extraction of high-frequency keywords/tags (e.g. open source, CMS, CMIS) from a user's recent posts (e.g. the last 200). This could use one of the many Information Retrieval algorithms and leverage stemming and synonyms
- Analysis of the content behind links, which could also contribute to the keywords/tags for the user
- Frequency and age of posts
- Ratio of posts containing high-scoring keywords to the total number of posts (e.g. 1 in 4 posts contains a high-scoring keyword)
- Number of followers with similar high-scoring keywords, potentially as a ratio of these followers to overall followers. Penalizing a user for having non-relevant followers might be unfair, but it could help combat the mass-follower practice
- Content of bio and bio link also contributing to keyword score
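To make the list above a little more concrete, here is a minimal Python sketch of how a few of these signals might be combined into a per-keyword score. Everything in it is an assumption for illustration: the `User` class, the stopword list, the helper names and especially the 0.5 / 0.3 / 0.2 weights are hypothetical, and it leaves out stemming, synonyms, link content and the frequency and age of posts.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Tiny illustrative stopword list (an assumption, not a real IR resource).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is", "with"}


@dataclass
class User:
    name: str
    posts: list                      # post texts, most recent first
    bio: str = ""
    followers: list = field(default_factory=list)  # list of User


def tokens(text):
    """Lowercase the text and split it into words, dropping stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def top_keywords(user, n_posts=200, k=5):
    """High-frequency keywords from a user's recent posts and bio."""
    counts = Counter()
    for post in user.posts[:n_posts]:
        counts.update(set(tokens(post)))   # count a keyword once per post
    counts.update(set(tokens(user.bio)))
    return {word for word, _ in counts.most_common(k)}


def twitter_rank(user, keyword, n_posts=200):
    """Hypothetical score in [0, 1] for one user on one keyword."""
    keyword = keyword.lower()
    recent = user.posts[:n_posts]
    if not recent:
        return 0.0
    # Share of recent posts that mention the keyword.
    post_ratio = sum(keyword in tokens(p) for p in recent) / len(recent)
    # Share of followers who themselves score highly on the keyword.
    if user.followers:
        relevant = sum(keyword in top_keywords(f) for f in user.followers)
        follower_ratio = relevant / len(user.followers)
    else:
        follower_ratio = 0.0
    # Small boost when the bio mentions the keyword.
    bio_bonus = 1.0 if keyword in tokens(user.bio) else 0.0
    # Arbitrary illustrative weights; real weights would need tuning.
    return 0.5 * post_ratio + 0.3 * follower_ratio + 0.2 * bio_bonus
```

Using a follower ratio rather than a raw follower count is what would penalize mass following in this sketch: a user with thousands of unrelated followers gains nothing from them.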
This TwitterRank would then be used to sort the search results for particular keywords. The goal would be that people who score highly on the searched keywords have their recent posts returned higher than others, since the score would indicate that they are more active and contribute a lot on the topic, and that they have a high number of equally relevant followers.
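As a rough illustration of that sorting step, the sketch below orders matching posts by their author's score for the searched keyword, breaking ties by recency. It assumes the per-user keyword scores have already been computed by some means; the `search` function, the `(author, text, timestamp)` tuple layout and the `ranks` dictionary are hypothetical names chosen for this example.

```python
def search(posts, ranks, keyword):
    """Return posts mentioning `keyword`, best-ranked authors first.

    posts: list of (author, text, timestamp) tuples
    ranks: {author: {keyword: score}} of precomputed, hypothetical scores
    """
    kw = keyword.lower()
    # Naive whole-word match; a real search would stem and handle punctuation.
    hits = [p for p in posts if kw in p[1].lower().split()]
    # Sort by the author's score for this keyword, then by recency.
    return sorted(
        hits,
        key=lambda p: (ranks.get(p[0], {}).get(kw, 0.0), p[2]),
        reverse=True,
    )
```

Authors with no recorded score default to 0.0, so their posts still appear in the results, just below those from higher-scoring users.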