Hyperlink-Induced Topic Search (HITS)


Hyperlink-Induced Topic Search (HITS), also known as Hubs and Authorities, is a link analysis algorithm for rating Web pages, developed by Jon Kleinberg.

It computes two main values for each page:

1. Authority value, which estimates the value of the page's content.
2. Hub value, which estimates the value of the page's links to other pages.

It first retrieves the set of results for the search query, so the computation is performed only on this result set rather than across all Web pages.
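The sketch below shows this restriction step in Python. It assumes the link graph is held in memory as a dictionary that maps each page to the set of pages it links to. The names `Graph` and `focused_subgraph` and the sample pages are hypothetical and chosen only for illustration. Kleinberg's original formulation also expands the result set with pages that link to or are linked from the results, and this sketch leaves that step out.

```python
# Hypothetical in-memory link graph: page -> set of pages it links to.
Graph = dict[str, set[str]]


def focused_subgraph(graph: Graph, results: set[str]) -> Graph:
    """Keep only the result-set pages and the links between them."""
    return {
        page: {target for target in graph.get(page, set()) if target in results}
        for page in results
    }


# Illustrative data: "x" links into the results but is not itself a result.
web: Graph = {
    "a": {"b", "c", "x"},
    "b": {"c"},
    "c": set(),
    "x": {"a"},
}
sub = focused_subgraph(web, {"a", "b", "c"})
```

Page `x` and every link to or from it are dropped, because `x` is not in the result set.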

Authority and hub values are defined in terms of one another in a mutual recursion. A page's authority value is the sum of the scaled hub values of the pages that point to it. Its hub value is the sum of the scaled authority values of the pages it points to. Some implementations also take the relevance of the linked pages into account.
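The recursion can be written as equations. In the notation used here for illustration, auth(p) and hub(p) are the two scores of page p, and q → p means that q links to p. The second pair is the equivalent matrix form. There, A is the adjacency matrix of the result-set graph (A_ij = 1 if page i links to page j, otherwise 0), and a and h are the vectors of authority and hub scores.

```latex
\mathrm{auth}(p) = \sum_{q \to p} \mathrm{hub}(q), \qquad
\mathrm{hub}(p)  = \sum_{p \to q} \mathrm{auth}(q)

\mathbf{a} = A^{\mathsf{T}} \mathbf{h}, \qquad
\mathbf{h} = A \, \mathbf{a}
```

Substituting one matrix equation into the other shows how the scores evolve. Each round multiplies a by A^T A and h by A A^T. With normalization, the iteration therefore behaves like power iteration and generally converges to the principal eigenvectors of those two matrices.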

The algorithm performs a series of iterations, each consisting of two basic steps:

Authority Update: update every node's authority score to be the sum of the hub scores of every node that points to it. In other words, a node earns a high authority score by being linked to from pages that are recognized as hubs of information.
Hub Update: update every node's hub score to be the sum of the authority scores of every node it points to. In other words, a node earns a high hub score by linking to nodes that are considered authorities on the subject.
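The two update rules are sketched below as Python functions. They assume the same hypothetical dictionary representation of the link graph (page → set of linked pages), with every link target also present as a key. The function names are illustrative and do not come from any particular library.

```python
# Hypothetical link graph: page -> set of pages it links to.
# Assumes every link target is also a key of the graph.
Graph = dict[str, set[str]]


def authority_update(graph: Graph, hub: dict[str, float]) -> dict[str, float]:
    """auth(p) = sum of hub(q) over every page q that links to p."""
    auth = {page: 0.0 for page in graph}
    for source, targets in graph.items():
        for target in targets:
            auth[target] += hub[source]
    return auth


def hub_update(graph: Graph, auth: dict[str, float]) -> dict[str, float]:
    """hub(p) = sum of auth(q) over every page q that p links to."""
    return {page: sum(auth[t] for t in targets) for page, targets in graph.items()}


# One round on a tiny illustrative graph, starting from hub scores of 1.
graph: Graph = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
hub = {page: 1.0 for page in graph}
auth = authority_update(graph, hub)  # uses the current hub scores
hub = hub_update(graph, auth)        # uses the freshly updated authority scores
```

After this round, `c` has the highest authority score because both other pages link to it. `a` has the highest hub score because it links to both `b` and `c`.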

The Hub score and Authority score for a node are defined with the following algorithm:

* Start with every node having a hub score and an authority score of 1.
* Run the Authority Update rule.
* Run the Hub Update rule.
* Normalize the values by dividing each hub score by the square root of the sum of the squares of all hub scores, and each authority score by the square root of the sum of the squares of all authority scores.
* Repeat from the second step as necessary.
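The steps above are sketched below as a single Python function. It assumes the hypothetical dictionary representation used earlier, with every link target also present as a key. It also assumes a fixed number of iterations stands in for "as necessary"; a real implementation might stop once the scores change by less than some tolerance instead. The name `hits` and the `iterations` parameter are illustrative only.

```python
import math

# Hypothetical link graph: page -> set of pages it links to.
Graph = dict[str, set[str]]


def hits(graph: Graph, iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Return (hub, authority) scores; assumes all link targets are keys."""
    # Step 1: every node starts with hub and authority scores of 1.
    hub = {page: 1.0 for page in graph}
    auth = {page: 1.0 for page in graph}
    for _ in range(iterations):  # Step 5: repeat (fixed count assumed)
        # Step 2: authority update from the current hub scores.
        auth = {page: 0.0 for page in graph}
        for source, targets in graph.items():
            for target in targets:
                auth[target] += hub[source]
        # Step 3: hub update from the new authority scores.
        hub = {page: sum(auth[t] for t in targets) for page, targets in graph.items()}
        # Step 4: divide by the square root of the sum of squares.
        # "or 1.0" avoids dividing by zero on a graph with no links.
        auth_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / auth_norm for p, v in auth.items()}
        hub = {p: v / hub_norm for p, v in hub.items()}
    return hub, auth


graph: Graph = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
hub, auth = hits(graph)
```

On this small graph, `a` ends up as the strongest hub and `c` as the strongest authority. `a` receives an authority score of 0 because no page links to it.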

HITS focuses on both authoritative pages and good hub pages, whereas PageRank focuses only on authoritative pages.
HITS is query-dependent. At query time it has to compute the authority and hub scores for that query's result set, which costs time, so HITS may not be very efficient. PageRank scores, by contrast, stay static until new pages are added.
Beyond efficiency, HITS considers only the initial set of relevant pages and ignores the content relevance of pages added when that set is expanded. It also tends to return general pages rather than specific answers.


I liked this article.
Can you please give an example based on matrix manipulation? I found it confusing on Cornell's official math website.
Thanks for posting a nice article.

 

