Tag Archives: Eigen

ALGORITHMS USED BY SEARCH ENGINE – The Maths behind SERP & SEO.

One of the most challenging tasks for search engines like Google, Yahoo, Bing, etc is to rank-order websites in terms of relevance in search lists with the most relevant appearing on the top in what is termed as SERP (Search Engine Results Page). On the other side website owners would always like to have their websites appear on top of search lists. Here comes the understanding of the mechanism that enables search engines to rank-order websites in terms of relevance from millions and millions of websites. In general any search engine would do the following in the background:

1) Crawl the web for all web pages with public access.
2) Index relevant keywords or phrases.
3) Rate importance of each page and display in search lists according to the rating (like PageRank of Google)

The last step is fuelled by algorithms that use the concept of `Eigen vectors’ (a mathematical concept) in quantitatively determining the rating / relevance.

A web is a composition of web pages and each page is scored for relevance. Assigning a score to any given web page is that the page’s score is derived from the links made to that page from other web pages. The links to a given page are called the backlinks for that page. The web thus becomes a democracy where pages vote for the importance of other pages by linking to them. Let us assume a web with four pages with the following characteristic back links.

a) Page 1 links to page 3, 4 and 2
b) Page 2 links to 3 and 4
c) Page 3 links to 1
d) Page 4 links to 3 and 1

Hence the number of backlinks would be as follows:

I) page 1 would be 2 (from c and d ie from pages 3 and 4);
II) page 2 would be 1 (from a ie from page 1);
III)page 3 would be 3 (from a, b and d ie, from pages 1,2 and 4);
IV) page 4 would be 2 (from a and b ie, from pages 1 and 2).

Thus, page 3 would be the one with the top rank; pages 1 and 4 would tie for the next rank; and page 2 would be ranked last. But the drawback here is that it does not discriminate between a page with a backlink from an unimportant page and a page with a backlink from an important page. ie, a link to your homepage directly from Yahoo would increase your rank than a link from a site like http://www.uandascorpio.wordpress.com . This leads to a concept where score of a page is determined by sum of scores of pages that link to the page. In our example, score of page 3 would be the sum of scores of pages that link to page 3 which is sum of scores of page 1, page 2 and page 4. But this results in a fact that a web page does get extra influence simply by linking to lots of other pages. To avoid this, the following is applied. In our example, page 1 has links to three other pages namely 3,4 and 2. Hence the score of page 3 would be boosted by one-third of score of page 1, rather than the score of page 1. In this manner each web page gets a total of one vote, weighted by that web page’s score, that is evenly divided up among all of its outgoing links.

Let us list out the scores of the four web pages in our example.
i) Score of page 1 would be (score of page 3 divided by 1) + (score of page 4 divided by two)
Why is that so ? Page 1 has two back links one from page 3 and one from page 4 (Ref S.no. I); Page 3 has one link which is to page 1 and page 4 has two links which is to page 3 and page 1 (Ref S.no c and d) and thus divide score of page 3 by 1 and divide score of page 4 by 2 and sum these two components to get the score of page 1.
Similarly, the scores of other pages would be as follows:
ii) Score of page 2 would be score of page 1 divided by 3
iii) Score of page 3 would be (score of page 1 divided by 3) + (score of page 2 divided by 2) + (score of page 4 divided by 2)
iv) Score of page 4 would be (score of page 1 divided by 3) + (score of page 2 divided by 2)

Now comes the application of core mathematics of matrices and determinants. The above (i to iv) can be visualised in terms of a matrix named `A’. ie, scores of page 1, 2, 3 and 4 can be written in a matrix format with the coeffecients forming the elements of the matrix, as under:

0—- 0—-1—-1/2
1/3– 0—-0—-0
1/3– 1/2–0—-1/2
1/3– 1/2–0—-0

This is a square matrix meaning number of rows and columns are same, which is a condition for determining the eigen vector. For every matrix with an eigen vector there is an eigen value, and the product of eigen value and the eigen vector would give the matrix. I have not explained the methodology of determination of eigen value and eigen vector as it can be found in any basic mathematics book. The eigen vector of the above matrix would give a matrix with four columns and one row ( 4 X 1 matrix) representing score of page 1, page 2, page 3 and page 4, which is:

0.387
0.129
0.290
0.109

From the above, page 1 has a higher score followed by page 3, then by page 2 and finally page 4.
The point to note is that the page containing maximum backlinks (page 3) is not the one with maximum score. It might seem surprising that page 3, linked to by all other pages, is not the most important. To understand this, note that page 3 links only to page 1 and so casts its entire vote for page 1. This, with the vote of page 2, results in page 1 getting the highest importance score.