Latent Semantic Indexing And Google

14 Jun

Google has undergone a major algo change over the last few weeks. This change will have a significant impact on how well your sites do in Google.

Codenamed ‘Brandy’ by the SEO industry, this latest update is the biggest series of changes to happen to Google since the Autumn 2003 Florida and Austin updates. Interestingly, what Google is planning only really becomes clear when all three of these major updates are examined together. Florida removed a lot of sites from Google’s overall index. Austin put some back (people speculate that Florida went a bit further than even Google expected and Austin was an attempt to pull things back a little) and now Brandy seems to have reinstated almost all sites; we seem to have returned to a pre-Florida state.

However, there’s been a big change since Florida in the way Google ranks sites and what it places importance on. Of course the very basic idea is the same – high quality, frequently updated content will always do well sooner or later – but the overall emphasis is changing.

It used to be that on-page factors were as important as off-page factors. For example, good use of Title and elements together with a keyword/phrase rich body would see you doing reasonably well, particularly if you had lots of quality backlinks. Since Brandy it seems that these factors are not so important. That’s not to say they’re unimportant, but Google is making a conscious effort to reduce the chances for SEO spamming by lessening the impact these elements can have.

The big change for on-page optimisation factors is LSI (Latent Semantic Indexing). This is a logic system that Google have always been very keen on incorporating into their algo. What it basically does is study the theme of the page, then set aside all the associated keywords/phrases and instead look for and reward words and phrases that share a semantic root.

A practical example: The site I run at work concentrates on Loans, Mortgages and Insurance. Traditionally I would use these keywords in the Title, elements and repeat the keywords as much as possible in the body text. What the LSI system will do is ignore all these (or more likely lessen the impact they’ll have individually) but look for related words to support the theme of the page, e.g. for a loans page it might look for mentions of holidays, new cars, furniture etc., as these are things you might require a loan for and hence they are part of the theme of the page.
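As a rough illustration of that idea (and not Google’s actual method, which isn’t public), here is a small Python sketch. It compares how often a page repeats its main keyword with how many distinct theme-related words it mentions. The LOAN_THEME_TERMS list and the theme_support function are made up for this example; a real system would derive related terms from a large document collection rather than a hand-written list.

```python
import re

# Hypothetical, hand-picked related terms for a "loans" theme. This list is an
# assumption for illustration only; it is not taken from any search engine.
LOAN_THEME_TERMS = {"holiday", "car", "furniture", "repayment", "interest"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def theme_support(text, keyword, related_terms):
    """Return (keyword_count, sorted distinct related terms found) for a page."""
    words = tokenize(text)
    keyword_count = words.count(keyword)
    found = sorted(set(words) & related_terms)
    return keyword_count, found


# A keyword-stuffed page versus a page that covers the theme.
stuffed = "Loans loans loans. Cheap loans, best loans, apply for loans today."
themed = "A loan can pay for a holiday, a new car or furniture, with fixed interest."

print(theme_support(stuffed, "loans", LOAN_THEME_TERMS))
print(theme_support(themed, "loan", LOAN_THEME_TERMS))
```

The stuffed page repeats its keyword six times but mentions none of the related terms, while the themed page uses the keyword once and mentions four of them. Under the behaviour described above, it is the second kind of page that would be rewarded.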

Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn’t understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent.

LSI Definition.
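The “many words in common” step from the quote above can be sketched in a few lines of Python. This is only the word-overlap part, measured here with cosine similarity on raw word counts. Full LSI goes further and applies a singular value decomposition to the term-document matrix, which this sketch leaves out. The three sample documents and the function names are invented for the example.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphabetic words in a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sample documents: two on a related theme, one unrelated.
docs = {
    "loans": "compare personal loans for a new car or holiday",
    "car_finance": "finance a new car with a personal loan",
    "gardening": "plant tomatoes in spring and water them daily",
}

vectors = {name: term_counts(text) for name, text in docs.items()}
close = cosine_similarity(vectors["loans"], vectors["car_finance"])
distant = cosine_similarity(vectors["loans"], vectors["gardening"])

# The two finance documents share words; the gardening one shares none.
print(close > distant)
```

Note that “loans” and “loan” count as different words here because the sketch does no stemming. Handling variations like that is part of what a real indexing pipeline would have to do.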

Using this system makes it a lot more difficult for an SEO spammer to guess what semantically related words/phrases Google might assign most weight to in relation to the theme of the page.

It’s also very likely that Google will be stepping up the importance of high-quality backlinks and will almost certainly use LSI techniques to judge the impact of these links. So for my Insurance pages it might be worth my while trying to get links on a holiday site (travel insurance related) or a dentist’s site (dental insurance related) or on a car seller’s website (car insurance) as opposed to relying on big finance related directories.

This will probably have an impact on PR, which may be used more ‘locally’ to distribute ranking around your site and reward each page according to the number of themed pages that link directly to it (as opposed to through the home page). As ever, text links will do better than image links.

You can read the whole of the article I linked to above at the Middlebury College site.

6 Responses to “Latent Semantic Indexing And Google”

  1. jaffry June 14, 2005 at 18:16 #

    were you like the nerd that everyone in class went to for all the answers?

    🙂

    nice article.

  2. Matt Robin June 14, 2005 at 20:37 #

    Kev: Good article. Brandy is all about LSI – and it seems this has caught a lot of web designers ‘on the hop’, so to speak, where their former focal points for SEO tactics are now looking a bit misplaced. I think it’s good for the web though as it is really bringing more relevance to search engine listings (something that they haven’t had for quite some time IMO). When I first read about LSI, I thought ‘great, about bloody time!’ I agree, I think quality back-linking will gain more emphasis in the algo in the near future – I’d state that to anyone who really wants their site to get good rankings!

    Here’s a question (not too serious): ‘Is LSI to SEO, what xhtml is to Web Standards?’

    Another aspect that will have greater urgency over the coming months and the second half of this year is local area-specific search patterns. MSN, Yahoo, and Google are already addressing this, but the significance between the local-searching and wide-area searches (and the linked-connection between the two index systems) will possibly push web professionals to give more attention to local-geo attributes too.

  3. Kev June 14, 2005 at 22:09 #

    Thanks Jaffrey. It has been said before ;o)

    “‘Is LSI to SEO, what xhtml is to Web Standards?””

    Well I can see where you’re coming from Matt but LSI is managed by Google/Yahoo/whoever whereas XHTML is managed by us. But I do think you’re right in that it will ‘force’ people to become more responsible regarding SEO techniques in the same way that strict XHTML has ‘forced’ us to become more responsible developers.

  4. Tom June 15, 2005 at 15:21 #

    Sounds almost logical really but very very clever.

  5. Mike@TheWhippinpost July 7, 2005 at 02:39 #

    [QUOTE]Using this system makes it a lot more difficult for a SEO spammer to guess what semantically related words/phrases Google might assign most weight to in relation to the theme of the page.[/QUOTE]

    Well… where there’s a tilde 😉

    (Only just seen this – Get yo ass back to DDN, missin ya, cock!)

  6. Kev July 7, 2005 at 07:39 #

    Shhhhh ;o)

    I’ll be back soon – life’s been a tad hectic of late.
