A few days back, while reading some articles about Google AI, I came across TF-IDF being described as one of the factors Google uses to rank a webpage. In this post, we’ll look at what TF-IDF means, whether Google uses it as a ranking factor, and how best to make use of it if it does.
What is TF-IDF?
TF-IDF is a text analysis method that aims to reflect how important a particular word (or phrase, or term) is to a document within a collection of documents. The first part of its name, TF, stands for Term Frequency: how often a term appears in your text. The second part, IDF, stands for Inverse Document Frequency: it tells you how specific, or rare, that term is, based on how many other documents in the collection also contain it. A term that shows up in almost every document gets a low IDF; a term found in only a few documents gets a high one.
I know this is still a bit vague. Let’s just take an example, shall we?
Suppose you’ve finished writing an article about “cat food”, and the term “cat food” appears 17 times in your 500-word article. The raw term frequency is simply 17. The most basic refinement divides that count by the length of the article, 17 / 500 = 0.034, which is called the term frequency adjusted for document length. But there is more to it than that. A logarithmically scaled term frequency (for example, 1 + log of the count) dampens the effect of very large counts, so a term used 100 times doesn’t look 100 times as important as a term used once. Alternatively, an augmented frequency (the count divided by the count of the most frequent term in the same document, scaled into the range 0.5 to 1) prevents a bias towards longer documents.
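To make those variants concrete, here is a minimal Python sketch of the four term-frequency weightings described above. It is illustrative, not taken from any SEO tool. The helper names `count_phrase` and `tf_variants` are hypothetical, and the `max_count` of 34 assumes, purely for illustration, that the most frequent word in the article (say, “the”) appears 34 times.

```python
import math
import re


def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of a phrase in a text."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def tf_variants(count: int, doc_length: int, max_count: int) -> dict:
    """Return several common term-frequency weightings for one term.

    count: raw number of times the term appears in the document
    doc_length: total number of words in the document
    max_count: raw count of the most frequent term in the same document
    """
    return {
        "raw": count,
        # term frequency adjusted for document length
        "length_adjusted": count / doc_length,
        # 1 + log(count) dampens large counts; a count of 0 stays 0
        "log_scaled": 1 + math.log(count) if count > 0 else 0.0,
        # 0.5 + 0.5 * count / max_count keeps values between 0.5 and 1
        "augmented": 0.5 + 0.5 * count / max_count,
    }


# Hypothetical numbers from the "cat food" example: 17 uses in 500 words,
# with the most frequent word assumed to appear 34 times.
variants = tf_variants(count=17, doc_length=500, max_count=34)
for name, value in variants.items():
    print(f"{name}: {value:.3f}")
```

Counting a two-word phrase with a word-boundary regex is a simplification; real tools usually tokenize the text first and may also strip punctuation or stem words.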
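Now that we know about term frequency, what is IDF? IDF is usually computed as the logarithm of the total number of documents divided by the number of documents that contain the term. The logarithm keeps the scale manageable: without it, a term found in 1 out of 1,000,000 documents would be weighted a million times more heavily than a term found in every document. Note that IDF on its own says nothing about your article in particular; it measures how rare or common the term is across the whole collection, and that is what lets TF-IDF discount words that appear everywhere.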
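A short Python sketch of plain IDF, log(N / n_t), may help. The four tiny “documents” below are made up for illustration, and splitting on whitespace stands in for real tokenization. Many libraries use a smoothed variant of this formula, so their exact numbers will differ.

```python
import math
from typing import Sequence


def idf(term: str, docs: Sequence[str]) -> float:
    """Plain IDF: log(N / n_t), where N is the number of documents and
    n_t is how many of them contain the term at least once."""
    n_docs = len(docs)
    n_containing = sum(1 for doc in docs if term in doc.lower().split())
    if n_containing == 0:
        raise ValueError(f"{term!r} does not occur in any document")
    return math.log(n_docs / n_containing)


# Hypothetical mini-collection of four documents
docs = [
    "the cat ate the kibble",
    "the dog chased the cat",
    "the best salmon kibble for senior cats",
    "the weather today is sunny",
]
print(round(idf("the", docs), 3))     # in all 4 documents: log(1) = 0.0
print(round(idf("kibble", docs), 3))  # in 2 of 4 documents: log(2) = 0.693
```

A word like “the” that appears everywhere ends up with an IDF of zero, which is exactly how TF-IDF discounts filler words.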
Multiply the two together and you get the TF-IDF score of your term for your article. Plot those scores on a graph alongside the TF-IDF scores of your competitors’ pages, and you’ll be able to see how well or how badly your article is performing for your term, “cat food”.
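The comparison can be sketched in a few lines of Python: length-adjusted TF multiplied by plain IDF, computed for your page and some competitor pages. The page names and texts are hypothetical stand-ins for scraped competitor content, and the helper names are made up for this example; commercial TF-IDF tools may weight and normalize differently.

```python
import math
from typing import Dict, List


def count_term(tokens: List[str], term: List[str]) -> int:
    """Count occurrences of a (possibly multi-word) term in a token list."""
    k = len(term)
    return sum(1 for i in range(len(tokens) - k + 1) if tokens[i:i + k] == term)


def tf_idf_scores(term: str, pages: Dict[str, str]) -> Dict[str, float]:
    """Length-adjusted TF times plain IDF, log(N / n_t), for every page."""
    term_tokens = term.lower().split()
    tokenized = {name: text.lower().split() for name, text in pages.items()}
    counts = {name: count_term(toks, term_tokens) for name, toks in tokenized.items()}
    n_containing = sum(1 for c in counts.values() if c > 0)
    if n_containing == 0:
        return {name: 0.0 for name in pages}
    idf = math.log(len(pages) / n_containing)
    return {name: (counts[name] / len(tokenized[name])) * idf for name in pages}


# Hypothetical pages: yours plus three competitors
pages = {
    "my-article": "cat food guide how to choose cat food for a picky cat",
    "competitor-a": "the best cat food brands reviewed by vets and owners",
    "competitor-b": "how to groom a long haired cat at home",
    "competitor-c": "dog training tips for new owners",
}
scores = tf_idf_scores("cat food", pages)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} {score:.4f}")
```

Matching the phrase as a sequence of tokens lets the same function handle single words and multi-word keywords like “cat food”.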
Does Google use TF-IDF?
Google is a modern-day Illuminati, to say the least, when it comes to guarding its knowledge base. Nobody outside the company knows exactly how its algorithms work, so the guessing game is open to all. But TF-IDF is the most common weighting scheme for text-based recommender systems, as claimed by this research paper. So it seems a reasonable guess that Google uses TF-IDF, or some improved variant of it (maybe even TF-IDuF or TF-PDF?), as one of its ranking factors.
But to what extent? Google’s Webmaster Trends Analyst, John Mueller (the “webmaster” label is slowly fading these days), rarely mentions TF-IDF. Google has to use something to determine the relative importance of keywords on a given webpage compared with other pages, but we cannot be certain that TF-IDF is the algorithm it relies on.
So what to do? Last words
There are still plenty of tools that can plot those TF-IDF graphs and compare them with your competition for a target keyword, but in my opinion you can safely skip them.
Why do I say so? Because Google is constantly evolving its algorithm to better handle the human side of content. With neural networks and machine learning at the forefront, Google AI is bringing the search engine closer and closer to how a human reader reads and interprets text. TF-IDF, on the other hand, predates the World Wide Web by a long way: it dates back to 1972, while the Web was born in 1989.
Besides, the goal of TF-IDF isn’t hard to grasp. In practice it just means using your keyword, and its variants, on a more spread-out basis. With some basic keyword research, you can find the so-called LSI variants of your target keyword (LSI stands for Latent Semantic Indexing, but in SEO circles it basically means related terms). Then spread them through your content strategically, so they don’t cause any readability issues.
That’s it, then. Unless you want to be a technical SEO wizard, you don’t need to know about TF-IDF, and you certainly don’t need any of the tools that produce that data. Focus on creating content, and the algorithm will take care of your rankings.