At the SMX trade show held in Paris in June 2013, Benoit Arson, Business Analytics consultant at AT Internet, presented tips on how to work around the information lost to the growing number of “not provided” keywords, and on how companies can improve their SEO despite the changes Google has introduced over the last few months.
With Google’s “not provided” keywords, keyword data is no longer complete.
Keyword analysis, a very popular analysis that has been part of audience measurement tools ever since they were created, is increasingly affected by the changes introduced by Google. The diagram below shows the chronological order in which the changes took place.
The result of these changes: we know that a visit comes from Google, but we don’t know which keyword was used.
Google searches in secure mode have affected all sites, to different degrees depending on the browser used.
The study below, based on a total of 10,000 websites in France, shows the rate at which the number of lost keywords increased between November 2012 and April 2013. The steep increase between February and March corresponds to the new version 25 of Google Chrome switching searches to secure mode.
The share of Google searches made in secure mode varies according to the browser used. The graph below, taken from an AT Internet study published in December 2012 and based on approximately 8,000 sites, shows that the share of “not provided” keywords is much greater for Firefox than for the other browsers.
Which web analytics method should be used in SEO to deal with “not provided” keywords? What can be done against this loss of information?
Despite the secure mode (https), website managers and web analysts can get around the loss of keywords and determine those used by visitors who land on their site by analysing the destination pages.
For example: I carry out a search in secure mode (in other words, I am logged in to my account) for the word “pedometer”. I click on the link provided by Google, which takes me to the Decathlon website, and I land on a destination page containing a list of pedometers.
The manager of the Decathlon website knows that the visit comes from Google, with a “not provided” keyword. The website manager also knows that the entry page deals with pedometers.
The solution involves relying on the destination pages to infer what Internet users have searched for. The “not provided” traffic will land on different entry pages: from these entry pages, the site manager will be able to extract the general theme, and from the theme deduce the keyword entered.
The first method which can be used to reduce the loss of information associated with not provided keywords is to segment on “not provided” traffic and then to analyse the entry pages.
This solution, however, remains incomplete as there is a level of detail missing. If the entry page deals with pedometers, we can deduce that the searched keyword is based on pedometer, but we do not know the exact keywords which were used in the search: “pedometer”, “cheap pedometer”, “decathlon pedometer”, “red pedometer”, etc.
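The segmentation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the visit records, the field names (`source`, `keyword`, `entry_page`) and the `(not provided)` marker are hypothetical and do not reflect the export format of any particular analytics tool.

```python
from collections import Counter

# Hypothetical visit records; field names and values are assumptions
# made for this example only.
visits = [
    {"source": "google", "keyword": "(not provided)", "entry_page": "/pedometers"},
    {"source": "google", "keyword": "cheap pedometer", "entry_page": "/pedometers"},
    {"source": "google", "keyword": "(not provided)", "entry_page": "/running-shoes"},
    {"source": "google", "keyword": "(not provided)", "entry_page": "/pedometers"},
    {"source": "direct", "keyword": None, "entry_page": "/"},
]


def not_provided_entry_pages(visits, marker="(not provided)"):
    """Count the visits with a hidden keyword for each entry page."""
    return Counter(v["entry_page"] for v in visits if v["keyword"] == marker)


entry_page_counts = not_provided_entry_pages(visits)

# List entry pages from most to least "not provided" traffic; the theme of
# each page hints at the kind of keyword that was searched.
for page, count in entry_page_counts.most_common():
    print(page, count)
```

The output ranks entry pages by hidden-keyword traffic, which gives the theme of the searches but, as noted above, not the exact keywords.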
Your brand keywords may be decreasing artificially.
Secure searches may also have an impact on a site’s brand awareness as the “not provided” keywords also affect searches containing the name of your brand: brand keywords.
‘Not provided’ keywords might contain your brand name.
As the number of secure searches that include brand keywords rises, the line graph of brand awareness traffic steadily falls. The visits are not lost, however: brand awareness loses the visits from “not provided” keywords that contain the brand name, and those visits are counted as search engine traffic instead, whose line graph rises accordingly.
How to continue measuring brand awareness traffic
The diagram below shows the solution that we recommend, developed by our Surveys department: apply the share of brand awareness visits observed among the provided keywords to the “not provided” traffic. First, calculate the share of brand keywords among all of the provided keywords. Then apply this share to the traffic coming from “not provided” keywords to obtain the estimated brand keyword traffic hidden within it. Finally, add this estimate to the brand awareness traffic that you already know (thanks to the provided keywords) to estimate the overall traffic coming from brand keywords.
This solution assumes that the share of brand keywords is the same for both the provided and the “not provided” keywords.
For further information on how “not provided” keywords impact brand awareness traffic and on the solution for working around this loss of information, we recommend the article “How to overcome the impact of Google’s ‘not provided’ keywords on the brand awareness source”.
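As a worked illustration of this estimate, here is a minimal Python sketch. The function name, parameter names and figures are made up for the example; the calculation relies on the hypothesis stated above, namely that brand keywords have the same share among provided and “not provided” keywords.

```python
def estimate_brand_visits(provided_brand, provided_total, not_provided):
    """Estimate the total number of visits coming from brand keywords.

    Assumes the share of brand keywords is identical among provided
    and "not provided" keywords.
    """
    if provided_total <= 0:
        raise ValueError("at least one visit with a provided keyword is needed")
    # Share of brand keywords among the keywords we can still see.
    brand_share = provided_brand / provided_total
    # Apply that share to the traffic whose keyword is hidden.
    hidden_brand = brand_share * not_provided
    # Known brand traffic plus the estimated hidden brand traffic.
    return provided_brand + hidden_brand


# Made-up figures: 1,000 visits with a provided keyword, 250 of them on
# brand keywords, plus 400 visits with a "not provided" keyword.
estimated = estimate_brand_visits(
    provided_brand=250, provided_total=1000, not_provided=400
)
print(estimated)
```

With these figures the brand share is 25%, so 100 of the 400 hidden visits are attributed to brand keywords, for an estimated total of 350 brand visits.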
The impact of “not provided” keywords on the long tail
Google has shortened the Long Tail as a result of its “not provided” keywords.
As a reminder: in SEO, the long tail corresponds to the numerous keywords which, individually, generate very few searches but which, once their traffic is combined, form the largest proportion of searches made on search engines.
The graphs below show a comparison of the long tail before and after the phenomenon of the “not provided” keywords:
Before the phenomenon of the “not provided” keywords we had, on the extreme left, the keywords which generated the most visits followed by the other keywords making up the long tail (with each keyword representing only a few visits, but when all of the keywords are added together they represent a substantial share of the traffic).
After the introduction of the “not provided” keywords, the diagram on the right shows a single keyword which has captured a large part of the search engine traffic (the famous [-] of the “not provided” keywords), followed by a much shorter long-tail curve.
This loss of information on the long tail is very damaging to site managers, SEO managers and web analysts, as they no longer know whether they are capturing the long-tail traffic. This traffic is a very important combination of very specific keywords, searched by visitors who know exactly what they want and who are very close to making a purchase on the site.
How to evaluate long-tail traffic
To overcome the loss of information due to “not provided” keywords and still measure long-tail traffic, count the number of distinct entry pages that receive search engine traffic. The more distinct entry pages receive search engine traffic, the more likely you are to be capturing long-tail traffic.
In conclusion, note the paradox: at a time when more and more data is available, that data is becoming less complete. We may have more data at our disposal, but it is vaguer than before: “Google, Firefox and Apple are taking us from big data to blur data.”
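The count described above could be computed along the following lines. This is a rough sketch: the visit records, the field names and the list of sources treated as search engines are assumptions chosen for the example.

```python
# Hypothetical visit records; field names and values are assumptions.
visits = [
    {"source": "google", "entry_page": "/pedometers"},
    {"source": "google", "entry_page": "/pedometers/red"},
    {"source": "bing", "entry_page": "/running-shoes"},
    {"source": "google", "entry_page": "/pedometers"},
    {"source": "direct", "entry_page": "/"},
]


def count_search_entry_pages(visits, search_sources=("google", "bing", "yahoo")):
    """Return how many distinct entry pages received search engine traffic."""
    pages = {v["entry_page"] for v in visits if v["source"] in search_sources}
    return len(pages)


distinct_pages = count_search_entry_pages(visits)
print(distinct_pages)
```

Tracking this number over time (for example month by month) gives an indirect indicator of long-tail coverage: a growing count suggests that search engines are sending visitors to a wider variety of pages.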