Over the last several years, Google has slowly reduced the amount of data available to SEO practitioners.
You can read more about this in Russ Jones's excellent article, which details the impact of his company's research and insights into clickstream data for volume disambiguation.
Common Crawl is an open source project that crawls a large portion of the web at regular intervals and makes the resulting data freely available.
In addition to Common Crawl data, there is a non-profit called Common Search whose mission is to create an alternative open source and transparent search engine — the opposite, in many respects, of Google.
This piqued my interest because it means that we all can play, tweak and mangle the signals to learn how search engines operate without the huge time investment of starting from ground zero.
Currently, Common Search uses the following data sources for calculating its search rankings (this is taken directly from their website):