Towards a Semantic, Multimodal, AI-powered Search Engine

In memory of Bill Slawski, the generous and tireless Indiana Jones of Google Patents. Your contributions to the SEO community and the understanding of Google’s algorithms will never be forgotten.

Feb 06, 2023 will always be remembered as the date that marked a significant shift in the world of information retrieval technology. The spread of ML technologies and the integration of ChatGPT, a conversational agent developed by OpenAI, into Microsoft’s Bing search engine, as well as into its other products, from GitHub to Power BI and soon the Office 365 suite, through a multibillion-dollar partnership, forced Google to take action. In response, Sundar Pichai, CEO of Google and Alphabet, published an article entitled “An important next step on our AI journey” announcing Google’s commitment to adapting to this changing landscape and advancing its own AI capabilities.

This timeline infographic, updated over time, is a testament to the evolution of Google from a mere lexical search engine to a semantic search engine, capable of understanding the natural language of both user queries and web pages. It charts the journey of Google as it progressed from a simple match between keywords to a deeper understanding of the context and intent behind the user’s search, filtered by their search habits and preferences.

With the advent of ML technologies, the timeline now serves as a historical record of Google’s journey towards becoming a leader in the field of information retrieval and a symbol of the technological revolution that is transforming the way we access and interact with information.

An incarnation of the Semantic Web 

This transition is supported, on the one hand, by the adoption of some founding principles of Tim Berners-Lee’s vision known as the “Semantic Web” and, on the other hand, by the increasing use of machine learning technologies.

As the late Bill Slawski, the Indiana Jones of Google patents, pointed out, the first “semantic” invention mentioned in a Google patent dates back to 1999 (read his article Google’s First Semantic Search Invention was Patented in 1999). However, it was in 2012, with the launch of the Knowledge Graph (which we have indicated as one of the core principles of the Semantic Web), and later with the release of RankBrain, that this transition became evident.

In this timeline, we will include not only the major Core Updates but also other core Google products based on ML, ranging from Google Translate to Google Photos, Lens, and Google Assistant.

Finally, we’ll cover the open-source release of the algorithms and frameworks for creating machine learning models, which let anyone exploit their potential.

We will also share some insights on new technologies that have been announced to academia and the public but not yet implemented.

The end of the retrieval era and the beginning of the machine learning era

In 2015, Sundar Pichai made strong statements about Google’s new course on the adoption of machine learning during the quarterly financial report of Alphabet, Google’s parent company.

“Machine learning is a core, transformative way by which we’re rethinking everything we’re doing,” he said.

“Our investments in machine learning and artificial intelligence are a priority for us,” he added. “We’re thoughtfully applying it across all our products, be it search, ads, YouTube, or Play. We’re in the early days, but you’ll systematically see us think about how we can apply machine learning to all these areas.”

In some ways, these statements mark the end of an era: the era of “retrieval” and of that genius Amit Singhal, who managed to bring these technologies to the most significant milestones ever reached.

“Singhal was an acolyte of Gerald Salton, a legendary computer scientist. His pioneering work in document retrieval inspired Singhal to help revise the grad-student code of Brin and Page into something that could scale in the modern web era” (as told by Wired in an article titled “How Google is Remaking Itself as a Machine Learning First Company”).

Taking the Search: some key figures

One of the key figures in this transition, also responsible for the internal training of Google engineers in ML, was David Pablo Cohn, ex-Tech Lead for Google Labs.

The ultimate turning point came when machine learning became an integral part of Search, forever transforming the SERP, its flagship and most profitable product.

To some extent, Search has always relied on artificial intelligence. However, for many years, the company’s most precious algorithms, those that supplied the “ten blue links” in response to a search query, were thought too critical for ML’s learning algorithms.

“Because search is such a large part of the company, ranking is very, very highly evolved, and there was much skepticism you could move the needle very much,” says Giannandrea.

At the beginning of 2014, “We had a series of discussions with the ranking team,” says Jeff Dean of the Brain Team. “We said we should at least try this and see. Is there any gain to be had?” The experiment his team had in mind turned out to be central to Search: how well a document in the ranking matches a query (as measured by whether the user clicks on it). “We sort of just said, let’s try to compute this extra score from the neural net and see if that’s a useful score.”

It turned out the answer was yes, and the system, known as RankBrain, is now part of Search (it went online in April 2015).

“It was significant to the company that we were successful in making search better with machine learning,” says Giannandrea, another key figure in Google’s early transition to a machine learning-first company.

Giannandrea was the founder of Metaweb and Freebase (acquired by Google), the first bricks on which the Google Knowledge Graph would be built, and he strongly supported the spread of machine learning until 2018, when he moved to Apple. Machine learning, which we could define as “programs that generate programs” (or algorithms that generate models, made from data and rules, to generate new data), had a somewhat disruptive impact on software engineers used to having complete control over what they made machines do through their code: a revolutionary transformation of mindset swept through Google.

Open-sourcing technologies

It was precisely to support its engineers that Google’s Brain Team created TensorFlow, releasing it to the public in November 2015, and even built processors dedicated to running machine learning models: the Tensor Processing Units.

Search On 21

From then to now, the evolution has been exponential and the progressive introduction of MUM, announced at the Search On streaming event on 29 Sep 2021, marks a further shift towards multimodality.

During the event, Google unveiled a slew of new features that, taken together, represent the company’s most ambitious efforts yet to persuade users to do more than input a few phrases into a search box. The company intends to start a virtuous cycle using its new Multitask Unified Model (MUM) machine learning technology to deliver deeper, more context-rich answers. In turn, it hopes, users will ask more detailed and context-rich queries.

What Prabhakar Raghavan, Senior Vice President of Google, is seeing is a profound change not only in technology, driven by Google’s increasingly complex AI, but also in our search habits and in the daily practices of us SEOs. The approach to content marketing and the investments of companies will also have to change. Companies will have to produce more and more quality multimedia content as part of a wider ecosystem that extensively covers the topics on which they want to establish their authority in the eyes of Google and their potential clients.


We would like to sincerely thank some of the most relevant SEOs in the international community for contributing in various ways to the creation of this Timeline: Dawn Anderson, Jason Barnard, Kevin Indig, Cindy Krum, Bill Slawski, Andrea Volpini.

Together with them, we will accompany you on this journey that starts from the establishment of the Brain Team and follows the path of Google towards a Semantic, Multimodal, and AI-powered Search Engine.


How it all began…

Makoto Academy | 2011 -  Google Brain Team - where it all happens

Google Brain Team – where it all happens


The Google Brain team, created in the summer of 2011 by Jeff Dean (the New Yorker dedicated a long article to him) and Stanford professor Andrew Ng, consists of researchers who develop artificial intelligence technologies used by Google products such as Google Translate, Google Photos, and Google Assistant. A lot of the Google Brain Team’s research is open source and available for everyone to look at. Some of their projects may be more advanced than those available to the general public.

A retrospective article about what the team has accomplished is published on the official blog every year.
The latest article is titled:
Google Research: Looking Back at 2020, and Forward to 2021

Some of the major projects of the Brain Team are:

GNMT (the Google Neural Machine Translation project) was released in 2016 and deployed within a wide range of Google services like Gmail, Books, Android, and web search.

TensorFlow is an open source software library that allows anyone to utilize machine learning by providing the tools to train one’s own neural network. In Sep 2019 TensorFlow 2.0 became officially available.



Google Knowledge Graph – from Strings to Things

Makoto Academy | 2012 - Google Knowledge Graph - from Strings to Things

In May 2012, Amit Singhal, then SVP of the company, signed a seminal article, a true watershed published on the official Google Blog and titled “Introducing the Knowledge Graph: things, not strings.”

The second part of the title is as evocative as it is opaque at first glance.

The strings to which it refers are the old keywords, the bane and delight of every SEO, and the things are nothing but the entities increasingly talked about in the SEO community. It must be said that Google uses the term “topic” more often, both in public communications and in its patents.

The KG represents how Google understands and stores all its factual information about the world and connects with content across the Internet. The goal of the KG is to transition Google “from being an information engine [to] a knowledge engine” (as stated in the official presentation video included in this infographic).

The technologies at the core of the KG are certainly not a novelty introduced by Google. They are the most widespread implementation of the vision and tech stack of the Semantic Web. They are also rooted in a concept dating back decades: the knowledge base.

Although it was first used in the Netherlands in an academic context, the term knowledge graph was popularized among the general public, and even within academia itself, thanks to Google’s marketing.

Knowledge Base

A knowledge base is a type of graph database where nodes represent real-world objects such as people, places, things, events, organizations, and concepts, and edges represent relations between them.

A KG is a directed, labeled graph whose edges are described by RDF triples.

Each triple consists of three parts: subject, predicate, and object or, seen in another way, entity-attribute-value or entity-relationship-entity.

This last formulation emphasizes the interconnection between entities and how each of them can be part of the description (an attribute) of other entities, thus forming a dense network.
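To make the triple structure concrete, here is a minimal sketch in Python. The facts, the predicate names (`painted`, `born_in`, `located_in`), and the helper functions are all hypothetical and chosen only for illustration; they do not reflect how Google stores its data.

```python
from collections import defaultdict

# Hypothetical facts, each written as a (subject, predicate, object) triple.
TRIPLES = [
    ("Leonardo da Vinci", "painted", "Mona Lisa"),
    ("Leonardo da Vinci", "born_in", "Vinci"),
    ("Mona Lisa", "located_in", "Louvre"),
    ("Louvre", "located_in", "Paris"),
]

def build_graph(triples):
    """Index triples by subject: each entry is a labeled, directed edge."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

def describe(graph, entity):
    """Return the attribute-value pairs (outgoing edges) of an entity."""
    return sorted(graph.get(entity, []))

def mentions(triples, entity):
    """Return the subjects whose description uses the entity as a value."""
    return sorted({s for s, _, o in triples if o == entity})

graph = build_graph(TRIPLES)
print(describe(graph, "Mona Lisa"))    # edges leaving the entity
print(mentions(TRIPLES, "Mona Lisa"))  # entities that point to it
```

The same entity ("Mona Lisa") appears as a subject in one triple and as an object in another, which is what produces the dense network described above.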

What differentiates a KG from a knowledge base is the presence of a “reasoning engine” capable of generating new knowledge. 
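As a rough illustration of what "generating new knowledge" can mean, the sketch below applies a single inference rule (transitivity of one predicate) to a handful of hypothetical facts. Real reasoning engines support far richer rules; the function name, the facts, and the `located_in` predicate are assumptions made for this example.

```python
def infer_transitive(triples, predicate):
    """Derive new triples by applying transitivity to one predicate
    until no further facts can be added."""
    known = set(triples)
    changed = True
    while changed:
        changed = False
        edges = [(s, o) for s, p, o in known if p == predicate]
        for s1, o1 in edges:
            for s2, o2 in edges:
                candidate = (s1, predicate, o2)
                if o1 == s2 and candidate not in known:
                    known.add(candidate)
                    changed = True
    # Return only the facts that were not stated explicitly.
    return known - set(triples)

# Hypothetical explicit facts.
FACTS = [
    ("Mona Lisa", "located_in", "Louvre"),
    ("Louvre", "located_in", "Paris"),
    ("Paris", "located_in", "France"),
]

inferred = infer_transitive(FACTS, "located_in")
for fact in sorted(inferred):
    print(fact)
```

None of the three inferred facts (for example, that the Mona Lisa is located in France) was stored explicitly; they follow from the stated facts plus the rule.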

What’s in Google Knowledge Graph?

In 2010 Google acquired Metaweb, the company that created Freebase, the first brick of the knowledge graph. Depending on the topic/entity category, other fundamental sources of information are Wikipedia, Wikidata, the CIA World Factbook, LinkedIn, and many other public or licensed databases (including some not present on the Internet), as well as dynamic information updated in real time from news sources.

In an official article from May 2020, Google states that its KG then contained over 500 billion facts about five billion entities.

Knowledge Graphs and SERPs: the Knowledge Panels

The information contained in the knowledge graph is presented in SERPs in cards, the so-called Knowledge Panels, triggered by specific queries (want more? Read How is a Knowledge Panel for an entity triggered? by Bill Slawski).

The usefulness of Knowledge Panels is, in Singhal’s words, to allow users to find the right thing (in case of ambiguous queries); get the best summary (about a topic/entity); go deeper and broader (through the suggestion of related topics).

Over the years, Knowledge Panels have been populated with dynamic information, such as the results of a sporting event. In some cases, they have become interactive too.

KPs share some discrete units of content, such as the name of the entity, its description (usually from Wikipedia or other authoritative human-curated sources), a set of attributes, and other multimedia content units (photos, videos, music) that change based on the category to which the entity belongs. There are several templates for presenting information related to the topic’s category. These templates consist of different content types and placeholders that define the position of the content items belonging to these types.

The importance of the Knowledge Graph

Beyond the information it returns to users, the KG also does a decisive “backend” job, so to speak, related to its capacity for “inferential reasoning.”

One crucial function is to disambiguate queries; that’s where the Knowledge Graph comes in. It allows Google to go beyond keyword matching and return more relevant results by extracting entities to understand the search context. 

Entity extraction is done by reading the on-page semantic metadata (the schema markup) and through Google’s Natural Language Processing (NLP) algorithms.

Google uses the extracted knowledge to train machine learning algorithms to predict missing relationships between entities. A confidence score is assigned to the various alternatives and is then used to answer natural language queries (e.g., posed to the Voice Assistant) using information present in the KG.

The entities extracted from the queries allow Google to form “augmented queries” and to present in the SERP a set of results related to them.

The KG update is partly an automatic process. While crawling the Internet and indexing documents, Google extracts facts about the entities it finds in these documents. Facts correspond to predicate-object pairs (or attribute-value or relation-entity depending on the angle from which we look).

According to the many patents analyzed by the indefatigable Bill Slawski, the KG would be able to self-update by answering self-generated queries starting from the missing information in a data set.

Knowledge Graphs and SEO

As explicitly stated in the title of Singhal’s article, SEOs should no longer focus on keywords and their correlates in the various elements of a page, because Google now goes on the hunt for facts about entities. Therefore, putting topic analysis at the core of the daily SEO job becomes indispensable. The output of this analysis is a topical map to be covered exhaustively at the level of a single page or a content cluster.

We could say that keyword research is to SEO as topic modeling is to Semantic SEO.

This doesn’t mean that keywords will disappear, since they remain the way topics are expressed.

Being present within the Knowledge Graph and having your own KP in the brand SERP gives you enormous visibility, both from desktop and mobile.

If you have a KP, it’s important to claim ownership of it to have more control (through feedback) in case of incorrect information.

User-specific Knowledge Graphs

A series of exciting patents highlights how Google could generate a series of mini KGs linked to each specific user. “This reminds me of personalized search but tells us that it is looking at more than just our search history – It includes data from sources such as emails that we might send or receive or posts that we might make to social networks. This knowledge graph may contain information about the social connections we have, but it also contains knowledge information about those connections. The patent tells us that personally identifiable information (including location information) will be protected, as well.” Bill Slawski

Google Knowledge Graph volatility

SEOs associate the concept of “volatility” with SERPs (SERP volatility) and core updates, and you hardly ever hear about Knowledge Graph volatility. Kalicube Pro is the only SaaS platform to provide a sensor for tracking KG updates. Here’s what Jason Barnard, “The Brand SERP Guy,” shared with us on this topic:

“The Knowledge Graph is Google’s understanding of the world. As it moves “from strings to things,” the Knowledge Graph is the “things” Google is talking about. Just like the “traditional” core search algorithms, Google’s understanding of things is driven by algorithms and data, which are both regularly updated. Tracking those updates will become as important as tracking core algorithm updates. In a world where Google has truly moved from strings to things, these Knowledge Graph updates may prove to be even more significant.

Kalicube has been tracking Knowledge Graph volatility and updates since 2015. Up until late-2021, those updates were generally between 2 weeks and 2 months apart and never coincided with core algorithm updates or high SERP volatility (as measured by tools such as SEMrush and Rank Ranger). However, end-of-year 2021 saw a significant increase in Knowledge Graph update frequency (8 updates in a 6 week period) coupled with multiple Google core algorithm updates and massive SERP volatility. I believe that 17th November 2021 will prove to be a key date that we will come to see as massively significant in semantic SEO. That is the first time an announced core algorithm update coincided with a Knowledge Graph update in this way. Moving into 2022 and beyond, I am sure it will not be the last.”


Word2Vec – the power of Vectors

Makoto Academy | 2013 - Word2Vec - the power of Vectors

Word2Vec is a powerful technique for training neural networks to generate distributed vector representations of words. These vectors encapsulate syntactic and semantic similarities among words, enabling us to represent any word as a point in a high-dimensional space, where nearby points signify semantically related words. Models like Word2Vec are honed on vast text corpora and subsequently employed to represent words in novel contexts. They stem from the concept that words appearing together in a document are likely to share an underlying semantic connection.

Word embeddings, which were initially introduced by Bengio et al., provide a numerical representation of words and entire sentences. They were developed to address the so-called curse of dimensionality, a common issue in statistical language modeling. Bengio’s algorithm enabled a neural network to represent words by their semantic neighbors rather than solely by their position in a word sequence.

This groundbreaking algorithm comprises a simple three-layer architecture: an embedding layer, a hidden layer, and a softmax output layer. The embedding layer maps words into a low-dimensional vector representation, while the hidden layer conducts nonlinear transformations on the inputs to produce a set of intermediate features. Finally, the softmax output layer generates a probability distribution over the vocabulary.
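The last step of this architecture, the softmax output layer, can be sketched in a few lines of Python. The tiny vocabulary and the raw scores below are made up for illustration; in a real model the scores would come from the hidden layer.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution.
    Subtracting the maximum first keeps exp() numerically stable."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and raw output scores for the next word.
vocabulary = ["pizza", "pasta", "laptop"]
scores = [2.0, 1.0, -1.0]

probabilities = softmax(scores)
for word, p in zip(vocabulary, probabilities):
    print(f"{word}: {p:.3f}")
```

The probabilities always sum to 1, and the word with the highest raw score receives the highest probability.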

In 2013, Mikolov et al. introduced Word2Vec, a model resembling Bengio’s original design, but with the hidden layer removed. A standard Word2Vec implementation consists of three stages: preprocessing, training, and inference. Preprocessing entails tokenizing the corpus, eliminating punctuation, lowercasing the tokens, and converting them into one-hot encodings. Training involves feeding the resulting matrix through the neural network, and inference consists of generating a new vector for each word in the vocabulary.
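The preprocessing stage described above can be sketched as follows. This is a simplified illustration using only the Python standard library; the sample sentence and the function names are assumptions, and production pipelines typically use more careful tokenizers.

```python
import re

def tokenize(text):
    """Lowercase the text and keep only alphanumeric tokens,
    which also drops punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocab(tokens):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab

def one_hot(token, vocab):
    """Return a vector of zeros with a single 1 at the token's index."""
    vector = [0] * len(vocab)
    vector[vocab[token]] = 1
    return vector

# Hypothetical one-sentence corpus.
corpus = "Google understands things, not strings. Things, not keywords!"
tokens = tokenize(corpus)
vocab = build_vocab(tokens)

print(tokens)
print(one_hot("things", vocab))
```

Each one-hot vector is as long as the vocabulary, which is why the network then learns a much shorter dense vector for every word.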

Word2Vec has become a widely adopted approach for learning word embeddings through shallow neural networks. Developed at Google in 2013 by Tomas Mikolov, Word2Vec marked a crucial milestone in the advancement of Natural Language Processing.

Word2Vec creates vector representations, or embeddings, of words. These embeddings represent each word in a sentence as a point in a high-dimensional space. While computers excel at processing numbers, they struggle with interpreting human languages. By converting words into vectors, we can effectively teach computers to comprehend our language.

Word2Vec employs a distributed representation of words, creating a vector with several hundred dimensions for each word. These words are represented by distributions of weights across these elements, and when comparing two words, the distances between their corresponding distributions are assessed.

Words situated close together in this space generally exhibit semantic similarity. The quality of the vectors hinges on various factors, such as the volume of training data needed to learn word meanings and the size of the vectors themselves.
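A common way to measure how close two word vectors are is cosine similarity. The sketch below uses three hand-written, three-dimensional vectors that are purely hypothetical; real Word2Vec embeddings are learned from data and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (not produced by a trained model).
EMBEDDINGS = {
    "pizza": [0.9, 0.1, 0.0],
    "pasta": [0.8, 0.2, 0.1],
    "laptop": [0.0, 0.1, 0.9],
}

def most_similar(word, embeddings):
    """Return the other word whose vector is closest to the given word."""
    scores = {
        other: cosine_similarity(embeddings[word], vector)
        for other, vector in embeddings.items()
        if other != word
    }
    return max(scores, key=scores.get)

print(most_similar("pizza", EMBEDDINGS))
```

With these toy vectors, "pizza" lands next to "pasta" and far from "laptop", mirroring the idea that nearby points are semantically related.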

Moreover, these word vectors can be utilized to derive word clusters from extensive datasets. In the context of SEO, word embeddings can play a critical role in analyzing and understanding textual data, allowing experts to optimize their content and strategies accordingly.

The power of word embedding for SEOs

It is indeed possible to create embeddings for all queries taken from Google Search Console (SC) and subsequently analyze them using visualization tools like Google’s Embedding Projector. This approach can provide valuable insights into user search behavior and intent, helping SEO experts optimize their content and strategies more effectively.

By generating embeddings for each query in Google SC, you can create a high-dimensional representation of the search terms that users employ to find your website or content. Once you have these embeddings, you can utilize Google’s Embedding Projector, an open-source tool for interactive visualization and analysis of high-dimensional data, to explore the relationships between these queries.
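The Embedding Projector can load a tab-separated file of vectors together with an optional metadata file containing one label per line. The sketch below prepares both as strings; the queries and their vectors are hypothetical placeholders, and in practice the vectors would come from whichever embedding model you choose.

```python
def to_projector_tsv(queries, vectors):
    """Build the vectors and metadata TSV contents for the Embedding Projector.
    One line per query in each file, in the same order."""
    if len(queries) != len(vectors):
        raise ValueError("each query needs exactly one vector")
    vector_lines = ["\t".join(f"{value:.4f}" for value in vector) for vector in vectors]
    # Tabs inside a label would break the single-column metadata file.
    metadata_lines = [query.replace("\t", " ") for query in queries]
    return "\n".join(vector_lines) + "\n", "\n".join(metadata_lines) + "\n"

# Hypothetical Search Console queries with made-up 3-dimensional embeddings.
queries = ["semantic seo", "knowledge graph seo", "buy running shoes"]
vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
]

vectors_tsv, metadata_tsv = to_projector_tsv(queries, vectors)
print(vectors_tsv)
print(metadata_tsv)
```

Saving the two strings to files (for example `vectors.tsv` and `metadata.tsv`, names chosen here only as an example) gives you what the tool's load dialog asks for.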

With Google Embedding Projector, you can apply techniques such as t-SNE, PCA, or UMAP to reduce the dimensionality of your embeddings while preserving their overall structure. This allows you to visualize the semantic similarities and relationships between queries in a more comprehensible way, revealing patterns and clusters that might not be apparent otherwise.

By identifying these patterns and clusters, you can gain a better understanding of the topics, keywords, and search intent driving users to your content. This knowledge can inform your SEO strategy, helping you create and optimize content that better aligns with user needs and interests.

Moreover, analyzing query embeddings can also help you uncover gaps in your content strategy and identify opportunities for improvement. For instance, you might discover a cluster of related queries for which you have little or no content, indicating an area where you could develop new, targeted content to capture additional organic search traffic.

In summary, creating embeddings for queries from Google SC and analyzing them with tools like Google Embedding Projector can provide valuable insights for SEO experts. By understanding the relationships between user queries, you can optimize your content strategy to better meet the needs of your audience and improve your website’s overall search performance.

Makoto Academy | 2013 - Hummingbird - a User-focused Algorithm

Hummingbird – a User-focused Algorithm


In September 2013, Google introduced its new search algorithm, Hummingbird. At its 15th anniversary special event, Google announced that this new algorithm was specially designed to handle complex queries and would allow the search engine to interpret queries in a more nuanced manner, much like a human. This marked the beginning of Google’s attempt to give more emphasis to natural language queries, considering the context rather than the individual words.

The Hummingbird – A User-focused Algorithm 

Unlike the earlier algorithm updates that focused on making Google better at retrieving information, Hummingbird focuses on the user. The prime focus of this algorithm is to make Google understand the intent behind the user’s query and help them find the most relevant answers to their queries. 

An interesting tidbit – this update was named the Hummingbird update because, well, it is believed it made Google’s core algorithm more accurate and quick.

With this update, Google claimed it could better judge the search context and thus be able to ascertain accurately what the user really wanted to find out. The idea was to give results based more on the meaning of the whole query (semantic understanding) than the exact keyword match. 

So, let us assume you searched for “Which place can I buy iPhone 5s cheapest near me?”. Earlier search engines would try to return exact matches for the words, say “buy”, “iPhone 5s”, and so on. Thus, any webpage that had these words might appear in the results.

Hummingbird, on the other hand, works on the meaning of the query, the semantics. It understands place as a physical location and the iPhone 5s as a device. The knowledge of these semantics helps the search engine understand the intent of the query better, and thus the probability of more relevant results is higher.

How Does Hummingbird Work?

When queried about the Hummingbird algorithm, Matt Cutts, an ex-software engineer at Google, described the update as a “complete re-write of the core algorithm”. 

This may sound phenomenal, and Hummingbird may seem like a brand-new algorithm. However, it’s not. By re-write, Cutts meant that, with Hummingbird, the core algorithm has been reviewed and rewritten to make it more effective and efficient in its job. 


“Hummingbird is a rewrite of the core search algorithm. Just to do a better job of matching the users queries with documents, especially for natural language queries, you know the queries get longer, they have more words in them and sometimes those words matter and sometimes they don’t.”

Matt Cutts, Former Software Engineer, Google


Thus, Hummingbird was a paradigm shift in how search engines deciphered and understood long, conversational, natural language queries. To do this, the Hummingbird relies on a query expansion strategy that comprehends lengthy natural language queries similar to how people speak. From searching word-by-word and matching a webpage that has all the words in a search query, Hummingbird empowered search engines to ignore some words and understand the context and intent instead. 

But how exactly does Hummingbird achieve this? Let us explore!

The Google Patent on Query Revision

Google uses different signals to understand what exactly the user wants to know from their search query. The most significant among these is a history or pattern of what the earlier searchers had wanted.

Google’s patent on query revision techniques using highly-ranked queries seems to hold the key to how this is done. The patent explains how search engines use “a system and a method that uses session-based user data to more correctly capture a user’s potential information need based on analysis of strings of queries other users have formed in the past”. 

In simple terms, Google uses information on what previous searchers have searched, the results they clicked or hovered over, and other such information to determine what the current user really wanted to know and include those signals to identify pages that ranked for the query. 

Bill Slawski, author of the “SEO by the Sea” blog, explained this Google patent in his blog. He showed that the patent uses a co-occurrence measure to determine search term-synonym pairs, depending on the frequency with which those terms appear together or tend to occur together in other related searches. Bill believed that while evaluating the searcher’s intent, Google could consider several synonyms from its synonym database that could match the query’s context. This database could also include other measures like the confidence level for a term to represent a synonym or substitute.
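To give a feel for what a co-occurrence measure over related searches might look like, here is a minimal sketch. The patent does not publish a formula in this article, so the Jaccard-style ratio used below is an assumption chosen for simplicity, and the search sessions are invented.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_scores(sessions):
    """Score each pair of terms by: sessions containing both terms
    divided by sessions containing at least one of them."""
    term_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        terms = set()
        for query in session:
            terms.update(query.lower().split())
        term_counts.update(terms)
        pair_counts.update(combinations(sorted(terms), 2))
    return {
        pair: both / (term_counts[pair[0]] + term_counts[pair[1]] - both)
        for pair, both in pair_counts.items()
    }

# Hypothetical sessions: each inner list holds queries typed by one user.
sessions = [
    ["best place pizza", "best restaurant pizza"],
    ["place to eat", "restaurant to eat"],
    ["pizza recipe"],
]

scores = cooccurrence_scores(sessions)
print(scores[("place", "restaurant")])
print(scores[("pizza", "place")])
```

In this toy data, "place" and "restaurant" always show up in the same sessions, so they score far higher than "pizza" and "place", making them the stronger candidate for a term-synonym pair.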

For example, for a query like “what’s the best place to eat Chicago-style pizza?”, the following image illustrates how the query might be evaluated.

Google Hummingbird Patent


What’s the Impact of the Hummingbird Update?

In his interview, Matt Cutts from Google claimed that the Hummingbird update impacted close to 90% of search queries (at 1:20:16), yet few SEOs noticed any change.

This is because, according to Google, there is nothing special or new that SEOs need to focus on. Google’s guidance on creating original, high-quality, relevant and useful content remains unaltered. To reiterate, the Hummingbird merely enables Google to understand the searcher’s intent and search context to show more relevant and useful results. 

But the following might be a few areas to focus on:

  1. Use keyword research not to identify the keywords to load content with but to understand the search intent – what exactly the audience wants.
  2. Avoid traditional SEO writing, where you write a specified combination of words in a specific number of occurrences. Focus instead on adding more value to the content with related topics.
  3. Avoid keyword stuffing. The Hummingbird update was focused on understanding the user’s search intent. This means web content should be optimized with the goal of enabling Google to understand what your site is all about. Use synonyms and related terms to increase the co-occurrence measure.
  4. Create high-quality, useful content. It should be noted that the Hummingbird update was also Google’s means of sifting through unwanted content and discovering useful, relevant content. 

Final Thoughts

The Hummingbird update is considered a real breakthrough in Google’s attempts to better understand search queries. It is also a step in the right direction regarding how modern searches are made – more commonly known as conversational searches. Although the initial effect of the Hummingbird update was more subtle, this is bound to lead to a robust search experience – one where the user can query in their spoken language and yet find the most suitable results. 




Another important step towards an increasingly sophisticated Natural Language Understanding (NLU)

Taking a significant stride towards increasingly sophisticated Natural Language Understanding (NLU), the Seq2Seq model has become an essential tool for various tasks.

Originally developed by Google for machine translation, the Seq2Seq model has since been repurposed for numerous other applications, including summarization, conversational modeling, and image captioning. As stated on the Seq2Seq GitHub page, this framework can be used or extended for any problem that involves encoding input data in one format and decoding it into another.

The sequence-to-sequence model is a type of Recurrent Neural Network (RNN) architecture. It processes one sequence of tokens as input and generates another sequence of tokens as output. The model comprises two primary components: an encoder and a decoder.

The encoder processes the input sequence and summarizes the information in an internal state, commonly referred to as a context vector. The output from the encoder is discarded, and only the context vector is retained for further processing.

The purpose of the context vector is to encapsulate information about all input elements to enable the decoder to make accurate predictions. By effectively summarizing the input sequence, the context vector serves as a bridge between the encoder and decoder, ensuring a smooth transition of information and facilitating the generation of coherent output sequences.
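To make the encoder/decoder split concrete, here is a minimal sketch in plain Python. It is a toy, not Google's Seq2Seq framework: the element-wise weights, the hidden size, and the function names (rnn_step, encode, decode) are assumptions chosen for illustration, and the weights are random rather than trained. It only shows how the whole input is squeezed into one context vector that the decoder then starts from.

```python
import math
import random

def rnn_step(state, x, w_state, w_input):
    """One toy recurrent step, applied element-wise to the hidden state."""
    return [math.tanh(ws * s + wi * x) for s, ws, wi in zip(state, w_state, w_input)]

def encode(inputs, w_state, w_input):
    """Read the input sequence and return only the final state (the context vector)."""
    state = [0.0] * len(w_state)
    for x in inputs:
        # Intermediate encoder outputs are not kept; only the state carries forward.
        state = rnn_step(state, x, w_state, w_input)
    return state

def decode(context, steps, w_state, w_input):
    """Generate `steps` values, starting from the context vector."""
    state, previous, outputs = context, 0.0, []
    for _ in range(steps):
        state = rnn_step(state, previous, w_state, w_input)
        previous = sum(state) / len(state)  # toy readout: mean of the hidden state
        outputs.append(previous)
    return outputs

random.seed(0)  # untrained, random weights: this only shows the data flow
HIDDEN_SIZE = 4
enc_w_state, enc_w_input, dec_w_state, dec_w_input = (
    [random.uniform(-1, 1) for _ in range(HIDDEN_SIZE)] for _ in range(4)
)

context = encode([0.5, -0.2, 0.9], enc_w_state, enc_w_input)  # 3 inputs -> 1 vector
outputs = decode(context, 2, dec_w_state, dec_w_input)        # 1 vector -> 2 outputs
print(len(context), len(outputs))  # prints "4 2"
```

Note that the input and output sequences have different lengths: the fixed-size context vector is the only thing that passes between the two halves.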


RankBrain – Driving the Future of Search


Introduced in October 2015, Google RankBrain is now the third-most significant ranking factor among Google’s algorithms, as confirmed by Andrey Lipattsev, a senior search strategist at Google. 

What is RankBrain? 

In simple terms, RankBrain is essentially a component of Google’s core algorithm. 

According to Google, RankBrain helps the search engine “understand how words are related to concepts. It means we can better return relevant content even if it doesn’t contain all the exact words used in a search, by understanding the content is related to other words and concepts.”

It uses machine-learning technology to help Google deliver more accurate results for any given search. It does this by allowing Google:

  • To better interpret the searcher’s intent behind a search term.
  • To understand how words can be related to concepts. 
  • To return relevant content even if they don’t contain all of the words given in the search query. 

Thus, Google RankBrain empowers the search engine to move beyond just “strings” and focus on “things.”

Why did Google Introduce RankBrain?

Until RankBrain happened, much of Google’s algorithm was hand-coded. Google engineers would constantly be involved in testing an update and implementing the same if it worked. 

In fact, it is no exaggeration to claim that RankBrain revolutionized how search results were determined. As a machine learning system, RankBrain can now adjust its algorithm based on how satisfied users are with the search results it delivers.

How Does the RankBrain Work?

Before RankBrain, Google would interpret each of the words in the string and bring back “matching” content. Today, Google seems to understand what you are asking. RankBrain makes it easier for Google to figure out what the user actually wants to know. How? By helping Google match the given search term with terms Google is already familiar with. 

Before we elaborate on how RankBrain does this magic, let us understand some associated terms.


Entities

As defined by Google, entities represent “A thing or concept that is singular, unique, well-defined, and distinguishable.” Entities can be single words or phrases. 

Dwell Time

It denotes the duration of time that a user spends on any webpage after clicking on its link from the search results and before returning to the SERP.

Using entity references in unstructured data

In a patent from 2013 titled “Question answering using entity references in unstructured data,” Google described a method for answering queries. For any given query:

  • Google submits the entities for indexing
  • It then reviews these entities with their top 10 search queries
  • Then, it extrapolates the various entities that are expected to be related to each other or to the top results of the queries.


RankBrain is a machine learning system. This means it has the inherent ability to evaluate, track and adjust the algorithm on the basis of search success metrics it gleans. The algorithm determines how it weighs different signals, looks for specific patterns, and then develops rule blocks to manage any such query. 

Click-through Rate (CTR)

This denotes the percentage of users who click on a link out of all the users who see it in the search results.

Bounce Rate

Bounce rate is the percentage of users who leave a website after checking a single page. 

Pogo Sticking

This is a search behavior where a user clicks on one result from the SERP and then immediately returns to click on a different result.
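The metrics defined above can be computed from a simple interaction log. The sketch below is only an illustration: the Impression record, its field names, and the 10-second cut-off for an "immediate" return are assumptions, and nothing here reflects how Google actually measures these signals. CTR is taken as clicks divided by impressions.

```python
from dataclasses import dataclass

POGO_DWELL_SECONDS = 10.0  # assumed cut-off for an "immediate" return to the SERP

@dataclass
class Impression:
    """One appearance of a result in the SERP (hypothetical log record)."""
    clicked: bool
    dwell_seconds: float = 0.0            # time on the page before returning to the SERP
    pages_viewed: int = 0                 # pages viewed on the site during the visit
    clicked_another_result: bool = False  # user came back and picked a different result

def summarize(impressions):
    """Compute CTR, average dwell time, bounce rate and pogo-sticking rate."""
    clicks = [i for i in impressions if i.clicked]
    if not clicks:
        return {"ctr": 0.0, "avg_dwell": 0.0, "bounce_rate": 0.0, "pogo_rate": 0.0}
    bounces = [i for i in clicks if i.pages_viewed == 1]
    pogos = [i for i in clicks
             if i.clicked_another_result and i.dwell_seconds < POGO_DWELL_SECONDS]
    return {
        "ctr": len(clicks) / len(impressions),
        "avg_dwell": sum(i.dwell_seconds for i in clicks) / len(clicks),
        "bounce_rate": len(bounces) / len(clicks),
        "pogo_rate": len(pogos) / len(clicks),
    }

log = [
    Impression(clicked=True, dwell_seconds=120.0, pages_viewed=3),
    Impression(clicked=True, dwell_seconds=5.0, pages_viewed=1, clicked_another_result=True),
    Impression(clicked=False),
    Impression(clicked=True, dwell_seconds=45.0, pages_viewed=1),
    Impression(clicked=False),
]
stats = summarize(log)
print(stats)
```

In this made-up log, three of five impressions are clicked (CTR 0.6), two of the three visits are single-page bounces, and one visit counts as pogo sticking.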

Now, let us return to how RankBrain uses these concepts to evaluate search queries.

There are three parts to how RankBrain functions:

  • Understanding and interpreting search terms (keywords).
  • Measuring searchers’ interaction with search results.
  • Tweaking and fine-tuning its algorithm for future searches

Understanding Queries

One of the primary techniques that RankBrain employs to understand queries is Entity Recognition. 

When RankBrain sees a query, it checks whether the query contains the same entities it has seen before. There could be small variations in other qualifiers or stop words, but RankBrain interprets these as identical entities. This assessment gives the algorithm an indication that the results may also be identical, similar, or drawn from the same set of shortlisted URLs. 
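As a rough illustration of this idea, the sketch below treats two queries as equivalent when they mention the same entities once stop words and qualifiers are ignored. The stop-word list, the tiny entity dictionary, and the entity IDs are all hypothetical; RankBrain's real entity recognition is not public and is certainly far more sophisticated.

```python
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "what", "who", "for", "to"}

# Hypothetical entity dictionary: surface form -> made-up entity ID.
KNOWN_ENTITIES = {"eiffel tower": "E1", "paris": "E2", "barack obama": "E3"}

def extract_entities(query):
    """Return the set of known entity IDs mentioned in the query."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in query.lower())
    words = [w for w in cleaned.split() if w not in STOP_WORDS]
    found, i = set(), 0
    while i < len(words):
        two_words = " ".join(words[i:i + 2])
        if two_words in KNOWN_ENTITIES:      # prefer the longer match
            found.add(KNOWN_ENTITIES[two_words])
            i += 2
        elif words[i] in KNOWN_ENTITIES:
            found.add(KNOWN_ENTITIES[words[i]])
            i += 1
        else:
            i += 1                           # qualifier such as "height": ignored
    return frozenset(found)

def same_entities(query_a, query_b):
    """True when both queries mention exactly the same, non-empty set of entities."""
    entities_a = extract_entities(query_a)
    return bool(entities_a) and entities_a == extract_entities(query_b)

print(same_entities("What is the height of the Eiffel Tower in Paris?",
                    "eiffel tower paris height"))
```

A system built this way could reuse the shortlisted URLs from the earlier query whenever same_entities returns True.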

Measuring User Interaction

Next, to understand the user’s intent, RankBrain evaluates signals like click-through rates, dwell time, bounce rates, etc. 

Among the search results, if a website has a higher click-through rate, the algorithm understands that the site has relevant information for the search query entered. Similarly, if the dwell time is low or if the bounce rate is high, it signals to the algorithm that the user did not find helpful information on the site. Thus, the site is likely to rank lower in future search results for that query. 

Adjusting the Algorithm

RankBrain is a system that is consistently learning. Therefore, it uses the insights derived from the above metrics and refines its algorithm to reflect the learning. It identifies patterns in user behavior concerning search results and enables new rules to manage future queries of similar intent. 

In short, RankBrain can be perceived more as a pre-screening system for search queries. 

What Queries are Impacted by RankBrain?

Originally, when RankBrain was introduced, it was meant to affect only 15% of all Google searches. Subsequently, when RankBrain started performing exceptionally well in delivering valuable results, Google’s confidence in this machine-learning system grew. Yet, RankBrain was not processing all search queries. It processed only those queries that Google had no clue about earlier. 

But in 2016, an article in WIRED claimed that “RankBrain does involve every query”.

Let’s recap a bit on what we know about RankBrain

  • RankBrain is a technology that impacts the ways Google returns results for a search query.
  • RankBrain enables accurate search results by understanding the search query as a whole, and the intent behind the search
  • RankBrain measures users’ interactions with the search results and uses these metrics to fine-tune its algorithm, thereby, its results. 

RankBrain does play an important role in search results. Experts suggest that it might actually impact the search rankings. The same article in WIRED also goes on to claim that RankBrain affects rankings “probably not in every query but in a lot of queries.”

Also, Google’s Gary Illyes has this to say about RankBrain:

“RankBrain is a PR-sexy machine learning ranking component that uses historical search data to predict what would a user most likely click on for a previously unseen query.”

Final Thoughts

Google continues to optimize and fine-tune its algorithms to create a super-efficient search experience, and RankBrain is no different. But everyone would do well to understand that RankBrain is all about comprehending the user’s search intent and creating relevant and useful content for them. 

Thus, as Gary Illyes points out:


The Knowledge Vault


Knowledge Vault is an implementation of Google’s algorithm that underwent a rollercoaster-like introduction, full of thriller-worthy twists.
On August 25, 2014, on Search Engine Land, Greg Sterling reported what New Scientist had previously announced about the brand new Knowledge Vault project. To put it in a nutshell, he talked about “the largest store of knowledge in human history” and said that “Knowledge Vault autonomously gathers and merges information from across the web into a single base of facts about the world, and the people and objects in it.”
In an addendum to his article, however, Sterling himself broke the news that Google had corrected what he had previously published, claiming that Knowledge Vault had been misunderstood and that the link was to a research paper from May 2014 and not to a specific product currently under development.
The aforementioned paper introduced the concept of this kind of probabilistic foundation of knowledge, which combines data from previous analyses stored in databases with the latest information taken from the Internet.
In other words, in order to expand the amount of available information, we must find new automated ways to build foundations of knowledge.
Until the recent past, information was mainly extracted from the text directly, but this kind of approach also led to collecting the annoying background noise, that is, the useless, redundant or false parts of a text. Knowledge Vault was developed as a digital storage facility for knowledge where information is not only stored, but also assessed and analyzed in order to verify its correctness by using probabilistic methods and neural networks. In conclusion, this type of procedure can provide us with perfectly plausible generalizations from a semantic point of view, but it requires constant updating.

Makoto Academy | 2015 - Tensor Flow

TensorFlow

TensorFlow: the free open-source library for machine learning and AI

TensorFlow is a versatile open-source software library for numerical computation using data flow graphs. It facilitates the construction of computational graphs, where nodes represent mathematical operations, and edges connect the nodes, enabling data to flow from one node to another. A typical TensorFlow program consists of two main components: one defining the computational graph with nodes signifying mathematical operations, and the other providing input data and computing outputs based on the specified graph. Nodes in a TensorFlow graph can embody various mathematical operations, such as addition, matrix multiplication, or convolution. The edges that connect the nodes act as “pipelines”: they carry the multidimensional data arrays (tensors) communicated between operations.

The core concept behind TensorFlow is to allow structured execution of different operations (referred to as ops) on data, rather than relying on ad hoc code. A “graph” visually represents these operations and is implemented as a directed acyclic graph (DAG). This structure simplifies the reuse of existing ops and their combination into more intricate ones while enabling parallelism—an essential feature for ongoing artificial intelligence research, including machine learning. The flexible architecture permits the deployment of computation on one or more CPUs or GPUs across desktops, servers, or mobile devices through a single API.
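The two-part structure described above (first define the graph, then feed data and run it) can be illustrated without TensorFlow at all. The following sketch is plain Python and deliberately does not use the TensorFlow API; the Node class and the placeholder, run, matmul and add helpers are names invented for this example.

```python
class Node:
    """A graph node: an operation plus the nodes whose outputs it consumes."""
    def __init__(self, op=None, inputs=(), name=""):
        self.op = op                  # None marks an input placeholder
        self.inputs = tuple(inputs)
        self.name = name

def placeholder(name):
    return Node(name=name)

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Add two matrices of the same shape element-wise."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def run(node, feed, cache=None):
    """Evaluate `node` by walking the DAG; each node is computed at most once."""
    cache = {} if cache is None else cache
    if node not in cache:
        if node.op is None:
            cache[node] = feed[node.name]
        else:
            cache[node] = node.op(*(run(i, feed, cache) for i in node.inputs))
    return cache[node]

# Part 1: define the graph y = x @ w + b. Nothing is computed yet.
x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
y = Node(add, [Node(matmul, [x, w]), b])

# Part 2: provide input data and compute the output.
result = run(y, {"x": [[1.0, 2.0]], "w": [[3.0], [4.0]], "b": [[0.5]]})
print(result)  # prints [[11.5]]
```

Because the graph is data rather than ad hoc code, the same y node can be re-run with different inputs, and independent branches could in principle be evaluated in parallel.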

Originally developed by Google engineers Fernando Pereira, Vincent Vanhoucke, Ian Goodfellow, and others on the Google Brain team within Google’s Machine Intelligence research organization, TensorFlow was initially intended for internal use before its public release in November 2015 under the Apache 2.0 open-source license. Version 1.0 followed in February 2017. The developers later discovered that the system was versatile enough to be applicable across various other domains, including computer graphics and bioinformatics. TensorFlow can be utilized for automatic feature extraction and serves as a high-level tool for constructing scalable architectural models for machine learning applications.

The Google engine processes natural language explicitly, enabling easy interpretation and analysis. This can be achieved by providing appropriate tools and pre-processing data using NLP techniques, resulting in more accurate representations from words to vectors/features. Consequently, the combination of TensorFlow and the Google engine can be employed to develop a semantic search engine with multimodal input capabilities, encompassing Natural Language Processing and speech recognition.


Google NLP API

Makoto Academy | 2016 - Google NLP API

The Natural Language Application Programming Interface (API) uses state-of-the-art deep-learning technology to extract rich structured data from text. It enables programs to understand the meaning of texts in various formats (e.g., news articles, social media posts, etc.). The API is powered by Google and has been trained on a massive amount of documents and queries.

The Natural Language API is a set of RESTful web services that handle most common Natural Language Processing (NLP) tasks, including sentiment analysis. The API can be used to build apps that understand the meaning of what people write, and developers use it to incorporate natural language understanding (NLU) into their applications. It uses machine learning models and pre-trained statistical models to perform Part-of-speech analysis, breaking down sentences into their parts, such as nouns and verbs, with the implication that we can then reason about those parts meaningfully. Calls made to the API also detect and return the language if the caller does not specify a language in the initial request.

For example, given a sentence like “I love my new phone,” the pre-trained model infers that the sentiment carried by “love” is positive. Once we understand which words carry sentiment and which don’t, we can take action on that knowledge, such as translating the sentiment of a sentence into a sentiment score from 1 to 5 or into a binary True/False value.
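The Natural Language API reports document sentiment as a score between -1.0 (negative) and 1.0 (positive). A minimal sketch of the conversion mentioned above might look like this; the function names, the linear mapping, and the 0.0 threshold are assumptions for illustration, and no API call is made here.

```python
def to_star_rating(score):
    """Map a sentiment score in [-1.0, 1.0] linearly onto a 1-5 scale."""
    clamped = max(-1.0, min(1.0, score))   # guard against out-of-range input
    return 1 + round((clamped + 1.0) * 2)  # [-1, 1] -> [0, 4] -> [1, 5]

def is_positive(score, threshold=0.0):
    """Collapse the score into a binary True/False value."""
    return score > threshold

# 0.8 is a made-up score for "I love my new phone".
print(to_star_rating(0.8), is_positive(0.8))  # prints "5 True"
```

In practice the threshold deserves some tuning, since scores close to zero can indicate either neutral or mixed text.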

What’s great about this API is that it’s based on years of work in natural language processing, and it’s available off the shelf without needing to train a model from scratch. This advantage makes it an excellent choice for companies who need to serve as many customers as possible but don’t have any way to customize the solution for each one.

The architecture of this service consists of two main components:

  • A natural language analyzer breaks down sentences into parts and assigns them scores based on how well they match each other (e.g., nouns versus verbs). This component also provides information about the entities within a sentence (e.g., person names).
  • An entity extraction model provides structured metadata about people, places, and things mentioned in the text, providing their Knowledge Graph URL, if the entity has an entry.

The Google NLP API demo is no longer online. You can test the API using The Entities Swiss Knife, a Streamlit-powered Python app by Max Geraci and Israel Gaudette, fully devoted to Entity-linking and Semantic Publishing.


Transformer Architecture


“Transformers were introduced in 2017 by a team at Google Brain and are increasingly the model of choice for NLP problems, replacing RNN (Recurrent Neural Networks) models such as long short-term memory (LSTM). […] This led to the development of pre-trained systems such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus and Common Crawl, and can be fine-tuned for specific tasks.” (Wikipedia).

In 2017, Google published a paper titled “Attention Is All You Need.” The paper introduced a new neural network architecture called the Transformer in the context of machine translation. As the title suggests, at the heart of this new architecture is the so-called Attention Mechanism, together with a new concept called multi-headed Attention.

“We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train” Ashish Vaswani, Noam Shazeer, Niki Parmar and the other authors write in the paper abstract.

This revolutionary milestone was announced on the official Google AI blog in an article entitled “Transformer: A Novel Neural Network Architecture for Language Understanding.” But let’s take a step back.

The first appearance of the Attention Mechanism

Dzmitry Bahdanau and colleagues first introduced the Attention Mechanism in the context of neural networks for automatic text translation.

Before that, machine translation relied on RNNs/LSTMs (Recurrent Neural Networks/Long Short Term Memory) encoder-decoders.

Said simply, there are two recurrent neural networks. One is called the encoder, which reads the input sentence and attempts to summarize it. Then the decoder takes this summary as an input and outputs the translated version of the original sentence.

The main drawback of this method is that if the encoder makes a poor summary, the output will also be poor. The longer the sentence, the higher the probability of having a bad translation.

In that paper, the authors proposed “a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.”

The problem Bahdanau and colleagues were trying to solve is that, until then, the encoder had to compress the entire source sentence into a single fixed-length vector, which they considered a bottleneck in improving the performance of this basic encoder-decoder architecture.

Now, word embeddings allow us to represent words as continuous numbers, and these numbers give us new ways to understand how words relate to each other.

Words are not discrete but strongly related to each other. We can find correlations among them, and projecting words into a continuous Euclidean space allows us to find relationships between them.
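A small sketch shows what "finding relationships" in such a space means in practice. The three-dimensional vectors below are made up for illustration (real embeddings are learned and have hundreds of dimensions); only the cosine-similarity calculation is standard.

```python
import math

# Made-up 3-dimensional embeddings; real ones are learned from large corpora.
EMBEDDINGS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.7, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

royal = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
fruit = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["banana"])
print(royal > fruit)  # prints "True": "king" is closer to "queen" than to "banana"
```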

In the paper published in 2017, Google researchers demonstrate that recurrent data processing is not necessary to achieve the benefits of RNNs with Attention. Transformer architectures do not require an RNN but instead process all tokens simultaneously and calculate attention weights between them in each layer. Tokens are the smaller pieces into which words or sentences are broken up, and these pieces are then fed as individual inputs into the network.

Recurrent networks process tokens one after another, carrying information forward through a hidden state. The Transformer Network instead uses self-attention mechanisms to relate every position of the input sequence directly to every other position.

The encoder maps the input sequence onto a sequence of n-dimensional vectors, one per token.

The decoder then uses these vectors as input to produce the output sequence.

The Transformer looks at all the elements simultaneously but pays more Attention to the most relevant elements in the sentence.

Transformers learn by example. Because they process all tokens in parallel rather than one at a time, they can be trained on sequences much faster than recurrent methods. In addition, they’re better at understanding relationships between elements that are far away from each other.
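The core calculation, scaled dot-product self-attention, fits in a few lines. The sketch below is a simplification under stated assumptions: the learned query, key and value projection matrices of the real architecture are left out (each token vector serves as its own query, key and value), and the token vectors are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Compare this token with every token in the sequence, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # The output is a weighted average of all token vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, vectors))
                        for j in range(dim)])
        all_weights.append(weights)
    return outputs, all_weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three made-up token vectors
outputs, weights = self_attention(tokens)
print([round(w, 3) for w in weights[0]])
```

For the first token, the weights on itself and on the third token come out equal and larger than the weight on the second token, which is orthogonal to it: every token is looked at, but the related ones get more attention. Each loop iteration is independent of the others, which is what makes the computation easy to parallelize.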

Multi-headed Attention

A multi-head attention mechanism allows the model to focus on different parts of the input sentence at once. Instead of computing a single set of attention weights, the model runs several attention “heads” in parallel, each with its own learned projections of the input. This has three benefits. First, each head can specialize in a different kind of relationship, for example between a word and what was said before or after it. Second, each head works in a lower-dimensional subspace, so the total cost stays close to that of a single full-size head. Third, because no single set of weights has to capture everything, less information is lost through averaging. The outputs of all heads are then concatenated and combined into the final output.

When the decoder predicts each word, it needs to consider the previous words as well. So, instead of treating every word independently, it applies the attention mechanism to the entire sequence of previously predicted tokens.
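The mechanics of running several heads side by side can be sketched as follows. This is a simplified illustration, not the architecture from the paper: the learned per-head projections and the final output projection are replaced by plain slicing and concatenation, and the token vectors are made up.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    dim = len(keys[0])
    outputs = []
    for query in queries:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

def multi_head_attention(vectors, num_heads):
    """Give each head its own slice of every vector, attend, then concatenate."""
    dim = len(vectors[0])
    if dim % num_heads != 0:
        raise ValueError("vector size must be divisible by the number of heads")
    size = dim // num_heads
    head_outputs = []
    for head in range(num_heads):
        # Assumption: slicing stands in for the learned projections of a real model.
        part = [v[head * size:(head + 1) * size] for v in vectors]
        head_outputs.append(attention(part, part, part))
    # Concatenate the heads' outputs token by token.
    return [[x for head in head_outputs for x in head[t]] for t in range(len(vectors))]

tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 1.0]]
combined = multi_head_attention(tokens, num_heads=2)
print(len(combined), len(combined[0]))  # prints "3 4"
```

Each head computes its own attention weights from its own slice, so two heads can attend to different tokens for the same position while the output keeps the original vector size.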

What Transformer models can do

The idea behind the Transformer is to create a model that can be trained using data and then used as a kind of ‘generator’ to generate new information.

Transformer models are designed to take any kind of sequential data and turn them into something else. For example, you could use a transformer model to analyze your Twitter feed and see if there were any patterns in how often you tweet about specific topics. You could also use a transformer model to create new content by analyzing your tweets. Or you might want to train a machine-learning algorithm to read DNA and then write computer programs using the information encoded in our genetic material.

Anomaly detection is often applied to security systems, but it could also monitor factory processes or other business activities. Transformer technology makes it easy to distinguish anomalies from normal behavior.

Makoto Academy | 2017 - Google Colaboratory

Google Colaboratory

Google Colaboratory (or simply Google Colab) is a cloud service offered by Google. It allows you to create interactive Jupyter Notebooks directly within your browser and execute them on Google’s cloud resources, including GPUs.

There is also a paid version with more resources for the user, called Google Colab Pro, which for now is available only in some countries.

A Jupyter Notebook is an interactive document created with Jupyter, a free and open-source software project. A notebook is a collection of cells containing code, output, and documentation. Cells may contain Markdown-formatted text, Python code, mathematical formulas, images, links, references, tables, plots, and many other types of content. All the cells of a notebook share a single execution context (the kernel) and are executed one at a time, so the variables and results produced by running one cell are available to the cells run after it.

With Google Colaboratory, you can share these notebooks with others through the internet (to use a shared notebook, save a copy to your Google Drive). A Colaboratory notebook consists of a server-side part and a client-side part. The server-side component runs on the Google Colab platform and contains the code that will run when the notebook is executed. The client-side part runs inside your browser and displays the output of the code. The client-side part communicates with the server-side component via HTTP requests and responses.

Once again, Google is providing students and data scientists with innovative tools for Machine Learning applications.

These tools have a plethora of great applications for SEOs.


Topic Layer


In order to better understand user queries, Google needs to understand the search intent of a query very thoroughly. This is achieved by establishing the contextual meaning of the words in the query. For this reason, the search engine must necessarily study the set of existing relationships between events, people, topics, and places through its own Knowledge Graph. The system then identifies an entire set of different subtopics related to the main query and displays them among the search results.

This is exactly how Topic Layer, introduced in 2018, works.

Nick Fox, Vice President of Product & Design, has described it on Google’s own blog as “built by analyzing all the content that exists on the web for a given topic” and has added that it “develops hundreds and thousands of subtopics.”

Danny Sullivan, public liaison of Google Search, called it “a way of leveraging how the Knowledge Graph knows about people, places and things into topics.”

Since the search is no longer related only to the main keyword, but to the whole topic of the query, it was necessary to expand the semantic scope and develop an algorithm, such as Topic Layer. This technology is built around the analysis of contents associated with a given topic and the simultaneous development of hundreds and thousands of subtopics.

In other words, a set of semantically related concepts is produced in order to provide the user with the opportunity to move on to the next step and refine or expand their search.

As he introduced Topic Layer, Nick Fox brought special attention to this concept, stating that it is “engineered to deeply understand a topic space and how interests can develop over time as familiarity and expertise grow.”

Google set aside the single-keyword search mode quite some time ago and has evolved into a full-fledged semantic search engine. The use of Topic Layer is one extra step in this direction. The algorithm aims at understanding how different secondary topics, pertinent to the main topic of the query, relate to each other. That is to say, its goal is to bring forth and highlight the content that the user wishes to explore next.

Barry Schwartz said on Twitter that Topic Layer “enables dynamic categorization of search results.” Dynamic categorization means that the search results are not shown within predefined categories. On the contrary, the search engine shows them while highlighting the most relevant subtopics. Google adjusts the results by emphasizing mainly articles, images and videos that show greater relevance to the user’s query.

However, the last statement by Nick Fox is perhaps the most innovative and it paves the way for a brand new concept of web search and search results.

At the end of his own presentation, he says, “All of this enables experiences that make it easier than ever to explore your interests” and concludes, “even if you don’t have your next search in mind.” Therefore, it is no longer just a matter of being able to answer the user’s current query as efficiently as possible, but also of foreseeing their next, semantically related queries.



Makoto Academy | 2018 - Discover

Discover non-search content

For several years, the people at Google have been working hard to develop a search engine that could actually anticipate user queries and foresee the user’s next request. It was not by chance that in 2010 Eric Schmidt, Google’s former CEO, told The Wall Street Journal:
“I actually think most people don’t want Google to answer their questions. They want Google to tell them what they should be doing next.”
Google Discover is the update that translates this idea into a working feature.
Back in 2018, Karen Corby, the company’s Group Product Manager, wrote a post on Google’s official blog, introducing the brand new Google Discover. She called it an update with a “new name, a fresh look, and a brand-new set of features” and added that “with this new name comes a fresh design that makes exploring your interests easier than ever” and that it aims “to help you uncover fresh and interesting content about things that matter to you.”
These are extremely simple and clear words to introduce a Google Feed update that is entirely built on topic- and entity-based technology. This kind of groundbreaking technology aims at suggesting articles and news relevant to the user’s own interests. For the time being, this feature is only available on mobile phones running Android and iOS.
We live in a time when news, as well as all other kinds of content, should be customized, personalized, and tailored to each user’s needs and interests. Google has decided to do this by analyzing all the searches the user has made over time and all their movements on the Web. Using AI inferences, the algorithm creates the flow of news to be presented to the user, based on the sites previously visited and the queries previously made.

Since we are talking about news, you could start thinking that Discover is pretty similar to Google News; however, the two features are actually quite different. A large part of the content shown by Discover is, indeed, news, but it does not stop there. YouTube videos and the typical cold contents, such as evergreen articles, which Karen Corby defines as “articles and videos that aren’t new to the web, but are new to you,” also appear within the personalized feed. Since they address a particularly specific need, evergreen articles have a high level of importance, similar to the most up-to-date news.
In a September 2021 tweet, Barry Schwartz stressed just how well Google Discover succeeds in showing the users any article or different piece of information they might be interested in, saying “Google Discover showing me an article from a year ago; in case you missed it.”
The idea behind this latest update to Google’s algorithm is to keep the user from missing anything they may be interested in reading. The search engine is no longer limited to an algorithm that provides generic answers that are correct and relevant to the query, but still not tailored specifically to each user. The first step forward was to personalize answers; now, the search engine must find a way to anticipate queries before the user inputs them. As Karen Corby has written on Google’s official blog, “Discover is unique because it’s one step ahead: it helps you come across the things you haven’t even started looking for.”
While working on Discover’s technology, Google engineers have also tried to make it as user-friendly as possible. For this purpose, users have the chance to personalize it, by hiding content that they are not interested in, by following a topic and — in order to increasingly shift the focus toward multimodal communication — by sharing the aforementioned article or video with other users.
Also interesting is the way contents are shown to the user: not in a strict order, such as a chronological one, but in a way that highlights their respective usefulness and usability. News flows freely according to context and to the analysis of the entities.
As for any other feature added by Google to a section of its code, questions arise about the volume of traffic generated. Specifically, when asked why rather wide fluctuations often take place, John Mueller answered during Google Search Central’s February 2021 SEO hangout, “So that’s something where, if you do see a lot of visibility from Google Discover, I think that’s fantastic. I just would be careful and kind of realize that this is something that can change fairly quickly.” According to Mueller, then, the inconsistent Discover traffic has no specific cause — it is within Google Discover’s own nature.
However, this update, too, aims once again at transforming Google’s search engine into a semantic and multimodal search engine.


BERT: the revolution

Makoto Academy | 2018 - BERT: the revolution

The Bidirectional Encoder Representations from Transformers (BERT) language model is an open-source tool for natural language processing (NLP) tasks, introduced in a paper published by Google researchers in October 2018. It is designed to help computers understand the meaning of words, phrases, and sentences by taking a larger context into account. It has been adopted as the standard model for many NLP tasks such as question answering, semantic parsing, machine translation, and language modeling. Within a year of its publication, BERT was implemented in the Google search engine.

The idea behind the model is that when computers read text, they should be able to understand the context of the sentences they are reading: not just the sequence of words but also their meaning and the relationships between them. This concept is referred to as semantic understanding. Where previous language models only read text in one direction, the introduction of bidirectionality allows BERT to understand the context of sentences better and handle multi-step tasks.

BERT was created to improve upon previously existing models. Its design is based on Transformers, a deep learning model in which every output element is connected to every input element, and the weightings between them are dynamically calculated based on their connection. In NLP, this process is called Attention.

These types of transformers use attention to adjust the weightings between the input and output elements dynamically, as opposed to most other NLP models, which use a static configuration of connections between layers.

BERT bidirectionality vs. other NLP models

The architecture was designed to overcome a limitation of earlier language models: most of them read text in a single direction, either left to right or right to left, and at best combine two separately trained one-directional passes. BERT’s bidirectionality builds on the Transformer encoder, whose self-attention layers let every word attend to every other word in the sentence. In this way, BERT conditions on the left and right context of a word simultaneously, which makes its representations more accurate than those of one-directional models.

BERT was trained on two related but distinct NLP tasks. The first is known as Masked Language Modeling (MLM). In MLM training, a word in a sentence is hidden, and the program has to predict what word has been masked based on the context of the masked word. The second task BERT has been trained on is Next Sentence Prediction (NSP). In NSP training, two sentences are given. Then, the program predicts whether they have a logical, sequential connection or whether their relationship is simply random.
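The masking step of MLM can be sketched in a few lines of Python. This is a simplified illustration rather than BERT's actual preprocessing: the helper name `mask_tokens` is hypothetical, it works on whole words instead of WordPiece tokens, and it always substitutes `[MASK]`, whereas the original paper sometimes keeps the word or swaps in a random one.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens and remember the hidden words as labels."""
    rng = random.Random(seed)
    masked, labels = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels[position] = token  # the word the model must predict
        else:
            masked.append(token)
    if not labels:
        # Guarantee at least one training target in very short sentences.
        position = rng.randrange(len(tokens))
        labels[position] = tokens[position]
        masked[position] = MASK
    return masked, labels

sentence = "the man went to the store to buy milk".split()
masked, labels = mask_tokens(sentence)
print(" ".join(masked))
print(labels)
```

During training, the model sees only the masked sentence and is scored on how well it recovers the words stored in `labels`, using the context on both sides of each gap.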


Neural Matching


Neural networks are mathematical models in which simple processing units are connected to each other, just like in a network. They learn and improve their accuracy through training.

Neural matching is an analysis method that processes information as the human brain does and, even though it was initially developed for image analysis, it has long since been adapted for use in many other fields.

Pandu Nayak, Google Fellow and Vice President, has stated, “Neural matching helps us understand the less defined representations of concepts in queries and pages and associate them with each other; an entire query or page is examined instead of the keywords alone, developing a better understanding of the basic concepts represented.” Neural Matching was integrated into Google’s algorithm in late 2018 to give it a clearer understanding of how the words in a query relate to what the user is searching for.

The algorithm uses neural networks to connect the words in a query with the concepts behind the search, rather than relying on literal keyword matches alone.

Danny Sullivan, Google’s public liaison for Search, introduced it in a series of tweets on September 24, 2018, describing it as a system of “super synonyms”.

Even if the user enters vague words, or words that are hard to recognize because they are misspelled, the search engine is still able to return relevant results. In other words, Neural Matching manages to link each word to the general concept it is meant to express. It is indeed a complex kind of artificial intelligence, and it clearly shows how a search engine can overcome the fixed and restrictive boundaries of keywords.

Exact words or phrases are no longer required for a search to succeed; synonyms and related wording are enough. Semantic breadth becomes the foundation upon which the search engine can deliver increasingly relevant results.

The introduction of Neural Matching led Google engineers to revise and improve the previous system, RankBrain, as Barry Schwartz explains.

This improvement has quickly become a topic of debate, but a tweet from @searchliaison dated March 21, 2019 clearly explains the difference between the two systems, by stating: “In summary:

– RankBrain helps Google better relate pages to concepts

– Neural matching helps Google better relate words to searches.

And there’s nothing special searchers or webmasters need to do. These are part of our core systems designed to naturally increase understanding.”

Neural Matching examines the details to find out how words relate to searches.

For instance, if we looked at the way 100 people write their searches, we would notice that only 20 or, at most, 30 of them use the exact same words to express their query. Everyone else would use quite different words, phrases, and sentences, often resembling wordplays and puns. Neural Matching was designed in order to better understand any hidden meanings and show better results that are more relevant to the initial query.
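The difference between matching keywords and matching concepts can be illustrated with a toy sketch. Google has not published how Neural Matching is implemented, so everything here is an assumption made for illustration: the example query and pages, the three hand-made "concept" dimensions, and the vectors themselves, which a real system would learn from data.

```python
import math

# Hypothetical, hand-made concept vectors; a real system would learn these.
# Dimensions (made up): [television, picture problem, cooking]
EMBEDDINGS = {
    "why does my tv look strange": [0.9, 0.8, 0.0],
    "the soap opera effect explained": [0.8, 0.9, 0.0],
    "easy soup recipes": [0.0, 0.0, 1.0],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def keyword_overlap(query, page):
    """Number of exact words shared by the query and the page title."""
    return len(set(query.split()) & set(page.split()))

query = "why does my tv look strange"
pages = ["the soap opera effect explained", "easy soup recipes"]
for page in pages:
    similarity = cosine(EMBEDDINGS[query], EMBEDDINGS[page])
    print(page, keyword_overlap(query, page), round(similarity, 2))
```

Neither page shares a single word with the query, so a purely lexical match cannot tell them apart, while the concept vectors place the first page close to the query and the second far from it.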


BERT: the Search algorithm update


BERT stands for Bi-directional Encoder Representations from Transformers. It is a technique based on the Transformer neural network architecture that is used to pre-train natural language processing models.


“BERT is a natural language processing pre-training approach that can be used on a large body of text. It handles tasks such as entity recognition, part of speech tagging, and question-answering, among other natural language processes. BERT helps Google understand natural language text from the WEB.”

  • Bill Slawski – Search algorithm patent specialist. 


It is important to note that BERT is just an algorithm – a method or approach that can be used in any natural language processing application. With Google, BERT helps the search engine better understand the user’s intent when they search for specific information. 

Dissecting the BERT

The BERT algorithm is based on a machine learning framework built by Google in 2018.  The goal was to enable computer systems to decipher the meaning of any ambiguous text using the context surrounding it. Essentially, BERT improves natural language understanding by allowing systems to predict text that may appear before or after other text. BERT was pre-trained using a large plain text corpus. 

To better understand how the algorithm works, let us try and decode what the word BERT means:

B – Bi-directional

Conventionally, most language models were trained to read input text only sequentially – left to right or right to left. They weren’t equipped to read text both ways at the same time. Thus, they were uni-directional. BERT, however, is different. It’s the FIRST language modeling approach that’s deeply bi-directional and unsupervised. This means it can read the text in either direction. BERT can process the whole text on either side of the word, as well as contextually understand the sentence as a whole at once. 

E – Encoder

The encoder encodes the input text in a format the language model can understand. During encoding, the sequence of input symbols is mapped to a sequence of continuous representations. In the original Transformer, the encoder is a stack of six identical layers; BERT uses deeper stacks (12 layers in BERT-Base and 24 in BERT-Large). Each layer is composed of two sub-layers: a self-attention mechanism and a feed-forward network. A residual connection around each sub-layer adds the sub-layer’s input to its output, and the sum is then stabilized by layer normalization. The normalized results are passed on to the next layer.
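The residual connection and layer normalization around each sub-layer can be sketched as follows. In the Transformer design, the residual connection adds the sub-layer's input to its output before normalizing. This is a minimal illustration with made-up numbers: the function names are hypothetical, and the learned gain and bias of real layer normalization are left out.

```python
import math

def layer_norm(vector, eps=1e-6):
    """Rescale a vector to zero mean and unit variance (learned gain/bias omitted)."""
    mean = sum(vector) / len(vector)
    variance = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(variance + eps) for x in vector]

def add_and_norm(sublayer_input, sublayer_output):
    """Residual connection: add the input to the sub-layer output, then normalize."""
    summed = [x + y for x, y in zip(sublayer_input, sublayer_output)]
    return layer_norm(summed)

x = [1.0, 2.0, 3.0, 4.0]               # representation entering a sub-layer
sublayer_out = [0.5, -0.5, 0.5, -0.5]  # stand-in for attention or feed-forward output
y = add_and_norm(x, sublayer_out)
print([round(v, 3) for v in y])
```

Because the input is added back in, each layer only has to learn a correction to the representation it receives, which is one reason deep stacks of such layers remain trainable.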

Before we move on, let us first understand what the attention mechanism is all about.

Attention Mechanisms 

Earlier NLP models used Recurrent Neural Networks (RNNs), which process a sentence one word at a time, feeding the result of each step into the next. Because these steps cannot run in parallel, training was slow. RNNs also struggled with longer sentences, where information from early words tends to fade.

This issue was addressed by the 2017 paper “Attention Is All You Need.” The paper proposed replacing RNNs with an architecture built on the attention mechanism, which processes all the words of a sentence in parallel rather than one by one. Here, each word in the input text is passed through separate learned projections (the paper calls the results queries, keys, and values). One projection carries the content of the word, while the others determine which of the other words it should be related to, and how strongly. Thus, a correlation is created between a word and the rest of the sentence, thereby improving contextual understanding.

R – Representation

With BERT, natural language processing applications can handle a variety of downstream tasks. An application using BERT can represent its input unambiguously as either a single sentence or a pair of sentences, for instance a question and answer pair.

T – Transformer

BERT derives its architecture from the Transformer model. The original transformer model uses an encoder-decoder mechanism. But BERT essentially is just the encoder of the transformer. 

Pre-training BERT

While training any language model, a common challenge is defining a prediction goal. Most models predict the next word in a sequence of words. However, this is an inherently uni-directional approach, and it limits contextual learning and understanding. To overcome this challenge, BERT was trained using two strategies: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).

Masked Language Modeling is used for bi-directionality. In every sentence, some of the words are masked. The model is trained to predict the masked words using the context provided by the unmasked words. This improves the contextual learning capabilities of the model.  

The Next Sentence Prediction task is used to understand the relationships between a pair of sentences. Assuming two different sentences X and Y, when the model encounters sentence X, it will have to predict if sentence Y naturally follows X. 

While MLM trains BERT to decipher the connections between words, NSP enables BERT to comprehend the dependencies between pairs of sentences. 
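How training examples for the Next Sentence Prediction task might be assembled can be sketched as follows. This is an illustrative simplification: the helper name `make_nsp_pairs` and the four-sentence document are hypothetical, and negative examples are drawn from the same short document, whereas the original paper samples them from the whole corpus.

```python
import random

def make_nsp_pairs(sentences, seed=0):
    """Build (sentence_x, sentence_y, is_next) examples from an ordered document."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        x = sentences[i]
        if rng.random() < 0.5:
            # Positive example: Y really follows X in the document.
            pairs.append((x, sentences[i + 1], True))
        else:
            # Negative example: any sentence other than X and its true successor.
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((x, rng.choice(candidates), False))
    return pairs

document = [
    "The man went to the store.",
    "He bought a gallon of milk.",
    "Then he walked back home.",
    "Penguins are flightless birds.",
]
pairs = make_nsp_pairs(document)
for x, y, is_next in pairs:
    print(is_next, "|", x, "->", y)
```

The model receives each pair as a single input and learns to output the `is_next` label, which is what pushes it to capture relationships between sentences rather than only between words.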

How Does BERT Help Search Queries

While announcing the BERT update, Google said that BERT would help Search better understand about 1 in 10 searches in English in the U.S.

The model becomes particularly useful for queries that are more conversational and include prepositions like “to” or “for,” which may carry a lot of contextual meaning for the query. This way, BERT allows users to search in a way that’s natural to them.

Assume that the search query was “2019 brazil traveler to usa need a visa”. In this particular query, the word “to” carries a specific meaning when understood in relation to the words surrounding it. Earlier, the search engine could not understand this relation and would return results about U.S. citizens traveling to Brazil, while the query asks about the opposite direction.

With BERT, Google understands the nuances of the query much better and deciphers that, though common, the word “to” carries additional weight here. Thus the results mostly match the search requirement.  

Google had cited other such examples in their rollout announcement. These demonstrate how the advancements in language processing enable Google to understand the user’s intent better when they enter some text query or give voice commands to their digital assistants. 

But, does Google need BERT to make sense of every type of search it encounters? No. It may not. Understandably, not all queries are conversational in nature. If there is a branded search or if the query uses much shorter phrases, BERT may not apply. 

What Else Does BERT Impact?

BERT also impacts featured snippets, as part of Google’s effort to improve searches and search results for people worldwide. A powerful characteristic of such models is that they can take what they learn from one language and apply it to others. Thus BERT is being used to improve featured snippets in the two dozen countries where Google already shows them.

While BERT pertains only to Search, Google predicts that there may be some impact on queries made through Google Assistant, especially when these queries trigger featured snippets as a result.

Why is BERT Important for Google?

The length of the queries has increased over time. Rather than broken short phrases, long-tail searches have become commonplace. People post fully articulated sentences as queries and expect Google to understand their requirements. This is now possible, thanks to significant contributions from BERT. 

2020, Feb

T5 framework


The Text-to-Text Transfer Transformer (T5) framework, designed to convert language problems into a text-to-text format, is one of the most impressive models in natural language processing. Developed by a team at Google Brain, it is based on the Transformer architecture, which was introduced in 2017 and replaced recurrent networks with attention. At release, T5 set the state of the art on many prominent benchmarks covering text classification, question answering, and summarization; Google has also stated that MUM, its later Search model, uses the T5 framework.

The Transformer is a neural network architecture that uses attention to build a representation of each word from a weighted combination of the other words in the text. BERT (Bidirectional Encoder Representations from Transformers) uses only the encoder half of the Transformer, trained to read context in both directions. T5, or Text-to-Text Transfer Transformer, uses the full encoder-decoder Transformer and frames every task as generating output text from input text, which allows the same model to be used across a diverse set of tasks.

The T5 framework follows a three-step process that consists of:

1) Casting every task as text-to-text: the input is a piece of text, prefixed with a short description of the task, and the target output is also text;

2) Pre-training the model on a large corpus of unlabeled text, using an objective in which spans of the input are dropped and the model learns to reconstruct them; and

3) Fine-tuning the model to improve its accuracy on a specific task, such as translation, summarization, or question answering, by training it further on data for that task.


In the T5 architecture, the whole model can be trained on one set of tasks and then trained further on another task. For example, we can train it on translation and then continue training it on question answering. The main difference between T5 and BERT is that BERT is an encoder-only model that outputs a label or a span of the input, while T5 pairs the encoder with a decoder that generates the answer as text. Because inputs and outputs are always text, the same model, training objective, and decoding procedure can be used for every task.

T5 is a “transfer learning” framework, which means that it is first pre-trained on a large amount of unlabeled web text and then adapted using existing labeled datasets. The pre-trained model can be fine-tuned individually for each task rather than being trained from scratch on a single task. The model architecture closely follows the original encoder-decoder Transformer, and the released models range from small versions up to one with about 11 billion parameters. The unlabeled pre-training text comes from the C4 collection (Colossal Clean Crawled Corpus), a cleaned version of Common Crawl web data.
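The text-to-text format can be illustrated with a short sketch that turns different tasks into plain input strings. The task prefixes follow the conventions described in the T5 paper, but the helper name `to_text_to_text` is hypothetical, and the summarization and question-answering examples and targets are invented for illustration.

```python
def to_text_to_text(task, **fields):
    """Format a task as a single input string with a task prefix."""
    if task == "translate":
        return f"translate {fields['source']} to {fields['target']}: {fields['text']}"
    if task == "summarize":
        return f"summarize: {fields['text']}"
    if task == "question_answering":
        return f"question: {fields['question']} context: {fields['context']}"
    raise ValueError(f"unknown task: {task}")

# Each example is (input text, target text); the model only ever sees strings.
examples = [
    (to_text_to_text("translate", source="English", target="German", text="That is good."),
     "Das ist gut."),
    (to_text_to_text("summarize", text="Heavy rain flooded several streets in the city on Monday."),
     "Rain floods city streets."),
    (to_text_to_text("question_answering", question="Who developed T5?",
                     context="T5 was developed by researchers at Google."),
     "researchers at Google"),
]
for model_input, target in examples:
    print(repr(model_input), "->", repr(target))
```

Since every task ends up as a pair of strings, one model with one training objective can handle all of them, and adding a new task only requires choosing a new prefix and collecting examples.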

2020, Nov

Subtopics ranking


Back in January 2021, Barry Schwartz, reporting on Google’s launch of Subtopic Ranking, described it as “Neural nets to understand subtopics around an interest, which helps deliver a greater diversity of content when you search for something broad.”

Schwartz also showed how this algorithm (actually launched in mid-November 2020) works through an example: the results of the search for “home exercise equipment.” In this case, as the renowned search engine technologist says, we receive a great number of relevant subtopics, such as the cheapest equipment available or ideas for small spaces dedicated to fitness training. Subtopic Ranking, then, can show us a much wider range of content related to the search.

The thing that actually worried users was whether all this could also eventually lead to a change to the visual appearance of the search results in the SERP, with all its likely consequences. Danny Sullivan has provided the answer to this worrisome question through one of his tweets: “subtopics don’t change the look of search results, only broaden the diversity of content, when useful”.


Therefore, from a purely operational point of view, there has been no change, but the idea of Subtopics is definitely a game-changer. In addition to the content closely related to the original query, the search results also show other content, called secondary. This secondary content is the product of the neural network: starting from the original query, the network infers the user’s current, and likely future, objective and shows relevant results. A chief role in this process is obviously played by artificial intelligence, which also uses semantic search to broaden the set of results.

The introduction of Subtopic Ranking is closely linked to Google’s main objective: to give an answer not only to the query but also to the general topics that fall within the user’s macro interest, even if they did not originate from the user’s request. Thanks to this semantic and neural refinement of the algorithm, the user has the opportunity to find even more information, still related to the topic at hand, even before thinking of searching for it.

In any case, the user still receives useful and relevant answers, even without looking for them. All this is certainly very interesting from the user’s point of view but, at the same time, it is also quite interesting for content creators. Even if a piece of written text is not the most relevant answer to the query, when the site that contains it covers topics relevant to the search, it can be evaluated as a subtopic and show up on the SERP. In other words, this system is not easy to control but offers significant opportunities to rank well.

The Subtopic Ranking update firmly supports Google’s well-known E-A-T: Expertise, Authoritativeness and Trustworthiness. Being able to show the user high-quality results and content is becoming increasingly essential for search engines. In order to achieve this goal, each update of the algorithm aims at turning it into an innovative, semantic and multimodal search engine.

2021, Feb

Passage Ranking


In order to determine whether the content of a web page is indeed relevant to a user’s query, Google has always taken into account the whole content of the page.
Passage Ranking, released on February 10, 2021, is an algorithm designed to improve the relevance of results even when the relevant piece of information is not the main topic of the page. This can happen, for example, with long-form content, i.e. lengthy web pages where the author discusses the main topic in a detailed and exhaustive way while also touching on several other topics, as is common in in-depth journalism.
Google realized that this type of informative text did not get due attention from the search engine, which only considered the text as a whole. Such an article could actually confuse the algorithm to some extent.
In an effort to overcome this issue, Google implemented Passage Ranking, i.e. the ranking of individual passages: those sections of a text that contain content important to the user and relevant to their query. These passages form the semantic core that determines the page’s position on the SERP for that query.
Martin Splitt, in charge of relations between the Mountain View company, developers and SEO specialists, has described Passage Ranking as “an internal change” within the search engine and has made very clear that it will not be necessary to optimize texts again. Splitt argues that it is essential for the contents to be as informative as possible and that the algorithm is only aimed at better understanding the semantic structure of the text in order to give the best possible answer to the user.
Essentially, it helps any good content creator who does not have an extensive knowledge of SEO techniques and strategies. In the past, these people often missed out on good rankings; now, though, using Passage Ranking to analyze long texts allows key passages to be identified, even when essential pieces of information are buried in walls of text and the conventional semantic structures that underpin SEO are missing.
Furthermore, Splitt wanted to clarify that the main objective of Passage Ranking is lending a hand to all the content creators who write articles with informative content — despite not having a professional knowledge of SEO techniques. That is why they must have the opportunity to receive a good positioning on the SERP, whenever they are relevant to a query.
This update is, therefore, designed for content-rich websites and, in order to get the most out of it, it is necessary to integrate better structured and more comprehensive information within the pages; thus allowing the algorithm to recognize the value of the content and to give the website a more appropriate ranking.
Passage Ranking is a further step towards a semantic search engine, whose activity will focus on the semantic field of each word, while expanding it in order to understand the word’s relevance to the query at hand.
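The intuition behind scoring passages individually rather than the page as a whole can be shown with a toy sketch. Google has not published its scoring method, so the word-overlap score below is a deliberately crude stand-in, and the page text, query, and function names are all hypothetical.

```python
def tokenize(text):
    """Lowercase words with surrounding punctuation removed."""
    return [w.strip(".,?!").lower() for w in text.split()]

def score(query, text):
    """Share of the words in `text` that also appear in the query (toy relevance)."""
    query_terms = set(tokenize(query))
    words = tokenize(text)
    return sum(w in query_terms for w in words) / len(words)

def best_passage(query, page):
    """Score every paragraph on its own and return (score, paragraph) for the best one."""
    passages = [p for p in page.split("\n\n") if p.strip()]
    return max(((score(query, p), p) for p in passages), key=lambda item: item[0])

page = (
    "Renovating an old house takes planning and patience and a realistic budget.\n\n"
    "To check if your windows are UV glass, hold a flame near the glass "
    "and look at the reflection colors.\n\n"
    "Finally, repaint the walls and replace worn flooring before you move in."
)
query = "check if windows are uv glass"

page_score = score(query, page)
passage_score, passage = best_passage(query, page)
print(round(page_score, 2), round(passage_score, 2))
print(passage)
```

Judged as a whole, the page looks only weakly related to the query because most of it is about renovation in general; judged passage by passage, the one paragraph that answers the question stands out clearly.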

2021, Mar

Mobile-first indexing


Mobile-first indexing is an update to the indexing component of the Google algorithm. Announced in 2016 and first rolled out in March 2018, it has been fully active since March 2021. Since then, Google uses only the mobile version of a site’s content for indexing and for the resulting positioning among the search results, after the necessary calibrations, of course.

Google’s announcement, on the day of its official presentation, was:

“To recap, our crawling, indexing, and ranking systems have typically used the desktop version of a page’s content, which may cause issues for mobile searchers when that version is vastly different from the mobile version. Mobile-first indexing means that we’ll use the mobile version of the page for indexing and ranking, to better help our – primarily mobile – users find what they’re looking for.”

The use of smartphones and tablets is constantly increasing, and the share of people who search on mobile devices has long exceeded the share who search on desktop computers. Precisely because of this major shift in users’ preference from desktop to mobile devices, Google has been transforming its algorithm so that it now gives top priority to the information shown on the mobile version of websites.

Therefore, every time Google indexes a website or a blog, it ends up prioritizing its mobile version over the desktop one.

Google’s John Mueller wrote in a tweet on March 12, 2021: “My guess is mobile-first indexing has been ongoing for so many years now that it’s more like a ‘part of life’ :)”, and added, “Most sites are moved over, so I don’t expect giant fluctuations”.

However, the key concept that he highlighted is first addressed in a 2019 video, when Mueller himself stated that: “So, first off, again mobile usability is completely separate from mobile-first indexing.

A site can or cannot be usable from a mobile point of view, but it can still contain all of the content that we need for mobile-first indexing.

An extreme example, if you take something like a PDF file, then on mobile that would be terrible to navigate.

The links will be hard to click, the text will be hard to read. But all of the text is still there, and we could perfectly index that with mobile-first indexing.

Mobile usability is not the same as mobile-first indexing.”

That means mobile-optimized layouts are not required for mobile-first indexing.

Bridget Randolph said on Moz that “mobile first indexing is exactly what it sounds like.” It means that Google adds to its index the information and content it gets from the mobile version of websites.

In addition, the word first means that, whenever a webpage does not have a mobile-friendly version, the algorithm will automatically turn to the desktop version and take into account the contents available there. However, every webpage — either the mobile or desktop version — is indexed by a smartphone crawler, with the natural consequences.

That said, Cindy Krum, CEO & Founder of MobileMoxie, has examined mobile-first indexing and, in her opinion, there is much more than meets the eye in this algorithm.

Cindy Krum argues that merely talking about a variation of user-agent and crawler would be an understatement and that we should actually talk about Entity-first indexing.

In order to describe an entity, we can think about the traditional keyword; this keyword does not have to be a word, but it can also be an image, a sound or a concept.

From a search engine’s point of view, on the other hand, an entity is a domain and, in most cases, there is a larger superior entity, such as an international brand, that groups together several smaller entities.

Entity-indexing will allow Google to gather sites within a single level, providing the most relevant response to the user’s request. Which is to say, the search engine can handle all the websites of a brand as a single larger entity and provide the individual users with the most appropriate answer for them, according to their current, or preferred, country and language.

Cindy Krum has also highlighted Google’s ever-growing focus on entities, rather than keyword linguistic matching. As a matter of fact, Google relies on country-specific ccTLDs for language switching.

Moreover, in John Mueller’s Reddit AMA, the attention was not placed on topics closely related to mobile devices. Instead, the discussion was mostly focused on hreflang, the HTML attribute that tells the search engine which page to show according to the user’s language and geographic area.

All these remarks have led to the belief that mobile-first indexing has a much broader scope than the already historic change from desktop to mobile version indexing.

However, this conceptual change has already taken place and the domains have already been re-indexed. Since the bot only follows links that can be accessed on a smartphone, the mobile user-agent contents are the only ones that will be processed. The desktop-only content will not end up as “lost in translation”, even though the mobile crawler will not find it. This content will simply stay within the index, without a mobile-first rating.

As for Cindy Krum’s belief that mobile-first indexing is actually an entity-first indexing, Danny Sullivan gave his answer through an analogy. Google’s Public Liaison for Search has stated that mobile-first indexing is like removing old paper books from a library and replacing them with the e-book versions of the same volumes. For all practical purposes, this means that there are not two separate indexes, one for smartphones and one for desktops, but only one comprehensive index. This statement, however, is at odds with John Mueller’s claim that this is nothing but a change to the user-agent. If this were the only change, Cindy Krum wonders, and if only the user agent had actually changed, then how could the same crawler that has provided us with paper books until yesterday, now give us e-books?

2021, May

MUM (first announcement)


In May 2021, at Google I/O, Google announced that it was working on the Multitask Unified Model (MUM), which would mark a new milestone in the way people search and access information. This came after a series of experiments to ascertain MUM’s capabilities in helping Google make its products and services more helpful to its users. Google affirmed that MUM would enable new ways of doing things and change how people search and access information (Raghavan, 2021). As stated by Google at the announcement, MUM is not only 1,000 times more powerful than BERT but also has additional features. Its three main characteristics are multimodality, the ability to handle complex tasks, and the ability to overcome language barriers.

New ways of Searching

Google MUM can allow individuals to combine texts with pictures to find the needed information. For instance, if one wants to search how to repair a broken window, they may take a photo of the part of the window that they want to be fixed, then use it with some text to search and reach for relevant information.

Google argues that MUM is an advanced AI technology that will help the organization redesign its Google Search. The newly developed search page is meant to make the search process as natural as possible. Google states that its new technology is redesigning its search engine to have unique features that will help refine and connect users with content they could not have found using older tools.

Pandu Nayak, Google Fellow and Vice President, Search, stated, “In the coming months, we’ll introduce a new way to search visually, with the ability to ask questions about what you see” (Nayak, 2021).

The company is also bringing new experiences to users by making a results page that displays multiple suggestions and recommendations that users can search to learn more about a particular topic. Users can scroll through numerous articles, videos, and images displayed within specific sections of the results page (Raghavan, 2021).


Prabhakar Raghavan, Senior Vice President of Google, also said that the company is using advanced AI systems to give users a richer experience when watching videos, for example by helping them identify key moments within a video. MUM will make it easy to identify related topics within a video and even provide links to specific information to allow users to search and learn more about the identified topics. MUM is also designed to understand the information in a video, enabling it to find related topics and suggest them to the users. Thus, MUM is specifically designed to help users find more information by suggesting relevant web pages, images, and videos (Raghavan, 2021).

Language Barriers

Google MUM operates on the T5 text-to-text framework, and it is believed to be a more powerful tool than BERT because it not only understands language but can also generate it. Also, Google MUM is trained across 75 different languages and on many different tasks at once, allowing it to have a more precise and elaborate understanding of information than older systems (Nayak, 2021).

Google MUM has also been developed to address the language barrier, a significant problem that can keep people from accessing relevant information. Since the technology can understand different languages, it can fetch information across these boundaries and provide the user with relevant material in the language they used to search. For instance, if information about a specific issue exists only in French, a person searching in a different language will likely not find it. However, MUM comes with capabilities that enable users to access that information even when they search in another language. This is due to its ability to integrate and understand similar information from different languages and display results in the language of the search (Nayak, 2021).


Google MUM also offers a multi-modal system that can understand varied types of information, including images and text, with the possibility of additional features that can cover other forms like audio and video. Since MUM has a deeper understanding of the world than the previous technology, it can offer relevant insights based on an individual’s search terms and criteria. It can fetch various information on a given topic and direct people to related articles, images, and videos available on the internet. MUM can understand any form of information used in the search form. For instance, if a person takes a photo of a given object and uses it to look for related information, MUM will use the image to find related content and display it to the reader. As stated by Nayak,

“you might be able to take a photo of your hiking boots and ask, can I use these to hike Mt. Fuji?” (Nayak, 2021).

MUM will understand this query by combining the image and the search terms, and will then connect the user to a blog or web page with related information.

Technical characteristics

With MUM, Google seeks to replace its old system, which depends on the retrieve-then-rank procedure. MUM brings a unified system that performs retrieval and ranking within a single component. MUM is also believed to use model units such as Transformers, GRUs, and attention mechanisms, though Google does not usually reveal the technical details of its new systems, making it difficult to understand how the whole model functions.
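To make the retrieve-then-rank procedure concrete, here is a minimal sketch of the classic two-stage pipeline that a unified model would fold into one component. Everything in it is an illustrative assumption: the toy corpus, the function names, and the scoring rule (fraction of query terms matched) are not taken from Google's systems.

```python
from collections import defaultdict

def build_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def retrieve(query, index):
    """Stage 1: cheap candidate generation (any document sharing a query term)."""
    candidates = set()
    for term in query.lower().split():
        candidates |= index.get(term, set())
    return candidates

def rank(query, candidates, docs):
    """Stage 2: score only the candidates (here: fraction of query terms matched)."""
    terms = query.lower().split()

    def score(doc_id):
        words = set(docs[doc_id].lower().split())
        return sum(term in words for term in terms) / len(terms)

    # Highest score first; ties broken by document id for determinism.
    return sorted(candidates, key=lambda doc_id: (-score(doc_id), doc_id))

# Hypothetical toy corpus.
docs = {
    "a": "hiking boots for mount fuji",
    "b": "best boots for winter",
    "c": "mount fuji weather in autumn",
    "d": "pasta recipes",
}
index = build_index(docs)
query = "hiking mount fuji"
results = rank(query, retrieve(query, index), docs)
print(results)
```

The two stages are deliberately separate functions: the first trades precision for speed, and the second spends more effort on a much smaller candidate set.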

Although Google does not publish direct information about the technical capabilities of MUM, some of it can be inferred from its published research. For example, one of its research papers, HyperGrid Transformers: Towards A Single Model for Multiple Tasks, describes a new multi-task learning approach that may have been incorporated into MUM. This does not mean that MUM is built only on the technologies described in that research, as its functionality points to other, more advanced algorithms. What is clear is that MUM is an advanced AI system. It has also not been ascertained whether MUM is based on MoSE technology: it may be part of it or have nothing to do with it (Montti, SEJ, 2021).

Google MUM

2021, Oct

Pathways: A next-generation AI architecture


In October 2021, Google announced a new artificial intelligence architecture called Google Pathways, which is meant to improve the way machine learning performs tasks that would typically involve multiple models. The main goal of the new architecture is to handle many different tasks at once and learn new tasks quickly.

“Today’s AI models are typically trained to do only one thing. Pathways will enable us to train a single model to do thousands or millions of things,” says Jeff Dean, senior fellow and SVP of Google Research and Google Health, on the official Google blog. The existing approach to solving real-world problems with AI faces several challenges, and Google is developing the Pathways architecture to address them. By building a model capable of tackling multiple, diverse tasks, Google might have found a way to create artificial intelligence that doesn’t get confused when confronted with something it hasn’t seen before.

“We want a [single] model to have different capabilities that can be called upon as needed, and stitched together to perform new, more complex tasks – a bit closer to the way the mammalian brain generalizes across tasks.” This means that the technology isn’t limited to specific tasks but can do many different things well. So, for example, a self-driving car could recognize objects in its path using one pathway, predict where those objects will be one second later using another pathway, and finally plan how to avoid them using yet another pathway, without having to run a separate model for each task.

In the Pathways architecture, activation flows through a single model in a highly sparse manner, with only small pathways called into action as needed. In fact, the model dynamically learns which parts of the network are good at which tasks — it learns how to route tasks through the most relevant parts of the model.
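The idea of sparse activation can be sketched with a toy top-k gating function of the kind used in mixture-of-experts models. This is only an illustration under stated assumptions: Google has not published Pathways' routing in this form, and the experts, gate weights, and function names below are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(features, gate_weights, k=2):
    """Return (expert index, weight) pairs for the k best-scoring experts."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in gate_weights]
    probs = softmax(scores)
    best = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in best)
    return [(i, probs[i] / norm) for i in best]

def sparse_forward(features, gate_weights, experts, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    selected = route(features, gate_weights, k)
    output = sum(weight * experts[i](features) for i, weight in selected)
    return output, [i for i, _ in selected]

# Four hypothetical "experts"; in a real model these would be sub-networks.
experts = [
    lambda x: sum(x),
    lambda x: max(x),
    lambda x: min(x),
    lambda x: x[0] - x[1],
]
# Hypothetical gate weights; in a real model they are learned during training.
gate_weights = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.0, -1.0],
]
output, used = sparse_forward([2.0, 0.5], gate_weights, experts, k=2)
print(used, round(output, 3))
```

Only two of the four experts are evaluated for this input, which is where the efficiency of a sparsely activated model comes from.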

The human brain is a complex network of neurons and synapses that allows us to think, feel, and process the information we take in on a daily basis. Yet, despite this complexity, the brain uses only a small share of its processing power to complete any given task, calling on specialized parts rather than the entire network. Google’s Pathways is meant to accomplish tasks in a similar way, which should make it more energy-efficient, able to learn more, and faster than older models.

There are several other benefits as well: since only a small part of the network is needed for each task, it will be easier for the system to learn how to handle new problems. It will also process much faster than older systems. This means Google could eventually use this technology in its search algorithm to make it much more efficient, amongst several other possible use cases.

The Pathways architecture is presented in more detail in the paper “Pathways: Asynchronous Distributed Dataflow for ML”, published in March 2022.

2022, Apr

PaLM, the Pathways Language Model


Google announced its Pathways architecture in October 2021 and followed it up with its vision to

“Enable a single AI system to generalize across thousands or millions of tasks, to understand different types of data, and to do so with remarkable efficiency.” 

A significant step towards realizing this vision is the release of PaLM, the Pathways Language Model. PaLM, in the words of Google, is a 540-billion parameter, densely activated, decoder-only Transformer model trained with the Pathways system.

In their official AI blog, Google observes that their PaLM 540B model 

  • Is highly scalable
  • Demonstrates high training efficiency compared with other large language models (LLMs)
  • Has achieved state-of-the-art (SOTA) few-shot learning capabilities on several tasks 
  • Is capable of handling English and multilingual data sets

Training the Model

PaLM was trained using the Pathways system, a new ML system that enables training a single model across thousands of accelerator chips spread over multiple TPU v4 Pods. Pathways was announced on the Google AI blog in October 2021, and the Pathways system used for training the model is described in an academic paper entitled Pathways: Asynchronous Distributed Dataflow for ML, by Paul Barham, Aakanksha Chowdhery, Jeff Dean et al.

Google researchers consider this a breakthrough capability, since most previous LLMs were trained either on a single TPU v3 Pod or across several TPU v3 Pods, reaching a maximum scale of 4,096 TPU v3 chips.

Let us learn more about the various aspects that went into the training of the model.

The Training Dataset

PaLM 540B was trained on a high-quality corpus of 780 billion tokens covering a wide range of natural language (NL) use cases. The datasets had both English and multilingual components and included Wikipedia, news articles, web pages, books, code from repositories such as GitHub, and social media conversations and reactions, among others.

The team that worked on the model also created a “lossless” and reversible vocabulary. This means the vocabulary preserves all whitespace, which is an important requirement when handling code, and breaks Unicode characters that fall outside the vocabulary down into UTF-8 bytes. Numbers are split into individual digits, e.g. 1345.3 -> 1 3 4 5 . 3. The vocabulary was generated from the training data, which improved the system’s efficiency.
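The three properties described here (whitespace kept, digits split, byte fallback) can be illustrated with a toy character-level encoder. This is a sketch under loud assumptions: PaLM's real vocabulary is a learned subword vocabulary, whereas the one below is just the set of printable ASCII characters, and the `<0xNN>` byte-token spelling is a convention chosen for this example.

```python
import string

# Assumption: a tiny stand-in vocabulary of printable ASCII characters.
VOCAB = set(string.printable)

def encode(text, vocab=VOCAB):
    """Tokenize character by character.

    Known characters (including whitespace and each digit) become their own
    token; anything else falls back to one token per UTF-8 byte.
    """
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            tokens.extend(f"<0x{byte:02X}>" for byte in ch.encode("utf-8"))
    return tokens

def decode(tokens):
    """Invert encode(): rebuild the exact original string."""
    data = bytearray()
    for tok in tokens:
        # Byte tokens are the only tokens of length 6; character tokens have length 1.
        if len(tok) == 6 and tok.startswith("<0x") and tok.endswith(">"):
            data.append(int(tok[3:5], 16))
        else:
            data.extend(tok.encode("utf-8"))
    return data.decode("utf-8")

sample = "x = 1345.3\n\tprint('café')"
tokens = encode(sample)
print(encode("1345.3"))
print(decode(tokens) == sample)
```

Because nothing is normalized or dropped on the way in, decoding returns the input byte for byte, which is what makes the scheme safe for indentation-sensitive code.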

The Evaluation

The Google team believes that PaLM 540B exhibits breakthrough abilities in handling extremely complex tasks. The model was evaluated in at least three main areas:

  1. Language understanding & generation
  2. Complex reasoning
  3. Code-related tasks

The model was evaluated on 29 commonly used English NLP (natural language processing) tasks in 1-shot and few-shot settings. 

Language Understanding & Generation

The evaluation consisted of several complex tasks, including:

  • Cloze Tests
  • Winograd-style tasks
  • Sentence completion 
  • In-context reading comprehension samples
  • Common sense reasoning problems
  • Natural language inference exercises
  • SuperGLUE tasks

PaLM 540B was found to outperform earlier LLMs: it exceeded previous SOTA results on 24 of 29 tasks in the 1-shot setting and on 28 of 29 tasks in the few-shot setting.

Google PaLM performances


The model also outperformed the average human performance on the more recent BIG-bench (Beyond the Imitation Game Benchmark). PaLM could understand and respond to several BIG-bench tasks, such as distinguishing cause and effect or identifying a movie name from a set of emojis.

Common-sense Reasoning

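The model has also been evaluated with a technique called “chain-of-thought prompting”, in which the prompt includes worked examples that spell out their intermediate reasoning steps. This allows the model to solve multi-step reasoning problems such as arithmetic or common-sense reasoning.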
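Chain-of-thought prompting is applied at inference time: the prompt shows one or more worked examples whose answers include the reasoning, and the model is left to continue the pattern for a new question. The helper below is a minimal sketch of how such a prompt could be assembled; the function name, the exemplar, and the question are all made up for illustration.

```python
def build_cot_prompt(exemplars, question):
    """Assemble a few-shot prompt whose examples show their reasoning."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex['question']}\n"
            f"A: {ex['reasoning']} The answer is {ex['answer']}."
        )
    # The final question is left open so the model writes its own reasoning.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# Hypothetical exemplar written for this sketch.
exemplars = [
    {
        "question": "A shelf holds 3 boxes with 4 books each. "
                    "2 books are removed. How many books remain?",
        "reasoning": "3 boxes of 4 books is 3 * 4 = 12 books. "
                     "Removing 2 leaves 12 - 2 = 10.",
        "answer": "10",
    }
]
prompt = build_cot_prompt(
    exemplars,
    "A train has 5 cars with 20 seats each. 15 seats are taken. "
    "How many seats are free?",
)
print(prompt)
```

The resulting string would be sent to a language model as-is; no model call is made here.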

Multi-lingual tasks

While the performance of PaLM 540B on English tasks was indeed outstanding, the model also performed strongly on multilingual NLP benchmarks, including language translation.

Coding Tasks

It is already well established that LLMs are quite effective at coding tasks, which include code completion and synthesizing programs from NL instructions. PaLM 540B performed strongly on most coding tasks, even though code made up only 5% of its pre-training data. Google researchers saw a further increase in performance after fine-tuning the model on a Python-only data set; the resulting model is known as PaLM-Coder. For instance, in a code repair task that requires fixing a C program until it compiles successfully, PaLM-Coder 540B achieved a compile rate of 82%.

The Google team is optimistic that this can open up several opportunities for fixing complex problems in software development.

Ethical Considerations

The AI blog also highlights the ethical considerations and potential harms associated with training large language models. The accompanying paper provides a datasheet and a model card, and reports on biases observed in the model. Google has also noted that domain-specific analysis is required to fully calibrate the risks and mitigate potential harms. Establishing safety standards and guidelines that prevent the models from being used with malicious intent remains a matter of ongoing research.

The Future

Google’s research has firmly established the scalability of the Pathways system. The breakthrough capabilities achieved by PaLM 540B have given the research team hope of deriving even better models by combining this scalability with novel architectural choices and training schemes.

2022, May
LaMDA 2



In May 2022, at its annual developer conference, Google announced the release of LaMDA 2, the second version of the Language Model for Dialogue Applications, after the engineers had already described its new features on the Google AI blog on January 21.
LaMDA, the conversational AI system, is built on the open source architecture of the Transformer neural network.
The system underwent long and complex training on 1.56 trillion words taken from public dialog data and other public web documents, in order to be able to hold “natural, sensible and specific” conversations.
This new version of the language model meets those requirements so well that Blake Lemoine, a Google engineer who had long worked in the field of artificial intelligence, went so far as to claim that it was sentient. Google rejected the claim as unfounded and eventually dismissed Lemoine, citing violations of its employment and data security policies. Although it is certainly not sentient, LaMDA has proven capable of engaging in complex dialogs.
During the presentation of LaMDA 2, CEO Sundar Pichai discussed two of its most innovative creative functions, or ‘experiences’: ‘Imagine It’ and ‘Talk About It’. The first lets the user submit a creative idea, from which the AI generates imaginative and relevant descriptions while asking follow-up questions.
As an example, LaMDA 2 answered the prompt “Imagine I’m at the deepest part of the ocean” with a specific location: “You’re in the Mariana Trench”. LaMDA also generated relevant questions on the fly about the location, such as what kind of creatures live there or what it smells like. These topics were not hand-programmed: the AI synthesized them from its training data.
The second function, ‘Talk About It’, concerns LaMDA’s ability to stay on topic. When the user asks about a specific subject, the AI keeps the conversation on that subject, even if the user inputs unrelated follow-up questions.
Senior Director Josh Woodward introduced and demonstrated live the third functionality of the system, ‘List It’. Starting from a complex topic, LaMDA is able to break it down into simpler subtasks; in other words, it takes into account related concepts and analyzes them, eventually generating tips and offering easy solutions to complex problems.
Google’s CEO has also reminded everyone that this technology still needs to be refined and it still has a long way to go before people can use it for their daily tasks.
These functions therefore need to be improved, which is why feedback is essential to making significant progress. With this in mind, Pichai also introduced an app, available only internally for now, called AI Test Kitchen, which allows users to give feedback and share their opinions. The LaMDA 2 presentation ended with Sundar Pichai’s statement of Google’s objective: making sure its AI is developed in a socially useful, safe, and responsible way.
As the developers have stated on the Google AI blog, language models are becoming more and more effective, and open-domain dialog, in which a system must be able to talk about any topic, is among the biggest challenges ahead. In the years to come, even more so than now, Google will become a multimodal search engine.

2023, Feb

Google Bard – the AI revolution


In February 2023, Google announced Bard, its response to OpenAI’s ChatGPT. Bard, powered by LaMDA, is a generative AI chatbot designed to perform text-based tasks such as providing answers, summarizing information, and creating various forms of content. Despite a rocky start due to a factual error in the demo, Bard’s potential lies in its ability to ground itself factually in external knowledge sources and in the way its outputs are evaluated for sensibleness, specificity, and interestingness. Bard is not intended to replace Google Search, but to serve as a feature within it, providing an interactive method for exploring topics.

Google Bard: A Response to ChatGPT

Google’s Bard is a generative AI chatbot powered by LaMDA, Google’s large language model. It’s designed to perform a variety of text-based tasks, including providing answers, summarizing information, and creating content. Bard also assists users in exploring topics by summarizing information found on the internet and providing links for further exploration. Despite its potential, Bard’s launch was not without controversy. A factual error in the demo led to a significant loss in Google’s market value, reflecting a loss of confidence in Google’s ability to navigate the looming era of AI. Note that Bard is not an algorithm in itself but a product powered by LaMDA; its name is purely marketing-driven.

The Functioning of Google Bard

Bard is powered by a “lightweight” version of LaMDA, which is trained on datasets consisting of public dialogue and web data. Two key factors contribute to its functioning: safety and groundedness. Safety is achieved by fine-tuning the model with data annotated by crowd workers. Groundedness is achieved by enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. Google used three metrics to evaluate LaMDA’s outputs: sensibleness, specificity, and interestingness. All three metrics were judged by crowdsourced raters, and that data was fed back into the model to keep improving it. Read more about Google Bard here.
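As a rough illustration of how crowdsourced judgments could be turned into the three metrics, the sketch below averages yes/no labels per metric. The data layout, the label encoding (1 for yes, 0 for no), and the function name are assumptions made for this example, not Google's actual rating pipeline.

```python
from statistics import mean

METRICS = ("sensible", "specific", "interesting")

def ssi_scores(ratings):
    """Average binary rater labels (1 = yes, 0 = no) for each metric."""
    return {metric: mean(r[metric] for r in ratings) for metric in METRICS}

# Hypothetical labels from four raters for one model response.
ratings = [
    {"sensible": 1, "specific": 1, "interesting": 0},
    {"sensible": 1, "specific": 0, "interesting": 0},
    {"sensible": 1, "specific": 1, "interesting": 1},
    {"sensible": 0, "specific": 0, "interesting": 0},
]
scores = ssi_scores(ratings)
print(scores)
```

Scores like these can be compared across model versions, or used to select responses for further fine-tuning.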

Contrary to initial perceptions, Bard is not intended to replace Google Search, but rather to serve as a feature within it. Google’s February 2023 announcement of Bard stated that Google will integrate AI features into Search, distilling complex information into easy-to-digest formats. Bard is not search, but an interactive method for accessing knowledge about topics. It draws on information from the web to provide fresh, high-quality responses, making it a useful tool for users seeking deeper insights and understanding.

Future Directions and Access to Google Bard

Google is currently accepting new users to test Bard, which is still labeled as experimental. The future of Bard and similar AI technologies is a topic of ongoing research and development. Google’s research into attributed large language models, which can cite the sources for their information, and technologies for editing responses for factuality, are examples of the directions being explored. Understanding Bard and its capabilities is crucial for anyone involved in SEO or online publishing, as it provides insights into the future of AI and its potential impact on these fields.

2023, May

Google Search Generative Experience


Google’s Search Generative Experience (SGE) is a groundbreaking innovation that is reshaping the way we interact with search engines. By leveraging the power of generative AI, SGE provides more comprehensive and nuanced responses to user queries. This new feature, which was announced at the Google I/O event in May 2023, is currently available to users who sign up for the Search Labs waitlist.

The Mechanics of SGE

SGE works by generating a unique response to a user’s query, synthesizing information from multiple sources. This is a departure from traditional search results that simply list relevant web pages. Upon entering a query, users are presented with AI conversation prompts below the search bar. These prompts guide users through a more interactive and conversational search experience.

As Kevin Indig, a renowned SEO expert, noted in his article, “Google generates AI answers by grounding LLMs (large language models) in search results with a process called Retrieval augmented generation (RAG).” This process allows Google to create a more comprehensive and nuanced response to user queries.
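The retrieval-augmented generation pattern mentioned in the quote can be sketched in a few lines: retrieve passages relevant to the query, then place them in the prompt so the model answers from them. This is a minimal sketch under stated assumptions, not Google's implementation: retrieval is plain term overlap over a toy corpus with placeholder URLs, and the language model is a stand-in callable.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Return up to k documents sharing terms with the query, best match first."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(doc["text"])), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def build_grounded_prompt(query, passages):
    """Put the retrieved passages in the prompt so the answer can cite them."""
    sources = "\n".join(
        f"[{i}] {p['text']} ({p['url']})" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered sources below, "
        "and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )

def answer(query, documents, generate, k=2):
    """Retrieve, build the prompt, and hand it to any text-generation callable."""
    passages = retrieve(query, documents, k)
    return generate(build_grounded_prompt(query, passages)), passages

# Toy corpus with placeholder URLs.
documents = [
    {"url": "https://example.com/fuji",
     "text": "Mount Fuji climbing season runs from July to September."},
    {"url": "https://example.com/boots",
     "text": "Sturdy hiking boots are recommended for climbing Mount Fuji."},
    {"url": "https://example.com/pasta",
     "text": "Fresh pasta cooks in about three minutes."},
]

def echo_model(prompt):
    """Stand-in for a real language model call."""
    return f"(model output for a prompt of {len(prompt)} characters)"

query = "Which boots for climbing Mount Fuji?"
reply, used = answer(query, documents, echo_model)
print([doc["url"] for doc in used])
```

Swapping `echo_model` for a real model client and `retrieve` for a real search index would keep the same overall shape.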

The Cost Efficiency of SGE

One of the key considerations for Google in implementing SGE is cost efficiency. As Indig points out, “Generative AI results are much more expensive than classic web results.” However, Google has found ways to mitigate these costs. For instance, Google does not show AI results by default. Instead, users need to hit a “generate” button most of the time. This allows Google to test engagement and save money.

Furthermore, not every query needs AI answers. Simple searches that yield direct answers today don’t need AI. Google might also cache a significant share of AI answers to save costs. As Indig explains, “Google might cache 30-60% of queries. Generative AI answers might be harder to cache than web results, but even a rate of 10-20% [would] lower cost significantly.”
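The effect of caching on cost is simple arithmetic: if a fraction h of queries is served from cache, the expected cost per query is h times the cached cost plus (1 - h) times the generation cost. The sketch below pairs that formula with a toy answer cache; the class, the normalization rule, and the cost units are illustrative assumptions, not a description of Google's infrastructure.

```python
def normalize(query):
    """Collapse case and whitespace so trivially different queries share a key."""
    return " ".join(query.lower().split())

class AnswerCache:
    """Toy cache that only calls the expensive generator on a miss."""

    def __init__(self, generate):
        self.generate = generate
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, query):
        key = normalize(query)
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = self.generate(key)
        return self.store[key]

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

def expected_cost(hit_rate, generation_cost, cached_cost=0.0):
    """Expected cost per query for a given cache hit rate."""
    return hit_rate * cached_cost + (1 - hit_rate) * generation_cost

cache = AnswerCache(generate=lambda q: f"answer to: {q}")
queries = [
    "best hiking boots",
    "Best  Hiking Boots",
    "mount fuji weather",
    "best hiking boots",
    "pasta recipe",
]
for q in queries:
    cache.get(q)

print(cache.hit_rate)
print(expected_cost(0.2, generation_cost=1.0))
```

With a generation cost of 1 unit and a free cache lookup, a 20% hit rate brings the expected cost per query down to 0.8 units, which is the kind of saving the quote points to.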

SGE and E-commerce

SGE’s impact extends to e-commerce as well. For product queries, Google shows two rows of products on desktop and two products on mobile. Google seems to pull the product attributes it displays from written reviews on websites, turning unstructured data into structured data. This means that product, local, and brand reviews are gaining significant importance for AI results.

For local searches, Google pulls the full local pack into AI results. Businesses shown in the AI Snapshot are not the same as in classic local packs. Instead, SGE tries to customize the list based on reviews from sites like Yelp, Eater, or Google itself.

SGE and YMYL Queries

SGE also handles YMYL (Your Money or Your Life) queries differently. Google seems to stay away from giving specific advice for sensitive topics like loans or serious diseases, which are regulated in many countries. However, for less sensitive topics, SGE provides a correct answer and even leverages research papers.

The Future of SGE

The future of SGE is not just limited to Google Search. As Indig suggests, “The biggest opportunity for SGE is not in Google Search but in becoming an assistant for Google’s whole ecosystem.” This could include integration with Chrome, Gmail, YouTube, Android, Pixel phones, and all of Google’s other properties.


The Implications of SGE for SEO

The advent of SGE has significant implications for SEO. As Indig points out, “Ranking in AI answers is a matter of a) understanding the different angles SGE covers and b) answering them explicitly in your content.” This means that SEO strategies will need to adapt to the new search landscape that SGE is creating.

For instance, the sites shown in the AI Snapshot carousel are not the same links as shown in the classic search results. This suggests that Google uses different signals to put the carousel together. As Indig theorizes, “By explicitly writing about an angle AI Snapshots highlight, websites might increase the chance of ranking in the carousel.”

The Potential of SGE in E-commerce

SGE’s capabilities in e-commerce are particularly noteworthy. Because the product attributes shown in AI results appear to be drawn from written reviews on websites, product, local, and brand reviews are gaining significant importance. As Indig notes, “Product descriptions might be an important driver of clicks in AI product carousels.” This presents a significant opportunity for e-commerce businesses to optimize their product descriptions and reviews to rank higher in AI results.

SGE also has a significant impact on local search. Because the businesses shown in the AI Snapshot appear to be selected on the basis of reviews from sites like Yelp, Eater, or Google itself, rather than mirroring the classic local pack, local businesses can optimize their online presence by focusing on garnering positive reviews on these platforms. As Indig notes, “Sites that provide local reviews might actually get more traffic from Google based on the SGE beta.”

Indig’s view that SGE’s biggest opportunity lies in becoming an assistant for Google’s whole ecosystem suggests that the future of search may not be confined to a single platform or website. Instead, search could become an integral part of our digital ecosystem, seamlessly integrated into the various apps and platforms we use every day.

In conclusion, Google’s SGE represents a significant step forward in the evolution of search technology. By leveraging generative AI, Google is able to provide more comprehensive, nuanced, and interactive search results. While this new feature is still in its early stages, it’s clear that Google is committed to pushing the boundaries of what’s possible in search. As Google’s SGE continues to evolve, we can expect to see even more innovative features that enhance the search experience and reshape the future of information discovery.

For more in-depth analysis and insights on SGE, you can read Kevin Indig’s article on his experience testing SGE. You can also learn more about the Search Generative Experience from the official Google blog post on the topic.