Google has unveiled a new robots.txt control, known as Google-Extended, which lets publishers decide whether their content is used to improve Bard and the Vertex AI generative APIs, including future generations of the models that power these products. The move responds to growing demand from web publishers for more choice and control over how their content is used in emerging generative AI applications.
Google-Extended: Publishers’ Control Over AI Usage
Google-Extended is a step toward giving content creators transparency and control. Publishers can add this user agent token to their site’s robots.txt file to instruct Google not to use their content for the specified generative AI APIs. Danielle Romain, Google’s Vice President of Trust, emphasizes the importance of such control, saying that all providers of AI models should offer similar transparency and choices.
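As a rough illustration of how such an opt-out might look, the sketch below embeds a hypothetical robots.txt that disallows the Google-Extended token while leaving other crawlers unrestricted, then checks it with Python's standard urllib.robotparser module. The site URL and the helper name is_allowed are assumptions made for this example, and Python's parser only approximates how Google itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended only,
# while leaving every other crawler (including Googlebot) unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL for illustration
    print("Google-Extended allowed:", is_allowed(ROBOTS_TXT, "Google-Extended", url))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", url))
```

In this sketch the Google-Extended rule leaves ordinary Search crawling by Googlebot untouched, which matches the separation the article describes between AI usage and search indexing.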
Generative AI chatbots, as they become increasingly integrated into search results, have raised concerns among publishers about how their content is consumed. While these AI systems may attribute their sources, they aggregate information from various websites and present it within user conversations. This aggregation could potentially reduce individual website traffic, impacting ad revenue and overall business models.
Impact on AI Model Training and Content Discovery
Google clarifies that opt-outs apply to the next generation of models for Bard and Vertex AI. Publishers who want to keep their content out of products like Search Generative Experience (SGE) should continue to use the Googlebot user agent in robots.txt and the NOINDEX meta tag in their pages. Romain also points out that “as AI applications expand, web publishers will face the increasing complexity of managing different uses at scale.”
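For context, a NOINDEX directive is normally placed in a page's HTML as a robots meta tag rather than in robots.txt. The sketch below shows one possible way a publisher could check whether a page carries that tag, using Python's standard html.parser module. The class name RobotsMetaParser, the helper has_noindex, and the sample page are all hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> or <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tags include a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page that asks search engines not to index it.
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>Hi</body></html>'

if __name__ == "__main__":
    print("noindex present:", has_noindex(PAGE))
```

This only inspects the markup of a single page; it does not cover the equivalent X-Robots-Tag HTTP header, which a fuller check would also need to consider.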
This development aligns with the broader trend of rapid advancements in generative AI tools throughout the year. Given that search is a primary means by which people discover content, Google’s introduction of this control mechanism is timely and indicative of its proactive approach to addressing the impact of its products on the web.