
AI Policy for E-commerce: How to Protect Your Product Catalog Without Losing AI Traffic


Visa says 2025 was the last year consumers shopped alone. Mastercard calls the next phase of commerce “from digital to intelligent.” PayPal, Google, and Amazon have all launched protocols for AI agents to make purchases on behalf of users.

AI agents are going to become a meaningful traffic source for online stores. Some already are. When someone asks ChatGPT “what's the best running shoe under $150?” and your product appears in the answer, that's a potential customer. When Perplexity recommends your store in a shopping comparison, that's real traffic with purchase intent.

But here's the tension. The same AI systems that can recommend your products to customers are also crawling your site to train their models. Your product descriptions, your pricing, your images, your reviews. All of it is being collected, potentially to generate similar content for competitors or to power AI tools that bypass your store entirely.

The question for e-commerce isn't whether to engage with AI. It's how to set the terms so AI works for you rather than against you.

What's actually at stake for online stores

Your product catalog is one of your most valuable assets. You invested time and money into writing unique product descriptions, shooting product photos, building review systems, and optimizing your pages. That content is what makes your store yours.

When an AI training crawler copies your catalog, that content enters a model that can then generate similar descriptions for anyone who asks. A competitor could use AI to produce product pages that closely mirror yours. Your investment in original content becomes everyone else's starting point.

Pricing data is even more sensitive. Competitive price monitoring is nothing new, but AI-powered bulk scraping makes it possible to track your prices in real time and undercut you automatically. If your robots.txt doesn't restrict AI scrapers, you've effectively published a live pricing feed for competitors to consume at scale.

Images are another concern. AI models trained on product images can generate similar visuals. If your product photography is a differentiator, having it absorbed into a training dataset dilutes that advantage.

But blocking everything is the wrong move

Here's where most guides get it wrong. They tell you to block all AI bots and call it done. For e-commerce, that's a mistake.

AI search is becoming a real discovery channel. When ChatGPT browses the web to answer a shopping question, it needs to access your product pages. When PerplexityBot indexes your store for AI-powered search, it needs to read your content. If you block these bots entirely, your products disappear from a growing slice of how people find things to buy.

The stores that figure out the right balance will have a genuine advantage. They'll be visible in AI shopping recommendations while their competitors are either invisible (because they blocked everything) or unprotected (because they blocked nothing).

See where your store stands

Scan your domain to check which AI bots can access your product pages right now.

Scan your site free

The right AI policy for an online store

The sweet spot for most e-commerce sites comes down to four decisions.

AI Training: Block. There's almost no scenario where an online store benefits from its catalog being used as AI training data. Block GPTBot, CCBot, Google-Extended, Bytespider, and the other training crawlers. Your product descriptions and images should stay yours.

AI Search: Allow. You want ChatGPT-User, PerplexityBot, and OAI-SearchBot to access your product pages. When an AI assistant recommends your products in response to a user's shopping query, that's high-intent traffic. Blocking these bots is like delisting yourself from a search engine.

AI Inference: Allow. When someone pastes your product URL into an AI assistant and asks “is this a good deal?” or “how does this compare to [competitor]?”, you want the AI to be able to read your page and answer. That interaction often leads to a purchase. Blocking inference means the AI says “I can't access that site” and the user moves on to a competitor.

Bulk Scraping: Block. Automated large-scale copying of your catalog, pricing, and reviews should be restricted. This protects you from competitive intelligence scrapers and dataset collectors.
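Taken together, the four decisions translate into a compact set of robots.txt rules. Here's a sketch; the bot names are accurate as of this writing, but the roster of crawlers changes, so check each vendor's documentation before deploying:

```
# --- AI training crawlers: block ---
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

# --- AI search and on-demand fetchers: allow ---
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /
```

Note that robots.txt has no vocabulary for "training" versus "search" versus "inference"; the user-agent string is the proxy (OpenAI, for example, splits these roles across GPTBot, OAI-SearchBot, and ChatGPT-User). Also note that robots.txt is a request, not an enforcement mechanism: well-behaved crawlers honor it, but bulk scrapers often don't, which is why the richer standards below are worth layering on top.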

How to implement this on your store

The implementation depends on your platform.

Shopify

Shopify generates your robots.txt automatically, but you can customize it through the robots.txt.liquid template in your theme. Go to Online Store > Themes > three-dot menu > Edit code, and look for the robots.txt.liquid file. If it doesn't exist, you can create it.

Add the AI-specific directives at the end. Block the training crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended, Bytespider, Meta-ExternalAgent, and others) and keep the search bots allowed by default.
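A sketch of what that robots.txt.liquid can look like. The loop over robots.default_groups follows the structure in Shopify's theme documentation and preserves Shopify's generated rules; the appended directives at the bottom are the AI-specific additions. Verify the Liquid objects against Shopify's current docs before shipping:

```liquid
{%- comment -%} Render Shopify's default robots.txt rules unchanged {%- endcomment -%}
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

# Appended: block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /
```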

WooCommerce / WordPress

If you're running WooCommerce on WordPress, you can edit robots.txt through an SEO plugin like Yoast or Rank Math (both have robots.txt editor features). Or you can create a physical robots.txt file in your WordPress root directory via SFTP or cPanel.

Add the same training-bot-blocking directives. WordPress also supports header-level controls through plugins if you want to add HTTP headers for additional standards like TDMRep.
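For the header-level piece, the TDM Reservation Protocol (TDMRep) defines a tdm-reservation HTTP header that signals a text-and-data-mining opt-out. On a typical Apache-based WordPress host you can set it in .htaccess; this is a sketch under the assumption that mod_headers is enabled, and the header names should be checked against the TDMRep specification:

```apache
# .htaccess — add the TDMRep opt-out header to every response
<IfModule mod_headers.c>
    Header set tdm-reservation "1"
    # Optionally point to a machine-readable policy document:
    # Header set tdm-policy "https://example.com/tdm-policy.json"
</IfModule>
```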

Custom platforms (headless, Vercel, Netlify, etc.)

If you're running a custom storefront, add a robots.txt file to your public directory and include the AI-specific rules. For Next.js on Vercel, this goes in the /public folder. For Netlify, it goes in your publish directory.
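On Next.js (App Router) you can also generate robots.txt from code instead of a static file, via an app/robots.ts config module. This sketch uses Next's documented MetadataRoute.Robots shape; example.com is a placeholder:

```typescript
// app/robots.ts — Next.js serves the result at /robots.txt
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // Block AI training crawlers
      { userAgent: ['GPTBot', 'CCBot', 'Google-Extended', 'Bytespider'], disallow: '/' },
      // Everyone else (including AI search bots) stays allowed
      { userAgent: '*', allow: '/' },
    ],
    sitemap: 'https://example.com/sitemap.xml',
  }
}
```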

For all platforms

Beyond robots.txt, consider creating an agent-permissions.json file at /.well-known/agent-permissions.json on your domain. This is an emerging standard that expresses AI permissions in a structured JSON format that's much richer than robots.txt. It can distinguish between training, search, inference, and scraping permissions, which is exactly the nuanced control e-commerce sites need.
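To make the four-way split concrete, here is an illustrative agent-permissions.json for a store following the policy above. The field names here are illustrative of the shape, not a quote of the spec; since the standard is still emerging, generate the canonical file from the current schema rather than hand-writing it:

```json
{
  "version": "1.0",
  "permissions": {
    "train": "deny",
    "search": "allow",
    "inference": "allow",
    "scrape": "deny"
  }
}
```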

You can generate all of these files through our policy creation wizard. Pick “E-commerce” as your site type, and we'll pre-fill the recommended settings for online stores and generate robots.txt additions, agent-permissions.json, ai.txt, and TDMRep for you to deploy.

Set up in 2 minutes

Pick E-commerce as your site type and get pre-filled policy files ready to deploy.

Build your AI policy

Check your store right now

Before making any changes, see where your store currently stands. Scan your domain and you'll see exactly which AI bots can access your product pages, whether there are conflicts between your Terms of Service and your technical setup, and what your overall AI exposure looks like.

We scanned a sample of top e-commerce sites while researching this piece. The results were eye-opening. The majority have no AI-specific protections at all. Product catalogs, pricing data, customer reviews... all of it wide open to AI training crawlers.

The bottom line

AI shopping agents are coming. They're already here for some categories. The stores that set clear AI policies now will be the ones that benefit from AI-driven commerce without giving away their competitive advantages.

Block training. Allow search. Protect your catalog. Stay visible to AI shoppers.

That's the playbook. And you can set it up in a couple of minutes.

Take action

Protect your content today

Scan your site to see your AI exposure, or build your policy in 2 minutes. Free, no signup required.