---
title: Your AI Vendor Is Learning From Your Team. Here Are the Three Contract Clauses Most SMBs Forgot to Include.
description: A viral video of Indian factory workers wearing head cameras to train AI made the training-data pipeline visible. Most SMB AI vendor contracts are silent on what their tools learn from your team. Three redlines a practitioner asks for on every renewal, plus a fifteen-minute audit you can run this week.
canonical: https://richardbatt.com/blog/ai-vendor-training-data-clauses-smb
date: 2026-04-29
author: Richard Batt
tags: [AI Strategy, AI Contracts, SMB AI, Vendor Management]
type: blog_post
---

# Your AI Vendor Is Learning From Your Team. Here Are the Three Contract Clauses Most SMBs Forgot to Include.

_A viral video of Indian factory workers wearing head cameras to train AI made the training-data pipeline visible. Most SMB AI vendor contracts are silent on what their tools learn from your team. Three redlines a practitioner asks for on every renewal, plus a fifteen-minute audit you can run this week._

**Richard Batt** — AI implementation specialist. 120+ projects across 15+ industries, serving SMBs (5-200 employees) worldwide from Middlesbrough, UK (working globally). Contact: richard@richardbatt.com · https://richardbatt.com

A video went viral on 13 April showing workers at an Indian garment factory wearing head-mounted cameras while they cut, stitched and packed clothes. The cameras were not for safety. They were not for training the workers. The footage was being used to teach an AI system how to perform the same tasks. The reporting in Financial Express and SlashGear pulled at the obvious thread: the workers were, in effect, training the model that would replace them.

If you run a small or mid-sized business (SMB) and your reaction was "that is bad, but it is not my problem," I would ask you to look at one thing on your desk. The contract you signed with whichever AI vendor your team is using right now. Then I would ask you the question that contract probably does not answer. What is your vendor learning from your team's work, and who owns the model it builds out of that learning?

I have read a few hundred AI vendor contracts in the last two years across SMB clients of mine. The clause about training data is missing more often than not. When it is present, it usually favours the vendor by default. The factory-camera story is the public, visceral version of a much quieter pattern that is already inside most businesses.

This post is not a legal document. It is a practitioner's read on the three contract questions every SMB owner should be asking before they renew or sign anything new. None of these are exotic. All of them are commercially sensible. They just rarely make it into the version the vendor sends you first.

## What the camera story actually shows

The two reports worth reading on this are the SlashGear piece from 19 April and the Financial Express piece from 13 April. They cover the same source video, with different framing.

The mechanics are straightforward. A factory floor in India runs cutting, sewing and finishing operations performed by skilled workers. Each worker wears a small camera at head or chest level for the duration of a shift. The footage covers exactly what their hands do, in what order, with what tools, for which type of garment. That dataset is then fed into an AI system designed to replicate human task sequences, often for use in robotics or process automation models.

The workers are not being paid an additional rate to be a training input. The factory has agreed to it as part of a research arrangement with an AI company. The data, once captured, leaves the factory and lives inside the AI company's training pipeline. Whatever model emerges from that pipeline belongs to the AI company.

Reading that, most SMB owners have one of two reactions. The first reaction is that this is a manufacturing story and not relevant to their professional services or retail or SaaS business. The second is that any tool they use is built on data from someone else, so this kind of arrangement is fine. Both reactions miss what is actually happening.

The shift is not that AI is being trained on data. It always was. The shift is that the training data is now extracted from active workplaces in real time. Cameras, screen recorders, call transcripts, ticket logs, sales-pipeline activity. If your team uses an AI tool, your team is, by default, generating training data in some form. The question is what your vendor is allowed to do with it.

## The three questions your contract probably does not answer

Pull out the most recent AI vendor contract in your business. It might be the agreement for an AI sales assistant, an AI customer-support tool, an AI meeting-notetaker, an AI document-extraction service, or an AI coding assistant. Most SMB owners will have at least three live contracts in this category right now.

Search the contract for the words "training," "fine-tune," "model improvement," and "derived data." If you find none of them, you have a problem already, because the vendor's silence is not your protection. It is theirs. Default contract law in most of our trading jurisdictions treats data you supply to a vendor as something the vendor can use to improve their service unless you say otherwise. Silence in your favour does not exist in the standard SaaS contract.

If you do find those words, ask three questions of the clause that contains them.

### Question 1: Who owns the data your team generates while using the tool?

There are three possible answers, and the contract should pick one explicitly. The data could belong to you (the customer). It could belong to the vendor. Or it could be jointly owned with each party having defined rights.

You want option one as the default and option three only with limits. If the contract says the vendor "may use customer-generated content to improve the service," that is option two in disguise, and it is doing more work than most SMB owners realise. It means every email your team writes through their AI assistant, every support ticket their AI bot reads, every meeting transcript their notetaker captures, can be fed into a training dataset that improves a model the vendor will sell to your competitors next year.

The redline is short. Add a sentence stating that all data, derived data, and prompt-response logs generated by your account remain the property of your business. Add a second sentence stating that the vendor may not use that data to train, fine-tune, evaluate or develop any model except where you have given written consent for a specific named purpose with a defined scope and time limit.

I have asked for that redline on roughly fifty contracts in the last 18 months. Roughly thirty came back with the redline accepted. About fifteen came back with a counter-proposal we negotiated. Five vendors refused, and in three of those cases my client walked away. The redline is not unreasonable. It is a useful filter for which vendors are building responsibly versus which ones are quietly betting their next product cycle on data they did not pay for.

### Question 2: What happens to your data if you cancel the contract?

Most SMB AI contracts contain a deletion clause that sounds reassuring. Customer data will be deleted within 30 days of contract termination. That deletion clause usually covers your data inside the vendor's product database. It almost never covers anything that has already been ingested into a training pipeline.

Once your data has been used to train or fine-tune a model, it does not exist in a row in a database any more. It exists as a small statistical contribution to the weights of a neural network. You cannot delete that contribution. The model continues to behave the way it does because of, in part, your data, even after you cancel the contract.

The redline here is that the deletion obligation must explicitly include any model artefacts derived from your data. The cleanest version of the clause says the vendor will, on termination, either delete your data from any training corpus and retrain the affected models, or commit not to retain the affected model in production. The realistic version says the vendor will not use any of your data in any training run after the termination date, and will provide a written confirmation that all such uses have been cancelled.

Either version is stronger than a bare "we will delete your data." Without one of them, the vendor walks away with the operational knowledge of how your business runs, and you walk away with a 30-day data-deletion certificate that does not cover anything that actually matters.

### Question 3: Are you informed when the vendor uses your data to build something new?

Almost no contracts answer this question well. If the vendor decides to use anonymised customer data to build a new feature, a new model variant, or a new product offering, are you notified? Do you have an opt-out? Or is the right buried in a "service updates" clause that you implicitly accept by continuing to use the tool?

The garment-factory story is the public version of this question. The workers were not informed about what was being built from their footage. They learned about it when the video went viral.

For an SMB, the equivalent looks like this. Your AI customer-support vendor uses six months of your support tickets, in aggregate with similar data from other customers, to launch a new "industry-specific support copilot." That copilot is then sold to your competitors, who sit in exactly the vertical your tickets came from. Your data trained the product that is now being marketed against you, and nobody emailed you about it because the contract did not require them to.

The redline is a notification clause. The vendor agrees to inform you in writing before any use of your data, even in aggregated or anonymised form, in any new model, feature or product. They agree to give you a defined window, usually 30 days, to opt out without penalty. They agree that the opt-out applies retroactively, meaning your data is removed from the training corpus for the new artefact even if you only learned about it after the fact.

Vendors push back hardest on this clause. It creates real operational friction for them. But it is also the clause that has the largest commercial value for you, because it is the only one that gives you visibility into how your data flows through the vendor's product roadmap.

## The boring middle ground

Here is where I have to be fair, because the camera story has a sharper edge than most of what I see in practice.

Most AI vendors are not trying to extract value from your data in any malicious sense. They are trying to ship a product, and customer data is the cheapest, fastest fuel for product improvement. The reason their default contracts favour them is the same reason any standard form contract favours the party that drafted it. They wrote the form. Nobody pushed back. The next round of customers got the same form. Within a year the standard had hardened.

You are not negotiating against an evil counterparty. You are negotiating against inertia. The redlines above succeed because the vendor has not actually thought about each clause for years and is now being asked to. Most of them respond reasonably when the customer is specific. The ones who do not respond reasonably are the ones whose product strategy depends on the data extraction you are now blocking, and you should walk.

I have also had to talk a few clients down from over-rotating in the other direction. Refusing to share any operational data with any AI vendor under any circumstances is a defensible position, but it usually means you cannot adopt the tool at all. Most AI vendors do need some training signal from your account to make the product useful for your specific work. The point is not to ban the data flow. It is to make it visible, scoped and reversible.

A clean version of the three clauses leaves room for the vendor to do useful product improvement work on data you have explicitly named. It locks the door against silent training pipelines, retroactive use of cancelled data, and surprise feature launches built on what your team typed last quarter. That is a fair settlement for both sides.

## What to do this week

Two actions, in order.

The first is the fifteen-minute audit. Pull every active AI vendor contract in the business. There are usually three to five of them. For each contract, search for "training," "fine-tune," "derived data" and "improve." Note which contracts are silent on the question, which ones favour the vendor, and which ones have a workable customer-data clause already.
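If your contracts are exportable as plain text, the keyword pass above can be scripted. A minimal sketch, assuming each contract has been saved as a `.txt` file in a `contracts/` folder (the folder name and the spelling variants are my own assumptions, not from the audit itself):

```python
from pathlib import Path

# The four audit keywords from above. "fine-tune" also appears as
# "fine tune" or "finetune" in some contracts, so match all three spellings.
KEYWORDS = {
    "training": ["training"],
    "fine-tune": ["fine-tune", "fine tune", "finetune"],
    "derived data": ["derived data"],
    "improve": ["improve"],
}

def audit_text(text: str) -> list[str]:
    """Return the audit keywords that appear anywhere in a contract's text."""
    lowered = text.lower()
    return [kw for kw, variants in KEYWORDS.items()
            if any(v in lowered for v in variants)]

def audit_folder(folder: str) -> None:
    """Print a one-line verdict per contract file: silent, or which terms it uses."""
    for path in sorted(Path(folder).glob("*.txt")):
        hits = audit_text(path.read_text(errors="ignore"))
        verdict = ", ".join(hits) if hits else "SILENT -- the vendor's silence is theirs, not yours"
        print(f"{path.name}: {verdict}")
```

Copy-pasting each contract into a text file is enough to get started; the output you care about is the per-contract verdict, not the tooling.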

The second is the renewal-window decision. Most SMB AI contracts run on 12-month terms that renew automatically. Look at the next contract due to renew. Send the vendor an email asking for a clarification of the training-data clause and proposing the three redlines above. Send it 60 days before the renewal date so you have time to walk if the answer is unsatisfactory. The conversation itself is informative. Vendors who take a week to come back with a thoughtful response are usually safe. Vendors who deflect or send a generic "we take privacy seriously" reply rarely improve their position on the second pass.

If your business does not have legal counsel, a local commercial solicitor will redline these clauses for between £400 and £800 per contract. That is roughly the equivalent of three months' subscription fees for the average SMB AI tool. It is also the cheapest insurance you can buy against finding your operational knowledge in someone else's product launch a year from now.

## The wider point

The camera story is a useful reminder that the training pipeline is real, it is opaque, and it has commercial consequences for the people whose work feeds it. The garment workers had far fewer options than your business does. Three of those options are exactly what this post has been about: redlining a contract, walking away from a renewal, and requiring notice before your data builds something new. The workers could not do any of those things. You can do all three.

If you would rather not run the audit yourself, the AI Roadmap audit covers vendor and contract review as one of its modules, and we redline the standard AI vendor contracts we see most often as part of the deliverable. If you would rather start with the templates, the AI Ops Vault includes the three model clauses above with worked examples from real SMB negotiations and the email scripts to send to vendors at renewal.

Whichever route you take, the action is the same. Read the contract, mark the silence, propose the three clauses, and decide which vendors deserve another year of your team's data. The factory workers in the video did not get to make those calls. You do.

---

## More about Richard Batt

Richard Batt is an AI implementation specialist who helps businesses deploy working AI automation in days, not months. 120+ projects across 15+ industries.

### Key pages

- [Home](https://richardbatt.com/)
- [About Richard](https://richardbatt.com/about)
- [Blog](https://richardbatt.com/blog)
- [Contact](https://richardbatt.com/contact)
- [Subscribe](https://richardbatt.com/subscribe)

### Contact

- Email: richard@richardbatt.com
- Location: Middlesbrough, UK (working globally)
- Website: https://richardbatt.com