AutoAnnotate: label sensor data without weeks of manual work

The bottleneck in machine learning is rarely the model. It is labeled data. In computer vision you draw bounding boxes. In NLP you tag entities. But with sensor data — microphone, radar, touch, IMU — labeling time series is slow, expensive, and error-prone. You scroll through hours of measurements and need domain expertise to know what happened at each moment.

During my master's project at R2R Engineering, I ran into this head-on. We wanted to make a care robot smarter with machine learning, but got stuck labeling tens of thousands of sensor readings. That experience led to an approach we developed together — now available as AutoAnnotate, an initiative by IMeTech Engineering and R2R Engineering. In this article I explain how it works and why it matters. For a deeper dive into the underlying strategies, see also Automating Sensor Data Annotation for Machine Learning.

An initiative by IMeTech Engineering & R2R Engineering

The project: from static robot to context-aware interaction

R2R Engineering develops socially assistive robots, including Maatje Pop — a humanoid doll for elderly users and people with cognitive impairments. The robot provides support through daily reminders, caregiver messages, and simple interactions. The existing behaviour was predictable: a sensor trigger led to a fixed response, without context or memory.

The project goal was an adaptive interaction model: the robot should learn when to respond and which interaction type is appropriate, based on multi-modal sensor data (microphone, radar, touch, IMU, ambient light). That required two ML models: a binary model deciding whether the robot should interact, and a multi-label model choosing the interaction type from ten defined categories.
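To make the difference between the two targets concrete, here is a minimal sketch of how one labeled moment could be encoded for both models. It is an illustration under assumptions, not the project's actual code: the function name `encode_targets` is hypothetical, the list contains only the interaction types named in this article (the full set of ten is not listed here), and treating "any active type" as "should interact" is an assumed rule.

```python
# Interaction types named in this article. The full set of ten categories
# is not listed here, so this list is deliberately incomplete.
KNOWN_TYPES = [
    "greeting",
    "conversation starter",
    "physical activity",
    "picking up",
    "comforting",
    "passive feedback",
]


def encode_targets(active_types, categories=KNOWN_TYPES):
    """Return (should_interact, multi_hot) for one labeled moment.

    should_interact is the binary target; multi_hot is the multi-label
    target with one 0/1 entry per category.
    """
    unknown = set(active_types) - set(categories)
    if unknown:
        raise ValueError(f"unknown interaction types: {sorted(unknown)}")
    multi_hot = [1 if name in active_types else 0 for name in categories]
    # Assumption: the robot should interact whenever at least one type is active.
    should_interact = 1 if any(multi_hot) else 0
    return should_interact, multi_hot


print(encode_targets({"greeting", "comforting"}))
```

A multi-hot vector lets several interaction types be active at the same moment, which is what distinguishes a multi-label target from a single-choice one.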

Both models need thousands of labeled examples to generalize reliably. We built a web interface for observers to label in real time while watching the robot and user. It worked — but it did not scale. Manual labeling took weeks and kept the team stuck in repetitive work instead of moving forward.

Maatje Pop is the use case here, not the subject of this article. The pattern is familiar for any ML project with sensor data: you need labels, manual labeling is the bottleneck, and you need a way to scale without sacrificing quality.

The insight: sensors do not see everything

Embedded sensors register that something happens — sound, movement, touch — but not always what happens. A microphone records that there is sound, but not whether someone picks up the doll, comforts it, or walks past passively. Radar detects movement, but not whether that movement means a greeting or physical activity.

The breakthrough was adding a camera as an external observation source alongside the sensors. The camera streams in parallel and records visual events: a person entering, a hand grabbing the doll, someone sitting still for a long time. Each event is linked to the sensor data recorded at the same timestamp. That produces labeled training examples without someone manually annotating every second.

Raw sensor data without context: numbers on a timeline without clear meaning

Technically this is weak supervision: heuristics and vision models generate labels programmatically instead of manually per sample. We built a two-tier pipeline:

Tier 1: network camera analytics

Built-in detections from the network camera: person detected → greeting; speech activity → conversation starter; loitering → physical activity. No extra infrastructure needed — the camera sends events via webhooks to the annotation server.
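The event-to-label mapping above can be sketched as a small lookup. This is a hedged illustration: the payload field names (`event`, `timestamp`), the event identifiers, and the function `label_from_event` are assumptions, since the real webhook format depends on the camera vendor. Only the three mappings themselves come from the text.

```python
from datetime import datetime, timezone

# Mapping described in the text; the event identifiers are hypothetical.
EVENT_TO_LABEL = {
    "person_detected": "greeting",
    "speech_activity": "conversation starter",
    "loitering": "physical activity",
}


def label_from_event(payload):
    """Turn one camera event (already parsed from JSON) into a label row.

    Returns None for event types that have no mapped interaction label.
    """
    label = EVENT_TO_LABEL.get(payload.get("event"))
    if label is None:
        return None
    ts = datetime.fromisoformat(payload["timestamp"])
    if ts.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return {
        "timestamp": ts.astimezone(timezone.utc),
        "label": label,
        "source": "tier1",
    }


row = label_from_event(
    {"event": "loitering", "timestamp": "2024-05-01T10:00:00+02:00"}
)
print(row["label"], row["timestamp"].isoformat())
```

Normalizing every timestamp to UTC at this point keeps the later join with sensor measurements free of time zone surprises.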

Tier 2: custom vision model

For fine-grained interactions Tier 1 cannot see: hand + doll overlap → picking up or comforting; person + doll proximity without contact → passive feedback. A custom-trained object detection model recognizes doll, hand, and person in the camera feed.
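As an illustration of the Tier 2 rules, the sketch below turns bounding boxes into a label. It rests on several assumptions: boxes are `(x1, y1, x2, y2)` in pixels, there is at most one box per class per frame, "proximity" is approximated by the distance between box centers, and the 200-pixel threshold is an arbitrary placeholder. The function names are hypothetical, and the text does not describe how picking up is told apart from comforting, so the sketch returns them as one combined label.

```python
import math


def overlap_area(a, b):
    """Area of the intersection of two boxes given as (x1, y1, x2, y2)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def center_distance(a, b):
    """Euclidean distance between the centers of two boxes."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by)


def tier2_label(detections, near_px=200.0):
    """Apply the overlap and proximity rules to one frame.

    detections maps a class name ("doll", "hand", "person") to one box.
    """
    doll = detections.get("doll")
    if doll is None:
        return None  # without a visible doll no rule applies
    hand = detections.get("hand")
    if hand is not None and overlap_area(hand, doll) > 0:
        return "picking up or comforting"
    person = detections.get("person")
    if person is not None and center_distance(person, doll) <= near_px:
        return "passive feedback"
    return None


frame = {"doll": (100, 100, 200, 200), "hand": (150, 150, 250, 250)}
print(tier2_label(frame))
```

Checking the hand rule first means contact always wins over mere proximity when both apply in the same frame.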

Both tiers write timestamped labels to the same database, synchronized with sensor measurements. The result: a supervised learning dataset where each sensor snapshot is linked to the correct interaction label — generated by the camera, not by a human scrolling through time series for hours.
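The synchronization step can be sketched as a nearest-timestamp join. This is a minimal illustration, assuming timestamps are seconds on a shared clock, labels are sorted by time, and a snapshot only receives a label if one lies within a tolerance window. The function name `link_labels` and the one-second tolerance are placeholders, not values from the project.

```python
from bisect import bisect_left


def link_labels(snapshots, labels, tolerance_s=1.0):
    """Attach the nearest camera label to each sensor snapshot.

    snapshots: list of (timestamp, features)
    labels:    list of (timestamp, label), sorted by timestamp
    Snapshots with no label within tolerance_s are left out.
    """
    label_times = [t for t, _ in labels]
    linked = []
    for t, features in snapshots:
        i = bisect_left(label_times, t)
        # The nearest label is either just before or just after position i.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(labels)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(label_times[j] - t))
        if abs(label_times[best] - t) <= tolerance_s:
            linked.append((t, features, labels[best][1]))
    return linked


snapshots = [(10.0, [0.1]), (10.4, [0.2]), (15.0, [0.3])]
labels = [(10.2, "greeting"), (20.0, "comforting")]
for row in link_labels(snapshots, labels):
    print(row)
```

In this sketch unlabeled snapshots are simply dropped. Whether they should instead count as "no interaction" examples for the binary model is a design decision that depends on how reliably the camera covers the scene.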

How AutoAnnotate works

From that collaboration came AutoAnnotate — set up together by IMeTech Engineering and R2R Engineering to make this approach available to other teams with the same problem. The platform translates the technical pipeline into four steps:

Record

Cameras alongside your sensors capture image and measurements simultaneously, on one synchronized timeline.

Recognize

The software recognizes what happens in the image: a person, a gesture, an action.

Link

What the camera sees is automatically linked to the sensor data from that same moment.

Export

Labeled data in standard formats, ready to train your model — without extra conversion steps.
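As a rough illustration of the export step, the sketch below writes linked rows to CSV. CSV is an assumption here, since the text only says "standard formats", and the function name `export_csv` and the feature names are hypothetical.

```python
import csv
import io


def export_csv(rows, feature_names):
    """Serialize (timestamp, features, label) rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", *feature_names, "label"])
    for timestamp, features, label in rows:
        writer.writerow([timestamp, *features, label])
    return buffer.getvalue()


rows = [(10.0, [0.1, 3], "greeting")]
print(export_csv(rows, ["mic_rms", "radar_motion"]))
```

One row per sensor snapshot, with the label as the last column, is a layout most training pipelines can read without further conversion.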

AutoAnnotate links camera footage and sensor data on one timeline into labeled training data

Three things make AutoAnnotate different from generic annotation tools:

Privacy by design. Image and measurement data stay with you. AutoAnnotate is built to run locally — you are not required to send anything to a third-party cloud. Critical in environments where personal data and GDPR matter, such as healthcare.

Consistency. Manual labeling leads to varying quality: every annotator interprets and labels events slightly differently. An automated pipeline applies the same rules every time, so its labels are repeatable.

Scale. Collect more data without proportionally more people. Where manual labeling takes weeks, the same period with AutoAnnotate produces a much larger, consistent dataset.

Manual vs. AutoAnnotate

Manual labeling

Weeks to months for a usable dataset
Domain expertise required per annotation
Varying quality between annotators
Costs grow linearly with each project
Hard to scale without extra people

With AutoAnnotate

Labels in hours instead of weeks
Camera + AI does the heavy lifting
Consistent, repeatable annotations
Runs locally, privacy by design
More data without proportionally more people

Proven with Maatje Pop

Maatje Pop care robot with sensor and camera setup for AutoAnnotate

AutoAnnotate is not a theoretical concept: it was tested and validated in practice on Maatje Pop, a socially assistive robot (SAR). Together with R2R Engineering, we deployed the auto-annotation pipeline alongside the existing sensors and robot. The result:

More than 160,000 labeled samples collected, with manual review confirming 96% accuracy of the auto-generated labels. Labeling was up to 100× faster than fully manual work. The exported data fit directly into the existing training process, with no extra conversion steps.

The ML models trained on this data now run on the robot's embedded hardware. From raw sensor data to a working on-device model — the full chain is proven.

To be honest: not every interaction type was equally easy to label automatically. Simpler events such as greetings and physical activity scored highest. More complex types — such as starting a conversation with partial camera occlusion — required refined detection rules or manual review of edge cases. That is normal with weak supervision and exactly why we recommend a hybrid approach: automate where you can, review manually where you must.
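One way to decide where manual review is needed is to spot-check a sample and measure agreement per interaction type. The sketch below is an illustration under assumptions: the function names, the 0.9 threshold, and the toy review pairs are invented for the example and are not project results.

```python
from collections import defaultdict


def per_type_agreement(reviewed):
    """Share of auto labels a human reviewer agreed with, per auto label.

    reviewed: iterable of (auto_label, human_label) pairs from a spot check.
    """
    total = defaultdict(int)
    agreed = defaultdict(int)
    for auto_label, human_label in reviewed:
        total[auto_label] += 1
        if auto_label == human_label:
            agreed[auto_label] += 1
    return {label: agreed[label] / total[label] for label in total}


def types_needing_review(reviewed, min_agreement=0.9):
    """Interaction types whose agreement falls below the threshold."""
    scores = per_type_agreement(reviewed)
    return sorted(label for label, score in scores.items() if score < min_agreement)


# Toy spot check, invented for illustration only.
reviewed = [("greeting", "greeting")] * 4 + [
    ("conversation starter", "conversation starter"),
    ("conversation starter", "greeting"),
]
print(types_needing_review(reviewed))
```

Types that fall below the threshold are candidates for refined detection rules or continued manual review, while the rest can stay fully automated.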

The Maatje Pop team noticed the difference immediately. As the project coordinator put it: "Since we started using AutoAnnotate, we need to label much less manually. We get usable data faster and annotation quality stays stable."

Where else AutoAnnotate works

The approach is not limited to care robots. Wherever sensor data feeds an ML model and context is visually observable, AutoAnnotate can speed up labeling:

AutoAnnotate application areas: healthcare, retail, agriculture, and wearables

Care robots

Sensors on the robot or in the room, combined with cameras: clearly labeled examples of gestures, postures, and interactions, generated automatically.

Wearable sensors & fall detection

Body-worn sensors with cameras: automatic annotations to make fall alarms smarter.

Retail & behaviour

In-store sensors with cameras: automatic insight into footfall and movement patterns, useful for planning.

Agriculture

Livestock sensors with cameras: behaviour and health signals automatically combined.

Why we made AutoAnnotate available

The annotation problem is universal. Every engineer building an ML model on sensor data hits the same wall: the model is feasible, the data is available, but the labels are missing. Our solution did not come from a product idea on paper, but from necessity during the Maatje Pop project.

Together with R2R Engineering, we built and validated the pipeline: IMeTech on the technical side (data pipeline, computer vision, ML), R2R on the care robot and practice environment. Because we saw the same bottleneck in other projects, we set up AutoAnnotate together as a platform — an initiative by both companies.

AutoAnnotate now makes this approach accessible to teams with the same problem. Whether you are developing a care robot, an IoT product, or an industrial sensor setup: if you want to train ML models on sensor data, labeled data is your bottleneck. AutoAnnotate helps remove it.

Curious how this works for your project? Book a free demo at autoannotate.nl. You can also contact IMeTech to discuss your ML or sensor data challenge.