
Portfolio Spotlight: DeepSense's Multimodal Vision System

Six months after our seed investment, DeepSense has deployed its multimodal AI perception platform in three commercial robotics applications. We sat down with co-founder Dr. Elena Vasquez to discuss the technology, the roadmap, and what solving real-world perception actually takes.

[Image: DeepSense's multimodal perception system processing sensor data]

One of the genuine privileges of seed-stage venture investing is the opportunity to watch technically exceptional companies build from nearly nothing. When Neuron Factory invested in DeepSense in February 2025, the company had a compelling technical demo, two enterprise letters of intent, and a team of eight. Six months later, they have a deployed product, three paying customers, and a pipeline that has grown faster than any of us anticipated.

DeepSense builds AI perception systems for autonomous industrial robots and logistics vehicles. Their core platform integrates radar, lidar, and RGB-D camera inputs through a proprietary sensor fusion architecture based on a modified vision transformer. The result is a perception system that maintains reliable environmental understanding under conditions — heavy rain, dense smoke, complete darkness, reflective surfaces — that routinely defeat conventional camera-centric perception stacks.

The Perception Problem

Industrial and logistics environments present fundamentally harder perception challenges than consumer autonomous driving applications, a fact not always appreciated by observers of the field. Consider a large e-commerce fulfillment center operating around the clock: lighting ranges from bright overhead fluorescents to near-total darkness in unoccupied aisles. Surfaces include highly reflective metal shelving, matte cardboard boxes with near-uniform appearance, translucent plastic wrapping, and occasionally wet concrete floors. Because the facility operates continuously, dust accumulation on sensors is a practical issue that must be handled gracefully rather than treated as an edge case.

Camera-only perception systems, including those based on state-of-the-art vision transformers, fail systematically in these conditions. The fundamental problem is that visible-light cameras lose information in a way that cannot be recovered algorithmically: an image captured of a dark scene simply does not contain the information needed to reconstruct that scene, regardless of how sophisticated the downstream model is. You cannot infer what you cannot observe.

Radar and lidar are far less subject to this limitation. Radar in particular is virtually impervious to environmental conditions: it functions reliably in complete darkness, through smoke and dust, and in the presence of reflective surfaces that confound optical sensors. But radar imagery is coarse and lacks the semantic richness of camera data. The insight underlying DeepSense's architecture is that the right approach is not to choose the best single sensor but to fuse multiple sensor modalities in a way that preserves the complementary strengths of each.

The Technical Architecture

DeepSense's MultiSense architecture is built on a cross-modal transformer that processes tokenized representations of radar, lidar, and camera inputs simultaneously. The model was trained on a proprietary dataset of over 2.4 million annotated scenes collected across six months of operation in partner facilities, covering the full range of environmental conditions encountered in real industrial deployments.
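In rough outline, processing tokenized representations of several modalities through one shared model can be sketched as below. This is purely illustrative: the token counts, feature sizes, projection matrices, and single-head attention are stand-ins of my choosing, not the proprietary MultiSense architecture. The point it shows is structural, namely that once each modality is projected into a common token space and tagged with a modality embedding, a single self-attention layer lets camera tokens attend to radar and lidar tokens and vice versa.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # shared embedding dimension (illustrative)

# Hypothetical per-modality raw feature sizes and token counts: radar
# range-azimuth cells, lidar voxel features, camera patch features.
raw_dims = {"radar": 16, "lidar": 24, "camera": 48}
n_tokens = {"radar": 10, "lidar": 20, "camera": 30}

# Random stand-ins for learned linear projections into the shared space.
proj = {m: rng.normal(scale=0.1, size=(raw_dims[m], D)) for m in raw_dims}
# Learned modality embeddings let the transformer tell token types apart.
mod_emb = {m: rng.normal(scale=0.1, size=(D,)) for m in raw_dims}

def tokenize(features_by_modality):
    """Project each modality's features to D dims and tag with its embedding."""
    tokens = [features_by_modality[m] @ proj[m] + mod_emb[m]
              for m in features_by_modality]
    return np.concatenate(tokens, axis=0)  # (total_tokens, D)

def self_attention(x):
    """Single-head scaled dot-product self-attention over all tokens,
    so every modality's tokens can attend to every other's."""
    scores = (x @ x.T) / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

features = {m: rng.normal(size=(n_tokens[m], raw_dims[m])) for m in raw_dims}
fused = self_attention(tokenize(features))
print(fused.shape)  # (60, 32): one joint representation over all modalities
```

A production model would stack many such layers with learned weights and feed the fused tokens to task heads for detection and tracking; the sketch only shows why a shared token space makes cross-modal attention possible at all.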

A key innovation in the MultiSense architecture is a learned confidence weighting mechanism that dynamically adjusts the relative contribution of each sensor modality based on inferred environmental conditions. In good lighting and clear conditions, camera data is upweighted because of its superior semantic resolution. In low light or high-dust conditions, the model automatically increases the weight of radar and lidar channels. This dynamic rebalancing happens at inference time without any manual configuration, making the system robust to the full diversity of conditions encountered in real facilities without requiring operators to manage sensing modes.
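The rebalancing idea can be made concrete with a minimal sketch. The weight matrix, the two-feature condition vector (brightness, dust), and the modality logits below are hand-set stand-ins for a learned confidence head, chosen only so the toy behaves the way the paragraph describes; DeepSense's actual mechanism is not public.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hand-set stand-in for a learned confidence head: each row maps
# condition features [brightness, dust] to a logit for one modality.
MODALITIES = ("camera", "lidar", "radar")
W = np.array([[3.0, -2.0],   # camera: favored when bright, penalized by dust
              [0.5,  0.0],   # lidar: mildly condition-sensitive
              [-1.0, 2.0]])  # radar: favored in darkness and dust
b = np.array([0.0, 0.5, 0.0])

def modality_weights(conditions):
    """Infer per-modality fusion weights from environmental conditions."""
    return dict(zip(MODALITIES, softmax(W @ conditions + b)))

def fuse(feats, conditions):
    """Weighted blend of per-modality features: the rebalancing step."""
    w = modality_weights(conditions)
    return sum(w[m] * feats[m] for m in MODALITIES), w

rng = np.random.default_rng(0)
feats = {m: rng.normal(size=8) for m in MODALITIES}

_, w_bright = fuse(feats, np.array([1.0, 0.0]))  # bright, clear
_, w_dusty  = fuse(feats, np.array([0.1, 0.8]))  # dark, dusty
print(max(w_bright, key=w_bright.get), max(w_dusty, key=w_dusty.get))
# camera radar
```

Because the weights come out of a softmax, they always sum to one and shift continuously as conditions change, which is what lets the rebalancing run at inference time with no manual mode switching.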

The system runs on a compact edge compute unit developed in partnership with a major semiconductor manufacturer. The inference pipeline achieves 30 Hz update rates with a total latency under 50 milliseconds, well within the requirements of real-time robotic control systems operating at typical industrial speeds.
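A quick back-of-envelope check of those two figures (my arithmetic, not DeepSense's): a 30 Hz update rate implies a 33.3 ms frame period, so a 50 ms end-to-end latency means roughly 1.5 frames are in flight at once, which is consistent with a pipelined inference path that sustains full throughput even though each result takes longer than one period to produce.

```python
# Timing check on the stated figures: update rate vs. end-to-end latency.
RATE_HZ = 30      # stated update rate
LATENCY_MS = 50   # stated end-to-end latency bound

frame_period_ms = 1000 / RATE_HZ          # time budget per output
frames_in_flight = LATENCY_MS / frame_period_ms

print(f"frame period: {frame_period_ms:.1f} ms")    # 33.3 ms
print(f"frames in flight: {frames_in_flight:.2f}")  # 1.50
```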

Commercial Deployment

DeepSense's first three commercial deployments are in operation at facilities representing different sectors of the industrial automation market. The first is a 280,000 square foot fulfillment center operated by a major logistics company in New Jersey. The second is an automated parts retrieval system at a tier-one automotive component supplier in Michigan. The third is a materials handling application at a chemical processing plant in Texas where the combination of poor lighting, reflective metal surfaces, and intermittent dust represents an extreme test case for any perception system.

Across all three deployments, the system has logged over 180,000 operational hours with zero perception-related incidents causing equipment damage or operational interruption. This is the metric that matters most to enterprise customers: not benchmark accuracy on a standardized dataset, but reliable operation in their actual environment without continuous maintenance overhead.

We knew from the beginning that the real test was not the research lab. The real test was running for six months in a chemical plant without someone adjusting the sensors every week. That is what we built for, and that is what we have proven.

— Dr. Elena Vasquez, CEO, DeepSense

What Is Next

DeepSense is currently in discussions with two additional enterprise customers and is expanding its engineering team to accelerate the development of MultiSense 2.0, which will add 4D radar support and an enhanced semantic scene understanding layer targeting picking and manipulation applications. The company expects to begin a Series A fundraising process in Q1 2026.

As their lead seed investor, Neuron Factory has been privileged to support DeepSense's journey from research prototype to commercial product. We look forward to continuing that partnership through their next stage of growth.