How Zero-Shot AI Helps Robots “See” Grapes in Real Vineyards

Working in a vineyard means dealing with constant variation—sunlight shifts, leaves hide clusters, and every plant carries its own surprises. For farmers, this makes estimating bunch count and monitoring growth a hands-on task. For researchers, it means hours of labeling grape images before any AI model can even begin learning. In this new work, Rosa Pia Devanna, Giulio Reina, Fernando Auat Cheein, and Annalisa Milella explore how to break that cycle and let modern AI do the heavy lifting.

Their study looks at how two cutting-edge “zero-shot” models—Segment Anything (SAM) and GroundingDINO—can detect and segment grape bunches without the endless manual labeling usually required. Instead of training from scratch, these models can look at an image and identify objects they’ve never seen before. The researchers tested three different strategies: one that sharpens traditional depth-based segmentation, one that relies only on visual cues, and one that fully automates the process from detection to instance segmentation. They ran all methods on an RGB-D dataset collected with the Polibot farmer robot in a commercial vineyard in Southern Italy.
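To make the detection-to-segmentation hand-off concrete, here is a minimal sketch of the glue step such a pipeline needs: GroundingDINO conventionally returns boxes as normalized (cx, cy, w, h) coordinates, while SAM's predictor expects absolute (x1, y1, x2, y2) pixel boxes as prompts. The function name and the "grape bunch" prompt in the comments are illustrative assumptions, not the authors' code.

```python
import numpy as np

def dino_boxes_to_sam_prompts(boxes_cxcywh, img_w, img_h):
    """Convert normalized (cx, cy, w, h) boxes, the usual GroundingDINO
    output format, into absolute (x1, y1, x2, y2) pixel boxes suitable
    as box prompts for SAM's predictor. Purely a coordinate transform;
    no model inference happens here."""
    boxes = np.asarray(boxes_cxcywh, dtype=float)
    cx, cy, w, h = boxes.T
    x1 = (cx - w / 2) * img_w   # left edge in pixels
    y1 = (cy - h / 2) * img_h   # top edge in pixels
    x2 = (cx + w / 2) * img_w   # right edge in pixels
    y2 = (cy + h / 2) * img_h   # bottom edge in pixels
    return np.stack([x1, y1, x2, y2], axis=1)

# Hypothetical end-to-end flow (models and weights not loaded here):
#   boxes = grounding_dino(image, prompt="grape bunch")   # normalized cxcywh
#   sam_boxes = dino_boxes_to_sam_prompts(boxes, W, H)    # pixel xyxy
#   masks = sam_predictor.predict(box=sam_boxes)          # one mask per bunch
```

The conversion is where zero-shot detection and zero-shot segmentation meet: the text prompt drives detection, and the resulting boxes replace the manual clicks or labels a supervised pipeline would otherwise require.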

What stands out is how well the zero-shot tools perform under real field conditions. Even in the examples on pages 10–11, where lighting is poor and shadows hide part of the grapes, methods like AutoSAM-DINO still manage to detect clusters that classic models miss. And while the fully automated approach doesn't always achieve the best counting precision, it dramatically reduces the time farmers and engineers spend labeling data—a major advantage when scaling to larger vineyards or multiple seasons.

For farmers, this kind of progress means quicker insights into how many bunches are on each row, how evenly vines are developing, or where problems may be emerging—all without needing to walk every row with a notebook. For agricultural-robotics researchers, the study shows a promising path toward robots that can adapt to changing vineyard conditions, using flexible AI tools that don't require heavy retraining every time the environment changes.

Learn more about the publication at Zenodo.