
What we're about
This virtual group is for data scientists, machine learning engineers, and open source enthusiasts who want to expand their knowledge of computer vision and complementary technologies. Every month we'll bring you two diverse speakers working at the cutting edge of computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
Contact the Meetup organizers!
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more about FiftyOne, visit the project page on GitHub: https://github.com/voxel51/fiftyone
Past Speakers
* Sage Elliott at Union.ai
* Michael Wornow at Microsoft
* Argo Saakyan at Veryfi
* Justin Trugman at Softwaretesting.ai
* Johannes Flotzinger at Universität der Bundeswehr München
* Harpreet Sahota at Deci.ai
* Nora Gourmelon at Friedrich-Alexander-Universität Erlangen-Nürnberg
* Reid Pryzant at Microsoft
* David Mezzetti at NeuML
* Chaitanya Mitash at Amazon Robotics
* Fan Wang at Amazon Robotics
* Mani Nambi at Amazon Robotics
* Joy Timmermans at Secury360
* Eduardo Alvarez at Intel
* Minye Wu at KU Leuven
* Jizhizi Li at University of Sydney
* Raz Petel at SightX
* Karttikeya Mangalam at UC Berkeley
* Dolev Ofri-Amar at Weizmann Institute of Science
* Roushanak Rahmat, PhD
* Folefac Martins
* Zhixi Cai at Monash University
* Filip Haltmayer at Zilliz
* Stephanie Fu at MIT
* Shobhita Sundaram at MIT
* Netanel Tamir at Weizmann Institute of Science
* Glenn Jocher at Ultralytics
* Michal Geyer at Weizmann Institute of Science
* Narek Tumanyan at Weizmann Institute of Science
* Jerome Pasquero at Sama
* Eric Zimmermann at Sama
* Victor Anton at Wildlife.ai
* Shashwat Srivastava at Opendoor
* Eugene Khvedchenia at Deci.ai
* Hila Chefer at Tel-Aviv University
* Zhuo Wu at Intel
* Chuan Guo at University of Alberta
* Dhruv Batra at Meta & Georgia Tech
* Benjamin Lahner at MIT
* Jiajing Chen at Syracuse University
* Soumik Rakshit at Weights & Biases
* Paula Ramos, PhD at Intel
* Vishal Rajput at Skybase
* Cameron Wolfe at Alegion/Rice University
* Julien Simon at Hugging Face
* Kris Kitani at Carnegie Mellon University
* Anna Kogan at OpenCV.ai
* Kacper Łukawski at Qdrant
* Sri Anumakonda
* Tarik Hammadou at NVIDIA
* Zain Hasan at Weaviate
* Jai Chopra at LanceDB
* Sven Dickinson at University of Toronto & Samsung
* Nalini Singh at MIT
Resources
* YouTube Playlist of previous Meetups
* Recap blogs including Q&A and speaker resource links
Upcoming events
10
- Network event • Online
 Oct 28 - Getting Started with FiftyOne for Agriculture Use Cases
 21 attendees from 16 groups
 This special AgTec edition of our "Getting Started with FiftyOne" workshop series is designed for researchers, engineers, and practitioners working with visual data in agriculture. Through practical examples using a Colombian coffee dataset, you'll gain a deep understanding of data-centric AI workflows tailored to the challenges of the AgTec space.
 Date and Location
 * Oct 28, 2025
 * 9:00-10:30 AM Pacific
 * Online. Register for the Zoom!
 Want greater visibility into the quality of your computer vision datasets and models? Then join us for this free 90-minute, hands-on workshop to learn how to leverage the open source FiftyOne computer vision toolset.
 At the end of the workshop, you'll be able to:
- Load and visualize agricultural datasets with complex labels
- Explore data insights interactively using embeddings and statistics
- Work with geolocation and map-based visualizations
- Generate high-quality annotations with the Segment Anything Model (SAM2)
- Evaluate model performance and debug predictions using real AgTec scenarios
 Prerequisites: working knowledge of Python and basic computer vision concepts.
 Resources: All attendees will get access to the tutorials, videos, and code examples used in the workshop.
 Learn how to:
- Visualize complex datasets
- Explore embeddings
- Analyze and improve models
- Perform advanced data curation
- Build integrations with popular ML tools, models, and datasets
 2 attendees from this group
- Network event • Online
 Oct 30 - AI, ML and Computer Vision Meetup
 54 attendees from 16 groups
 Join the virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
 Date, Time and Location
 Oct 30, 2025
 9 AM Pacific
 Online. Register for the Zoom!
 The Agent Factory: Building a Platform for Enterprise-Wide AI Automation
 In this talk we will explore what it takes to build an enterprise-ready AI automation platform at scale. The topics covered will include:
- The Scale Challenge: E-commerce environments expose the limitations of single-point AI solutions, which create fragmented ecosystems lacking cohesion and efficient resource sharing across complex, knowledge-based work.
- Root Cause Analysis Success: Flipkart's initial AI agent transformed business analysis from days-long investigations to near-instantaneous insights, proving the concept while revealing broader platform opportunities.
- Platform Strategy Evolution: Success across Engineering (SDLC, SRE), Operations, and Commerce teams necessitated a unified, multi-tenant platform serving diverse use cases with consistency and operational efficiency.
- Architectural Foundation: Leveraging framework-agnostic design principles, we were able to emphasize modularity, which enabled teams to use different AI models while maintaining consistent interfaces and scalable infrastructure.
- The "Agent Garden" Vision: Flipkart's roadmap envisions an internal ecosystem where teams discover, deploy, and contribute AI agents, providing a practical blueprint for scalable AI agent infrastructure development.
 About the Speaker
 Virender Bhargav at Flipkart is a seasoned engineering leader whose expertise spans business technology integration, enterprise applications, system design/architecture, and building highly scalable systems. With a deep understanding of technology, he has spearheaded teams, modernized technology landscapes, and managed core platform layers and strategic products. With extensive experience driving innovation at companies like Paytm and Flipkart, his contributions have left a lasting impact on the industry.
 Scaling Generative Models at Scale with Ray and PyTorch
 Generative image models like Stable Diffusion have opened up exciting possibilities for personalization, creativity, and scalable deployment. However, fine-tuning them in production-grade settings poses challenges: managing compute, hyperparameters, model size, data, and distributed coordination is nontrivial. In this talk, we'll dive deep into learning how to fine-tune Stable Diffusion models using Ray Train (with HuggingFace Diffusers), including approaches like DreamBooth and LoRA. We'll cover what works (and what doesn't) in scaling out training jobs, handling large data, optimizing for GPU memory and speed, and validating outputs. Attendees will come away with practical insights and patterns they can use to fine-tune generative models in their own work.
 About the Speaker
 Suman Debnath is a Technical Lead (ML) at Anyscale, where he focuses on distributed training, fine-tuning, and inference optimization at scale on the cloud. His work centers around building and optimizing end-to-end machine learning workflows powered by distributed computing frameworks like Ray, enabling scalable and efficient ML systems.
 Suman's expertise spans Natural Language Processing (NLP), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG).
 Earlier in his career, he developed performance benchmarking and monitoring tools for distributed storage systems. Beyond engineering, Suman is an active community contributor, having spoken at over 100 global conferences and events, including PyCon, PyData, ODSC, AIE and numerous meetups worldwide.
 Privacy-preserving in Computer Vision through Optics Learning
 Cameras are now ubiquitous, powering computer vision systems that assist us in everyday tasks and critical settings such as operating rooms. Yet, their widespread use raises serious privacy concerns: traditional cameras are designed to capture high-resolution images, making it easy to identify sensitive attributes such as faces, nudity, or personal objects. Once acquired, such data can be misused if accessed by adversaries. Existing software-based privacy mechanisms, such as blurring or pixelation, often degrade task performance and leave vulnerabilities in the processing pipeline. In this talk, we explore an alternative question: how can we preserve privacy before or during image acquisition? By revisiting the image formation model, we show how camera optics themselves can be learned and optimized to acquire images that are unintelligible to humans yet remain useful for downstream vision tasks like action recognition. We will discuss recent approaches to learning camera lenses that intentionally produce privacy-preserving images, blurry and unrecognizable to the human eye, but still effective for machine perception. This paradigm shift opens the door to a new generation of cameras that embed privacy directly into their hardware design.
 About the Speaker
 Carlos Hinojosa is a Postdoctoral researcher at King Abdullah University of Science and Technology (KAUST) working with Prof. Bernard Ghanem. His research interests span Computer Vision, Machine Learning, AI Safety, and AI for Science. He focuses on developing safe, accurate, and efficient vision systems and machine-learning models that can reliably perceive, understand, and act on information, while ensuring robustness, protecting privacy, and aligning with societal values.
 It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
 Can we match vision and language embeddings without any supervision? According to the platonic representation hypothesis, as model and dataset scales increase, distances between corresponding representations are becoming similar in both embedding spaces. Our study demonstrates that pairwise distances are often sufficient to enable unsupervised matching, allowing vision-language correspondences to be discovered without any parallel data.
 About the Speaker
 Dominik Schnaus is a third-year Ph.D. student in the Computer Vision Group at the Technical University of Munich (TUM), supervised by Daniel Cremers. His research centers on multimodal and self-supervised learning with a special emphasis on understanding similarities across embedding spaces of different modalities.
 4 attendees from this group
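 The idea in the "It's a (Blind) Match!" abstract above, that pairwise distances alone can be enough to match two embedding spaces, can be illustrated with a toy sketch. This is not the speaker's method: the 2-D points, the rotation standing in for a second embedding space, and the brute-force search over permutations are all assumptions chosen to keep the example small and dependency-free.

```python
import itertools
import math
import random


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]


def rotate_2d(point, angle):
    """Rotate a 2-D point about the origin (a stand-in for a second embedding space)."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def blind_match(dist_a, dist_b):
    """Brute-force the permutation that best aligns two distance matrices.

    Returns a tuple perm where item i of space A is matched to item perm[i]
    of space B. Only feasible for tiny n; a toy, not a scalable solver.
    """
    n = len(dist_a)
    best_perm, best_cost = None, float("inf")
    for perm in itertools.permutations(range(n)):
        cost = sum(
            (dist_a[i][j] - dist_b[perm[i]][perm[j]]) ** 2
            for i in range(n)
            for j in range(i + 1, n)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm


# Toy "vision" embeddings: six random 2-D points.
rng = random.Random(0)
vision = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(6)]

# Toy "language" embeddings: the same points, rotated and shuffled, so the
# coordinates differ but the pairwise distances are preserved.
true_perm = list(range(6))
rng.shuffle(true_perm)
language = [None] * 6
for i, p in enumerate(vision):
    language[true_perm[i]] = rotate_2d(p, 1.0)

# No paired examples are used: only the two distance matrices.
recovered = blind_match(pairwise_distances(vision), pairwise_distances(language))
print(list(recovered) == true_perm)
```

 Real embedding spaces only approximately preserve distances and contain far more items, so an exhaustive search like this would have to be replaced by an approximate solver.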
- Network event • Online
 Physical AI Data Pipelines with NVIDIA Omniverse NuRec, Cosmos and FiftyOne
 55 attendees from 16 groups
 Join Voxel51 and NVIDIA as they unveil a breakthrough that's changing how Physical AI systems are built. In this first-ever demo featuring NVIDIA Omniverse NuRec and NVIDIA Cosmos with FiftyOne, you'll learn how to create validated, simulation-ready data pipelines, cutting testing costs, eliminating manual data audits, and accelerating development from months to days.
 Date and Location
 Nov 5, 2025
 9:00-10:30 AM Pacific
 Online. Register for the Zoom
 Developing autonomous vehicles and humanoid robots requires rigorous simulations that capture real-world complexity. The critical barrier that keeps teams from achieving success isn't the simulation engine itself, but the data that powers it. As Physical AI systems ingest petabytes of multisensor data, converting this raw input into validated, simulation-ready data pipelines remains a hidden bottleneck. A camera-to-LiDAR projection off by a few pixels, timestamps misaligned by a few milliseconds, or inaccurate coordinate systems will cascade into flawed neural reconstructions and synthetic data. Without a well-orchestrated data pipeline, even the most advanced simulation platforms end up consuming imperfect data, wasting weeks of effort and thousands of dollars in testing and compute costs.
 In a first-ever demo featuring NVIDIA Omniverse NuRec and NVIDIA Cosmos with FiftyOne, you'll discover how to:
- Eliminate manual data audits with an automated workflow that calibrates, aligns, and ensures data integrity across cameras, LiDAR, radar, and other sensors
- Curate and enrich the data for neural reconstructions and synthetic data generation
- Reduce Physical AI testing and QA costs by up to 80%
- Accelerate Physical AI development from months to days
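 As a rough illustration of the timestamp misalignment problem described above, the sketch below flags camera frames whose nearest LiDAR sweep is too far away in time. It is not the workflow shown in the demo and uses no FiftyOne or NVIDIA APIs; the function names, the 5 ms tolerance, and the sample timestamps are all hypothetical.

```python
from bisect import bisect_left


def nearest_offsets_ms(camera_ts, lidar_ts):
    """For each camera timestamp, return the signed gap (ms) to the nearest LiDAR timestamp.

    Both inputs are timestamps in milliseconds; lidar_ts must be sorted
    ascending and non-empty.
    """
    offsets = []
    for t in camera_ts:
        i = bisect_left(lidar_ts, t)
        # Candidates are the neighbours on either side of the insertion point.
        candidates = lidar_ts[max(i - 1, 0): i + 1]
        nearest = min(candidates, key=lambda s: abs(s - t))
        offsets.append(nearest - t)
    return offsets


def misaligned_frames(camera_ts, lidar_ts, tolerance_ms=5.0):
    """Return indices of camera frames whose nearest LiDAR sweep is outside the tolerance."""
    offsets = nearest_offsets_ms(camera_ts, lidar_ts)
    return [i for i, off in enumerate(offsets) if abs(off) > tolerance_ms]


# Hypothetical 10 Hz capture; the third camera frame arrives 12 ms late.
lidar = [0.0, 100.0, 200.0, 300.0]
camera = [1.0, 99.0, 212.0, 301.0]
print(misaligned_frames(camera, lidar))
```

 A check like this only catches per-frame gaps; constant clock offsets or drift between sensors would need a separate calibration step.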
 Who should attend:
- Data Engineers, MLOps & ML Engineers working with Physical AI data
- Technical leaders and Managers driving Physical AI projects from prototype to production
- AV/Robotics Researchers building safety-critical apps with cutting-edge tech
- Product & Strategy leaders seeking to accelerate development while reducing infra costs and risks.
 About the Speakers
 Itai H Zadok is a Senior Product Manager, Autonomous Vehicles Simulation, at NVIDIA.
 Daniel Gural is a Machine Learning Engineer and Evangelist at Voxel51.
 4 attendees from this group
- Network event • Online
 Nov 6 - Visual Document AI: Because a Pixel is Worth a Thousand Tokens
 12 attendees from 16 groups
 Join us for a virtual event to hear talks from experts on the latest developments in Visual Document AI.
 Date and Location
 Nov 6, 2025
 9-11 AM Pacific
 Online. Register for the Zoom!
 Document AI: A Review of the Latest Models, Tasks and Tools
 In this talk, we'll go through everything document AI: trends, models, tasks, tools! By the end of this talk you will be ready to start building apps based on document models.
 About the Speaker
 Merve Noyan works on multimodal AI and computer vision at Hugging Face, and she's the author of the book Vision Language Models on O'Reilly.
 Run Document VLMs in Voxel51 with the VLM Run Plugin - PDF to JSON in Seconds
 The new VLM Run Plugin for Voxel51 enables seamless execution of document vision-language models directly within the Voxel51 environment. This integration transforms complex document workflows (from PDFs and scanned forms to reports) into structured JSON outputs in seconds. By treating documents as images, our approach remains general, scalable, and compatible with any visual model architecture. The plugin connects visual data curation with model inference, empowering teams to run, visualize, and evaluate document understanding models effortlessly. Document AI is now faster, reproducible, and natively integrated into your Voxel51 workflows.
 About the Speaker
 Dinesh Reddy is a founding team member of VLM Run, where he is helping nurture the platform from a sapling into a robust ecosystem for running and evaluating vision-language models across modalities. Previously, he was a scientist at Amazon AWS AI, working on large-scale machine learning systems for intelligent document understanding and visual AI. He completed his Ph.D. at the Robotics Institute, Carnegie Mellon University, focusing on combining learning-based methods with 3D computer vision for in-the-wild data. His research has been recognized with the Best Paper Award at IEEE IVS 2021 and fellowships from Amazon Go and Qualcomm.
 CommonForms: Automatically Making PDFs Fillable
 Converting static PDFs into fillable forms remains a surprisingly difficult task, even with the best commercial tools available today.
 We show that with careful dataset curation and model tuning, it is possible to train high-quality form field detectors for under $500. As part of this effort, we introduce CommonForms, a large-scale dataset of nearly half a million curated form images. We also release a family of highly accurate form field detectors, FFDNet-S and FFDNet-L.
 About the Speaker
 Joe Barrow is a researcher at Pattern Data, specializing in document AI and information extraction. He previously worked at the Adobe Document Intelligence Lab after receiving his PhD from the University of Maryland in 2022.
 Visual Document Retrieval: How to Cluster, Search and Uncover Biases in Document Image Datasets Using Embeddings
 In this talk you'll learn about the task of visual document retrieval, the models which are widely used by the community, and see them in action through the open source FiftyOne App where you'll learn how to use these models to identify groups and clusters of documents, find unique documents, uncover biases in your visual document dataset, and search over your document corpus using natural language.
 About the Speaker
 Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He's got a deep interest in VLMs, Visual Agents, Document AI, and Physical AI.
 1 attendee from this group
Past events
143




























