Madrid Computer Vision Workshop
This 2-day intermediate workshop covers how to apply computer vision technologies to investigatory research. It provides an introduction, theoretical background, a survey of trends and capabilities, an exploration of training datasets, and a walkthrough of building a computer vision application. There are no prerequisites, but some familiarity with coding is recommended; participants can also partner with each other for the technical sections. The workshop is based on research and code developed for the Exposing.ai and VFRAME.io computer vision projects, but it covers a wide range of computer vision technologies and topics. Python code examples will be provided in Jupyter notebooks that can be run locally.
Instructor: Adam Harvey / https://adam.harvey.studio

Custom AO-2.5RT cluster munition object detection algorithm developed for VFRAME
Instructor Bio
Day 1
Session 1A: Introduction and Theory (11:00 - 13:00)
- Introduction to VFRAME.io
- Introduction to Exposing.ai projects
- Essay On Computer Vision
- Essay on What is a Face
- Essay on Origins and Endpoints of Datasets
- Introductions, project ideas, casual chat followed by lunch
(Lunch break 13:00 - 14:00)
Session 1B: Computer Vision Now (14:00 - 16:00)
- Commercial and open source computer vision:
- Amazon Rekognition: image and video analysis API
- Azure: paid computer vision APIs
- Azure face recognition: demo and service
- Google OCR: document OCR demo and paid service
- PimEyes: face recognition demo and paid service, popular with journalists
- FindClone: face recognition service (RU), requires registration, recommended by Bellingcat
- TinEye: reverse image lookup, useful to find origin of an image
- InVID: reverse image lookup (via Google, Bing, etc…)
- For discussion:
- What are other commercial computer vision demos or services?
- What are advantages/disadvantages of commercial API services?
- Are the capabilities useful for any project you have in mind?
- What would you be willing to pay for these services?
- What happens when you’re working on sensitive material?
- Are there any open source versions of these products?
- Open-source computer vision projects and libraries:
- OpenCV: the most widely used computer vision backend library for roughly 20 years or more, though it is struggling to keep up with DNN libraries and GPU support (NB: OpenCV uses BGR channel order, not RGB)
- Pillow: an easy-to-use and efficient library for working with images: changing contrast, resizing, display
- Numpy: data matrix library. Since images are matrices, Numpy is widely used for image processing and is highly performant
- Pandas: popular data management and analysis library, like Excel but for Python (can also write Excel documents); commonly used with CSV data
- ONNX: Open Neural Network Exchange, a cross-platform model format that can be used in many different frameworks. Not as efficient as some platform-specific formats (e.g. .pt, .pb, .coreml) but more portable
- YOLOV5: popular object detection library. There are many versions of YOLO; this library is updated often and has an active community of contributors (used by VFRAME)
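The BGR note in the OpenCV entry matters whenever an image loaded with OpenCV is handed to a library that expects RGB, such as Pillow. The following is a minimal sketch, assuming only Numpy is installed; the helper name `bgr_to_rgb` and the 2x2 test image are hypothetical and not taken from the workshop code.

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 image (BGR <-> RGB)."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an array of shape (height, width, 3)")
    # Slicing with ::-1 on the last axis swaps the first and third channels.
    return image[:, :, ::-1].copy()


# Hypothetical 2x2 image in which every pixel is pure blue in BGR order.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[:, :, 0] = 255  # channel 0 is blue in OpenCV's convention

rgb = bgr_to_rgb(bgr)
print(bgr.shape, bgr.dtype)  # an image is a height x width x channels matrix
print(rgb[0, 0])             # blue has moved to the last channel
```

With OpenCV installed, `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` performs the same conversion; the slicing version is shown only to make the matrix layout visible.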
- Machine Learning Frameworks:
- PyTorch: most popular AI/DNN framework (see PyTorch vs TensorFlow) (recommended)
- TensorFlow: very capable but somewhat clumsy framework
- PaddlePaddle: Baidu’s ML framework
- MXNet: Apache’s ML/DL framework:
- Gluon, by AWS and Microsoft but not very popular
- Models (aka ModelZoos):
- ModelZoo.ca: lists popular models from GitHub
- Modzy: MLOps with overlap to defense contractors
- RunwayML: GUI tool for running many ML models
- ONNX: common models for ONNX runtime
- HuggingFace: place to run CV/ML demos
- CV tutorials:
- PyImageSearch: informative tutorials on wide range of CV topics. Most are free, paid access for more tutorials (recommended)
- LearnOpenCV: informative tutorials on wide range of CV topics, also offers paid courses (recommended)
- Algorithms:
- Object Detection: YOLOV5, YOLOV4, YOLO, PaddlePaddle
- Image classification: basic example of image classification
- Pose estimation: one of many human pose estimation libraries
- Face recognition: “easy to use” face recognition library
- OCR: Tesseract, EasyOCR, docTR
- Colorization: convert b&w images to color
- Perceptual Hash: library to detect if two images are similar
- Not really used anymore but interesting:
- Histogram of Oriented Gradients: HoG, old school object detection
- Haar cascade detectors: visualized, visualized, CV Dazzle
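To illustrate how a perceptual hash (listed under Algorithms above) can flag similar images, here is a minimal sketch of an average hash. This is one simple variant, and the linked library may use a different algorithm. The sketch assumes each image has already been converted to grayscale and shrunk to a small grid, represented here as hypothetical nested lists of pixel values; `average_hash` and `hamming_distance` are illustrative names.

```python
from typing import List


def average_hash(gray: List[List[int]]) -> int:
    """Return an integer whose bits mark pixels brighter than the mean."""
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        # Shift left and append 1 if the pixel is above the mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# Hypothetical 4x4 grayscale thumbnails: bright left half, dark right half.
original = [[200, 210, 20, 10]] * 4
# Same picture, slightly brighter overall (for example after re-encoding).
brighter = [[min(255, v + 15) for v in row] for row in original]
# Different picture: dark left half, bright right half.
mirrored = [row[::-1] for row in original]

print(hamming_distance(average_hash(original), average_hash(brighter)))
print(hamming_distance(average_hash(original), average_hash(mirrored)))
```

A small Hamming distance suggests the two images are near-duplicates, while a large one suggests different content. The threshold separating the two is a judgment call that depends on the hash size and the material.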
- Computer vision applications:
- Search engine: Example
- Coffee break and discussions
Session 1C: Datasets (16:30 - 18:00)
- What is a dataset?
- Main sources:
- Papers With Code Datasets (recommended)
- RoboFlow datasets: https://public.roboflow.com/
- Academic Torrents: https://academictorrents.com
- Other, unique datasets:
- Snapshot Serengeti: https://www.zooniverse.org/projects/zooniverse/snapshot-serengeti/talk/subjects/34653937
- TACO dataset of litter: http://tacodataset.org/annotate
- Litter dataset: https://www.imageannotation.ai/litter-dataset
- Beach litter: images of litter on beaches used for semantic segmentation or
- UAV detection: UAV surveillance detection
- FFHQ: face dataset used to generate fake faces (e.g. thispersondoesnotexist.com)
- Places to search for datasets:
- Arxiv.org: pre-print academic research papers
- Semantic Scholar: https://semanticscholar.org, alternative to Google Scholar
- Make your own dataset:
- Issues with datasets:
- Open Images: generic, biased
- COCO: generic, biased
- Exposing.ai: stories about datasets origins and endpoints
- Ugly Truth About Facial Recognition Datasets
- Short break
Session 1D: Annotations (19:00 - 20:00)
- Installing Conda locally:
- Try installing Anaconda Navigator via https://www.anaconda.com/products/individual
- Install git if you don’t already have it
- Download Conda installer https://docs.conda.io/en/latest/miniconda.html
- Open a terminal and run
bash Miniconda3-latest-MacOSX-x86_64.sh (change this to the name of your .sh file first)
- Tools:
- Assignment: what kind of dataset would you want to create?
Day 2
Session 2A: How to build an Object Detection Algorithm Part 1 (11:00 - 13:00)
- Continue working on annotations
- Convert annotations to unique name to avoid collisions
- Upload/share files
- Setup cv-workshop: https://github.com/adamhrv/cv-workshop/
- Try running notebooks
- Try running yolo
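The step of converting annotations to unique names can be sketched as follows. This is an illustration under stated assumptions, not the workshop's own script: it assumes each image sits next to a same-named .txt label file, uses a random 8-character hex prefix, and the function names `plan_unique_names` and `apply_plan` are hypothetical.

```python
import tempfile
import uuid
from pathlib import Path
from typing import Dict, Optional


def plan_unique_names(folder: Path, prefix: Optional[str] = None) -> Dict[Path, Path]:
    """Map each .jpg (and its .txt, if present) to a name like <prefix>_00001.jpg."""
    # Assumption: an 8-character random hex prefix is unique enough for a workshop.
    prefix = prefix or uuid.uuid4().hex[:8]
    plan: Dict[Path, Path] = {}
    for index, image in enumerate(sorted(folder.glob("*.jpg")), start=1):
        stem = f"{prefix}_{index:05d}"
        plan[image] = folder / f"{stem}.jpg"
        label = image.with_suffix(".txt")
        if label.exists():
            # Keep the annotation file paired with its image.
            plan[label] = folder / f"{stem}.txt"
    return plan


def apply_plan(plan: Dict[Path, Path]) -> None:
    """Rename files on disk according to the plan."""
    for source, target in plan.items():
        source.rename(target)


# Demonstration in a temporary folder with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("img.jpg", "img.txt", "photo.jpg"):
        (folder / name).touch()
    apply_plan(plan_unique_names(folder, prefix="1234abcd"))
    renamed = sorted(path.name for path in folder.iterdir())
    print(renamed)
```

Building the rename plan before touching the disk makes it possible to inspect the mapping first, which is useful when several participants later merge their folders.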
Session 2B: How to build an Object Detection Algorithm Part 2 (14:00 - 16:00)
- Upload your YOLO files here
- create a subfolder with your name (eg adam)
- then upload your folders with .jpg and .txt files
- should look like adam/1234abcd/1234abcd_00001.jpg, adam/1234abcd/1234abcd_00001.txt etc…
- Cloud GPU services
- Verifying data
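For the "Verifying data" step, a minimal sketch of a sanity check on an upload folder is shown below. It assumes the common YOLO label layout, where each line of a .txt file reads `class x_center y_center width height` with the four box values normalized to the range 0 to 1. The function names `check_label_line` and `find_problems` are hypothetical and not part of the workshop repository.

```python
import tempfile
from pathlib import Path
from typing import List


def check_label_line(line: str) -> bool:
    """Return True if a line looks like 'class x_center y_center width height'."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        class_id = int(parts[0])
        coords = [float(value) for value in parts[1:]]
    except ValueError:
        return False
    # Class IDs are non-negative; box values are normalized to the range 0..1.
    return class_id >= 0 and all(0.0 <= value <= 1.0 for value in coords)


def find_problems(folder: Path) -> List[str]:
    """List images without labels and label files with malformed lines."""
    problems: List[str] = []
    for image in sorted(folder.rglob("*.jpg")):
        label = image.with_suffix(".txt")
        if not label.exists():
            problems.append(f"missing label: {image.name}")
            continue
        for number, line in enumerate(label.read_text().splitlines(), start=1):
            if line.strip() and not check_label_line(line):
                problems.append(f"bad line {number}: {label.name}")
    return problems


# Demonstration with placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.jpg").touch()
    (folder / "a.txt").write_text("0 0.5 0.5 0.2 0.3\n")
    (folder / "b.jpg").touch()  # deliberately has no label file
    problems = find_problems(folder)
    print(problems)
```

Whether a missing label file counts as an error depends on the training setup, since some pipelines treat unlabeled images as background examples. The check only reports candidates for manual review.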
Session 2C: Face Recognition Demo (16:30 - 18:00)
- DeepFace, InsightFace, notebooks
Session 2D: VFRAME demo (19:00 - 20:00)
- Discussions
- Try to get VFRAME running
- Test our model