Madrid Computer Vision Workshop

This 2-day intermediate workshop covers how to apply computer vision technologies to investigatory research. It provides an introduction, theoretical background, a survey of trends and capabilities, an exploration of training datasets, and a walkthrough of how to build a computer vision application. There are no prerequisites, but some familiarity with coding is recommended; participants can also partner with each other for the technical sections. The workshop is based on research and code developed for the Exposing.ai and VFRAME.io computer vision projects, but it covers a wide range of computer vision technologies and topics. Python code examples will be provided in Jupyter notebooks that can be run locally.

Instructor: Adam Harvey / https://adam.harvey.studio

Custom AO-2.5RT cluster munition object detection algorithm developed for VFRAME

Instructor Bio

Adam Harvey is a computer vision researcher, software developer, and the founder of the VFRAME.io project. He developed what could be considered the first well-known computer vision hack in 2010 by reverse engineering face detection algorithms to create computer vision camouflage. In 2017 Harvey started VFRAME to help bridge the gap between commercial computer vision technologies and their application to human rights research. He also runs the Exposing.ai research project, which investigates the origins and endpoints of training data used in face recognition systems. His computer vision research has appeared in the New York Times, Financial Times, and Wall Street Journal, and has received an Award of Distinction from Ars Electronica.

Day 1

Session 1A: Introduction and Theory (11:00 - 13:00)

Introduction to VFRAME.io
Introduction to Exposing.ai projects
Essay On Computer Vision
Essay on What is a Face
Essay on Origins and Endpoints of Datasets
Introductions, project ideas, casual chat followed by lunch

(Lunch break 13:00 - 14:00)

Session 1B: Computer Vision Now (14:00 - 16:00)

Commercial and open source computer vision:
- Amazon Rekognition: image and video analysis API
- Azure: paid computer vision APIs
- Azure face recognition: demo and paid service
- Google OCR: document OCR demo and paid service
- PimEyes: face recognition demo and paid service, popular with journalists
- FindClone: face recognition service (RU), requires registration, recommended by Bellingcat
- TinEye: reverse image lookup, useful to find origin of an image
- InVID: reverse image lookup (via Google, Bing, etc…)
- For discussion:
  - What are other commercial computer vision demos or services?
  - What are advantages/disadvantages of commercial API services?
  - Are the capabilities useful for any project you have in mind?
  - What would you be willing to pay for these services?
  - What happens when you’re working on sensitive material?
  - Are there any open source versions of these products?
Open-source computer vision projects and libraries:
- OpenCV: the most widely used computer vision backend library for about 20 years or more, though it is struggling to keep up with DNN libraries and GPUs (NB: OpenCV uses BGR channel order, not RGB)
- Pillow: an easy-to-use and efficient library for working with images: changing contrast, resizing, display
- Numpy: Data matrix library, since images are matrices, Numpy is widely used for image processing and is highly performant
- Pandas: popular data management, analysis library, like Excel but for Python (can also write Excel documents), uses CSV data format
- ONNX: Open Neural Network Exchange, a cross-platform model format that can be used in many different frameworks. Not as efficient as some platform-specific formats (eg .pt, .pb, .coreml) but more portable
- YOLOV5: popular object detection library. There are many versions of YOLO, this library is updated often and has an active community of contributors (used by VFRAME)
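The BGR note on OpenCV above is a common source of swapped-color bugs when images move between libraries. The following is a minimal, dependency-free sketch of the idea, assuming an image represented as nested lists of (B, G, R) tuples; the function name bgr_to_rgb is hypothetical. With NumPy arrays the same reversal is usually written as img[..., ::-1].

```python
def bgr_to_rgb(image):
    """Return a copy of a nested-list image with each pixel's channels reversed.

    `image` is a list of rows; each row is a list of (B, G, R) tuples.
    This toy representation is an assumption made to avoid dependencies.
    """
    return [[(r, g, b) for (b, g, r) in row] for row in image]


# A 1x2 "image" in BGR order: one pure-blue pixel, one pure-red pixel.
bgr_image = [[(255, 0, 0), (0, 0, 255)]]
rgb_image = bgr_to_rgb(bgr_image)
print(rgb_image)  # [[(0, 0, 255), (255, 0, 0)]]
```

Applying the function twice returns the original image, which is a quick sanity check when debugging color problems.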
Machine Learning Frameworks:
- PyTorch: most popular AI/DNN framework (see PyTorch vs TensorFlow) (recommended)
- TensorFlow: very capable but somewhat clumsy framework
- PaddlePaddle: Baidu’s ML framework
- MXNet: Apache’s ML/DL framework
  - Gluon: by AWS and Microsoft, but not very popular
Models (aka ModelZoos):
- ModelZoo.ca: lists popular models from GitHub
- Modzy: MLOps with overlap to defense contractors
- RunwayML: GUI tool for running many ML models
- ONNX: common models for ONNX runtime
- HuggingFace: place to run CV/ML demos
CV tutorials:
- PyImageSearch: informative tutorials on wide range of CV topics. Most are free, paid access for more tutorials (recommended)
- LearnOpenCV: informative tutorials on wide range of CV topics, also offers paid courses (recommended)
Algorithms:
- Object Detection: YOLOV5, YOLOV4, YOLO, PaddlePaddle
- Image classification: basic example of image classification
- Pose estimation: one of many human pose estimation libraries
- Face recognition: “easy to use” face recognition library
- OCR: Tesseract, EasyOCR, docTR
- Colorization: convert b&w images to color
- Perceptual Hash: library to detect if two images are similar
- Not really used anymore but interesting:
  - Histogram of Oriented Gradients: HoG, old school object detection
  - Haar cascade detectors: visualized, visualized, CV Dazzle
- Computer vision applications:
  - Search engine: Example
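To make the perceptual hash item above more concrete, here is a rough sketch of one simple variant, the average hash. It is not the implementation of any particular library: the function names and the tiny 2x2 grayscale grids are illustrative assumptions. Real libraries typically shrink the image to a small grayscale thumbnail first and then hash that.

```python
def average_hash(gray):
    """Compute a simple average hash from a 2D list of grayscale values (0-255)."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    # One bit per pixel: 1 if the pixel is brighter than the mean, else 0.
    return [1 if p > mean else 0 for p in pixels]


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


original = [[10, 200], [220, 30]]
brighter = [[30, 220], [240, 50]]    # same picture, uniformly brighter
different = [[200, 10], [30, 220]]   # light and dark regions swapped

h_original = average_hash(original)
print(hamming_distance(h_original, average_hash(brighter)))   # 0
print(hamming_distance(h_original, average_hash(different)))  # 4
```

A small Hamming distance suggests the two images are similar; the cutoff for "similar" depends on the hash length and the use case.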
Coffee break and discussions

Session 1C: Datasets (16:30 - 18:00)

What is a dataset?
Main sources:
- Papers With Code Datasets (recommended)
- RoboFlow datasets: https://public.roboflow.com/
- Academic Torrents: https://academictorrents.com
Other, unique datasets:
- Snapshot Serengeti: https://www.zooniverse.org/projects/zooniverse/snapshot-serengeti/talk/subjects/34653937
- TACO dataset of litter: http://tacodataset.org/annotate
- Litter dataset: https://www.imageannotation.ai/litter-dataset
- Beach litter: images of litter on beaches used for semantic segmentation or
- UAV detection: UAV surveillance detection
- FFHQ: face dataset used to generate fake faces (e.g. thispersondoesnotexist.com)
Places to search for datasets:
- Arxiv.org: pre-print academic research papers
- Semantic Scholar: https://semanticscholar.org, alternative to Google Scholar
Make your own dataset:
- https://skybot.cam
Issues with datasets:
- Open Images: generic, biased
- COCO: generic, biased
- Exposing.ai: stories about datasets origins and endpoints
- Ugly Truth About Facial Recognition Datasets
Short break

Session 1D: Annotations (19:00 - 20:00)

Installing Conda locally:
- Try installing Anaconda Navigator via https://www.anaconda.com/products/individual
- Install git if you don’t already have it
- Download Conda installer https://docs.conda.io/en/latest/miniconda.html
- Open a terminal and run bash Miniconda3-latest-MacOSX-x86_64.sh (change this to the name of your .sh file first)
Tools:
- LabelImg: popular tool for image labeling on local machine (recommended)
- VGG Via: easy to start with browser based image annotation
- CVAT: Intel’s open source annotation tool, too complex for simple projects, but popular for larger projects
- RoboFlow: one of many paid image annotation tools
Assignment: what kind of dataset would you want to create?

Day 2

Session 2A: How to build an Object Detection Algorithm Part 1 (11:00 - 13:00)
- Continue working on annotations
- Rename annotation files to unique names to avoid collisions
- Upload/share files
- Setup cv-workshop https://github.com/adamhrv/cv-workshop/
- Try running notebooks
- Try running yolo
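One possible way to approach the renaming step is sketched below. The prefix_00001 pattern mirrors the example upload paths given for Session 2B; generating the prefix from a random UUID and the function name unique_names are assumptions for illustration, not the workshop's actual tooling. The sketch only builds the old-to-new name mapping and does not touch any files.

```python
import uuid


def unique_names(filenames, prefix=None):
    """Map original image filenames to collision-resistant image/label names.

    The random 8-character hex prefix is an assumption; any prefix that is
    unique per participant or per folder would serve the same purpose.
    """
    prefix = prefix or uuid.uuid4().hex[:8]
    mapping = {}
    for index, name in enumerate(sorted(filenames), start=1):
        stem = f"{prefix}_{index:05d}"
        # The image and its label file share a stem so they stay paired.
        mapping[name] = (f"{stem}.jpg", f"{stem}.txt")
    return mapping


names = unique_names(["IMG_2.jpg", "IMG_1.jpg"], prefix="1234abcd")
print(names["IMG_1.jpg"])  # ('1234abcd_00001.jpg', '1234abcd_00001.txt')
```

Because two participants are unlikely to draw the same random prefix, files such as IMG_1.jpg from different cameras no longer overwrite each other when merged into one dataset.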
Session 2B: How to build an Object Detection Algorithm Part 2 (14:00 - 16:00)
- Upload your YOLO files here
  - create a subfolder with your name (eg adam)
  - then upload your folders with .jpg and .txt files
  - should look like adam/1234abcd/1234abcd_00001.jpg, adam/1234abcd/1234abcd_00001.txt etc…
- Cloud GPU services
- Verifying data
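As a rough illustration of what verifying data can involve, the sketch below checks a folder of .jpg and .txt pairs. It assumes the common YOLO label layout of one object per line as "class x_center y_center width height", with the four coordinates normalized to the 0-1 range; the function name check_yolo_folder is hypothetical. Treating an image without a label file as a problem is also an assumption, since some training setups accept unlabeled background images.

```python
from pathlib import Path


def check_yolo_folder(folder):
    """Return a list of human-readable problems found in a folder of .jpg/.txt pairs."""
    problems = []
    for image_path in sorted(Path(folder).glob("*.jpg")):
        label_path = image_path.with_suffix(".txt")
        if not label_path.exists():
            problems.append(f"{image_path.name}: missing label file")
            continue
        lines = label_path.read_text().splitlines()
        for line_no, line in enumerate(lines, start=1):
            parts = line.split()
            if not parts:
                continue  # tolerate blank lines
            where = f"{label_path.name}:{line_no}"
            if len(parts) != 5:
                problems.append(f"{where}: expected 5 values, got {len(parts)}")
                continue
            try:
                class_id = int(parts[0])
                coords = [float(value) for value in parts[1:]]
            except ValueError:
                problems.append(f"{where}: non-numeric value")
                continue
            # Class ids are non-negative; coordinates are fractions of image size.
            if class_id < 0 or any(not 0.0 <= c <= 1.0 for c in coords):
                problems.append(f"{where}: value out of range")
    return problems
```

Calling check_yolo_folder on a folder such as adam/1234abcd returns an empty list when every image has a well-formed label file.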
Session 2C: Face Recognition Demo (16:30 - 18:00)
- DeepFace, InsightFace, notebooks
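Face recognition libraries of this kind generally reduce each face image to an embedding vector and then compare vectors. The sketch below shows only that comparison step, using cosine similarity. The 4-dimensional vectors, the 0.6 threshold, and the function names are made up for illustration and do not come from DeepFace or InsightFace.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_person(a, b, threshold=0.6):
    """Illustrative decision rule; real thresholds depend on the model used."""
    return cosine_similarity(a, b) >= threshold


# Toy embeddings (real models output hundreds of dimensions).
face_a = [0.9, 0.1, 0.3, 0.2]
face_a2 = [0.8, 0.2, 0.35, 0.15]  # assumed: second photo of the same person
face_b = [0.1, 0.9, 0.1, 0.7]     # assumed: a different person

print(same_person(face_a, face_a2))  # True
print(same_person(face_a, face_b))   # False
```

The threshold trades false matches against missed matches, which is worth keeping in mind when working with sensitive material.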
Session 2D: VFRAME demo (19:00 - 20:00)
- Discussions
- Try to get VFRAME running
- Test our model