The document describes an automatic image filtering pipeline for social media images posted during crises. The pipeline uses deep learning and perceptual hashing to (1) filter out irrelevant and duplicate images, (2) categorize remaining images into damage levels (severe, mild, little-to-none), and (3) save over 60% of annotation budget by reducing image volume. Evaluation shows the relevancy filter achieves 0.98 AUC and 0.98 F1, while damage assessment achieves 0.72 AUC and 0.67 F1. The pipeline aims to extract actionable information from images in real-time to assist crisis response.
27. Duplicate Filtering (Cont.)
Image Processing Pipeline
• Selected 1100 images for the experiment
• Approach:
– Compute perceptual hash for each image
– Compute Hamming distance between each pair of images
Two images are duplicates if:

    distance(V_i, V_j) ≤ 10

where

    V_i = pHash(Image_i),  V_i ∈ R^n

    distance(V_i, V_j) = Σ_{k=1}^{n} |V_{i,k} − V_{j,k}|

which, for binary hash vectors, is the Hamming distance:

    hamming_distance(V_i, V_j) = Σ_{k=1}^{n} 1(V_{i,k} ≠ V_{j,k})

(Lei et al. 2011)
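The duplicate test above can be sketched in a few lines of Python. The pHash computation itself (a DCT over a downscaled grayscale image) is assumed to come from an external step, e.g. the `imagehash` library; here hashes are represented as binary vectors, and the threshold of 10 is taken from the slide.

```python
def hamming_distance(v_i, v_j):
    """Number of positions where two equal-length binary hash vectors differ."""
    return sum(1 for a, b in zip(v_i, v_j) if a != b)

def is_duplicate(v_i, v_j, threshold=10):
    """Treat two images as duplicates if their pHash vectors differ in at
    most `threshold` bit positions (threshold value from the slide)."""
    return hamming_distance(v_i, v_j) <= threshold

# Toy 16-bit hashes for illustration (real pHashes are typically 64 bits):
h1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
h2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # differs in 1 bit
```

In a pipeline over the 1,100 experiment images, this pairwise test would be applied after hashing each incoming image, discarding any image whose hash falls within the threshold of an already-kept image.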
36. Summary
• We propose a real-time image processing pipeline, which filters
– duplicate, near-duplicate, and
– irrelevant images
and categorizes
– images into severe, mild, and little-to-no damage
• Relevancy filter model: AUC: 0.98; F1: 0.98
• Damage assessment model: AUC: 0.72; F1: 0.67
• We can save 62% of the whole budget
Summary and Future Work
62% of images are irrelevant or duplicates
D. T. Nguyen, F. Alam, F. Ofli, M. Imran, "Automatic image filtering on social networks using deep learning and perceptual hashing during crises", ISCRAM 2017.