Cross-modal Networks, Fine-Tuning, Data Augmentation and Dual Softmax Operation for MediaEval NewsImages 2023
Antonios Leventakis, Damianos Galanopoulos, Vasileios Mezaris
CERTH-ITI, Thermi - Thessaloniki, Greece
MediaEval 2023 Workshop
1-2 Feb. 2024
Our takeaway message
• Our contributions
  • Data augmentation: generated one extra text for every training and testing pair
  • Used pre-trained CLIP models
  • Also tested a fine-tuned CLIP model
  • Dual-softmax similarity revision
• Our observations
  • Fine-tuning improves performance
  • The official results contrast with our internal experiments; it is important to consider the nature of the data when selecting a pre-trained or fine-tuned CLIP model
Motivation
• CLIP’s proven capabilities in image-text association
• Fine-tuning’s potential to capture the unique relationships between images and texts in the news domain
• Data augmentation could enhance the models’ robustness
• Dual softmax as a result re-ranking method can improve performance (as also shown in last year’s findings [1])
[1] D. Galanopoulos, V. Mezaris, Cross-modal Networks and Dual Softmax Operation for MediaEval NewsImages 2022, in: Working Notes Proceedings of the MediaEval 2022 Workshop, volume 3583, CEUR Workshop Proceedings, 2023.
Approach: CLIP fine-tuning
• Training data collection
  • 4.8M image-caption pairs from public datasets in the news domain: NYTimes800K, N24News, BreakingNews, Al Jazeera News (i), CNN News (ii), BBC UK News (iii), Huffpost News (iv) and Bloomberg (v)
• Data augmentation
  • One additional caption was generated for every image via the “T5” attention-based transformer model [2], giving 9.6M image-text pairs in total for training
• Fine-tuning of the pre-trained CLIP model
  • The “ViT-L/14@336px” model was fine-tuned on the original and the augmented data with a learning rate of 3e-7 for 1 epoch
(i) https://data.world/opensnippets/al-jazeera-news-dataset
(ii) https://data.world/opensnippets/cnn-news-dataset
(iii) https://data.world/opensnippets/bbc-uk-news-dataset
(iv) https://data.world/crawlfeeds/huffspot-news-dataset
(v) https://data.world/crawlfeeds/bloomberg-quint-news-dataset
[2] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Journal of Machine Learning Research, 2020, pp. 1–67.
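The augmentation step on this slide can be illustrated with a minimal sketch: every image keeps its original caption and gains one generated caption, so the number of image-text pairs doubles (4.8M to 9.6M in the slide's figures). The names `augment_pairs` and `toy_generator` are hypothetical, and the generator is assumed to take the original caption as input; the actual T5 setup is not described on the slide.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (image_id, caption)


def augment_pairs(pairs: List[Pair],
                  generate_caption: Callable[[str], str]) -> List[Pair]:
    """Return the original pairs followed by one generated caption per pair."""
    augmented = list(pairs)
    for image_id, caption in pairs:
        # Assumption: the extra caption is derived from the existing caption.
        augmented.append((image_id, generate_caption(caption)))
    return augmented


def toy_generator(caption: str) -> str:
    """Stand-in for a T5-based text generator (hypothetical, illustration only)."""
    return "Rewritten: " + caption


pairs = [
    ("img_001", "Protesters gather outside parliament."),
    ("img_002", "Floodwaters cover a village road."),
]
training_set = augment_pairs(pairs, toy_generator)
print(len(pairs), "->", len(training_set))
```

In a real pipeline `toy_generator` would be replaced by a call to a text-to-text model; the doubling logic stays the same.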
Approach: using CLIP
• Pre-trained CLIP models (in addition to the fine-tuned one)
  • The “ViT-H/14” model of openCLIP and the “ViT-L/14@336px” model of CLIP were used directly for retrieval
• Inference-stage score aggregation
  • The same data augmentation was applied to the test data; the similarity scores from the original and augmented pairs were aggregated via mean pooling to obtain the final predictions
• Dual softmax similarity revision
  • Dual softmax operations were applied at the inference stage to investigate their effect on performance
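The two inference-stage steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slides do not give the exact dual softmax formula, so the sketch assumes one common variant (the element-wise product of a softmax over images and a softmax over texts), and the `temperature` value and all scores are made up.

```python
import math


def mean_pool(sim_original, sim_augmented):
    """Element-wise mean of two equally shaped text-by-image score matrices."""
    return [
        [(a + b) / 2.0 for a, b in zip(row_o, row_a)]
        for row_o, row_a in zip(sim_original, sim_augmented)
    ]


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax(sim, temperature=1.0):
    """Revise sim[t][i] (text t vs. image i) using softmax along both axes."""
    n_texts, n_images = len(sim), len(sim[0])
    over_images = [softmax(row, temperature) for row in sim]  # one per text
    columns = [[sim[t][i] for t in range(n_texts)] for i in range(n_images)]
    over_texts = [softmax(col, temperature) for col in columns]  # one per image
    return [
        [over_images[t][i] * over_texts[i][t] for i in range(n_images)]
        for t in range(n_texts)
    ]


def rank_images(sim_row):
    """Image indices ordered from most to least similar."""
    return sorted(range(len(sim_row)), key=lambda i: sim_row[i], reverse=True)


# Toy scores: 2 texts x 2 images, for the original and the generated text.
sim_orig = [[0.31, 0.28], [0.41, 0.09]]
sim_aug = [[0.29, 0.30], [0.39, 0.11]]

pooled = mean_pool(sim_orig, sim_aug)
revised = dual_softmax(pooled, temperature=0.01)  # temperature is an assumption

ranking_before = rank_images(pooled[0])
ranking_after = rank_images(revised[0])
print(ranking_before, ranking_after)
```

In this toy case image 0 narrowly wins for text 0 before revision, but it matches text 1 far better, so the revision pushes image 1 to the top for text 0. Re-ranking of this kind can help or hurt depending on the data, which fits the mixed results reported for dual softmax.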
Submitted Runs
Run      Model             Fine-tuning   Dual softmax
Run #1   ViT-H/14          ✗             ✓
Run #2   ViT-L/14@336px    ✗             ✗
Run #3   ViT-L/14@336px    ✗             ✓
Run #4   ViT-L/14@336px    ✓             ✗
Run #5   ViT-L/14@336px    ✓             ✓
Results
• ViT-H/14 is more suitable for the GDELT-P2 dataset
• Handling the significant amount of synthetic images in GDELT-P2 is probably important to consider when selecting a CLIP version
• Fine-tuning benefits performance
• Different pre-trained CLIP versions significantly affect the final performance
• Dual softmax results are mixed
Results
• Official results contrast, in part, with our internal findings:
  • In our internal experiments, both fine-tuning and dual softmax benefit performance
Lessons Learned
• CLIP fine-tuning improves performance
• Utilizing different pre-trained CLIP/openCLIP versions could reveal further possibilities
• Further exploration of fine-tuning strategies could lead to a deeper understanding of how to effectively adapt pre-trained models to specific domains and tasks
• Future research could examine the capabilities and limitations of pre-trained models in processing synthetic data and develop strategies to improve performance in such scenarios
Thank you for your attention!
Questions?
Vasileios Mezaris, bmezaris@iti.gr
This work was supported by the EU’s Horizon Europe and Horizon 2020 research and innovation
programs under grant agreements 101070190 AI4Trust and 101021866 CRiTERIA, respectively.
