Im2Text: Describing Images Using
1 Million Captioned Photographs
Vicente Ordonez (presenter), Girish Kulkarni, Tamara L. Berg
Stony Brook University

Computer Vision: sky, trees, water, building, bridge

Our Goal:
  An old bridge over dirty green water.
  One of the many stone bridges in town that carry the gravel carriage roads.
  A stone bridge over a peaceful river.

Harness the Web!
SBU Captioned Photo Dataset: 1 million captioned images!
Matching using Global Image Features (GIST + Color)

Captions shown:
  Smallest house in paris between red (on right) and beige (on left).
  Bridge to temple in Hoan Kiem lake.
  A walk around the lake near our house with Abby.
  The water is clear enough to see fish swimming around in it.
  Hangzhou bridge in West lake.
  The daintree river by boat.
  ...

Transfer Caption(s), e.g. “The water is clear enough to see fish swimming around in it.”

Use High Level Content to Rerank
(Objects, Stuff, People, Scenes, Captions)

Captions shown:
  The bridge over the lake on Suzhou Street.
  Iron bridge over the Duck river.
  The Daintree river by boat.
  Bridge over Cacapon river.
  ...

Transfer Caption(s), e.g. “The bridge over the lake on Suzhou Street.”

Results

Good:
  Amazing colours in the sky at sunset with the orange of the cloud and the blue of the sky behind.
  A female Mallard duck in the lake at Luukki Espoo.
  Cat in sink.
  Fresh fruit and vegetables at the market in Port Louis Mauritius.

Bad:
  The cat in the window.
  The boat ended up a kilometre from the water in the middle of the airstrip.

Editor's notes

  1. Most computer vision methods identify individual pieces of information but do not produce the kind of output you would expect from a human. From this picture, a good computer vision system would identify sky, trees, water, building, perhaps even bridge, but a person would say something like “a stone bridge over a peaceful river”. So our goal in this paper is to generate image descriptions, as opposed to the individual pieces of information that computer vision methods usually output.
  2. We approach this task in a data-driven manner by first building a dataset of 1 million images with visually relevant captions. We construct this dataset by collecting an enormous number of captions assigned to images by web users and filtering them so that we end up with captions that are more likely to refer to visual content. We use standard global image descriptors such as GIST and Tiny Images to retrieve similar images, from which we can directly transfer captions.
  3. Additionally, we incorporate high-level information to rerank the images retrieved by the previous baseline method: we run object detectors, scene classification, stuff detection, and people and action detection, and we compute text statistics. In this example we have bridge and water detections, and we match them against similar detections in the retrieved set of images. As you can see here, we run object detectors on a retrieved image only if a relevant keyword is mentioned. Text statistics are also relevant: if many images in the retrieved set agree that there is a bridge, those images are rewarded in the final ranking as well. We can then transfer captions from this reranked set of images.
  4. Finally, here are some good and bad results obtained using our full approach. The first picture says “Amazing colours in the sky at sunset with the orange of the cloud and the blue of the sky behind.” The captions are very human-like because they were written by actual humans, and the approach works surprisingly well for some types of images. On the other hand, even with 1 million images we can’t generalize to all possible observable images, and our image matching methods can also fail, leading to bad results. If you would like to see our quantitative results in more detail, please come to our poster. Thanks.
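
The two steps described in notes 2 and 3 (retrieve by global descriptor, then rerank by high-level content) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the three-number descriptors stand in for real GIST or Tiny Images features, the `labels` sets stand in for detector outputs, and the `KEYWORDS` vocabulary, the scoring formula, and all function names are assumptions made up for this example. Only the captions are taken from the slides.

```python
import math
import re

# Hypothetical keyword vocabulary used for the caption text statistics.
KEYWORDS = {"bridge", "water", "river", "lake"}

def distance(a, b):
    """Euclidean distance between two global descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_descriptor, dataset, k):
    """Baseline: the k captioned images closest to the query descriptor."""
    ranked = sorted(dataset, key=lambda im: distance(query_descriptor, im["descriptor"]))
    return ranked[:k]

def caption_keywords(caption):
    """Vocabulary keywords that a caption mentions."""
    return set(re.findall(r"[a-z]+", caption.lower())) & KEYWORDS

def rerank(query_labels, candidates):
    """Reorder candidates by detection overlap plus caption agreement."""
    def mention_rate(word):
        # Fraction of retrieved captions that mention this keyword.
        hits = sum(word in caption_keywords(im["caption"]) for im in candidates)
        return hits / len(candidates)

    def score(im):
        # Assumed scoring: shared detections plus agreement of caption keywords.
        overlap = len(query_labels & im["labels"])
        agreement = sum(mention_rate(w) for w in caption_keywords(im["caption"]))
        return overlap + agreement

    return sorted(candidates, key=score, reverse=True)

# Toy data: descriptors and labels are invented for illustration.
dataset = [
    {"caption": "The water is clear enough to see fish swimming around in it.",
     "descriptor": [0.1, 0.2, 0.9], "labels": {"water"}},
    {"caption": "The bridge over the lake on Suzhou Street.",
     "descriptor": [0.2, 0.3, 0.8], "labels": {"bridge", "water"}},
    {"caption": "Iron bridge over the Duck river.",
     "descriptor": [0.3, 0.3, 0.7], "labels": {"bridge"}},
    {"caption": "Cat in sink.",
     "descriptor": [0.9, 0.9, 0.1], "labels": {"cat"}},
]
query = {"descriptor": [0.1, 0.2, 0.9], "labels": {"bridge", "water"}}

candidates = retrieve(query["descriptor"], dataset, k=3)
baseline = candidates[0]["caption"]                           # nearest neighbour only
reranked = rerank(query["labels"], candidates)[0]["caption"]  # after content reranking
print("baseline:", baseline)
print("reranked:", reranked)
```

In this toy setup the nearest neighbour by descriptor alone is a water scene without a bridge, while reranking promotes a caption whose image shares both detections with the query, which mirrors the difference between the second and third slides.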