SlideShare ist ein Scribd-Unternehmen logo
1 von 13
Content-based recommendation
 The requirement
 some information about the available items such as the genre ("content")
 some sort of user profile describing what the user likes (the preferences)
• “Similarity” is computed from item attributes, e.g.,
• Similarity of movies by actors, director, genre
• Similarity of text by words, topics
• Similarity of music by genre, year
 The task:
 learn user preferences
 locate/recommend items that are "similar" to the user preferences
"show me more
of the same
what I've liked"
• Most Content Based-recommendation techniques were applied to recommending
text documents.
• Like web pages or newsgroup messages for example.
• Content of items can also be represented as text documents.
• With textual descriptions of their basic characteristics.
• Structured: Each item is described by the same set of attributes
Title Genre Author Type Price Keywords
The Night of
the Gun
Memoir David Carr Paperback 29.90 Press and journalism, drug addiction,
personal memoirs, New York
The Lace
Reader
Fiction,
Mystery
Brunonia Barry Hardcover 49.90 American contemporary fiction, detective,
historical
Into the Fire Romance,
Suspense
Suzanne
Brockmann
Hardcover 45.90 American fiction, murder, neo-Nazism
 Item representation
Content representation and item similarities
• Approach
• Compute the similarity of an unseen item with
the user profile based on the keyword overlap
(e.g. using the Dice coefficient)
• Or use and combine multiple metrics
Title Genre Author Type Price Keywords
The Night of
the Gun
Memoir David Carr Paperback 29.90 Press and journalism, drug
addiction, personal memoirs,
New York
The Lace
Reader
Fiction,
Mystery
Brunonia Barry Hardcover 49.90 American contemporary fiction,
detective, historical
Into the Fire Romance,
Suspense
Suzanne
Brockmann
Hardcover 45.90 American fiction, murder, neo-
Nazism
 User profile
Title Genre Author Type Price Keywords
… Fiction Brunonia,
Barry, Ken
Follett
Paperback 25.65 Detective, murder,
New York
𝟐 × 𝒌𝒆𝒚𝒘𝒐𝒓𝒅𝒔 𝒃𝒊 ∩ 𝒌𝒆𝒚𝒘𝒐𝒓𝒅𝒔 𝒃𝒋
𝒌𝒆𝒚𝒘𝒐𝒓𝒅𝒔 𝒃𝒊 + 𝒌𝒆𝒚𝒘𝒐𝒓𝒅𝒔 𝒃𝒋
𝑘𝑒𝑦𝑤𝑜𝑟𝑑𝑠 𝑏𝑗
describes Book 𝑏𝑗
with a set of
keywords
Term-Frequency - Inverse Document
Frequency (𝑻𝑭 − 𝑰𝑫𝑭)
• Simple keyword representation has its problems
• in particular when automatically extracted as
• not every word has similar importance
• longer documents have a higher chance to have an overlap with the user profile
• Standard measure: TF-IDF
• Encodes text documents in multi-dimensional Euclidian space
• weighted term vector
• TF: Measures, how often a term appears (density in a document)
• assuming that important terms appear more often
• normalization has to be done in order to take document length into account
• IDF: Aims to reduce the weight of terms that appear in all documents
• Given a keyword 𝑖 and a document 𝑗
• 𝑇𝐹 𝑖, 𝑗
• term frequency of keyword 𝑖 in document 𝑗
• 𝐼𝐷𝐹 𝑖
• inverse document frequency calculated as 𝑰𝑫𝑭 𝒊 = 𝒍𝒐𝒈
𝑵
𝒏 𝒊
• 𝑁 : number of all recommendable documents
• 𝑛 𝑖 : number of documents from 𝑁 in which keyword 𝑖 appears
• 𝑇𝐹 − 𝐼𝐷𝐹
• is calculated as: 𝑻𝑭-𝑰𝑫𝑭 𝒊, 𝒋 = 𝑻𝑭 𝒊, 𝒋 ∗ 𝑰𝑫𝑭 𝒊
Term-Frequency - Inverse Document
Frequency (𝑻𝑭 − 𝑰𝑫𝑭)
Cosine similarity
• Usual similarity metric to compare vectors: Cosine similarity (angle)
• Cosine similarity is calculated based on the angle between the vectors
• 𝑠𝑖𝑚 𝑎, 𝑏 =
𝑎∙𝑏
𝑎 ∗ 𝑏
• Adjusted cosine similarity
• take average user ratings into account ( 𝑟𝑢), transform the original ratings
• U: set of users who have rated both items a and b
• 𝑠𝑖𝑚 𝑎, 𝑏 = 𝑢∈𝑈 𝑟 𝑢,𝑎− 𝑟 𝑢 𝑟 𝑢,𝑏− 𝑟 𝑢
𝑢∈𝑈 𝑟 𝑢,𝑎− 𝑟 𝑢
2
𝑢∈𝑈 𝑟 𝑢,𝑏− 𝑟 𝑢
2
An example for computing cosine similarity of annotations
To calculate cosine similarity between two texts t1 and t2, they are
transformed
in vectors as shown in the Table
Probabilistic methods
Calculation of probabilities in simplistic approach
Item1 Item2 Item3 Item4 Item5
Alice 1 3 3 2 ?
User1 2 4 2 2 4
User2 1 3 3 5 1
User3 4 5 2 3 3
User4 1 1 5 2 1
X = (Item1 =1, Item2=3, Item3= … )
Item1 Item5
Alice 2 ?
User1 1 2
 Idea of Slope One predictors is simple and is based on a popularity
differential between items for users
 Example:
 p(Alice, Item5) =
 Basic scheme: Take the average of these differences of the co-ratings to
make the prediction
 In general: Find a function of the form f(x) = x + b
Slope One predictors
-
2 + ( 2 - 1 ) = 3
Relevant
Nonrelevant
• Most learning methods aim to find coefficients of a linear model
• A simplified classifier with only two dimensions can be represented by a line
 Other linear classifiers:
– Naive Bayes classifier, Rocchio method, Windrow-Hoff algorithm, Support vector machines
Linear classifiers
 The line has the form 𝒘 𝟏 𝒙 𝟏 + 𝒘 𝟐 𝒙 𝟐 = 𝒃
– 𝑥1 and 𝑥2 correspond to the vector
representation of a document (using e.g. TF-IDF
weights)
– 𝑤1, 𝑤2 and 𝑏 are parameters to be learned
– Classification of a document based on checking
𝑤1 𝑥1 + 𝑤2 𝑥2 > 𝑏
 In n-dimensional space the classification
function is 𝑤 𝑇 𝑥 = 𝑏
– Mean Absolute Error (MAE) computes the deviation
between predicted ratings and actual ratings
– Root Mean Square Error (RMSE) is similar to MAE,
but places more emphasis on larger deviation
Metrics Measure Error rate
Next …
• Hybrid recommendation systems
• More theories
• Boolean and Vector Space
Retrieval Models
• Clustering
• Data mining
• And so on

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Recommendation Systems
Introduction to Recommendation SystemsIntroduction to Recommendation Systems
Introduction to Recommendation SystemsTrieu Nguyen
 
How to Build Recommender System with Content based Filtering
How to Build Recommender System with Content based FilteringHow to Build Recommender System with Content based Filtering
How to Build Recommender System with Content based FilteringVõ Duy Tuấn
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender systemStanley Wang
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systemsAravindharamanan S
 
Recommender systems using collaborative filtering
Recommender systems using collaborative filteringRecommender systems using collaborative filtering
Recommender systems using collaborative filteringD Yogendra Rao
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System ExplainedCrossing Minds
 
Classifying Text using CNN
Classifying Text using CNNClassifying Text using CNN
Classifying Text using CNNSomnath Banerjee
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender SystemsLior Rokach
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introductionLiang Xiang
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systemsFalitokiniaina Rabearison
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architectureLiang Xiang
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems BasicsJarin Tasnim Khan
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measuresankit_ppt
 
Recommendation systems
Recommendation systemsRecommendation systems
Recommendation systemsSaurabhWani6
 
Collaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFCollaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFYusuke Yamamoto
 
Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithmsnextlib
 

Was ist angesagt? (20)

Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Introduction to Recommendation Systems
Introduction to Recommendation SystemsIntroduction to Recommendation Systems
Introduction to Recommendation Systems
 
How to Build Recommender System with Content based Filtering
How to Build Recommender System with Content based FilteringHow to Build Recommender System with Content based Filtering
How to Build Recommender System with Content based Filtering
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systems
 
Recommender systems using collaborative filtering
Recommender systems using collaborative filteringRecommender systems using collaborative filtering
Recommender systems using collaborative filtering
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
 
Classifying Text using CNN
Classifying Text using CNNClassifying Text using CNN
Classifying Text using CNN
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introduction
 
Text Classification
Text ClassificationText Classification
Text Classification
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems Basics
 
Text similarity measures
Text similarity measuresText similarity measures
Text similarity measures
 
Recommendation systems
Recommendation systemsRecommendation systems
Recommendation systems
 
Collaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CFCollaborative Filtering 1: User-based CF
Collaborative Filtering 1: User-based CF
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Naive Bayes Presentation
Naive Bayes PresentationNaive Bayes Presentation
Naive Bayes Presentation
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 

Ähnlich wie Content based filtering

Book Recommendation Engine
Book Recommendation EngineBook Recommendation Engine
Book Recommendation EngineShravaniBheema
 
sa-mincut-aditya.ppt
sa-mincut-aditya.pptsa-mincut-aditya.ppt
sa-mincut-aditya.pptaashnareddy1
 
Publish or Perish: Towards a Ranking of Scientists using Bibliographic Data ...
Publish or Perish:  Towards a Ranking of Scientists using Bibliographic Data ...Publish or Perish:  Towards a Ranking of Scientists using Bibliographic Data ...
Publish or Perish: Towards a Ranking of Scientists using Bibliographic Data ...Lior Rokach
 
Digital Image Processing.pptx
Digital Image Processing.pptxDigital Image Processing.pptx
Digital Image Processing.pptxMukhtiarKhan5
 
Scalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSHScalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSHMaruf Aytekin
 
Simple semantics in topic detection and tracking
Simple semantics in topic detection and trackingSimple semantics in topic detection and tracking
Simple semantics in topic detection and trackingGeorge Ang
 
IRT Unit_ 2.pptx
IRT Unit_ 2.pptxIRT Unit_ 2.pptx
IRT Unit_ 2.pptxthenmozhip8
 
Presentation on Text Classification
Presentation on Text ClassificationPresentation on Text Classification
Presentation on Text ClassificationSai Srinivas Kotni
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And ClusteringDataminingTools Inc
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clusteringguest0edcaf
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And ClusteringDatamining Tools
 
Recommender systems
Recommender systemsRecommender systems
Recommender systemsTamer Rezk
 
[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用台灣資料科學年會
 
Models for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationModels for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationArjen de Vries
 
Textual Document Categorization using Bigram Maximum Likelihood and KNN
Textual Document Categorization using Bigram Maximum Likelihood and KNNTextual Document Categorization using Bigram Maximum Likelihood and KNN
Textual Document Categorization using Bigram Maximum Likelihood and KNNRounak Dhaneriya
 

Ähnlich wie Content based filtering (20)

Book Recommendation Engine
Book Recommendation EngineBook Recommendation Engine
Book Recommendation Engine
 
sa-mincut-aditya.ppt
sa-mincut-aditya.pptsa-mincut-aditya.ppt
sa-mincut-aditya.ppt
 
sa.ppt
sa.pptsa.ppt
sa.ppt
 
sa-mincut-aditya.ppt
sa-mincut-aditya.pptsa-mincut-aditya.ppt
sa-mincut-aditya.ppt
 
Recommenders.ppt
Recommenders.pptRecommenders.ppt
Recommenders.ppt
 
Recommenders.ppt
Recommenders.pptRecommenders.ppt
Recommenders.ppt
 
Publish or Perish: Towards a Ranking of Scientists using Bibliographic Data ...
Publish or Perish:  Towards a Ranking of Scientists using Bibliographic Data ...Publish or Perish:  Towards a Ranking of Scientists using Bibliographic Data ...
Publish or Perish: Towards a Ranking of Scientists using Bibliographic Data ...
 
Digital Image Processing.pptx
Digital Image Processing.pptxDigital Image Processing.pptx
Digital Image Processing.pptx
 
Scalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSHScalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSH
 
Simple semantics in topic detection and tracking
Simple semantics in topic detection and trackingSimple semantics in topic detection and tracking
Simple semantics in topic detection and tracking
 
Filtering content bbased crs
Filtering content bbased crsFiltering content bbased crs
Filtering content bbased crs
 
IRT Unit_ 2.pptx
IRT Unit_ 2.pptxIRT Unit_ 2.pptx
IRT Unit_ 2.pptx
 
Presentation on Text Classification
Presentation on Text ClassificationPresentation on Text Classification
Presentation on Text Classification
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Recommender systems
Recommender systemsRecommender systems
Recommender systems
 
[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用[系列活動] 人工智慧與機器學習在推薦系統上的應用
[系列活動] 人工智慧與機器學習在推薦系統上的應用
 
Models for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationModels for Information Retrieval and Recommendation
Models for Information Retrieval and Recommendation
 
Textual Document Categorization using Bigram Maximum Likelihood and KNN
Textual Document Categorization using Bigram Maximum Likelihood and KNNTextual Document Categorization using Bigram Maximum Likelihood and KNN
Textual Document Categorization using Bigram Maximum Likelihood and KNN
 

Kürzlich hochgeladen

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Kürzlich hochgeladen (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Content based filtering

  • 1. Content-based recommendation  The requirement  some information about the available items such as the genre ("content")  some sort of user profile describing what the user likes (the preferences) • “Similarity” is computed from item attributes, e.g., • Similarity of movies by actors, director, genre • Similarity of text by words, topics • Similarity of music by genre, year  The task:  learn user preferences  locate/recommend items that are "similar" to the user preferences "show me more of the same what I've liked"
  • 2. • Most Content Based-recommendation techniques were applied to recommending text documents. • Like web pages or newsgroup messages for example. • Content of items can also be represented as text documents. • With textual descriptions of their basic characteristics. • Structured: Each item is described by the same set of attributes Title Genre Author Type Price Keywords The Night of the Gun Memoir David Carr Paperback 29.90 Press and journalism, drug addiction, personal memoirs, New York The Lace Reader Fiction, Mystery Brunonia Barry Hardcover 49.90 American contemporary fiction, detective, historical Into the Fire Romance, Suspense Suzanne Brockmann Hardcover 45.90 American fiction, murder, neo-Nazism
  • 3.  Item representation Content representation and item similarities • Approach • Compute the similarity of an unseen item with the user profile based on the keyword overlap (e.g. using the Dice coefficient) • Or use and combine multiple metrics Title Genre Author Type Price Keywords The Night of the Gun Memoir David Carr Paperback 29.90 Press and journalism, drug addiction, personal memoirs, New York The Lace Reader Fiction, Mystery Brunonia Barry Hardcover 49.90 American contemporary fiction, detective, historical Into the Fire Romance, Suspense Suzanne Brockmann Hardcover 45.90 American fiction, murder, neo- Nazism  User profile Title Genre Author Type Price Keywords … Fiction Brunonia, Barry, Ken Follett Paperback 25.65 Detective, murder, New York 𝟐 × 𝒌𝒆𝒚𝒘𝒐𝒓𝒅𝒔 𝒃𝒊 ∩ 𝒌𝒆𝒚𝒘𝒐𝒓𝒅𝒔 𝒃𝒋 𝒌𝒆𝒚𝒘𝒐𝒓𝒅𝒔 𝒃𝒊 + 𝒌𝒆𝒚𝒘𝒐𝒓𝒅𝒔 𝒃𝒋 𝑘𝑒𝑦𝑤𝑜𝑟𝑑𝑠 𝑏𝑗 describes Book 𝑏𝑗 with a set of keywords
  • 4. Term-Frequency - Inverse Document Frequency (𝑻𝑭 − 𝑰𝑫𝑭) • Simple keyword representation has its problems • in particular when automatically extracted as • not every word has similar importance • longer documents have a higher chance to have an overlap with the user profile • Standard measure: TF-IDF • Encodes text documents in multi-dimensional Euclidian space • weighted term vector • TF: Measures, how often a term appears (density in a document) • assuming that important terms appear more often • normalization has to be done in order to take document length into account • IDF: Aims to reduce the weight of terms that appear in all documents
  • 5. • Given a keyword 𝑖 and a document 𝑗 • 𝑇𝐹 𝑖, 𝑗 • term frequency of keyword 𝑖 in document 𝑗 • 𝐼𝐷𝐹 𝑖 • inverse document frequency calculated as 𝑰𝑫𝑭 𝒊 = 𝒍𝒐𝒈 𝑵 𝒏 𝒊 • 𝑁 : number of all recommendable documents • 𝑛 𝑖 : number of documents from 𝑁 in which keyword 𝑖 appears • 𝑇𝐹 − 𝐼𝐷𝐹 • is calculated as: 𝑻𝑭-𝑰𝑫𝑭 𝒊, 𝒋 = 𝑻𝑭 𝒊, 𝒋 ∗ 𝑰𝑫𝑭 𝒊 Term-Frequency - Inverse Document Frequency (𝑻𝑭 − 𝑰𝑫𝑭)
  • 6. Cosine similarity • Usual similarity metric to compare vectors: Cosine similarity (angle) • Cosine similarity is calculated based on the angle between the vectors • 𝑠𝑖𝑚 𝑎, 𝑏 = 𝑎∙𝑏 𝑎 ∗ 𝑏 • Adjusted cosine similarity • take average user ratings into account ( 𝑟𝑢), transform the original ratings • U: set of users who have rated both items a and b • 𝑠𝑖𝑚 𝑎, 𝑏 = 𝑢∈𝑈 𝑟 𝑢,𝑎− 𝑟 𝑢 𝑟 𝑢,𝑏− 𝑟 𝑢 𝑢∈𝑈 𝑟 𝑢,𝑎− 𝑟 𝑢 2 𝑢∈𝑈 𝑟 𝑢,𝑏− 𝑟 𝑢 2
  • 7. An example for computing cosine similarity of annotations To calculate cosine similarity between two texts t1 and t2, they are transformed in vectors as shown in the Table
  • 9. Calculation of probabilities in simplistic approach Item1 Item2 Item3 Item4 Item5 Alice 1 3 3 2 ? User1 2 4 2 2 4 User2 1 3 3 5 1 User3 4 5 2 3 3 User4 1 1 5 2 1 X = (Item1 =1, Item2=3, Item3= … )
  • 10. Item1 Item5 Alice 2 ? User1 1 2  Idea of Slope One predictors is simple and is based on a popularity differential between items for users  Example:  p(Alice, Item5) =  Basic scheme: Take the average of these differences of the co-ratings to make the prediction  In general: Find a function of the form f(x) = x + b Slope One predictors - 2 + ( 2 - 1 ) = 3
  • 11. Relevant Nonrelevant • Most learning methods aim to find coefficients of a linear model • A simplified classifier with only two dimensions can be represented by a line  Other linear classifiers: – Naive Bayes classifier, Rocchio method, Windrow-Hoff algorithm, Support vector machines Linear classifiers  The line has the form 𝒘 𝟏 𝒙 𝟏 + 𝒘 𝟐 𝒙 𝟐 = 𝒃 – 𝑥1 and 𝑥2 correspond to the vector representation of a document (using e.g. TF-IDF weights) – 𝑤1, 𝑤2 and 𝑏 are parameters to be learned – Classification of a document based on checking 𝑤1 𝑥1 + 𝑤2 𝑥2 > 𝑏  In n-dimensional space the classification function is 𝑤 𝑇 𝑥 = 𝑏
  • 12. – Mean Absolute Error (MAE) computes the deviation between predicted ratings and actual ratings – Root Mean Square Error (RMSE) is similar to MAE, but places more emphasis on larger deviation Metrics Measure Error rate
  • 13. Next … • Hybrid recommendation systems • More theories • Boolean and Vector Space Retrieval Models • Clustering • Data mining • And so on