Master Thesis
The Design of a Rich Internet Application for
Exploratory Search by Real-Time Generation of
Similarity Maps
Roman Atachiants
Master of Science Thesis DKE 10-5
Thesis submitted in partial fulfillment of the requirements for
the degree of Master of Science in Artificial Intelligence at the
Department of Knowledge Engineering of Maastricht University
Exam committee:
Dr. Eduard Hoenkamp (supervisor)
Dr. Ronald Westra
Maastricht University
Faculty of Humanities and Sciences
Department of Knowledge Engineering
Master of Science in Artificial Intelligence
June 28, 2010
Abstract
Users who cannot formulate a precise query but know there must be a good answer somewhere
often rely on exploratory search. This requires an interactive and responsive system, or else
the user will soon give up. As databases become larger, more specialized, and more
distributed, this calls for a Rich Internet Application fast enough to keep pace with the user's
explorations. This thesis studies and implements a system, called MultiMap, which computes
similarity maps in real time. This entailed: (1) precomputing every data structure that does
not change after the initial query, (2) optimizing the algorithms for zooming and map generation,
and (3) providing a cognitively appropriate visualization of high-dimensional space. Applied
to a very large movie database, it resulted in a highly responsive, satisfying, usable system.
Acknowledgments
Many people helped me in different ways throughout the research project and contributed different
insights and opinions. I want to thank my fellow students, professors, friends and family who
helped, tested the prototype and supported (and endured) me during the research.
In particular, I would like to thank Dr. Eduard Hoenkamp for his support and supervision
of the project. Our regular meetings, discussions and brainstorming sessions helped me a lot, from the
very beginning and the theoretical part of the research down to the implementation, engineering
and design. Beyond the professional relationship, I enjoyed his company most of all, and our
discussions about various domains, including education, technology, politics and travel, are
truly memorable to me.
Next, I would like to thank a fellow A.I. student, Tom Marechal. He was an invaluable
asset and friend, as he provided me with inspiration and ideas throughout the research project.
Additionally, I would like to thank Dr. Johannes C. Scholtes and Dr. Ronald Westra for
their support, evaluation and critical thinking. Not only did they largely inspire this project
during their classes, they also gave various invaluable insights that contributed to
making this thesis better.
I would also like to thank everyone who participated in the testing and evaluation of
the system; without their time and feedback the project would not be what it is today.
Contents
1 Introduction
1.1 Exploratory Search
1.2 Faceted Classification
1.3 Interactivity & Responsiveness
2 The Concept
2.1 The Idea
2.2 The Prototype
3 The System
3.1 Architectural Overview
3.2 Mathematical Concepts & Algorithms
3.2.1 Overview
3.2.2 Preprocessing & Correlations
3.2.3 Ranking
3.2.4 Facets Selection
3.2.5 Movies Selection
3.2.6 Creation of Aspect Maps
3.3 Server Technology
3.4 The Client Front-End
3.4.1 Overview
3.4.2 GridMap
4 Usability Aspects
5 Conclusions
A Protocol Generation DSL
Chapter 1
Introduction
Search and data visualization are becoming more and more important as we enter the
Petabyte Age. Traditional approaches to searching large datasets are query-based, which
implies that the user (researcher) already knows what to look for. However, this approach
is difficult when one is not familiar with the domain or lacks
the knowledge or contextual awareness needed to formulate precise queries to navigate the
information space. For example, how do we find something we would like to know more about
without having the specific knowledge to formulate a precise question? How would we
find a movie we might enjoy if we have never seen Robert DeNiro or Charlie Chaplin? Or, knowing
that we enjoy Quentin Tarantino’s movies, how would we discover other, relatively similar
movies? In order to find those movies, we perform a search process called exploratory search.
Exploratory search is a specialization of information retrieval which represents the activities
carried out by searchers who are:
• unfamiliar with the domain of their goal (i.e. need to learn about the topic in order to
understand how to achieve their goal)
• or unsure about the ways to achieve their goals (either the technology or the process)
• or even unsure about their goals in the first place.
In this research, we try to address the exploratory search problem [27] by introducing a
novel interactive search system. This system is called MultiMap and relies on similarity
measurements in order to present the latent information relations to the user in a geographic
manner. The system has been developed and tested using the Netflix dataset [7], containing
about 125,000 movies. A custom selection was performed on the dataset:
• The genres were filtered to 28 IMDB genres.
• The directors were filtered to those with at least 5 movies made (in total around 2500
directors).
• The actors were filtered to those who participated in at least 10 movies
(in total around 6000 actors).
• The movies were filtered to those containing all needed information and made by the
preselected directors and actors. The final database contained around 16000 movies.
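The threshold-based part of the selection above can be sketched in Python as follows. This is only an illustration under assumptions: the record layout (a dict with the keys `title`, `directors` and `actors`), the function name `filter_movies`, and the reading of "made by the preselected directors and actors" as "at least one retained director and at least one retained actor" are hypothetical and not taken from the tool used in this research. The genre filtering and the check for complete information are omitted.

```python
from collections import Counter


def filter_movies(movies, min_director_movies=5, min_actor_movies=10):
    """Apply the director/actor thresholds to a list of movie records.

    Each record is assumed to be a dict with the keys 'title', 'directors'
    and 'actors' (a hypothetical layout, for illustration only).
    """
    # Count in how many movies each director and each actor appears.
    director_counts = Counter(d for m in movies for d in set(m["directors"]))
    actor_counts = Counter(a for m in movies for a in set(m["actors"]))

    kept_directors = {d for d, n in director_counts.items() if n >= min_director_movies}
    kept_actors = {a for a, n in actor_counts.items() if n >= min_actor_movies}

    # Assumption: a movie is kept if it has at least one retained director
    # and at least one retained actor.
    kept_movies = [
        m for m in movies
        if kept_directors.intersection(m["directors"])
        and kept_actors.intersection(m["actors"])
    ]
    return kept_movies, kept_directors, kept_actors
```

With the thresholds stated above (5 and 10) the defaults can be used directly; smaller thresholds are convenient when trying the function on a handful of records.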
1.1 Exploratory Search
During the first phase of research we considered the exploratory search problem [11] [19],
trying to answer the following questions:
1. How to help the user who is unfamiliar with the domain (i.e.: a user who saw only a
few movies and/or doesn’t know many directors, actors)?
2. How to help the user who doesn’t know how to find a particular movie?
3. How to help the user who doesn’t know what kind of movies he likes?
Figure 1.1: An abstracted view of the backwards reasoning that was applied
in order to answer the exploratory search questions. In the figure, green represents the
interesting directions, red represents an unwanted direction, and blue represents intermediate
steps.
Figure 1.1 shows the result of the backwards reasoning we performed in order to
address those 3 questions. The goal of the research was to find a system that can answer those
questions without much guessing, mostly because we want the user to explore and learn about
the domain. From this analysis phase we derived several things that needed to be achieved
by the system:
• An extracted meaning of the data is required: the system should know about the domain,
in our particular case the cinematographic domain.
• A way to preserve relations is needed in order to help the user relate different items.
• A way to drill down to individual movies and examine them is needed in order to allow
the user to navigate.
• Relevance feedback is needed in order to show the user how interesting a particular item
is and how relevant it is for his search. The idea behind relevance feedback is to take
the results that are initially returned from a given query and to use information about
whether or not those results are relevant to perform a new query.
The exploration in exploratory search means that a user has to be able to explore different
directions and, in a manner of speaking, swim in the data. The exploration factor is very
implicit and therefore difficult to evaluate. In contrast to standard search engines, where the
user composes a query and the engine returns the documents closest to that query (document),
we do not want our system always to select the closest points and so restrict the user to the
most relevant search results. By relaxing this restriction, we allow the user to explore
different directions in this multi-dimensional space.
1.2 Faceted Classification
One of the approaches in the exploratory search research domain that has proven useful
and is used in many different visualization systems is called faceted classification [26] [12]. This
approach is widely used all across the World Wide Web, especially on
commercial web sites (Amazon, eBay). Figure 1.2 illustrates the search box of the website
Amazon.com, where the fields Author, Title, ISBN, Publisher, Subject, Condition, etc. are
the facet categories. A faceted classification system allows assigning multiple classifications
to a particular object, often the object we want to search for, which in our case is a movie.
Using multiple classifications makes it possible to reorder the data in many different ways and
to define search criteria.
Figure 1.2: The advanced search box on the Amazon.com website; the additional fields are
different aspects of a book.
A facet comprises “clearly defined, mutually exclusive, and collectively exhaustive aspects,
properties or characteristics of a class or specific subject” [25]. In this thesis, we use the word
“Aspect” to denote a facet category, and the word “Facet” for a particular facet, for example:
Aspect : Actors;
Facets : Robert DeNiro, Johnny Depp, Bruce Willis...
The Netflix contest dataset contained 17700 different movie titles and served as a basis for
the data in this research. Considering the need of extracting different facets for each of those
movies, a special tool has been written to extract additional information from the Internet
Movie DataBase (IMDB) [1] website and Netflix Database via their exposed APIs. This tool
was able to extract about 95% of the information for those movies. In particular, we were
interested in:
• Genres of the movies (Fantasy, Science-Fiction, Crime, Drama...)
• Year of release
• IMDB rating, a value from 1 to 10 rounded to one decimal place
• Directors of the movies (Steven Spielberg, Quentin Tarantino...)
• Actors of the movies (Robert DeNiro, Johnny Depp, Bruce Willis...)
Additionally, some other data about the movies were available (writers, movie plots, ...),
but they were not as abundant as the five aspects presented above. Therefore, we decided to base the
system on the above aspects alone.
1.3 Interactivity & Responsiveness
Exploratory search is a process performed by a human who uses a tool (a computer) to
interact with large quantities of information in order to explore and find the relevant pieces
of information. This human-computer pairing means by definition that the process is
interactive; interactivity is therefore a very important aspect of exploratory
search.
One way to approach interactivity is to start with the notion of “look and feel”. The term has
become more or less synonymous with how the term style is used in other design disciplines.
In a concrete sense, the “look” of a GUI is its visual appearance, while the “feel” denotes
its interactive aspects [24]. One of the consequences is that the interface should be very
responsive and fast. One must also consider the fact that search systems need to handle large
amounts of data and need a lot of computing power. A logical conclusion is that, in order to
build a good exploratory search system, the data manipulation should be handled by powerful
machines so that it is fast. During our research, we opted for a client-server approach to enhance
the interactivity without losing the computing power we need to perform all operations in
real time, keeping the system responsive and interactive. By performing all operations in
real time, we run into the problem of massive network communication.
The communication in this case is a two-way dialog between the client and the server. We
need the communication to be duplex, where both the server and the client have the ability to
initiate the dialog, because the current World Wide Web is becoming real-time (large services
such as Twitter and Facebook are good examples). Although the information flow is updated in real time,
most of the services still use traditional HTTP-based technologies.
The Hypertext Transfer Protocol (HTTP) is an Application Layer protocol for distributed,
collaborative, hypermedia information systems (RFC specifications can be found: [2]). HTTP
is a request-response protocol standard for client-server computing. In HTTP, a web browser,
for example, acts as a client, while an application running on a computer hosting the web
site acts as a server. The client submits HTTP requests to the responding server by sending
messages to it. The server, which stores content (or resources) such as HTML files and images,
or generates such content on the fly, sends messages back to the client in response. These
returned messages may contain the content requested by the client or may contain other kinds
of response indications [3].
The problem with using HTTP for the interactive, real-time web is a fundamental one. As the
World Wide Web evolved, different architectures and new frameworks (SaaS, SOAP, AJAX ...)
were built on top of the HTTP protocol, but fundamentally, real-time communication
is still mainly done using the polling technique (see figure 1.3). Polling is a workaround:
the client constantly asks the server for updates at a very short interval. There
are several problems with this approach:
1. The client’s and server’s CPU resources are used all the time for mostly useless update
checking. On mobile devices, this potentially drains the battery.
2. The network bandwidth is used constantly, and as the network throughput of the
server is limited, this becomes a bottleneck very quickly.
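To make the cost of polling concrete, the following Python sketch counts the messages exchanged under polling and under a push (publisher/subscriber) scheme for the same sequence of server-side updates. It is a toy model under stated assumptions: time is measured in abstract integer ticks, every poll costs exactly one request and one response, and the function names `simulate_polling` and `simulate_push` are hypothetical. It does not model the actual MultiMap protocol.

```python
def simulate_polling(update_times, duration, poll_interval):
    """Count messages and useless polls when the client polls periodically.

    update_times: ticks at which the server has new data (assumed input).
    Returns (total messages, polls that found nothing new).
    """
    messages = 0
    useless_polls = 0
    previous = 0
    t = poll_interval
    while t <= duration:
        messages += 2  # one request plus one response per poll
        # A poll is useful only if an update arrived since the previous poll.
        if not any(previous < u <= t for u in update_times):
            useless_polls += 1
        previous = t
        t += poll_interval
    return messages, useless_polls


def simulate_push(update_times, duration):
    """Count messages when the server pushes each update exactly once."""
    return sum(1 for u in update_times if 0 < u <= duration)
```

In this model the polling traffic grows with the session length and the polling frequency, whereas the push traffic grows only with the number of actual updates.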
In order to find how to design a system responsive enough for such communication, consider
the requirements:
Figure 1.3: This figure shows the communication principles for real-time updates of the polling
architecture and a publisher/subscriber architecture.
1. A client-server approach, since the amount of data is large and the computations
can be very expensive.
2. Reliable networking (we are not considering a streaming application and
need reliable two-way communication); therefore the choice for the transport layer is
TCP [14].
3. A message format that allows complex messages to be encoded and decoded with
minimal impact on performance.
Since those requirements are quite similar to the requirements for multi-player client/server
on-line games, we considered that the best place for finding the technological answer for an
interactive search system would be the gaming literature [10] [18] [22]. The games are by
definition interactive applications, and on-line games are usually intensively optimized for
the latency and throughput. Due to the fact that the interactivity requires a lot of duplex
communication, the best option is a socket-server [18], and a custom protocol for low-level
message encoding.
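As an illustration of what a low-level message encoding over a TCP stream can look like, the sketch below frames each message with a small binary header. The header layout (a 16-bit message type followed by a 32-bit payload length, in network byte order) and the names `encode_message` and `decode_messages` are assumptions made for this example; they are not the protocol used by MultiMap.

```python
import struct

# Hypothetical header: message type (uint16) and payload length (uint32),
# both in network byte order, 6 bytes in total.
HEADER = struct.Struct("!HI")


def encode_message(msg_type, payload):
    """Prefix a bytes payload with its type and length."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode_messages(buffer):
    """Split a received byte buffer into complete messages.

    TCP delivers a byte stream, so a read may end in the middle of a
    message. Returns (list of (type, payload), unconsumed remainder).
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        messages.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]
```

The receiver keeps the returned remainder and prepends it to the next chunk read from the socket, so that partially received messages are completed on a later read.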
Following those considerations, an interactive exploratory search system can be designed as
a multiuser on-line game engine. The architecture should fulfill six goals: minimize network
traffic, provide opportunities for load balancing, provide a secure game playing environment,
provide a high level of scalability and maintainability, and maximize client side performance
for real-time graphics [8].
The architecture for the system is layered and component-based:
• The Network Component that contains the Packet Serializer (Messenger), De/Encrypt,
De/Compress and Network modules. The Messenger module is in charge of forming
and sending messages in a given format.
• The User Component that contains both the Authenticator and the User Database
modules.
• The Search Component that is designed specifically for exploratory search
purposes with a custom protocol. For the system designed for this thesis, the search
component is described in more detail in section 3.2.
As mentioned earlier, the latency is a crucial point for highly interactive applications. Latency
refers to the time it takes for a packet of data to be transported from its source to its
destination. In many networking texts, you will also see the term Round Trip Time (RTT)
in reference to the latency of a round trip from source to destination and then back to source
again. In many cases the RTT is twice the latency, but some network paths exhibit asymmetric
latencies, with higher latencies in one direction than the other [6]. There are different ways
to deal with latency, but simply put: we need more control over the sent and received packets,
we need to minimize their size, and we need to be able to prioritize and parallelize different actions [5].
Chapter 2
The Concept
2.1 The Idea
In chapter 1 we considered the implications of the exploratory search problem and its basic
components, such as faceted classification and interactivity. This thesis introduces a novel
exploratory search interface, called MultiMap, which relies on similarity measurements in order
to present the information to the user. In the early 1990s it was demonstrated that spatial
mapping techniques can be used to visualize the contents and semantic relationships of a
document space [15], yet there are still not many systems that actually use mapping techniques.
The idea behind the system comes from a simple map, where the information is presented in a
geographic manner: two towns that are close on a map are a short trip from one
another. Using a map, it is possible to navigate and explore a huge amount of information
by zooming in and out and exploring the dataset both locally and globally.
Figure 2.1: A world map with countries divisions.
If we can do it for our planet Earth using mapping software (Google Maps and Bing Maps are
examples of such software), why couldn’t we explore different datasets in the same way?
What if we could zoom in on both New York and Tokyo and generate a new world map, having
Washington, New York, Tokyo, Kyoto and Paris in between (figure 2.1 may help to
imagine this)? It can be rather messy to view them in this way; that is why we also need to
introduce the context: Washington and New York are in the United States of America, Tokyo and
Kyoto are in Japan and Paris is in France. The countries provide a clear separation between the
cities and help us to understand the cities better. Now replace the towns by the Movies and the
countries by Genres/Actors/Directors, and this gives a basic understanding of how MultiMap
works.
MultiMap is based on this idea of zooming and on-the-fly generation of new maps. Formally,
this involves choosing a new coordinate system. MultiMap also features the ability to zoom out to
see the whole picture again and to switch maps if needed (again, think of Google Maps). In
order to understand better how MultiMap works, let us go back to the movie context and
think of different aspects, facets and movies:
• An aspect “Genres” contains facets “Action”, “Adventure”, etc.
• The facets “Action”, “Adventure” can relate to movies like “Indiana Jones” etc.
• The movie “Indiana Jones” contains the actor “Harrison Ford” (which is also a facet of
aspect “Actors”)
One can notice that this is a closed loop: it is possible to look at different genres, then look at
a particular movie, then switch to actors and go on exploring the information this way. If
we imagine for a second that we can create a map of an aspect, where the points (“countries”)
would be the facets, we should probably also be able to place the movies (“towns”) on that
map. In order to create such maps, we need several components:
• A function to compare two facets of an aspect, a distance measurement. For example,
this way we would be able to compare the similarity between the Adventure genre and
the Action genre or between Tom Hanks and Harrison Ford.
• A way to create a map very quickly, as a new map should be generated when the user
zooms in on some movie.
• A way to measure the relevancy of the movies and facets. Considering our example above,
which towns would we choose to present on a new map if we zoomed in on New York and
Tokyo? Paris, London, Rome?
Further in this document, chapter 3 explains how the whole system is built, and in particular
section 3.2 explains all the concepts and algorithms that were developed in order to produce
a working prototype of MultiMap.
2.2 The Prototype
The MultiMap concept can be divided into two main parts:
• The system that performs all mathematical computations and handles the data and the
operations on the data.
• The front-end that is presented to the user. There are many different ways to
present a map; figure 2.2 shows the front-end that we designed as our first approach
to creating a visualization for the MultiMap system.
Figure 2.2: A screen-shot of the prototype, presenting a grid map on the directors aspect.
The front-end visualization we designed for MultiMap is called GridMap, and is one of
the possible approaches to visualizing those maps. This approach relies on a very ordered presentation
of the maps. In fact, it tries to map a cloud of 2D points to a grid while trying to preserve
the spatial relations. The interface allows users to switch between the aspect maps, zoom in on different
facets and, by flipping a grid cell, view the details of a particular movie and follow its links to
construct new maps. Section 3.4 explains the interface and its different
components in more detail.
Chapter 3
The System
3.1 Architectural Overview
The system was designed as a client-server application with several tiers; this section
describes its design. The main idea is based on the interactivity between the user and
the data, and on ease of use. First of all, the system should meet several prerequisites:
• it should be interactive, so it has a real-time constraint;
• it should be able to handle large datasets;
• it should be easy to use and available to remote users.
Figure 3.1: The layered architecture of MultiMap system.
Following those prerequisites, the logical conclusion is to build a real-time Rich Internet
Application (RIA) [9]. Such applications are mainly standard n-tier applications. The MultiMap
architecture is a 3-tier real-time architecture, allowing the front-end client to have full
interactivity with the data. The main idea behind such a system is to have a clear
separation between the client, the logic and the data itself, as illustrated in Fig. 3.2. The actual
architecture, as described in Fig. 3.1, consists of:
• a front-end client in Flash, allowing interactive data visualization;
• a custom C# real-time server, written by the author in order to handle large amounts of
data interactively;
• a logic layer running the Matlab engine for all data-intensive search, correlations and
other operations.
Figure 3.2: Visual overview of a Three-tiered application. Illustration from Wikipedia.
3.2 Mathematical Concepts & Algorithms
3.2.1 Overview
Figure 3.3: The representation of the data-flow, representing how the data is processed on
the fly (in an interactive mode).
The main purpose of the research is the interactivity of the system. This imposes a real-time
constraint and makes things very difficult to engineer, especially when the computations
can take a long time. Based on this, we needed a system that can handle this data flow
rapidly and respond quickly to user queries. Figure 3.3 shows the simplified sequence
diagram of the system for the case where the information needs to be updated and presented. The next few
sections explain the details of this schema, block by block.
The system uses a content-based recommendation method. In content-based recommendation
methods, the utility u(c, s) of item s for user c is estimated based on the utilities u(c, si)
assigned by user c to items si ∈ S that are similar to item s. For example, in a movie
recommendation application, in order to recommend movies to user c, the content-based
recommender system tries to understand the commonalities among the movies user c has
rated highly in the past (specific actors, directors, genres, subject matter, etc.). Then, only
the movies that have a high degree of similarity to the user's preferences would get
recommended [4].
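The estimate of u(c, s) from the utilities u(c, si) of similar items can be illustrated with a similarity-weighted average. This is one common way to instantiate the idea and is chosen here only for illustration; the weighting scheme, the Jaccard similarity on genre sets and the names `jaccard` and `estimate_utility` are assumptions, not the ranking method of MultiMap (which is described in section 3.2.3).

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of content features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_utility(target, rated, similarity):
    """Estimate u(c, s) for item `target` from the user's known utilities.

    rated: dict mapping item si -> utility u(c, si) given by user c.
    similarity: function (item, item) -> value in [0, 1].
    The estimate is the similarity-weighted mean of the known utilities.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item, utility in rated.items():
        weight = similarity(target, item)
        weighted_sum += weight * utility
        weight_total += weight
    # With no similar rated item there is no evidence; return 0.0 by convention.
    return weighted_sum / weight_total if weight_total > 0 else 0.0
```

Items whose content overlaps strongly with highly rated items receive a high estimate, while items that share nothing with the rated ones do not influence the result.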
Overall, the flow consists of several main points:
• The preprocessing step performs the transformation and precomputes as much
information as possible. It considers all aspects and, for each facet in each
aspect, computes a closest network (explained in section 3.2.2).
• The session initialization step initializes the user session and copies some of the
preprocessed data into a so-called Ranking Matrix.
• The update step performs the update of the Ranking Matrix (see 3.2.3 for more
information). By doing so, a new ranking matrix is created, basically updating the
ranks/relevancy ratings based on the selection.
• The facets selection step chooses several facets, based on the Ranking Matrix. To do
so, it combines 2 techniques: it takes a subset of the most relevant facets from the matrix,
then performs a k-means clustering to be able to pick the most “global” facets. This step
is explained in more detail in section 3.2.4.
• The movies selection step selects the most relevant movies for each facet that has been
chosen. This step is explained in more detail in section 3.2.5.
• The creation of aspect maps performs multidimensional scaling [23] and a custom
grid-map algorithm in order to create a 2-dimensional grid where the latent relations
between different facets are retained. This approach is explained in section 3.2.6. This
step can potentially be replaced by any other representation, including 3-dimensional
ones.
3.2.2 Preprocessing & Correlations
Overview
The system handles a lot of data and reorders it continually on each request of the user. In
order to allow the system to perform in real time, as much data as possible should
be precomputed. Several things need to be done:
• For each aspect, the facets should be correlated in order to allow the comparison between
2 points. This is done differently for each aspect, depending on the data. It allows, for
example, the Adventure genre to be correlated with the Science-Fiction genre.
• For each aspect, the facet network is computed. This network allows us to propagate a
ranking and reorder the facets in real time. See section 3.2.2 for more details.
• For each facet of each aspect, a list of the most relevant movies is constructed and ordered.
This makes it possible to pick the movies in real time. This step is explained in more
detail in section 3.2.2.
In the precomputation phase, one of the most important results is the ability to construct
so-called “Aspect Spaces”. Aspect Spaces are N-dimensional dissimilarity matrices. The Aspect
Spaces are computed based on a particular distance metric ∆(i, j) := distance between the i-th
and j-th features of an aspect. In order to simplify the implementation, we define:
• The input matrix I is the initial data we need in order to compute similarities between aspect
samples. The samples are represented in an N-dimensional space, where N is the number of movies,
about 16000.
• Per aspect, a membership function δ, which can be different for every aspect and computes the
membership relation between a facet of the aspect and a particular movie.
The next few sections explain the definitions and the steps that are performed in order
to create each aspect space.
Genres Space
In order to create the genres space, the genres are correlated simply using the complete distribution
of movies. The input matrix I for the genres space is defined as follows:
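$$
I_{i,j} =
\begin{pmatrix}
\delta(\mathrm{Genre}_1, \mathrm{Movie}_1) & \cdots & \delta(\mathrm{Genre}_1, \mathrm{Movie}_j) \\
\vdots & \ddots & \vdots \\
\delta(\mathrm{Genre}_i, \mathrm{Movie}_1) & \cdots & \delta(\mathrm{Genre}_i, \mathrm{Movie}_j)
\end{pmatrix}
$$
The membership function δ:
$$
\delta(\mathrm{Genre}_i, \mathrm{Movie}_j) =
\begin{cases}
1 & \text{if the movie contains the genre} \\
0 & \text{otherwise}
\end{cases}
$$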
Finally, we define a distance function, which is a general cosine distance:
$$
\Delta(\mathrm{Genre}_i, \mathrm{Genre}_j) = \frac{I_i \cdot I_j}{\lVert I_i \rVert \, \lVert I_j \rVert}
$$
In order to test how good the correlation is, one can use the aspect space as the input for
the multidimensional scaling function. This helps to visualize the correlations and see if the
desired meaning is preserved. Figure 3.4 shows the 2-dimensional genres space; we will call
such maps “Aspect Maps”. One can see that the correlation makes sense, for example: the
Adventure genre is close to Fantasy and Science-Fiction.
Figure 3.4: This figure shows the distances between genres in 2 dimensional space after
performing a multidimensional scaling on the genres space.
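The construction of the genres space can be sketched in a few lines of Python. The membership rows below are invented toy data, and the function names are hypothetical. Note one assumption made explicit in the code: the sketch stores 1 minus the cosine of the angle between two rows, so that identical genre distributions get a dissimilarity of 0, which is the convention used by many numerical packages for "cosine distance"; the thesis text does not state which convention was used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two membership rows."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a genre with no movies is treated as unrelated
    return dot / (norm_u * norm_v)


def genre_space(membership):
    """Build a pairwise dissimilarity table from 0/1 membership rows.

    membership: dict mapping genre -> list with one 0/1 entry per movie.
    Assumption: dissimilarity = 1 - cosine similarity.
    """
    genres = sorted(membership)
    return {
        (g, h): 1.0 - cosine_similarity(membership[g], membership[h])
        for g in genres
        for h in genres
    }


# Toy membership matrix for 4 movies (illustrative data only).
toy_membership = {
    "Adventure": [1, 1, 0, 0],
    "Fantasy":   [1, 1, 1, 0],
    "Drama":     [0, 0, 0, 1],
}
toy_space = genre_space(toy_membership)
```

In the toy data, Adventure and Fantasy share two movies and end up close together, while Drama shares no movie with Adventure and is at the maximum dissimilarity of 1.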
Ratings Space
Ratings space can be used in different ways, and depending on the choice of usage, the
correlation can be adapted:
• ratings can be used as an additional dimension, shown using a color or a font size while
showing a movie;
• ratings can be shown in order of Euclidean distance;
• ratings can be used to create a complete ratings aspect space, but this requires a more
complex correlation function.
In this research, we decided to use the second approach, simply calculating the pairwise
Euclidean distance between ratings.
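For scalar ratings the Euclidean distance reduces to an absolute difference, so the second approach amounts to the following minimal sketch. The function names and the example of ordering movies by distance from a reference rating are illustrative assumptions.

```python
def rating_distances(ratings):
    """Pairwise Euclidean distance between scalar ratings.

    For one-dimensional values the Euclidean distance is |a - b|.
    """
    return [[abs(a - b) for b in ratings] for a in ratings]


def order_by_rating_distance(ratings, reference):
    """Return movie titles ordered by closeness of their rating to `reference`.

    ratings: dict mapping title -> rating (hypothetical layout).
    """
    return sorted(ratings, key=lambda title: abs(ratings[title] - reference))
```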
Years, Directors and Actors Spaces
There are several ways to correlate the years, directors and actors. In our research, we
wanted to explore the possibility of correlating those facets based on their genre distributions.
This approach would allow the user, for example, to see what kind of movies were made in
a particular year and which years are similar in terms of genre distribution. To do so, we
proceed as follows:
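$$
A_{i,j} =
\begin{pmatrix}
\delta_1(\mathrm{Year}_1, \mathrm{Movie}_1) & \cdots & \delta_1(\mathrm{Year}_1, \mathrm{Movie}_j) \\
\vdots & \ddots & \vdots \\
\delta_1(\mathrm{Year}_i, \mathrm{Movie}_1) & \cdots & \delta_1(\mathrm{Year}_i, \mathrm{Movie}_j)
\end{pmatrix}
$$
The membership function δ1:
$$
\delta_1(\mathrm{Year}_i, \mathrm{Movie}_j) =
\begin{cases}
1 & \text{if the movie was released that year} \\
0 & \text{otherwise}
\end{cases}
$$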
Next, we reuse the input matrix of the genres space, denoted B here. It is defined as follows:
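$$
B_{i,j} =
\begin{pmatrix}
\delta_2(\mathrm{Genre}_1, \mathrm{Movie}_1) & \cdots & \delta_2(\mathrm{Genre}_1, \mathrm{Movie}_j) \\
\vdots & \ddots & \vdots \\
\delta_2(\mathrm{Genre}_i, \mathrm{Movie}_1) & \cdots & \delta_2(\mathrm{Genre}_i, \mathrm{Movie}_j)
\end{pmatrix}
$$
The membership function δ2:
$$
\delta_2(\mathrm{Genre}_i, \mathrm{Movie}_j) =
\begin{cases}
1 & \text{if the movie contains the genre} \\
0 & \text{otherwise}
\end{cases}
$$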
Next, we need to compute the matrix I, which tells us how many movies of each
genre were released in each year. It is computed by multiplying A by the transpose
of B:
Ii,j =



δ(Y ear1, Genre1) · · · δ(Y ear1, Genrej)
...
...
...
δ(Y eari, Genre1) · · · δ(Y eari, Genrej)


 = A × BT
Finally, by computing the pairwise cosine distance between the rows of I, we can correlate
the years based on their genre distributions. The same procedure is applied to
correlate the directors and the actors. Figure 3.5 shows the aspect map created for the directors.
As with the genres, the results seem to make sense: Quentin Tarantino is quite close
to Martin Scorsese (both make similar kinds of crime movies) and at the same time quite
far away from George Lucas, the creator of the Star Wars saga.
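The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy matrices (3 years, 2 genres, 4 movies) are made up, the helper names are hypothetical, and the cosine distance of an all-zero row is arbitrarily set to 1.

```python
import math

def matmul_transposed(a, b):
    """Return a x b^T for two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]

def cosine_distance(u, v):
    """1 - cosine similarity; set to 1.0 when either vector is all zeros (assumption)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def pairwise_cosine(rows):
    """Pairwise cosine distance between all rows of a matrix."""
    return [[cosine_distance(u, v) for v in rows] for u in rows]

# Hypothetical toy data: A is years x movies, B is genres x movies.
A = [
    [1, 1, 0, 0],  # year 1: movies 1 and 2
    [0, 0, 1, 0],  # year 2: movie 3
    [0, 0, 0, 1],  # year 3: movie 4
]
B = [
    [1, 1, 0, 1],  # genre 1: movies 1, 2 and 4
    [0, 1, 1, 0],  # genre 2: movies 2 and 3
]

I = matmul_transposed(A, B)  # years x genres: movie counts per genre and year
D = pairwise_cosine(I)       # years x years: distance between genre distributions
print(I)                     # [[2, 1], [0, 1], [1, 0]]
```

In this toy example year 1 ends up closer to year 3 than to year 2, because both are dominated by genre 1.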
Figure 3.5: This figure shows the distances between directors in 2 dimensional space after
performing a multidimensional scaling on the directors space, similar to figure 3.4
Facet Network
In order to perform the zooming and keep the system interactive, one needs a way
to select and sort the facets rapidly. In MultiMap, this is done by precomputing a facet
network (Fig. 3.6) and assigning a rank value to each node of this network.
Generally speaking, we need to compute the matrix R, with one facet per row and two (or
more) “pointers” to its closest facets. The desired matrix R:
$$
R_{i,3} =
\begin{pmatrix}
\text{Facet}_1 & \text{1st closest facet} & \text{2nd closest facet} \\
\vdots & \vdots & \vdots \\
\text{Facet}_i & \text{1st closest facet} & \text{2nd closest facet}
\end{pmatrix}
$$
The closest points are computed from the previous inter-facet correlations. This
step can be very time-consuming, as it has a complexity of O(n²). Performing it during the
interaction would interrupt the smooth interaction with the user and would therefore be
prohibitive. Fortunately, this
matrix can be precomputed before the interaction starts. In general, anything that can
be precomputed should be precomputed to make the system responsive.
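As an illustration, the matrix R could be precomputed from a pairwise distance matrix along the following lines. This is a minimal sketch, not the thesis implementation: the function name and the distance values are hypothetical, and sorting each row makes this version O(n² log n) rather than O(n²).

```python
def build_facet_network(names, dist, k=2):
    """Return {facet: [its k closest other facets]} from a symmetric distance matrix."""
    network = {}
    for i, name in enumerate(names):
        others = [j for j in range(len(names)) if j != i]
        # Sort the other facets by distance to facet i; ties are broken by index.
        others.sort(key=lambda j: (dist[i][j], j))
        network[name] = [names[j] for j in others[:k]]
    return network

# Hypothetical inter-genre distances, for illustration only.
genres = ["Adventure", "Fantasy", "Science-Fiction", "Romance"]
distances = [
    [0.0, 0.2, 0.3, 0.9],
    [0.2, 0.0, 0.4, 0.7],
    [0.3, 0.4, 0.0, 0.8],
    [0.9, 0.7, 0.8, 0.0],
]

network = build_facet_network(genres, distances)
print(network["Adventure"])  # ['Fantasy', 'Science-Fiction']
```

Because the result depends only on the precomputed distances, it can be stored once and reused for every zoom.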
Figure 3.6: A subset of the precomputed facet network for the Genres aspect. In MultiMap,
everything that can be precomputed is precomputed, which is conducive to a smooth and
responsive interaction.
Movie Ordering
The last step is movie ordering. This step is straightforward: it rearranges the
movie-facet relations into the following form:
$$
F_{i,2} =
\begin{pmatrix}
\text{Facet}_1 & \text{Movie vector, ordered by relevancy} \\
\vdots & \vdots \\
\text{Facet}_i & \text{Movie vector, ordered by relevancy}
\end{pmatrix}
$$
For the sake of simplicity, we use the IMDb rating as the relevancy measure. This rating is
a number from 0 to 10 with one decimal, based on the votes of a large number of IMDb
website visitors. The following example of the movie ordering for the genres space illustrates this:
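$$
F_{i,2} =
\begin{pmatrix}
\text{Adventure} &
\begin{pmatrix}
\text{The Judy Garland Show} & \text{The Secret of Monkey Island} & \cdots \\
9.8 & 9.6 & \cdots
\end{pmatrix} \\
\vdots & \vdots \\
\text{Facet}_i & \text{Movie vector, ordered by relevancy}
\end{pmatrix}
$$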
3.2.3 Ranking
We would like to give users the ability to zoom in on individual facets or movies based on
their selection. This can be accomplished by ranking each point and re-ranking the points with
every zoom. For this we need a facet network (graph), ideally tightly interconnected and
covering 100% of the facets. Such a network is constructed in the preprocessing step (see
section 3.2.2) in the form of a graph where each node (a facet) is connected to its 2 closest neighbors.
For example, the Science-Fiction genre would be connected to the Adventure and Action genres,
as illustrated in figure 3.6.
Based on such a network, zooming can be done effectively with a recursive algorithm that takes
several parameters:
• Vector B, a weight vector for the closest points. For example, a vector where the first
closest point gets the full weight and the second closest gets half of the weight would be:
$$B = (1, 0.5)$$
• Depth-decay function for each node at depth d:
$$\lambda(d + 1, \rho, b) = \rho + \frac{\gamma}{d} \cdot b$$
Where:
– d is the actual depth
– ρ is the actual rank of the node
– γ is the decay factor (a constant)
– b is the weight of the point, taken from the weight vector
The depth-decay function presented here is linear in ρ and b, but it can be adapted or
changed depending on the context and needs. The depth-decay function calculates the new
rank ρ, which updates the network. The ranking is computed recursively for each neighbor;
then the network is sorted by rank and the first x nodes are shown to the user.
Additionally, zooming out can be done in several different ways. The simplest (and most
computationally efficient) way is to keep track of all changes to the ranking value ρ at each
step. This approach uses some memory, but nothing needs to be recalculated.
Another approach is to recursively recalculate the ρ values backwards, which saves memory
at the cost of CPU time. The depth-decay function would also have to be adapted in order
to support this.
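A minimal sketch of the recursive ranking and of the history-based zoom-out is given below. Several details are assumptions that the text does not fix: the recursion stops at a hypothetical max_depth, the selected facet itself receives a boost of γ, and all network edges except those of Science-Fiction are invented for the example.

```python
def zoom(network, ranks, selected, weights=(1.0, 0.5), gamma=1.0, max_depth=2):
    """Return a new rank table after zooming on `selected`."""
    new_ranks = dict(ranks)
    new_ranks[selected] += gamma  # assumption: the selected facet is boosted too

    def visit(node, depth):
        if depth > max_depth:     # assumption: recursion is cut off at max_depth
            return
        for neighbour, b in zip(network[node], weights):
            # Depth-decay update: rho <- rho + (gamma / d) * b
            new_ranks[neighbour] += (gamma / depth) * b
            visit(neighbour, depth + 1)

    visit(selected, 1)
    return new_ranks

def top_facets(ranks, x):
    """Sort the facets by rank (highest first) and keep the first x."""
    return sorted(ranks, key=lambda facet: ranks[facet], reverse=True)[:x]

# Each facet points to its 1st and 2nd closest facet (mostly hypothetical edges).
network = {
    "Science-Fiction": ["Adventure", "Action"],
    "Adventure": ["Fantasy", "Science-Fiction"],
    "Action": ["Adventure", "Science-Fiction"],
    "Fantasy": ["Adventure", "Science-Fiction"],
}
initial = {facet: 0.0 for facet in network}

history = [initial]                                   # one rank table per zoom level
history.append(zoom(network, history[-1], "Science-Fiction"))
zoomed = history[-1]
shown = top_facets(zoomed, 2)
history.pop()                                         # zoom out: restore previous ranks
```

Keeping the whole rank table per zoom level corresponds to the memory-based zoom-out described above; nothing is recalculated when the user steps back.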
3.2.4 Facets Selection
At this point in the data-flow we have a Ranking Matrix, and a simple solution would consist
of simply selecting the first few ranked facets. Such an approach is
just fine for standard search engines, for example Google or Lemur. In MultiMap, this is
performed using a selection algorithm, but why do we actually need one? To answer
this question, consider the following:
• standard search engines use a query in order to search the data, therefore the most relevant
documents are the ones that are closest to the query in the multi-dimensional
document space;
• in exploratory search we need an exploration factor, allowing the users to explore different
possibilities. We therefore do not want to restrict the results to only the
closely related and most relevant points, but also want to include other points that are
related to the topic (to some extent).
The selection algorithm allows us to pick a number of rows from an Input Matrix I. Recall that
the Input Matrix I is the step just before the pairwise distance comparison, so it is a
ready-to-compare matrix in which the distance between 2 points is meaningful. The
idea behind the algorithm is quite simple: it selects a subset of relevant facets that is larger
than the number of facets to be shown to the user, finds k clusters within
the subset, and then takes the point closest to each cluster centroid. The selection algorithm
works as follows:
• first, a selection of top ranked facets is performed. In the prototype we take twice the
number of facets that we actually want to present to the user (e.g., if we need to show
a grid of 2 by 2 points, we take the 8 most relevant facets from the ranking matrix);
• next, the algorithm computes a k-means clustering, where k is
the number of points to show to the user (for example, showing 4 actors means
k = 4);
• once the k clusters are found, each point has an assigned cluster index and we have
one centroid per cluster. The selection continues by taking the 1 point closest to each
centroid, i.e. the most average point of that cluster;
• finally, it returns the selected facets.
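The four steps above can be sketched as follows. This is a simplified, self-contained version under stated assumptions: k-means is a plain Lloyd iteration initialised deterministically with the first k candidates (the prototype's initialisation is not described), the function names are hypothetical, and the input rows are made-up 2-dimensional vectors.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))

def k_means(points, k, iterations=50):
    """Plain Lloyd iteration; initial centroids are the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

def select_facets(ranked_facets, input_rows, shown):
    """Take the 2 * shown top ranked facets, cluster them, keep one facet per cluster."""
    candidates = ranked_facets[:2 * shown]
    points = [input_rows[f] for f in candidates]
    centroids, assignment = k_means(points, min(shown, len(points)))
    selected = []
    for c, centroid in enumerate(centroids):
        members = [f for f, a in zip(candidates, assignment) if a == c]
        if members:
            # Closest member to the centroid; ties go to the higher ranked facet.
            selected.append(min(members, key=lambda f: squared_distance(input_rows[f], centroid)))
    return selected

# Hypothetical rows of the input matrix, listed in ranking order.
input_rows = {
    "a1": [0, 0], "b1": [10, 0], "c1": [0, 10], "d1": [10, 10],
    "a2": [2, 0], "a3": [1, 0], "b2": [10, 1], "c2": [0.5, 10], "e1": [5, 5],
}
ranked = ["a1", "b1", "c1", "d1", "a2", "a3", "b2", "c2", "e1"]
selection = select_facets(ranked, input_rows, shown=4)
print(selection)  # ['a3', 'b1', 'c1', 'd1']
```

Note that the facet "a3" is selected although "a1" is ranked higher: it is the most average point of its cluster, which is exactly the exploration effect the algorithm aims for.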
3.2.5 Movies Selection
The next step in the data-flow is the actual selection of the movies. At this point the system
knows which facets it will present (the facets most relevant to the current zoom sequence). The
movies presented on the map can be selected simply by taking the first few movies according to
some rating function. We use the IMDb average rating to sort the movies
within each facet. This sorting was already done in the preprocessing phase (see section 3.2.2), so
the selection amounts to taking the first few movies from the facet.
For example, in the following matrix one can see that if Adventure is a selected facet, the
movies “The Judy Garland Show” and “The Secret of Monkey Island” will be selected, as
they have the highest IMDb ratings within the facet.
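$$
F_{i,2} =
\begin{pmatrix}
\text{Adventure} &
\begin{pmatrix}
\text{The Judy Garland Show} & \text{The Secret of Monkey Island} & \cdots \\
9.8 & 9.6 & \cdots
\end{pmatrix} \\
\vdots & \vdots \\
\text{Facet}_i & \text{Movie vector, ordered by relevancy}
\end{pmatrix}
$$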
Now that the selection of facets and movies is done, we can proceed to the creation of
the Aspect Maps.
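The movie ordering (done once, in preprocessing) and the movie selection (done on every zoom) can be illustrated with a short sketch. The two ratings are those of the example matrix; the third movie, its rating and the function names are hypothetical.

```python
def order_movies(facet_movies, ratings):
    """Preprocessing: sort the movies of every facet by rating, best first."""
    return {
        facet: sorted(movies, key=lambda title: ratings[title], reverse=True)
        for facet, movies in facet_movies.items()
    }

def select_movies(ordered, facet, count):
    """Selection: take the first `count` movies of an already ordered facet."""
    return ordered[facet][:count]

ratings = {
    "The Judy Garland Show": 9.8,
    "The Secret of Monkey Island": 9.6,
    "Hypothetical Movie": 7.1,  # made-up entry, for illustration only
}
facet_movies = {
    "Adventure": ["Hypothetical Movie", "The Secret of Monkey Island", "The Judy Garland Show"],
}

ordered = order_movies(facet_movies, ratings)
top_two = select_movies(ordered, "Adventure", 2)
print(top_two)  # ['The Judy Garland Show', 'The Secret of Monkey Island']
```

Since the sorting happens beforehand, the selection itself is a simple slice and adds no noticeable delay to a zoom.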
3.2.6 Creation of Aspect Maps
The final step in the data-flow is the creation of the so-called Aspect Maps, a spatial
representation of the selected facets. The maps allow the user to compare different facets and,
subsequently, the related movies with each other. We use maps to help the user envisage
the locations of movies and facets in high dimensional space. Since such a space is too difficult to
visualize directly, it is reduced to two or three dimensions. For this, of course,
we need a dimension reduction that is faithful to the distances in the original space. From
the many available techniques (dimensionality reduction, ordination, ...) we selected
multidimensional scaling (MDS).
Figure 3.7: The transition from the facet selections to the aspect map.
Multidimensional scaling is a special case of ordination. An MDS algorithm starts with a
matrix of item-item similarities, then assigns a location to each item in N-dimensional space,
where N is specified a priori. In our case, we want to reduce the matrix to 2 or 3 dimensions,
to be able to visualize the result on a screen.
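To make the idea concrete, here is a small sketch of classical (Torgerson) MDS in pure Python. It is not the Matlab routine used in the prototype, whose exact MDS variant is not stated here; it assumes a symmetric matrix of approximately Euclidean distances and finds the leading eigenvectors by power iteration with deflation. The function name and the rectangle example are illustrative only.

```python
import math

def classical_mds(dist, dims=2, iterations=200):
    """Embed n items in `dims` dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    squared = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in squared]
    grand_mean = sum(row_mean) / n
    # Double centring turns squared distances into a Gram (inner product) matrix.
    gram = [
        [-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]
    coords = [[0.0] * dims for _ in range(n)]
    for axis in range(dims):
        v = [math.sin(i + 1.0 + axis) for i in range(n)]  # deterministic start vector
        eigenvalue = 0.0
        for _ in range(iterations):                       # power iteration
            w = [sum(g * x for g, x in zip(row, v)) for row in gram]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                eigenvalue = 0.0
                break
            eigenvalue = sum(a * b for a, b in zip(v, w)) / sum(a * a for a in v)
            v = [x / norm for x in w]
        if eigenvalue <= 0.0:
            continue                                      # no usable axis left
        scale = math.sqrt(eigenvalue)
        for i in range(n):
            coords[i][axis] = scale * v[i]
        for i in range(n):                                # deflate the found component
            for j in range(n):
                gram[i][j] -= eigenvalue * v[i] * v[j]
    return coords

# Distances between the corners of a 3 x 4 rectangle are reproduced by the embedding.
corners = [(0, 0), (3, 0), (0, 4), (3, 4)]
dist = [[math.dist(p, q) for q in corners] for p in corners]
embedding = classical_mds(dist)
```

The recovered coordinates are only defined up to rotation and reflection, which is harmless here because only the relative positions matter for the map.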
Figure 3.7 shows the process of creating the aspect map in this step. It is quite
straightforward, as all the data structures are by now ready to be consumed directly by an MDS
algorithm. Figure 3.5 is actually the result of MDS on a subset of the directors aspect and
illustrates the output of this step.
Self-Organizing Maps (SOM, [16]) have sometimes been suggested as a way to generate a lower
dimensional representation. We found, however, that for this particular case SOM is prohibitively
inefficient.
By the end of this step, we have a collection of points in low dimensional space. These
points can be presented to the user in a number of different ways. Our approach is called the
GridMap visualization and is explained in section 3.4.2.
3.3. SERVER TECHNOLOGY CHAPTER 3. THE SYSTEM
3.3 Server Technology
From the beginning of the research, we wanted the system to be highly interactive and
responsive. In order to achieve this, we need a scalable system with high performance. For this,
we determined the following requirements:
• the data has to be sent very efficiently, potentially about 5-10 kilobytes of text data on
each user request;
• the ability to notify the user of events happening on the server;
• real-time communication for the interaction: for instance, when the user clicks on something,
the system has to process the request in less than a second (or else people simply won’t
use it).
Given the above requirements, the system should be based on an event-driven architecture
(EDA) with compression and security. For completeness, here is the list of most distinctive
features of the server (some readers may find it a bit technical):
• Monolithic server, running on one machine, but potentially scalable to a cluster of
machines.
• Manages the thread pool and distributes the work to each thread. It tries to match
the number of threads to the number of cores (e.g., 4 threads on a quad-core machine) and
distributes smaller tasks to those threads.
• Big tasks are represented in a form of software timers, which are sliced in order to
achieve scalability.
• The server manages a socket pool, listening to several endpoints. Works with IPv4 and
IPv6 as well.
• Written in C#, the server is compatible with 32 and 64 bit platforms. It is also CLI-
compliant and works on cross-platform frameworks like Mono (works on Unix, Linux...).
• Handles client-socket lifetime, in order to achieve stability and error-tolerance.
• Integrates a Matlab interoperability layer, allowing the C# server to communicate with Matlab
and then send the results to the Flash client over the network.
• Handles the data via an object-relational mapping (ORM) layer.
• Publish-Subscribe model is used for the real-time notifications. It allows clients to
subscribe to an event of the server and be notified by the server when the event happens.
This notification happens via a push-operation.
• Custom message serialization/deserialization.
• Per-packet compression/decompression.
• Through introspection, the server generates networking libraries that comply with a protocol
interface. Appendix A provides more information on this feature and illustrates
some of the security and compression mechanisms.
• Accounting and session mechanisms to keep track of users, their accounts and
connections.
• Access-Level security mechanism.
All these features were actually implemented by ourselves, since at the time of our research
not all of the technology was available to us.
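To illustrate two of the features listed above, publish-subscribe notifications and per-packet compression, here is a minimal in-process sketch in Python. It does not reproduce the C# server: JSON and zlib from the Python standard library stand in for the custom serialization and compression, there is no networking, and the names EventBroker, decode and "map_ready" are hypothetical.

```python
import json
import zlib

class EventBroker:
    """Keeps a list of subscriber callbacks per event and pushes packets to them."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event, callback):
        self._subscribers.setdefault(event, []).append(callback)

    def publish(self, event, payload):
        # Serialize and compress once, then push the same packet to every subscriber.
        packet = zlib.compress(json.dumps(payload).encode("utf-8"))
        for callback in self._subscribers.get(event, []):
            callback(packet)
        return len(packet)

def decode(packet):
    """Client side: decompress and deserialize a received packet."""
    return json.loads(zlib.decompress(packet).decode("utf-8"))

broker = EventBroker()
inbox = []
broker.subscribe("map_ready", lambda packet: inbox.append(decode(packet)))
broker.publish("map_ready", {"aspect": "Genres", "facets": ["Action", "Adventure"]})
```

The client never polls: it registers its interest once and is notified by a push when the event occurs, which is what keeps the interaction responsive.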
3.4. THE CLIENT FRONT-END CHAPTER 3. THE SYSTEM
3.4 The Client Front-End
3.4.1 Overview
Figure 3.8: The prototype of the client front-end
The system we have described manipulates points in high dimensional space. This is not
going to change. What we will add in this section is a way to present these points in a low
dimensional space so that the user can interact with the system through direct manipulation
in real time. For our prototype, we developed a visualization system called GridMap.
Section 3.4.2 explains how this system works and why.
3.4.2 GridMap
The reduction to two-dimensional space was already explained in the section on Aspect
Maps (3.2.6). This was accomplished by multidimensional scaling. For the user, it is more
important to know that a point is near another point than to know the exact distance. For example, as
shown in figure 3.8, it is more important to know that Action is close to Adventure than to
know the exact distance between them. GridMap therefore maps the points from the 2D space
calculated by MDS to a grid, where the exact distances disappear but the spatial order is retained.
Figure 3.8 shows 9 cells; the number of cells can be changed depending on the size of the
screen (for example, during our experiments on a 24 inch screen, the optimum GridMap size
was 4 by 5, which made it easy to present more than a hundred movies without overloading the
user with information).
Figure 3.9: This figure illustrates a mapping performed by the GridMap which removes the
exact distances while leaving the order intact.
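The exact mapping used by GridMap is not spelled out here, so the following is only one possible way to obtain such a grid, given as an assumption: the points are sorted by their y coordinate and cut into rows, and each row is then sorted by x. The function name and the coordinates are hypothetical.

```python
def to_grid(points, rows, cols):
    """Map {label: (x, y)} to a rows x cols grid of labels (None marks an empty cell).

    Points beyond rows * cols are dropped in this sketch.
    """
    by_y = sorted(points, key=lambda label: (points[label][1], points[label][0]))
    grid = []
    for r in range(rows):
        band = by_y[r * cols:(r + 1) * cols]          # the points of one grid row
        band.sort(key=lambda label: points[label][0])  # left-to-right order within the row
        grid.append(band + [None] * (cols - len(band)))
    return grid

# Hypothetical MDS output for four genres.
mds_points = {
    "Action": (0.1, 0.2),
    "Adventure": (0.3, 0.25),
    "Drama": (0.2, 0.9),
    "Romance": (0.8, 0.8),
}
grid = to_grid(mds_points, rows=2, cols=2)
print(grid)  # [['Action', 'Adventure'], ['Drama', 'Romance']]
```

Action and Adventure remain neighbours in the grid, while the exact distance between them is no longer visible.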
The interface makes it easy for the user to zoom and filter: the left panel (as shown in
figure 3.8) allows the user to switch between the aspect maps and to filter and search on every
facet. For example, a user who knows a particular actor can search in the actors pane and
then zoom and view similar actors. Additionally, this panel allows the user to customize
the zooming criteria and tune the MultiMap parameters: number of movies per facet, number
of facets to show on the map, etc.
Transitions
It often happens that a person viewing a scene fails to see large changes in the scene. This
is called change blindness, a well-known psychological phenomenon [20]. It occurs when the
change in the scene coincides with some visual disruption, such as a saccade (a rapid eye
movement), or when the scene is briefly obscured. This situation often occurs in web applications,
where the web page briefly flashes after actions demanding a new server request. In this context,
animated transitions help the user see the changes in the scene [13] [21].
The transitions turned out to be quite important, providing visual feedback so that the user
knows what is going on. In the GridMap, there are two kinds of crucial transitions:
• The transition that animates the facet pane, keeping it visible during the zooming on
this pane, then moving it to a new position. This greatly helps the user to keep track
of the item being zoomed in on. This is needed because on each zoom the coordinate
system changes, which can be quite confusing to the person who
uses the interface.
• The transition shown in figure 3.10, which flips the grid cell, allowing the user
to see the details of a particular movie within its context. This transition allows the
user to directly see the information about the element of interest while keeping
everything in context. The user can always flip back and see other movie details. This
is what people do in a video store, where they look at the available movies, pick one and
flip it to see its details on the back.
Figure 3.10: As in the video store, one can select the movie and look on the back of the box
to see the details.
Cell Representation
The cell representation allows the flipping feature, illustrated in figure 3.10. Based on the
feedback of users, this feature proved to be very attractive and motivated them to experiment
further with the interface. Additionally, it makes it possible to present movie details while
keeping the other facets visible.
Figure 3.11 shows how a list of movies is presented on a grid cell, giving visual relevance
feedback with a star. Golden stars mark the best-rated movies, which are probably the most
interesting for the user to check out.
Figure 3.11: An actual list of movies presented on the front of the grid cell.
Figure 3.12 illustrates the content presented when a movie is flipped: the movie cover,
the synopsis and two additional tabs.
Figure 3.12: Details of the movie, first tab. It presents the synopsis available to the user to
read in order to learn about a particular movie.
The second tab, illustrated in figure 3.13, shows information related to the movie, linking
directly to different facets. When the user clicks on a particular genre, for example, the system
performs a zoom on that facet and constructs a new map. This allows back and forth navigation:
from the big picture to the details of one movie, then on to another map, zooming in
again on a particular movie.
Figure 3.13: Details of the movie, second tab. It presents the facet links to various information
such as the year, the rating and the genres of the movie.
Figure 3.14: Details of the movie, third tab. It presents the facet links to the directors and
actors.
The last tab, illustrated in figure 3.14, allows the user to view the people involved: the
directors who made the movie and the actors starring in it. Again, the system allows the user to
zoom directly on one of those links, constructing a new map.
Chapter 4
Usability Aspects
The main purpose of the work was to build a responsive system for a particular Rich Internet
Application, in the area of exploratory search. Of course, such a system only makes sense
if users can actually use it. So we conducted an admittedly limited and informal evaluation of
its usability.
To do so, I asked ten people, acquaintances and friends aged 20-30, half of each gender,
to participate in a survey about my thesis work. They were told what exploratory search
is in general, without reference to the movie database. Next, they were asked to work with
the system for about half an hour and find movies to their liking. After working with the
system, they were asked to fill out a questionnaire with 25 questions. The questions are shown
below and concern Usefulness, Ease of Use, Ease of Learning, and Satisfaction with the
system.
The questionnaires were constructed as seven-point Likert rating scales. Users were asked to
rate their agreement with the statements, ranging from strongly disagree to strongly agree [17].
The overall averaged results of the questionnaire, per category, are as follows:
Average results of USE questionnaire
Average Usefulness: 5.3/7
Average Ease of Use: 5.6/7
Average Ease of Learning: 6.4/7
Average Satisfaction: 5.9/7
The users were very satisfied with the system, and a few of them also pointed out that the
interface was very beautiful and user-friendly. On the other hand, some of them thought
that the interface didn’t give them enough control to know exactly what happens
underneath.
For completeness, here are the tables with the averaged results per statement:
Average results, Usefulness questionnaire
It is useful 6.3/7
It gives me more control over the activities in my life 3.8/7
It makes the things I want to accomplish easier to get done 5.3/7
It meets my needs 5.7/7
It does everything I would expect it to do 5.3/7
Average results, Ease of Use questionnaire
It is easy to use 5.8/7
It is user friendly 6.7/7
It requires the fewest steps possible to accomplish what I want to do with it 5.7/7
Using it is effortless 5.2/7
I can use it without written instructions 4.5/7
I don’t notice any inconsistencies as I use it 5.0/7
Both occasional and regular users would like it 6.2/7
I can recover from mistakes quickly and easily 5.8/7
I can use it successfully every time 5.5/7
Average results, Ease of Learning questionnaire
I learned to use it quickly 6.5/7
I easily remember how to use it 6.5/7
It is easy to learn to use it 6.2/7
I quickly became skillful with it 6.3/7
Average results, Satisfaction questionnaire
I am satisfied with it 6.0/7
I would recommend it to a friend 6.3/7
It is fun to use 6.7/7
It works the way I want it to work 6.2/7
It is wonderful 4.8/7
I feel I need to have it 4.8/7
It is pleasant to use 6.2/7
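As a consistency check, the category averages reported earlier can be recomputed from the per-statement averages in the tables above. The short sketch below assumes that each category average is the unweighted mean of its statement averages, which holds when every participant answered every statement; the variable names are hypothetical.

```python
# Per-statement averages, copied from the four tables above (scale of 1 to 7).
item_means = {
    "Usefulness": [6.3, 3.8, 5.3, 5.7, 5.3],
    "Ease of Use": [5.8, 6.7, 5.7, 5.2, 4.5, 5.0, 6.2, 5.8, 5.5],
    "Ease of Learning": [6.5, 6.5, 6.2, 6.3],
    "Satisfaction": [6.0, 6.3, 6.7, 6.2, 4.8, 4.8, 6.2],
}

def category_average(scores):
    """Unweighted mean of the statement averages, rounded to one decimal."""
    return round(sum(scores) / len(scores), 1)

category_means = {name: category_average(scores) for name, scores in item_means.items()}
total_items = sum(len(scores) for scores in item_means.values())

print(category_means)
# {'Usefulness': 5.3, 'Ease of Use': 5.6, 'Ease of Learning': 6.4, 'Satisfaction': 5.9}
print(total_items)  # 25
```

The recomputed values agree with the reported category averages, and the four tables together contain the 25 statements of the questionnaire.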
These are preliminary results; a more formal evaluation is beyond the scope of this
thesis.
Chapter 5
Conclusions
This thesis described a form of exploratory search where responsiveness is of the essence.
The application, which we called ‘MultiMap’, can be categorized under the heading of so-called
Rich Internet Applications, a class of applications that is becoming more and more important as
databases become larger, more specialized, and more distributed. Because of this, users more
and more often find themselves in a situation where they know there must be information
available to answer their questions, but lack the means to formulate a precise query.
The resources they need to answer such a query may be available on remote servers; hence,
to quickly explore possible answers, the servers must be made responsive enough, or else the
user will quickly give up. MultiMap was built with such users in mind. Every design decision
in this thesis was made under the constraint of responsiveness.
This led to the following requirements:
• The system should be responsive, scalable, and interactive.
• The system should support exploratory search.
• The system should provide real-time spatial visual feedback reflecting changes in the
high-dimensional search space.
Exploratory search is the problem of finding information that we may not know how to formulate,
but which we will recognize once we see it. There are three bottlenecks that could make
our system unresponsive: (1) complex calculations, (2) slow zooming, and (3) ineffective
visualization. We addressed these bottlenecks as follows:
1. Every computation that can be done in advance will be done in advance, so that it
cannot cause any delay.
2. Zooming and map generation are highly optimized and can be done in real-time.
3. The visualization is presented to the user in a cognitively appropriate way.
We believe that such a system should be constructed in a modular fashion, and in this thesis
we presented a way to do so. This modularity makes it possible, for example, to change the ranking or
enhance the selection algorithms, and to evaluate the performance of a new algorithm against
the existing one. It also makes it possible to build various user interfaces on top of the search
engine, and eventually audience-targeted user interfaces. During the research we discussed several
possible front ends, including different 2-dimensional representations enhanced with
colors, sounds and font sizes. 3-dimensional interfaces can also be built and are a very interesting
direction to explore. We considered implementing a 3-dimensional sphere navigation in which
zooming would create a 2D map or a new 3D sphere, but we leave that for future
work.
The third question (about usability) was answered by the evaluation and the feedback we got from
users. The users were very satisfied with both MultiMap and GridMap, but also felt that
they did not have enough control over the system. They quickly learned how to use the system
and how to get movie suggestions. However, there was a need to explain and introduce
the concept to them first, as it is a different approach to information exploration.
After having designed and evaluated the system, we believe that the map generation technique
presented in this thesis is an important direction to pursue and an effective way to perform
exploratory search.
Bibliography
[1] International movie database, http://www.imdb.com, December 2009.
[2] Rfc 2616: Hypertext transfer protocol – http/1.1, http://tools.ietf.org/html/rfc2616,
June 1999.
[3] Http wikipedia, http://en.wikipedia.org/wiki/hypertexttransferprotocol, June 2010.
[4] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: a
survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge
and Data Engineering, 17:734-749, 6 2005.
[5] G. Armitage. Quality of service in ip networks: Foundations for a multi-service internet.
Macmillan Technical Publishing, 4 2000.
[6] G. Armitage, M. Claypool, and P. Branch. Networking and Online Games: Understanding
and Engineering Multiplayer Internet Games. John Wiley and Sons Ltd., 2006.
[7] R.M. Bell, J. Bennett, Y. Koren, and C. Volinsky. The million dollar programming
prize. IEEE Spectrum, 5 2009.
[8] S. Caltagirone, M. Keys, B. Schlief, and M. J. Willshire. Architecture for a massively
multiplayer online role playing game engine. Journal of Computing Sciences in Colleges,
Volume 18, Issue 2, 12 2002.
[9] Piero Fraternali, Gustavo Rossi, and Fernando Sánchez-Figueroa. Rich internet applications.
Internet Computing, IEEE, 14(3):9-12, May-June 2010.
[10] J. Gregory. Game Engine Architecture. A K Peters, 2009.
[11] M. A. Hearst. Next generation web search: Setting our sites. IEEE Data Engineering
Bulletin 23, 3, 38-48, 3 2000.
[12] M. A. Hearst. Design recommendations for hierarchical faceted search interfaces. SIGIR
Workshop on Faceted Search, pages 26-30, August 2006.
[13] J. Heer and G. Robertson. Animated transitions in statistical data graphics. IEEE
Transactions on Visualization and Computer Graphics, 6 2007.
[14] J. F. Kurose and K. W. Ross. Computer Networking A Top-Down Approach. Pearson
Education Inc., 2008.
[15] X. Lin. Map displays for information retrieval. Journal of the American Society for
Information Science, 1 1997.
[16] X. Lin, D. Soergel, and G. Marchionini. A self-organizing semantic map for information
retrieval. Proceedings of the 14th annual international ACM SIGIR conference on
Research and development in information retrieval, 262-269, 1991.
[17] A.M. Lund. Measuring usability with the USE questionnaire. STC Usability SIG Newsletter,
8:2, 8 2001.
[18] J. Makar. ActionScript for Multiplayer Games and Virtual Worlds. New Riders, 2010.
[19] G. Marchionini. Exploratory search: From finding to understanding. Communications
of the ACM 49, 4 2006.
[20] J. O’Regan, R. Rensink, and J. Clark. To see or not to see: The need for attention to
perceive changes in scenes. Psychological Science, 8 1997.
[21] G. M. Sacco and Y. Tzitzikas. Dynamic Taxonomies and Faceted Search: Theory, Practice,
and Experience. Springer Science and Business Media Inc., 2009.
[22] J. Smed and H. Hakonen. Algorithms and Networking for Computer Games. John Wiley
and Sons Ltd, 2006.
[23] M. Steyvers. Multidimensional Scaling. In: Encyclopedia of Cognitive Science. Macmillan
Reference Ltd., 2002.
[24] D. Svanaes. Understanding Interactivity: Steps to a Phenomenology of Human-Computer
Interaction. PhD Thesis. NTNU, Trondheim, Norway, 2000.
[25] A. G Taylor. Introduction to Cataloging and Classification. 8th ed. Englewood, Colorado.
Libraries Unlimited, 1992.
[26] B.C Vickery. Faceted classification: a guide to construction and use of special schemes.
London: Aslib, 1960.
[27] R.W. White, B. Kules, S.M. Drucker, and M.C. Schraefel. Supporting exploratory search.
Communications of the ACM, 49, 4 2006.
Appendix A
Protocol Generation DSL
Since I had to do all the programming for the research project myself, the workload was quite
demanding. In order to avoid writing an individual implementation for each networking method
or protocol, a protocol generation mechanism was implemented. To explain how it
works, consider the following C# code:
Listing A.1: A partial definition of the MultiMap protocol
[Protocol]
public interface IMultiMapProtocol
{
    // Gets all aspects in the system
    [ProtocolOperation(100, Direction.Pull, CompressionTarget.Outgoing)]
    Aspect[] GetAllAspects();

    // Zooms to a particular selection
    [ProtocolOperation(106, Direction.Pull, CompressionTarget.Incoming)]
    void Zoom(Aspect Aspect, List<int> Facets);

    // Gets some additional information of a movie
    [ProtocolOperation(112, Direction.Pull, CompressionTarget.Outgoing,
        AccessLevel = AccessLevel.Root)]
    MovieDetails GetMovieDetails(int Oid);

    // (...)
}
Listing A.1 illustrates the code one needs to write in order to define a communication protocol.
Such an approach can also be considered a domain-specific language (DSL). Once the protocol
definition is written, the server analyses it and generates the code that makes
all the communication possible. It generates an assembly for its own use and a Flash component
library (.swc) for the Flash application, which makes it possible to simply call any method and
hides the complexity from the developer. Our research greatly benefited from this DSL,
as several thousand lines of code could be generated, eliminating potential errors and
boosting productivity.
Using the protocol definition, it is also possible to define the compression direction (None,
Incoming, Outgoing or Both), which determines the function calls generated for
packet compilation and reading. It is also possible to define the security level per operation,
using the AccessLevel parameter (shown in Listing A.1).
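The introspection idea can be illustrated with a small Python analogue. This is not the C# generator of the thesis: a decorator plays the role of the ProtocolOperation attribute, the inspect module plays the role of reflection, and instead of generating an assembly or a .swc library the sketch only builds a dispatch table. All Python names are hypothetical; the operation numbers are those of Listing A.1.

```python
import inspect

def protocol_operation(op_id, direction="pull", compress="none", access_level="user"):
    """Attach protocol metadata to a method, similar in spirit to a C# attribute."""
    def mark(func):
        func.operation = {
            "id": op_id,
            "direction": direction,
            "compress": compress,
            "access_level": access_level,
        }
        return func
    return mark

class MultiMapProtocol:
    @protocol_operation(100, compress="outgoing")
    def get_all_aspects(self): ...

    @protocol_operation(106, compress="incoming")
    def zoom(self, aspect, facets): ...

    @protocol_operation(112, compress="outgoing", access_level="root")
    def get_movie_details(self, oid): ...

def build_dispatch_table(protocol):
    """Inspect the protocol class and collect every marked operation by its id."""
    table = {}
    for name, member in inspect.getmembers(protocol, inspect.isfunction):
        meta = getattr(member, "operation", None)
        if meta is not None:
            parameters = [p for p in inspect.signature(member).parameters if p != "self"]
            table[meta["id"]] = {"method": name, "parameters": parameters, **meta}
    return table

table = build_dispatch_table(MultiMapProtocol)
print(table[106]["method"], table[106]["parameters"])  # zoom ['aspect', 'facets']
```

From such a table, a generator could emit the client stubs and the server-side dispatching code, so that adding an operation only requires one new annotated method.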
Master Thesis: The Design of a Rich Internet Application for Exploratory Search by Real-Time Generation of Similarity Maps

  • 1. Master Thesis The Design of a Rich Internet Application for Exploratory Search by Real-Time Generation of Similarity Maps Roman Atachiants Master of Science Thesis DKE 10-5 Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science of Master of Science in Artificial Intelligence at the Department of Knowledge Engineering of the Maastricht University Exam committee: Dr. Eduard Hoenkamp (supervisor) Dr. Ronald Westra Maastricht University Faculty of Humanities and Sciences Department of Knowledge Engineering Master of Science in Artificial Intelligence June 28, 2010
  • 2. Abstract Users who cannot formulate a precise query but know there must be a good answer somewhere, often rely on exploratory search. This requires an interactive and responsive system, or else the user will soon give up. As data bases are becoming larger, more specialized, and more distributed this calls for a Rich Internet Application, fast enough to keep pace with the users explorations. This thesis studies and implements a system, called MultiMap, which computes similarity maps in real-time. This entailed: (1) precomputing every data structure that does not change after the initial query, (2) optimizing algorithms for zooming and map generation (3) and providing a cognitively appropriate visualization of high dimensional space. Applied to a very large movie database, it resulted in a highly responsive, satisfying, usable system. 1
  • 3. Acknowledgments A lot of people helped me in different ways all along the research project and brought different insights and opinions. I want to thank my fellow students, professors, friends and family who helped, tested the prototype and supported/endured me during the research. In particular, I would like to thank Dr. Eduard Hoenkamp for his support and supervision of the project. Our regular meetings, discussions, brainstorming helped me a lot from the very beginning and theoretical part of the research, down to the implementation, engineering and design. But aside of professional relationship, I enjoyed his company the most and our discussions about various domains, including: education, technology, politics, travel,... are really memorable to me. Next, I would like to thank a fellow A.I. student, Tom Marechal. He was an invaluable asset and friend, as he provided me with inspiration and ideas all along the research project. Additional, I would like to thank Dr. Johannes C. Scholtes and Dr. Ronald Westra for their support, evaluation and critical thinking. Not only they, during the classes, largely inspired me for this project but also gave various invaluable insights that contributed to making this thesis better. I would also like to thank also everyone who participated in the testing and evaluation of the system, without their time and feedback the project would not be what it is today. 2
  • 4. Contents 1 Introduction 4 1.1 Exploratory Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2 Faceted Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.3 Interactivity & Responsiveness . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2 The Concept 12 2.1 The Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.2 The Prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3 The System 15 3.1 Architectural Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.2 Mathematical Concepts & Algorithms . . . . . . . . . . . . . . . . . . . . . . 17 3.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.2.2 Preprocessing & Correlations . . . . . . . . . . . . . . . . . . . . . . . 19 3.2.3 Ranking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.2.4 Facets Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.2.5 Movies Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.2.6 Creation of Aspect Maps . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3 Server Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.4 The Client Front-End . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.4.2 GridMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4 Usability Aspects 35 5 Conclusions 37 A Protocol Generation DSL 41 3
  • 5. Chapter 1 Introduction Search and data visualization are becoming more and more important as we are entering the Petabyte Age. Traditional approaches of searching large datasets are query-based ones, which by itself implies knowing what the user (researcher) is looking for. However, this approach of searching the information is difficult when one is not familiar with the domain or lacks the knowledge or contextual awareness in order to formulate precise queries to navigate the information space. For example, how do we find something we would like to know more about, but without having the specific knowledge to formulate a precise question? How would we find a movie we might enjoy if we never saw Robert DeNiro or Charlie Chaplin? Or knowing that we enjoy Quentin Tarantino’s movies, how would we discover other, relatively similar movies? In order to find those movies, we perform a search process called exploratory search. Exploratory search is a specialization of information retrieval which represents the activities carried out by searchers who are: • unfamiliar with the domain of their goal (i.e. need to learn about the topic in order to understand how to achieve their goal) • or unsure about the ways to achieve their goals (either the technology or the process) • or even unsure about their goals in the first place. In this research, we try to address this exploratory search problem [27] by introducing a novel interactive search system. This system is called MultiMap and relies on similarity measurements in order to present the latent information relations to the user in a geographic manner. The system have been developed and tested using the Netflix dataset [7], containing about 125.000 movies. A custom selection were performed on the dataset: • The genres were filtered to 28 IMDB genres. • The directors were filtered to those with at least 5 movies made (in total around 2500 directors). 
• The actors were filtered to those with at least 10 movies where an actor has participated (in total around 6000 actors). • The movies were filtered to those containing all needed information and made by the preselected directors and actors. The final database contained around 16000 movies. 4
  • 6. 1.1. EXPLORATORY SEARCH CHAPTER 1. INTRODUCTION 1.1 Exploratory Search During the first phase of research we considered the exploratory search problem [11] [19], trying to answer the following questions: 1. How to help the user who is unfamiliar with the domain (i.e.: a user who saw only a few movies and/or doesn’t know many directors, actors)? 2. How to help the user who doesn’t know how to find a particular movie? 3. How to help the user who doesn’t know what kind of movies he likes? Figure 1.1: This figure represents an abstracted backwards reasoning that has been applied, in order to answer to exploratory search questions. On the figure: green represents the interesting directions; red represents an unwanted direction; blue represents intermediate steps. Figure 1.1 shows a result of a backwards reasoning we performed in order to try to reason about those 3 questions. The goal of the research was to find a system that can answer those questions without much guessing, mostly because we want the user to explore and learn about 5
  • 7. 1.1. EXPLORATORY SEARCH CHAPTER 1. INTRODUCTION the domain. From this analysis phase we derived several things that needed to be achieved by the system: • An extracted meaning of the data is required, the system should know about the domain. In our particular case, the cinematographic domain. • A way to preserve relations in order to help the user to relate different items. • A way to drill down to individual movies and examine them is needed in order to allow the user to navigate. • Relevance feedback is needed in order to show the user how interesting a particular item is and how relevant it is for his search. The idea behind relevance feedback is to take the results that are initially returned from a given query and to use information about whether or not those results are relevant to perform a new query. The exploration in exploratory search means that a user have to be able to explore different directions and, in a manner, swim in the data. The exploration factor is something very implicit and therefore difficult to evaluate. In contrast to standard search engines, where the user composes a query and the engine returns the closest documents to that query (document), we do not want to select the closest points always in our system and restrict the user to the search results that are the most relevant ones. By doing so, we allow the user to explore different directions in this multi-dimensional space. 6
  • 8. 1.2. FACETED CLASSIFICATION CHAPTER 1. INTRODUCTION 1.2 Faceted Classification One of the approaches in the exploratory search research domain that has been proven useful and used in many different visualization systems is called faceted classification [26] [12]. This approach is very common and widely used all across the World Wide Web, especially on commercial web sites (Amazon, Ebay). Figure 1.2 illustrates the search box of the website Amazon.com, where the fields Author, Title, ISBN, Publisher, Subject, Condition, etc. are the facet categories. Faceted classification system allows assigning a different classifications to a particular object, often, the object we want to search for, which is in our case: a movie. Using multiple classifications enables to reorder the data in multiple of different ways and define a search criteria. Figure 1.2: The advanced search box on the Amazon.com website, the additional fields are different aspects of a book. A facet comprises “clearly defined, mutually exclusive, and collectively exhaustive aspects, properties or characteristics of a class or specific subject” [25]. In this thesis, we use the word “Aspect” to distinguish a facet category, and word “Facet” for a particular facet, for example: Aspect : Actors; Facets : Robert DeNiro, Johnny Depp, Bruce Willis... The Netflix contest dataset contained 17700 different movie titles and served as a basis for the data in this research. Considering the need of extracting different facets for each of those 7
  • 9. 1.2. FACETED CLASSIFICATION CHAPTER 1. INTRODUCTION movies, a special tool has been written to extract additional information from the Internet Movie DataBase (IMDB) [1] website and Netflix Database via their exposed APIs. This tool was able to extract about 95% of the information for those movies. In particular, we were interested in: • Genres of the movies (Fantasy, Science-Fiction, Crime, Drama...) • Year of release • IMDB ratings, which is a precise rating from 1 to 10, rounded to 1st decimal • Directors of the movies (Steven Spielberg, Quentin Tarantino...) • Actors of the movies (Robert DeNiro, Johnny Depp, Bruce Willis...) Additionally, there were also some other data about the movies (writers, movie plots, ...), but not as abundant as the five aspects presented above. Therefore, we decided to base the system on above aspects alone. 8
  • 10. 1.3 Interactivity & Responsiveness

Exploratory search is a process performed by a human who uses a tool (a computer) to interact with large quantities of information in order to explore it and find the relevant pieces. Because of this human-computer component, the process is interactive by definition, so interactivity is a very important aspect of exploratory search. One way to approach interactivity is to start with the notion of “look and feel”. The term has become more or less synonymous with how the term style is used in other design disciplines. In a concrete sense, the “look” of a GUI is its visual appearance, while the “feel” denotes its interactive aspects [24]. One consequence is that the interface should be very responsive and fast.

One must also consider that search systems need to handle large amounts of data and need a lot of computing power. It follows that, in order to build a good exploratory search system, the data manipulation should be handled by powerful machines. During our research, we opted for a client-server approach, which keeps the client interactive without giving up the computing power needed to perform all operations in real time.

Performing all operations in real time raises the problem of heavy network communication. The communication in this case is a two-way dialog between the client and the server. We need it to be duplex, so that both the server and the client can initiate the dialog, because the World Wide Web itself is becoming real-time (large services such as Twitter and Facebook are good examples). Although the information flow is updated in real time, most of these services still rely on traditional HTTP-based technologies.
The Hypertext Transfer Protocol (HTTP) is an Application Layer protocol for distributed, collaborative, hypermedia information systems (see the RFC specifications [2]). HTTP is a request-response protocol standard for client-server computing. In HTTP, a web browser, for example, acts as a client, while an application running on a computer hosting the web site acts as a server. The client submits HTTP requests to the responding server by sending messages to it. The server, which stores content (or resources) such as HTML files and images, or generates such content on the fly, sends messages back to the client in response. These returned messages may contain the content requested by the client or may contain other kinds of response indications [3].

The problem with using HTTP for the interactive, real-time web is a fundamental one. As the World Wide Web evolved, different architectures and new frameworks (SaaS, SOAP, AJAX ...) were built on top of HTTP, but real-time communication is still mainly done using the polling technique (see figure 1.3). Polling is a workaround: the client constantly asks the server for updates at a very short interval. There are several problems with this approach:

1. The client’s and the server’s CPU resources are used all the time for mostly useless update checks. On mobile devices, this potentially drains the battery.
2. Network bandwidth is used constantly, and since the network throughput of the server is limited, this quickly becomes a bottleneck.

To design a system responsive enough for such communication, consider the following requirements:
  • 11. Figure 1.3: The communication principles for real-time updates in the polling architecture and in a publisher/subscriber architecture.

1. A client-server approach, since the amount of data is large and the computations can be very expensive.
2. Reliable networking (we are not considering a streaming application and need reliable two-way communication); therefore the choice for the transport layer is TCP [14].
3. A message format that allows complex messages to be encoded and decoded with minimal impact on performance.

Since those requirements are quite similar to the requirements for multi-player client/server on-line games, we considered the gaming literature [10] [18] [22] to be the best place to look for the technological answer for an interactive search system. Games are by definition interactive applications, and on-line games are usually heavily optimized for latency and throughput. Because interactivity requires a lot of duplex communication, the best option is a socket server [18] with a custom protocol for low-level message encoding.

Following those considerations, an interactive exploratory search system can be designed like a multiuser on-line game engine. The architecture should fulfill six goals: minimize network traffic, provide opportunities for load balancing, provide a secure game playing environment,
  • 12. provide a high level of scalability and maintainability, and maximize client side performance for real-time graphics [8]. The architecture of the system is layered and component-based:

• The Network Component, which contains the Packet Serializer (Messenger), De/Encrypt, De/Compress and Network modules. The Messenger module is in charge of forming and sending messages in a given format.
• The User Component, which contains both the Authenticator and the User Database modules.
• The Search Component, which is designed specifically for exploratory search and uses a custom protocol. For the system designed for this thesis, the search component is described in more detail in section 3.2.

As mentioned earlier, latency is a crucial point for highly interactive applications. Latency refers to the time it takes for a packet of data to be transported from its source to its destination. In many networking texts, you will also see the term Round Trip Time (RTT) in reference to the latency of a round trip from source to destination and then back to source again. In many cases the RTT is twice the latency, but some network paths exhibit asymmetric latencies, with higher latencies in one direction than the other [6]. There are different ways to deal with latency, but simply put: we need more control over the packets sent and received, we need to minimize their size, and we need to be able to prioritize and parallelize different actions [5].
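The custom low-level message encoding mentioned above is not specified in this chapter. Purely as an illustration, the following Python sketch shows one common way to delimit messages on a TCP stream: a 4-byte big-endian length prefix followed by the payload. The names `encode_frame` and `decode_frames` and the frame layout are assumptions made for this example; they are not the protocol used by MultiMap.

```python
import struct

# Assumed layout: 4-byte unsigned big-endian payload length, then the payload.
HEADER = struct.Struct(">I")


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length so the receiver can find message boundaries."""
    return HEADER.pack(len(payload)) + payload


def decode_frames(buffer: bytes):
    """Split a receive buffer into complete payloads plus the unconsumed remainder."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(buffer):
            break  # the last frame is incomplete; wait for more bytes
        messages.append(buffer[start:end])
        offset = end
    return messages, buffer[offset:]
```

Because TCP delivers a byte stream rather than discrete messages, a receiver would typically append incoming bytes to the remainder returned by `decode_frames` and call it again. Keeping the header small is in line with the goal of minimizing packet size.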
  • 13. Chapter 2 The Concept

2.1 The Idea

In chapter 1 we considered the implications of the exploratory search problem and its basic components, faceted classification and interactivity. This thesis introduces a novel exploratory search interface, called MultiMap, which relies on similarity measurements in order to present information to the user. In the early 1990s it was demonstrated that spatial mapping techniques can be used to visualize the contents and semantic relationships of a document space [15], yet there are still not many systems that actually use mapping techniques. The idea behind the system comes from a simple map, where information is presented geographically: the closer two towns are on the map, the shorter the transition from one to the other. Using a map, it is possible to navigate and explore huge amounts of information by zooming in and out, exploring the dataset both locally and globally.

Figure 2.1: A world map with country divisions.

If we can do this for our planet Earth using mapping software (Google Maps and Bing Maps are examples of such software), why couldn’t we explore other datasets in the same way?
  • 14. What if we could zoom in on both New York and Tokyo and generate a new world map, with Washington, New York, Tokyo, Kyoto and Paris in between (figure 2.1 may help to picture this)? Viewing them this way can be rather messy, which is why we also need to introduce context: Washington and New York are in the United States of America, Tokyo and Kyoto are in Japan, and Paris is in France. The countries provide a clear separation between the cities and help us understand them better. Now replace the towns by movies and the countries by Genres/Actors/Directors, and this gives a basic understanding of how MultiMap works. MultiMap is based on this idea of zooming and on-the-fly generation of new maps. Formally, it involves choosing a new coordinate system. MultiMap also offers the ability to zoom out to see the whole picture again, and to switch maps if needed (again, think of Google Maps).

To better understand how MultiMap works, let’s go back to the movie context and think of different aspects, facets and movies:

• An aspect “Genres” contains facets “Action”, “Adventure”, etc.
• The facets “Action” and “Adventure” can relate to movies like “Indiana Jones” etc.
• The movie “Indiana Jones” features the actor “Harrison Ford” (who is also a facet of the aspect “Actors”)

One can notice that this is a closed loop: it is possible to look at different genres, then look at a particular movie, then switch to actors, and go on exploring the information this way. If we imagine for a second that we can create a map of an aspect, where the points (“countries”) are the facets, we should probably also be able to place the movies (“towns”) on that map. In order to create such maps, we need several components:

• A function to compare two facets of an aspect, that is, a distance measurement.
For example, this would allow us to compare the Adventure genre with the Action genre, or Tom Hanks with Harrison Ford.
• A way to create a map very quickly, as a new map must be generated whenever the user zooms in on a movie.
• A way to measure the relevancy of movies and facets. Considering our example above, which towns would we choose to present on a new map if we zoomed in on New York and Tokyo? Paris, London, Rome?

Further in this document, chapter 3 explains how the whole system is built; in particular, section 3.2 explains the concepts and algorithms that were developed in order to produce a working prototype of MultiMap.
  • 15. 2.2 The Prototype

The MultiMap concept can be divided into two main parts:

• The system that performs all mathematical computations and handles the data and the operations on the data.
• The front-end that is presented to the user; after all, there are many different ways to present a map. Figure 2.2 shows the front-end that we designed as our first approach to a visualization for the MultiMap system.

Figure 2.2: A screenshot of the prototype, presenting a grid map of the directors aspect.

The front-end visualization we designed for MultiMap is called GridMap, and it is one of several possible approaches to visualizing those maps. This approach relies on a very ordered presentation of the maps. In fact, it tries to map a cloud of 2D points to a grid while trying to preserve the spatial relations. The interface allows users to switch between the aspect maps, to zoom in on different facets and, by flipping a grid cell, to view the details of a particular movie and follow its links to construct new maps. Section 3.4 explains the actual interface and its different components in more detail.
  • 16. Chapter 3 The System

3.1 Architectural Overview

The system was designed as a client-server application with several tiers; this section describes its design. The main idea is based on the interactivity between the user and the data, and on ease of use. First of all, the system should meet several prerequisites:

• it should be interactive, so it has a real-time constraint;
• it should be able to handle large datasets;
• it should be easy to use and available to remote users.

Figure 3.1: The layered architecture of the MultiMap system.

Following those prerequisites, the logical conclusion is to build a real-time Rich Internet Application (RIA) [9]. Such applications are mainly standard n-tier applications. The MultiMap architecture is a 3-tier real-time architecture, allowing the front-end client to have full
  • 17. interactivity with the data. The main idea behind such a system is to have a clear separation between the client, the logic and the data itself, as illustrated in Fig. 3.2. The actual architecture, as described in Fig. 3.1, consists of:

• a front-end client in Flash, allowing interactive data visualization;
• a custom C# real-time server, written by the author in order to handle large amounts of data interactively;
• a logic layer running the Matlab engine for all data-intensive search, correlation and other operations.

Figure 3.2: Visual overview of a three-tiered application. Illustration from Wikipedia.
  • 18. 3.2 Mathematical Concepts & Algorithms

3.2.1 Overview

Figure 3.3: The data-flow, showing how the data is processed on the fly (in interactive mode).

The main purpose of the research is the interactivity of the system. This imposes a real-time constraint and makes the system difficult to engineer, especially when the computations can take a long time. We therefore needed a system that can handle this data-flow rapidly and respond quickly to user queries. Figure 3.3 shows the simplified sequence
  • 19. diagram of the system, for the case when the information needs to be updated and presented. The next few sections explain the details of this schema, block by block.

The system uses a content-based recommendation method. In content-based recommendation methods, the utility u(c, s) of item s for user c is estimated based on the utilities u(c, s_i) assigned by user c to items s_i ∈ S that are similar to item s. For example, in a movie recommendation application, in order to recommend movies to user c, the content-based recommender system tries to understand the commonalities among the movies user c has rated highly in the past (specific actors, directors, genres, subject matter, etc.). Then, only the movies that have a high degree of similarity to the user’s preferences are recommended [4].

Overall, the flow consists of several main steps:

• The preprocessing step performs the transformation and precomputes as much information as possible. It considers all aspects and, for each facet in each aspect, computes a closest-neighbor network (explained in section 3.2.2).
• The session initialization step initializes the user session and copies some of the preprocessed data into a so-called Ranking Matrix.
• The update step updates the Ranking Matrix (see 3.2.3 for more information). This produces a new ranking matrix, in which the ranks/relevancy ratings are updated based on the selection.
• The facets selection step chooses several facets, based on the Ranking Matrix. To do so, it combines 2 techniques: it takes a subset of the most relevant facets from the matrix, then performs a k-means clustering in order to pick the most “global” facets. This step is explained in more detail in section 3.2.4.
• The movies selection step selects the most relevant movies for each facet that has been chosen. This step is explained in more detail in section 3.2.5.
• The creation of aspect maps step performs multidimensional scaling [23] followed by a custom grid-map algorithm, in order to create a 2-dimensional grid in which the latent relations between different facets are retained. This approach is explained in section 3.2.6. This step could be replaced by any other representation, including 3-dimensional ones.
  • 20. 3.2.2 Preprocessing & Correlations

Overview

The system handles a lot of data and reorders it continually on each user request. In order to allow the system to perform in real time, as much as possible should be precomputed. Several things need to be done:

• For each aspect, the facets should be correlated in order to allow the comparison of 2 points. This is done differently for each aspect, depending on the data. It makes it possible, for example, to correlate the Adventure genre and the Science-Fiction genre.
• For each aspect, the facet network is computed. This network allows us to propagate a ranking and reorder the facets in real time. See the Facet Network subsection of 3.2.2 for more details.
• For each facet of each aspect, a list of the most relevant movies is constructed and ordered. This makes it possible to pick the movies in real time. This step is explained in more detail in the Movie Ordering subsection of 3.2.2.

One of the most important results of the precomputation phase is the construction of the so-called “Aspect Spaces”. Aspect Spaces are N-dimensional dissimilarity matrices. They are computed based on a particular distance metric Δ(i, j), the distance between the i-th and j-th facets of an aspect. In order to simplify the implementation, we define:

• An input matrix I, the initial data needed to compute similarities between aspect samples. The samples are represented in N-dimensional space, where N is the number of movies, about 16000.
• Per aspect, a membership function δ, which can be different for every aspect and indicates whether a facet of the aspect belongs to a particular movie.

The next few sections give the definitions and the steps performed in order to create each aspect space.

Genres Space

In order to create the genres space, the genres are correlated using simply the complete distribution of movies.
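The input matrix I for the genres space is defined as follows:

$$I_{i,j} = \begin{pmatrix} \delta(\mathrm{Genre}_1, \mathrm{Movie}_1) & \cdots & \delta(\mathrm{Genre}_1, \mathrm{Movie}_j) \\ \vdots & \ddots & \vdots \\ \delta(\mathrm{Genre}_i, \mathrm{Movie}_1) & \cdots & \delta(\mathrm{Genre}_i, \mathrm{Movie}_j) \end{pmatrix}$$

The membership function δ:

$$\delta(\mathrm{Genre}_i, \mathrm{Movie}_j) = \begin{cases} 1 & \text{if the movie contains the genre} \\ 0 & \text{otherwise} \end{cases}$$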
  • 21. Finally, we define a distance function, which is a general cosine distance (here I_i denotes the i-th row of I):

$$\Delta(\mathrm{Genre}_i, \mathrm{Genre}_j) = \frac{I_i \cdot I_j}{\lVert I_i \rVert \, \lVert I_j \rVert}$$

In order to test how good the correlation is, one can use the aspect space as the input for the multidimensional scaling function. This helps to visualize the correlations and to see whether the desired meaning is preserved. Figure 3.4 shows the 2-dimensional genres space; we will call such maps “Aspect Maps”. One can see that the correlation makes sense; for example, the Adventure genre is close to Fantasy and Science-Fiction.

Figure 3.4: The distances between genres in 2-dimensional space after performing multidimensional scaling on the genres space.

Ratings Space

The ratings space can be used in different ways, and the correlation can be adapted to the chosen usage:

• ratings can be used as an additional dimension, shown using a color or a font size when displaying a movie;
• ratings can be shown in order of Euclidean distance;
  • 22. • ratings can be used to create a complete ratings aspect space, but this requires a more complex correlation function.

In this research, we decided to use the second approach, simply calculating the pairwise Euclidean distance between ratings.

Years, Directors and Actors Spaces

There are several ways to correlate the years, directors and actors. In our research, we wanted to explore the possibility of correlating those facets based on their genre distribution. This approach would allow the user, for example, to see what kinds of movies were made in a particular year and which years are similar in terms of genre distribution. To do so, we proceed as follows:

$$A_{i,j} = \begin{pmatrix} \delta_1(\mathrm{Year}_1, \mathrm{Movie}_1) & \cdots & \delta_1(\mathrm{Year}_1, \mathrm{Movie}_j) \\ \vdots & \ddots & \vdots \\ \delta_1(\mathrm{Year}_i, \mathrm{Movie}_1) & \cdots & \delta_1(\mathrm{Year}_i, \mathrm{Movie}_j) \end{pmatrix}$$

The membership function δ1:

$$\delta_1(\mathrm{Year}_i, \mathrm{Movie}_j) = \begin{cases} 1 & \text{if the movie was released that year} \\ 0 & \text{otherwise} \end{cases}$$

Next, we reuse the input matrix I from the genres space, here called B. It is defined as follows:

$$B_{i,j} = \begin{pmatrix} \delta_2(\mathrm{Genre}_1, \mathrm{Movie}_1) & \cdots & \delta_2(\mathrm{Genre}_1, \mathrm{Movie}_j) \\ \vdots & \ddots & \vdots \\ \delta_2(\mathrm{Genre}_i, \mathrm{Movie}_1) & \cdots & \delta_2(\mathrm{Genre}_i, \mathrm{Movie}_j) \end{pmatrix}$$

The membership function δ2:

$$\delta_2(\mathrm{Genre}_i, \mathrm{Movie}_j) = \begin{cases} 1 & \text{if the movie contains the genre} \\ 0 & \text{otherwise} \end{cases}$$

Next, we need to compute the matrix I, which tells us how many movies of each genre were released in each year (or, for the actors space, in how many movies of each genre an actor has participated). This is computed as the matrix product of A and the transpose of B:

$$I_{i,j} = \begin{pmatrix} \delta(\mathrm{Year}_1, \mathrm{Genre}_1) & \cdots & \delta(\mathrm{Year}_1, \mathrm{Genre}_j) \\ \vdots & \ddots & \vdots \\ \delta(\mathrm{Year}_i, \mathrm{Genre}_1) & \cdots & \delta(\mathrm{Year}_i, \mathrm{Genre}_j) \end{pmatrix} = A \times B^{T}$$

Finally, by computing the pairwise cosine distance between the rows of the matrix I, we are able to correlate the years based on their genre distribution. The same procedure is applied in order to correlate the directors and the actors.
Figure 3.5 shows the aspect map created for the directors. As with the genres, the results seem to make sense: Quentin Tarantino is quite close to Martin Scorsese (they make very similar kinds of crime movies) and at the same time quite far away from George Lucas, the creator of the Star Wars saga.
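The construction of an aspect space from membership matrices can be sketched in a few lines of Python. This is a minimal illustration on made-up toy data, not the Matlab code used in the thesis. The function names are hypothetical, and the distance is taken as 1 minus the cosine ratio, which is a common convention but an assumption here, since the formula in the text shows only the ratio itself.

```python
import math


def cooccurrence(a, b):
    """Return A x B^T: rows of `a` (e.g. years) against rows of `b` (e.g. genres)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def cosine_distance(u, v):
    """1 - cosine similarity (assumed convention); 1.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (norm_u * norm_v)


def aspect_space(rows):
    """Pairwise cosine distances between the rows of an input matrix."""
    return [[cosine_distance(u, v) for v in rows] for u in rows]


# Toy memberships (invented): 3 years x 4 movies and 2 genres x 4 movies.
A = [[1, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
B = [[1, 1, 0, 1],   # first genre
     [0, 0, 1, 0]]   # second genre

I = cooccurrence(A, B)   # years x genres counts
D = aspect_space(I)      # years x years dissimilarity matrix
```

In this toy example the first and third years contain only movies of the first genre, so their distance is zero, while the second year is at the maximum distance from both.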
  • 23. Figure 3.5: The distances between directors in 2-dimensional space after performing multidimensional scaling on the directors space, similar to figure 3.4.

Facet Network

In order to perform the zooming and allow the system to be interactive, one needs a way to select and sort the facets rapidly. In MultiMap, this is done by precomputing a facet network (Fig. 3.6) and assigning a rank value to each node in this network. Generally speaking, we need to compute a matrix R with the facets on the rows and two (or more) “pointers” to the closest points. The desired matrix R:

$$R_{i,3} = \begin{pmatrix} \mathrm{Facet}_1 & \text{1st closest facet} & \text{2nd closest facet} \\ \vdots & \vdots & \vdots \\ \mathrm{Facet}_i & \text{1st closest facet} & \text{2nd closest facet} \end{pmatrix}$$

The closest points are computed using the inter-facet correlations described above. This step can be very time-consuming, as it has a complexity of $O(n^2)$. It would interrupt a smooth interaction with the user and would therefore be prohibitive. Fortunately, this matrix can be precomputed before the interaction even starts. In general, anything that can be precomputed should be precomputed to make the system responsive.
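As a rough sketch of this precomputation, the following Python function derives the closest-neighbor “pointers” of the matrix R from a pairwise distance matrix. The function name `facet_network` and the toy distances are hypothetical. Sorting each row makes this sketch $O(n^2 \log n)$ rather than $O(n^2)$, which does not matter for an offline step.

```python
def facet_network(distances, k=2):
    """For each facet index, return the indices of its k closest other facets."""
    network = []
    for i, row in enumerate(distances):
        others = [j for j in range(len(row)) if j != i]
        others.sort(key=lambda j: row[j])  # stable sort: ties keep index order
        network.append(others[:k])
    return network


# Toy distances (invented) between four genres.
facets = ["Action", "Adventure", "Science-Fiction", "Drama"]
D = [[0.0, 0.1, 0.3, 0.9],
     [0.1, 0.0, 0.2, 0.8],
     [0.3, 0.2, 0.0, 0.7],
     [0.9, 0.8, 0.7, 0.0]]

R = facet_network(D)
```

Row `R[i]` holds the first and second closest facets of `facets[i]`, matching the two pointer columns of the matrix R above.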
  • 24. Figure 3.6: A subset of the precomputed facet network for the Genres aspect.

In MultiMap, everything that can be precomputed is precomputed, which is conducive to a smooth and responsive interaction.

Movie Ordering

The last step is movie ordering. This step is very straightforward, as it consists of rearranging the movie-facet relations into the following form:

$$F_{i,2} = \begin{pmatrix} \mathrm{Facet}_1 & \text{movie vector, ordered by relevancy} \\ \vdots & \vdots \\ \mathrm{Facet}_i & \text{movie vector, ordered by relevancy} \end{pmatrix}$$

For the sake of simplicity, we use the IMDb rating as the relevancy measure. This rating is a number from 1 to 10 with one decimal, based on the extensive statistics gathered from IMDb website visitors. The following example of the movie ordering for the genres space illustrates this:

$$F_{i,2} = \begin{pmatrix} \text{Adventure} & \begin{pmatrix} \text{The Judy Garland Show} & \text{The Secret of Monkey Island} & \cdots \\ 9.8 & 9.6 & \cdots \end{pmatrix} \\ \vdots & \vdots \\ \mathrm{Facet}_i & \text{movie vector, ordered by relevancy} \end{pmatrix}$$
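The movie ordering can be illustrated with a short Python sketch. The data structures (a dictionary from facet to movie titles and a dictionary of ratings) and the function names are assumptions made for this example. The two rated titles come from the matrix above, and “Placeholder Movie” with its rating is invented filler.

```python
def order_movies(memberships, ratings):
    """Map each facet to its movies, sorted by rating from highest to lowest."""
    ordered = {}
    for facet, movies in memberships.items():
        ordered[facet] = sorted(movies, key=lambda m: ratings[m], reverse=True)
    return ordered


def top_movies(ordered, facet, count):
    """Selection at query time: slice the precomputed, already sorted list."""
    return ordered[facet][:count]


ratings = {
    "The Judy Garland Show": 9.8,
    "The Secret of Monkey Island": 9.6,
    "Placeholder Movie": 5.0,  # invented entry, for illustration only
}
memberships = {
    "Adventure": ["Placeholder Movie",
                  "The Secret of Monkey Island",
                  "The Judy Garland Show"],
}

ordered = order_movies(memberships, ratings)
best_two = top_movies(ordered, "Adventure", 2)
```

Since the sorting happens once during preprocessing, picking the movies for a facet during interaction costs only a list slice.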
  • 25. 3.2.3 Ranking

We would like to give users the ability to zoom in on individual facets or movies based on their selection. This can be accomplished by ranking each point and re-ranking the points with every zoom. For this we need a facet network (graph), ideally tightly interconnected and covering 100% of the facets. Such a network is constructed in the preprocessing step (see section 3.2.2) in the form of a graph in which each node (a facet) is connected to its 2 closest neighbors. For example, the Science-Fiction genre would be connected to the Adventure genre and the Action genre, as illustrated in figure 3.6. Based on such a network, zooming can be done effectively with a recursive algorithm with several parameters:

• A vector B, the weight vector for the closest points. For example, a vector in which the first closest point gets the full weight and the second closest gets half of the weight would be $B = (1, 0.5)$.
• A depth-decay function for each node at depth d:

$$\lambda(d + 1, \rho, b) = \rho + \frac{\gamma}{d} \, b$$

where:
– d is the current depth
– ρ is the current rank of the node
– γ is the decay factor (a constant)
– b is the weight of the point, taken from the weight vector B

The depth-decay function presented here is a linear function, but it can be adapted or changed depending on the context and needs. The depth-decay function calculates the current ranking ρ, which updates the network. The ranking is computed recursively for each neighbor, then the network is sorted by rank and the first x nodes are shown to the user.

Additionally, zooming out can be done in several ways. The simplest (and computationally most efficient) is to keep track of all changes to the ranking value ρ at each step. This approach uses some memory, but nothing has to be recalculated. Another approach would be to recalculate the ρ values backwards recursively, at the cost of CPU time.
The depth-decay function would also have to be adapted in order to support such a feature.
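The recursive re-ranking can be sketched as follows. This is only one possible reading of the description above, under stated assumptions: the recursion stops at a hypothetical `max_depth`, the selected facet itself receives a boost of γ, and a node reached along several paths accumulates several boosts. None of these details is fixed by the text.

```python
def zoom(network, ranks, start, weights=(1.0, 0.5), gamma=1.0, max_depth=2):
    """Propagate a rank boost from `start` through the facet network.

    A neighbor reached at depth d gains (gamma / d) * b, where b is the weight
    of its position in the closest-neighbor list (the depth-decay update).
    Returns the facets sorted by rank, highest first.
    """
    def visit(node, depth):
        if depth > max_depth:
            return
        for neighbor, b in zip(network[node], weights):
            ranks[neighbor] += (gamma / depth) * b
            visit(neighbor, depth + 1)

    ranks[start] += gamma  # assumption: the zoomed facet gets the full boost
    visit(start, 1)
    return sorted(ranks, key=ranks.get, reverse=True)


# Toy network (invented): each genre points to its two closest genres.
network = {
    "Science-Fiction": ["Adventure", "Action"],
    "Adventure": ["Action", "Science-Fiction"],
    "Action": ["Adventure", "Science-Fiction"],
    "Drama": ["Action", "Adventure"],
}
ranks = {name: 0.0 for name in network}
order = zoom(network, ranks, "Science-Fiction")
```

The number of visits grows exponentially with `max_depth`, so the depth should stay small. Zooming out by the first approach described above would amount to storing a copy of `ranks` (or the applied increments) before each call.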
  • 26. 3.2.4 Facets Selection

At this point in the data-flow we have a Ranking Matrix, and a simple solution would be to select the first few ranked facets. Such an approach is just fine for standard search engines, for example Google, Lemur... In MultiMap, the selection is performed by a selection algorithm, but why do we actually need one? In order to answer this question, consider the following:

• standard search engines use a query in order to search the data, so the most relevant documents are the ones that are closest to the query in the multi-dimensional document space;
• in exploratory search we need an exploration factor, allowing users to explore different possibilities. We therefore do not want to restrict the results to the most closely related and most relevant points, but also want to include other points that are related to the topic to some extent.

The selection algorithm allows us to pick a number of rows from an Input Matrix I. Recall that the Input Matrix I is the step just before the pairwise distance comparison, so it is a ready-to-compare matrix in which the distance between 2 points is meaningful. The idea behind the algorithm is quite simple: it selects a subset of relevant facets that is larger than the number of facets to be shown to the user, finds k clusters within the subset, and then takes the point closest to each cluster centroid. The selection algorithm works in a rather straightforward way:

• first, a selection of the top-ranked facets is performed. In the prototype we take twice the number of facets that we actually want to present to the user (e.g., if we need to show a grid of 2 by 2 points, we take the 8 most relevant facets from the ranking matrix);
• next, the algorithm computes a k-means clustering with k clusters.
Here k is the number of points to show to the user; for example, showing 4 actors would mean k = 4;
• once the k clusters are found, each point has been assigned the index of a cluster, and we have one centroid for each of the k clusters. The selection continues by taking the 1 point closest to each centroid, that is, the most average point in each cluster;
• finally, it returns the selected facets.
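The selection steps above can be sketched in plain Python. This is a simplified illustration, not the prototype's Matlab code: the k-means routine is a basic Lloyd iteration seeded deterministically with the first k candidates (an assumption made to keep the example reproducible), and all names and toy vectors are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Basic Lloyd's algorithm; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def select_facets(ranked_facets, vectors, k):
    """Take the 2k top-ranked facets, cluster them, return one facet per cluster."""
    candidates = ranked_facets[:2 * k]
    points = [vectors[f] for f in candidates]
    centroids, labels = kmeans(points, k)
    selected = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            best = min(members, key=lambda i: squared_distance(points[i], centroids[c]))
            selected.append(candidates[best])
    return selected


# Toy rows of an input matrix (invented), listed in ranking order.
ranked = ["a", "b", "c", "d", "e"]
vectors = {"a": (0, 0), "b": (10, 10), "c": (0, 1), "d": (0, 2), "e": (100, 100)}
chosen = select_facets(ranked, vectors, k=2)
```

With k = 2, only the four top-ranked facets are clustered; "c" is returned for the first cluster because it lies exactly on the centroid of "a", "c" and "d", and "b" represents the second cluster on its own.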
  • 27. 3.2.5 Movies Selection

The next step in the data-flow is the actual selection of the movies. At this point the system knows which facets it is going to present (the facets most relevant to the current zoom sequence). The movies presented on the map can be selected simply by taking the first few movies according to some rating function. We use the IMDb average rating as the value by which the movies within each facet are sorted. The sorting was already done in the preprocessing phase (see section 3.2.2), so the selection reduces to taking the first few movies of each facet. For example, in the following matrix one can see that if Adventure is a selected facet, the movies “The Judy Garland Show” and “The Secret of Monkey Island” will be selected, as they have the highest IMDb ratings within the facet.

$$F_{i,2} = \begin{pmatrix} \text{Adventure} & \begin{pmatrix} \text{The Judy Garland Show} & \text{The Secret of Monkey Island} & \cdots \\ 9.8 & 9.6 & \cdots \end{pmatrix} \\ \vdots & \vdots \\ \mathrm{Facet}_i & \text{movie vector, ordered by relevancy} \end{pmatrix}$$

Now that the selection of facets and movies is done, we can proceed to the creation of the Aspect Maps.
  • 28. 3.2.6 Creation of Aspect Maps

The final step in the data-flow is the creation of the so-called Aspect Maps, a spatial representation of the selected facets. The maps allow the user to compare different facets and, subsequently, the related movies. We use maps to help the user envisage the locations of movies and facets in high dimensional space. Since this high dimensional space would be too difficult to visualize, it is reduced to two or three dimensions. For this, of course, we need a dimension reduction that is faithful to the distances in the original space. From the many techniques that are available (dimensionality reduction, ordination...) we selected multidimensional scaling (MDS).

Figure 3.7: The transition from the facet selections to the aspect map.

Multidimensional scaling is a special case of ordination. An MDS algorithm starts with a matrix of item-item similarities, then assigns a location to each item in N-dimensional space, where N is specified a priori. In our case, we want to reduce the matrix to 2 or 3 dimensions, to be able to visualize the result on a screen.

Figure 3.7 shows the process of creating the aspect map in this step. It is quite straightforward, and by now all the data structures are ready to be consumed directly by an MDS algorithm. Figure 3.5 is actually the result of MDS on a subset of the directors aspect and illustrates the output of this step.

It has sometimes been suggested to use Self-Organizing Maps (SOM, [16]) to generate a lower dimensional representation. We found that, for this particular case, SOM is prohibitively inefficient.

By the end of this step, we have a collection of points in low dimensional space. Those points can be presented to the user in a number of different ways. Our approach is called the GridMap visualization, and it is explained in section 3.4.2.
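To make the MDS step concrete, here is a small self-contained Python sketch of classical (Torgerson) MDS, which double-centers the squared distances and extracts the leading eigenvectors by power iteration. The text does not say which MDS variant the prototype runs in Matlab, so the choice of classical MDS, the function name and the iteration count are all assumptions for illustration.

```python
import math


def classical_mds(distances, dims=2, iterations=200):
    """Embed items in `dims` dimensions from a symmetric distance matrix."""
    n = len(distances)
    sq = [[d * d for d in row] for row in distances]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J, the Gram matrix of the centered points.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for axis in range(dims):
        v = [1.0 + i for i in range(n)]  # arbitrary non-constant start vector
        for _ in range(iterations):      # power iteration toward the dominant eigenvector
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(x * y for x, y in zip(v, bv))  # Rayleigh quotient
        if eigenvalue > 1e-12:  # non-positive eigenvalues contribute no coordinate
            scale = math.sqrt(eigenvalue)
            for i in range(n):
                coords[i][axis] = scale * v[i]
        # Deflate so that the next pass converges to the next eigenvector.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

For distances that come from points in a plane, the returned 2D coordinates reproduce the input distances up to rotation and reflection. For cosine-based dissimilarities the embedding is only an approximation, which is the usual situation for an aspect map.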
  • 29. 3.3 Server Technology

From the beginning of the research, we wanted the system to be highly interactive and responsive. In order to achieve this, we need a scalable system with high performance. For this, we determined the following requirements:

• the data has to be sent very efficiently, potentially about 5-10 kilobytes of text data on each user request;
• the server must be able to notify the user of events happening on the server;
• the interaction requires real-time communication; for instance, when the user clicks on something, the system has to process the request in less than a second (or else people simply won’t use it).

Given the above requirements, the system should be based on an event-driven architecture (EDA) with compression and security. For completeness, here is the list of the most distinctive features of the server (some readers may find it a bit technical):

• Monolithic server, running on one machine, but potentially scalable to a cluster of machines.
• Manages the thread pool and distributes the work to each thread. It tries to match the number of threads to the number of cores (e.g., 4 threads on a quad-core machine) and distributes smaller tasks to those threads.
• Big tasks are represented in the form of software timers, which are sliced in order to achieve scalability.
• The server manages a socket pool, listening on several endpoints. It works with both IPv4 and IPv6.
• Written in C#, the server is compatible with 32-bit and 64-bit platforms. It is also CLI-compliant and works on cross-platform frameworks like Mono (which runs on Unix, Linux...).
• Handles the client-socket lifetime, in order to achieve stability and error tolerance.
• Integrates a Matlab interoperability layer, allowing the C# code to communicate with Matlab and then send the results to the Flash client via the network.
• Handles the data via an object-relational mapping (ORM) layer.
• A Publish-Subscribe model is used for the real-time notifications.
It allows clients to subscribe to an event of the server and be notified by the server when the event happens. This notification happens via a push-operation. • Custom message serialization/deserialization. • Per-packet compression/decompression. 28
  • 30. 3.3. SERVER TECHNOLOGY CHAPTER 3. THE SYSTEM • Through introspection, the server generates a networking libraries, compliant to a pro- tocol interface. Appendix A provides more information on this feature and illustrates some of the security and compression mechanisms. • Accounting, sessions mechanisms in order to keep track of users and their accounts and connections. • Access-Level security mechanism. All these features were actually implemented by ourselves, since at the time of our research not all of the technology was available to us. 29
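The publish-subscribe feature listed above can be sketched in a few lines. The following is a minimal in-process illustration in Python, not the C# server code: the class name `EventBus`, the event name and the callback-based push are assumptions made for the example, and a real server would push over sockets rather than call functions directly.

```python
from collections import defaultdict

class EventBus:
    """Minimal publish-subscribe hub (hypothetical name, illustration only)."""

    def __init__(self):
        # Maps an event name to the list of callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, event, callback):
        """Register `callback` to be called whenever `event` is published."""
        self._subscribers[event].append(callback)

    def unsubscribe(self, event, callback):
        """Remove a previously registered callback."""
        self._subscribers[event].remove(callback)

    def publish(self, event, payload):
        """Push `payload` to every subscriber of `event`; return how many were notified."""
        callbacks = list(self._subscribers.get(event, []))
        for callback in callbacks:
            callback(payload)
        return len(callbacks)
```

A client would call `subscribe("map_ready", handler)` once and then receive every later `publish("map_ready", ...)` without polling, which is the push behaviour the requirement on server notifications asks for.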
3.4 The Client Front-End

3.4.1 Overview

Figure 3.8: The prototype of the client front-end

The system we have described manipulates points in high-dimensional space. This is not going to change. What we add in this section is a way to present these points in a low-dimensional space so that the user can interact with the system through direct manipulation in real time. For our prototype, we developed a visualization system called GridMap. Section 3.4.2 explains how this system works and why.
3.4.2 GridMap

The reduction to two-dimensional space was already explained in the section on Aspect Maps (3.2.6); it is accomplished by multidimensional scaling. For the user, it is more important to know that a point is near another point than to know the exact distance. For example, as shown in figure 3.8, it is more important to know that Action is close to Adventure than to know the exact distance between them. GridMap therefore maps the points from the 2D space calculated by MDS to a grid, where the exact distances disappear but the spatial order is retained. Figure 3.8 presents 9 cells; the number of cells can be changed depending on the size of the screen (for example, during our experiments on a 24-inch screen the optimum GridMap size was 4 by 5, which made it easy to present more than a hundred movies without overloading the user with information).

Figure 3.9: This figure illustrates a mapping performed by the GridMap which removes the exact distances while leaving the order intact.

The interface makes it easy for the user to zoom and filter: the left panel (as shown in figure 3.8) allows the user to switch between the aspect maps and to filter and search on every facet. For example, a user who knows a particular actor can search in the actors pane and then zoom in and view similar actors. Additionally, this panel allows the user to customize the zooming criteria and tune the MultiMap parameters: number of movies per facet, number of facets to show on the map, etc.
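The text does not spell out how GridMap assigns points to cells, so the following Python sketch shows just one plausible rank-based scheme, under our own assumptions: points are first split into columns by their x order, then ordered by y within each column. The function name `grid_map` and the sample coordinates are hypothetical.

```python
def grid_map(points, rows, cols):
    """Assign each named 2D point to its own (row, col) cell, keeping spatial order.

    `points` maps a name to an (x, y) pair, for example the output of MDS.
    Exact distances are discarded; only the left-to-right and bottom-to-top
    ordering survives.
    """
    if len(points) > rows * cols:
        raise ValueError("more points than grid cells")
    by_x = sorted(points, key=lambda name: points[name][0])  # left to right
    per_col = -(-len(by_x) // cols)  # ceiling division: items per column
    cells = {}
    for col in range(cols):
        column = by_x[col * per_col:(col + 1) * per_col]
        column.sort(key=lambda name: points[name][1])  # bottom to top
        for row, name in enumerate(column):
            cells[name] = (row, col)
    return cells
```

With this scheme two facets that are close in the MDS plane tend to end up in neighbouring cells, while every facet still gets a cell of its own. Other assignments (for instance, nearest free cell) would satisfy the same description.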
Transitions

It often happens that a person viewing a scene fails to see large changes in the scene. This is called change blindness, a well-known psychological phenomenon [20]. It occurs when the change in the scene coincides with some visual disruption, such as a saccade (a rapid eye movement) or a brief obscuring of the scene. This situation often arises in web applications, where the web page briefly flashes after actions demanding a new server request. In this context, animated transitions help the user see the changes in the scene [13] [21].

The transitions turned out to be quite important, providing visual feedback so that the user knows what is going on. In the GridMap, there are two kinds of crucial transitions:

• The transition that animates the facet pane, keeping it visible during the zooming on this pane, then moving it to a new position. This greatly helps the user to keep track of the item he is zooming in on. It is needed because on each zoom the coordinate system changes, which can be quite confusing to the person using the interface.
• The transition shown in figure 3.10, which flips the grid cell, allowing the user to see the details of a particular movie within its context. This transition lets the user directly see the information about the element he is interested in while keeping everything in context. The user can always flip back and see other movie details. This is what people do in a video store, where they look at the available movies, pick one and flip it over to see its details on the back.

Figure 3.10: As in the video store, one can select the movie and look on the back of the box to see the details.
Cell Representation

The cell representation enables the flipping feature illustrated in figure 3.10. Based on user feedback, this feature proved to be very attractive and motivated users to experiment further with the interface. Additionally, it makes it possible to present movie details while keeping the other facets visible.

Figure 3.11 shows how a list of movies is presented on a grid cell, giving visual relevance feedback with a star. Golden stars mark the best-rated movies, which are probably the most interesting for the user to check out.

Figure 3.11: An actual list of movies presented on the front of the grid cell.

Figure 3.12 illustrates the content presented when a movie is flipped: the movie cover, the synopsis and two additional tabs.

Figure 3.12: Details of the movie, first tab. It presents the synopsis available to the user to read in order to learn about a particular movie.

The second tab, illustrated in figure 3.13, shows information related to the movie, linking directly to different facets. When the user clicks on a particular genre, for example, the system performs a zoom on that facet and constructs a new map. This allows back-and-forth navigation: from the big picture to the details of one movie, then on to another map, and then zooming in again on a particular movie.
Figure 3.13: Details of the movie, second tab. It presents the facet links to various information such as the year, rating and genres of the movie.

Figure 3.14: Details of the movie, third tab. It presents the facet links to the directors and actors.

The last tab, illustrated in figure 3.14, allows the user to view the people involved: the directors who made the movie and the actors starring in it. Here again, the system allows the user to zoom directly on one of those links, constructing a new map.
Chapter 4

Usability Aspects

The main purpose of the work was to build a responsive system for a particular Rich Internet Application in the area of exploratory search. Of course, such a system only makes sense if users can actually use it. So we carried out an admittedly limited and informal evaluation of its usability. To do so, I asked ten people, acquaintances and friends aged 20 to 30 and evenly split by gender, to participate in a survey about my thesis work. They were told what exploratory search is in general, without reference to the movie database. Next, they were asked to work with the system for about half an hour and find movies to their liking. After working with the system they were asked to fill out a questionnaire with 25 questions. The questions are shown below and concern Usefulness, Ease of Use, Ease of Learning, and Satisfaction with the system. The questionnaires were constructed as seven-point Likert rating scales. Users were asked to rate their agreement with the statements, ranging from strongly disagree to strongly agree [17]. The globally averaged results of the questionnaire, per feature, are as follows:

Average results of USE questionnaire
Average Usefulness: 5.3/7
Average Ease of Use: 5.6/7
Average Ease of Learning: 6.4/7
Average Satisfaction: 5.9/7

The users were very satisfied with the system, and a few of them also pointed out that the interface was very beautiful and user-friendly. On the other hand, some of them thought that the interface did not give them enough control to know exactly what happens underneath. For completeness, here are the tables with the averaged results:
Average results, Usefulness questionnaire
It is useful: 6.3/7
It gives me more control over the activities in my life: 3.8/7
It makes the things I want to accomplish easier to get done: 5.3/7
It meets my needs: 5.7/7
It does everything I would expect it to do: 5.3/7

Average results, Ease of Use questionnaire
It is easy to use: 5.8/7
It is user friendly: 6.7/7
It requires the fewest steps possible to accomplish what I want to do with it: 5.7/7
Using it is effortless: 5.2/7
I can use it without written instructions: 4.5/7
I don't notice any inconsistencies as I use it: 5.0/7
Both occasional and regular users would like it: 6.2/7
I can recover from mistakes quickly and easily: 5.8/7
I can use it successfully every time: 5.5/7

Average results, Ease of Learning questionnaire
I learned to use it quickly: 6.5/7
I easily remember how to use it: 6.5/7
It is easy to learn to use it: 6.2/7
I quickly became skillful with it: 6.3/7

Average results, Satisfaction questionnaire
I am satisfied with it: 6.0/7
I would recommend it to a friend: 6.3/7
It is fun to use: 6.7/7
It works the way I want it to work: 6.2/7
It is wonderful: 4.8/7
I feel I need to have it: 4.8/7
It is pleasant to use: 6.2/7

These are preliminary results; a more formal evaluation is beyond the scope of this thesis.
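As a quick consistency check, the four category averages reported at the start of this chapter can be recomputed from the per-statement tables. The short Python sketch below assumes that each category average is the plain mean of its per-statement averages, rounded to one decimal. That is our assumption, since the thesis does not state how the figures were aggregated.

```python
from statistics import mean

# Per-statement averages, copied from the four tables above.
ITEM_SCORES = {
    "Usefulness": [6.3, 3.8, 5.3, 5.7, 5.3],
    "Ease of Use": [5.8, 6.7, 5.7, 5.2, 4.5, 5.0, 6.2, 5.8, 5.5],
    "Ease of Learning": [6.5, 6.5, 6.2, 6.3],
    "Satisfaction": [6.0, 6.3, 6.7, 6.2, 4.8, 4.8, 6.2],
}

def category_averages(item_scores):
    """Return the mean score per category, rounded to one decimal place."""
    return {name: round(mean(scores), 1) for name, scores in item_scores.items()}
```

Under this assumption the recomputed values agree with the reported 5.3, 5.6, 6.4 and 5.9 out of 7.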
Chapter 5

Conclusions

This thesis described a form of exploratory search where responsiveness is of the essence. The application we called 'MultiMap' can be categorized under the heading of so-called Rich Internet Applications, a class of applications that is becoming more and more important as databases become larger, more specialized, and more distributed. Because of this, users more and more often find themselves in a situation where they know there must be information available to answer their questions, but lack the means to formulate a precise query. The resources they need to answer such a query may be available on remote servers; hence, to let users quickly explore possible answers, the servers must be made responsive enough, or else the user will quickly give up. MultiMap was built with such users in mind. Every design decision in this thesis was made under the constraint of responsiveness. This led to the following requirements:

• The system should be responsive, scalable, and interactive.
• The system should support exploratory search.
• The system should provide real-time spatial visual feedback reflecting changes in the high-dimensional search space.

Exploratory search is the problem of finding information that we may not know how to formulate, but which we will recognize once we see it. There are three bottlenecks that could make our system unresponsive: (1) complex calculations, (2) slow zooming, and (3) ineffective visualization. We addressed these bottlenecks as follows:

1. Every computation that can be done in advance is done in advance, so that it cannot cause any delay.
2. Zooming and map generation are highly optimized and can be done in real time.
3. The visualization is presented to the user in a cognitively appropriate way.

We believe that such a system should be constructed in a modular fashion, and in this thesis we presented a way to do so. This modularity makes it possible, for example, to change the ranking or enhance the selection algorithms and to evaluate the performance of a new algorithm against the existing one. It also makes it possible to build various user interfaces on top of the search engine, and eventually audience-targeted user interfaces. During the research we discussed several possible front ends, including different 2-dimensional representations enhanced with colors, sounds and font sizes. 3-dimensional interfaces can also be built and are a very interesting direction to explore. We considered implementing a 3-dimensional sphere navigation in which zooming could create either a 2D map or a new 3D sphere, but we leave that for future work.

The third question (about usability) was answered by the evaluation and the feedback we got from users. The users were very satisfied with both MultiMap and GridMap, but also felt that they did not have enough control over the system. They quickly learned how to use the system and how to get movie suggestions. However, there was a need to explain and introduce the concept to them at first, as it is a different approach to information exploration.

After having designed and evaluated the system, we believe that the map generation technique presented in this thesis is an important direction to pursue and an effective way to perform exploratory search.
Bibliography

[1] International movie database, http://www.imdb.com, December 2009.
[2] RFC 2616: Hypertext transfer protocol – HTTP/1.1, http://tools.ietf.org/html/rfc2616, June 1999.
[3] HTTP, Wikipedia, http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol, June 2010.
[4] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17:734-749, 6 2005.
[5] G. Armitage. Quality of Service in IP Networks: Foundations for a Multi-Service Internet. Macmillan Technical Publishing, 4 2000.
[6] G. Armitage, M. Claypool, and P. Branch. Networking and Online Games: Understanding and Engineering Multiplayer Internet Games. John Wiley and Sons Ltd., 2006.
[7] R.M. Bell, J. Bennett, Y. Koren, and C. Volinsky. The million dollar programming prize. IEEE Spectrum, 5 2009.
[8] S. Caltagirone, M. Keys, B. Schlief, and M. J. Willshire. Architecture for a massively multiplayer online role playing game engine. Journal of Computing Sciences in Colleges, 18(2), 12 2002.
[9] P. Fraternali, G. Rossi, and F. Sánchez-Figueroa. Rich internet applications. IEEE Internet Computing, 14(3):9-12, May-June 2010.
[10] J. Gregory. Game Engine Architecture. A K Peters, 2009.
[11] M. A. Hearst. Next generation web search: Setting our sites. IEEE Data Engineering Bulletin, 23(3):38-48, 3 2000.
[12] M. A. Hearst. Design recommendations for hierarchical faceted search interfaces. SIGIR Workshop on Faceted Search, pages 26-30, August 2006.
[13] J. Heer and G. Robertson. Animated transitions in statistical data graphics. IEEE Transactions on Visualization and Computer Graphics, 6 2007.
[14] J. F. Kurose and K. W. Ross. Computer Networking: A Top-Down Approach. Pearson Education Inc., 2008.
[15] X. Lin. Map displays for information retrieval. Journal of the American Society for Information Science, 1 1997.
[16] X. Lin, D. Soergel, and G. Marchionini. A self-organizing semantic map for information retrieval. Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 262-269, 1991.
[17] A.M. Lund. Measuring usability with the USE questionnaire. STC Usability SIG Newsletter, 8:2, 8 2001.
[18] J. Makar. ActionScript for Multiplayer Games and Virtual Worlds. New Riders, 2010.
[19] G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49, 4 2006.
[20] J. O'Regan, R. Rensink, and J. Clark. To see or not to see: The need for attention to perceive changes in scenes. Psychological Science, 8 1997.
[21] G. M. Sacco and Y. Tzitzikas. Dynamic Taxonomies and Faceted Search: Theory, Practice, and Experience. Springer Science and Business Media Inc., 2009.
[22] J. Smed and H. Hakonen. Algorithms and Networking for Computer Games. John Wiley and Sons Ltd., 2006.
[23] M. Steyvers. Multidimensional Scaling. In: Encyclopedia of Cognitive Science. Macmillan Reference Ltd., 2002.
[24] D. Svanaes. Understanding Interactivity: Steps to a Phenomenology of Human-Computer Interaction. PhD Thesis. NTNU, Trondheim, Norway, 2000.
[25] A. G. Taylor. Introduction to Cataloging and Classification. 8th ed. Englewood, Colorado: Libraries Unlimited, 1992.
[26] B.C. Vickery. Faceted Classification: A Guide to Construction and Use of Special Schemes. London: Aslib, 1960.
[27] R.W. White, B. Kules, S.M. Drucker, and M.C. Schraefel. Supporting exploratory search. Communications of the ACM, 49, 4 2006.
Appendix A

Protocol Generation DSL

Since I had to do all the programming for the research project myself, the workload was quite demanding. To avoid writing an individual implementation for each networking method or protocol, I implemented a protocol generation mechanism. To explain how it works, consider the following C# code:

Listing A.1: A partial definition of the MultiMap protocol

    using System.Collections.Generic; // needed for List<int>

    [Protocol]
    public interface IMultiMapProtocol
    {
        // Gets all aspects in the system
        [ProtocolOperation(100, Direction.Pull, CompressionTarget.Outgoing)]
        Aspect[] GetAllAspects();

        // Zooms to a particular selection
        [ProtocolOperation(106, Direction.Pull, CompressionTarget.Incoming)]
        void Zoom(Aspect Aspect, List<int> Facets);

        // Gets some additional information of a movie
        [ProtocolOperation(112, Direction.Pull, CompressionTarget.Outgoing,
            AccessLevel = AccessLevel.Root)]
        MovieDetails GetMovieDetails(int Oid);

        (...)
    }

Listing A.1 illustrates the code one needs to write in order to define a communication protocol. Such an approach can also be considered a domain-specific language (DSL). Once the protocol definition is written, the server analyses it and generates the code that makes all the communication possible. It generates an assembly for its own use and a Flash component library (.swc) for the Flash application, making it possible to simply call any method and hiding the complexity from the developer. Our research benefited greatly from this DSL, as several thousand lines of code could be generated, eliminating potential errors and boosting productivity.
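The idea of deriving networking code from an annotated interface can be illustrated with a small Python analogue. This is a sketch of the general technique (annotate operations, then discover them through introspection), not the C# generator described above: the decorator `protocol_operation`, the class `MultiMapService`, the function `build_dispatch_table` and the sample return values are all hypothetical. Only the operation codes 100 and 112 are taken from Listing A.1.

```python
import inspect

def protocol_operation(code):
    """Mark a method as a protocol operation with a numeric code (hypothetical decorator)."""
    def mark(func):
        func.operation_code = code
        return func
    return mark

class MultiMapService:
    """Toy service; the return values are placeholders, not real MultiMap data."""

    @protocol_operation(100)
    def get_all_aspects(self):
        return ["genres", "directors", "actors"]

    @protocol_operation(112)
    def get_movie_details(self, oid):
        return {"oid": oid}

    def helper(self):
        """Not annotated, so it is not exposed over the protocol."""
        return None

def build_dispatch_table(service):
    """Use introspection to map each operation code to its bound method."""
    table = {}
    for _, method in inspect.getmembers(service, predicate=inspect.ismethod):
        code = getattr(method, "operation_code", None)
        if code is not None:
            table[code] = method
    return table
```

A server loop would then read an operation code from an incoming packet and call `table[code](*arguments)`, so adding an operation only requires annotating one more method.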
Using the protocol definition it is also possible to define the compression direction (None, Incoming, Outgoing or Both), which generates the corresponding function calls for packet compilation and reading. It is also possible to define the security level per operation, using the AccessLevel parameter (shown in Listing A.1).
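A per-operation compression direction can be sketched as follows, again in Python rather than the generated C# code. We assume a server-side point of view (Outgoing means responses are compressed, Incoming means requests arrive compressed) and use zlib as a stand-in, since the thesis does not name the compression algorithm. The names `OPERATIONS`, `encode_response` and `decode_request` are hypothetical; the operation codes and their targets come from Listing A.1.

```python
import zlib
from enum import Flag

class CompressionTarget(Flag):
    """Which direction of an operation's traffic is compressed."""
    NONE = 0
    INCOMING = 1
    OUTGOING = 2
    BOTH = 3

# Operation code -> compression target, mirroring Listing A.1.
OPERATIONS = {
    100: CompressionTarget.OUTGOING,  # GetAllAspects
    106: CompressionTarget.INCOMING,  # Zoom
    112: CompressionTarget.OUTGOING,  # GetMovieDetails
}

def encode_response(op_code, payload):
    """Compress an outgoing payload only if the operation asks for it."""
    if CompressionTarget.OUTGOING in OPERATIONS[op_code]:
        return zlib.compress(payload)
    return payload

def decode_request(op_code, data):
    """Decompress an incoming packet only if the operation declares it compressed."""
    if CompressionTarget.INCOMING in OPERATIONS[op_code]:
        return zlib.decompress(data)
    return data
```

Declaring the direction per operation avoids spending CPU time on small packets, such as a single movie identifier, while still shrinking large ones, such as a full list of aspects.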