Master Thesis: The Design of a Rich Internet Application for Exploratory Search by Real-Time Generation of Similarity Maps

Roman Atachiants
Master of Science Thesis DKE 10-5
Thesis submitted in partial fulfillment of the requirements for
the degree of Master of Science in Artificial
Intelligence at the Department of Knowledge Engineering of the
Maastricht University
Exam committee:
Dr. Eduard Hoenkamp (supervisor)
Dr. Ronald Westra
Maastricht University
Faculty of Humanities and Sciences
Department of Knowledge Engineering
Master of Science in Artificial Intelligence
June 28, 2010

Abstract
Users who cannot formulate a precise query, but know there must be a good answer somewhere,
often rely on exploratory search. This requires an interactive and responsive system, or else
the user will soon give up. As databases are becoming larger, more specialized, and more
distributed, this calls for a Rich Internet Application that is fast enough to keep pace with the
user's explorations. This thesis studies and implements a system, called MultiMap, which computes
similarity maps in real time. This entailed: (1) precomputing every data structure that does
not change after the initial query, (2) optimizing algorithms for zooming and map generation,
and (3) providing a cognitively appropriate visualization of high-dimensional space. Applied
to a very large movie database, it resulted in a highly responsive, satisfying, and usable system.

Acknowledgments
Many people helped me in different ways throughout the research project and contributed different
insights and opinions. I want to thank my fellow students, professors, friends and family who
helped, tested the prototype and supported and endured me during the research.
In particular, I would like to thank Dr. Eduard Hoenkamp for his support and supervision
of the project. Our regular meetings, discussions and brainstorming sessions helped me a lot, from the
very beginning and the theoretical part of the research down to the implementation, engineering
and design. Beyond the professional relationship, I enjoyed his company most of all, and our
discussions about various domains, including education, technology, politics and travel, are
truly memorable to me.
Next, I would like to thank a fellow A.I. student, Tom Marechal. He was an invaluable
asset and friend, as he provided me with inspiration and ideas throughout the research project.
Additionally, I would like to thank Dr. Johannes C. Scholtes and Dr. Ronald Westra for
their support, evaluation and critical thinking. They not only largely inspired this project
during their classes, but also gave various invaluable insights that contributed to
making this thesis better.
I would also like to thank everyone who participated in the testing and evaluation of
the system; without their time and feedback the project would not be what it is today.

Contents
1 Introduction
1.1 Exploratory Search
1.2 Faceted Classification
1.3 Interactivity & Responsiveness
2 The Concept
2.1 The Idea
2.2 The Prototype
3 The System
3.1 Architectural Overview
3.2 Mathematical Concepts & Algorithms
3.2.1 Overview
3.2.2 Preprocessing & Correlations
3.2.3 Ranking
3.2.4 Facets Selection
3.2.5 Movies Selection
3.2.6 Creation of Aspect Maps
3.3 Server Technology
3.4 The Client Front-End
3.4.1 Overview
3.4.2 GridMap
4 Usability Aspects
5 Conclusions
A Protocol Generation DSL

Chapter 1
Introduction
Search and data visualization are becoming more and more important as we enter the
Petabyte Age. Traditional approaches to searching large datasets are query-based, which
implies that the user (researcher) knows what he is looking for. However, this approach
is difficult to apply when one is not familiar with the domain, or lacks
the knowledge or contextual awareness needed to formulate precise queries for navigating the
information space. For example, how do we find something we would like to know more about
without having the specific knowledge to formulate a precise question? How would we
find a movie we might enjoy if we never saw Robert DeNiro or Charlie Chaplin? Or, knowing
that we enjoy Quentin Tarantino’s movies, how would we discover other, relatively similar
movies? In order to find those movies, we perform a search process called exploratory search.
Exploratory search is a specialization of information retrieval which represents the activities
carried out by searchers who are:
• unfamiliar with the domain of their goal (i.e. need to learn about the topic in order to
understand how to achieve their goal)
• or unsure about the ways to achieve their goals (either the technology or the process)
• or even unsure about their goals in the first place.
In this research, we try to address this exploratory search problem [27] by introducing a
novel interactive search system. This system is called MultiMap and relies on similarity
measurements in order to present the latent information relations to the user in a geographic
manner. The system has been developed and tested using the Netflix dataset [7], containing
about 125.000 movies. A custom selection was performed on the dataset:
• The genres were filtered to 28 IMDB genres.
• The directors were filtered to those who made at least 5 movies (in total around 2500
directors).
• The actors were filtered to those who participated in at least 10 movies
(in total around 6000 actors).
• The movies were filtered to those containing all needed information and made by the
preselected directors and actors. The final database contained around 16000 movies.
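As an illustration only, the selection described above can be sketched in a few lines of Python. The record layout (`title`, `director`, `actors`), the function name `filter_catalogue` and the toy data are assumptions made for this sketch and do not describe the actual tool. In particular, the sketch assumes that a movie is kept when its director and at least one of its actors are preselected, which the text does not specify.

```python
from collections import Counter

def filter_catalogue(movies, min_director_movies=5, min_actor_movies=10):
    """Keep directors/actors with enough movies, then the movies that use them."""
    director_counts = Counter(m["director"] for m in movies)
    actor_counts = Counter(a for m in movies for a in m["actors"])
    directors = {d for d, n in director_counts.items() if n >= min_director_movies}
    actors = {a for a, n in actor_counts.items() if n >= min_actor_movies}
    # Assumption: one preselected actor is enough to keep a movie.
    kept = [
        m for m in movies
        if m["director"] in directors and any(a in actors for a in m["actors"])
    ]
    return directors, actors, kept

# Hypothetical toy data, with lowered thresholds so the effect is visible.
toy = [
    {"title": "M1", "director": "D1", "actors": ["A1", "A2"]},
    {"title": "M2", "director": "D1", "actors": ["A1"]},
    {"title": "M3", "director": "D2", "actors": ["A1"]},
    {"title": "M4", "director": "D1", "actors": ["A3"]},
]
directors, actors, kept = filter_catalogue(toy, min_director_movies=2, min_actor_movies=2)
kept_titles = [m["title"] for m in kept]
```

With the default thresholds of 5 and 10, the same function mirrors the cut-offs quoted in the list above.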
1.1 Exploratory Search
During the first phase of the research we considered the exploratory search problem [11] [19],
trying to answer the following questions:
1. How to help a user who is unfamiliar with the domain (i.e., a user who has seen only a
few movies and/or doesn’t know many directors or actors)?
2. How to help a user who doesn’t know how to find a particular movie?
3. How to help a user who doesn’t know what kind of movies he likes?
Figure 1.1: This figure represents the abstracted backward reasoning that was applied
in order to answer the exploratory search questions. In the figure, green represents the
interesting directions, red represents an unwanted direction, and blue represents intermediate
steps.
Figure 1.1 shows the result of the backward reasoning we performed about
those 3 questions. The goal of the research was to find a system that can answer those
questions without much guessing, mostly because we want the user to explore and learn about
the domain. From this analysis phase we derived several requirements for
the system:
• An extracted meaning of the data is required: the system should know about the domain,
in our particular case the cinematographic domain.
• A way to preserve relations is needed, in order to help the user relate different items.
• A way to drill down to individual movies and examine them is needed, in order to allow
the user to navigate.
• Relevance feedback is needed, in order to show the user how interesting a particular item
is and how relevant it is for his search. The idea behind relevance feedback is to take
the results that are initially returned from a given query and to use information about
whether or not those results are relevant to perform a new query.
The exploration in exploratory search means that a user has to be able to explore different
directions and, in a manner of speaking, swim in the data. The exploration factor is very
implicit and therefore difficult to evaluate. In contrast to standard search engines, where the
user composes a query and the engine returns the documents closest to that query (document),
we do not always select the closest points in our system, nor do we restrict the user to the
most relevant search results. By doing so, we allow the user to explore
different directions in this multi-dimensional space.
1.2 Faceted Classification
One of the approaches in the exploratory search research domain that has proven useful
and is used in many different visualization systems is called faceted classification [26] [12]. This
approach is widely used across the World Wide Web, especially on
commercial web sites (Amazon, eBay). Figure 1.2 illustrates the search box of the website
Amazon.com, where the fields Author, Title, ISBN, Publisher, Subject, Condition, etc. are
the facet categories. A faceted classification system allows multiple classifications to be assigned
to a particular object, often the object we want to search for, which is in our case a movie.
Using multiple classifications makes it possible to reorder the data in multiple ways and
to define search criteria.
Figure 1.2: The advanced search box on the Amazon.com website; the additional fields are
different aspects of a book.
A facet comprises “clearly defined, mutually exclusive, and collectively exhaustive aspects,
properties or characteristics of a class or specific subject” [25]. In this thesis, we use the word
“Aspect” to denote a facet category, and the word “Facet” for a particular facet, for example:
Aspect : Actors;
Facets : Robert DeNiro, Johnny Depp, Bruce Willis...
The Netflix contest dataset contained 17700 different movie titles and served as a basis for
the data in this research. Considering the need of extracting different facets for each of those
movies, a special tool was written to extract additional information from the Internet
Movie DataBase (IMDB) [1] website and the Netflix database via their exposed APIs. This tool
was able to extract about 95% of the information for those movies. In particular, we were
interested in:
• Genres of the movies (Fantasy, Science-Fiction, Crime, Drama...)
• Year of release
• IMDB ratings, which are precise ratings from 1 to 10, rounded to the first decimal
• Directors of the movies (Steven Spielberg, Quentin Tarantino...)
• Actors of the movies (Robert DeNiro, Johnny Depp, Bruce Willis...)
Additionally, there was some other data about the movies (writers, movie plots, ...),
but it was not as abundant as the five aspects presented above. Therefore, we decided to base the
system on the above aspects alone.
1.3 Interactivity & Responsiveness
Exploratory search is a process performed by a human who is using a tool (a computer) to
interact with large quantities of information in order to explore and find the relevant pieces
of information. This human-computer interaction means by definition that the process is
interactive; interactivity is therefore a very important aspect of exploratory
search.
One way to approach interactivity is to start with the notion of “look and feel”. The term has
become more or less synonymous with how the term style is used in other design disciplines.
In a concrete sense, the “look” of a GUI is its visual appearance, while the “feel” denotes
its interactive aspects [24]. One of the consequences is that the interface should be very
responsive and fast. One must also consider the fact that search systems need to handle large
amounts of data and need a lot of computing power. One logical conclusion is that, in order to
build a good exploratory search system, the data manipulation should be handled by powerful
machines to be fast. During our research, we opted for a client-server approach to enhance
the interactivity without losing the computing power we need to perform all operations in
real time, keeping the system responsive and interactive. By performing all operations in
real time, we run into the problem of massive network communication.
The communication in this case is a two-way dialog between the client and the server. We
need the communication to be duplex, where both the server and the client have the ability to
initiate the dialog, because the current World Wide Web is becoming real-time (huge services
such as Twitter and Facebook are good examples). Although the information flow is updated in
real time, most of the services are still using traditional HTTP-based technologies.
The Hypertext Transfer Protocol (HTTP) is an Application Layer protocol for distributed,
collaborative, hypermedia information systems (the RFC specifications can be found in [2]). HTTP
is a request-response protocol standard for client-server computing. In HTTP, a web browser,
for example, acts as a client, while an application running on a computer hosting the web
site acts as a server. The client submits HTTP requests to the responding server by sending
messages to it. The server, which stores content (or resources) such as HTML files and images,
or generates such content on the fly, sends messages back to the client in response. These
returned messages may contain the content requested by the client or may contain other kinds
of response indications [3].
The problem with using HTTP for the interactive, real-time web is a fundamental one. As the
World Wide Web evolved, different architectures and new frameworks (SaaS, SOAP, AJAX ...)
were built on top of the HTTP protocol, but fundamentally the real-time communication
is mainly done using the polling technique (see figure 1.3). Polling is a workaround:
the client constantly asks the server for updates at a very short interval. There
are several problems with this approach:
1. The client’s and server’s CPU resources are used all the time for mostly useless update
checking. This, on mobile devices, potentially drains the battery life.
2. The networking bandwidth is used constantly, and as the networking throughput of the
server is limited, this becomes a bottleneck very quickly.
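A back-of-the-envelope sketch makes the cost of polling concrete. The figures used here (one client, a polling interval of 0.5 s, three real updates per minute) are assumptions chosen for illustration, not measurements, and the function names are hypothetical.

```python
def polling_messages(duration_s, interval_s):
    """Messages exchanged by one polling client: one request and one response per poll."""
    polls = int(duration_s / interval_s)
    return 2 * polls

def push_messages(updates):
    """Messages sent when the server pushes each update once (publisher/subscriber)."""
    return updates

# Hypothetical scenario: one minute of activity with three real updates.
polled = polling_messages(duration_s=60, interval_s=0.5)
pushed = push_messages(updates=3)
print(f"polling: {polled} messages, push: {pushed} messages")
```

Under these assumed numbers almost all polling traffic carries no new information, which is the bottleneck described in the two points above.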
Figure 1.3: This figure shows the communication principles for real-time updates of the polling
architecture and a publisher/subscriber architecture.
In order to find how to design a system responsive enough for such communication, consider
the requirements:
1. A client-server approach, since the amount of data is large and the computations
can be very expensive.
2. Reliable networking is necessary (as we are not considering a streaming application and
need a reliable two-way communication), therefore the choice for the transport layer is
TCP [14].
3. A format for message parsing in order to encode/decode complex messages while having
minimal impact on performance.
Since those requirements are quite similar to the requirements for multi-player client/server
on-line games, we considered that the best place to find the technological answer for an
interactive search system would be the gaming literature [10] [18] [22]. Games are by
definition interactive applications, and on-line games are usually heavily optimized for
latency and throughput. Because interactivity requires a lot of duplex
communication, the best option is a socket server [18] with a custom protocol for low-level
message encoding.
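A minimal sketch of such a low-level message encoding is given below, assuming a simple length-prefixed binary frame sent over a TCP stream. The header layout (a 16-bit message type followed by a 32-bit payload length) and the message type numbers are hypothetical and do not describe the actual MultiMap protocol.

```python
import struct

# Assumed header: message type (uint16) and payload length (uint32), network byte order.
HEADER = struct.Struct("!HI")

def encode_message(msg_type, payload):
    """Prefix the payload with its type and length so the receiver can split the stream."""
    return HEADER.pack(msg_type, len(payload)) + payload

def decode_messages(buffer):
    """Extract all complete messages; return them with the unconsumed bytes."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # the rest of this message has not arrived yet
        messages.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]

# Two hypothetical messages; the last byte of the second one is still in flight.
stream = encode_message(1, b"zoom") + encode_message(2, b"map")
complete, pending = decode_messages(stream[:-1])
```

Because TCP delivers a byte stream rather than discrete messages, the explicit length field is what lets the receiver find message boundaries with very little parsing overhead.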
Following those considerations, an interactive exploratory search system can be designed as
a multiuser on-line game engine. The architecture should fulfill six goals: minimize network
traffic, provide opportunities for load balancing, provide a secure game playing environment,
provide a high level of scalability and maintainability, and maximize client side performance
for real-time graphics [8].
The architecture for the system is layered and component-based:
• The Network Component that contains the Packet Serializer (Messenger), De/Encrypt,
De/Compress and Network modules. The Messenger module is in charge of forming
and sending messages in a given format.
• The User Component that contains both the Authenticator and the User Database
modules.
• The Search Component that is designed specifically for exploratory search
purposes, with a custom protocol. For the system designed for this thesis, the search
component is described in more detail in section 3.2.
As mentioned earlier, the latency is a crucial point for highly interactive applications. Latency
refers to the time it takes for a packet of data to be transported from its source to its
destination. In many networking texts, you will also see the term Round Trip Time (RTT)
in reference to the latency of a round trip from source to destination and then back to source
again. In many cases the RTT is twice the latency, but some network paths exhibit asymmetric
latencies, with higher latencies in one direction than the other [6]. There are different ways
to deal with latency, but simply put: we need more control over the sent and received packets,
we need to minimize their size, and we need to be able to prioritize and parallelize different
actions [5].

Chapter 2
The Concept
2.1 The Idea
In chapter 1 we considered the implications of the exploratory search problem and its basic
components, such as faceted classification and interactivity. This thesis introduces a novel
exploratory search interface, called MultiMap, which relies on similarity measurements in order
to present the information to the user. In the early 1990s it was demonstrated that spatial
mapping techniques can be used to visualize the contents and semantic relationships of a
document space [15], yet there are still not many systems that actually use mapping techniques.
The idea behind the system comes from a simple map, where the information is presented in a
geographic manner: two towns that are close on a map imply a short transition from one
to the other. Using a map, it is possible to navigate and explore a huge amount of information
by zooming in and out, exploring the dataset both locally and globally.
Figure 2.1: A world map with country divisions.
If we can do it for our planet Earth using mapping software (Google Maps and Bing Maps are
examples of such software), why couldn’t we explore different datasets in the same way?
What if we could zoom in on both New York and Tokyo and generate a new world map, having
Washington, New York, Tokyo, Kyoto and Paris in between (use figure 2.1 to help
imagine this)? It can be rather messy to view them in this way, which is why we also need to
introduce the context: Washington and New York are in the United States of America, Tokyo and
Kyoto are in Japan and Paris is in France. The countries clearly separate the
cities and help us to understand them better. Now replace the towns by Movies and the
countries by Genres/Actors/Directors, and this gives a basic understanding of how MultiMap
works.
MultiMap is based on this idea of zooming and on-the-fly generation of new maps. Formally,
it involves choosing a new coordinate system. MultiMap also features the ability to zoom out to
see the whole picture again and to switch maps if needed (again, think of Google Maps). In
order to understand better how MultiMap works, let’s go back to the movie context and
think of different aspects, facets and movies:
• An aspect “Genres” contains facets “Action”, “Adventure”, etc.
• The facets “Action”, “Adventure” can relate to movies like “Indiana Jones” etc.
• The movie “Indiana Jones” contains the actor “Harrison Ford” (which is also a facet of
aspect “Actors”)
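The relations in the list above can be sketched as a small lookup structure. This is an illustration only: the dictionary layout and the function names are assumptions, “Indiana Jones” follows the example in the text, and “Movie X” with its facets is a hypothetical second entry.

```python
# Toy catalogue: movie -> aspect -> set of facets.
catalogue = {
    "Indiana Jones": {"Genres": {"Action", "Adventure"}, "Actors": {"Harrison Ford"}},
    "Movie X": {"Genres": {"Drama"}, "Actors": {"Harrison Ford"}},  # hypothetical
}

def facets_of(movie, aspect):
    """Facets of one aspect attached to a movie."""
    return sorted(catalogue[movie][aspect])

def movies_for(aspect, facet):
    """Movies that carry the given facet of the given aspect."""
    return sorted(m for m, aspects in catalogue.items() if facet in aspects[aspect])

# Genres -> movie -> Actors -> movies: start from a genre, pick a movie,
# switch to its actors and continue from there.
step1 = movies_for("Genres", "Adventure")
step2 = facets_of(step1[0], "Actors")
step3 = movies_for("Actors", step2[0])
```

Each step returns entry points for the next one, so the user can keep moving between aspects without ever formulating a query.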
One can notice that this is a closed loop: it is possible to look at different genres, then look at
a particular movie, then switch to actors and go on exploring the information this way. If
we imagine for a second that we can create a map of an aspect, where the points (“countries”)
would be the facets, we should probably also be able to place the movies (“towns”) on that
map. In order to create such maps, we need several components:
• A function to compare two facets of an aspect, i.e. a distance measurement. For example,
this way we would be able to compare the similarity between the Adventure genre and
the Action genre or between Tom Hanks and Harrison Ford.
• A way to create a map very quickly, as a new map should be generated when the user
zooms in on some movie.
• A way to measure the relevancy of the movies and facets. Considering our example above,
which towns would we choose to present on a new map if we zoomed in on New York and
Tokyo? Paris, London, Rome?
Further in this document, chapter 3 explains how the whole system is built; in particular,
section 3.2 explains all concepts and algorithms that were developed in order to produce
a working prototype of MultiMap.
2.2 The Prototype
The MultiMap concept can be divided into two main parts:
• The system that performs all mathematical computations and handles the data and the
operations on the data.
• The front-end that is presented to the user; after all, there are many different ways to
present a map. Figure 2.2 shows the front-end that we designed as our first approach
to creating a visualization for the MultiMap system.
Figure 2.2: A screen-shot of the prototype, presenting a grid map on the directors aspect.
The front-end visualization we designed for MultiMap is called GridMap, and is one of
the possible approaches to visualizing those maps. This approach relies on a very ordered
presentation of the maps. In fact, it tries to map a cloud of 2D points to a grid while trying to
preserve the spatial relations. The interface allows users to switch the aspect maps, zoom in on
different facets and, by flipping a grid cell, view the details of a particular movie and follow its
links to construct new maps. Section 3.4 explains the actual interface and its different
components in more detail.

Chapter 3
The System
3.1 Architectural Overview
The system was designed to be a client-server application with several tiers; in this section
we describe its design. The main idea is based on the interactivity between the user and
the data, and on ease of use. First of all, the system should meet several prerequisites:
• it should be interactive, so it has a real-time constraint;
• it should be able to handle large datasets;
• it should be easy to use and available to remote users.
Figure 3.1: The layered architecture of MultiMap system.
Following those prerequisites, the logical conclusion is to build a real-time Rich Internet
Application (RIA) [9]. Such applications are mainly standard n-tier applications. The MultiMap
architecture is a 3-tier real-time architecture, allowing the front-end client to have full
interactivity with the data. The main idea behind such a system is to have a clear separation
between the client, the logic and the data itself, as illustrated in Fig. 3.2. The actual
architecture, as described in Fig. 3.1, consists of:
• a front-end client in Flash, allowing interactive data visualization;
• a custom C# real-time server, written by the author in order to handle large amounts of
data interactively;
• a logic layer running the Matlab engine for all data-intensive search, correlations and
other operations.
Figure 3.2: Visual overview of a Three-tiered application. Illustration from Wikipedia.
3.2 Mathematical Concepts & Algorithms
3.2.1 Overview
Figure 3.3: The representation of the data-flow, representing how the data is processed on
the fly (in an interactive mode).
The main purpose of the research is the interactivity of the system. This imposes a real-time
constraint and makes things very difficult to engineer, especially when the computations
can take a long time. Based on this, we needed a system that can handle this data-flow
rapidly and respond quickly to user queries. Figure 3.3 shows the simplified sequence
diagram of the system when the information needs to be updated and presented. The next few
sections explain the details of this schema, block by block.
The system uses a content-based recommendation method. In content-based recommendation
methods, the utility u(c, s) of item s for user c is estimated based on the utilities u(c, s_i)
assigned by user c to items s_i ∈ S that are similar to item s. For example, in a movie
recommendation application, in order to recommend movies to user c, the content-based
recommender system tries to understand the commonalities among the movies user c has
rated highly in the past (specific actors, directors, genres, subject matter, etc.). Then, only
the movies that have a high degree of similarity to the user's preferences are
recommended [4].
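The estimation of u(c, s) from the utilities of similar items can be sketched as follows. This is one possible reading, not the method used by MultiMap: the Jaccard similarity on sets of content features and the similarity-weighted average are assumptions of this sketch, as are the function names and the toy ratings.

```python
def jaccard(a, b):
    """Similarity of two feature sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def estimate_utility(candidate_features, rated_items):
    """Estimate u(c, s) as the similarity-weighted mean of the known u(c, s_i).

    rated_items is a list of (features_of_s_i, u(c, s_i)) pairs for one user c.
    """
    weighted = [(jaccard(candidate_features, f), u) for f, u in rated_items]
    total_weight = sum(w for w, _ in weighted)
    if total_weight == 0:
        return 0.0  # nothing similar has been rated yet
    return sum(w * u for w, u in weighted) / total_weight

# Hypothetical ratings of one user: (content features, utility on a 1-5 scale).
rated = [({"Crime", "Tarantino"}, 5.0), ({"Romance"}, 1.0)]
crime_estimate = estimate_utility({"Crime"}, rated)
mixed_estimate = estimate_utility({"Crime", "Romance"}, rated)
```

A candidate that shares features only with highly rated items inherits a high estimate, while a candidate overlapping with both liked and disliked items lands in between.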
Overall, the flow consists of several main points:
• The preprocessing step performs the transformation and precomputes as much
information as possible. It considers all aspects and, for each facet in each
aspect, computes a network of closest facets (explained in section 3.2.2).
• The session initialization step initializes the user session and copies some of the
preprocessed data into a so-called Ranking Matrix.
• The update step performs the update of the Ranking Matrix (see 3.2.3 for more
information). By doing so, a new ranking matrix is created, basically updating the
ranks/relevancy ratings based on the selection.
• The facets selection step chooses several facets, based on the Ranking Matrix. To do
so, it combines 2 techniques: it takes a subset of the most relevant facets from the matrix,
then performs a k-means clustering to be able to pick the most “global” facets. This step
is explained in more detail in section 3.2.4.
• The movies selection step selects the most relevant movies for each facet that has been
chosen. This step is explained in more detail in section 3.2.5.
• The creation of aspect maps performs multidimensional scaling [23] and a custom
grid-map algorithm in order to create a 2-dimensional grid where the latent relations
between different facets are retained. This approach is explained in section 3.2.6. This
step can potentially be replaced by any other representation, including 3-dimensional
ones.
3.2.2 Preprocessing & Correlations
Overview
The system handles a lot of data and reorders it continually on each request of the user. In
order to allow the system to perform in real time, as much as possible should
be precomputed. Several things need to be done:
• For each aspect, the facets should be correlated in order to allow the comparison between
2 points. This is done differently for each aspect, depending on the data. It allows, for
example, the Adventure genre and the Science-Fiction genre to be correlated.
• For each aspect, the facet network is computed. This network allows us to propagate a
ranking and reorder the facets in real time. See section 3.2.2 for more details.
• For each facet of each aspect, a list of the most relevant movies is constructed and ordered.
This is done so that the movies can be picked in real time. This step is explained in more
detail in section 3.2.2.
In the precomputation phase, one of the most important results is the ability to construct
so-called “Aspect Spaces”. Aspect Spaces are N-dimensional dissimilarity matrices. The Aspect
Spaces are computed based on a particular distance metric ∆(i, j) := distance between the i-th
and j-th features of an aspect. In order to simplify the implementation, we define:
• The input matrix I is the initial data we need in order to compute similarities between aspect
samples. The samples are represented in an N-dimensional space, where N is the number of
movies, about 16000.
• Per aspect, a function δ, which can be different for every aspect and computes the
membership of a particular movie in a facet of the aspect.
The next few sections explain the definitions and the steps which are performed in order
to create each aspect space.
Genres Space
In order to create the genres space, the genres are correlated simply by using the complete
distribution of movies. The input matrix I for the genres space is defined as follows:
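\[
I_{i,j} =
\begin{pmatrix}
\delta(\mathrm{Genre}_1, \mathrm{Movie}_1) & \cdots & \delta(\mathrm{Genre}_1, \mathrm{Movie}_j) \\
\vdots & \ddots & \vdots \\
\delta(\mathrm{Genre}_i, \mathrm{Movie}_1) & \cdots & \delta(\mathrm{Genre}_i, \mathrm{Movie}_j)
\end{pmatrix}
\]
The membership function δ:
\[
\delta(\mathrm{Genre}_i, \mathrm{Movie}_j) =
\begin{cases}
1 & \text{if the movie contains the genre} \\
0 & \text{otherwise}
\end{cases}
\]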
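Finally, we define a distance function, which is a general cosine distance:
\[
\Delta(\mathrm{Genre}_i, \mathrm{Genre}_j) = \frac{I_i \cdot I_j}{\lVert I_i \rVert \, \lVert I_j \rVert}
\]
where I_i denotes the row of I that belongs to Genre_i.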
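The construction of the genres space can be sketched on a toy example. The movie names and genre assignments below are hypothetical, the function names are assumptions, and the sketch takes the distance to be one minus the cosine of the angle between two rows of the input matrix, which is one common convention and may differ from the exact definition used in the implementation.

```python
import math

# Hypothetical toy data: movie -> set of genres.
movies = {
    "m1": {"Action", "Adventure"},
    "m2": {"Adventure", "Fantasy"},
    "m3": {"Drama"},
}
genres = ["Action", "Adventure", "Fantasy", "Drama"]

def membership_matrix(genres, movies):
    """Rows are genres, columns are movies; an entry is 1 if the movie contains the genre."""
    return [[1 if g in movie_genres else 0 for movie_genres in movies.values()] for g in genres]

def cosine_similarity(u, v):
    """Dot product of two rows divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cosine_distance(u, v):
    # Assumption of this sketch: distance = 1 - cosine similarity.
    return 1.0 - cosine_similarity(u, v)

I = membership_matrix(genres, movies)
# Pairwise dissimilarity matrix between genres (the "genres space" of the toy data).
genres_space = [[cosine_distance(u, v) for v in I] for u in I]
```

Genres that never occur in the same movie end up at the maximum distance of 1, while genres that always occur together are at distance 0.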
In order to test how good the correlation is, one can use the aspect space as the input for
the multidimensional scaling function. This helps to visualize the correlations and see if the
desired meaning is preserved. Figure 3.4 shows the 2-dimensional genres space; we will call
such maps “Aspect Maps”. One can see that the correlation makes sense; for example, the
Adventure genre is close to Fantasy and Science-Fiction.
Figure 3.4: This figure shows the distances between genres in 2 dimensional space after
performing a multidimensional scaling on the genres space.
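How a dissimilarity matrix is turned into 2-dimensional coordinates can be sketched with classical (Torgerson) multidimensional scaling. Which MDS variant produced the figure is not stated here, so the choice of classical MDS is an assumption, as are the function names. The sketch uses plain power iteration to find the leading eigenvectors and assumes that the dominant eigenvalues of the centered matrix are positive, which holds for Euclidean distances such as those of the toy points used below.

```python
import math

def double_center(D):
    """B = -1/2 * J * D^2 * J, the centered matrix of squared distances."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(r) / n for r in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + total)
             for j in range(n)] for i in range(n)]

def top_eigenpair(B, iterations=500):
    """Dominant eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    n = len(B)
    v = [1.0 + i for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigval, v

def classical_mds(D, dims=2):
    """Coordinates whose pairwise Euclidean distances approximate D."""
    n = len(D)
    B = double_center(D)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        eigval, vec = top_eigenpair(B)
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflation: remove the component just found before searching for the next one.
        B = [[B[i][j] - eigval * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords

# Toy check: distances between the corners of a 3 x 4 rectangle.
points = [(0, 0), (3, 0), (0, 4), (3, 4)]
D = [[math.dist(p, q) for q in points] for p in points]
embedding = classical_mds(D, dims=2)
```

The recovered coordinates may be rotated or mirrored with respect to the original points; only the pairwise distances are preserved, which is all an aspect map needs.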
Ratings Space
Ratings space can be used in different ways, and depending on the choice of usage, the
correlation can be adapted:
• ratings can be used as an additional dimension, shown using a color or a font size while
showing a movie;
• ratings can be shown in order of Euclidean distance;
• ratings can be used to create a complete ratings aspect space, but this requires a more
complex correlation function.
In the research, we decided to use the second approach, simply calculating the pairwise
Euclidean distance between ratings.
Years, Directors and Actors Spaces
There are several ways to correlate the years, directors and actors. In our research, we
wanted to explore the possibility of correlating those facets based on their genre distribution.
This approach would allow the user, for example, to see what kind of movies were made in
a particular year and which years are similar in terms of genre distribution. To do so, we
proceed as follows:
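\[
A_{i,j} =
\begin{pmatrix}
\delta_1(\mathrm{Year}_1, \mathrm{Movie}_1) & \cdots & \delta_1(\mathrm{Year}_1, \mathrm{Movie}_j) \\
\vdots & \ddots & \vdots \\
\delta_1(\mathrm{Year}_i, \mathrm{Movie}_1) & \cdots & \delta_1(\mathrm{Year}_i, \mathrm{Movie}_j)
\end{pmatrix}
\]
The membership function δ1:
\[
\delta_1(\mathrm{Year}_i, \mathrm{Movie}_j) =
\begin{cases}
1 & \text{if the movie was released that year} \\
0 & \text{otherwise}
\end{cases}
\]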
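Next, we reuse the input matrix I from the genres space. This is defined as follows:
\[
B_{i,j} =
\begin{pmatrix}
\delta_2(\mathrm{Genre}_1, \mathrm{Movie}_1) & \cdots & \delta_2(\mathrm{Genre}_1, \mathrm{Movie}_j) \\
\vdots & \ddots & \vdots \\
\delta_2(\mathrm{Genre}_i, \mathrm{Movie}_1) & \cdots & \delta_2(\mathrm{Genre}_i, \mathrm{Movie}_j)
\end{pmatrix}
\]
The membership function δ2:
\[
\delta_2(\mathrm{Genre}_i, \mathrm{Movie}_j) =
\begin{cases}
1 & \text{if the movie contains the genre} \\
0 & \text{otherwise}
\end{cases}
\]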
Next, we need to compute the matrix I, which tells us how many movies of each
genre were released in each year. This is computed by a matrix multiplication of A and
B transposed:
\[
I_{i,j} =
\begin{pmatrix}
\delta(\mathrm{Year}_1, \mathrm{Genre}_1) & \cdots & \delta(\mathrm{Year}_1, \mathrm{Genre}_j) \\
\vdots & \ddots & \vdots \\
\delta(\mathrm{Year}_i, \mathrm{Genre}_1) & \cdots & \delta(\mathrm{Year}_i, \mathrm{Genre}_j)
\end{pmatrix}
= A \times B^{T}
\]
Finally, by computing the pairwise cosine distance between the rows of the matrix I, we are
able to correlate the years based on their genre distribution. The same procedure is applied
to correlate the directors and the actors. Figure 3.5 shows the aspect map created for the
directors. As with the genres, the results seem to make sense: Quentin Tarantino is quite close
to Martin Scorsese (both make very similar kinds of crime movies) and at the same time quite
far away from George Lucas, the creator of the Star Wars saga.
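The procedure above can be sketched on a toy data set as follows. The movies, years and genres
are hypothetical, and the helper names (membership_matrix, multiply_by_transpose,
cosine_distance) are assumptions introduced for illustration; the prototype itself delegates
such computations to Matlab.

```python
import math


def membership_matrix(row_labels, movies, belongs):
    """Binary matrix: an entry is 1 if belongs(label, movie) holds, else 0."""
    return [[1 if belongs(label, m) else 0 for m in movies] for label in row_labels]


def multiply_by_transpose(a, b):
    """Compute A x B^T for two matrices that share the movie columns."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical toy data, not taken from the movie database.
movies = [
    {"year": 1977, "genres": {"Sci-Fi", "Adventure"}},
    {"year": 1977, "genres": {"Adventure"}},
    {"year": 1994, "genres": {"Crime"}},
    {"year": 1980, "genres": {"Sci-Fi", "Adventure"}},
]
years = [1977, 1980, 1994]
genres = ["Adventure", "Crime", "Sci-Fi"]

A = membership_matrix(years, movies, lambda y, m: m["year"] == y)    # years x movies
B = membership_matrix(genres, movies, lambda g, m: g in m["genres"])  # genres x movies
input_matrix = multiply_by_transpose(A, B)                            # years x genres
year_distances = [[cosine_distance(r, s) for s in input_matrix] for r in input_matrix]
```

In this toy example, 1977 and 1980 end up close to each other (both are dominated by Adventure
and Sci-Fi), while 1994 is at the maximum distance from both.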

Figure 3.5: This figure shows the distances between directors in 2 dimensional space after
performing a multidimensional scaling on the directors space, similar to figure 3.4.
Facet Network
In order to perform the zooming and allow the system to be interactive, one needs a way
to select and sort the facets rapidly. In MultiMap, this is done by precomputing a facet
network (Fig. 3.6), and setting a particular rank value to each node in this kind of network.
Generally speaking, we need to compute the matrix R with facets on the rows and two (or
more) “pointers” to the closest points. The desired matrix R:
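\[
R_{i,3} =
\begin{pmatrix}
\mathrm{Facet}_1 & \text{1st closest facet} & \text{2nd closest facet} \\
\vdots & \vdots & \vdots \\
\mathrm{Facet}_i & \text{1st closest facet} & \text{2nd closest facet}
\end{pmatrix}
\]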
The closest points are computed from the inter-facet correlations described above. This
step can be very time-consuming, as it has a complexity of $O(n^2)$ in the number of facets.
Performed during the interaction, it would interrupt the smooth interaction with the user
and would therefore be prohibitive. Fortunately, this matrix can be precomputed before the
interaction starts. In general, anything that can be precomputed should be precomputed to
make the system responsive.
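A minimal sketch of this precomputation, assuming the inter-facet distances are available as
a square matrix, could look as follows. The function name build_facet_network and the distance
values are hypothetical; only the idea of storing the two closest facets per row comes from
the text.

```python
import heapq


def build_facet_network(facets, distances, neighbors=2):
    """Map every facet to its `neighbors` closest other facets (closest first)."""
    network = {}
    for i, facet in enumerate(facets):
        candidates = (j for j in range(len(facets)) if j != i)
        closest = heapq.nsmallest(neighbors, candidates, key=lambda j: distances[i][j])
        network[facet] = [facets[j] for j in closest]
    return network


# Hypothetical pairwise distances between four genres.
facets = ["Science-Fiction", "Adventure", "Action", "Romance"]
distances = [
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.3, 0.8],
    [0.2, 0.3, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]
network = build_facet_network(facets, distances)
```

Since the result depends only on the precomputed distances, it can be stored once and reused
for every user session.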

Figure 3.6: A subset of the precomputed facet network for the Genres aspect. In MultiMap,
everything that can be precomputed is precomputed, which is conducive to a smooth and
responsive interaction.
Movie Ordering
The last step is movie ordering. This step is straightforward: it rearranges the
movie-facet relations into the following form:
\[
F_{i,2} =
\begin{pmatrix}
\mathrm{Facet}_1 & \text{movie vector, ordered by relevancy} \\
\vdots & \vdots \\
\mathrm{Facet}_i & \text{movie vector, ordered by relevancy}
\end{pmatrix}
\]
For the sake of simplicity, we use the IMDb rating as the relevancy measure. This rating is
a number from 0 to 10 with one decimal, based on extensive statistics from the IMDb
website visitors. The following example of the movie ordering for the genres space illustrates this:
\[
F_{i,2} =
\begin{pmatrix}
\text{Adventure} &
\begin{pmatrix}
\text{The Judy Garland Show} & \text{The Secret of Monkey Island} & \cdots \\
9.8 & 9.6 & \cdots
\end{pmatrix} \\
\vdots & \vdots
\end{pmatrix}
\]
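The ordering step can be sketched as a per-facet sort by rating. The two Adventure titles and
their ratings are the ones from the example above; the third title, its rating and the function
name order_movies_by_rating are hypothetical.

```python
def order_movies_by_rating(facet_movies, ratings):
    """Return, for every facet, its movie titles sorted from highest to lowest rating."""
    return {
        facet: sorted(titles, key=lambda title: ratings[title], reverse=True)
        for facet, titles in facet_movies.items()
    }


ratings = {
    "The Secret of Monkey Island": 9.6,
    "The Judy Garland Show": 9.8,
    "Hypothetical Movie": 7.1,  # made-up entry for illustration
}
facet_movies = {
    "Adventure": [
        "Hypothetical Movie",
        "The Secret of Monkey Island",
        "The Judy Garland Show",
    ],
}
ordered = order_movies_by_rating(facet_movies, ratings)
```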

3.2.3 Ranking
We would like to give users the ability to zoom in on individual facets or movies based on
their selection. This can be accomplished by ranking each point and re-ranking the points with
every zoom. For this we need a facet network (graph), ideally tightly interconnected and with
100% coverage of the facets. Such a network is constructed in the preprocessing step (see
section 3.2.2) in the form of a graph in which each node (a facet) is connected to its 2 closest
neighbors. For example, the Science-Fiction genre would be connected to the Adventure and
Action genres, as illustrated in figure 3.6.
Based on such a network, zooming can be implemented as a recursive algorithm with
several parameters:
• A vector B of weights for the closest points. For example, a vector where the first
closest point gets full weight and the second closest gets half of the weight would be:
\[ B = (1, 0.5) \]
• A depth-decay function for each node at depth d:
\[ \lambda(d + 1, \rho, b) = \rho + \frac{\gamma}{d} \cdot b \]
where:
– d is the current depth
– ρ is the current rank of the node
– γ is the decay factor (a constant)
– b is the weight of the point, taken from the weight vector B
The depth-decay function presented here is a linear function, but it can be adapted or
replaced depending on the context and needs. It calculates the new rank ρ of a node, which
is written back to the network. The ranking is computed recursively for each neighbor,
then the network is sorted by rank and the first x nodes are shown to the user.
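The recursion could be sketched as follows. This is only one possible reading of the
description: the maximum depth, the starting depth of 1, the fact that the selected node
itself is not boosted, and all names (zoom, max_depth, the sample network) are assumptions
made for illustration.

```python
def zoom(network, ranks, selected, weights=(1.0, 0.5), gamma=1.0, max_depth=3):
    """Re-rank the facet network around `selected` and return (new_ranks, ordering).

    network maps each facet to its closest facets (closest first); weights is the
    vector B; gamma is the decay factor; max_depth bounds the recursion (assumption).
    """
    new_ranks = dict(ranks)

    def visit(node, depth):
        if depth > max_depth:
            return
        for neighbor, b in zip(network[node], weights):
            # Depth-decay update: rho + (gamma / d) * b
            new_ranks[neighbor] = new_ranks[neighbor] + (gamma / depth) * b
            visit(neighbor, depth + 1)

    visit(selected, 1)
    ordering = sorted(new_ranks, key=new_ranks.get, reverse=True)
    return new_ranks, ordering


# Hypothetical facet network in which every node points to its 2 closest neighbors.
network = {
    "Science-Fiction": ["Adventure", "Action"],
    "Adventure": ["Action", "Science-Fiction"],
    "Action": ["Adventure", "Science-Fiction"],
    "Romance": ["Action", "Adventure"],
}
ranks = {facet: 0.0 for facet in network}
new_ranks, ordering = zoom(network, ranks, "Science-Fiction", max_depth=1)
```

With max_depth=1 only the direct neighbors of the selected facet are boosted; larger depths
propagate progressively smaller contributions further into the network.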
Additionally, zooming out can be done in several different ways. The simplest (and most
computationally efficient) one is to keep track of all changes to the ranking value ρ at each
step. This approach uses some memory, but there is no need to recalculate anything.
Another approach would be to recursively recalculate the ρ values backwards, which saves
memory at the cost of CPU time. The depth-decay function would also have to be updated in
order to support such a feature.
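The first, memory-based approach can be illustrated with a simple stack of rank snapshots.
The class name RankHistory and its methods are hypothetical; the prototype may well store
only the changes rather than full snapshots.

```python
class RankHistory:
    """Keeps earlier rank tables so that zooming out needs no recalculation."""

    def __init__(self, ranks):
        self.current = dict(ranks)
        self._snapshots = []

    def zoom_in(self, new_ranks):
        """Remember the current ranks, then switch to the ranks of the new zoom."""
        self._snapshots.append(self.current)
        self.current = dict(new_ranks)

    def zoom_out(self):
        """Restore the ranks from before the last zoom (no-op at the top level)."""
        if self._snapshots:
            self.current = self._snapshots.pop()
        return self.current


history = RankHistory({"Adventure": 0.0, "Action": 0.0})
history.zoom_in({"Adventure": 1.0, "Action": 0.5})
restored = history.zoom_out()
```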

3.2.4 Facets Selection
At this point in the data-flow we have a Ranking Matrix, and a simple solution would consist
of simply selecting the first few ranked facets. Such an approach is just fine for standard
search engines, for example Google or Lemur. In MultiMap, however, this step is performed by
a dedicated selection algorithm. Why do we actually need one? To answer this question,
consider the following:
• standard search engines use a query in order to search the data, therefore the most relevant
documents are the ones that are closest to the query in the multi-dimensional
document space;
• in exploratory search we need an exploration factor, allowing the users to explore
different possibilities. We therefore don’t want to restrict the results to only the most
relevant, closely related points, but also to include other points that are related to the
topic to some extent.
The selection algorithm allows us to pick a number of rows from an Input Matrix I. Recall that
the Input Matrix I is the step just before the pairwise distance comparison: it is a
ready-to-compare matrix, in which the distance between 2 points is meaningful. The
idea behind the algorithm is quite simple: it selects a subset of relevant facets that is bigger
than the number of facets to be shown to the user, finds k clusters within this subset, and
then takes the point closest to each cluster centroid. The algorithm proceeds as follows:
• first, a selection of top ranked facets is performed. In the prototype we take twice the
number of facets that we actually want to present to the user (e.g., if we need to show
a grid of 2 by 2 points, we take the 8 most relevant facets from the ranking matrix);
• next, the algorithm computes a k-means clustering, where k is the number of points to
show to the user; for example, showing 4 actors would mean k = 4;
• once the k clusters are found, each point has an assigned cluster index and each cluster
has a centroid. The selection continues by taking the single closest point to each
centroid, that is, the most average point of that cluster;
• finally, it returns the selected facets.
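The steps above can be sketched as follows, using a small k-means written out by hand. The
deterministic initialization (the first k candidates serve as initial centroids), the fixed
number of iterations and all names are assumptions made to keep the example short and
reproducible; they are not taken from the prototype.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=20):
    """Plain k-means; the first k points are the initial centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def select_facets(ranked_facets, rows, shown):
    """Pick `shown` facets: top 2*shown by rank, cluster, take the point nearest each centroid."""
    candidates = ranked_facets[: 2 * shown]
    points = [rows[f] for f in candidates]
    centroids, assignment = k_means(points, shown)
    selected = []
    for c, centroid in enumerate(centroids):
        members = [f for f, a in zip(candidates, assignment) if a == c]
        if members:
            selected.append(min(members, key=lambda f: squared_distance(rows[f], centroid)))
    return selected


# Hypothetical rows of the Input Matrix I, listed in ranking order.
rows = {"A": (0, 0), "B": (10, 10), "C": (0, 1), "D": (1, 0), "E": (5, 5)}
selected = select_facets(["A", "B", "C", "D", "E"], rows, shown=2)
```

Here facet E is dropped by the rank cut-off, A represents the cluster {A, C, D}, and B forms
a cluster of its own, so the user sees two facets from different regions of the space.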

3.2.5 Movies Selection
The next step in the data-flow is the actual selection of the movies. By now the system has
selected the facets it is going to present (the facets most relevant to the current zoom
sequence). The movies presented on the map can be selected simply by taking the first few
movies according to some rating function. We take the IMDb average rating as the value used
to sort the movies within each facet. This sorting was already done in the preprocessing
phase (see section 3.2.2), so the selection comes down to taking the first few movies from
the facet.
For example, in the following matrix one can see that if Adventure is a selected facet, the
movies “The Judy Garland Show” and “The Secret of Monkey Island” will be selected, as
they have the highest IMDb rating within the facet.
\[
F_{i,2} =
\begin{pmatrix}
\text{Adventure} &
\begin{pmatrix}
\text{The Judy Garland Show} & \text{The Secret of Monkey Island} & \cdots \\
9.8 & 9.6 & \cdots
\end{pmatrix} \\
\vdots & \vdots
\end{pmatrix}
\]
Now that the selection of facets and movies is done, we can proceed to the creation of
the Aspect Maps.

3.2.6 Creation of Aspect Maps
The final step in the data-flow is the creation of the so-called Aspect Maps, a spatial rep-
resentation of the selected facets. The maps allow the user to compare different facets and
subsequently the related movies between themselves. We use maps to help the user envisage
the locations of movies and facets in high dimensional space. Since it would be too difficult to
visualize, this high dimensional space is reduced to two or three dimensions. For this of course
we need a dimension reduction that is faithful to the distances in the original space. From
the many techniques that are available (dimensionality reduction, ordination...) we selected
multidimensional scaling (MDS).
Figure 3.7: The transition from the facet selections to the aspect map.
Multidimensional scaling is a special case of ordination. An MDS algorithm starts with a
matrix of item-item similarities, then assigns a location to each item in N-dimensional space,
where N is specified a priori. In our case, we want to reduce the matrix to 2 or 3 dimensions,
to be able to visualize the result on a screen.
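The MDS variant is not fixed here. One common formulation, given only as an illustrative
assumption, chooses the locations so as to minimize a stress function. In the sketch below,
$d_{ij}$ denotes the precomputed distance between items i and j and $x_i \in \mathbb{R}^N$
the location assigned to item i; both symbols are introduced for this illustration.

```latex
\[
\mathrm{Stress}(x_1, \dots, x_n) =
\sqrt{\sum_{i<j} \bigl( d_{ij} - \lVert x_i - x_j \rVert \bigr)^2}
\]
```

A stress of zero would mean that the low dimensional map reproduces all original distances
exactly; with N = 2 or 3 this is generally not achievable, and the map is an approximation.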
Figure 3.7 shows the process of creating the aspect map in this step. It is quite
straightforward, since all the data structures are by now ready to be consumed directly by an
MDS algorithm. Figure 3.5 is in fact the result of MDS on a subset of the directors aspect and
illustrates the output of this step.
Self-Organizing Maps (SOM, [16]) have sometimes been suggested for generating a lower
dimensional representation. We found, however, that for this particular case SOM is
prohibitively inefficient.
By the end of this step, we have a collection of points in low dimensional space. Those
points can be presented to the user in a number of different ways. Our approach is called the
GridMap visualization and is explained in section 3.4.2.
3.3 Server Technology
From the beginning of the research, we wanted the system to be highly interactive and
responsive. In order to achieve this we need a scalable system with high performance. For
this, we determined the following requirements:
• the data has to be sent very efficiently, potentially about 5-10 kilobytes of text data on
each user request;
• the ability to notify the user of events happening on the server;
• real-time communication for the interaction: for instance, when the user clicks on
something, the system has to process the request in less than a second (or else people
simply won’t use it).
Given the above requirements, the system should be based on an event-driven architecture
(EDA) with compression and security. For completeness, here is the list of most distinctive
features of the server (some readers may find it a bit technical):
• Monolithic server, running on one machine, but potentially scalable to a cluster of
machines.
• Manages the thread pool and distributes the work to each thread. It would try to match
the number of threads to cores (i.e.: 4 threads on a Quad Core machine) and distribute
smaller tasks to those threads.
• Big tasks are represented in a form of software timers, which are sliced in order to
achieve scalability.
• The server manages a socket pool, listening to several endpoints. Works with IPv4 and
IPv6 as well.
• Written in C#, the server is compatible with 32 and 64 bit platforms. It is also CLI-
compliant and works on cross-platform frameworks like Mono (works on Unix, Linux...).
• Handles client-socket lifetime, in order to achieve stability and error-tolerance.
• Integrates a Matlab interoperability layer, allowing the C# code to communicate with Matlab
and then send the results to the Flash client via the network.
• Handles the data via an object-relational mapping (ORM) layer.
• Publish-Subscribe model is used for the real-time notifications. It allows clients to
subscribe to an event of the server and be notified by the server when the event happens.
This notification happens via a push-operation.
• Custom message serialization/deserialization.
• Per-packet compression/decompression.
• Through introspection, the server generates networking libraries that comply with a
protocol interface. Appendix A provides more information on this feature and illustrates
some of the security and compression mechanisms.
• Accounting and session mechanisms, in order to keep track of users and their accounts and
connections.
• Access-level security mechanism.
We implemented all these features ourselves, since at the time of our research
not all of the technology was available to us.
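To illustrate the publish-subscribe model from the feature list, here is a deliberately small,
in-process sketch. It is not the C# server code: the class EventBroker, the event name and
the callback signature are hypothetical, and the real server pushes notifications to clients
over sockets rather than calling local functions.

```python
from collections import defaultdict


class EventBroker:
    """Minimal publish-subscribe hub: subscribers register callbacks per event name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event, callback):
        self._subscribers[event].append(callback)

    def publish(self, event, payload):
        """Push the payload to every subscriber; return how many were notified."""
        callbacks = list(self._subscribers[event])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


received = []
broker = EventBroker()
broker.subscribe("map-ready", received.append)  # hypothetical event name
notified = broker.publish("map-ready", {"aspect": "Genres"})
```

The point of the push model is that clients do not have to poll the server: they are told
when something they subscribed to has happened.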
3.4 The Client Front-End
3.4.1 Overview
Figure 3.8: The prototype of the client front-end
The system we have described manipulates points in high dimensional space. This is not
going to change. What we add in this section is a way to present these points in a low
dimensional space, so that the user can interact with the system through direct manipulation
in real time. For our prototype, we developed a visualization system called GridMap.
Section 3.4.2 explains how this system works and why.

3.4.2 GridMap
The reduction to two-dimensional space was already explained in the section on Aspect
Maps (3.2.6). It was accomplished by multidimensional scaling. For the user, it is more
important to know that a point is near another point than to know the exact distance. For
example, as shown in figure 3.8, it is more important to know that Action is close to
Adventure than to know the exact distance between them. GridMap therefore maps the points
from the 2D space calculated by MDS to a grid, where the exact distances disappear but the
spatial order is retained. In figure 3.8, 9 cells are presented; the number of cells can be
changed depending on the size of the screen (for example, during our experiments on a 24-inch
screen, the optimum GridMap size was 4 by 5, which made it easy to present more than a hundred
movies without overloading the user with information).
Figure 3.9: This figure illustrates a mapping performed by the GridMap which removes the
exact distances while leaving the order intact.
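The exact mapping algorithm is not spelled out here, so the following is only one possible
order-preserving scheme, stated as an assumption: the points are split into rows by their
vertical coordinate and then ordered within each row by their horizontal coordinate. The
function name to_grid and the sample coordinates are hypothetical.

```python
def to_grid(points, rows, cols):
    """Map labelled 2D points to (row, col) cells, keeping their relative order.

    points maps a label to its (x, y) coordinates as produced by MDS.
    """
    if len(points) > rows * cols:
        raise ValueError("more points than grid cells")
    by_y = sorted(points, key=lambda label: points[label][1])
    cells = {}
    for r in range(rows):
        band = by_y[r * cols:(r + 1) * cols]  # the points that share this row
        for c, label in enumerate(sorted(band, key=lambda label: points[label][0])):
            cells[label] = (r, c)
    return cells


# Hypothetical MDS output for four genres.
points = {
    "Action": (0.0, 0.0),
    "Adventure": (0.3, 0.1),
    "Romance": (0.1, 0.9),
    "Drama": (0.8, 1.0),
}
cells = to_grid(points, rows=2, cols=2)
```

In this example Action and Adventure, which are close in the MDS output, end up in
neighboring cells of the same row, while the exact distances are discarded.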
The interface makes it easy for the user to zoom and filter: the left panel (as shown in
figure 3.8) allows the user to switch between the aspect maps and to filter and search on
every facet. For example, a user who knows some particular actor can search in the actors
pane and then zoom and view similar actors. Additionally, this panel allows the user to
customize the zooming criteria and tune the MultiMap parameters: number of movies per facet,
number of facets to show on the map, etc.

Transitions
It often happens that a person viewing a scene fails to see large changes in the scene. This
well-known psychological phenomenon is called change blindness [20]. It occurs when the
change in the scene coincides with some visual disruption, such as a saccade (a rapid eye
movement), or when the scene is briefly obscured. This situation often occurs in web
applications, where the web page briefly flashes after actions demanding a new server request.
In this context, animated transitions help the user see the changes in the scene [13] [21].
The transitions turned out to be quite important, providing visual feedback so that the user
knows what is going on. In the GridMap, there are two kinds of crucial transitions:
• The transition that animates the facet pane, keeping it visible while zooming on
this pane, then moving it to a new position. This greatly helps the user to keep track
of the item he is zooming in on. This is needed because on each zoom the coordinate
system changes, which can be quite confusing to the person who uses the interface.
• The transition shown in figure 3.10, which flips the grid cell, allowing the user
to see the details of a particular movie within its context. This transition allows the
user to directly see the information about the element he is interested in, keeping
everything in context. The user can always flip back and see other movie details. This
is what people do in a video store, where they look at the available movies, pick one and
flip it to see its details on the back.
Figure 3.10: As in the video store, one can select the movie and look on the back of the box
to see the details.

Cell Representation
The cell representation enables the flipping feature illustrated in figure 3.10. Based on the
feedback of users, this feature proved to be very attractive and motivated them to experiment
further with the interface. Additionally, it allows movie details to be presented while keeping
the other facets visible.
Figure 3.11 shows how a list of movies is presented on a grid cell, giving visual relevance
feedback with a star. Golden stars mark the best-rated movies, which are probably the most
interesting for the user to check out.
Figure 3.11: An actual list of movies presented on the front of the grid cell.
Figure 3.12 illustrates the content presented when a movie is flipped: the movie cover,
the synopsis and two additional tabs.
Figure 3.12: Details of the movie, first tab. It presents the synopsis available to the user to
read in order to learn about a particular movie.
The second tab, illustrated in figure 3.13, shows information related to the movie, linking
directly to different facets. When the user clicks on a particular genre, for example, the
system performs a zoom on that facet and constructs a new map. This allows back-and-forth
navigation: from the big picture to the details of one movie, then on to another map and in
again on a particular movie.

Figure 3.13: Details of the movie, second tab. It presents the facet links to various information
as year, rating and the genres of the movie.
Figure 3.14: Details of the movie, third tab. It presents the facet links to the directors and
actors.
The last tab, illustrated in figure 3.14, allows the user to view the people: the directors
who made the movie and the actors starring in it. Here again, the system allows the user to
zoom directly on one of those links, constructing a new map.

Chapter 4
Usability Aspects
The main purpose of the work was to build a responsive system for a particular Rich Internet
Application, in the area of exploratory search. Of course, such a system only makes sense
if users can actually use it. So we did an (admittedly limited and informal) evaluation of its
usability.
To do so, I asked ten people, acquaintances and friends aged 20 to 30, half of each gender,
to participate in a survey about my thesis work. They were told what exploratory search
is in general, without reference to the movie database. Next, they were asked to work with
the system for about half an hour and find movies to their liking. After working with the
system they were asked to fill out a questionnaire with 25 questions. The questions are shown
below and concern Usefulness, Ease of Use, Ease of Learning, and Satisfaction with the
system.
The questionnaires were constructed as seven-point Likert rating scales. Users were asked to
rate their agreement with the statements, ranging from strongly disagree to strongly agree [17].
Following are the global averaged results of the questionnaire, per feature:
Average results of USE questionnaire
Average Usefulness: 5.3/7
Average Ease of Use: 5.6/7
Average Ease of Learning: 6.4/7
Average Satisfaction: 5.9/7
The users were very satisfied with the system, and a few of them also pointed out that the
interface was very beautiful and user-friendly. On the other hand, some of them thought
that the interface didn’t give them enough control to know exactly what happens
underneath.
For completeness, here are the tables with the averaged results per statement:
Average results, Usefulness questionnaire
It is useful 6.3/7
It gives me more control over the activities in my life 3.8/7
It makes the things I want to accomplish easier to get done 5.3/7
It meets my needs 5.7/7
It does everything I would expect it to do 5.3/7
Average results, Ease of Use questionnaire
It is easy to use 5.8/7
It is user friendly 6.7/7
It requires the fewest steps possible to accomplish what I want to do with it 5.7/7
Using it is effortless 5.2/7
I can use it without written instructions 4.5/7
I don’t notice any inconsistencies as I use it 5.0/7
Both occasional and regular users would like it 6.2/7
I can recover from mistakes quickly and easily 5.8/7
I can use it successfully every time 5.5/7
Average results, Ease of Learning questionnaire
I learned to use it quickly 6.5/7
I easily remember how to use it 6.5/7
It is easy to learn to use it 6.2/7
I quickly became skillful with it 6.3/7
Average results, Satisfaction questionnaire
I am satisfied with it 6.0/7
I would recommend it to a friend 6.3/7
It is fun to use 6.7/7
It works the way I want it to work 6.2/7
It is wonderful 4.8/7
I feel I need to have it 4.8/7
It is pleasant to use 6.2/7
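As a quick consistency check, and assuming that each global figure is the unweighted mean of
the per-statement averages listed in the tables, the four global results can be recomputed
from the table values. The variable names are illustrative.

```python
# Per-statement averages copied from the four tables above.
scores = {
    "Usefulness": [6.3, 3.8, 5.3, 5.7, 5.3],
    "Ease of Use": [5.8, 6.7, 5.7, 5.2, 4.5, 5.0, 6.2, 5.8, 5.5],
    "Ease of Learning": [6.5, 6.5, 6.2, 6.3],
    "Satisfaction": [6.0, 6.3, 6.7, 6.2, 4.8, 4.8, 6.2],
}

# Unweighted mean per feature, rounded to one decimal like the reported values.
averages = {name: round(sum(values) / len(values), 1) for name, values in scores.items()}
total_questions = sum(len(values) for values in scores.values())
```

Under this assumption the recomputed values match the reported global averages (5.3, 5.6, 6.4
and 5.9), and the tables account for all 25 questions of the questionnaire.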
These are preliminary results; a more formal evaluation is beyond the scope of this
thesis.

Chapter 5
Conclusions
This thesis described a form of exploratory search where responsiveness was of the essence.
The application, which we called ‘MultiMap’, can be categorized under the heading of so-called
Rich Internet Applications, a class of applications that is becoming more and more important
as databases become larger, more specialized, and more distributed. Because of this, users
more and more often get into a situation where they know there must be information available
to answer their questions, but lack the means to formulate a precise query.
The resources they need to answer such a query may be available on remote servers. Hence,
to quickly explore possible answers, the servers must be made responsive enough, or else the
user will quickly give up. MultiMap was built with such users in mind. Every design decision
in this thesis was made under the constraint of responsiveness.
This led to the following requirements:
• The system should be responsive, scalable, and interactive.
• The system should support exploratory search.
• The system should provide real-time spatial visual feedback reflecting changes in the
high-dimensional search space.
Exploratory search is the problem of finding information that we may not know how to
formulate, but which we will recognize once we see it. There are three bottlenecks that could
make our system unresponsive: (1) complex calculations, (2) slow zooming, and (3) ineffective
visualization. We addressed these bottlenecks as follows:
1. Every computation that can be done in advance will be done in advance, so that it
cannot cause any delay.
2. Zooming and map generation are highly optimized and can be done in real-time.
3. The visualization is presented to the user in a cognitively appropriate way.
We believe that such a system should be constructed in a modular fashion, and in this thesis
we presented a way to do so. This modularity makes it possible, for example, to change the
ranking or enhance the selection algorithms, and to evaluate the performance of a new
algorithm against the existing one. It also makes it possible to build various user interfaces
on top of the search engine, and eventually audience-targeted user interfaces. During the
research we discussed several possible front ends, including different 2-dimensional
representations enhanced with colors, sounds, and font sizes. 3-dimensional interfaces can
also be built and are very interesting directions to explore. We considered implementing a
3-dimensional sphere navigation where zooming could create a 2D map or a new 3D sphere,
but we leave that for future work.
The third question (about usability) was answered by the evaluation and the feedback we got
from users. The users were very satisfied with both MultiMap and GridMap, but also felt that
they did not have enough control over the system. They quickly learned how to use the system
and how to get movie suggestions. However, there was a need to explain and introduce the
concept to them at first, as it is a different approach to information exploration.
After having designed and evaluated the system, we believe that the map generation technique
presented in this thesis is an important direction to go and an effective way to perform
exploratory search.

Bibliography
[1] International movie database, http://www.imdb.com, December 2009.
[2] RFC 2616: Hypertext transfer protocol – HTTP/1.1, http://tools.ietf.org/html/rfc2616,
June 1999.
[3] HTTP, Wikipedia, http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol, June 2010.
[4] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: a
survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge
and Data Engineering, 17:734-749, 6 2005.
[5] G. Armitage. Quality of service in ip networks: Foundations for a multi-service internet.
Macmillan Technical Publishing, 4 2000.
[6] G. Armitage, M. Claypool, and P. Branch. Networking and Online Games: Understand-
ing and Engineering Multiplayer Internet Games. John Wiley and Sons Ltd., 2006.
[7] R.M. Bell, J. Bennett, Y. Koren, and C. Volinsky. The million dollar programming
prize. IEEE Spectrum, 5 2009.
[8] S. Caltagirone, M. Keys, B. Schlief, and M. J. Willshire. Architecture for a massively
multiplayer online role playing game engine. Journal of Computing Sciences in Colleges,
Volume 18, Issue 2, 12 2002.
[9] Piero Fraternali, Gustavo Rossi, and Fernando Sánchez-Figueroa. Rich internet
applications. Internet Computing, IEEE, 14(3):9–12, May-June 2010.
[10] J. Gregory. Game Engine Architecture. A K Peters, 2009.
[11] M. A. Hearst. Next generation web search: Setting our sites. IEEE Data Engineering
Bulletin 23, 3, 38-48, 3 2000.
[12] M. A. Hearst. Design recommendations for hierarchical faceted search interfaces. SIGIR
Workshop on Faceted Search, pages 26–30, August 2006.
[13] J. Heer and G. Robertson. Animated transitions in statistical data graphics. IEEE
Transactions on Visualization and Computer Graphics, 6 2007.
[14] J. F. Kurose and K. W. Ross. Computer Networking A Top-Down Approach. Pearson
Education Inc., 2008.
[15] X. Lin. Map displays for information retrieval. Journal of the American Society for
Information Science, 1 1997.
[16] X. Lin, D. Soergel, and G. Marchionini. A self-organizing semantic map for informa-
tion retrieval. Proceedings of the 14th annual international ACM SIGIR conference on
Research and development in information retrieval. 262 - 269, 1991.
[17] A.M. Lund. Measuring usability with the use questionnaire. STC Usability SIG Newslet-
ter, 8:2, 8 2001.
[18] J. Makar. ActionScript for Multiplayer Games and Virtual Worlds. New Riders, 2010.
[19] G. Marchionini. Exploratory search: From finding to understanding. Communications
of the ACM 49, 4 2006.
[20] J. O’Regan, R. Rensink, and J. Clark. To see or not to see: The need for attention to
perceive changes in scenes. Psychological Science, 8 1997.
[21] G. M. Sacco and Y. Tzitzikas. Dynamic Taxonomies and Faceted Search: Theory, Prac-
tice, and Experience. Springer Science and Business Media Inc., 2009.
[22] J. Smed and H. Hakonen. Algorithms and Networking for Computer Games. John Wiley
and Sons Ltd, 2006.
[23] M. Steyvers. Multidimensional Scaling. In: Encyclopedia of Cognitive Science. Macmillan
Reference Ltd., 2002.
[24] D. Svanaes. Understanding Interactivity: Steps to a Phenomenology of Human-Computer
Interaction. PhD Thesis. NTNU, Trondheim, Norway, 2000.
[25] A. G Taylor. Introduction to Cataloging and Classification. 8th ed. Englewood, Colorado.
Libraries Unlimited, 1992.
[26] B.C Vickery. Faceted classification: a guide to construction and use of special schemes.
London: Aslib, 1960.
[27] R.W. White, B. Kules, S.M. Drucker, and M.C. Schraefel. Supporting exploratory search.
Communications of the ACM, 49, 4 2006.

Appendix A
Protocol Generation DSL
Since I had to do all the programming for the research project myself, the workload was quite
demanding. In order to avoid writing an individual implementation for each networking method
or protocol, a protocol generation mechanism was implemented. To explain how it
works, consider the following C# code:
Listing A.1: A partial definition of the MultiMap protocol
using System.Collections.Generic; // needed for List<int>

[Protocol]
public interface IMultiMapProtocol
{
    // Gets all aspects in the system
    [ProtocolOperation(100, Direction.Pull, CompressionTarget.Outgoing)]
    Aspect[] GetAllAspects();

    // Zooms to a particular selection
    [ProtocolOperation(106, Direction.Pull, CompressionTarget.Incoming)]
    void Zoom(Aspect Aspect, List<int> Facets);

    // Gets some additional information of a movie
    [ProtocolOperation(112, Direction.Pull, CompressionTarget.Outgoing,
        AccessLevel = AccessLevel.Root)]
    MovieDetails GetMovieDetails(int Oid);

    (...)
}
Listing A.1 illustrates the code one needs to write in order to define a communication
protocol. Such an approach can also be considered a domain-specific language (DSL). Once the
protocol definition is written, the server analyses it and generates the code that makes
all the communication possible. It generates an assembly for its own use and a Flash component
library (.swc) for the Flash application, making it possible to simply call any method and
hiding the complexity from the developer. Our research greatly benefited from this DSL,
as several thousand lines of code could be generated, eliminating potential errors and
boosting productivity.
Using the protocol definition, it is also possible to define the compression direction (None,
Incoming, Outgoing or Both), which generates the corresponding function calls during
packet compilation/reading. It is also possible to define the security level per operation,
using the AccessLevel parameter (shown in listing A.1).
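The idea of deriving the networking code from an annotated protocol definition can be
illustrated with a small Python analogue. This is not the C# generator: the decorator
protocol_operation, the function build_dispatch_table and the default access level "Public"
are assumptions made for this sketch; only the operation codes, directions and compression
targets are copied from listing A.1.

```python
import inspect


def protocol_operation(opcode, direction, compression, access_level="Public"):
    """Attach protocol metadata to a method (the default access level is assumed)."""
    def mark(func):
        func.operation = {
            "opcode": opcode,
            "direction": direction,
            "compression": compression,
            "access_level": access_level,
        }
        return func
    return mark


class MultiMapProtocol:
    @protocol_operation(100, "Pull", "Outgoing")
    def get_all_aspects(self): ...

    @protocol_operation(106, "Pull", "Incoming")
    def zoom(self, aspect, facets): ...

    @protocol_operation(112, "Pull", "Outgoing", access_level="Root")
    def get_movie_details(self, oid): ...


def build_dispatch_table(protocol):
    """Introspect the protocol class and collect one entry per annotated operation."""
    table = {}
    for name, member in inspect.getmembers(protocol, inspect.isfunction):
        meta = getattr(member, "operation", None)
        if meta is not None:
            params = [p for p in inspect.signature(member).parameters if p != "self"]
            table[meta["opcode"]] = {"name": name, "parameters": params, **meta}
    return table


table = build_dispatch_table(MultiMapProtocol)
```

A real generator would go one step further and emit the serialization, compression and
access checks for each entry of such a table.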