CERN-THESIS-2016-053

17/06/2016

Hochschule Karlsruhe
Technik und Wirtschaft

Bachelor Thesis

Record Recommendations for the CERN Document Server

Author:
David ZERULLA

Supervisors:
Prof. Dr. Uwe HANEKE
Hochschule Karlsruhe

Ludmila MARIAN
CERN

February 2016

Abstract

CERN Document Server (CDS) is the institutional repository of the European Organization for Nuclear Research (CERN). It hosts all the research material produced at CERN, as well as multimedia and administrative documents. It currently has more than 1.5 million records grouped in more than 1000 collections. Its underlying platform is Invenio, an open source digital library system created at CERN. As the size of CDS increases, discovering useful and interesting records becomes more challenging. Therefore, the goal of this work is to create a system that supports the user in the discovery of related interesting records. To achieve this, a set of recommended records is displayed on the record page. These recommended records are based on the analyzed behavior (page views and downloads) of other users.

This work describes the methods and algorithms used for creating and implementing the recommender system and for integrating it with the underlying software platform, Invenio. A very important decision factor when designing a recommender system is the analysis of the available data. Based on the CDS data, a recommender system using a user-record graph is designed and implemented. Finally, the implementation is evaluated to determine the optimal parameters and the performance of the recommender system. Although this recommender system is based on the CDS data, it is generic enough to be used for other digital libraries powered by Invenio.

Zusammenfassung (Summary)

[...] Organization for Nuclear Research (CERN). All research material produced at CERN, as well as multimedia and administrative documents, is archived and made available here. Currently [...] is based on Invenio, an open source library platform developed by CERN. As more and more documents are archived in CDS, it becomes [...] to find. By developing and integrating a software-based recommender system into CDS, the user is to be supported in finding other [...] a selection of recommended documents is provided with the record. The recommendations are based on the analyzed behavior of other users, such as their page views and downloads.

This work describes the methods and algorithms used for the development, implementation, and integration into the underlying software platform, Invenio. The analysis of the available data is a decisive factor in the development of a recommender system. Based on the analyzed CDS data, a recommender system built on a user-record graph is designed and implemented. The developed recommender system is then evaluated. The goal of this evaluation is to find the optimal parameters of the recommender [...] system was developed on the basis of the CDS data, it is generic enough to be integrated into other Invenio systems.

Contents

Abstract
List of Figures
Abbreviations
Listings

1 Introduction
1.1 Motivation
1.2 Context
1.2.1 CERN Document Server
1.2.2 Invenio a Digital Library System
1.3 Goal and Milestones
1.4 Structure

2 Background
2.1 Recommender Systems
2.1.1 Collaborative Filtering
2.1.2 Content-Based
2.1.3 Graph-based Methods
2.1.4 Implicit and Explicit Ratings
2.1.5 Recommender Problems
2.2 Evaluation of Recommender Systems
2.2.1 Data Selection for Evaluation
2.2.2 Prediction Accuracy

3 Generating Recommendations
3.1 Recommender Requirements
3.2 CDS Data
3.2.1 Record description
3.2.2 User Data
3.2.3 Data Overview
3.3 Collaborative or Content-Based Filtering
3.4 Algorithm Decision
3.5 Recommender Approach
3.5.1 Rating Normalization
3.5.2 Building the Graph
3.5.3 Distance Calculation
3.5.4 Finding Neighbors
3.5.5 Combining multiple paths to one similarity score
3.5.6 Generating Record to Record Recommendations

4 Software Design and Implementation
4.1 Software Requirements
4.2 Software Architecture
4.3 Software Selection
4.3.1 Programming Language: Python
4.3.2 Graph library: NetworkX
4.3.3 Elasticsearch
4.3.4 Redis
4.4 Data Collection
4.4.1 Exporting the Data
4.4.2 Filtering the Data
4.5 Find Item-Item Relationships
4.6 Integration in Invenio
4.7 Statistics

5 Evaluation
5.1 Metric Selection
5.2 Test Setup
5.3 Data Selection
5.4 Recommender Test
5.4.1 Downloads Indicate a Higher Interest?
5.4.2 Graph Depth Accuracy and Speed Test
5.4.3 Showing more recommendations
5.5 The Optimal Recommender Settings

6 Conclusion
6.1 Summary
6.2 Future Work

References
Literature
Online sources

List of Figures

1.1 CERN Document Server (CDS)
3.1 Shows how many users viewed how many records. Filtered user data on a logarithmic scale.
3.2 User-item graph with weights, showing one user and four items.
3.3 User-item graph with weights, showing three users and three records.
4.1 The recommendation box on CDS
5.1 Precision per Run for Run 1-4
5.2 Precision per Run for Run 5-8
5.3 Good Recommendations per Run for Run 1-8
5.4 Average Runtime in Seconds per Generated Recommendation per Run for Run 1-8
5.5 Precision and recall for three presented recommendations.
5.6 Precision and recall for five presented recommendations.
5.7 Precision and recall for ten presented recommendations.

Abbreviations

CERN: European Organization for Nuclear Research
CDS: CERN Document Server
CSV: Comma-separated values
RMSE: Root Mean Squared Error
MAE: Mean Absolute Error
JSON: JavaScript Object Notation
REST: Representational state transfer
API: Application programming interface
HTML: HyperText Markup Language
XML: Extensible Markup Language
Ajax: Asynchronous JavaScript and XML
DFS: depth-first search
BFS: breadth-first search
DESY: German Electron Synchrotron
SLAC: SLAC National Accelerator Laboratory
FNAL: Fermi National Accelerator Laboratory
EPFL: Swiss Federal Institute of Technology in Lausanne

Listings

4.1 Elasticsearch example query
4.2 Elasticsearch export query
4.3 The shortened JSON-formatted response returned from the recommendation endpoint.

Chapter 1

Introduction

1.1 Motivation

The amount of easily accessible information, such as published articles, new books and reports, is increasing. This is because technology has enabled us to publish and distribute information more easily.

This can be seen in the CERN Document Server (CDS), which stores over 1.4 million records and is steadily growing. The CDS is an institutional repository at CERN. A record at the CDS can be any document that CERN creates, for example a published article from one of the experiments located at CERN, a recording of a presentation held at CERN, or a book that is available at the CERN Library.

With more and more records available, it gets harder to find relevant records. That is why we want to help users discover other related records once they have found an interesting one. In order to find these related records, a recommender system is used. Recommender systems are software tools and techniques that try to find items that are interesting for a user, based on the user's previously recorded interests and on similar users. Similar users are users who share the same interests as the current user. Recommender systems support the user in decision-making processes, such as what book to read, which movie to watch or what item to buy [12, 14].


1.2 Context

This thesis was written at the European Organization for Nuclear Research (CERN). CERN is one of Europe's first joint ventures and now has 21 member states. Physicists and engineers there are probing the fundamental structure of the universe. For that, they use the world's largest scientific instruments to study the fundamental particles.

CERN employs over 2,500 people. The scientific and technical staff members design and build the particle accelerators and ensure they operate smoothly. CERN hosts various experiments with around 12,000 visiting scientists from over 70 countries. Half of the world's particle physicists come to CERN for their research [17]. The CERN laboratory is located at the Franco-Swiss border near Geneva.

1.2.1 CERN Document Server

All the documents, pictures, videos and publications, and basically every digital document created at CERN, are stored by the CERN Document Server (CDS). Most of these resources are publicly available, but internal documents are stored as well.

Figure 1.1: CERN Document Server (CDS)


1.2.2 Invenio a Digital Library System

The underlying software platform of the CERN Document Server (CDS) is Invenio. Invenio is a free and open source software suite that gives everyone the possibility to run their own digital library or document repository on the web. The Invenio software suite covers all functions that are needed for a digital library, such as document management, classification, indexing, curation and document dissemination. Its performance and the flexibility of its modular design make it a comprehensive solution for the management of document repositories of moderate to large size.

Invenio was originally developed at CERN to run the CERN Document Server. Nowadays Invenio is co-developed by an international collaboration comprising institutes such as CERN, DESY, EPFL, FNAL and SLAC, and is being used by many more scientific institutions worldwide [22].

1.3 Goal and Milestones

The goal of this thesis is to enhance the user experience of CDS by helping the user to find and explore other related records (documents). The user should get ideas about what to view next, in a similar way to how Amazon recommends products that other "[c]ustomers Who Viewed This Item Also Viewed" [12]. To achieve this, a recommender system is designed and implemented based on the constraints of the CDS and its available data and metadata.
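The "also viewed" idea can be illustrated with a minimal sketch. It is not the recommender developed in this thesis; it only counts how often two records were viewed by the same user and assumes that a higher count means a closer relation. The function name `also_viewed` and the sample view log are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def also_viewed(views, top_k=3):
    """Map each record to the records most often viewed by the same users."""
    co_views = defaultdict(Counter)
    for records in views.values():
        # Each ordered pair of distinct records seen by one user counts once.
        for a, b in permutations(set(records), 2):
            co_views[a][b] += 1
    return {
        record: [other for other, _ in counts.most_common(top_k)]
        for record, counts in co_views.items()
    }


# Hypothetical page-view log: user id -> viewed record ids.
views = {
    "u1": ["r1", "r2", "r3"],
    "u2": ["r1", "r2"],
    "u3": ["r2", "r4"],
}
recommendations = also_viewed(views)
```

In this toy log, two users viewed both `r1` and `r2`, so `r2` is ranked first among the records related to `r1`.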

The first step is to analyze the available CDS data to determine how it can be used to find the related records. Depending on the available data, a different recommender approach has to be selected. For that, different recommender approaches using item-based and collaborative filtering are explored. The recommender system needs to be integrated into the Invenio software suite. At the end of the project, all developed code will be contributed back to the Invenio project. In this way, the recommendation system can also be used by other Invenio instances.


Based on this goal, the following milestones have been identified:

• Analyze what data is available in CDS, to get an overview of what is possible to achieve and how the data can be used.
• Extract and filter the data from Invenio so it can be used by the recommender system.
• Implement a recommender system that interacts with Invenio.
• Display the recommendations on the CDS site.
• Evaluate the recommender system.

1.4 Structure

Chapter 2 gives a general overview of available recommender systems, with a focus on item-based and collaborative filtering approaches. Chapter 3 introduces the requirements given by the CDS group for the recommender system. These requirements are then analyzed and used to create a recommender system specific to the available data of CDS. For that, the available data of CDS is analyzed. Based on this analysis, the decisions about which recommendation approach to use and which algorithm gives the best results are made. The approach for the recommender system is then shown. Chapter 4 discusses the requirements of the software. Based on these requirements, the design of the software is created. The implementation part of Chapter 4 starts with how the data is collected and filtered, followed by how the item-item relationships are calculated, and finally, how the recommender system is integrated into Invenio. Chapter 5 covers the important aspects of evaluating the recommender system.

Chapter 2

Background

This chapter provides background knowledge about recommender systems and the different technologies behind them. It also shows the common problems of recommender systems and how to evaluate them.

2.1 Recommender Systems

Recommender systems, sometimes also called recommendation systems, are software tools and techniques that try to find items that are interesting for a user. They support the user in the decision-making process, such as what book to read, which movie to watch or what item to buy [14].

In this context, an "item" can be of different kinds, such as a movie, a book, a record, a newspaper article or a product. To recommend useful items to a user, a recommender system has to predict how useful an item is to the user. In order to do this, the recommender system needs information about the user and the items, and also a way to compare items and then decide which of them get recommended [6].

Recommendation systems are popular in applications where a user has many different items to choose from, such as what music to listen to on Spotify [26], what product to buy on an e-commerce website such as Amazon [12, 18], or which movie to watch on Netflix. Netflix is one of the most frequently cited recommendation systems because of the "Netflix Prize" [24].


The Netflix Prize was an open competition for the best collaborative filtering algorithm that could outperform Netflix's own algorithm. The prize of one million dollars was won in 2009. The winning algorithm improved on the performance of Netflix's own recommender by 10.06%.

The three most common scenarios in which recommender systems are used are prediction, ranking and classification tasks [10]. Depending on the task, the user interface for the recommendations also changes.

• A prediction task focuses on predicting how a user will rate a specific item he or she has not rated yet. The user is then shown the items he or she is predicted to enjoy the most. These kinds of recommendations are usually used for movie, music and item recommendations, for example on Netflix, Spotify and Amazon.
• A ranking task is performed to generate a top-k list of items. This is used, for example, on news sites to recommend the top 10 articles of the month, or on web stores to show other items the user could be interested in.
• A classification task analyzes items and predicts how likely they are to belong to a class. The classes can be learned automatically from training data or from manually labeled data. An example of a classification system is a spam detection system. It sorts new items, which are emails, into the classes spam and non-spam.

There are several taxonomies for recommendation systems, which use different algorithms and techniques. These recommendation systems can be put into five different classes [3]:

• collaborative filtering: Finds the similarity between items using collaborative information about users or items.
• content-based filtering: Finds the similarity between items based on content.
• utility-based: Finds recommendations based on the computed utility of the rated items, which describes the user's preferences.
• demographic: Categorizes the user based on demographic classes and personal attributes.
• knowledge-based: Has knowledge about how a particular item meets a particular need of a user and can therefore recommend this item.

To achieve even better results, different approaches can be combined into hybrid recommendation systems [3]. The collaborative and the content-based filtering approaches are introduced in the next sections. These two were selected because they are a good fit for the available data, presented in section 3.2.
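The difference between the first two classes can be made concrete with a small sketch. It compares items once by who interacted with them (collaborative) and once by what they contain (content-based). Cosine similarity is used here purely as an assumed, common similarity measure; it is not claimed to be the method of this thesis, and all user names, record ids and keywords are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Collaborative view (hypothetical data): item -> {user: rating}.
ratings = {
    "r1": {"max": 1.0, "anna": 1.0},
    "r2": {"max": 1.0, "anna": 1.0, "tom": 1.0},
    "r3": {"tom": 1.0},
}

# Content view (hypothetical data): item -> {keyword: weight}.
keywords = {
    "r1": {"higgs": 1.0, "atlas": 1.0},
    "r2": {"drupal": 1.0, "website": 1.0},
    "r3": {"higgs": 1.0, "cms": 1.0},
}

# r1 and r2 share users but no keywords; r1 and r3 share a keyword but no users.
collab_sim = cosine(ratings["r1"], ratings["r2"])
content_sim = cosine(keywords["r1"], keywords["r3"])
```

The two views can disagree: `r1` and `r2` look related to a collaborative filter because the same users interacted with both, while a content-based filter sees no overlap between them and instead relates `r1` to `r3`.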

2.1.1 Collaborative Filtering

The collaborative filtering approach works by collecting the interactions of users with items. For example, a user Max has rated some records. Then his user profile can be used to discover neighbors, which are other users, that