The Solution Distribution of Influence Maximization PDF

Karaté du Bushido Club de. Trappes excelle encore une fois pionnats des Yvelines le BCT a ... http://www.karate-trappes.fr. Fehd Benbouabid.

Vue PDF

24 Oct 2018 du club de Karaté de Trappes-en-Yvelines où il s'entraine 5 fois par semaine. ... KARATE (B.C.T) Bushido Club de Trappes.

Russek: Physiotherapy Management for Children

18 Jul 2019 dance karate forms. Select Appropriate Activities. More Challenging for EDS/HSD. • High impact: running

Vue PDF

11 Jul 2019 espoir du karaté club de Trappes : Rayyan Méziane. ... dans le milieu du karaté. ... KARATE (B.C.T.) Bushido Club de trappes.

List of Employers who received payments under the Temporary

ATHY KENPO KARATE. UNIT 1 BLOCK G. ATHY BUSINESS CAMPUS. KILKENNY ROAD ATHY BCT AVATION MAINTENANCE LIMITED. HEAD OFFICE CENTENNIAL HOUSE EAST MIDLANDS ...

Folder ID: 30036081 Date: Fonds: ISAD Reference Code: WB IBRD

24 Oct 2018 the screen Dan Rather being felled by a karate chop

29th BCT troops return

Kuwait at the 29th BCT welcome home ceremony at Mar- Soldiers who wear the 29th BCT ... karate uniform a 10 percent discount on driver's educa-.

ÉLECTIONS

Rayyane Médiane le nouvel espoir du Karaté Personnellement

The Solution Distribution of Influence Maximization

22 Mar 2020 RIS [7] TIM+ [70] IMM [69]

Longstanton Life

3 May 2018 The information in The Longstanton Life is provided in good faith and we have tried to ensure that it is accurate and correct.

The Solution Distribution of Influence Maximization A High-level Experimental Study on Three Algorithmic Approaches

Naoto Ohsaka

NEC Corporation

naoto.ohsaka@gmail.com ABSTRACTIn?uence maximization is among the most fundamental al- gorithmic problems in social in?uence analysis. Over the last decade, a great e?ort has been devoted to developing e?cient algorithms for in?uence maximization, so that iden- tifying the "best" algorithm has become a demanding task. In SIGMOD"17, Arora, Galhotra, and Ranu reported benchmark results on eleven existing algorithms and demonstrated that there is no single state-of-the-art o?ering the best trade-o? between computational e?ciency and solution quality. In this paper, we report a high-level experimental study on three well-established algorithmic approaches for in?uence maximization, referred to as Oneshot, Snapshot, and Reverse In?uence Sampling (RIS). Di?erent from Arora et al., our ex- perimental methodology is so designed that we examine the distributionof random solutions, characterize the relation be- avoidimplementation dependencies. Our main ?ndings are as follows:1.For a su?ciently large sample number, we obtain a unique solution regardless of algorithms.2.The average solution quality of Oneshot, Snapshot, and RIS improves at the same rate up to scaling of sample number.3.Oneshot requires more samples than Snapshot, and Snapshot requires fewer but larger samples than RIS. We discuss the time e?- ciency whenconditioningOneshot, Snapshot, and RIS to be of identical accuracy. Our conclusion is that Oneshot is suit- able only if the size of available memory is limited, and RIS is more e?cient than Snapshot for large networks; Snapshot is preferable for small, low-probability networks. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for pro?t or commercial advantage and that copies bear this notice and the full citation on the ?rst page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior speci?c permission and/or a fee. Request permissions from permissions@acm.org.

SIGMOD"20, June 14-19, 2020, Portland, OR, USA

2020 Copyright held by the owner/author(s). Publication rights licensed

to ACM.

ACM ISBN 978-1-4503-6735-6/20/06...$15.00

https://doi.org/10.1145/3318464.3380564ACM Reference Format: Naoto Ohsaka. 2020. The Solution Distribution of In?uence Max- imization: A High-level Experimental Study on Three Algorith- mic Approaches. InProceedings of the 2020 ACM SIGMOD Inter- national Conference on Management of Data (SIGMOD"20), June

14-19, 2020, Portland, OR, USA.ACM, New York, NY, USA,16 pages.

https://doi.org/10.1145/3318464.3380564

1 INTRODUCTION

Social in?uence among individuals plays an immense role in decision making and information acquisition, and the rise of online social networks has empowered it to spread out at a tremendous scale. Understanding, predicting, and con- trolling social in?uence and its di?usion have become a big ?eld of research called computational social in?uence [11]. Among the most actively studied algorithmic problems in this ?eld is thein?uence maximizationproblem [35,36], ini- tially motivated by viral marketing [21]. Conceptually, in- ?uence maximization involves identifying a small number ofseedindividuals in the network who can maximize the spread of in?uence. Kempe, Kleinberg, and Tardos [35] in

2003 formulated in?uence maximization as a combinatorial

optimization problem on graphs, and their framework has been broadly accepted in the research community as well as database [ 17 19 23
24
30
34
60
63
65
68
70
One striking topic is the development of e?cient algo- rithms. Under two well-established di?usion models called independent cascade [25] and linear threshold [28], Kempe et al. [35] proved that it is NP-hard to ?nd the optimal so- lution, but the objective function referred to as thein?u- ence spreadenjoys an excellent property called submodular- ity. A naturalGreedy algorithmthus guarantees a(1-1/e)- approximation [54]. However, the sheer size of today"s real- world networks and the stochastic nature of the di?usion process make it more challenging to execute the Greedy algo- rithm. Consequently, this topic has been an active research area for the past ten-odd years (see, e.g., [ 3 12 43

1.1 Quick Review of Existing Approaches

Our focus in this paper is on the empirical behavior of ex- isting algorithms for in?uence maximization. Let us quickly reviewthem(seeSection 3 fordetails).TheGreedyalgorithm su?ers from the intractability of evaluating the in?uencearXiv:2003.09816v1??[cs.SI]??22?Mar?2020 spread, which is de?ned as the expectation over exponen- tially many realizations. The requirement is thus an e?cient scheme for approximating the in?uence spread. The basic idea for this requirement is to construct an (un- based on this idea can be classi?ed into three approaches, namely,Oneshot,Snapshot, andReverse In?uence Sampling (RIS). Each of them is parameterized by a single parameter called thesample numberthat trades between the computa- tional complexity and the solution quality. Up to now, there have been two di?erent directions for re- rithms aim to determine the least sample number required to guarantee the worst-case approximation factor theoretically. On the other hand,Oneshot- andSnapshot-type algorithms though comparing such algorithms designed for distinct pur- poses is quite complicated,RIS-type algorithms have been regarded as the state-of-the-art supported by the near-linear time complexity "in theory" [30,60,68-70]. Note that there exist numerous heuristics that provide in?uence estimates quickly, but they often result in poorly in?uential solutions.

1.2 Benchmarking Study in 2017

Arora, Galhotra, and Ranu [2] published a paper titled"De- bunking the Myths of In?uence Maximization: An In-Depth Benchmarking Study"at SIGMOD 2017. The authors exposed a pitfall in comparing in?uence maximization algorithms by experiments through an exhaustive benchmarking study on eleven existing algorithms [15-17,24,26,27,31,44,62,69,

70]. The results indicated that1.the algorithmic e?ciency is

sensitive to the choice of problem instances, and2.no single state-of-the-art achieves the best trade-o? among computa- tion time, memory consumption, and solution quality. Unfor- tunately, Arora et al."s experimental methodology still con- tains several ?aws as pointed out by Lu, Xiao, Goyal, Huang, and Lakshmanan [49]. For example, Arora et al. [2] used for each algorithm, a ?xed parameter value determined based on preliminary experimental results, which does not match the research focus ofRIS-type algorithms [49, Sect. 3.1]. This study complements previous studies from a di?erent aspect.

1.3 Our Motivations

In this paper, we present a high-level experimental study on existing algorithmic approaches for in?uence maximization. Rather than attempting to choose the best one among them, we would like to clarify their potential applicability. Our key objective under this purpose is to elucidate the empirical impact of the sample number on the solution distribution for each algorithmic approach as driven by the following three facts.1.Existing algorithms ar erandomize d.

Since an in?u-

ence estimator is randomized, each algorithm run gener- ates random solutions as well. Despite this nature, most of the previous studies conducted few-trials experiments only, e.g., the number of trials is 3 in [70], 5 in [69], 10 in [13,15,47], 20 in [30], 50 in [16], and not explicitly stated in [14,17,19,24,26,27,31,38,39,56,57,60-62]; conclu- sions based on them would be questionable. In this paper, we analyze the empirical distribution of random solutions made from 1,000 trials to gain a deeper understanding of the stochastic behavior of randomized algorithms. 2.

Sample numbercontrolstheactualsolutionquality.

Selecting an appropriate sample number is crucial, but

61,68-70], and thus, we have no simple way to compare

RIS-type algorithms with the other two types. Indeed, suchworst-caselower bounds are too loose to explain the empirical success fully. In this paper, we run algorithm implementations for a wide range of sample numbers, the sample number and the actual in?uence. We can, for example, ?nd the minimum sample number required to obtain near-optimal solutions with high probability. 3.

Ther eis a plethora of algorithm implementations.

We aim to evaluate the algorithmic e?ciency, as well; however, a complete experimental comparison of exist- ing implementations is hard for two reasons. First, one can neither completely understand nor modify complex source codes published by many di?erent research groups. For example, on Arora et al. [2]"s setup, SimPath [27] got stuck in an in?nite loop due to its di?erent scheme for handling graph data [49, Sect. 3.2.2]. Second, we have RAM usage, which severely depend on implementations and machine con?gurations. In this paper, to avoid these implementation dependencies, we make use of simple im- plementations that capture the essence of each approach. We measure the number of vertices and edges traversed (traversal cost) and those stored in memory (sample size). Remark that the former (resp. latter) is proportional to running time (resp. memory usage), where the propor- tionality constant depends on testing environments.

1.4 Our Findings

Our empirical ?ndings are summarized as follows (see Sec- tion 5 for mor edetails). •Distribution of solutions : We ?rst reveal how solution distributions converge to what distribution. We ?nd that Oneshot,Snapshot, andRISreturn the unique solution for a su?ciently large sample number. We further ?nd that the Shannon entropy of solution distributions ofOneshot, Snapshot, andRISdrops at the same rate up to scaling of sample number. •Distribution of in?uence spread : We then analyze the empirical distribution of in?uence spread. The minimum sample number required to obtain near-optimal solutions with probability 99% takes a wide range of values; e.g., from 64 to 8,192 forOneshot. Those empirical numbers are far smaller than worst-case bounds that depend on the graph size and seed size. Comparing among the three approaches,the mean value improves at the same rate up to scaling of sample size. To achieve the same mean in?uence,Oneshotrequires up to 96 times as many samples asSnapshotrequires,Snapshotrequires "fewer" (say,105 times fewer) but "larger" (say,103times larger) samples than

RIS; i.e.,RISis more space-saving thanSnapshot.

•Computational e?ciency : We ?nally report the traver- edges causes expensive graph traversal due to the emer- gence of agiant component[5,33,64]. Theper-sample traversal-cost ratio amongOneshot,Snapshot, andRISis that1 :mm:1n, wherenis the number of vertices,mis the number of edges, andmis the sum of all edge probabilities representing the magnitude of in?uence, showing thatRIS is the most per-sample time-e?cient.

Section

6 further discusses the trav ersalcost when condi- tioningOneshot,Snapshot, andRISto be of identical accuracy. available memory is limited, and2.RISis more e?cient than Snapshotfor large complex networks;Snapshotis preferable for small, low-probability networks. Organization.Section2 intr oducesformal de?nitions and known results of in?uence maximization. Section 3 is de- voted to a systematic survey of existing algorithms. Section 4 designs our experimental methodology. Sections 5 and 6 r e- port and discuss experimental results, respectively. Section 7 lists future directions of this study.

2 INFLUENCE MAXIMIZATION

2.1 Notations

For a positive integerℓ, let[ℓ]denote the set{1,2,...,ℓ}. We deal with two types of graph. One is a (deterministic) graphG=(V,E), whereVis a set of vertices andEis a set of edges. The other is anin?uence graphG=(V,E,p), which captures the stochastic nature of network di?usion, where Vis the vertex set,Eis the edge set, andp:E→ (0,1]is an in?uence probabilityfunction representing the magnitude of in?uence between a pair of vertices. For a vertexvinG, we useΓ+G(v)andΓ-G(v)to denote the out-neighbors and the in-neighbors ofv, respectively, and we used+G(v)andd-G(v) to denote the out-degree and the in-degree ofv, respectively. For a vertex setS, letrG(S)denote the number of vertices reachable fromSinG. We will omit the subscripts when Gis clear from the context and use the same notations for in?uence graphG. We conclude this paragraph by de?ning in?uence maximization [35], where the de?nition of the in?uence spread is deferred.

Problem 2.1 (Influence maximization).Given an in-

?uence graphG=(V,E,p)and a seed sizek, thein?uence maximizationproblem is to ?nd a seed setS⊆Vof sizekthat maximizes the in?uence spread ofS.

2.2 Di?usion Model

Network di?usion modelsdescribe the process by which in?u- ence (e.g., information, contamination, and virus) triggered by a set of seed vertices spreads over the network. In this paper, we adopt one of the well-studied models in the liter- ature of in?uence maximization. Theindependent cascade (IC)model introduced by [25] mimics the spread of infec- tious diseases. In the IC model, each vertex takes either of the two states:activeorinactive. An inactive vertex may become active but not vice versa. Let us de?neAtas the set of newly activated vertices at discrete time stept, and t are initially activated at time stept=0and the others are inactive, i.e.,A0=S. GivenAtat each time stept, we con- structAt+1as follows. Each newly activated vertexu∈Atis given a single chance to in?uence its inactive out-neighbors the case,vbecomes active at time stept+1, i.e.,vis added intoAt+1. This repetition terminates within a ?nite time step Thein?uence spreadof a seed setSinG, denotedInfG(S), is de?ned as the expected number of activated vertices by SinceInfG(·)can be viewed as a function on a subset of vertices, we callInfG: 2V→R≥0anin?uence function. Here, we describe therandom-graph interpretation[35] that characterizes the IC model. For an in?uence graphG= (V,E,p) , consider the distribution over deterministic graphs (V,E′), whereE′is obtained fromEby maintaining each edgeewith probabilityp(e). We useG∼ Gto mean that Gis arandom graphsampled from this distribution. Then, the in?uence spread of a seed setSinGis equal to the expected number of vertices reachable fromSinG∼ G, i.e., InfG(S)=EG∼G[rG(S)]. This fact tells us that we do not have to consider the chronological order of activation trials, but we need to consider reachability on random graphs.

2.3 Intractability and Approximability

Let us recall the intractability. Formally, it has been proven that for the IC model, in?uence maximization is NP-hard Table 1: Three popular algorithmic approaches for in?uence maximization.

approachsample number sample size exp. traversal cost(atk=1)time complexity(typical value)=(# vertices)+(# edges)vertexedge(naive impl.)

Oneshot# simulationsβ(104)-β

min exp. (m:=Í

ep(e))τ vInfG(v))θ 1n maxvInfG(v)O(θmn maxvInfG(v))Table 2: Representative existing algorithms. approachrepresentatives

OneshotCELF [44], CELF++ [26], UBLF [73,74 ], SIEA [58,59 ].SnapshotBondPercolation[39,40],NewGreedy[14],MixedGreedy[14],

StaticGreedy [

17 ], PMC [ 62
], SKIM [ 19 ].RISRIS [7] TIM +[70], IMM [69], LISA [20,56 ], BCT [55,61 ], SSA [ 60
], SSA-Fix [ 30
], WebGraph framework [ 65
], OPIM [ 68
].to solve exactly [35, Theorem 2.4], and it is even♯P-hard to compute the in?uence spread exactly [ 13 , Theorem 1]. Albeit the negative results, we can obtain approximation. The striking result of Kempe et al. [35, Theorem 2.2] is that the in?uence function is monotone and submodular. Here, a set functionf: 2V→Ris said to bemonotoneif it holds it holds thatf(S+v) -f(S) ≥f(T+v) -f(T)whenever S⊆T⊆V,v∈V\T. The classical result on submodular functions by Nemhauser, Wolsey, and Fisher [54] then tells us that the simple Greedy, which iteratively adds an element produces a(1-1/e)-approximate solution, i.e., it holds that f(S) ≥ (1-1/e)f(S∗), whereSis thek-seed Greedy solution andS∗is the optimal solution of sizek. Putting it all together, we have that Greedy achieves a constant-factor approximation in polynomial time. The in- ?uence spread can be approximated within a factor of(1±ϵ) for anyϵ>0by running simulationsΩ(ϵ-2n2)times [37].

3 ALGORITHMS REVIEW

In this section, we review existing algorithms for in?uence maximization systematically. Based on the mechanism of in?uence estimation, we partition existing algorithms into three approaches shown in Table 1 , namely,Oneshot(Sec- tion 3.3 ),Snapshot(Section3.4 ),Reverse In?uence Sampling (RIS)(Section3.5 ), and others (Section3.6 ).

3.1 Kempe et al."s Greedy Algorithm

Before taking up each approach, we explain the Greedy algo- adds an element that makes the largest marginal increase in in?uence into the solution untilkelements have been added. Remark that the in?uence function is evaluated for at mostnkseed sets. Due to monotonicity and submodularity of the in?uence function, the resulting solution is a(1-1/e)- approximation [54]; this factor is the best possible [22]. By contrast, it often provides near-optimal (e.g., 0.95 times the optimum) solutions empirically [ 41
44
67

].Algorithm 3.1Simple greedy framework.1:callBuild(G=(V,E,p),sample number)andS0← ∅.2:randomly shu?e the order of vertices inV.

3:for allℓ=1,2, . . .,kdo4:callEstimate(Sℓ-1,v)for allv∈V.5:vℓ←last vertex with maximum (marginal) in?uence.

7:returnSk.

The limitation of Kempe et al."s Greedy algorithm is that we are unable to evaluateInf(·)exactly. Fortunately, even if we are given only an approximate value oracle, which ap- proximates the actual value within a factor of(1±ϵ), running Monte-Carlo simulations can be employed for this purpose; however, running them fornkseed sets is computationally prohibitive, e.g., it takes a few days on graphs with ten thou- sand vertices [38]. Subsequent studies thus seek e?cient and accurate methods for estimating the in?uence spread.

3.2 A Simple Greedy Framework

Here, we introduce a simple greedy framework to describe existing algorithms in a uni?ed manner. Most of the existing algorithms fall into this framework, and Table 2 lists up representative algorithms that belong to either of the three approaches. Our greedy framework shown in Algorithm 3.1quotesdbs_dbs26.pdfusesText_32

[PDF] BCTF Code of Ethics - BC Teachers` Federation

[PDF] BCTF Social Justice Program - Anciens Et Réunions

[PDF] BCU Verso 350 HD GT PN Nu

[PDF] BCV-net e-Code - Anciens Et Réunions

[PDF] BC™ Caméra de recul sans fil BC BC 30

[PDF] bd - coefficients binomiaux - Anciens Et Réunions

[PDF] BD - Inria

[PDF] BD - Quartier lointain - Insuf-FLE

[PDF] BD - Saint

[PDF] BD - «La femme est l`avenir de l`homme

[PDF] BD : Malade d`amour, à Bombay

[PDF] BD Astérix, Lucky Luke et Tintin - Figurines

[PDF] BD de filles - Pointe

[PDF] Bd de Strasbourg BP 465 – MORONI - Comores - Gestion De Projet

[PDF] BD drôles et drôles de BD - Détail Club BD du mercredi 30 - Anciens Et Réunions

[PDF] The Solution Distribution of Influence Maximization