The phenomenon of influence propagation through social networks has attracted a large body of research. A key function of an online social network (OSN), besides sharing, is that it enables users to express their personal opinions about a product or a news trend by means of posts, likes/dislikes, etc. Such opinions are propagated to other users and may influence them significantly, either positively or negatively.
The real world is full of imprecision and uncertainty, and this necessarily affects OSN data. Indeed, social interactions cannot always be precise and certain; moreover, OSNs allow only limited access to their data, which generates more imprecision and uncertainty. If we ignore this imperfection, we may end up with erroneous analysis results. In such situations, the theory of belief functions [1, 2] has been widely applied. Furthermore, this theory has been used for analyzing social networks [3–5].
Influence maximization (IM) is the problem of finding a set of k seed nodes that are able to influence the maximum number of nodes in the social network. Many solutions to the IM problem exist in the literature. Kempe et al. [6] propose two propagation simulation models, the Linear Threshold Model (LTM) and the Independent Cascade Model (ICM). The credit distribution (CD) model [7] is a data-based approach that investigates past propagation to detect influencers. However, these solutions do not consider the user’s opinion. Zhang et al. [8] propose an opinion-based cascading model that considers the user’s opinion about the product. However, their work does not rely on real-world data to estimate the user’s opinion and influence.
In this paper, we propose a new data-based model for influence maximization in online social networks that seeks to detect influencer users who hold a positive opinion about the product. The proposed model is data-based because we use past propagation to estimate the influence, and users’ messages to estimate the opinion. In addition, it uses the theory of belief functions to estimate the influence, in order to deal with data imprecision and uncertainty. To the best of our knowledge, the proposed model is the first evidential data-based model that maximizes the influence on OSNs, detects influencer users having a positive opinion about the product and uses the theory of belief functions to handle data imperfection.
The remainder of this paper is organized as follows: Section 2 introduces the proposed model for maximizing the positive opinion influence, and Section 3 shows the performance of our model through some relevant experiments. Finally, the paper is concluded in Section 4.
In this section, we present our positive opinion influence measure and the proposed influence maximization algorithm.
2.1 Influence measure
Consider a social network $G = (V, E)$, a frame of discernment expressing opinion $\Theta = \{Pos, Neg, Obj\}$, with $Pos$ for positive, $Neg$ for negative and $Obj$ for objective, a frame of discernment expressing influence and passivity $\Omega = \{I, P\}$, with $I$ for influencer and $P$ for passive user, a probability distribution $Pr^{\Theta}_{u}$ defined on $\Theta$ that expresses the opinion of the user $u \in V$ about the product, and a basic belief assignment (BBA) function [2], $m^{\Omega}_{(u,v)}$, defined on $\Omega$ that expresses the influence that the user $u$ exerts on the user $v$. The first step of the influence maximization process is to measure the influence of each user in the network. We therefore propose an influence measure that estimates the positive influence of each user in the network.
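The mass value $m^{\Omega}_{(u,v)}(I)$, which the BBA of the pair $(u, v)$ assigns to the influencer hypothesis $I$, measures the influence of $u$ on $v$ without considering the opinion of $u$ about the product. We define the positive opinion influence of $u$ on $v$ as the positive proportion of $m^{\Omega}_{(u,v)}(I)$, where $Pr^{\Theta}_{u}(Pos)$ is the probability that $u$ holds a positive opinion, and we measure this proportion as:

$$Inf^{+}(u, v) = Pr^{\Theta}_{u}(Pos) \cdot m^{\Omega}_{(u,v)}(I) \qquad (1)$$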
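As a minimal sketch of this measure, assuming the opinion probability and the influence mass have already been estimated, the positive opinion influence of one user on another is a single product. The function name `positive_influence` and the numeric values are hypothetical and serve only as an illustration.

```python
def positive_influence(pr_pos_u: float, mass_influencer_uv: float) -> float:
    """Positive opinion influence of u on v.

    pr_pos_u           -- probability that u holds a positive opinion
    mass_influencer_uv -- mass the BBA of (u, v) gives to the hypothesis I
    """
    if not (0.0 <= pr_pos_u <= 1.0 and 0.0 <= mass_influencer_uv <= 1.0):
        raise ValueError("both inputs must lie in [0, 1]")
    return pr_pos_u * mass_influencer_uv


# Hypothetical values: u is positive with probability 0.8, and the BBA of
# the pair (u, v) assigns mass 0.5 to the influencer hypothesis I.
inf_uv = positive_influence(0.8, 0.5)
print(inf_uv)
```

A user with a strong influence mass but a mostly negative or objective opinion therefore receives a small positive opinion influence.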
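Next, we define the amount of influence given to a set of nodes $S \subseteq V$ for influencing a user $v \in V$. Writing $Inf^{+}(u, x)$ for the positive opinion influence of $u$ on $x$, we estimate the influence of $S$ on a user $v$ as follows:

$$Inf^{+}(S, v) = \begin{cases} 1 & \text{if } v \in S \\ \sum_{u \in S} \sum_{x \in D_{IN}(v) \cup \{v\}} Inf^{+}(u, x) \cdot Inf^{+}(x, v) & \text{otherwise} \end{cases} \qquad (2)$$

such that $Inf^{+}(v, v) = 1$ and $D_{IN}(v)$ is the set of nodes in the indegree of $v$. Finally, we define the influence spread $\sigma(S)$ under the evidential model as the total influence given to $S \subseteq V$ from all nodes in the social network as $\sigma(S) = \sum_{v \in V} Inf^{+}(S, v)$. In the spirit of the IM problem, as defined by Kempe et al. [6], $\sigma(S)$ is the objective function to be maximized.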
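The influence of a seed set on a user and the resulting spread can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. Pairwise positive influences are stored in a dictionary keyed by `(u, v)`, missing pairs count as zero, and the self-influence of every node is taken to be 1. The names `pair_influence`, `influence_on` and `spread` and the toy graph are hypothetical.

```python
from typing import Hashable, Iterable, Mapping, Set, Tuple

Node = Hashable


def pair_influence(inf: Mapping[Tuple[Node, Node], float], u: Node, x: Node) -> float:
    """Pairwise positive influence; assumption: 1 for u == x, 0 if unknown."""
    if u == x:
        return 1.0
    return inf.get((u, x), 0.0)


def influence_on(seeds: Set[Node], v: Node,
                 inf: Mapping[Tuple[Node, Node], float],
                 in_neighbors: Mapping[Node, Set[Node]]) -> float:
    """Influence of the seed set on v: 1 for a seed, otherwise the sum over
    seeds u and relays x (in-neighbors of v plus v itself) of
    influence(u, x) * influence(x, v)."""
    if v in seeds:
        return 1.0
    relays = set(in_neighbors.get(v, set())) | {v}
    return sum(pair_influence(inf, u, x) * pair_influence(inf, x, v)
               for u in seeds for x in relays)


def spread(seeds: Set[Node], nodes: Iterable[Node],
           inf: Mapping[Tuple[Node, Node], float],
           in_neighbors: Mapping[Node, Set[Node]]) -> float:
    """Influence spread: total influence of the seed set over all nodes."""
    return sum(influence_on(seeds, v, inf, in_neighbors) for v in nodes)


# Hypothetical toy network: a -> b -> c with made-up pairwise influences.
nodes = ["a", "b", "c"]
inf = {("a", "b"): 0.5, ("b", "c"): 0.4}
in_neighbors = {"b": {"a"}, "c": {"b"}}
print(spread({"a"}, nodes, inf, in_neighbors))
```

Because the self-influence is set to 1 for every node in this sketch, a seed that is a direct in-neighbor of v contributes both through the relay x = u and through x = v; this follows from applying the double sum literally.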
2.2 Influence maximization
In this section, we present the evidential positive opinion influence maximization model. Its purpose is to find a set of nodes $S$ that maximizes the objective function $\sigma(S)$. Given a directed social network $G = (V, E)$ and an integer $k \leq |V|$, the goal is to find a set of users $S \subseteq V$, $|S| = k$, that maximizes $\sigma(S)$. We proved that $\sigma(S)$ is monotone and sub-modular, and that influence maximization under the proposed model is NP-hard. However, the page limit prevents us from presenting the proofs in detail.
Because influence maximization under the evidential positive opinion influence maximization model is NP-hard, we resort to the greedy algorithm, which gives a good approximation of the optimal solution when the objective function is monotone and sub-modular (within a factor of $1 - 1/e$ [6]), especially when it is used with the formula

$$\sigma(S \cup \{x\}) - \sigma(S) \qquad (3)$$

that computes the marginal gain of a candidate node $x$. We choose the cost-effective lazy-forward algorithm (CELF) [9], which is a two-pass modified greedy algorithm. It exploits the sub-modularity property of the objective function and is about 700 times faster than the basic greedy algorithm. The CELF-based evidential influence maximization algorithm starts by estimating the marginal gain of all users in the network and sorts them according to their marginal gain; it then selects the user that has the maximum marginal gain and adds it to the seed set $S$. After that, the algorithm iterates over the following steps until $|S| = k$: 1) choose the next user in the list, 2) update its marginal gain (formula (3)), and 3) if the chosen node keeps its position in the list (it is still the maximum), add it to $S$.
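The lazy-forward selection described above can be sketched with a priority queue. This is a generic CELF-style sketch for any monotone sub-modular set function, not the authors' code. The function `celf`, its parameter `sigma` and the toy coverage objective are hypothetical stand-ins; in the setting of this paper `sigma` would be the influence spread.

```python
import heapq
from typing import Callable, FrozenSet, Hashable, List, Sequence

Node = Hashable


def celf(nodes: Sequence[Node], k: int,
         sigma: Callable[[FrozenSet[Node]], float]) -> List[Node]:
    """Select up to k seeds by lazily re-evaluating marginal gains."""
    seeds: List[Node] = []
    current = sigma(frozenset())
    # First pass: marginal gain of every node with respect to the empty set.
    # Entries are (-gain, tie_breaker, node, seed-set size when computed).
    heap = [(-(sigma(frozenset([x])) - current), i, x, 0)
            for i, x in enumerate(nodes)]
    heapq.heapify(heap)
    while len(seeds) < k and heap:
        neg_gain, i, x, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # The gain is up to date and still the maximum: accept the node.
            seeds.append(x)
            current += -neg_gain
        else:
            # Stale gain: recompute it against the current seed set and
            # push the node back so it competes with the other candidates.
            gain = sigma(frozenset(seeds) | {x}) - current
            heapq.heappush(heap, (-gain, i, x, len(seeds)))
    return seeds


# Toy monotone sub-modular objective (set coverage), for illustration only.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}


def coverage(s: FrozenSet[Node]) -> float:
    return float(len(set().union(*(reach[x] for x in s)))) if s else 0.0


print(celf(["a", "b", "c"], 2, coverage))
```

Sub-modularity guarantees that a stale gain is an upper bound on the true gain, so a node whose refreshed gain still tops the queue can be accepted without re-evaluating the remaining candidates.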
In this section, we conduct some experiments on real-world data. We used the library Twitter4j, which is a Java implementation of the Twitter API, to collect Twitter data. We crawled the Twitter network for the period between 08/09/2014 and 03/11/2014, and we filtered our data by keeping only tweets that talk about smartphones and users that have at least one tweet in the database. To estimate the opinion polarity of each tweet in our data set, we first used the Java library “Stanford POS Tagger” with the model “GATE Twitter part-of-speech tagger”, which is designed for tweets. This step gives a tag (verb, noun, etc.) to each word in the tweet. We then estimated the opinion polarity of each tweet using the SentiWordNet 3.0 dictionary and the tags from the first step.
We estimated the BBA $m^{\Omega}_{(u,v)}$, which expresses the influence of $u$ on $v$, using the network structure and past propagation between $u$ and $v$. First, we calculated the number of common neighbors between $u$ and $v$, the number of tweets where $u$ mentions $v$ and the number of tweets where $v$ retweets from $u$. Then, we used the process defined by Wei et al. [10] to estimate a BBA for each defined variable. Finally, we combined the resulting BBAs to obtain $m^{\Omega}_{(u,v)}$. In this section, we call belief model our model in which we use $m^{\Omega}_{(u,v)}(I)$ as the influence measure, CD model the credit distribution model, and opinion model the proposed positive opinion based model.
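The combination step can be illustrated with Dempster's rule on the two-element frame with hypotheses I (influencer) and P (passive). The text does not name the combination rule that was used, so Dempster's rule is an assumption here, and the function `dempster_combine` and the two example BBAs are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

BBA = Dict[FrozenSet[str], float]

I = frozenset({"I"})   # influencer
P = frozenset({"P"})   # passive user
OMEGA = I | P          # total ignorance


def dempster_combine(m1: BBA, m2: BBA) -> BBA:
    """Combine two BBAs with Dempster's rule (assumed, not from the text)."""
    combined: BBA = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass falling on the empty set measures the conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the BBAs cannot be combined")
    # Normalize by the non-conflicting mass.
    return {focal: w / (1.0 - conflict) for focal, w in combined.items()}


# Hypothetical BBAs, e.g. one derived from mentions and one from retweets.
m_mentions: BBA = {I: 0.6, OMEGA: 0.4}
m_retweets: BBA = {I: 0.3, P: 0.2, OMEGA: 0.5}
m_uv = dempster_combine(m_mentions, m_retweets)
print(m_uv[I])
```

Since the rule is associative and commutative, a third BBA (for instance one derived from common neighbors) can be folded in by calling the function again on the result.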
The goal of the first experiment is to show that the proposed model detects influential spreaders well. To examine the quality of the selected seeds, we fixed four comparison criteria: the number of followers (#Follow), the number of tweets (#Tweet), and the number of times the user was mentioned (#Mention) and retweeted (#Retweet). Indeed, we assume that an influencer on Twitter is necessarily very active, so he has many tweets, is followed by many users in the network, is frequently mentioned, and has his tweets retweeted many times. In Figure 1, we compare the maximization results of the proposed opinion model with those of the CD model and the belief model according to these criteria. We see that the proposed opinion model detects influencers that have many followers (more than 8000 for 50 influencers), many tweets (over 250 for 50 users), many mentions (about 1200) and many retweets (about 800). In contrast, users detected using the belief model score well on only two criteria, namely #Follow (over 8000 followers for 50 users) and #Tweet (over 150 tweets for 50 users), and the CD model does not satisfy any criterion. This shows that the opinion model is the best at detecting influencers.
In a second experiment, we calculated the mean positive opinion of the first 100 influencer users. The proposed model performed well by selecting influencers that have a positive opinion about the product: it gives a mean positive opinion of 0.89, whereas the belief model gives 0.34 and the CD model gives only 0.09. These results show that the proposed model selects influencer users that have a positive opinion, which the belief and CD models do not.
Figure 1: Comparison between opinion model, belief model and CD model according to #Follow, #Mention, #Retweet and #Tweet
In this paper, we proposed a new influence measure that estimates the positive opinion influence of OSN users. We used the theory of belief functions to deal with the problem of data imperfection. In future work, we will seek to improve the proposed influence maximization model by considering other parameters such as the user’s profile and the propagation time.
[1] A. P. Dempster, “Upper and lower probabilities induced by a multivalued mapping,” Annals of Mathematical Statistics, vol. 38, pp. 325–339, 1967.
[2] G. Shafer, A mathematical theory of evidence. Princeton University Press, 1976.
[3] S. Jendoubi, A. Martin, L. Lietard, and B. Ben Yaghlane, “Classification of message spreading in a heterogeneous social network,” in IPMU, July 2014, pp. 66–75.
[4] S. Jendoubi, A. Martin, L. Lietard, B. Ben Yaghlane, and H. Ben Hadj, “Dynamic time warping distance for message propagation classification in twitter,” in ECSQARU, July 2015, pp. 419–428.
[5] K. Zhou, A. Martin, and Q. Pan, “A similarity-based community detection method with multiple prototype representation,” Physica A, vol. 438, pp. 519–531, November 2015.
[6] D. Kempe, J. Kleinberg, and E. Tardos, “Maximizing the spread of influence through a social network,” in KDD, August 2003, pp. 137–146.
[7] A. Goyal, F. Bonchi, and L. V. S. Lakshmanan, “A data-based approach to social influence maximization,” in VLDB Endowment, August 2012, pp. 73–84.
[8] H. Zhang, T. N. Dinh, and M. T. Thai, “Maximizing the spread of positive influence in online social networks,” in ICDCS, July 2013, pp. 317–326.
[9] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance, “Cost-effective outbreak detection in networks,” in KDD, August 2007, pp. 420–429.
[10] D. Wei, X. Deng, X. Zhang, Y. Deng, and S. Mahadevan, “Identifying influential nodes in weighted networks based on evidence theory,” Physica A, vol. 392, no. 10, pp. 2564–2575, May 2013.