The most common way of processing evolving networks is by assuming they are edge streams.
To detect node events we use a sliding window strategy based on time instants. Thus, as time passes, the oldest stream objects are forgotten and only the most recent edges are considered when updating centrality values. In the following, we formally describe this process. According to our definition, the edge stream can contain many edge stream objects for the same discrete time instant t.
For example, on a Twitter interaction network with 1-day time instants, we can receive several edge stream objects per day. This window strategy is a good choice as it allows node events to be detected (i) without much processing effort, (ii) taking advantage of the scoring functions' semantics, and (iii) accounting for the rapidly evolving nature of online social networks.
Note that the window slides over two structures: edge stream objects and summary values. The stream objects are nothing more than the network evolving over time. Thus, sliding a window over these objects means that the centrality metrics used for event detection are always calculated on an up-to-date network in which old edges have been discarded. In the same way, values summarized in memory during stream processing are forgotten as they age and leave the window's cover. As we will present, summarization is also done as a function of time instants.
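The sliding window over edge stream objects can be sketched as follows. This is a minimal illustration, not the authors' implementation; the class and method names are our assumptions, and the window is taken to cover the W most recent time instants.

```python
from collections import deque

class SlidingWindow:
    def __init__(self, W):
        self.W = W            # window size, in time instants
        self.edges = deque()  # edge stream objects currently inside the window

    def add(self, u, v, t):
        """Insert a new edge stream object (u, v, t) and forget expired ones."""
        self.edges.append((u, v, t))
        # forget objects whose time instant has left the window (t - W, t]
        while self.edges and self.edges[0][2] <= t - self.W:
            self.edges.popleft()

    def snapshot(self):
        """Edges covered by the window: the up-to-date network."""
        return list(self.edges)

w = SlidingWindow(W=2)
w.add("a", "b", 1)
w.add("b", "c", 2)
w.add("c", "d", 3)  # edge ("a", "b", 1) is now outside the window
```

Note that several edge stream objects may share the same time instant t, matching the definition above; expiry depends only on the timestamp, not on insertion order.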
Computing centrality values. On line 6 we update node centrality values as a function of the new incoming edge.

Summarizing values. Each change-point scoring function requires different statistics to be summarized in memory.
Temporal Patterns of Communication in Social Networks (Springer Theses)
In this way, line 7 triggers a computation over the current values at t, and line 15 refreshes the summaries by forgetting statistics that fall outside the sliding window and computing the average of past values. Regarding the time complexity of NodeEventDetection, the most costly operation is on line 6, when centrality values are computed.
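The summarization step described above (forget statistics outside the window, then average the past values) can be sketched as follows. All names here are illustrative assumptions, not the authors' code; the summary is modeled as a per-instant dictionary of statistics.

```python
def refresh_summary(summary, t, W):
    """Drop statistics that left the window (t - W, t] and average past values.

    summary: dict mapping a time instant to the statistic recorded at it.
    """
    # forget instants that fell outside the sliding window
    for instant in [i for i in summary if i <= t - W]:
        del summary[instant]
    # average the values of strictly past instants still inside the window
    past = [v for instant, v in summary.items() if instant < t]
    return sum(past) / len(past) if past else None

summary = {1: 0.2, 2: 0.4, 3: 0.6}
avg_past = refresh_summary(summary, t=3, W=2)  # instant 1 is forgotten
```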
For the ranking score, there is the additional cost of O(V log V) to order the ranking. In fact, we address this issue as future work (see Sect.). In order to correlate user preference changes and node events in temporal social networks, we need a dataset (1) containing the information of when links occur in the network (the temporal network topology) and (2) some semantic information from which user preferences can be extracted (the network content). We chose two datasets to perform experiments: one based on Twitter data and another based on the social music website This Is My Jam.
In this work we follow this trend, profiling users by applying LDA, since our dataset does not contain explicitly elicited preferences. Every interaction, or retweet, between two users is associated with textual content.
Based on this corpus we apply LDA to extract 50 topics, such that each document (tweet) is represented by a topic distribution. With a larger k, the extra topics can be considered noise; on the other hand, choosing a small k may not separate the information precisely.

(Caption: Examples of some topics identified by LDA from Twitter data, and respective keywords manually assigned to them for better interpretability.)
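The topic-extraction step can be sketched with scikit-learn's LDA implementation (an assumption; the authors do not specify their tooling). The toy documents below are ours, and k is reduced from the paper's 50 to 2 only so the tiny corpus can support it.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# toy corpus standing in for the tweet texts
docs = [
    "corruption scandal government",
    "football match goal",
    "election vote politics",
]

counts = CountVectorizer().fit_transform(docs)
# k = 50 in the paper; 2 here only for the toy corpus
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one topic distribution per document
```

Each row of `doc_topics` is a probability distribution over topics, which is exactly the per-tweet representation the text describes.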
To extract pairwise preferences for each user we use the following strategy: if user u tweets or retweets about o at time t, then u has more interest in o than in the remaining topics of the domain at that moment. In this case, the top posted topic is preferred over the others, the second top posted topic is preferred over the remaining ones, and so on.
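A minimal sketch of this extraction, assuming the input is a count of posts per topic at a given instant t (the function name and input format are ours): topics are ranked by post count, and each topic is preferred over every topic ranked below it.

```python
from itertools import combinations

def pairwise_preferences(post_counts):
    """post_counts: {topic: posts at time t} -> list of (preferred, over) pairs."""
    ranked = sorted(post_counts, key=post_counts.get, reverse=True)
    # every topic is preferred over all topics ranked below it
    return list(combinations(ranked, 2))

# e.g. 4 posts on topic c, 3 on s, 2 on p and 1 on i at some instant t
prefs = pairwise_preferences({"c": 4, "s": 3, "p": 2, "i": 1})
# -> [("c","s"), ("c","p"), ("c","i"), ("s","p"), ("s","i"), ("p","i")]
```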
Noteworthy here is that the time t being considered depends on the time granularity in question, which can be 1 day or 1 month, for instance. Therefore, a user can post many tweets at the same t. As an example, suppose that John posts 4 times about corruption (c), 3 times about sports (s), 2 times about politics (p) and 1 time about international news (i) at time 3. Then c is preferred over s, p and i; s over p and i; and p over i.

(Figure: Snapshots of samples of the evolving interaction network. Nodes are Twitter users; colors represent the topics users are talking about at t. The samples were built by filtering nodes with degree between 50—22 and edges representing the 4 most popular topics. Each snapshot corresponds to a 1-day time interval; the figure highlights the evolving edges, while nodes are kept fixed for better visualization. Color figure online.)

In the TIMJ network, the directed edges represent the music influence flow. User preferences were extracted based on music genres. Originally, the TIMJ dataset does not contain genre annotations for jams.
From these music tracks, we used the ground truth CD2 from Schreiber to obtain song-level genre annotations. Only songs present in the ground truth were taken into account in our analysis. The pairwise preferences for each user are extracted from the genre of the current jam: if user u posted a jam annotated with genre o at time t, then u prefers o over the remaining genres in the domain at that moment.
Though our analysis is limited to the Twitter news and social music domains due to the availability of public datasets, we expect our results to generalize to other items such as movies, videos, books, vacation packages, shopping, etc. In both domains, user preferences were extracted from the content being shared, whereas the temporal networks were built from the interactions of users with their friends.

Centrality metrics. We consider two centrality measures: betweenness and closeness. These measures have different meanings, and our objective is to assess to what extent their evolution correlates with preference changes.
Formally, these nodes should have a small average shortest path length to the other nodes: the smaller the average shortest path length, the higher the node's closeness centrality. The betweenness centrality characterizes how important nodes are in connecting other nodes: for a node v, it counts the number of shortest paths between other nodes that pass through v. In this baseline approach, the authors also propose to spot change-points in a time-varying graph, at which many nodes deviate from their common behavior.
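The two centrality definitions above can be sketched on a toy snapshot with networkx (an assumption for illustration; the authors do not state their implementation).

```python
import networkx as nx

# toy snapshot of the network covered by the window
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")])

# closeness: inverse of the average shortest path length to all other nodes
closeness = nx.closeness_centrality(G)
# betweenness: fraction of shortest paths between other nodes passing through v
betweenness = nx.betweenness_centrality(G)

# "b" connects "a" to the rest of the graph, so it is the most central node
# under both measures, while "a" lies on no shortest path between others
```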
It is the work most related to ours due to two aspects: (i) the change-point based approach and (ii) the temporal dynamics of the network. The idea is to characterize a node with several features so that it becomes a multi-dimensional point. The Z score is computed as a function of the dot product between the current feature vector v and a typical feature-behavior r, which is the average of the past feature vectors.
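The baseline's score can be sketched as follows, under the assumption (ours) that the raw dot product is used as the agreement measure: a low agreement between the current vector and the typical behavior suggests a change-point.

```python
import numpy as np

def z_score(v, past_vectors):
    """Agreement between current feature vector v and typical behavior r."""
    r = np.mean(past_vectors, axis=0)  # typical behavior: average of past vectors
    return float(np.dot(v, r))         # low agreement suggests a change-point

past = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
stable = z_score(np.array([1.0, 0.0]), past)   # close to the typical behavior
changed = z_score(np.array([0.0, 1.0]), past)  # deviates from it
```

Here `stable` is high because the node behaves as before, while `changed` is low because the current feature vector points away from the average of the past ones.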
Social network modeling. We compare static networks with temporal networks. The difference is that in the temporal scenario we consider temporal (fastest) paths, as discussed in Sect.
In temporal networks the temporal order is taken into account, while in static networks it is not. Note that although static networks do not consider edges labeled with time instants, they are also analyzed over time, using the same sliding window. The difference between the two approaches is essentially that, inside the window being analyzed, time instants are either considered (temporal) or ignored (static) when computing node centrality.
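A toy illustration of this distinction (our sketch, not the authors' code): a path may exist in the static view of the window even when no time-respecting temporal path does, because a temporal path must traverse edges in non-decreasing time order.

```python
# edge stream objects inside the window: (source, target, time instant)
edges = [("a", "b", 3), ("b", "c", 1)]

def static_reachable(edges, src, dst):
    """Reachability ignoring time labels entirely (static view)."""
    adj = {}
    for u, v, _ in edges:
        adj.setdefault(u, []).append(v)
    stack, seen = [src], set()
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(adj.get(n, []))
    return False

def temporal_reachable(edges, src, dst):
    """Reachability via a time-respecting path (temporal view)."""
    reached = {src: 0}  # earliest arrival time at each node
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if u in reached and reached[u] <= t:
            reached[v] = min(reached.get(v, t), t)
    return dst in reached

# a -> c exists statically, but not temporally: edge (b, c) happens before (a, b)
```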
Datasets. We vary the time granularity of the social temporal networks Twitter and Jam. In the Jam network, the time granularities are month, semester and year. In the Twitter network, we consider day, week and month.
Thus, in all we have six social networks, related to the news and music domains.

Window size W. The solutions we propose for the problems of preference change and node event detection are highly sensitive to the size of the observation window W. We vary the window size with values of 2, 4 and 7 time units. This size is related to the semantics we wish to analyze.
If we are interested in tracking short-term events, then short sizes fit better. For instance, preferences over the domains of news or restaurants have a high rate of change. On the other hand, long sizes are more appropriate when events are not frequent, for example preferences about music and movies. Twitter-month does not vary for values 4 and 7 because it does not span more than 3 months. The same occurs with Jam-year, because it is limited to 4 time steps (4 years).
In our experiments we explore how this intensity impacts the correlations with preference changes. Thus, we consider different ranges according to the scores. To set up the Z values, we varied them from 0. Afterwards, we chose the following values to conduct the remainder of the experiments, based on diversity: 0. For the ranking and average scores, the procedure was the same, varying from 0.

(Figure: Performance evaluation of the algorithm PrefChangeDetection. Runtimes refer to the time elapsed to process all users of the corresponding dataset. Color figure online.)