Estimation of a Low-rank Topic-Based Model for Information Cascades


We consider the problem of estimating the latent structure of a social network from observed information diffusion events, or cascades. For a given cascade, we observe only the infection times of the infected nodes, not the source of each infection. Most existing work on this problem has focused on estimating a diffusion matrix without imposing any structural assumptions on it. In this paper, we propose a novel model based on the intuition that a piece of information is more likely to propagate between two nodes if they are interested in similar topics that are also prominent in the information content. In particular, our model endows each node with an influence vector (which measures how authoritative the node is on each topic) and a receptivity vector (which measures how susceptible the node is to each topic). We show how this node-topic structure can be estimated from the observed cascades and prove an analytical upper bound on the estimation error. The estimated model can be used to build recommendation systems based on the receptivity vectors, as well as for marketing based on the influence vectors. Experiments on synthetic and real data demonstrate the improved performance and better interpretability of our model compared to existing state-of-the-art methods.
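To make the node-topic structure concrete, the following is a minimal sketch of one plausible reading of the model, not the paper's implementation. It assumes that the pairwise diffusion rate for a cascade is the sum over topics of the sender's influence, the cascade's topic weight, and the receiver's receptivity. The function name `diffusion_rates`, the variable names, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import numpy as np

def diffusion_rates(influence, receptivity, topic_weights):
    """Illustrative pairwise diffusion rates for one cascade.

    Assumed form (not taken from the paper's text):
        rates[i, j] = sum_k influence[i, k] * topic_weights[k] * receptivity[j, k]

    influence:     (n_nodes, n_topics) array, how authoritative each node is per topic
    receptivity:   (n_nodes, n_topics) array, how susceptible each node is per topic
    topic_weights: (n_topics,) array, how prominent each topic is in this cascade
    """
    influence = np.asarray(influence, dtype=float)
    receptivity = np.asarray(receptivity, dtype=float)
    topic_weights = np.asarray(topic_weights, dtype=float)

    # Scale each topic column of the influence matrix by the cascade's topic
    # weight, then combine with receptivity; the product has rank <= n_topics.
    rates = (influence * topic_weights) @ receptivity.T

    # A node does not infect itself, so zero out the diagonal.
    np.fill_diagonal(rates, 0.0)
    return rates

# Toy example with 3 nodes and 2 topics (made-up numbers).
influence = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [0.5, 0.5]])
receptivity = np.array([[0.0, 1.0],
                        [1.0, 0.0],
                        [1.0, 1.0]])
cascade_topics = np.array([1.0, 0.0])  # cascade entirely about topic 0

rates = diffusion_rates(influence, receptivity, cascade_topics)
print(rates)
```

In this toy example, node 1 is influential only on topic 1, so it has zero rate for a cascade that is entirely about topic 0, while node 0 transmits readily to the two nodes that are receptive to topic 0. Under this assumed form, the number of free parameters grows with the number of nodes times the number of topics rather than with the square of the number of nodes, which is the sense in which the structure is low-rank.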

Journal of Machine Learning Research
Ming Yu
PhD (2016-2020)

Ming received his PhD in Econometrics and Statistics from the University of Chicago Booth School of Business in March 2020. His research interests include high-dimensional statistical inference, non-convex optimization, and reinforcement learning, with a focus on developing novel methodologies that have both practical applications and theoretical guarantees.

Mladen Kolar
Associate Professor of Econometrics and Statistics

Mladen Kolar is an Associate Professor of Econometrics and Statistics at the University of Chicago Booth School of Business. His research focuses on high-dimensional statistical methods, graphical models, varying-coefficient models, and data mining, driven by the need to uncover interesting and scientifically meaningful structures from observational data.