Taming the network: Finding relationships in complex data sets

WHAT BRINGS PEOPLE TOGETHER IN ONLINE NETWORKS? Researchers (and advertisers) would like to know, but without access to personal profiles, the question is not easy to answer. Finding previously undetected relationships in networks and complex data sets is one of the major challenges in the age of “big data.”

Now Assistant Professor Emmanuel Abbe and his collaborators have come up with a new way of thinking about networks to accomplish this task. Not limited to exploring social communities, the technique can tackle significant challenges such as determining which genes work together to increase cancer risk or identifying objects, such as chairs or puppies, in a collection of digital images.

The method involves examining whether members of a network are connected by looking at how many common “friends” they have, how many common friends those friends have, and so on. Using this information, the researchers construct a set of statistics that can predict who is in the same sphere.
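To make the idea of counting common friends concrete, here is a minimal sketch in Python. It is only an illustration and not the researchers' algorithm: the toy network, the function names and the choice to count walks through an adjacency matrix are all assumptions made for this example. A count of two-step walks between two members equals the number of friends they share, and longer walks capture friends of friends.

```python
def build_adjacency(n, edges):
    """Return an n x n 0/1 adjacency matrix for an undirected network."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = 1
        adj[v][u] = 1
    return adj


def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_counts(adj, length):
    """Entry [i][j] is the number of walks with `length` steps from i to j."""
    result = adj
    for _ in range(length - 1):
        result = matmul(result, adj)
    return result


# Hypothetical toy network: two tight groups, {0, 1, 2} and {3, 4, 5},
# joined by a single link between members 2 and 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = build_adjacency(6, edges)

two_step = walk_counts(adj, 2)
three_step = walk_counts(adj, 3)

same_group = two_step[0][1]    # common friends of members 0 and 1
cross_group = two_step[0][4]   # common friends of members 0 and 4
```

In this toy network, members of the same group share a friend while members of different groups share none, which is the kind of signal a community-detection statistic can build on.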

The approach extracts the “signals” of communities amid a background of “noisy” connections. Abbe’s method is analogous to work by Claude Shannon, sometimes called the father of information theory, who showed that noise imposes a limit on the rate at which data can be transmitted with almost zero error. Abbe has shown that there is an analogous limit on how well communities can be recovered from large data sets.

“Once we understood that there is a fundamental limit to this problem, there was a clear line of sight for how to solve it,” said Abbe, a member of Princeton’s Department of Electrical Engineering and Program in Applied and Computational Mathematics.

Abbe and Colin Sandon, a graduate student in the Department of Mathematics, put the method to the test by examining political blogs, some right-leaning and others left-leaning, that sprang up prior to the 2004 presidential election. They asked: If you knew which blogs were referring to each other, but had zero information about the content of the blogs, could you figure out which blogs are run by Republicans and which ones by Democrats? “We were able to identify 95 percent of the blogs that we looked at as left- or right-leaning,” Abbe said.
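A figure such as the 95 percent above is a measure of agreement between recovered groups and known affiliations. The sketch below shows one common way to compute such a score for two groups. It is an assumption about how agreement might be measured, not a description of the researchers' evaluation, and the labels are made up. Because a method that sees only links cannot know which group is "left" and which is "right," the score allows the two group names to be swapped.

```python
def agreement(predicted, truth):
    """Fraction of items grouped correctly, allowing the two labels to be swapped."""
    if len(predicted) != len(truth):
        raise ValueError("label lists must have the same length")
    n = len(truth)
    matches = sum(p == t for p, t in zip(predicted, truth))
    # With two groups, swapping the label names turns every mismatch into a match.
    return max(matches, n - matches) / n


# Hypothetical labels for ten blogs; 0 and 1 are arbitrary group names.
truth = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

score = agreement(predicted, truth)
```

Here the recovered groups use the opposite names from the true ones, yet nine of the ten blogs are placed with the right companions.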

The work was published in the Proceedings of the Annual Symposium on Foundations of Computer Science in 2015. Abbe received the prestigious Bell Labs Prize in 2014 for his research contributions.

–By Catherine Zandonella
