Taming the network: Finding relationships in complex data sets

WHAT BRINGS PEOPLE TOGETHER IN ONLINE NETWORKS? Researchers (and advertisers) would like to know, but without access to personal profiles, the question is not easy to answer. Finding previously undetected relationships in networks and complex data sets is one of the major challenges in the age of “big data.”

Now Assistant Professor Emmanuel Abbe and his collaborators have come up with a new way of thinking about networks to accomplish this task. The technique is not limited to exploring social communities: it can tackle significant challenges such as determining which genes work together to increase cancer risk, or identifying objects, such as chairs or puppies, in a collection of digital images.

The method determines whether two members of a network belong to the same community by looking at how many common “friends” they have, how many common friends those friends have, and so on. From these counts, the researchers construct a set of statistics that can predict who is in the same sphere.
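The idea of counting common friends, then friends of friends, can be made concrete with walk counts: entry (u, v) of the k-th power of a network’s adjacency matrix counts the walks of length k from u to v, so k = 2 gives the number of common friends. The Python sketch below is a simplified illustration under that assumption, not the researchers’ algorithm, whose statistics are more refined (raw walk counts, for example, are inflated for highly connected members). The helper names and the toy graph are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def adjacency(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain square matrix product (fine for toy graphs)."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_count_powers(adj: Matrix, max_len: int) -> List[Matrix]:
    """Return [A, A^2, ..., A^max_len]; (A^k)[u][v] counts length-k walks u -> v."""
    powers = [adj]
    for _ in range(max_len - 1):
        powers.append(matmul(powers[-1], adj))
    return powers


def pair_score(adj: Matrix, u: int, v: int, max_len: int) -> int:
    """Hypothetical score: total walks of length 2..max_len between u and v."""
    return sum(p[u][v] for p in walk_count_powers(adj, max_len)[1:])


# Two tight triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = adjacency(6, edges)
A1, A2, A3 = walk_count_powers(A, 3)
print(A2[0][1], A2[0][4])  # common friends: 1 0
print(A3[0][1], A3[0][4])  # length-3 walks: 3 1
```

In this toy graph, members 0 and 1 (same triangle) share one common friend and are joined by three walks of length 3, while members 0 and 4 (different triangles) share no friends and are joined by only one such walk.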

The approach extracts the “signals” of communities from a background of “noisy” connections. It draws on an analogy with the work of Claude Shannon, often called the father of information theory, who showed that noise limits the rate at which data can be transmitted with almost zero error. Abbe has shown that an analogous limit governs the recovery of communities from large data sets.
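To make the Shannon analogy concrete, the sketch below pairs the capacity of a simple noisy channel (a binary symmetric channel that flips each bit with probability p) with one well-established community-recovery limit. Assume two equal-sized communities in the standard stochastic block model, where members of the same community are linked with probability a·ln(n)/n and members of different communities with probability b·ln(n)/n. Exactly recovering every member’s community is then possible when (√a - √b)² > 2 and impossible below that. The model choice and the function names are assumptions for illustration, and the sketch covers only this simplest two-community case.

```python
import math


def binary_entropy(p: float) -> float:
    """H2(p) in bits, using the convention 0 * log(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p: float) -> float:
    """Shannon capacity, in bits per use, of a binary symmetric channel
    that flips each bit with probability p."""
    return 1.0 - binary_entropy(p)


def exact_recovery_possible(a: float, b: float) -> bool:
    """Two equal-sized communities; within-community link probability
    a*ln(n)/n, cross-community link probability b*ln(n)/n.
    Returns True strictly above the threshold; the boundary case
    (sqrt(a) - sqrt(b))**2 == 2 is delicate and reported as False here."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2 > 2


print(exact_recovery_possible(9, 1))  # True:  (3 - 1)^2 = 4 > 2
print(exact_recovery_possible(4, 1))  # False: (2 - 1)^2 = 1 < 2
print(bsc_capacity(0.5))              # 0.0: a coin-flip channel carries nothing
```

In both settings the limit depends on how much signal survives the noise: a channel that flips bits half the time has zero capacity, and communities whose internal and external link rates are too close cannot be recovered exactly, no matter which algorithm is used.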

“Once we understood that there is a fundamental limit to this problem, there was a clear line of sight for how to solve it,” said Abbe, a member of Princeton’s Department of Electrical Engineering and Program in Applied and Computational Mathematics.

Abbe and Colin Sandon, a graduate student in the Department of Mathematics, put the method to the test on political blogs, some right-leaning and others left-leaning, that sprang up before the 2004 presidential election. They asked: if you knew which blogs referred to each other but had no information about their content, could you figure out which blogs were run by Republicans and which by Democrats? “We were able to identify 95 percent of the blogs that we looked at as left- or right-leaning,” Abbe said.
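The article does not say exactly how the 95 percent figure was computed. One common convention for two communities, assumed here, is to score a guessed labeling against the true one up to a swap of labels, since link structure alone cannot reveal which group is “left” and which is “right.” A minimal Python sketch (the function name is hypothetical):

```python
from typing import Sequence


def agreement_up_to_swap(truth: Sequence[int], guess: Sequence[int]) -> float:
    """Fraction classified correctly for two communities labeled 0/1.

    Links alone cannot say which group is which, so the guess is scored
    under whichever of the two possible labelings matches better. For
    binary labels, swapping turns every mismatch into a match and vice versa.
    """
    if len(truth) != len(guess) or not truth:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(t == g for t, g in zip(truth, guess))
    return max(matches, len(truth) - matches) / len(truth)


print(agreement_up_to_swap([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same split, names swapped
print(agreement_up_to_swap([0, 0, 1, 1], [1, 0, 0, 0]))  # 0.75
```

Without the swap, the first guess would score 0 even though it splits the blogs perfectly, which is why a comparison of this kind is usually made up to relabeling.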

The work was published in the Proceedings of the Annual Symposium on Foundations of Computer Science in 2015. Abbe received the prestigious Bell Labs Prize in 2014 for his research contributions.

–By Catherine Zandonella


COMPUTER SCIENCE: Internet traffic moves smoothly with Pyretic

AT 60 HUDSON ST. IN LOWER MANHATTAN, a fortress-like building houses one of the Internet’s busiest exchange points. Packets of data zip into the building, are routed to their next destination, and zip out again, all in milliseconds. Until recently, however, the software for managing these networks required a great deal of specialized knowledge, even for network experts.

Now, computer scientists at Princeton have developed a programming language called Pyretic that makes controlling the flow of data packets easy and intuitive — and more reliable. The new language is part of a trend known as Software-Defined Networking, which gives a network operator direct control over the underlying switches that regulate network traffic.

“In order to make these networks work, we have to be able to program them effectively, to route traffic to the right places, and to balance the traffic load effectively across the network instead of creating traffic jams,” said David Walker, professor of computer science, who leads the project with Jennifer Rexford, the Gordon Y.S. Wu Professor of Engineering and professor of computer science. “Pyretic allows us to make sure packets of information get to where they are going as quickly, reliably and securely as possible.”

Pyretic is open-source software that uses the Python programming language and lowers the barrier to managing network switches, routers, firewalls and other components of a network. Since its initial release in April 2013, the community of developers who are using the language to govern networks has grown quickly.
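Pyretic builds network programs by composing small policies with operators, notably `>>` for sequential composition and `+` for parallel composition. The self-contained Python sketch below borrows that idea and the primitive names `match` and `fwd`, but it is a toy model written for illustration, not Pyretic’s own code or API. It assumes a simplified packet with only a destination IP address and an output port, and a policy is modeled as a function from one packet to a set of packets (an empty set means the packet is dropped).

```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Packet:
    dstip: str
    outport: int = 0  # 0 means "no output port assigned yet"


class Policy:
    """Toy policy: maps one packet to a set of packets (empty set = drop)."""

    def __init__(self, fn: Callable[[Packet], FrozenSet[Packet]]):
        self.fn = fn

    def eval(self, pkt: Packet) -> FrozenSet[Packet]:
        return self.fn(pkt)

    def __rshift__(self, other: "Policy") -> "Policy":
        # Sequential composition: feed every output of self into other.
        return Policy(lambda pkt: frozenset(
            q for p in self.eval(pkt) for q in other.eval(p)))

    def __add__(self, other: "Policy") -> "Policy":
        # Parallel composition: apply both policies and take the union.
        return Policy(lambda pkt: self.eval(pkt) | other.eval(pkt))


def match(**fields) -> Policy:
    """Pass the packet through unchanged if all fields match; else drop it."""
    def fn(pkt: Packet) -> FrozenSet[Packet]:
        ok = all(getattr(pkt, k) == v for k, v in fields.items())
        return frozenset([pkt]) if ok else frozenset()
    return Policy(fn)


def fwd(port: int) -> Policy:
    """Assign the packet's output port."""
    return Policy(lambda pkt: frozenset([replace(pkt, outport=port)]))


# Route two hosts to different ports, and also copy traffic for
# 10.0.0.2 to a hypothetical monitoring port 9.
route = (match(dstip="10.0.0.1") >> fwd(1)) + (match(dstip="10.0.0.2") >> fwd(2))
mirror = match(dstip="10.0.0.2") >> fwd(9)
policy = route + mirror

print(sorted(p.outport for p in policy.eval(Packet("10.0.0.2"))))  # [2, 9]
print(policy.eval(Packet("10.0.0.7")))  # frozenset(): unknown address, dropped
```

Here a packet for 10.0.0.2 leaves twice, once toward port 2 and once toward the monitoring port, while a packet for an unknown address is dropped. Keeping routing and monitoring as separate small policies and then combining them illustrates how a compositional language can keep each concern simple.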

Additional contributors include Associate Research Scholar Joshua Reich and graduate student Christopher Monsanto of Princeton’s Department of Computer Science as well as Nate Foster, an assistant professor of computer science at Cornell University. The project received support from the U.S. Office of Naval Research, the National Science Foundation and Google.

–By Catherine Zandonella

Computer visions: A selection of research projects in Computer Science

Princeton’s Department of Computer Science has strong groups in theory, networks/systems, graphics/vision, programming languages, security/policy, machine learning, and computational biology. Find out what the researchers have been up to lately in these stories:

Armchair victory: Computers that recognize everyday objects

JIANXIONG XIAO TYPES “CHAIR” INTO GOOGLE’S search engine and watches as hundreds of images populate his screen. He isn’t shopping — he is using the images to…

Tools for the artist in all of us

FROM TRANSLATING FOREIGN LANGUAGES to finding information in minutes, computers have extended our productivity and capability. But can they make us better artists?

Fierce, fiercer, fiercest: Software enables rapid creations

A NEW SOFTWARE PROGRAM MAKES IT EASY for novices to create computer-based 3-D models using simple instructions such as “make it look scarier.” The software could be useful for…

Internet traffic moves smoothly with Pyretic

AT 60 HUDSON ST. IN LOWER MANHATTAN, a fortress-like building houses one of the Internet’s busiest exchange points. Packets of data zip…

Security check: A strategy for verifying software that could prevent bugs

IN APRIL 2014, INTERNET USERS WERE SHOCKED to learn of the Heartbleed bug, a vulnerability in the open-source software used to encrypt Internet content and passwords…