Bias in the machine: Internet algorithms reinforce harmful stereotypes

THE ARTIFICIAL-INTELLIGENCE (AI) SYSTEMS that suggest our search terms and otherwise determine what we see online rely on data that can be biased against women and racial and religious groups, according to a study led by researchers in Princeton’s Center for Information Technology and Policy (CITP).

As machine learning and AI algorithms become more ubiquitous, this phenomenon could inadvertently cement and amplify bias that is already present in our society or a user’s mind, according to the study, which was led by Arvind Narayanan, an assistant professor of computer science. The paper was posted in August 2016 on the preprint server arXiv.

The team found that the algorithms tended to associate domestic words more with women than men, and associated negative terms with the elderly and certain races and religions. “For just about every kind of bias that’s been documented in people, including gender stereotypes and racial prejudice, we were able to replicate it in today’s machine-learning models,” said Narayanan, who worked with CITP postdoctoral research associate Aylin Caliskan-Islam and Joanna Bryson, a professor of computer science at the University of Bath and a visiting scholar at CITP.

Machine-learning algorithms build models of language by exploring how words are used in context — for example, by combing all of Wikipedia or gigabytes of news clippings. Each time the model learns a word, it gives that word a series of geometric coordinates that correspond to a position in a many-dimensional constellation of words. Words that are frequently found near each other are given nearby coordinates, and the positions reflect the words’ meanings.

Biases develop as a result of the positions of these words. If the text used to train the model more often associates “doctor” with words relating to men, ambition and medicine, while linking “nurse” to words related to women, nurturing and medicine, the model would come to assume that “nurse” is feminine, possibly even the feminine version of the masculine “doctor.”
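To make the geometry concrete, here is a minimal sketch in Python with invented three-dimensional coordinates (real models learn hundreds of dimensions from billions of words of text); the positions are chosen by hand purely to illustrate how nearby coordinates translate into associations like the “doctor”/“nurse” example above.

```python
import numpy as np

# Invented 3-D coordinates; a real model learns its positions from text.
vectors = {
    "man":    np.array([0.9, 0.1, 0.2]),
    "woman":  np.array([0.1, 0.9, 0.2]),
    "doctor": np.array([0.8, 0.2, 0.6]),   # placed nearer "man" for illustration
    "nurse":  np.array([0.2, 0.8, 0.6]),   # placed nearer "woman" for illustration
}

def cosine(a, b):
    """Similarity of two word positions: 1.0 means same direction, 0 means unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["doctor"], vectors["man"]),
      cosine(vectors["doctor"], vectors["woman"]))   # higher, then lower
print(cosine(vectors["nurse"], vectors["woman"]),
      cosine(vectors["nurse"], vectors["man"]))      # higher, then lower
```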

To measure biases in algorithm results, the researchers adapted a test long used to reveal implicit bias in human subjects, the Implicit Association Test, for use on the language models. The human version of the test measures how long it takes a subject to associate words such as “evil” or “beautiful” with names and faces of people from different demographics. Thanks to the geometric model of language that machine-learning algorithms use, their biases can actually be measured more directly by simply finding the distance between the name of a group and positive, negative or stereotypical words.
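In that spirit, the sketch below (again with invented coordinates, and as a simplified stand-in for the statistic used in the study) scores a target word by how much closer it sits to one set of attribute words than to another; a score near zero would indicate no measured association.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def association(target, pleasant, unpleasant):
    """Mean similarity to 'pleasant' words minus mean similarity to 'unpleasant'
    words. A positive score means the target sits closer to the pleasant side."""
    return (np.mean([cosine(target, v) for v in pleasant]) -
            np.mean([cosine(target, v) for v in unpleasant]))

# Invented 2-D coordinates purely for illustration.
pleasant   = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
unpleasant = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]
name_a     = np.array([0.8, 0.3])   # a name the training text placed near pleasant words
name_b     = np.array([0.3, 0.8])   # a name the training text placed near unpleasant words

print(f"{association(name_a, pleasant, unpleasant):+.2f}")   # positive
print(f"{association(name_b, pleasant, unpleasant):+.2f}")   # negative
```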

Such biases can have very real effects. For example, in 2013 researchers at Harvard University led by Latanya Sweeney noted that African American-sounding names were far more likely to be paired with ads for arrest records. Such experiences could lead to unintentional discrimination when, say, a potential employer searches the internet for an applicant’s name.

“AI is no better and no worse than we are,” Bryson said. “However, we can continue to learn, but the machine learning for an AI program might be turned off, freezing it in a prejudiced state.” If we can measure this bias, however, Narayanan said, we can take steps to mitigate it. This could mean mathematically correcting a language model’s bias or simply being aware of the algorithms’ faults — and our own. –By Bennett McIntosh

COMPUTER SCIENCE: Armchair victory: Computers that recognize everyday objects

JIANXIONG XIAO TYPES “CHAIR” INTO GOOGLE’S search engine and watches as hundreds of images populate his screen. He isn’t shopping — he is using the images to teach his computer what a chair looks like.

This is much harder than it sounds. Although computers have come a long way toward being able to recognize human faces in photos, they don’t do so well at understanding the objects in everyday 3-D scenes. For example, when Xiao, an assistant professor of computer science at Princeton, tested a “state-of-the-art” object recognition system on a fairly average-looking chair, the system identified the chair as a Schipperke dog.

The problem is that our world is filled with stuff that computers find distracting. Tables are cluttered with the day’s mail, chairs are piled with backpacks and draped with jackets, and objects are swathed in shadows. The human brain can filter out these distractions, but computers falter when they encounter shadows, clutter and occlusion by other objects. Improving software for object recognition has many benefits, from better ways to analyze security-camera images to computer vision systems for robots. “We start with chairs because they are the most common indoor objects,” Xiao said, “but of course our goal is to dissect complex scenes.”

Xiao has developed an approach to teaching computers to recognize objects that he likes to call a “big 3-D data” approach because he feeds a large number of examples into the computer to teach it what an object — in this case a chair — looks like.

The chairs that he uses as training data are not pictures of chairs, nor are they the real thing. They are three-dimensional models of chairs, created with computer graphics (CG) techniques, that are available from 3D Warehouse, a free service that allows users to search and share 3-D models. With help from graduate student Shuran Song, Xiao scans the 3-D models with a virtual camera and depth sensor that maps the distance to each point on the chair, creating a depth map for each object. These depth maps are then converted into a collection of points, or a “point cloud,” that the computer can process and use to learn the shapes of chairs.
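Turning a depth map into a point cloud is standard camera geometry. The sketch below is not the researchers’ code; it back-projects each pixel through an assumed pinhole camera model, with made-up focal lengths and principal point, to recover 3-D points.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H x W, in meters) into an (N, 3) point cloud
    using a pinhole camera model: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]          # drop pixels with no depth reading

# Example: a flat 4 x 4 depth image one meter from the virtual camera.
depth = np.ones((4, 4))
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(cloud.shape)   # (16, 3)
```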

The advantage of using CG chairs rather than the real thing, Xiao said, is that the researchers can rapidly record the shape of each chair from hundreds of different viewing angles, creating a comprehensive database about what makes a chair a chair. They can also capture hundreds of chairs of various shapes — including office chairs, sofa chairs, kitchen chairs and the like. “Because it is a CG chair and not a real object, we can put the sensor wherever we need,” Xiao said.

For the technique to work, the researchers also must help the computer learn what a chair is not like. For this, the researchers use a 3-D depth sensor, like the one found in the Microsoft Kinect sensor for the Xbox 360 video game console, to capture depth information from real-world, non-chair objects such as toilets, tables and washbasins.

Like a child learning to do arithmetic by making a guess and then checking his or her answers, the computer studies examples of chairs and non-chairs to learn the key differences between them. “The computer can then use this knowledge to tell not only whether an object is a chair but also what type of chair it is,” Song said.
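A rough sketch of that learn-by-example step appears below; the feature vectors are made up, and an off-the-shelf linear classifier from scikit-learn stands in for whatever detectors the system actually trains.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical feature vectors summarizing each object's point cloud
# (a real system would extract richer 3-D shape features from the depth data).
rng = np.random.default_rng(42)
chair_features     = rng.normal(loc=+1.0, size=(100, 32))   # positive examples
non_chair_features = rng.normal(loc=-1.0, size=(100, 32))   # toilets, tables, washbasins

X = np.vstack([chair_features, non_chair_features])
y = np.array([1] * 100 + [0] * 100)                          # 1 = chair, 0 = not a chair

classifier = LinearSVC()          # learns a boundary separating the two classes
classifier.fit(X, y)

new_object = rng.normal(loc=+1.0, size=(1, 32))
print("chair" if classifier.predict(new_object)[0] == 1 else "not a chair")
```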

Once this repository of chair knowledge is built, the researchers put it to use to search for chairs in everyday scenes. To improve the accuracy of the scanning, the researchers built a virtual “sliding shapes detector” that skims slowly over the scenes, like a magnifying glass skimming over a picture, only in three dimensions, looking for structures that the computer has learned are associated with different types of chairs.
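The sliding-window idea itself can be sketched in a few lines. In the toy version below, the scoring function just measures how full each window is, whereas the real detector scores how chair-like the window’s contents look; the grid, window size and step are invented for illustration.

```python
import numpy as np

def sliding_window_3d(voxels, window, step, score_fn):
    """Slide a 3-D window over an occupancy grid and score every position.
    Returns a list of ((x, y, z), score) tuples."""
    wx, wy, wz = window
    sx, sy, sz = voxels.shape
    detections = []
    for x in range(0, sx - wx + 1, step):
        for y in range(0, sy - wy + 1, step):
            for z in range(0, sz - wz + 1, step):
                crop = voxels[x:x + wx, y:y + wy, z:z + wz]
                detections.append(((x, y, z), score_fn(crop)))
    return detections

# Toy scene: a 20 x 20 x 20 occupancy grid with a dense object in one corner.
scene = np.zeros((20, 20, 20))
scene[2:8, 2:8, 2:8] = 1.0

# Stand-in score: fraction of occupied voxels inside the window.
hits = sliding_window_3d(scene, window=(6, 6, 6), step=2, score_fn=lambda c: c.mean())
best_position, best_score = max(hits, key=lambda d: d[1])
print(best_position, best_score)   # (2, 2, 2) 1.0
```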

Because the computer did its training on such a large and comprehensive database of examples, the program can detect chairs that are partially blocked by tables and other objects. It can also spot chairs that are piled with clutter, because the program can subtract the clutter away. And because it detects objects using depth information rather than the color that defines shapes in most photographs, the program can ignore the effect of shadows on the objects.

In tests, the new method significantly outperformed the state-of-the-art systems on images, Xiao said. The same technology can also be generalized to other object categories, such as beds, tables and sofas. The researchers presented the work, titled “Sliding shapes for 3D object detection in RGB-D images,” at the 2014 European Conference on Computer Vision.

–By Catherine Zandonella

COMPUTER SCIENCE: Tools for the artist in all of us


FROM TRANSLATING FOREIGN LANGUAGES to finding information in minutes, computers have extended our productivity and capability. But can they make us better artists?

Researchers in the Department of Computer Science are working on ways to make it easier to express artistic creativity without the painstaking hours spent learning new techniques. “Computers are making it faster and easier for beginners to do a lot of things that are time-consuming,” said Jingwan (Cynthia) Lu, who earned her Ph.D. at Princeton in spring 2014. “I’m interested in using computers to handle some of the more tedious tasks involved in the creation of art so that humans can focus their talents on the creative process.”

The techniques that Lu is creating are far more versatile than the simple drawing and painting tools that come pre-installed on most computers, yet they are much easier to use than the software marketed to artists and designers. “Lu has created tools that enable artistic expression by leveraging the use of computation,” said Professor of Computer Science Adam Finkelstein, Lu’s dissertation adviser.

Last year, Lu introduced RealBrush, a project that permits people to paint on a computer using a variety of media, ranging from traditional paints to unconventional materials such as glittered lip gloss. The software contained a library of photographs of real paint strokes. As the artist painted on a tablet or touch screen, the software pieced together the stored paint strokes.

This year, Lu has introduced two new techniques that further her goal of making it easy to create art digitally:

decoBrush

decoBrush allows the user to create floral designs and other patterns such as those found as borders on invitations and greeting cards. Many design programs offer such borders but they come in set shapes and are not easy to customize, requiring a designer to painstakingly manipulate individual curves and shapes.

With decoBrush, users can create highly structured patterns simply by choosing a style from a gallery and then sketching curves to form the intended design or layout. The decoBrush software transforms the sketched paths into structured patterns in the style chosen. For example, a user might select a floral pattern and then sketch a heart, creating a heart with a floral border.

The challenge for Lu and her codevelopers was to guide the computer to learn existing decorative structured patterns and then apply automatic algorithms to replace the tedious process of manipulating the individual curves and shapes.

“Given a target path such as a sketch that the pattern should follow, the computer copies, alters and merges segments of existing pre-designed patterns, which we call ‘exemplars,’ to compose a new pattern,”  Lu said. “It does this by searching for candidate segments that have similar curviness to the target sketch that the user drew. The candidate segments are then copied and merged using a specialized texture synthesis algorithm that transforms the curves to align with each other seamlessly at the segment boundaries.”
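A toy version of that “similar curviness” matching might look like the sketch below; the paths, the turning-angle measure and the matching rule are simplifications invented for illustration rather than the decoBrush algorithm itself, and the merging and alignment steps are omitted entirely.

```python
import numpy as np

def turning_angles(path):
    """Discrete 'curviness' of a polyline: the change in heading at each interior vertex."""
    path = np.asarray(path, dtype=float)
    directions = np.diff(path, axis=0)
    headings = np.arctan2(directions[:, 1], directions[:, 0])
    return np.diff(headings)

def curviness_distance(target, exemplar):
    """Mean absolute difference of turning angles, truncated to the shorter path."""
    a, b = turning_angles(target), turning_angles(exemplar)
    n = min(len(a), len(b))
    return float(np.mean(np.abs(a[:n] - b[:n])))

# Hypothetical exemplar segments from pre-designed patterns, plus a user sketch.
exemplars = {
    "gentle_vine": [(0, 0), (1, 0.05), (2, 0.0), (3, 0.1)],
    "tight_curl":  [(0, 0), (1, 1), (1.5, 0), (2.5, 1), (3, 0)],
}
user_sketch = [(0, 0), (1, 0.1), (2, 0.05), (3, 0.0)]

best = min(exemplars, key=lambda name: curviness_distance(user_sketch, exemplars[name]))
print(best)   # the exemplar whose curviness best matches the sketch: "gentle_vine"
```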

Lu constructed decoBrush with assistance from Connelly Barnes, who earned his doctorate from Princeton in 2011 and is now at the University of Virginia; undergraduate Connie Wan, Class of 2014; and Finkelstein. She also collaborated with Paul Asente and Radomir Mech of Adobe Research, where Lu interned for three summers and now works as a researcher. Lu presented decoBrush at the Association for Computing Machinery SIGGRAPH conference in August 2014.

RealPigment

A second project enables artists and novices to explore mixing of colors in digital painting, with the goal of making the digital results more faithful to the physical behaviors of paints.

Software programs for painting are not adept at combining colors, especially when they are simulating complex media such as oil paints or watercolors. One of the most common techniques for combining colors, alpha blending, estimates that yellow and blue make gray rather than green. Lu and her colleagues came up with a different method for figuring out how colors will blend using techniques borrowed from real-world (non-digital) painting.

The researchers use color charts that artists make to find out what color arises when overlaying or mixing two colors of paint. Making these color charts involves painting rows of one color, and then overlaying them with columns each containing a different color. The resulting grid reveals how all pairs of colors will look when layered. Similar charts can be made for mixed rather than layered colors.

Lu’s approach is to feed these color charts into the computer to teach it how to combine colors in a specific medium, such as oil paints or watercolors. “The goal is to learn from existing charts to predict the result of compositing new colors,” Lu said. “We apply simplifying assumptions and prior knowledge about pigment properties to reduce the number of learning parameters, which allows us to perform accurate predictions with limited training data.”
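The sketch below contrasts naive alpha blending with a lookup in a measured chart; the chart values are invented, and the lookup stands in for the predictive model that RealPigment actually fits from the charts.

```python
import numpy as np

def alpha_blend(top, bottom, alpha=0.5):
    """Naive RGB alpha blending, which treats pigments as if they mixed like light."""
    return alpha * np.asarray(top, dtype=float) + (1 - alpha) * np.asarray(bottom, dtype=float)

yellow, blue = (255, 255, 0), (0, 0, 255)
print(alpha_blend(yellow, blue))          # [127.5 127.5 127.5]: gray, not green

# Hypothetical measurements from a painted color chart: the RGB color that
# actually appears when one pigment is layered over another (values invented).
chart = {
    ("yellow", "blue"): (60, 160, 70),    # yellow over blue reads as green
    ("blue", "yellow"): (50, 140, 90),
}

def composite_from_chart(top_name, bottom_name):
    """Predict a layered result by consulting the measured chart. The real system
    fits a model with simplifying assumptions about pigments so it can also
    predict combinations that were never painted."""
    return chart[(top_name, bottom_name)]

print(composite_from_chart("yellow", "blue"))   # (60, 160, 70)
```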

Lu’s research was supported by a Siebel Fellowship and funding from Google. The project included Willa Chen, Class of 2013; Stephen DiVerdi of Google; Barnes and Finkelstein. The work was presented at the June 2014 International Symposium on Non-Photorealistic Animation and Rendering.

COMPUTER SCIENCE: Fierce, Fiercer, Fiercest: Software enables rapid creations


A NEW SOFTWARE PROGRAM MAKES IT EASY for novices to create computer-based 3-D models using simple instructions such as “make it look scarier.” The software could be useful for building models for 3-D printing and designing virtual characters for video games.

The program, called AttribIt, allows users to drag and drop building blocks of a 3-D shape from a menu. Next the user can adjust the characteristics of the model — making it “scarier” or “sleeker” for example — by sliding a bar at the bottom of the screen.

“We wanted to create a program that could be used by people who don’t have any training in computer graphics or design,” said Siddhartha Chaudhuri, a lecturer at Cornell University who co-wrote the software while a postdoctoral researcher at Princeton with Professor of Computer Science Thomas Funkhouser as well as University of Massachusetts-Amherst Assistant Professor Evangelos Kalogerakis and graduate student Stephen Giguere.

“The challenge was to build a tool that could create a model — such as an intricate animal with claws and ears — with only simple commands and common adjectives instead of the complex geometric commands found in most other 3-D design programs,” Funkhouser said.

AttribIt makes new objects by combining parts from repositories of previously made models, Chaudhuri explained. The parts in the repository have been ranked for their “scariness,” “gracefulness” and other everyday adjectives using machine learning algorithms trained on feedback from anonymous volunteers.

The rankings are based on crowd-sourced training data from Amazon Mechanical Turk, an online research platform. Random participants are asked to view two shapes and say which one is scarier. The AttribIt software then builds a model from these value judgments that predicts the relative “scariness” of any shape.

“For example, given a bunch of animal heads, the software assigns each a number which expresses how ‘scary’ it thinks that head is,” Chaudhuri said. “You can sort the animal heads by this predicted scariness to get a sequence that goes from bunnies to velociraptors.”
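One standard way to turn such pairwise judgments into a scoring function is to learn a linear model over differences of feature vectors. The sketch below does that with invented shape features and scikit-learn, as an illustration of the general approach rather than AttribIt’s actual training procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature vectors describing animal-head shapes (spikiness, eye
# size, jaw length, ...); real features would come from the 3-D geometry.
rng = np.random.default_rng(1)
heads = rng.normal(size=(30, 8))

# Simulated crowd judgments: (i, j) means head i was rated scarier than head j.
# Here the "true" scariness is secretly the first feature, just to create labels.
pairs = [(i, j) for i in range(30) for j in range(30)
         if i != j and heads[i, 0] > heads[j, 0]][:200]

# A pair (i, j) becomes the difference of feature vectors labeled 1
# ("first is scarier"); the reversed pair is labeled 0.
X = np.vstack([heads[i] - heads[j] for i, j in pairs] +
              [heads[j] - heads[i] for i, j in pairs])
y = np.array([1] * len(pairs) + [0] * len(pairs))
model = LogisticRegression().fit(X, y)

# The learned weights give every head a scariness score, so the heads can be
# sorted "from bunnies to velociraptors."
scores = heads @ model.coef_.ravel()
print(np.argsort(scores)[:3], np.argsort(scores)[-3:])   # least and most scary heads
```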

The researchers tested AttribIt on users who had no prior 3-D modeling experience, including Chaudhuri’s 11-year-old nephew. “People were very good at creating models in a very short amount of time,” Chaudhuri said.

In addition to creating 3-D models, the approach can be used in design tasks such as making a website look “more artistic.” The research was supported by funding from Google, Adobe, Intel and the National Science Foundation and was presented at the Association for Computing Machinery Symposium on User Interface Software and Technology in October 2013.

–By Catherine Zandonella

COMPUTER SCIENCE: Internet traffic moves smoothly with Pyretic

AT 60 HUDSON ST. IN LOWER MANHATTAN, a fortress-like building houses one of the Internet’s busiest exchange points. Packets of data zip into the building, are routed to their next destination, and zip out again, all in milliseconds. Until recently, however, the software for managing these networks required a great deal of specialized knowledge, even for network experts.

Now, computer scientists at Princeton have developed a programming language called Pyretic that makes controlling the flow of data packets easy and intuitive — and more reliable. The new language is part of a trend known as Software-Defined Networking, which gives a network operator direct control over the underlying switches that regulate network traffic.

“In order to make these networks work, we have to be able to program them effectively, to route traffic to the right places, and to balance the traffic load effectively across the network instead of creating traffic jams,” said David Walker, professor of computer science, who leads the project with Jennifer Rexford, the Gordon Y.S. Wu Professor of Engineering and professor of computer science. “Pyretic allows us to make sure packets of information get to where they are going as quickly, reliably and securely as possible.”

Pyretic is open-source software that uses the Python programming language and lowers the barrier to managing network switches, routers, firewalls and other components of a network. Since its initial release in April 2013, the community of developers who are using the language to govern networks has grown quickly.
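To give a flavor of that compositional style, here is a hypothetical, much-simplified model of match-and-forward policies written in plain Python; the class names and details are invented for illustration and are not Pyretic’s actual API, although the parallel (+) and sequential (>>) composition operators mirror the style the language supports.

```python
class Policy:
    """A policy maps one packet (a dict of header fields) to a list of output packets."""
    def eval(self, packet):
        raise NotImplementedError
    def __rshift__(self, other):        # self >> other : sequential composition
        return Sequential(self, other)
    def __add__(self, other):           # self + other : parallel composition
        return Parallel(self, other)

class Match(Policy):
    """Pass the packet through only if the given header fields match."""
    def __init__(self, **fields): self.fields = fields
    def eval(self, packet):
        return [packet] if all(packet.get(k) == v for k, v in self.fields.items()) else []

class Forward(Policy):
    """Send the packet out a given switch port."""
    def __init__(self, port): self.port = port
    def eval(self, packet):
        return [dict(packet, outport=self.port)]

class Sequential(Policy):
    def __init__(self, first, second): self.first, self.second = first, second
    def eval(self, packet):
        return [out for mid in self.first.eval(packet) for out in self.second.eval(mid)]

class Parallel(Policy):
    def __init__(self, left, right): self.left, self.right = left, right
    def eval(self, packet):
        return self.left.eval(packet) + self.right.eval(packet)

# Send web traffic out port 1 and anything destined to 10.0.0.2 out port 2.
policy = (Match(dstport=80) >> Forward(1)) + (Match(dstip="10.0.0.2") >> Forward(2))
print(policy.eval({"dstip": "10.0.0.2", "dstport": 80}))
```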

Additional contributors include Associate Research Scholar Joshua Reich and graduate student Christopher Monsanto of Princeton’s Department of Computer Science as well as Nate Foster, an assistant professor of computer science at Cornell University. The project received support from the U.S. Office of Naval Research, the National Science Foundation and Google.

–By Catherine Zandonella

COMPUTER SCIENCE: Security check: A strategy for verifying software that could prevent bugs

IN APRIL 2014, INTERNET USERS WERE SHOCKED to learn of the Heartbleed bug, a vulnerability in the open-source software used to encrypt Internet content and passwords. The bug existed for two years before it was discovered.

Detection of vulnerabilities like Heartbleed is possible with a new approach pioneered by Andrew Appel, the Eugene Higgins Professor of Computer Science. With funding from the Defense Advanced Research Projects Agency (DARPA) and the National Science Foundation, Appel has developed a strategy for verifying software to ensure that it is performing correctly, and the technique could be applied to the Internet’s widely used encryption system, known as “Secure Sockets Layer.”

“The point is that formal program verification of correctness is now becoming feasible,” said Appel. “The downside of the approach is the expense. But for important and widely used software, it may be less expensive than the consequences of not doing it.”

–By Catherine Zandonella

Computer visions: A selection of research projects in Computer Science

Princeton’s Department of Computer Science has strong groups in theory, networks/systems, graphics/vision, programming languages, security/policy, machine learning, and computational biology. The stories above offer a sampling of what the department’s researchers have been up to lately.
