If you tell Latanya Sweeney, A.L.B. ’95, nothing about yourself except your birth date and five-digit zip code, she’ll tell you your name. If you are under the age of 30 and tell her where you were born, she can correctly predict eight or nine digits of your nine-digit Social Security number. “The main reason privacy is a growing problem is that disk storage is so cheap,” says the visiting professor of computer science, technology, and policy at Harvard’s Center for Research on Computation and Society (CRCS). “People can collect data and never throw anything away. Policies on data sharing are not very good, and the result is that data tend to flow around and get linked to other data.”
Sweeney became interested in privacy issues while earning her doctorate at MIT in the mid-1990s. Massachusetts had recently made “anonymized” medical information available. Such data are invaluable for research, for setting up early infectious-disease detection systems, and for other public-health uses. “There was a belief at the time that if you removed explicit identifiers (name, address, and Social Security number), you could just give the data away,” she recalls. That dogma was shattered when Sweeney produced a dramatic proof to the contrary.
The medical data that had been made available included minimal demographic information: zip code, birth date, and gender, in addition to the diagnosis. So Sweeney went to Cambridge City Hall and, for $25, purchased the city’s voter list: 54,000 names on two diskettes. By linking the demographic fields in the voter database to the same fields in the publicly available medical records, Sweeney found that in most cases she could narrow a medical record down to a single voter, and so restore the patient’s name to the record. She tried this data-linking technique on then-governor William F. Weld ’66, J.D. ’70. Only six people in Cambridge shared his birth date. Just three of them were men. And he was the only one who lived in the right zip code.
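The linking step amounts to a join on the shared demographic fields, followed by a check for a unique match. Below is a minimal sketch under stated assumptions. It is not Sweeney’s actual method or data. The records, the names, and the `link` function are all hypothetical, and the sketch assumes both datasets carry exactly the three fields the article mentions: zip code, birth date, and gender. A medical record counts as re-identified only when exactly one voter shares its combination of those fields.

```python
from collections import defaultdict

# Hypothetical toy records, invented for illustration; not from the article.
voters = [
    {"name": "Voter A", "zip": "02138", "birth_date": "1950-01-01", "gender": "M"},
    {"name": "Voter B", "zip": "02139", "birth_date": "1950-01-01", "gender": "M"},
    {"name": "Voter C", "zip": "02138", "birth_date": "1950-01-01", "gender": "F"},
    {"name": "Voter D", "zip": "02140", "birth_date": "1962-07-15", "gender": "F"},
]

# "Anonymized" medical records: explicit identifiers removed, demographics kept.
medical = [
    {"zip": "02138", "birth_date": "1950-01-01", "gender": "M", "diagnosis": "diagnosis X"},
    {"zip": "02140", "birth_date": "1962-07-15", "gender": "F", "diagnosis": "diagnosis Y"},
    {"zip": "02141", "birth_date": "1980-03-03", "gender": "M", "diagnosis": "diagnosis Z"},
]

# The fields both datasets share (assumed, per the article's description).
QUASI_IDS = ("zip", "birth_date", "gender")


def key(record):
    """Combine the shared demographic fields into one lookup key."""
    return tuple(record[field] for field in QUASI_IDS)


def link(medical_records, voter_records):
    """Return (name, diagnosis) pairs for records matching exactly one voter."""
    # Index voter names by their demographic combination.
    index = defaultdict(list)
    for voter in voter_records:
        index[key(voter)].append(voter["name"])

    results = []
    for record in medical_records:
        candidates = index.get(key(record), [])
        # A unique match restores the patient's name; ambiguous or missing matches are skipped.
        if len(candidates) == 1:
            results.append((candidates[0], record["diagnosis"]))
    return results


if __name__ == "__main__":
    for name, diagnosis in link(medical, voters):
        print(name, diagnosis)
```

Each field narrows the candidate set, much as birth date, then gender, then zip code did in the Weld example. Here only the full combination singles out Voter A, because Voter B lives in a different zip code and Voter C is a woman.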
Thursday, September 10, 2009
At Harvard Magazine, Jonathan Shaw on the erosion of privacy in the age of the internet: