What would happen if you hacked into a library?

We usually think of university libraries as bastions of free thought, with scholarly publications that are freely shareable by all. But former Reddit staffer and digital activist Aaron Swartz has been arrested and accused by federal prosecutors of hacking into the Massachusetts Institute of Technology’s computer network and downloading almost 5 million academic documents from its library. If he is found guilty, Swartz could face up to 35 years in prison and a fine of up to $1 million, penalties that seem inappropriate at best for a crime that appears to have no real victims.

According to the indictment that was filed in Boston (PDF link), the 24-year-old programmer — who is the co-founder of a non-profit political action group called Demand Progress, and also co-authored the RSS specification when he was still a teenager — used a laptop and a number of software tools to hack into the MIT computer system and download more than 4 million scholarly papers and journal archives. The indictment notes that when these alleged offences occurred, Swartz was a fellow at Harvard’s Center for Ethics.

The journals and documents that Swartz is alleged to have downloaded are held in the so-called JSTOR archive, which is a database of thousands of scholarly journals maintained by a non-profit organization created in 1995 to allow institutions to share these publications easily. According to a statement from JSTOR, the organization is not involved in the indictment against Swartz. Its statement says that after it noticed unauthorized access to its documents occurring at MIT late last year:

We stopped this downloading activity, and the individual responsible, Mr. Swartz, was identified. We secured from Mr. Swartz the content that was taken, and received confirmation that the content was not and would not be used, copied, transferred, or distributed.

Although academic institutions such as MIT pay JSTOR an annual fee for maintaining the archive, most of the documents in it are freely available to students and anyone else at an accredited university. So what harm did Swartz cause by downloading these journals and archives? That’s not clear. Demand Progress released a statement about the indictment in which executive director David Segal said that arresting the programmer for doing this was like “trying to put someone in jail for allegedly checking too many books out of the library.” The statement also quotes a librarian at Stanford University as saying Swartz’s prosecution “undermines academic inquiry and democratic principles.”

The federal prosecutor’s office, of course, seems more interested in the fact that Swartz illegally accessed a computer network. According to the indictment, he gained unauthorized access to the MIT computer network’s main server space, where he hooked up a laptop to one of the servers and hid it under a shelf, and he repeatedly changed his computer’s hardware address to get around the barriers that JSTOR and the university set up. The indictment also says that the young programmer “intended to distribute a significant portion of JSTOR’s archive of digitized journal articles through one or more file-sharing sites.”

As Jason Kottke noted in an overview of the case, Swartz has shown an interest in doing similar things in the past: in 2009, for example, he downloaded 19 million pages’ worth of federal documents from the PACER archive, which was set up by the government in an attempt to improve access to electronic files from the federal courts. Swartz and others wanted to download all of the archives (19 million pages reportedly represented about 20 percent of the total) and then upload or share them online. Swartz also helped develop the technology behind the Open Library project.

The government’s indictment of Swartz is more than a little disturbing, if only because the documents that he allegedly took were academic publications that were freely available to anyone studying at a university — in other words, not commercially or politically sensitive in any way. Even the non-profit organization in charge of this archive declined to proceed with any case against the programmer.

Assuming the federal indictment is correct, what Swartz did seems no more threatening than what Mark Zuckerberg did when he set up a script to download photos from the Harvard computer system to create the precursor to Facebook. It’s certainly nowhere near the kind of espionage that the government is alleging occurred in the case of WikiLeaks and the diplomatic cables it published, or the hacking that groups such as Anonymous and LulzSec are accused of being involved in. What could possibly be gained by going after a young programmer for trying to liberate academic research from a library?

(Note: Although Swartz describes himself as a co-founder of the link-sharing community Reddit, Alexis Ohanian noted on Twitter and on Google+ that he and Steve Huffman created the company and then acquired Swartz’s company six months later.)

Post and thumbnail photos courtesy of Flickr user Eliot Phillips and Wikimedia Commons

25 Responses to “What would happen if you hacked into a library?”

  1. pixlem

    Mathew, there is an odd meme (if the bank wants to pursue you . . . .) travelling around that the government has no role protecting private property, but of course that is one of the core purposes of government. The police protect your house and your stuff, the courts protect your rights under contracts, laws and the constitution, and the government has much more power to investigate (including through the grand jury) and enforce than do private individuals. This protection of the rule of law (including enforcement) provides us with essential freedoms, not least in planning (you know that if you put your money in a bank they have to give it back; you know someone can’t just take your car; etc.). So why should the bank have to engage its own private police force (or lawyers) to protect the rules? If your beef is with the rules, then advocate to change them. In principle, JSTOR should be allowed to pursue its business model of paying for its costs through user fees (it is a not-for-profit). We don’t get to legislate individually . . . we do it as part of a group. Even civil disobedience theory says that you accept the criminal penalty but use the story to argue that the laws should change.

  2. Gary D

    It’s like Operation Sundevil all over again. I thought the Secret Service no longer had jurisdiction over hacking “crimes” after the EFF sued their pants off 20 years ago. Swartz must have pissed someone off & this is his payback.

  3. Frankly, I’m disappointed that he didn’t get to make a torrent (or series thereof) out of what he got.
    JSTOR archives about 41 million pages of content. At about 4KB per page, that works out to about 156GB: larger than the average torrent, sure, but not even a tenth the capacity of a decent external hard drive. Granted, articles added since January and images might increase the size, but compression can bring it down.
    Those with access to JSTOR should get cracking on this.

  4. I’m not going to defend Mr. Swartz, because I don’t know the details of the case. However, it’s ridiculous and offensive that most of the articles in JSTOR are private property, because the research they report was publicly sponsored, through agencies such as the National Science Foundation and the National Institutes of Health. At this point, in view of electronic document preparation and distribution, privately owned journals add very little value to publicly sponsored research (I speak as an author of a number of scientific articles; see for my bona fides). They’re a racket that serves no one well but their owners and employees. The growth of alternatives such as BioMed Central and the Public Library of Science is welcome, but it should be a matter of law that online reports of publicly sponsored research cannot be paywalled.

  5. I find the premise of this article disturbing: you are essentially arguing that breaking laws doesn’t matter if the outcome is harmless. From that point of view, driving drunk is OK as long as you make it home safely and don’t run down any children along the way. B&E is OK as long as nothing is stolen or disturbed and the house is locked back up when you’re done.

    The punishment may prove severe in the end, but I don’t see a fault with the indictment. He accessed a system that MIT created with a specified intent and access protocol that he circumvented. To say that the crime is victimless is untrue from that perspective because MIT’s intent was violated. Arguing that the material that he stole was public domain (which not all of it was) is moot. We do not, as individuals, have the right to decide which laws we will and will not obey. He had every opportunity to access the documents through existing channels or by creating/altering the protocol, thereby giving the owner of the system the opportunity to weigh in on the outcome.

  6. If he is associated with an educational/academic institution such as Harvard’s Center for Ethics, and he has open access to this material, why not access the material through that channel? Why choose a different institution? The alleged reason for his infringement is a potential data-mining exercise, and if that’s the goal, why not approach JSTOR as a Harvard fellow and request the materials? And if that’s not fishy enough, why choose to repeatedly connect to the JSTOR catalog through an inappropriate, tampered connection (read the indictment) in a secure closet on the MIT campus? All of this adds to his culpability rather than suggesting an inquisitive pursuit.

  7. txpatriot

    @Matthew: so if I break into a bank by defeating all of its security systems, just to steal some freely available promotional brochures, that s/b OK? No harm no foul, right?

      • txpatriot

        I agree — let’s decriminalize breaking and entering. That way I can break into your home and office, hack into your home & office computers as well as the GigaOM servers, take any physical objects that are free, and copy any software files, then leave, and you’d testify under oath that you’re perfectly OK with all of that — cool!

  8. I read the indictment, and he knew what he was doing was illegal. He wasn’t just copying “free” articles. JSTOR charges institutions a yearly fee to access those articles, and those who access them agree not to download them for redistribution or profit. Also, 1.5 million of the articles he stole were only legally available for a “purchase price.” The material JSTOR archives and manages is copyrighted; JSTOR doesn’t own the information, publishers and authors do. So Swartz was stealing from the publishers and authors, and he was stealing the use of MIT’s computers (he was an unauthorized “guest” at MIT, and MIT restricts even authorized guests to 14 days of access per year). He prevented PAYING clients from accessing JSTOR’s archives because he crashed their servers, and he prevented authorized MIT users from accessing JSTOR because JSTOR had to block access to MIT IPs in an effort to stop Swartz. It doesn’t matter whether JSTOR or MIT want to press charges; what Swartz did was a federal crime involving interstate commerce, and Swartz didn’t agree to give the stolen goods back until he was caught by the feds.

  9. Michael Gersh

    Swartz is alleged to have taken the data servers down repeatedly as a consequence of his efforts to steal the data. He was well aware of their efforts to stop him, and took repeated measures to avoid being stopped. Then, when he was stopped by security, he fled. Plus, he broke into a data closet to do the deed; this was not mere hacking.

    The costs of digitizing and organizing the database are covered by expensive subscription charges to universities and other customers. The fact that the universities provide the data without additional charge to credentialed people of their choice does not mean that it was provided free to anyone. This was criminal behavior, pure and simple. He was caught after trying to flee. Additionally, one needs to remember that this is a federal indictment, and federal prosecutors achieve conviction rates in the high 90 percent range.

    The best advice I can give to any young person who similarly wants to help free information from pecuniary interests is, don’t do the crime if you can’t do the crime.

    • flynn like

      Ah! Clever man! Don’t think anyone will do the crime if they can’t do the crime. This is a victimless offence, which just confirms the saying that rules are meant to be broken.

  10. Jon Strang

    Those articles are not “freely available”; they are openly available to the university community. The university pays a subscription fee to gain access to this database, and access is authorized under terms of service. JSTOR is not a charity, it’s a non-profit. Furthermore, most of the articles in JSTOR are still in copyright. Swartz may be playing the part of a digital Robin Hood, but I don’t think any of us would deny that Robin Hood is a thief.

    • Even so, JSTOR and other scientific libraries charge too much for the service they provide. How can a non-profit organization charge US$19 for a single .pdf article even if its copyright has expired? They get publicly funded research results and charge money to publish them. I don’t think it is wrong to charge for it, but the prices aren’t suited to non-profit organizations that just need to organize a database.

    • Barba is correct: library costs are a significant burden for most universities, the vast majority of which are nonprofit organizations and most of which are public institutions. This isn’t even just an issue for smaller universities. As a postdoc at the University of Wisconsin, Madison, one of the foremost research universities in the country, I was among a small group of people that persuaded the library not to cancel its subscription to Theoretical Population Biology, an excellent but overpriced journal published by Elsevier; the library was trying to save money, of course. Even as a postdoc at Duke, another of the foremost research universities in the country and one of the richest, with particular strength in biomedical research, I repeatedly ran into articles I wanted in journals to which the library didn’t subscribe; with typical institutional subscriptions costing hundreds of dollars a year, even a university as rich as Duke can’t afford to subscribe to all the relevant journals.

      Moreover, saying that articles are “openly available to the university community” disregards the existence of researchers doing valuable work at universities outside the wealthiest countries or outside any university at all. I’ve received dozens of requests for copies of my publications from people around the world, many in poorer countries such as China and Russia, who cannot access them behind journal paywalls. And it’s always worth remembering that much outstanding science has been done by people not working at universities. For example, Charles Darwin, arguably the greatest biologist the world has yet seen, never had a faculty job. Darwin was rich, so he could maintain his own private library. Young Albert Einstein, working for the Swiss patent office, couldn’t have done that.

      Science thrives on the free exchange of information among scientists. The real costs of such exchange are now vastly lower than ever before in history, yet exchange remains hampered by an academic publishing system with financial arrangements inherited from the pre-computer era. Change is long overdue.

    • Swartz is no more a thief than are the educational facilities that rob our society of progress by restricting access to knowledge to the select and privileged few. If the university’s biggest concern is that knowledge might become available to the general public and provide them with the means and opportunity to better and educate themselves, I do not see the harm. The alarming point to me is not that information was ‘stolen’ but that an educational facility would be upset. Isn’t that against an educational facility’s mission statement: educating people? Education is a market, but I humbly believe there’s a difference between charging students for courses staffed by faculty and restricting distribution of knowledge.