{"id":174650,"date":"2025-03-21T16:54:24","date_gmt":"2025-03-21T16:54:24","guid":{"rendered":"https:\/\/societyofauthors.org\/?p=174650"},"modified":"2025-03-27T19:09:57","modified_gmt":"2025-03-27T19:09:57","slug":"the-libgen-data-set-what-authors-can-do","status":"publish","type":"post","link":"https:\/\/societyofauthors.org\/2025\/03\/21\/the-libgen-data-set-what-authors-can-do\/","title":{"rendered":"The LibGen data set \u2013 what authors can do"},"content":{"rendered":"

Last updated 27 March 2025<\/em> – watch out for further updates this week<\/em><\/p>

27 March 2025:<\/em> Sign our open letter to the Secretary of State for Culture, Media and Sport<\/a><\/p>

On Thursday 20 March 2025, The Atlantic<\/em> published a searchable database of over 7.5 million books and 81 million research papers<\/a>. This data set, called Library Genesis<\/a> or \u2018LibGen\u2019 for short, is full of pirated material, which tech giant Meta has used to develop AI systems.<\/p>

The Atlantic <\/em>says that court documents show<\/a> that staff at Meta discussed licensing books and research papers lawfully but instead chose to use stolen work because it was faster and cheaper. Given that Meta Platforms, Inc, the parent company of Facebook, Instagram and WhatsApp, has a market capitalisation of \u00a31.147 trillion<\/a>, this is appalling behaviour.<\/p>

According to The Atlantic<\/em>, Meta argued that it could then use the US\u2019s \u2018fair use exception\u2019<\/a> defence if it was challenged legally.<\/p>

It is not yet clear whether scraping from copyright works without permission is covered by the US fair use exception to copyright, but if that scraping is for commercial purposes (which what Meta is doing surely is) it cannot be fair use. Under the UK fair dealing exception to copyright<\/a>, there is no question that scraping is unlawful without permission.<\/p>

We wrote to Meta in August 2024 to assert our members\u2019 rights around uses of their works by generative AI<\/a>. As a matter of urgency, Meta needs to compensate the rightsholders of all the works it has been exploiting.<\/p>

This is yet more evidence of the catastrophic impact generative AI is having on our creative industries worldwide. From development through to output, creators\u2019 rights are being ignored, and governments need to intervene to protect authors\u2019 rights.<\/p>

In the UK, and globally, we need to see strong legislation from governments to uphold and strengthen copyright law, ensure transparency and fair payment, and penalise big tech companies that ride roughshod over the law.<\/p>

It is unclear at this stage how Meta used all the data it downloaded from LibGen (and other pirated libraries), but court documents show that it allegedly downloaded the contents of LibGen during the development of its AI tool Llama. The search below shows which pirated books are on LibGen and are likely to have been used by Meta:<\/p>

You can search the data set here.<\/a><\/p>

The SoA is campaigning for increased protections for authors, and to put an end to AI tech companies unlawfully using copyright works without permission or payment.<\/p>

The SoA\u2019s Chief Executive, Anna Ganley, said:<\/p>

\u2018Rather than ask permission and pay for these copyright-protected materials, AI companies are knowingly choosing to steal them in the race to dominate the market. This is shocking behaviour by big tech that is currently being enabled by governments who are not intervening to strengthen and uphold current copyright protections. As part of the Creative Rights in AI Coalition, the SoA has been at the heart of the fight and is continuing to lobby against these unlawful and exploitative activities.\u2019<\/p><\/blockquote>

How our members are responding<\/h3>

Authors are up in arms. This open letter<\/a>, organised by member Vikki Patis, gained over 200 signatures in less than 24 hours.<\/p>

Author and descendant of Charles Dickens, Lucinda Hawksley, took to X<\/a> to share that \u20187 of [her] books have been stolen by Meta for their AI database. In 1842 [her] greatx3 grandfather called for an International Copyright Law to prevent his works being pirated. Now we have that law, yet once again authors are experiencing theft.\u2019<\/p>