I am Not a Book Pirate

10 Apr, 2015 | Written Word, Piracy

Over the past few weeks I have received a few offers, both via Twitter and private messages on the Kindle Users Forum, along the lines of:

I see you’re interested in a free copy of my book, did you find it? If not, I’m happy to send you one.

I assumed they were spam and deleted or ignored them.

Then an author I’d never heard of accused me of stealing, pointing to posts I had supposedly made asking for free copies. That’s when I decided it was time to take action, and it led me on quite a journey.

The Investigation Begins

My accuser was kind enough to provide me with a link to a site that looked like the openmamba (a Linux distro) forums. None of the users who had posted in the thread were me, so I was curious how I had been linked to the request. Within an hour or two, other people had tweeted me to say they had been about to accuse me of the same thing. So I started digging.

Although the forum looked like openmamba’s, I immediately clocked that the domain didn’t look legitimate. It ended in .mk, which turned out to be the country-code domain for Macedonia. My first assumption was that this was a pirate site hosting a forum dedicated to illicit material, but the main domain looked legitimate.

Clicking on any of the avatars took me through to the real forum’s domain, but to the profile of a totally different user.

Checking KUF, I found others were having the same problem: finding requests for their books and receiving messages treating them as pirates.

Pirate Sites

Another post on the forum led me to a site that appeared to be offering free ebook downloads. I’ve heard of people finding their books available for free, but there was something odd about this one. Almost every link went to a registration page. The copyright in the footer included “Sainsbury’s Entertainment Ltd is a wholly-owned subsidiary of Sainsbury’s Supermarkets Ltd.” So I went looking and found the site resembled eBooks by Sainsbury’s. Someone had ripped off their site design and hadn’t even bothered to change the text in the footer.

A web search for that phrase turned up a whole heap of sites that blatantly weren’t what they purported to be.

I emailed Sainsbury’s (I figured they’d have more resources than me), who said they were already working on this.

The domains being used had obviously been legitimate at some stage, so it appeared the sites had been hacked and their content replaced. For what purpose I still don’t know (the registration page doesn’t appear to do anything except lead you to a success page, with links to books that lead off-site).

Hacked Domains

These hacked sites appeared to be something different to the original link I received (although links on them did go out to similar pages). So I did another search, this time for the book(s) I had originally been accused of stealing.

That turned up some interesting results: there were a lot of sites, but I quickly noticed that the URLs of the fake forum pages followed a pattern:

http://<top-level-domain>/libri…

You can easily see these by searching for one of my books (yes, I am a victim too). If you’re an author, just search for your name, your book’s title and the word libri (Italian and Latin for “books”). Search for a famous author and libri, go far enough down the results, and you’ll find them there too.

The numbers in these paths appear random. The same number is often shared across several sites, but not always.

Further digging showed this wasn’t the only path: some use the word thread instead of libri, and some use reviews, although those two seem to be limited to cases where the whole domain has been compromised.
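If you have a list of URLs from a search and want to pick these out, a short Python sketch can flag any URL whose path begins with one of those keywords. I haven’t pinned down the exact format of the numbers that follow the keyword, so the match is deliberately loose (an assumption on my part), and it will produce false positives, such as a genuine forum’s /thread pages.

```python
import re
from urllib.parse import urlparse

# Keywords seen at the start of the fake-forum paths. Whatever follows
# (the "random" numbers) is not checked, because its exact format is an
# assumption; this keeps the match loose and will catch some innocent URLs.
SUSPECT_PATH = re.compile(r"^/(libri|thread|reviews)", re.IGNORECASE)


def flag_suspicious(urls):
    """Return (url, keyword) pairs for URLs whose path starts with a suspect keyword."""
    hits = []
    for url in urls:
        match = SUSPECT_PATH.match(urlparse(url).path)
        if match:
            hits.append((url, match.group(1).lower()))
    return hits


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    sample = [
        "http://example.com/libri-12345/some-book-title",
        "http://example.org/about/",
    ]
    for url, keyword in flag_suspicious(sample):
        print(keyword, url)
```

Anything it flags still needs a human look; it’s a filter for a long list of search results, not a verdict.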

It quickly became clear that these pages were being hosted without the domain owners’ knowledge. To make matters even more interesting, the pages changed their look from time to time: a link might show the openmamba forum one time, KUF the next, or any one of up to eight other forums.

On one such page, which appeared as KUF, the thread looked to have been started by someone called Marmoospres, but when you clicked the avatar you were taken through to my profile. That explained why people were contacting me.

Examining the Source Code

Being a web guy, I decided to start examining the source code to see if I could figure out what was going on.

One thing I noticed was that the source files (images, CSS, etc.) of the original sites were being hosted on a domain called wunnibook.us. I quickly found copies of a page from each of the forums under numbered subdomains (9.wunnibook.us being KUF). The page they happened to copy was a thread I started, which explained the link to my profile (and why the wonderful Kath is being pestered; definitely no karma there, she deserves good things, not this).

The whole reason for these pages appears to be a single link. It often appears to point to LibraryThing, suggesting you can download or access free books there, which you can’t. I have also found examples that purport to lead to a free book search engine.

The links actually point to a file called red.html (or sometimes got.html) in a subdirectory called router. It’s a very simple file that calls a function in a linked JavaScript file, passing it the name of the book.

The JavaScript file (variously called red.js or general1.js) contains a single function (my) that builds a URL from various domains, which appear to exist purely to host a CGI file (actually just more JavaScript) that takes the name of the book as a parameter. Once the URL is built, the browser is redirected to it. After bouncing between a couple of scripts, you end up either at a site that appears to let you search for free books (but wants you to register) or at a download site.
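To make the redirect step concrete, here is a rough Python sketch of what a function like my does. It is not the contents of red.js or general1.js: the domain list, the script path (/search.cgi) and the parameter name (q) are hypothetical placeholders, and picking a domain at random is my assumption about how it chooses between them. The general idea is that the book title handed over by red.html is URL-encoded and appended to a script URL on a throwaway domain.

```python
import random
from urllib.parse import urlencode

# Hypothetical stand-ins for the throwaway domains; not the real ones.
REDIRECT_DOMAINS = ["redirect-a.example", "redirect-b.example", "redirect-c.example"]


def build_redirect_url(book_title, rng=random):
    """Build a redirect URL carrying the book title as a query parameter.

    The path "/search.cgi" and the parameter name "q" are assumptions
    for illustration only.
    """
    domain = rng.choice(REDIRECT_DOMAINS)  # assumed: pick one domain at random
    query = urlencode({"q": book_title})  # spaces become '+', punctuation is escaped
    return f"http://{domain}/search.cgi?{query}"


if __name__ == "__main__":
    # In the browser the equivalent step is sending the visitor to this URL;
    # here we just print it.
    print(build_redirect_url("My Example Book"))
```

Spreading the hops over several domains means taking one of them down doesn’t break the chain, which may be why there are so many.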

The forum pages themselves are fakes: the posters aren’t real, and the pages are constructed by a script. Whatever creates them seems to generate an html subfolder. Each book then seems to get its own subfolder (possibly virtual), and its index page loads a file called med.js, which holds a lot of variables, from user names to post contents to avatar image URLs.

jQuery is then used to load another file from the folder (the one I’m looking at, for example, uses 1v.f). That file is again JavaScript, and it outputs the stolen forum page, merging in seemingly random values from med.js to generate the page dynamically. There is a range of these files, each one corresponding to the numbered subdomain of the copied forum files (9v.f relates to KUF, for example).
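Here is a minimal Python sketch of that idea: one stolen layout filled with values picked at random from a pool, which is roughly what the med.js variables plus a file like 1v.f appear to do. The real files are JavaScript and I haven’t reproduced them; the variable names, the layout and the random selection below are all hypothetical.

```python
import random
from string import Template

# Hypothetical pool of values standing in for the variables in med.js.
VARIABLES = {
    "user_names": ["reader_one", "bookfan22", "quietlibrarian"],
    "post_bodies": [
        "Does anyone have a copy of $title?",
        "Looking for $title, can anyone help?",
    ],
    "avatars": ["/img/avatar1.png", "/img/avatar2.png"],
}

# Hypothetical stand-in for a layout file such as 1v.f or 9v.f.
FORUM_LAYOUT = Template(
    '<div class="post"><img src="$avatar"> <b>$user</b><p>$body</p></div>'
)


def render_fake_post(title, rng):
    """Fill the layout with randomly chosen values for the given book title."""
    body = Template(rng.choice(VARIABLES["post_bodies"])).substitute(title=title)
    return FORUM_LAYOUT.substitute(
        avatar=rng.choice(VARIABLES["avatars"]),
        user=rng.choice(VARIABLES["user_names"]),
        body=body,
    )


if __name__ == "__main__":
    rng = random.Random()  # unseeded: each run gives a different-looking "thread"
    print(render_fake_post("My Example Book", rng))
```

It shows why the pages look plausible but keep changing: a small pool of names and posts, shuffled against a handful of copied layouts, is enough to produce a "unique" thread for every book.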

On the pages branded as Sainsbury’s, the index file in the book folder uses jQuery to pull in content from a file inside the router folder. The one I am currently looking at is called html.f2, in a subfolder called subby.
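If you want to check a suspicious page, the fingerprints above can be searched for in its source. Here’s a minimal Python sketch, assuming you’ve already saved the page’s HTML to a string. The indicator strings come from what I found, but the file names clearly vary between sites, so treat the list as a starting point rather than a reliable detector.

```python
# Strings taken from the pages described above (lower-cased for matching).
# A clean result proves nothing; a hit is a reason to look more closely.
INDICATORS = {
    "wunnibook.us": "assets served from the fake-forum host",
    "router/red.html": "link to the redirect page",
    "router/got.html": "link to the redirect page",
    "red.js": "redirect script",
    "general1.js": "redirect script",
    "med.js": "fake-forum variable file",
    "wholly-owned subsidiary of sainsbury": "copied Sainsbury's footer",
}


def scan_page(html):
    """Return the sorted indicator strings found (case-insensitively) in the page source."""
    lowered = html.lower()
    return sorted(needle for needle in INDICATORS if needle in lowered)


def describe(html):
    """Pair each indicator found with a short explanation."""
    return [(needle, INDICATORS[needle]) for needle in scan_page(html)]


if __name__ == "__main__":
    page = '<link href="http://9.wunnibook.us/style.css"><a href="/router/red.html">Download</a>'
    for needle, meaning in describe(page):
        print(f"{needle}: {meaning}")
```

Plain substring matching is crude (med.js would also match a file called armed.js), but it’s enough to triage a handful of pages.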

Identifying the Problem

Having figured out how the pages are constructed, I started trying to identify the attack vector that had been used. How had so many sites been compromised to host this stuff?

Many of the sites were running WordPress, but plenty were not. Most were hosted on Apache servers, but several were on nginx and at least one appeared to be using IIS. They ran a range of operating systems (CentOS, Gentoo, other Linux distros and Windows). They all seemed to run PHP (even the IIS one), albeit different versions.

The domains were from various registries, they were registered through different registrars, and they appeared to be hosted all over the place, from the UK and US to Australia, Macedonia, Brazil and all points in between.

I still haven’t found a pattern. The sites that have been taken over entirely do appear to be poorly maintained or abandoned domains, but that’s to be expected: nobody could replace a site’s content wholesale without it being noticed immediately otherwise.

With so many different server setups, I can only assume they are generating the structures as real folders rather than modifying .htaccess (or similar) files to add rewrite rules, but I can’t state that as a fact. It’s going to take someone better at this than me to figure out how access to so many sites was gained (we’re talking thousands of sites).
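If the structures really are created as ordinary folders, a site owner could look for them directly. Here’s a minimal Python sketch that walks a web root and reports directories whose names start with one of the path keywords, or router folders containing red.html or got.html. The layout it looks for is inferred from the URLs above, so it’s an assumption; and if the paths are produced by rewrite rules instead, this won’t find them.

```python
import os
import sys

# Inferred from the URLs above; an assumption, not a confirmed layout.
PATH_KEYWORDS = ("libri", "thread", "reviews")
REDIRECT_FILES = {"red.html", "got.html"}


def find_suspect_dirs(web_root):
    """Return sorted paths (relative to web_root) of directories matching the inferred layout."""
    suspects = set()
    for dirpath, _dirnames, filenames in os.walk(web_root):
        name = os.path.basename(dirpath)
        rel = os.path.relpath(dirpath, web_root)
        # Directories whose names start with libri, thread or reviews.
        if rel != "." and name.lower().startswith(PATH_KEYWORDS):
            suspects.add(rel)
        # A "router" folder holding one of the redirect pages.
        if name == "router" and REDIRECT_FILES & set(filenames):
            suspects.add(rel)
    return sorted(suspects)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_suspect_dirs(root):
        print(path)
```

Run it against a copy of your site’s files; a hit on a legitimate folder name is possible, so check anything it lists before deleting it.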

Requesting a Takedown

Having identified wunnibook.us as the source of the fake forum files that let the pages pass as sites they were not (although in some cases these files are hosted on the hacked site itself), I tried to track down its host.

I wasn’t convinced by the contact details on the whois record, especially the email address, so I decided to skip them. The record gave the registrar as Internet.BS Corp. I figured that had to be BS, but it turns out BS is the top-level domain (TLD) for the Bahamas, and Internet.BS is a registrar that once had a reputation as a haven for ‘rogue’ pharmacy sites, but has since turned that around and was bought by CentralNic in 2014.

I focused on the domain host instead, starting with the nameservers, which were all subdomains of topdns.com. That seems to have its own dark history. So I used WhoIsHostingThis? to try to track down the host. It pointed me at Serverius, a company that appears to be based in the Netherlands, but the IP address in the A record pointed me at 3net, whose website appears defunct. I emailed both to ask them to take a look at the site. No joy so far.
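The lookups themselves are easy to script. As a minimal sketch, assuming you’ve saved the output of a whois query as text, the registrar and nameserver lines can be pulled out and the nameservers grouped by parent domain (which is how you spot that they all sit under one provider, as with topdns.com). WHOIS output isn’t standardised, so the “Registrar:” and “Name Server:” labels are an assumption that holds for many records but not all, and the sample record below is made up.

```python
def parse_whois(text):
    """Pull the registrar and nameservers out of raw WHOIS text.

    Assumes "Registrar:" and "Name Server:" labels; other registries differ.
    """
    registrar = None
    nameservers = set()
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue
        label, value = label.strip().lower(), value.strip()
        if label == "registrar" and value and registrar is None:
            registrar = value
        elif label == "name server" and value:
            nameservers.add(value.lower().rstrip("."))
    return registrar, sorted(nameservers)


def parent_domains(nameservers):
    """Reduce hostnames to their last two labels (ns1.example.net -> example.net).

    Naive: it mishandles suffixes such as .co.uk.
    """
    return sorted({".".join(ns.split(".")[-2:]) for ns in nameservers})


if __name__ == "__main__":
    # A made-up record for illustration, not real WHOIS data.
    sample = """Domain Name: example.us
Registrar: Example Registrar, Inc.
Name Server: NS1.EXAMPLE-DNS.NET
Name Server: ns2.example-dns.net
"""
    registrar, servers = parse_whois(sample)
    print(registrar, servers, parent_domains(servers))
```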

What’s It All About?

I’m not 100% sure about the why yet. This was a lot of work to set up (hacking multiple domains, writing scripts, scraping content) just to offer people free content. Add to that the fact that a number of authors have reported these links are for books that aren’t even available yet. The evidence leads me to one conclusion: malware.

I could be wrong; it may just be a way to harvest email addresses for spam. But PDF has long been a vector for infection, which suggests it’s designed to get people to infect themselves (to what end I don’t know, maybe just to build a botnet).

The Long and Short

For authors, the message is simple: don’t panic. I don’t think the downloaded files will be offering free copies of your books. If they are, take the view of Cory Doctorow: piracy is better than obscurity.

This appears to be a grand scheme to capture traffic and drive it to a site where visitors can self-infect their computers, or at least that’s my best guess.

For readers, it’s an eye-opener about getting free content from untrusted sources. There are plenty of places to get free books, legitimately.

Something that stood out in the messages I received was how many authors were willing to give me a copy of their work. So if you find yourself wanting a book, try to get hold of the author and just ask; they may well be happy to gain a reader.