Beyond the Marketing: Assessing Anti-Bot Platforms through a Hacker's Lens
Have you ever had your account credentials leaked online? A password changed without your knowledge, or your cloud computing account bled dry? Perhaps you just barely missed a PS5 drop, or the newest pair of Jordans. If so, you've been a victim of web automation, as have most people on the internet.
Even before the pandemic, bot-related damage had been plaguing businesses for years. These losses materialize in the form of downtime, lower customer conversion rates, and various operational costs—from when an account is created to when a credit card has to be verified. In recent years botting has become considerably more widespread, and losses have only increased. Between account checkers, sniper bots, scalper bots, and general-purpose scraping, it's hard to catch a break as an online retailer.
Enter the antibot industry. These companies—many founded specifically to identify and stop bad actors—often advertise the ability to stop non-human traffic, mitigating losses stemming from automation. By collecting behavioral data, device data, and other proprietary heuristics, these companies cross-reference collected data with what they believe to be human-like behavior. To prevent workarounds—such as spoofed fingerprints, replay attacks, or patched webdrivers—they employ various obfuscation techniques to make their client-side scripts resistant to reverse engineering. If you're wondering what this might look like, I've written two previous blog posts on analyzing the antibot used on Nike (and will be diving into more examples later in the series).
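To make the "cross-referencing collected data against human-like behavior" idea concrete, here is a toy consistency check of the kind these products run. This is purely an illustrative sketch: real vendors collect hundreds of signals and ship the checks obfuscated, and every field name below is hypothetical.

```javascript
// Toy server-side check over a submitted device fingerprint. All field
// names are made up for illustration; real vendors use their own formats.
function looksAutomated(fp) {
  const flags = [];

  // Webdriver leak: most automation frameworks set navigator.webdriver
  if (fp.webdriver) flags.push('webdriver');

  // Cross-signal consistency: a Linux user agent claiming a "Win32"
  // platform suggests a lazily spoofed fingerprint
  if (/Linux/.test(fp.userAgent) && fp.platform === 'Win32') {
    flags.push('ua-platform-mismatch');
  }

  // Headless Chrome historically reported an empty plugin list
  if (/Chrome/.test(fp.userAgent) && fp.pluginCount === 0) {
    flags.push('no-plugins');
  }

  return flags;
}
```

The interesting part isn't any single signal—each is trivially spoofable in isolation—but the consistency requirements between them, which is why sloppy spoofs get caught.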
Although this industry has grown significantly in recent years, businesses continue to lose increasing amounts of money. For an attacker, the reason is obvious: the vast majority of prospective buyers are not well educated on what makes an effective antibot, and the market is riddled with misinformation and deceptive metrics. Not only do some of the most popular antibot vendors consistently fail to be effective, their clients are left with very little information to actually track their success. As a result, "protected" sites consistently fall victim to automation-based attacks, and many are not even aware they've been attacked until they see their accounts for sale or their data in a competitor's hands. While this is good news if you want to purchase sneakers at retail, it also makes web fraud remarkably easy for criminals.
In this series of posts, I will attempt to shed some light on how easy it actually is to bypass popular antibot vendors, with the end goal of clearly establishing metrics for what makes an antibot "good."
Who am I?
In case you are wondering why I'm at all qualified to offer my opinion, I thought I'd provide a little bit of my background. If you don't need any convincing, feel free to skip this part.
- I've had to create solutions to these antibots for use at scale, with a focus on speed and reliability. Although my days of forging fingerprints are long past, anyone who has had to write a sneaker/retail bot, scrape airline flights, or interface with banking APIs has undoubtedly run into an antibot vendor at some point. As my background lies in sneaker resell, I've been on the other side of this fight (without being a criminal). I'm still well known in these spaces, with people reaching out about my past open source projects on a daily basis. Recently, I even had the pleasure of helping Ian Carroll out with his new project.
- I've also seen the inner workings of the antibot industry. I understand what kinds of strategies are employed on the other side, from writing compilers to reverse engineering browsers. Although this series will focus primarily on the attacker's point of view (as that's where the bulk of my experience lies), my own opinions and insights will no doubt be flavored by my understanding of blue team tactics. Vendors have even approached me in the past, asking me to reverse engineer their product and give feedback.
- Despite my proximity and insider knowledge of the industry, I have no real stake in the game. For the sake of full disclosure, I've had exposure to the executive teams at many antibot companies (either through Twitter or offline), but I care much more about promoting good products and dissecting bad ones than running ads. My angle is the ultimate improvement of security technologies, and in my eyes that starts with understanding what works and what doesn't.
Hopefully you're somewhat persuaded that I can be trusted (or not, not telling you what to do), and we can begin diving into the technical side of things.
So what makes an antibot good, anyway?
For this series to be both accurate and effective, clear-cut metrics have to be established. From an attacker's point of view, the only thing that matters is how much effort an attack takes—from first looking at the script to scaling and maintaining a solution. By looking at the difficulty of each step an attacker has to perform, we can accurately assess each product. With this in mind, for each antibot I look at I will try to answer the following questions:
Learnability: How hard is it to understand/reverse engineer this product? Given that I've reverse engineered a previous version, how hard will it be to do it again?
The first step in any antibot solution is getting an initial understanding. If an attacker can't learn any useful information about the system, progress will be impossible. In particular, it's important to examine whatever obfuscations are being used, and how hard it is to determine what properties/behaviors are being fingerprinted. Is there complicated network flow that has to be carefully analyzed, or tricky browser signals that have to be taken into account? In other words, how big is the initial hill an attacker has to climb before writing a solution?
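As a taste of what that initial hill looks like, here is a toy version of the string-table obfuscation found in many antibot scripts (and generic obfuscators). This is a deliberately simplified sketch—real scripts rotate, encrypt, and inline these tables—but the shape is recognizable.

```javascript
// What ships to the browser: readable names are pulled into an encoded
// table, and every sensitive access goes through a lookup function, so
// a casual reader sees navigator[_s(0)] instead of navigator.webdriver.
const _table = ['d2ViZHJpdmVy', 'dXNlckFnZW50', 'cGxhdGZvcm0='];
const _s = i => Buffer.from(_table[i], 'base64').toString('utf8');

// Step one of a reverse is usually rebuilding this mapping so the rest
// of the script can be read in terms of real property names.
const recovered = _table.map((_, i) => _s(i));
```

Once the table is recovered, the analyst can rewrite lookups back into plain property accesses and start cataloguing exactly what gets fingerprinted.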
Automatability: How hard is it to automate a solution to this product? What kinds of solutions are possible?
The two main types of solutions are sandbox-based (using a webdriver, a DOM simulator, etc.) and request-based. To this end, I will explore as many different solutions as possible and assess their feasibility. For instance, if I have to use an automated browser, do I have to edit the library or patch in certain values? If I can solve solely with requests, do I have to parse the script to generate a successful payload, or can I spoof a fingerprint without doing so? It's worth noting that a request-based solution is always preferable—being faster as well as more cost effective—and the fewer steps the better (for an attacker).
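To illustrate what a request-based solution actually is: the attacker reproduces the payload the vendor's script would have generated, with no browser in the loop. Everything below—field names, encoding, the checksum scheme—is invented for illustration; real vendors each have their own (obfuscated) formats.

```javascript
// Hypothetical sensor-payload builder for a request-based solver.
// No real vendor's format is shown here.
function buildSensorPayload(profile) {
  const payload = {
    ua: profile.userAgent,
    screen: `${profile.width}x${profile.height}x${profile.colorDepth}`,
    tz: profile.timezoneOffset,
    ts: Date.now(),
  };
  // Many scripts attach a checksum over the fields so sloppy spoofs fail
  // server-side; a solver has to replicate it exactly, which is often
  // where "parse the script" becomes unavoidable.
  const body = JSON.stringify(payload);
  let checksum = 0;
  for (const ch of body) checksum = (checksum * 31 + ch.charCodeAt(0)) >>> 0;
  return { body, checksum };
}
```

The attacker then POSTs `body` (plus the checksum) over plain HTTP—no rendering, no JavaScript engine, which is exactly why this class of solution is faster and cheaper than any sandbox.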
Scalability: How easy is it to scale different kinds of solutions?
Answering this question will follow quite naturally from the previous one, mainly depending on the feasibility of a non-sandbox solution. For a request solution, is generating certain telemetry difficult or expensive? How much data will I need to go undetected at attack-level traffic? An example of this might be mouse/keyboard telemetry, which sometimes requires a neural net to pass at scale.
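For a sense of what generated mouse telemetry looks like, here is the simplest credible approach: a quadratic Bézier path with jitter and uneven timing. This is a sketch of the cheap end of the spectrum; detectors that model velocity profiles, pauses, and overshoot are what push attackers toward learned models instead.

```javascript
// Generate a plausible-looking mouse path between two points: a curved
// Bezier trajectory with per-point jitter and irregular event timing.
function mousePath(x0, y0, x1, y1, steps = 25) {
  // Random control point bends the path so it isn't a straight line
  const cx = (x0 + x1) / 2 + (Math.random() - 0.5) * 100;
  const cy = (y0 + y1) / 2 + (Math.random() - 0.5) * 100;
  const points = [];
  let time = 0;
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    const x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1;
    const y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1;
    time += 8 + Math.random() * 12; // uneven inter-event gaps
    points.push({
      x: Math.round(x + (Math.random() - 0.5) * 2), // sub-pixel jitter
      y: Math.round(y + (Math.random() - 0.5) * 2),
      t: Math.round(time),
    });
  }
  return points;
}
```

Generating a path like this costs microseconds, which is the whole scalability question in miniature: if a curve with noise passes, telemetry is essentially free at any volume; if it doesn't, every session gets dramatically more expensive.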
If I'm forced to sandbox, it's important to determine how heavy my sandbox will have to be, and how widely my environments will have to vary. For instance, as the primary drawback of sandboxing is the required computing power, my sandboxing will be rather expensive if I'm forced to run a proof-of-work style challenge. Additionally, can I get away with a lighter, higher-scaling sandbox like jsdom, or is there sufficient entropy in what data is collected that I'm effectively forced to run a full browser?
Maintainability: Assuming I already have a solution, how difficult will it be to maintain, depending on its type?
Every time an antibot updates or otherwise changes, an attacker has to either update their solution, or have code that does so automatically. Hence the difficulty of bringing a solution up to date can be a huge deterrent to potential attackers, resulting in time wasted and money lost. If fixes have to be done manually, does it take a lot of work for me to actually figure out what changed? How much time will it take me to patch? It's also useful to examine whether their system picks up on patterns or mistakes in my solution, and adapts accordingly. Importantly, is this feasible for me to maintain by myself, or would I need a team?
Just a quick disclaimer: I won't actually be writing solutions to these antibots for this series, only reverse engineering them. Hence the two previous metrics will be primarily informed by my past experience, and my own thoughts after doing a thorough reverse. This shouldn't affect the accuracy of my work, and if there is enough interest (and I have enough time) I will be happy to demonstrate proof of concepts. I've written solutions to most antibots in the past, or one of my friends has, so there is significant data here without having to write a solution from the ground up.
Now that a few baseline metrics have been established, it's time to start picking apart some examples.
The first installment in the series will be posted in the near future. I will be focusing on web-based products (not mobile), and will be avoiding most captcha-based vendors. Most captcha-based solutions tend to be ineffective and heavily disruptive to the user experience, so I will more than likely be writing an opinion piece on them later anyway, analyzing the inner workings of a few examples.
As my writing is solely for educational purposes, I won’t be going too deep into the specifics of writing solutions, or releasing any code. As previously stated, my focus will lie primarily on exploring different obfuscation techniques and analyzing the effectiveness of collected telemetry, with the ultimate goal of fairly evaluating an antibot's ability to deter attackers.
Feel free to reach out to me on Twitter, Discord, or email. You can find my links on my main site. If you're a vendor and want to talk to me, I'm also more than happy to open a dialogue/hear whatever you have to say.
Thanks for reading! Be sure to follow me on Twitter for my next post, where we'll dive into more technical aspects as well as a specific example.