A massive 1.5TB trove of data known as ALIEN TXTBASE, containing 23 billion rows of stealer logs, has been processed and integrated into Have I Been Pwned (HIBP). The data set includes 284 million unique email addresses, 493 million website/email pairs, and 244 million new passwords that had never been seen in HIBP before. New APIs now allow domain owners and website operators to identify exposed users. The public can still check if their personal email address appears – for free.
What Is ALIEN TXTBASE and Where Did It Come From?
Late last month, cybersecurity experts were tipped off by a government contact about a small sample of suspicious data files posted on Telegram. These files led researchers to a much larger dump of 744 separate files, distributed openly in a criminal Telegram channel — a growing hotspot for cybercriminal activity.
Each file contained credentials harvested by info-stealing malware – software that infects victims’ machines and records usernames, passwords, and the websites they were entered into. This type of malware often spreads via fake software downloads, cracked tools, or malicious browser extensions.
Stealer Logs: What Makes Them Dangerous?
Unlike a typical data breach that targets one website, stealer logs capture data from infected devices regardless of where the user logs in. That means a single victim’s credentials might include their Netflix account, banking login, work email, online shopping sites, and more — all in one file.
To monetize this, threat actors typically give away a portion of the stolen data for free, and charge a subscription to access new, ongoing logs. It’s a business model built on compromised identities.
Verifying the Data: Real Victims, Real Services
Since this data wasn’t taken from one compromised service, verifying it required some ingenuity. Researchers tested sample email addresses in login pages (like Netflix) to confirm whether those credentials were real. In many cases, websites like Netflix even revealed location-based login paths, matching the suspected region of the victim (e.g., /en-ph/ for the Philippines).
To go further, HIBP reached out to existing subscribers and verified the rows tied to their addresses, which numbered anywhere from dozens to thousands. One German user, for example, had over 1,000 entries tied to him, painting a detailed picture of his digital life, from his Amazon and LinkedIn accounts to his preferences in cars and whisky. Creepy? Yes. Real? Absolutely.
New Features for Organizations: Domain-Based API Access
As part of this update, HIBP released two powerful new APIs designed for businesses:
- Query by Email Domain: Domain owners (e.g., @company.com) can see which websites their users’ credentials appear on.
- Query by Website Domain: Website operators (e.g., netflix.com) can find which email addresses appear in the logs as having logged in to their site.
Previously, organizations had to check one email at a time. Now, with these domain-level APIs, they can assess their exposure in bulk.
These tools are part of HIBP’s Pwned 5 subscription tier, meant for enterprise users. However, individual email lookups remain completely free through the web interface.
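To make the email-domain query more concrete, here is a minimal Python sketch of how a domain owner might post-process a result. The endpoint path (stealerlogsbyemaildomain) and the response shape (a JSON object mapping each email alias to a list of website domains) are assumptions based on my reading of the HIBP v3 API and should be checked against the official documentation. The helper names and the sample data are hypothetical. No network call is made here; a real request would also need the API key that comes with the subscription.

```python
from collections import Counter
from urllib.parse import quote

# Assumed base URL and path name; verify against the HIBP API documentation.
API_BASE = "https://haveibeenpwned.com/api/v3"


def email_domain_url(domain: str) -> str:
    """Build the (assumed) URL for the query-by-email-domain endpoint."""
    return f"{API_BASE}/stealerlogsbyemaildomain/{quote(domain, safe='')}"


def summarize_exposure(domain: str, aliases_to_sites: dict[str, list[str]]):
    """Turn an alias -> websites mapping into full addresses and per-site counts."""
    per_address = {
        f"{alias}@{domain}": sorted(set(sites))
        for alias, sites in aliases_to_sites.items()
    }
    # Count how many distinct addresses were seen against each website.
    per_site = Counter(site for sites in per_address.values() for site in sites)
    return per_address, per_site


# Hypothetical response body for company.com (made-up data for illustration).
sample = {"andy": ["netflix.com"], "jane": ["netflix.com", "spotify.com"]}
addresses, sites = summarize_exposure("company.com", sample)
print(email_domain_url("company.com"))
print(addresses)
print(sites.most_common())
```

Grouping by website as well as by address lets a security team see at a glance which third-party services show up most often for their users, which can help prioritize password resets.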
A Major Update to Pwned Passwords
HIBP’s open-source Pwned Passwords database, which helps users and services avoid compromised passwords, just received a huge boost. From the ALIEN TXTBASE logs alone, 244 million new passwords were added, with another 199 million having their prevalence counts updated.
The service, which processes around 10 billion queries monthly, plays a critical role in helping developers and companies block weak or previously leaked passwords — anonymously and at no cost.
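The "anonymously" part rests on a k-anonymity range search: the client hashes the password with SHA-1, sends only the first five hex characters to https://api.pwnedpasswords.com/range/{prefix}, and matches the remaining 35 characters locally against the returned SUFFIX:COUNT lines. The Python sketch below illustrates the idea. The function names are hypothetical, the demo response body and its counts are made up, and the network helper is defined but not called.

```python
import hashlib
import urllib.request


def split_sha1(password: str) -> tuple[str, str]:
    """Return the 5-char prefix and 35-char suffix of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_from_range_body(suffix: str, body: str) -> int:
    """Look up the suffix in a 'SUFFIX:COUNT' response body; 0 means not found."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Query the range endpoint; only the 5-character prefix leaves the machine."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(
        f"https://api.pwnedpasswords.com/range/{prefix}",
        headers={"User-Agent": "pwned-passwords-sketch"},  # hypothetical agent name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_from_range_body(suffix, response.read().decode("utf-8"))


# Offline demonstration with a made-up response body (counts are not real data).
prefix, suffix = split_sha1("password")
fake_body = f"0018A45C4D1DEF81644B54AB7F969B88D65:3\r\n{suffix}:42\r\n"
demo_count = count_from_range_body(suffix, fake_body)
print(prefix, demo_count)
```

A signup form could call something like pwned_count(candidate) and reject or warn on any non-zero result, without the full hash or the password itself ever being sent to the service.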
Behind the Scenes: Processing the ALIEN TXTBASE Logs
Handling this much data wasn’t easy. The full processing pipeline involved:
- Parsing 744 individual files (1.5TB total) locally using custom .NET apps
- Extracting 284M unique email addresses and 493M unique website/email pairs
- Uploading and validating records into Azure SQL
- Filtering out duplicates, validating domains, and integrating with existing HIBP systems
In a surprising twist, local processing with hand-crafted tools turned out to be faster and cheaper than relying on cloud-based analytics alone, despite all the modern cloud tools available.
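The parsing and de-duplication steps described above can be sketched roughly as follows. The actual pipeline used custom .NET apps; this is a Python illustration only, and it assumes each row has the form URL:email:password, which the article does not specify. The regular expression, function names, and sample rows are all hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumed row layout: URL, then email, then password, separated by colons.
# The URL part is matched lazily so the colon in "https://" is not treated as
# a separator; the password may itself contain colons.
LINE_RE = re.compile(
    r"^(?P<url>\S+?):(?P<email>[^:\s@]+@[^:\s@]+\.[^:\s@]+):(?P<password>.*)$"
)


def parse_line(line: str):
    """Return (email, website_domain, password), or None if the row does not fit."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    url = match["url"]
    # urlparse only fills in the host when the URL has a scheme or leading "//".
    host = urlparse(url if "://" in url else "//" + url).hostname
    if not host:
        return None
    return match["email"].lower(), host.lower(), match["password"]


def extract(lines):
    """Collect unique emails and unique (email, website) pairs."""
    emails, pairs = set(), set()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip rows that do not match the assumed layout
        email, host, _password = parsed
        emails.add(email)
        pairs.add((email, host))
    return emails, pairs


# Made-up rows for illustration.
sample_rows = [
    "https://www.netflix.com/en-ph/login:jane@example.com:hunter2",
    "https://www.netflix.com/login:JANE@example.com:hunter2",
    "example.org/signin:bob@example.com:pa:ss",
    "not a stealer log row",
]
emails, pairs = extract(sample_rows)
print(len(emails), len(pairs))
```

In-memory sets are fine for a demonstration, but at 23 billion rows a real implementation would need streaming and on-disk or database-backed de-duplication, which is where the upload to Azure SQL comes in.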
What This Means for You (and the Industry)
This update has far-reaching implications:
- Individuals can now search their email addresses for traces of malware-based data theft via HIBP’s notification page.
- Organizations can leverage powerful new APIs to detect compromised employee or customer accounts.
- Developers and security teams gain access to a richer set of Pwned Passwords to enforce better credential hygiene.
The ALIEN TXTBASE data set gives us rare visibility into the scale and scope of info-stealing malware operations, and thanks to HIBP’s public integration, that visibility now benefits everyone.
🔍 How to Check if You’re Affected (Free)
- Visit: https://haveibeenpwned.com/NotifyMe
- Enter your email and confirm with the link you receive
- Scroll to the bottom to see if your email shows up in the stealer logs
Final Thoughts
The ALIEN TXTBASE dataset might be one of the largest stealer log leaks ever publicly indexed, offering a sobering look at the dangers of info-stealing malware. But thanks to efforts like HIBP, we can turn leaked data into a force for good — improving security, protecting users, and giving victims a chance to respond before further damage is done.