PyTorch hit by severe security compromise


A malicious dependency posing as part of PyTorch has been found tricking Python developers into downloading it and then stealing their sensitive data.

PyTorch recently disclosed that it had discovered a malicious dependency sharing its name with the framework’s ‘torchtriton’ library. Admins who installed PyTorch-nightly over the holidays were said to have been compromised, and the platform urged them to uninstall the framework and the fake ‘torchtriton’ dependency immediately.
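
For readers who want to check an environment, the following is a minimal sketch that uses Python's standard importlib.metadata module to see whether a distribution named torchtriton is installed. The helper name installed_version is made up for illustration. Because the legitimate nightly dependency shared the same name, a hit here does not by itself show that the malicious copy was installed; it only indicates that the environment deserves a closer look.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of dist_name, or None if absent.

    Hypothetical helper for illustration; it only inspects the current
    Python environment and cannot tell which index a package came from.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("torchtriton")
if found is None:
    print("torchtriton is not installed in this environment")
else:
    print(f"torchtriton {found} is installed; check where it came from")
```

The check must be run with the same interpreter or virtual environment that PyTorch-nightly was installed into, otherwise it inspects the wrong set of packages.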

The name collision works like this: when pip fetches dependencies, the PyPI index takes precedence over the PyTorch-nightly index. Consequently, users pull the malicious dependency from PyPI instead of the legitimate one.
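
To make the mechanism concrete, here is a toy model of a dependency confusion attack. It assumes a resolver that pools candidates for a package name from every configured index and picks the highest version, so an attacker only needs to publish the same name with a larger version number on the public index. This is a simplified sketch, not pip's actual resolver; the index labels, version numbers, and the Candidate and pick_candidate names are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple  # simplified version, e.g. (3, 0, 0)
    index: str      # label of the index that serves this file


def pick_candidate(name, indexes):
    """Pool every candidate called `name` and return the highest version.

    Toy model only: the source index plays no role in the choice,
    which is what makes a name collision exploitable.
    """
    pool = [
        candidate
        for candidates in indexes.values()
        for candidate in candidates
        if candidate.name == name
    ]
    if not pool:
        raise LookupError(f"no candidate found for {name!r}")
    return max(pool, key=lambda candidate: candidate.version)


# Hypothetical data: the same name is published on two indexes.
indexes = {
    "project-index": [Candidate("torchtriton", (2, 0, 0), "project-index")],
    "public-index": [Candidate("torchtriton", (3, 0, 0), "public-index")],
}

chosen = pick_candidate("torchtriton", indexes)
print(chosen.index)  # prints "public-index"
```

In this model the project's own package loses purely because of the version comparison. A common mitigation is to make sure internal package names are also registered on the public index, or to install from a single trusted index only.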

Thousands of victims

"Since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository. This design enables somebody to register a package by the same name as one that exists in a third party index, and pip will install their version by default," the PyTorch team said in its warning. 

Reports claim the malicious dependency has already been downloaded more than 2,000 times. It grabs all sorts of sensitive data, from IP addresses and usernames to current working directories. It also reads the contents of /etc/hosts, /etc/passwd, and the first 1,000 files in $HOME/*, among other things.

The stolen data get uploaded to the h4ck.cfd domain via encrypted DNS queries, using the wheezy.io DNS server. 

The story, however, comes with a plot twist. A notice on the h4ck.cfd domain appears to claim that the whole exercise was ethical research:

"Hello, if you stumbled on this in your logs, then this is likely because your Python was misconfigured and was vulnerable to a dependency confusion attack," the notice reads. "To identify companies that are vulnerable the script sends the metadata about the host (such as its hostname and current working directory) to me. After I've identified who is vulnerable and [reported] the finding all of the metadata about your server will be deleted."

However, some experts have claimed the binary collects more than “metadata”: it grabs SSH keys, .gitconfig, hosts and password files, none of which an ethical hacker would touch. Furthermore, ‘torchtriton’ was observed using known anti-VM techniques to stay under the radar. Finally, the payload is obfuscated and contained entirely in binary format.

Malicious intent?

Still, in a statement to BleepingComputer, the domain owner stuck to his story of being a white-hat hacker:

"Hey, I am the one who claimed torchtriton package on PyPi. Note that this was not intended to be malicious!

I understand that I could have done a better job to not send all of the user's data. The reason I sent more metadata is that in the past when investigating dependency confusion issues, in many cases it was not possible to identify the victims by their hostname, username and CWD. That is the reason this time I decided to send more data, but looking back this was wrong decision and I should have been more careful.

I accept the blame for it and apologize. At the same time I want to assure that it was not my intention to steal someone's secrets. I already reported this vulnerability to Facebook on December 29 (almost three days before the announcement) after having verified that the vulnerability is indeed there. I also made numerous reports to other companies who were affected via their HackerOne programs. Had my intents been malicious, I would never have filled any bug bounty reports, and would have just sold the data to the highest bidder.

I once again apologize for causing any disruptions, I assure that all of the data I received has been deleted.

By the way in my bug report to Facebook I already offered to transfer the PyPi package to them, but so far I haven't received any replies from them."

Via: BleepingComputer

Sead Fadilpašić

Sead is a seasoned freelance journalist based in Sarajevo, Bosnia and Herzegovina. He writes about IT (cloud, IoT, 5G, VPN) and cybersecurity (ransomware, data breaches, laws and regulations). In his career, spanning more than a decade, he’s written for numerous media outlets, including Al Jazeera Balkans. He’s also held several modules on content writing for Represent Communications.