Cybersecurity researchers from Phylum have found a new form of malware in a PyPI package that was using Unicode to hide.
Unicode is a global encoding standard used for different languages and scripts, covering more than 100,000 characters, whose goal is to simplify and streamline how characters are viewed in electronic and digital devices. With Unicode, every letter, digit, and symbol, get a unique numeric value, that stays the same, regardless of the program or platform in use.
The malware is called “onyxproxy”, it is an infostealer on the hunt for developer login credentials and authentication tokens. It was available on PyPI for a week, before being shut down, and during that time, it managed to get 183 downloads, meaning that up to 183 different developers are at risk of credential and identity theft.
Hiding in plain sight
The malware carries a package called “setup.py” which, according to the researchers, has “thousands” of suspicious code strings which use a combination of Unicode characters.
Observed on the surface, the characters look normal and benign - however, what the human eye sees, and what the program sees, are two vastly different things.
In onyxproxy, there are three critical identifiers: “__import__”, “subprocees”, and “CryptoUnprotectData”. These have a large number of variants, which makes them ideal for beating string-matching-based defenses, the researchers explain.
While the technique might sound complicated, the researchers claim it isn’t exactly sophisticated. However, should the abuse of Unicode for hiding malicious Python code become a trend, it might become cause for concern.
"But, whomever this author copied this obfuscated code from is clever enough to know how to use the internals of the Python interpreter to generate a novel kind of obfuscated code, a kind that is somewhat readable without divulging too much of exactly what the code is trying to steal," concludes Phylum.
- Here are the best malware removal tools right now