Monday, April 11, 2011

JavaScript Camouflaging - A Primer

In this discussion, we are simply walking through the nature of JavaScript obfuscation and camouflaging in order to understand the importance of dense code.

Views By:- RB and AKS

JavaScript is widely used to develop applications and websites on the internet, for legitimate as well as nefarious purposes. Because it is a client-side scripting language, the scripts are usually available to anyone who wants to read them and understand what they do. JavaScript obfuscation is therefore used heavily, both to make scripts hard to read and to defeat automated detection tools. On the legitimate side, the goal is to hide the source code so that it is harder to steal. On the malicious side, attackers use the same techniques to bypass antivirus engines and execute rogue code successfully. These are the two main reasons for the wide use of JavaScript obfuscation.

Exploit Placement and Detection:
Most Browser Exploit Packs (BEPs) use the browser's JavaScript engine and heap spraying techniques to exploit vulnerabilities in the browser; the aim is to use JavaScript's capabilities to manipulate the heap so that the vulnerability can be exploited. Another thing to take into account is how the exploit is written and how JavaScript obfuscation supports it. Consider an exploit that is placed as-is between <html> and <body> tags: an antivirus engine will almost certainly detect it. Do antivirus engines handle scripts differently from executables? Based on our analysis, they follow a similar approach for both: signature-based pattern matching. A unique signature is created for a known malicious script, and the engine matches scanned content against it. Polymorphic code, which carries a self-decrypting routine that restores the script automatically at run time, is much harder to detect this way; ordinary scripts are easy to detect. Because JavaScript can build code as a string and execute it at run time, it can be made to execute scrambled code.
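To make the signature idea concrete, here is a minimal sketch of a toy scanner, assuming that a "signature" is nothing more than a literal substring. The names SIGNATURES and scanScript and both signature entries are hypothetical; real antivirus engines use far richer rules, so treat this only as a model of the basic idea.

```javascript
// Toy signature scanner: a deliberately simplified model of
// signature-based pattern matching, not how any real engine works.
// Each signature is a hypothetical name mapped to a literal substring.
const SIGNATURES = {
  "eval-unescape": "eval(unescape(",
  "document-write-unescape": "document.write(unescape(",
};

// Return the names of all signatures whose pattern occurs in the script.
function scanScript(source, signatures = SIGNATURES) {
  // Strip whitespace so trivial spacing changes do not defeat the match.
  const normalized = source.replace(/\s+/g, "");
  return Object.keys(signatures).filter((name) =>
    normalized.includes(signatures[name])
  );
}

// The escaped argument is only a string here; nothing is executed.
const direct = 'eval( unescape("%61%6C%65%72%74") )';
const aliased = 'var a = eval, b = unescape; a(b("%61%6C%65%72%74"));';

console.log(scanScript(direct));  // one match: "eval-unescape"
console.log(scanScript(aliased)); // no matches
```

The aliased line slips past this scanner only because the match is on literal text, which is exactly the weakness that renaming functions exploits.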

Camouflaging - Dense Code
Camouflaging (increasing data density) is a robust approach to making JavaScript hard to detect. Critical functions such as eval and unescape are something antivirus engines always look for. unescape turns escaped (percent-encoded) data back into a readable string. The escaped form is hard for a user to read, but antivirus engines can decode it and flag it as malicious. A related function is eval, which executes a string as code and is often used together with unescape in malicious scripts. Generic JavaScript obfuscation uses them as eval(unescape(............)), and that combination is easily flagged as malicious. The problem is that these functions are still needed to run the hidden code. Camouflaging is the art of referencing these functions under different, or completely random, names. A generic example is as follows:

var FGHTY678 = eval;
var VFGBH432 = unescape;
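A quick way to see what these two lines do is to check that the new names refer to the very same function objects. The sketch below does that and also shows one behavioral detail that is easy to miss: when eval is called through an alias, JavaScript treats it as an indirect eval, which evaluates code in the global scope rather than in the caller's local scope. The property scopeProbe and the function compareEvalForms are hypothetical names used only for this demonstration.

```javascript
// Aliasing: the new names are just additional references to the
// same built-in function objects.
var FGHTY678 = eval;
var VFGBH432 = unescape;

console.log(FGHTY678 === eval);           // true
console.log(VFGBH432("%48%65%6C%6C%6F")); // Hello

// Calling eval through an alias is an "indirect eval": it runs the
// code in the global scope, not in the calling function's scope.
globalThis.scopeProbe = "global";

function compareEvalForms() {
  var scopeProbe = "local";
  return {
    direct: eval("scopeProbe"),       // direct eval sees the local variable
    indirect: FGHTY678("scopeProbe"), // indirect eval sees only the global one
  };
}

console.log(compareEvalForms()); // direct: "local", indirect: "global"
```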


Such variables can also be created dynamically, because JavaScript resolves names at run time and lets code reference functions on the fly. For harder scenarios, an algorithm can be written that assigns random names to the variables pointing to these crucial functions. We will stick to our generic examples. At this point, if we use the code as follows, it is still not that hard for antivirus engines to detect:

var FGHTY678 = eval;
var VFGBH432 = unescape;
FGHTY678(VFGBH432(....));


It performs the same function, except that the variable names can be randomized. This changes the signature and makes detection harder for antivirus engines, although the code cannot be called truly undetectable. Another variant that is easily detectable is as follows:

function FGHTY678(FGATH789) { return eval(FGATH789);}
function VFGBH432(XCVTH789) { return unescape(XCVTH789);}
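The wrappers can be checked the same way an analyst might check them. This sketch assumes an engine where Function.prototype.toString returns the original source text of a function (as modern engines do); it recovers the wrapper bodies and finds the literal eval and unescape calls in them. The regular expressions and the wrapperSource variable are our own illustrative choices.

```javascript
// The wrappers: random-looking names on the outside, but the bodies
// still call eval and unescape by name.
function FGHTY678(FGATH789) { return eval(FGATH789); }
function VFGBH432(XCVTH789) { return unescape(XCVTH789); }

// toString() returns each function's source text, so the original
// calls are recoverable from the script itself.
const wrapperSource = FGHTY678.toString() + "\n" + VFGBH432.toString();

console.log(/\beval\s*\(/.test(wrapperSource));     // true
console.log(/\bunescape\s*\(/.test(wrapperSource)); // true

// The wrappers still behave exactly like the functions they hide.
console.log(FGHTY678("6 * 7"));  // 42
console.log(VFGBH432("%4F%4B")); // OK
```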


Only the names are changed; the eval and unescape calls themselves remain intact inside the wrappers, so signature detection is still easy. None of the approaches discussed so far provides a reliable way to obfuscate. Remember that even if the function calls are camouflaged, the escaped code is still present as the argument to unescape, and antivirus engines can read it. Overall, the code is neither fully camouflaged nor fully obfuscated. Now let's talk about hexadecimal encoding (any other encoding could be chosen instead). The question is: "How does hexadecimal encoding affect obfuscation?" For example, if we encode the string "JavaScript Obfuscation", the hexadecimal form looks like this:

"4a617661536372697074204f62667573636174696f6e"
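Producing and reversing this encoding takes only a few lines. A minimal sketch, assuming ASCII input so that every character fits in two hex digits; toHex and fromHex are hypothetical helper names.

```javascript
// Encode each character of an ASCII string as two hexadecimal digits.
function toHex(text) {
  return Array.from(text, (ch) =>
    ch.charCodeAt(0).toString(16).padStart(2, "0")
  ).join("");
}

// Reverse the encoding: read the digits two at a time.
function fromHex(hex) {
  let out = "";
  for (let i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return out;
}

const encoded = toHex("JavaScript Obfuscation");
console.log(encoded);          // 4a617661536372697074204f62667573636174696f6e
console.log(fromHex(encoded)); // JavaScript Obfuscation
```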


Although this is hard for humans to read, it is no obstacle for computer programs: simple encoding alone does not hide the signatures. Can we use JavaScript to obfuscate it further? Yes; JavaScript's built-in string manipulation functions can be used to build extensible and robust code. replace is especially useful, and it is used heavily for ordinary purposes because of its versatility. For example, related code can be structured as follows:

var GHJKO786 = eval;
var KJLHM890 = unescape;

var FGHBN345 = "JavaScript-Obfuscation1JavaScript-Obfuscation2JavaScript-Obfuscation3JavaScript-Obfuscation4JavaScript-Obfuscation5JavaScript-Obfuscation6JavaScript-Obfuscation7";

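// replace() returns a new string, so the result must be assigned back.
FGHBN345 = FGHBN345.replace(/JavaScript-Obfuscation/gi,"!@#$%^&*");
// FGHBN345 is placeholder data: evaluated as-is it throws a SyntaxError.
GHJKO786(KJLHM890(FGHBN345));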


The code above looks complex, but it uses only built-in JavaScript functions. Every occurrence of "JavaScript-Obfuscation" is replaced with the metacharacter string "!@#$%^&*". In the replace call, the "g" flag applies the replacement globally (to every match, not just the first), and the "i" flag makes the match case-insensitive. It is also possible to use Math.round and Math.random to randomize the custom function names.
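The effect of the two flags is easy to check on a small string, and the name randomization mentioned above can be sketched in the same style. The sample string and the randomName helper are hypothetical; the five-letters-plus-three-digits pattern only mimics the names used in this post. The sketch uses Math.floor instead of Math.round, because Math.round(Math.random() * 25) would give "A" and "Z" only half the probability of the other letters.

```javascript
// How the g and i flags change String.prototype.replace.
const sample = "AbcABCabc";
console.log(sample.replace(/abc/gi, "-")); // ---
console.log(sample.replace(/abc/i, "-"));  // -ABCabc (first match only)
console.log(sample.replace(/abc/g, "-"));  // AbcABC- (exact case only)

// Hypothetical helper: build a name in the same style as the ones in
// this post (five capital letters followed by three digits).
function randomName() {
  const letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  let name = "";
  for (let i = 0; i < 5; i++) {
    // Math.floor gives each letter the same probability.
    name += letters[Math.floor(Math.random() * letters.length)];
  }
  for (let i = 0; i < 3; i++) {
    name += Math.floor(Math.random() * 10);
  }
  return name;
}

console.log(randomName()); // a different name on every run
```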

Garbage Data Mangling
Garbage data, combined with extra logic, is used to surround the real data in JavaScript code. The main idea behind this technique is to make it harder to filter out the actual data. The garbage itself serves no purpose and is not vital to execution, but it helps resist the signature matching process. Here we mainly talk about logic flow in which certain conditions always remain true, and the script is placed inside that logic so that it executes every time. An example is as follows:

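var GHJKO786 = eval;
var KJLHM890 = unescape;
// Illustrative value: it never equals "7890", so the condition below always holds.
var VBNHJ789 = "1234";

if (VBNHJ789 != "7890")
{
FGHBN345 = "JavaScript-Obfuscation1JavaScript-Obfuscation2JavaScript-Obfuscation3JavaScript-Obfuscation4JavaScript-Obfuscation5JavaScript-Obfuscation6JavaScript-Obfuscation7";
}

// replace() returns a new string, so the result must be assigned back.
FGHBN345 = FGHBN345.replace(/JavaScript-Obfuscation/gi,"!@#$%^&*");
// FGHBN345 is placeholder data: evaluated as-is it throws a SyntaxError.
GHJKO786(KJLHM890(FGHBN345));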


In practice, VBNHJ789 never equals "7890", so the condition (VBNHJ789 != "7890") is always true and FGHBN345 is always assigned; the branch only looks conditional. Another example is as follows:

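function RTGHY123()
{
var FGHYU009 = "Rocky";
var FJKLM765;
// Garbage loop: assigns the same value 20 times and achieves nothing.
for (var temp = 1; temp <= 20; temp++)
{
FJKLM765 = document.write;
}

// Never true ("Rocky" is not 5678), so FJKLM765 is never called.
if (FGHYU009 == 5678)
{
FJKLM765("HEYA");
}
}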


The function RTGHY123 is a garbage function: it does nothing useful and only creates a mess for analysts and anyone else reading the code.
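Both garbage examples hinge on a condition whose outcome never changes; such conditions are commonly called opaque predicates. A minimal sketch with hypothetical names alwaysTrue and computeTotal: the predicate x * (x + 1) % 2 === 0 is our own illustrative choice, not taken from the examples above, and it holds for every integer of modest size because the product of two consecutive integers is always even. The garbage loop and the dead else branch have no effect on the result, which is exactly what an analyst has to establish before discarding them.

```javascript
// Opaque predicate (illustrative): always true for integers of modest
// size, since the product of two consecutive integers is even.
function alwaysTrue(x) {
  return (x * (x + 1)) % 2 === 0;
}

// Real work guarded by the fixed condition and surrounded by garbage.
function computeTotal(values) {
  let junk = "Rocky";              // garbage: never used for the result
  for (let i = 0; i < 20; i++) {   // garbage: loop with no useful effect
    junk = junk.toUpperCase();
  }
  let total = 0;
  if (alwaysTrue(values.length)) { // opaque: this branch is always taken
    for (const v of values) total += v;
  } else {
    total = -1;                    // dead code: never reached
  }
  return total;
}

console.log(computeTotal([1, 2, 3])); // 6
```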

Finally, many other obfuscation methods are possible. Our sole purpose here is to discuss how effective dense encoding is in JavaScript, so that it can be used to design better obfuscators.

Nothing is impossible until it is proclaimed so.