

 

Welcome To The RackForms Blog General Section

Here we have posts not directly related to RackForms, but rather programming in general -- enjoy!

 


 

Obfuscated C Code And The Heartbleed Bug

» Languages & Technologies Covered:  C

Introduction

Last week I was reading up on the so-called Heartbleed bug when I came across a completely unrelated reference to 'The International Obfuscated C Code Contest'.

Started in 1984, this contest seeks to find amusing and clever ways to exploit the C language and related systems, and more importantly, to wrap these exploits and oddities in the most obfuscated manner possible.

I decided to try my hand at decoding one of the entries, randomly picking this entry from 1995:

http://www.ioccc.org/years.html#1995_heathbar

We'll learn some cool tricks in a minute, but the point of this post is somewhat broader: the C language affords so many ways to create bad code that it's almost tragic. Heartbleed, in my opinion, is the classic example of a developer who's literally too smart for their own good and ends up abusing the language to create some truly stupendous bugs. The code used in the IOCCC competition is a prime example of this power gone bad -- very bad. We'll get back to this point at the end, but first, let's see if we can't learn a little something about bizarre C code.

So the code in this submission, like most others, starts as an amusing glob of nonsense, and must be decoded to be understood.

I should note that one of the key factors in this contest is that entries are size-limited, ensuring the complexity comes from cleverness rather than from overly long blocks of code.

And so while the entry itself starts as an inscrutable wall of symbols (see the original source at the link above), a dollop of basic indentation and renaming provides a rather manageable:

#include <stdio.h>
#include <stdlib.h>

int	add(int), x= -1, y, ret;

int main (int argc, char** args) {

    if(argc !=3)
        return 0;
		
    x++, argc--, args++;

    x = atoi((argc--,*args++));

    y = atoi((argc--,*args++));

    add(add(add(add(add(add(add(add(add(add(add(add(add(add(add(add(argc))))))))))))))));

    printf("%d\n", ret);

    return 0;

}

int add(int args){

    static int main = -1; main++;

    return 0,

    ret |= ((!!(x &1<< main) || ((!!( y &1<< main  ) || args) &&
        (!(!!(y &1<< main) && args)))) &&
        (!(!!(x &1 << main) && ((!!(y &1<< main) || args) &&
        (!(!!(y &1<< main) && args)))))) << main, (!!(y &1<< main) && args) ||
        (!!(x &1 << main) && ((!!(y &1 << main ) || args) && (!(!!(y &1 << main) && args))));
}


For our purposes we're not going to dive into the code's functionality (if you're curious, read the 'hint' file at http://www.ioccc.org/1995/heathbar.hint).

Instead, we're going to focus on the bits that had me scratching my head for a while.

The first item that struck me was the assignment of our command line arguments to x and y. Roughly commented it looks like:

// x goes from -1 to 0; decrement the argument count and advance the args pointer
// past the executable path (which is always the first argument)
x++, argc--, args++;

// argc drops to 1; x becomes the integer value of the string args now points to,
// and args advances to the next argument
x = atoi((argc--,*args++));

// argc, which started at 3, drops to 0; y becomes the integer value of the final argument
y = atoi((argc--,*args++));

There are a few really neat things going on here:

First, the atoi function takes a single parameter, so what in the heck is: (argc--,*args++)?

We find a similar trick in the add function with:

static int main = -1; main++; return 0,
[more code]

In my 10+ years of programming I'd never seen a return statement precede a block of critical code like that, especially in C.

Turns out in both cases this is the result of the comma operator.

This is one of those times when you're so used to an operator that you forget the designers of C had to define what a comma was. Without proper definitions symbols are meaningless, and the comma is no different. And so in my experience commas are separators, whether in function calls, definitions, or variable declarations.

Used in this manner, though, the comma acts as a sequence operator.

That is, it evaluates each expression in the sequence from left to right, discards every result except the last, and makes that final value the value of the whole expression.
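
To see the operator in isolation, here's a minimal sketch of my own (nothing below comes from the contest entry):

#include <stdio.h>

int main(void)
{
    int a = 0;

    // The left operand (a = 5) is evaluated first and its result discarded,
    // though its side effect sticks; the value of the whole parenthesized
    // expression is the right operand, 10 + 2.
    int b = (a = 5, 10 + 2);

    printf("a = %d, b = %d\n", a, b);    // prints: a = 5, b = 12

    return 0;
}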

The good news is our code uses both forms this operator can take, so let's dive in and see how they work.

The first use is with the atoi calls, where we decrement the argc variable, then dereference the next command line argument.

Our new-found knowledge of the comma operator makes this statement understandable: argc is decremented and its result thrown away, while the dereferenced value is the only "thing" the expression produces -- and therefore the only parameter the atoi call actually processes.
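
To convince myself, I boiled that call down to a stand-alone toy of my own (the program below isn't from the contest entry; it just mimics the entry's argument handling):

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** args)
{
    int x;

    if (argc != 2)
        return 0;

    args++;    // skip past the executable path

    // The original line: x = atoi((argc--, *args++));
    // argc-- runs first and its value is thrown away; the value of the whole
    // parenthesized expression is *args++, which is all atoi ever sees.
    x = atoi((argc--, *args++));

    // The line above behaves exactly like the more conventional pair:
    //     argc--;
    //     x = atoi(*args++);

    // run with a single numeric argument, say 42, this prints: x = 42, argc = 1
    printf("x = %d, argc = %d\n", x, argc);

    return 0;
}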

The next use takes on a slightly different form, but the principle remains the same.

return 0, followed by more code

A comma here means the 0 is never returned; only the last element in the sequence is -- our adder logic. It's how we can literally say, clear as day, "return 0", but not actually be doing so!
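
And to see that in isolation, here's a tiny stand-alone program of my own (again, not from the contest entry) that mirrors the pattern used by add():

#include <stdio.h>

int counter = 0;

int bump(void)
{
    // Reads as "return 0", but the comma operator means the 0 is evaluated
    // and discarded; what the function actually returns is ++counter.
    return 0, ++counter;
}

int main(void)
{
    printf("%d\n", bump());    // prints 1, not 0
    printf("%d\n", bump());    // prints 2
    return 0;
}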

There's one other use of the comma operator in this code, can you spot it?

And here's where I'd like to bookend this thing:

See, while neat, if I were just a touch smarter I'm sure I'd be able to find good uses for the comma.

In fact, nothing's stopping me from creating a function like:

int sub(int t){
    return 0, x = x + add(10), t = t + 1;
}

And then calling it before the giant add() block like so:

x = x + sub(10);

But would this be good style, and is it adding anything we couldn't do with a more conventional pattern? To wit, to grab parameters in C we can absolutely use pointer arithmetic. Nothing stops us from doing so -- but again, would this not be a more concise, easier to understand way of doing the same thing:

int x = atoi(args[1]);
int y = atoi(args[2]);

When I look at the Heartbleed bug I see code that's almost needlessly complex, so it's no surprise when these types of errors pop up. The esoteric and often humorous styles and techniques used by the IOCCC entrants hint at a deeper issue: I've never used the comma in this way -- hell, I never even knew it was possible, because I'm simply not that smart. I know C but I'm nowhere near masterful, so I tend to focus on the simpler stuff.

The Heartbleed bug was created by a coder who's way smarter than me, and yet despite the bigger brain he ended up creating this disaster.

Would it not stand to reason that if they dumbed down to my level they'd have a much harder time creating such bugs?

 

Benchmarking - Native vs. Managed Code

» Languages & Technologies Covered:  C#, C++, Java, Mac, Windows

» Project Files:  Download

Introduction

Mac-based PHP developers have surprisingly few choices when it comes to native IDEs -- almost all are based on Eclipse, a Java-based solution which is dog slow for larger projects.

This is especially true when it comes to large code files, something RackForms has plenty of. For example, the UI file that runs the main editor interface is just over 41,000 lines of JavaScript code. This is just one of hundreds of files, which means in many ways developing RackForms on my i7, SSD-equipped iMac is a case study in frustration and waiting.

With so much of my development time spent waiting, I started to wonder exactly why these IDE's are so slow. After some testing I believe I can shed some light on the situation, and what's interesting about this finding is I came upon it by accident.

The Project

Some time ago I wrote a program for parsing .WOFF font files, the delivery vessel of what's more commonly known as the 'web font'. WOFF files caught my attention because they're binary encoded and in many cases, poorly documented. Thus, I wanted to write a parser for this format as a programming challenge.

The resulting program opens a font file, parses the WOFF headers, and then reads and displays the file's NameEntry records. NameEntry records are strings describing the font file's properties: copyright info, font names, and other information.

The original program was written in C++ using the Qt framework, and for this test was ported to .NET C# in Visual Studio 2012.

It's key to note that my original plan was to test the binary parsing routines of each language, so each program wraps that parsing block in a loop that executes 125 times.

I expected the C++ version to be faster; what I ended up finding was quite surprising...

The Result

The C++ version executed in just over 180 milliseconds. The C# version hung and didn't finish.

Not believing the result, I tried a quick test: perhaps it was the text box that was slowing us down. See, at each loop the program takes the NameEntry and appends it to an ever-growing text box. The Qt version handled this fine; perhaps the C# version, not so much.

And sure enough that's exactly what it was.

Turns out appending to a text box in .NET is incredibly slow, as the strings themselves are immutable, so every append builds an entirely new string. By contrast, the StringBuilder.Append method works on a mutable buffer, which means we avoid the cost of constructing a new string at each loop of our test.

Changing our string code to use a mutable buffer made all the difference in the world, and the program now ran using this code:

// Build the output in a mutable buffer (System.Text.StringBuilder)
StringBuilder sb = new StringBuilder();

// Display Name Entries
foreach (NameEntry ne in NameEntries)
{
	sb.Append(ne.name + "\r\n");
}

// Append to Display Box.
textBox1.Text += sb.ToString();

With this change the two execution times were far closer:

C++: 181 milliseconds.
C#: 4100 milliseconds.

Please note the C++ version is still over 20 times faster, but at least now we're able to run the C# version without the hang.

The takeaway

One big lesson from all this is how much leniency C++ provides a developer for poorly optimized code. For example, take a look at the execution time of these two blocks for our C++ code:


// 578 Milliseconds
while(ne.hasNext()){
	NameEntry n = ne.next();
    ui->plainTextEdit->appendPlainText(n.name);
}

// 181 Milliseconds
while(ne.hasNext()){
	NameEntry n = ne.next();
	_text += n.name; _text += "\r\n";
}

ui->plainTextEdit->appendPlainText(_text);

 

The first block represents the same approach that tanked C#, and yet we still finish in just over half a second, a full 7 times faster than the optimized C# version.

After seeing these results play out before me it's no longer a surprise why things have the potential to get so laggy.

Even in the best of times native code still runs this test 20 times faster; expand that out to a huge code file, plus parsing, semantics, and syntax evaluation, and it's wait, wait, wait.

 

Executing Dynamic Code On Windows

» Languages & Technologies Covered:  C, C++, Windows

Introduction

One of the core tenets of modern computing is the concept of the JIT, the Just-In-Time compiler. JITs are quite common, and in fact you're probably using one right now: most modern web browsers use a JIT to translate JavaScript code into native assembly. JITs are also used extensively on the server side: the raw PHP behind the page you're reading right now is compiled on the fly, and a JIT powers all of Microsoft's ASP.NET platform, as well as Java.

JITs are everywhere, and so as a programmer I began to read up on how they work. One area that kept piquing my interest was a simple question: how does a JIT run the code it creates? The answer is that it runs dynamically created code straight from memory. It's the "run from memory" part we'll look at today.

Code Execution

In traditional code execution, a user launches an executable by double-clicking an icon, clicking a launcher shortcut, and so on.

On Windows machines, executable files use a file format called Portable Executable, or PE.

It's a complex process, but in general, when the operating system loads the PE file your compiled machine code is mapped into memory with EXECUTE permissions, while the data sections are mapped without them. No other part of the memory the operating system assigns to your application can be executed, and for good reason: memory with execute permissions can be a huge security risk.

We can summarize this state of affairs by saying the compilation process takes care of code permissions for us, and while convenient, this also means the permissions for the contents of an executable are set in stone.

Clearly dynamic code can be executed, however, as JITs would not exist if it couldn't be. It turns out the secret to running dynamic code is quite simple, but it means adding this general logic to your application's source code:

  1. Create a function pointer.
  2. Create a heap.
  3. Place the code to execute into a char[].
  4. Alloc the heap.
  5. Copy the char[] into the allocated memory.
  6. Cast the allocated heap's void pointer to our function pointer.
  7. Execute the dynamic code by calling the casted function pointer.

In working code this translates to: (please note this code was only tested on 64-bit Windows in VS 2013)

 
#include "stdafx.h"
#include <iostream>
#include <Windows.h>

using namespace std;
   
void JIT()
{
	// create function pointer
	typedef long(*JittedFunc)(long);

	// create heap, 1 page max
	HANDLE h = HeapCreate(HEAP_CREATE_ENABLE_EXECUTE, 0, 4000);
	
	// create code

#if _WIN64
	unsigned char code[] = {
		0x89, 0x4C, 0x24, 0x08,			// mov dword ptr [rsp+8], ecx -- spill the argument to its home slot
		0x57,					// push rdi -- save a callee-saved register
		0x8B, 0x44, 0x24, 0x10,			// mov eax, dword ptr [rsp+10h] -- reload the argument (rsp moved by the push)
		0x83, 0xC0, 0x04,			// add eax, 4
		0x5F, 0xC3				// pop rdi / ret -- restore rdi and return the result in eax
	};
#else	
	unsigned char code[] = {
		0x55, 0x8b, 0xec, 0x81, 0xec, 0xc0, 0x00,			// set up
		0x00, 0x00, 0x53, 0x56, 0x57, 0x8d, 0xbd,
		0x40, 0xff, 0xff, 0xff, 0xb9, 0x30, 0x00,
		0x00, 0x00, 0xb8, 0xcc, 0xcc, 0xcc, 0xcc, 0xf3, 0xab,
		0x8b, 0x45, 0x08,						// mov eax,dword ptr [num]
		0x83, 0xc0, 0x04,						// add eax,4 
		0x5f, 0x5e, 0x5b, 0x8b, 0xe5, 0x5d, 0xc3			// tear down, ret
	};
#endif

	// alloc heap to size of jit code
	LPVOID a = HeapAlloc(h, HEAP_ZERO_MEMORY, sizeof(code));

	// copy code to allocated memory location
	memcpy(a, code, sizeof(code));

	// point function pointer to memory location with dynamic code
	JittedFunc j = (JittedFunc)a;

	// execute dynamic code
	long t = j(8);

	// display result of code call in console
	cout << t;

	// free and destroy memory
	if(HeapFree(h, 0, a) == 0)
		cout << "Heap Free Error";

	if(HeapDestroy(h) == 0)
		cout << "Heap Destroy Error";

}

int main()
{
	JIT();
	int g;
	cin >> g;
	return 0;
}

    
    

Of course it goes without saying that this is a very simple example; the code we inject and execute simply adds 4 to the value passed in to the function pointer. But it is dynamic, which is kinda neat!

The takeaway

One way to think about what we're doing above is to realize the CPU is always looking for something to do -- so long as the memory locations we're feeding it are valid and carry the proper permissions for the opcodes being given, everything hums along as normal.
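
As an aside on that permissions point, the executable heap above isn't the only route. As far as I can tell, most real JITs instead ask the OS for plain writable pages with VirtualAlloc, copy the code in, and only then flip those pages to executable with VirtualProtect, so the memory is never writable and executable at the same time. Here's a rough sketch of that approach reusing the JittedFunc typedef from above; the function name RunFromVirtualAlloc is just my own label:

#include <Windows.h>
#include <string.h>

typedef long(*JittedFunc)(long);

long RunFromVirtualAlloc(const unsigned char* code, size_t size, long arg)
{
	// 1. Reserve and commit a writable (but not yet executable) region.
	LPVOID mem = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
	if (mem == NULL)
		return -1;

	// 2. Copy the dynamic code in while the pages are still plain data.
	memcpy(mem, code, size);

	// 3. Flip the pages to executable (and no longer writable).
	DWORD oldProtect;
	if (VirtualProtect(mem, size, PAGE_EXECUTE_READ, &oldProtect) == 0)
	{
		VirtualFree(mem, 0, MEM_RELEASE);
		return -1;
	}

	// 4. Make sure the CPU isn't holding stale bytes in its instruction cache.
	FlushInstructionCache(GetCurrentProcess(), mem, size);

	// 5. Call the dynamic code, then release the region.
	long result = ((JittedFunc)mem)(arg);
	VirtualFree(mem, 0, MEM_RELEASE);
	return result;
}

Called as RunFromVirtualAlloc(code, sizeof(code), 8) with the 64-bit code[] array from the listing above, this should hand back 12 just like the HeapCreate version; the difference is simply that the write and execute permissions never overlap.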

It should also be said that my instruction stream was quite literally pulled from a debug session in Visual Studio, which means the function/stack setup and tear-down code is hard-wired to use one argument of type long. This is not very portable, and I'm personally wondering if there's an easier way to create the setup and prologue ABI code. It would be nice, for example, to just use the function pointer's signature to handle this, and then only worry about the function's internal logic.

Conclusion

Executing dynamic code is not something you'll need very often. In fact, unless we're in a specialty sector like media encoding or performance computing, the chances you'll need to do so are very slim.

However, it does force us to think about code security, which may lead us to writing safer code down the road.

 
