2012-09-06

Linux - Embedding a file in an executable

Original http://www.linuxjournal.com/content/embedding-file-executable-aka-hello-world-version-5967


Embedding a File in an Executable, aka Hello World, Version 5967


Jun 12, 2008  By Mitch Frazier

I recently had the need to embed a file in an executable. Since I'm working at the command line with gcc et al., and not with a fancy RAD tool that makes it all happen magically, it wasn't immediately obvious to me how to do this. A bit of searching on the net turned up a hack that essentially cats the file onto the end of the executable and then works out where it is, based on a bunch of information I didn't want to know about. It seemed like there ought to be a better way...
And there is: objcopy to the rescue. objcopy converts object files or executables from one format to another. One of the formats it understands is "binary", which is basically any file that's not in one of the other formats it understands. So you've probably already guessed the idea: convert the file we want to embed into an object file, and then it can simply be linked in with the rest of our code.
Let's say we have a file named data.txt that we want to embed in our executable:
  # cat data.txt
  Hello world
To convert this into an object file that we can link with our program, we just use objcopy to produce a ".o" file:
  # objcopy --input binary \
            --output elf32-i386 \
            --binary-architecture i386 data.txt data.o
This tells objcopy that our input file is in the "binary" format and that our output file should be in the "elf32-i386" format (the object-file format for 32-bit x86). The --binary-architecture option tells objcopy that the output file is meant to "run" on an x86. This is needed so that ld will accept the file for linking with other x86 object files. One would think that specifying the output format as "elf32-i386" would imply this, but it does not. On a 64-bit x86 system the corresponding values are elf64-x86-64 and i386:x86-64.
Now that we have an object file we only need to include it when we run the linker:
  # gcc main.c data.o
When we run the result, we get the prayed-for output:
  # ./a.out
  Hello world
Of course, I haven't told the whole story yet, nor shown you main.c. When objcopy does the above conversion, it adds some "linker" symbols to the converted object file:
   _binary_data_txt_start
   _binary_data_txt_end
After linking, these symbols mark the start and end of the embedded file. The symbol names are formed by prepending _binary_ and appending _start or _end to the file name, as given on the objcopy command line. Any characters in the file name that would be invalid in a symbol name are converted to underscores (e.g. data.txt becomes data_txt). If you get unresolved names when linking against these symbols, run nm on the object file (or do a hexdump -C on it and look at the end of the dump) to see the names objcopy chose.
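To make the naming rule concrete, here is a small C sketch that builds the symbol name objcopy would generate for a given file name. The function binary_symbol_name is hypothetical (it is not part of any library) and assumes the rule described above, namely that every character other than a letter or digit becomes an underscore. It is only an illustration; nm on the real object file remains the authoritative check.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the symbol name objcopy is assumed to
 * generate for `filename` when converting from the "binary" format.
 * Every character that is not a letter or digit becomes '_', then
 * "_binary_" is prepended and "_<suffix>" ("start" or "end") appended.
 * Returns 0 on success, -1 if `out` is too small.
 */
int binary_symbol_name(const char *filename, const char *suffix,
                       char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "_binary_%s_%s", filename, suffix);
    if (n < 0 || (size_t)n >= outlen)
        return -1;

    /* Mangle only the file-name part, which follows the 8-char "_binary_". */
    size_t name_len = strlen(filename);
    for (size_t i = 0; i < name_len; i++) {
        unsigned char c = (unsigned char)out[8 + i];
        if (!isalnum(c))
            out[8 + i] = '_';
    }
    return 0;
}
```

For example, binary_symbol_name("data.txt", "start", buf, sizeof buf) fills buf with _binary_data_txt_start, and a path such as dir/my-file.bin would map to _binary_dir_my_file_bin_start.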
The code to actually use the embedded file should now be reasonably obvious:
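#include <stdio.h>

/* Symbols created by objcopy: only their addresses are meaningful. */
extern char _binary_data_txt_start;
extern char _binary_data_txt_end;

int main(void)
{
    char*  p = &_binary_data_txt_start;

    /* Print every embedded byte from the start symbol up to the end symbol. */
    while ( p != &_binary_data_txt_end ) putchar(*p++);

    return 0;
}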
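Because objcopy copies the file's bytes verbatim, the embedded data is not NUL-terminated, which is why main.c walks from the start symbol to the end symbol rather than calling puts. If you do want an ordinary C string, a minimal sketch like the following copies the range into a heap buffer and adds the terminator. The name embedded_to_string is hypothetical, and the sketch assumes the caller passes in the start and end addresses, as main.c's symbols would provide.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy the embedded bytes in [start, end) into a
 * newly allocated, NUL-terminated buffer so ordinary string functions
 * can be used on it. The caller must free() the result.
 * Returns NULL if allocation fails.
 */
char *embedded_to_string(const char *start, const char *end)
{
    size_t len = (size_t)(end - start);   /* size of the embedded file */
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, start, len);
    s[len] = '\0';                        /* objcopy adds no terminator */
    return s;
}
```

In main.c this would be called as embedded_to_string(&_binary_data_txt_start, &_binary_data_txt_end). Strictly speaking, subtracting pointers to two different symbols falls outside ISO C, but it is the usual practice for this technique with gcc.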
One important and subtle thing to note is that the symbols added to the object file aren't "variables". They don't contain any data; rather, their address is their value. I declare them as type char because it's convenient for this example: the embedded data is character data. However, you could declare them as anything: as int if the data is an array of integers, or as struct foo_bar_t if the data were an array of foo bars. If the embedded data is not uniform, then char is probably the most convenient: take its address and cast the pointer to the proper type as you traverse the data.
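As a sketch of the typed case, assume (hypothetically) that the embedded file holds an array of records with the layout of struct foo_bar_t below, written with the same field sizes and byte order as the target machine. The helpers foo_bar_count and foo_bar_get are illustrative names only. Rather than casting the start address to a struct pointer, this version copies each record out with memcpy, a swapped-in technique that avoids relying on the embedded data being suitably aligned for the struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout assumed for the embedded binary file. */
struct foo_bar_t {
    int32_t id;
    int32_t value;
};

/*
 * Number of whole records between the start and end symbols
 * (e.g. &_binary_..._start and &_binary_..._end declared as char).
 */
size_t foo_bar_count(const char *start, const char *end)
{
    return (size_t)(end - start) / sizeof(struct foo_bar_t);
}

/*
 * Copy record i out of the embedded bytes. memcpy avoids assuming that
 * the embedded data is aligned for struct foo_bar_t.
 */
struct foo_bar_t foo_bar_get(const char *start, size_t i)
{
    struct foo_bar_t rec;
    memcpy(&rec, start + i * sizeof rec, sizeof rec);
    return rec;
}
```

With a hypothetical input file records.bin, foo_bar_count(&_binary_records_bin_start, &_binary_records_bin_end) would give the number of records, and foo_bar_get(&_binary_records_bin_start, i) would return the i-th one.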
Mitch Frazier is an Associate Editor for Linux Journal.