CS492 Project 2: Encoding Data in Images Or Something Else That Isn't Encoding Data In Images

Due Wednesday, December 12

Alternative Projects

If you have a different plan for project 2 and we have talked about it, feel free to ignore this project description.

Introduction

Images are highly popular on the Internet, often at high resolutions and in full RGB color. As an example, a 1280 by 720 image has 921,600 pixels. If it is a color image using RGB, it is represented by 3 times this many numbers, or 2,764,800. As such, there is the potential to embed data inside the image itself, in a way that could be difficult to discover. If one bit is stored in each channel of each pixel, the image can hold 2,764,800 bits, or 345,600 bytes. This is 337.5 KiB. Since possession of an image is not normally considered suspicious, embedding a message inside one should not raise any eyebrows.
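The capacity arithmetic above can be checked with a few lines of C. This is only a sketch: the function name capacity_bytes is made up for illustration, and it assumes exactly one hidden bit per channel value.

```c
#include <stddef.h>

/* Bytes that fit when one bit is hidden in every channel value.
   Integer division drops leftover bits that do not fill a whole byte. */
size_t capacity_bytes(size_t width, size_t height, size_t channels)
{
    size_t bits = width * height * channels; /* one bit per channel value */
    return bits / 8;
}
```

Calling capacity_bytes(1280, 720, 3) gives the 345,600 bytes quoted above.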

Decoding

  1. Load the image into the GPU.
  2. Either replace the original image, or create a new buffer containing the value of each bit. You can use either one invocation per pixel or one per byte, but if you choose one per pixel, you will have to use a separate kernel for step 4. Each bit is the value of red, green, or blue for a single pixel, modulo 2.
  3. Each byte is encoded over multiple pixels. Taking the channel order of 8 pixels as rgbrgbrgbrgbrgbrgbrgbrgb, those 24 channel values contain 3 bytes, one bit per channel. If you treat the image as type unsigned char* rather than as a collection of pixels, then every 8 consecutive unsigned char values encode a single byte.
  4. With one invocation per byte (in the output data), produce a buffer containing the original data. Note that this means invocations will not align with pixels, and that you will be reducing the data you have been working with to one eighth of its original size.
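As a CPU reference for checking a GPU kernel, the steps above might look like the sketch below. The function names are hypothetical, and the bit order is an assumption: the handout does not say whether the first channel value holds the most or least significant bit, so this sketch picks most significant first. The body of decode_byte is what a single per-byte invocation would do.

```c
#include <stddef.h>

/* Work of one "invocation per output byte": read 8 consecutive channel
   values and pack their parities into one byte.
   ASSUMPTION: the first channel value carries the most significant bit. */
unsigned char decode_byte(const unsigned char *image, size_t byte_index)
{
    const unsigned char *src = image + byte_index * 8;
    unsigned char value = 0;
    for (int bit = 0; bit < 8; ++bit) {
        /* channel value modulo 2 is the hidden bit */
        value = (unsigned char)((value << 1) | (src[bit] % 2));
    }
    return value;
}

/* Sequential stand-in for launching one invocation per output byte.
   The image must hold at least 8 * n_bytes channel values. */
void decode_all(const unsigned char *image, size_t n_bytes, unsigned char *out)
{
    for (size_t i = 0; i < n_bytes; ++i) {
        out[i] = decode_byte(image, i);
    }
}
```

Because each output byte reads its own 8 inputs and writes its own output, the loop in decode_all has no dependencies between iterations, which is what makes the per-byte mapping suitable for a kernel.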

Encoding

This is the reverse of decoding. Essentially, load the image using SOIL_load_image or something else, copy it onto the GPU, copy the data to be embedded onto the GPU, then adjust each channel of each pixel to be even or odd depending on the corresponding bit. You can operate either per byte or per pixel. Then save the modified image, making sure to use the RGB color space.
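A CPU sketch of the channel adjustment is shown here for reference. The name encode_all is hypothetical, and the most-significant-bit-first order is an assumption that must simply match whatever the decoder uses. The image is treated as a flat unsigned char buffer, such as the one an image loader returns.

```c
#include <stddef.h>

/* Force the parity of each channel value to match one bit of the data.
   ASSUMPTION: bits are written most significant first.
   The image must hold at least 8 * n_bytes channel values. */
void encode_all(unsigned char *image, const unsigned char *data, size_t n_bytes)
{
    for (size_t i = 0; i < n_bytes; ++i) {
        for (int bit = 0; bit < 8; ++bit) {
            unsigned char wanted = (unsigned char)((data[i] >> (7 - bit)) & 1);
            unsigned char *channel = &image[i * 8 + bit];
            /* clear the lowest bit, then set it to the wanted value */
            *channel = (unsigned char)((*channel & 0xFE) | wanted);
        }
    }
}
```

Clearing and resetting the lowest bit changes each channel value by at most 1, so the value never overflows or underflows the 0 to 255 range.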

Length of Encoded Data

This scheme provides no natural way to indicate where the encoded data ends. This is similar to data transmission on a computer network. The two main approaches are:
  1. Encode the size of the data at or near the beginning. A lot of network protocols do this.
  2. Use a terminator of some sort, such as the null byte at the end of a null-terminated string. This is fine for encodings such as ASCII text, where the null byte serves only as a terminator, but can be an issue when encoding more complicated structures whose data may itself contain that byte.
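The first approach could be sketched as follows. The 4-byte big-endian header, the HEADER_BYTES constant, and the function names are all assumptions made for illustration; any fixed-width size field that the encoder and decoder agree on would do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_BYTES 4 /* hypothetical header size chosen for this sketch */

/* Write a 4-byte big-endian length followed by the payload into out.
   Returns the total bytes written; out must hold HEADER_BYTES + len bytes. */
size_t frame_payload(const unsigned char *payload, uint32_t len, unsigned char *out)
{
    out[0] = (unsigned char)(len >> 24);
    out[1] = (unsigned char)(len >> 16);
    out[2] = (unsigned char)(len >> 8);
    out[3] = (unsigned char)len;
    memcpy(out + HEADER_BYTES, payload, len);
    return (size_t)HEADER_BYTES + len;
}

/* Recover the payload length from the first four decoded bytes. */
uint32_t read_length(const unsigned char *framed)
{
    return ((uint32_t)framed[0] << 24) | ((uint32_t)framed[1] << 16) |
           ((uint32_t)framed[2] << 8)  |  (uint32_t)framed[3];
}
```

The framed buffer is what gets embedded. On decoding, the first 32 channel values yield the header, and read_length then says how many more bytes to extract, even if the payload contains null bytes.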

Compression

Lossy image compression may distort the embedded data by unifying slightly-varying colors of adjacent pixels. As such, I'd recommend only using lossless formats with this program. This could cause trouble if, for example, you wanted to post an image with embedded data on a Facebook profile, since Facebook may distribute a more highly compressed version of the original image.

Secrecy

This is somewhat more secret than the common trick of appending the message to the end of an image file. Data appended to an image can be detected automatically, since the file will contain more data than the expected number of pixels requires. With this method, the data is hidden in a pattern in the image itself, so a detector of that kind would not work. However, embedding slightly reduces the detail in the colors of the image, which may be apparent to a sophisticated viewer comparing it with the original. Encrypting the message prior to embedding is unlikely to change this characteristic, but would provide an additional barrier to message discovery, and would randomize the pattern of even vs. odd color values in the image.

Loading Data to Embed

Do this however you like. Loading from a file would be reasonable, but beware of binary files, since the length will have to be encoded somehow. Many binary files are unaffected by nonsense trailing data, since they already include a header describing how long the file is expected to be. In others, such as a program, the code will include a return statement at some point that can serve as a terminator. So you might just get away without considering this at all!
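One way to load a file along with its length is sketched below, using only the C standard library. The name load_payload is hypothetical, and the fseek/ftell approach to finding the size assumes a regular file small enough to fit in memory.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a malloc'd buffer; returns NULL on failure.
   On success *out_len holds the byte count and the caller frees the buffer. */
unsigned char *load_payload(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb"); /* "rb" so binary data is not translated */
    if (!f) return NULL;

    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);

    /* allocate at least one byte so an empty file still yields a buffer */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return NULL; }

    *out_len = got;
    return buf;
}
```

The length returned through out_len is known exactly at load time, so it can be embedded alongside the data rather than relying on the file format to tolerate trailing bytes.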