Monday, August 11, 2008

Is a Picture Worth a Thousand Words?

Or rather, "Is a Picture Equal to a Thousand Words"? It's hard to say. One could compare the average amount of disk space that a small picture file occupies with the average amount of disk space a thousand words would occupy as a basic text file. However, directly comparing file sizes is not that straightforward...

File format can significantly affect file size. The exact same picture with the exact same dimensions (length x width) will produce different file sizes depending on its format (e.g., JPEG, BMP, TIFF, PNG). A quick Google search turned up over 200 image file formats, with many more under development. The same holds for text: saving a blank document in three popular programs produces three different file sizes, which gives a baseline reading for each. The blank Notepad file is 0 kilobytes, WordPerfect X3 is 1.64 kilobytes, and Microsoft Office 2003 comes in at a whopping 23.5 kilobytes. Using Notepad offers two advantages. First, it eliminates the confounding variable of the format's own overhead, because a blank Notepad file is 0 kilobytes. With the other two programs, the disk space taken up by the thousand words themselves would have to be expressed as "x - 1.64 kilobytes" and "x - 23.5 kilobytes" for WordPerfect X3 and Microsoft Office 2003, respectively, where x is the total file size. Second, other variables such as font type, size, and formatting, which could erroneously inflate file size, are largely eliminated, since they are present in Notepad only to a minimal degree.


Another important consideration is word length. File size will depend on the average length of the words. Longer words (e.g., disestablishmentarianism) take up more space than shorter words (e.g., play). Using data from the UDHR in Unicode database, English has an average word length of 5.10 characters. Dangme, a language used in Ghana and Togo, has an average word length of 2.76 characters, whereas Amharic, used in North Central Ethiopia, has an average word length of 49.06 characters. Thus, language will also affect file size.
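To see how much average word length alone moves the number, here is a rough Python sketch. It is only an estimate under stated assumptions: the function name and its parameters are made up for illustration, every character is assumed to take one byte unless told otherwise, and words are assumed to be separated by a single space.

```python
def estimated_text_bytes(avg_word_len, n_words=1000, bytes_per_char=1):
    """Rough size of a plain text file holding n_words words.

    Assumptions (not from the original post): words are separated by one
    single-byte space, and every character takes bytes_per_char bytes.
    """
    letters = round(avg_word_len * n_words * bytes_per_char)
    separators = n_words - 1  # one space between each pair of words
    return letters + separators


# Average word lengths quoted above from the UDHR in Unicode data.
for language, avg_len in [("English", 5.10), ("Dangme", 2.76), ("Amharic", 49.06)]:
    print(language, estimated_text_bytes(avg_len), "bytes")
```

With these assumptions, a thousand English words come to 6099 bytes and a thousand Dangme words to 3759 bytes. For scripts that need more than one byte per character in the chosen encoding, raise bytes_per_char accordingly.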

For pictures, the situation becomes less clear. What counts as an average picture size? The size of pictures taken for personal use on a digital camera depends on the camera settings, but they will usually be larger on average. Pictures on websites are usually compressed to save space. A Google search has revealed only one thing: picture size varies greatly. The most common picture format is JPEG, with an average picture size of 1 megabyte. TIFF files are extremely large compared to JPEG, with high-quality pictures averaging 9.9 megabytes. PNG clocks in between JPEG and TIFF with an average file size of 6.5 megabytes. In any case, it is evident that average size depends on a complex interaction between format, purpose, and the equipment used for taking and storing the picture.

Conclusion: Using random strings of letters that are 5 characters in length, a Notepad file containing 1000 words is 5.95 kilobytes. The average file in the most common picture format, JPEG, is 1 megabyte (taken here as 1000 kilobytes), which makes the picture file 168.07 times larger. So is a picture really worth (or equal to) a thousand words? I argue that it depends on many variables, but most likely no.
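For anyone who wants to repeat the experiment, here is a minimal Python sketch of it. It is not the exact procedure used for the numbers above: the helper name, the fixed seed, and the single-space separator are assumptions made for illustration, and a megabyte is taken as 1,000,000 bytes.

```python
import random
import string


def random_words_text(n_words=1000, word_len=5, seed=0):
    """Build n_words random lowercase 'words' separated by single spaces."""
    rng = random.Random(seed)  # fixed seed so the text is repeatable
    words = [
        "".join(rng.choices(string.ascii_lowercase, k=word_len))
        for _ in range(n_words)
    ]
    return " ".join(words)


text = random_words_text()
size_bytes = len(text.encode("ascii"))  # 5000 letters + 999 spaces

JPEG_BYTES = 1_000_000  # the 1 megabyte average quoted above (assumed decimal)
ratio = JPEG_BYTES / size_bytes

print(f"text file: {size_bytes} bytes")
print(f"picture is about {ratio:.1f} times larger")

# The ratio in the conclusion, from the figures as stated in the post.
post_ratio = 1000 / 5.95
print(f"ratio from the post's figures: {post_ratio:.2f}")
```

The sketch gives 5999 bytes for the text, slightly different from the 5.95 kilobytes measured in Notepad. The exact figure depends on how the words are separated and on whether a kilobyte is counted as 1000 or 1024 bytes, but the order of magnitude of the ratio stays the same.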