Synthetic HDR Proof of Concept
In recent times, it has been very difficult to miss the craze for HDR (High Dynamic Range) photography. Put simply, the technique involves creating a series of photographs of the same scene, with varying exposure in each case. The images are then recombined into a single high dynamic range image, often at 32-bit depth, that retains highlight and shadow detail that no individual image from the series can contain. These HDR images are then tone mapped back to an 8- or 16-bit image that is within the capabilities of displays and printers (and, for that matter, the human eye).
7 images taken 1 f stop apart
For the above example, I shot 7 separate images, all at f/16, with shutter speeds one stop apart across the whole range, using a 100mm macro lens and bellows on a Bronica ETRS fitted with a Megavision E4 Monochrome digital back (which has a 12-bit, 16-megapixel, 4096x4096 CCD image sensor). No single image manages to capture both the texture of the black cloth (a black towel) and the detail within the highlights on the brass weights. When I attempted to merge them into a single HDR image with Photoshop, my machine (a dual-core Athlon with 4GB of RAM) ran out of memory at full resolution, so for the purposes of this article I made 800x800 versions of all of them instead. The image above is the exact output of Photoshop's HDR merge operation -- it's clear that it has done a good job of retaining detail across the range of exposures.
32-bit HDR image (converted to 8-bit without contrast alteration)
But let's imagine that the standard approach to HDR isn't possible -- maybe there was movement in the scene, or it was only decided after the shoot that HDR was going to be necessary in order to pull out enough shadow detail from an existing image.
Synthetic HDR image
In this article, I describe a technique that I'm going to call Synthetic HDR -- that is, a means of extracting an HDR image from a single original shot. What I'm about to describe isn't snake oil -- it's not just the same as creating several versions of the same image (e.g. in Adobe Camera Raw) then HDR merging them -- this technique really can extract up to an extra 8 bits or so of dynamic range from a single image. As you can see from the above examples, the results aren't identical, but I think it's clear that they are sufficiently close that it does demonstrate that the technique works. There will be some science, and some maths, but there will also be a step-by-step 'how-to' for reproducing these results armed only with Photoshop CS2. At some point I'll probably write a Photoshop plugin that will streamline the process, but for now, if you follow the steps exactly, you should get equivalent results.
Let us assume that we are starting (as in the above example) with a single source image that has a 12-bit dynamic range:
I chose one of the darker originals because it demonstrates just how much shadow detail can be recovered with this approach -- initially, it doesn't appear that there is even any information in the shadows at all.
A technique for increasing the sensitivity of image sensors, at the expense of resolution, has been known for a long time: binning. It typically involves subdividing an image sensor into (usually square) blocks of pixels, then summing the intensity across each block. In some cases this is carried out within the image sensor chip itself, with analogue electronics; alternatively, binning can be applied digitally later.
Typically, with 2x2 binning, as shown in the above diagram, you get a reduction of image resolution by a factor of 2 along each axis, or 4 times overall. Not all of that information is lost, however. If, for example, the original pixels were represented as 8 bits (i.e. with 256 possible values), each binned pixel can take any of 1021 values (from 0 up to 4 x 255) -- close to 10 bits of dynamic range. Doing, say, 5x5 binning on the same image data would multiply the number of possible pixel values by 25, giving almost 5 extra bits of dynamic range. At the expense of resolution, then, it is clear that it is possible to increase the dynamic range of an image more or less arbitrarily.
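To make the arithmetic concrete, here's a minimal numpy sketch of 2x2 digital binning. The 4x4 input values are made up purely for illustration:

```python
import numpy as np

# Hypothetical 8-bit image data, 4x4 pixels, values 0-255.
img = np.array([[10, 200, 30, 255],
                [40, 250, 60, 255],
                [ 5,   7,  9,  11],
                [ 2,   4,  6,   8]], dtype=np.uint16)

def bin2x2(a):
    """Sum each non-overlapping 2x2 block: halves the resolution
    in each axis, but block sums can reach 4 * 255 = 1020, so the
    result needs roughly 10 bits rather than 8."""
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))

binned = bin2x2(img)
print(binned)        # block sums at half resolution
print(binned.max())  # can exceed the original 8-bit ceiling of 255
```

Note the `uint16` dtype: the sums would silently wrap around in an 8-bit array, which is exactly the extra headroom binning exploits.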
Of course, this isn't usually how binning is used in practice. Generally it is used to increase the light sensitivity of the sensor, so the same bit depth as the original image is retained, with any value greater than the limit being hard-clipped to the top of the range. Used in this way, 2x2 binning gives a 4 times increase in light sensitivity, equivalent to 2 f-stops.
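The sensitivity-style variant can be sketched the same way. The function and sample values here are my own illustration, not anything sensor- or Photoshop-specific:

```python
import numpy as np

def bin2x2_clipped(a, limit=255):
    """Sum 2x2 blocks, then hard-clip back to the original bit
    depth -- sensitivity-style binning, which boosts shadows but
    throws highlight detail away."""
    h, w = a.shape
    s = a.astype(np.uint16).reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
    return np.minimum(s, limit).astype(np.uint8)

dark = np.full((2, 2), 40, dtype=np.uint8)    # deep shadow block
bright = np.full((2, 2), 200, dtype=np.uint8)  # bright block, sum = 800
print(bin2x2_clipped(dark).item())    # 160: a clean 4x (2-stop) boost
print(bin2x2_clipped(bright).item())  # 255: clipped, highlight detail lost
```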
OK, we now know that we can have all the dynamic range we could reasonably want, if we are prepared to live with greatly reduced image resolution. But can we do better than this?
Recall the 'traditional' HDR technique: make a series of photographs at different exposures, then recombine them so that the 'good' (mid-range) tones from each component image are used preferentially to assemble the final image. Here comes the big idea: perhaps we could simulate different exposures with binning, then use established HDR techniques to recombine them into a final image, so that the inevitable loss of resolution is confined to the shadow areas?
Amazingly enough, it seems to work. Here's how.
In my experiment, I created three extra images from the original base image. Rather than binning as such, I used a related technique, convolution, to sum groups of pixels much as binning does, except that the n by n matrix is moved along one pixel at a time, rather than n pixels at a time as it is with binning. This means that the resulting image has the same resolution as the source image, though it is blurred slightly (this is very close to what a blur filter does, but there are slight differences).
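As a sketch of the difference, the following pure-numpy function does the sliding n x n summation at stride 1. Zero-padding at the edges is my simplifying assumption here; Photoshop's edge handling may well differ:

```python
import numpy as np

def conv_sum(img, n):
    """Sum an n x n neighbourhood around every pixel (stride 1),
    i.e. convolution with an all-ones kernel. The output has the
    same resolution as the input, slightly blurred, with an n*n
    intensity multiplier. Edges are zero-padded for simplicity."""
    pad = n // 2
    p = np.pad(img.astype(np.uint32), pad)
    out = np.zeros(img.shape, dtype=np.uint32)
    for dy in range(n):
        for dx in range(n):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

img = np.full((5, 5), 10, dtype=np.uint8)
print(conv_sum(img, 3)[2, 2])  # central pixel: 9 neighbours * 10 = 90
```

Unlike true binning (which would stride by n and shrink the image), every input pixel gets its own summed output pixel.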
Taking the source image, I then filtered it with a Photoshop 'Custom' filter (you can find it in the menus as 'Filters | Other | Custom...') set up as follows:
Note (important!) that Scale is set to 1. Normally this is used to rescale any increase or reduction in image brightness, but here we specifically don't want any scaling. I've set up a 3x3 grid of 1 values -- this has the effect of doing 3x3 binning, with a resulting 9 times increase in dynamic range (a bit over 3 bits). The resulting image is as follows:
Image after a 3x3 convolution ('binning') transform
It's interesting to compare this with the original -- the roughly 3 stops of extra effective exposure has made the cloth visible and blown out the highlights.
Next, I did a 5x5 binning convolution based on the original image, again with a Photoshop Custom filter:
which gave results that look like this:
Image after a 5x5 convolution ('binning') transform
This version of the image now very clearly shows detail in the cloth, representing a 25 times increase in exposure, a bit more than 4 and a half stops.
At this point, I decided to take this to something of an extreme by now applying a further 3x3 convolution:
Image after a 5x5 and a 3x3 convolution ('binning') transform
This is maybe going a bit far, but it does serve to show just how much can be dragged out of the fine detail. The noise looks quite pleasingly like film grain, though (obviously) this image is so blown out that it's useless for anything other than HDR.
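Incidentally, applying a 5x5 and then a 3x3 all-ones convolution is equivalent to a single convolution with a 7x7 kernel whose weights sum to 25 x 9 = 225, which is where the overall multiplier for this image comes from. A quick numpy check of that equivalence:

```python
import numpy as np

# Convolving the two kernels together gives the single equivalent
# kernel: a 7x7 'pyramid' of weights summing to 225.
k5 = np.ones((5, 5))
k3 = np.ones((3, 3))

combined = np.zeros((7, 7))
for dy in range(3):
    for dx in range(3):
        # place a copy of k5 at each offset covered by k3
        combined[dy:dy + 5, dx:dx + 5] += k5 * k3[dy, dx]

print(combined.sum())   # 225.0 -> the x225 intensity multiplier
print(combined[3, 3])   # centre weight: 9 overlapping placements
```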
The next step was to do an HDR merge on the images, which resulted in (after mapping back to 8 bits) the following image:
Synthetic HDR image
The detail on the metal looks a bit flat and lifeless in comparison with the 'real' HDR image, but this is mostly just a difference in brightness and contrast, and nothing that couldn't be fixed with some dodging and burning. What is very clear, though, is that where the original had almost no visible shadow detail, the synthetic HDR image is almost identical to the 'real' HDR image.
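For readers without Photoshop, the merge step can be sketched in a heavily simplified form. To be clear, this is not Photoshop's actual Merge to HDR algorithm; it just illustrates the general idea: divide each version by its intensity multiplier to estimate scene radiance, then average with weights that favour mid-tones and downweight clipped or noisy pixels. The function name and 'hat' weighting are my own illustrative choices:

```python
import numpy as np

def merge_hdr(images, multipliers, white=255.0):
    """Toy radiance-style merge: each exposure, divided by its
    multiplier, is an estimate of the same scene radiance; combine
    the estimates with a hat weight peaking at mid-grey."""
    num = np.zeros(images[0].shape, dtype=np.float64)
    den = np.zeros_like(num)
    for img, m in zip(images, multipliers):
        x = img.astype(np.float64)
        w = 1.0 - np.abs(2.0 * x / white - 1.0)  # 1 at mid-grey, 0 at extremes
        num += w * (x / m)
        den += w
    return num / np.maximum(den, 1e-9)

# e.g. with the original (x1), 3x3-binned (x9) and 5x5-binned (x25) images:
# hdr = merge_hdr([orig, b3, b5], [1, 9, 25])
```

Because every version is divided back to the same radiance scale, the versions agree wherever neither is clipped, and the weighting simply picks whichever copies of each pixel are best exposed.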
Comparison with Levels
The obvious question that people will inevitably ask is, how does synthetic HDR compare with simply creating a series of Levels layers and then compositing them? By way of an example, here's what a 100% crop of the bottom right hand corner of the original image looks like, brightened to fill the full 0-255 grey scale range:
100% crop, bottom right corner of original image, auto-Levels
Not pretty -- really evil noise and gross posterisation. A 100% crop of the same area of the synthetic HDR image is as follows:
100% crop, bottom right corner of synthetic HDR image, auto-Levels
Definitely much nicer. No visible posterisation, and what noise there is looks like film grain. Perhaps surprisingly, there is little to choose between the images in terms of perceived sharpness -- I suspect that this is partly because the human visual system probably does something not unlike synthetic HDR in making sense of the very noisy, pixellated original image. There are some slight square-edged artifacts, most likely due to the square edges of the convolution kernels that I used. These could be avoided by using circular kernels instead of rectangular ones, but those would be a little more difficult to explain, so they will probably have to wait for the plugin when I write it.
EV settings for Photoshop's HDR merge with binned images
When you use the Photoshop merge function, you'll need to set relative EV (exposure value) settings for each version of the image. Calculating this is easy enough, if you remember that EV is based on a powers-of-two series. An EV of 0 means no change, 1 means twice the amount of light, 2 means 4 times, 3 means 8 times, and so on. If you have a specific multiple in mind that's a power of two, it's clearly trivial to convert back to EV. However, if you have a multiple that isn't so convenient (e.g. our 3x3 and 5x5 convolution kernels have x9 and x25 multipliers respectively), you need to calculate the base-2 logarithm of the multiplier. Equivalently, you can use any kind of log function and then divide the result by log 2. For convenience, the multipliers I used above were as follows:
- Original image: 0 EV
- 3x3 binned image: 9 times = 3.169 EV
- 5x5 binned image: 25 times = 4.643 EV
- 5x5 then 3x3 binned image: 9 x 25 times = 225 times = 3.169 + 4.643 = 7.812 EV.
If you want to use my convolution kernels, the above EV values should work fine. Don't be tempted to just guess or approximate them, because it will damage the image.
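The log calculation takes one line in any language with a log function; for example, in Python:

```python
import math

# EV offset for an intensity multiplier m is log2(m),
# or equivalently log(m) / log(2) with any logarithm base.
for m in (9, 25, 225):
    print(m, f"{math.log2(m):.3f}")
```

(Rounded rather than truncated, log2(9) is 3.170, log2(25) is 4.644 and log2(225) is 7.814, so either form of the figures above will do.)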