Tech Feature: SSAO and Temporal Blur
Screen space ambient occlusion (SSAO) is the standard solution for approximating ambient occlusion in video games. Ambient occlusion is used to represent how exposed each point is to the indirect lighting from the scene. Direct lighting is light emitted from a light source, such as a lamp or a fire. The direct light then illuminates objects in the scene. These illuminated objects make up the indirect lighting. Making each object in the scene cast indirect lighting is very expensive. Ambient occlusion is a way to approximate this by using a light source with constant color and information from nearby geometry to determine how dark a part of an object should be. The idea behind SSAO is to get geometry information from the depth buffer.
There are many published algorithms for high-quality SSAO. This tech feature will instead focus on improvements that can be made after the SSAO has been generated.
SSAO Algorithm
SOMA uses a fast and straightforward algorithm for generating medium frequency AO. The algorithm runs at half resolution, which greatly improves performance. Running at half resolution doesn’t reduce the quality by much, since the final result is blurred.
For each pixel on the screen, the shader calculates the position of the pixel in view space and then compares that position with the view space position of nearby pixels. How occluded the pixel gets is based on how close the points are to each other and whether the nearby point lies in front of the surface, on the side the normal points toward. The occlusion for each nearby pixel is then added together for the final result.
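The contribution of a single nearby point can be sketched on the CPU as follows. This is a minimal Python illustration of the idea, not the shader SOMA uses: the function name, the linear distance falloff and the default search radius are assumptions chosen to mirror the description, and positions are view-space triples in meters.

```python
import math

def sample_occlusion(pos, normal, sample_pos, radius=1.5):
    """Occlusion contributed by one nearby view-space point (0 = none, 1 = full)."""
    delta = [s - p for s, p in zip(sample_pos, pos)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist == 0.0:
        return 0.0  # the sample is the pixel itself
    # Cosine of the angle between the surface normal and the direction to the sample
    cos_angle = sum(n * d for n, d in zip(normal, delta)) / dist
    # Assumed linear falloff: 1 at the pixel, 0 at the search radius
    falloff = 1.0 - dist / radius
    # Points behind the surface or outside the radius contribute nothing
    return max(0.0, cos_angle) * max(0.0, falloff)

# A point 0.75 m straight above an upward-facing surface, halfway to the radius
occlusion = sample_occlusion((0.0, 0.0, 5.0), (0.0, 1.0, 0.0), (0.0, 0.75, 5.0))
```

Summing this value over all sampled neighbors and dividing by the sample count gives the occlusion for the pixel.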
SOMA uses a radius of 1.5m to look for nearby points that might occlude. Sampling points that are outside of the 1.5m range is a waste of resources, since they will not contribute to the AO. Our algorithm samples 16 points in a growing circle around the main pixel. The size of the circle is determined by how close the main pixel is to the camera and how large the search radius is. For pixels that are far away from the camera, a radius of just a few pixels can be used. The closer the point gets to the camera the more the circle grows - it can grow up to half a screen. Using only 16 samples to select from half a screen of pixels results in a grainy result that flickers when the camera is moving.
Grainy result from the SSAO algorithm
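To make the relation between camera distance and circle size concrete, here is a rough Python sketch. It assumes a simple pinhole projection; the field of view, the 540 pixel half-resolution height and the reading of "half a screen" as half the screen height are illustrative assumptions, not values taken from SOMA.

```python
import math

def search_radius_pixels(view_depth, world_radius=1.5, fov_y_deg=60.0,
                         screen_height=540, max_fraction=0.5):
    """Approximate on-screen size (in pixels) of the world-space search radius."""
    # Pixels covered by one meter at a distance of one meter (assumed pinhole camera)
    pixels_per_unit = screen_height / (2.0 * math.tan(math.radians(fov_y_deg) / 2.0))
    # Perspective: the projected radius shrinks linearly with distance
    radius_px = world_radius * pixels_per_unit / view_depth
    # Clamp so the circle never grows beyond the assumed maximum
    return min(radius_px, max_fraction * screen_height)

far_radius = search_radius_pixels(100.0)   # a few pixels
near_radius = search_radius_pixels(1.0)    # clamped to the maximum
```

With only 16 samples spread over a circle this large, neighboring pixels end up sampling quite different geometry, which is where the grain comes from.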
Bilateral Blur
Blurring can be used to remove the grainy look of the SSAO. Blur combines the values of a large number of neighboring pixels. The further away a neighboring pixel is, the less impact it has on the final result. Blur is run in two passes, first in the horizontal direction and then in the vertical direction.
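The two-pass structure can be sketched in Python as below. This is an illustration only: the image is a plain list of rows, the edge handling clamps to the border, and the 5-tap binomial weights are an assumption rather than the kernel SOMA uses.

```python
def blur_1d(row, weights):
    """Weighted average over a window centred on each element, clamping at the edges."""
    half = len(weights) // 2
    out = []
    for i in range(len(row)):
        total = 0.0
        for k, w in enumerate(weights):
            j = min(max(i + k - half, 0), len(row) - 1)
            total += w * row[j]
        out.append(total)
    return out

def blur_separable(image, weights):
    """Blur an image (list of rows) with a horizontal pass followed by a vertical pass."""
    # Horizontal pass: blur every row
    rows = [blur_1d(row, weights) for row in image]
    # Vertical pass: blur every column of the horizontally blurred image
    cols = [blur_1d(list(col), weights) for col in zip(*rows)]
    # Transpose back to a list of rows
    return [list(row) for row in zip(*cols)]

# Assumed weights: neighbors further from the centre count for less
WEIGHTS = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
```

Splitting the blur into two one-dimensional passes means an N-wide kernel costs 2N samples per pixel instead of N squared.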
The issue with blurring SSAO this way quickly becomes apparent. AO from different geometry leaks across boundaries, causing a bright halo around objects. Bilateral weighting can be used to fix the leaks between objects. It works by comparing the depth of the main pixel to the depth of the neighboring pixel. If the difference in depth between the main pixel and the neighbor exceeds a limit, the neighbor is skipped. In SOMA this limit is set to 2cm.
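A one-dimensional version of the depth test might look like the following Python sketch. The function name and the renormalisation by the sum of accepted weights are assumptions for illustration; depth is in meters, so the 2 cm limit becomes 0.02.

```python
def bilateral_blur_1d(ao, depth, weights, depth_limit=0.02):
    """Blur AO along one axis, skipping neighbors whose depth differs too much."""
    half = len(weights) // 2
    out = []
    for i in range(len(ao)):
        total = 0.0
        total_weight = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if j < 0 or j >= len(ao):
                continue  # outside the image
            if abs(depth[j] - depth[i]) > depth_limit:
                continue  # belongs to a different surface, so skip it
            total += w * ao[j]
            total_weight += w
        # The centre sample always passes the test, so total_weight is never zero
        out.append(total / total_weight)
    return out

# Dark AO on a near surface next to unoccluded AO on a far surface
ao = [0.2, 0.2, 0.2, 1.0, 1.0, 1.0]
depth = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
blurred = bilateral_blur_1d(ao, depth, [1.0, 4.0, 6.0, 4.0, 1.0])
```

In this example the two surfaces keep their own AO values, whereas an ordinary blur would mix them at the boundary.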
To get good-looking blur the number of neighboring pixels to sample needs to be large. Getting rid of the grainy artifacts requires over 17x17 pixels to be sampled at full resolution.
Temporal Filtering
Temporal Filtering is a method for reducing the flickering caused by the low number of samples. The result from the previous frame is blended with the current frame to create smooth transitions. Blending the images directly would lead to a motion-blur-like effect. Temporal Filtering removes the motion blur effect by reverse reprojecting the view space position of a pixel to the view space position it had the previous frame and then using that to sample the result. The SSAO algorithm runs on screen space data but AO is applied on world geometry. An object that is visible in one frame may not be seen in the next frame, either because it has moved or because the view has been blocked by another object. When this happens the result from the previous frame has to be discarded. The distance between the points in world space determines how much of the result from the previous frame should be used.
Explanation of Reverse Reprojection used in Frostbite 2 [2]
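The reprojection step itself is two matrix multiplications: from the current view space to world space, then into the previous frame's view space. The Python sketch below illustrates this with hypothetical helper names and a translation-only camera; it mirrors the matrix order used in the Temporal Blur shader listing at the end of this article but is not taken from the engine.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(x, y, z):
    """4x4 translation matrix, used here to build simple example view matrices."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def reproject_to_prev_view(view_pos, view_inv, prev_view):
    """Current view space -> world space -> previous frame's view space."""
    world = mat_vec(view_inv, [view_pos[0], view_pos[1], view_pos[2], 1.0])
    return mat_vec(prev_view, world)[:3]

# Hypothetical example: a camera without rotation moved from x = 0 to x = 1.
# Its current inverse view matrix translates by +1; the previous view matrix is identity.
view_inv = translation(1.0, 0.0, 0.0)
prev_view = translation(0.0, 0.0, 0.0)
prev_pos = reproject_to_prev_view((0.0, 0.0, 5.0), view_inv, prev_view)
```

A static point straight ahead of the camera now was one unit to the side in the previous frame, so the previous AO has to be fetched from that shifted position rather than from the same screen coordinate.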
Temporal Filtering introduces a new artifact. When dynamic objects move close to static objects they leave a trail of AO behind. Frostbite 2’s implementation of Temporal Filtering solves this by disabling the Temporal Filter for stable surfaces that don’t get flickering artifacts. I found another way to remove the trailing while keeping Temporal Filter for all pixels.
Shows the trailing effect that happens when a dynamic object is moved. The Temporal Blur algorithm is then applied and most of the trailing is removed.
Temporal Blur
(A) Implementation of Temporal Filtered SSAO (B) Temporal Blur implementation
I came up with a new way to use Temporal Filtering when trying to remove the trailing artifacts. By combining two passes of cheap blur with Temporal Filtering all flickering and grainy artifacts can be removed without leaving any trailing.
When the SSAO has been rendered, a cheap 5x5 bilateral blur pass is run on the result. Then the blurred result from the previous frame is applied using Temporal Filtering. A second 5x5 bilateral blur is then applied to the image. In addition to using geometry data to calculate the blending amount for the Temporal Filtering, the difference in SSAO between the frames is used, which removes all trailing artifacts.
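The blending amount can be sketched in Python as below. The scale factors 9.0 and 5.0 and the clamp against the frame time come from the shader listing at the end of this article; the function and parameter names are made up for the illustration, and `reprojection_error` stands for the distance in meters between the reprojected position and the position stored in the previous frame.

```python
def temporal_blend(ao, prev_ao, reprojection_error, frame_time,
                   distance_scale=9.0, ao_scale=5.0):
    """Blend the current AO with the reprojected AO from the previous frame."""
    # Geometry term: a large reprojection error means the history belongs to another surface
    weight = reprojection_error * distance_scale
    # AO term: a large change in (blurred) AO means an occluder has moved
    weight += abs(prev_ao - ao) * ao_scale
    # Always blend in at least a frame-time-sized share of the current frame
    weight = min(max(weight, frame_time), 1.0)
    # weight = 1 uses only the current frame, weight = 0 only the history
    return prev_ao + (ao - prev_ao) * weight

# Stable pixel: the history dominates and the small difference is smoothed out
stable = temporal_blend(ao=0.5, prev_ao=0.52, reprojection_error=0.0, frame_time=1.0 / 60.0)
# An occluder moved away: the history is discarded immediately
moved = temporal_blend(ao=1.0, prev_ao=0.2, reprojection_error=0.0, frame_time=1.0 / 60.0)
```

The AO term is what removes the trail: where a dynamic object has just left, the old and new AO differ strongly, so the weight saturates and the stale value is dropped.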
Applying a blur before and after the Temporal Filtering and using the blurred image from the previous frame results in a very smooth image that becomes more blurred with each frame. It also removes any flickering. Even a 5x5 blur will cause the resulting image to look as smooth as a 64x64 blur after a few frames.
Because the image gets so smooth the upsampling can be moved to after the blur. This leads to Temporal Blur being faster, since running four 5x5 blur passes in half resolution is faster than running two 17x17 passes in full resolution.
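A back-of-the-envelope count of texture samples shows where the saving comes from. The Python sketch below assumes each "pass" is a one-dimensional pass that reads 5 or 17 texels per pixel, and it ignores everything else that affects real timings (bandwidth, caching, the temporal filter and upsample passes), so treat the ratio as indicative only.

```python
def blur_taps(width, height, passes, taps_per_pass):
    """Total texture samples for a blur: every pass reads taps_per_pass texels per pixel."""
    return width * height * passes * taps_per_pass

# Two 17-wide passes at full resolution
full_res = blur_taps(1920, 1080, passes=2, taps_per_pass=17)
# Four 5-wide passes at half resolution
half_res = blur_taps(960, 540, passes=4, taps_per_pass=5)

ratio = full_res / half_res
```

Under these assumptions the half resolution variant reads roughly a seventh of the samples.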
Upsampling
All of the previous steps are performed in half resolution. To get the final result it has to be scaled up to full resolution. Simply stretching the half resolution image to twice its size will not look good. Near the edges of geometry there will be visible bleeding; non-occluded objects will have a bright pixel halo around them. This can be solved using the same idea as the bilateral blurring. Normal linear filtering is combined with a weight calculated by comparing the depth of the main pixel with the depth values of the four closest half resolution pixels.
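As a small worked example of the depth-aware weighting, the Python sketch below upsamples a single full resolution pixel from its four nearest half resolution samples. The constants (0.1, 30.0 and the 0.00001 floor) follow the upsample shader listing at the end of this article; the function name and the tuple layout are assumptions for illustration.

```python
def upsample_pixel(samples, main_depth):
    """samples: (ao, depth, bilinear_weight) for the four nearest half-res texels."""
    total_ao = 0.0
    total_weight = 0.0
    for ao, depth, bilinear_weight in samples:
        # Samples whose depth is close to the full-res pixel get a large extra weight
        depth_weight = max(0.00001, 0.1 - abs(depth - main_depth)) * 30.0
        weight = bilinear_weight + depth_weight
        total_ao += weight * ao
        total_weight += weight
    return total_ao / total_weight

# Full-res pixel on a near, unoccluded surface (depth 1 m); two of its four
# half-res neighbors lie on a darker surface far behind it (depth 5 m).
samples = [(1.0, 1.0, 0.25), (1.0, 1.0, 0.25), (0.2, 5.0, 0.25), (0.2, 5.0, 0.25)]
depth_aware = upsample_pixel(samples, main_depth=1.0)
plain_bilinear = sum(ao * w for ao, _, w in samples)
```

Here plain bilinear filtering mixes in the far surface's AO, while the depth-aware version stays close to the value of the surface the pixel actually belongs to.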
Summary
Combining SSAO with the Temporal Blur algorithm produces high quality results for a large search radius at a low cost. The total cost of the algorithm is 1.1ms (1920x1080 AMD 5870). This is more than twice as fast as a normal SSAO implementation.
SOMA uses high frequency AO baked into the diffuse texture in addition to the medium frequency AO generated by the SSAO.
Temporal Blur could be used to improve many other post effects that need to produce smooth-looking results.
Ambient Occlusion is only one part of the rendering pipeline, and it should be combined with other lighting techniques to give the final look.
References
[1] http://gfx.cs.princeton.edu/pubs/Nehab_2007_ARS/NehEtAl07.pdf
[2] http://dice.se/wp-content/uploads/GDC12_Stable_SSAO_In_BF3_With_STF.pdf
// SSAO Main loop
// Scale the radius based on how close to the camera it is
float fStepSize = afStepSizeMax * afRadius / vPos.z;
float fStepSizePart = 0.5 * fStepSize / (2.0 + 16.0);
for(float d = 0.0; d < 16.0; d += 4.0)
{
    //////////////
    // Sample four points at the same time
    vec4 vOffset = (d + vec4(2.0, 3.0, 4.0, 5.0)) * fStepSizePart;
    //////////////////////
    // Rotate the samples
    vec2 vUV1 = mtxRot * vUV0;
    vUV0 = mtxRot * vUV1;
    vec3 vDelta0 = GetViewPosition(gl_FragCoord.xy + vUV1 * vOffset.x) - vPos;
    vec3 vDelta1 = GetViewPosition(gl_FragCoord.xy - vUV1 * vOffset.y) - vPos;
    vec3 vDelta2 = GetViewPosition(gl_FragCoord.xy + vUV0 * vOffset.z) - vPos;
    vec3 vDelta3 = GetViewPosition(gl_FragCoord.xy - vUV0 * vOffset.w) - vPos;
    vec4 vDistanceSqr = vec4(dot(vDelta0, vDelta0),
                             dot(vDelta1, vDelta1),
                             dot(vDelta2, vDelta2),
                             dot(vDelta3, vDelta3));
    vec4 vInvertedLength = inversesqrt(vDistanceSqr);
    vec4 vFalloff = vec4(1.0) + vDistanceSqr * vInvertedLength * fNegInvRadius;
    vec4 vAngle = vec4(dot(vNormal, vDelta0),
                       dot(vNormal, vDelta1),
                       dot(vNormal, vDelta2),
                       dot(vNormal, vDelta3)) * vInvertedLength;
    ////////////////////
    // Calculates the sum based on the angle to the normal and distance from point
    fAO += dot(max(vec4(0.0), vAngle), max(vec4(0.0), vFalloff));
}
//////////////////////////////////
// Get the final AO by dividing by the number of samples
fAO = max(0.0, 1.0 - fAO / 16.0);
------------------------------------------------------------------------------
// Upsample Code
vec2 vClosest = floor(gl_FragCoord.xy / 2.0);
vec2 vBilinearWeight = vec2(1.0) - fract(gl_FragCoord.xy / 2.0);
float fTotalAO = 0.0;
float fTotalWeight = 0.0;
for(float x = 0.0; x < 2.0; ++x)
for(float y = 0.0; y < 2.0; ++y)
{
    // Sample depth (stored in meters) and AO for the half resolution
    float fSampleDepth = textureRect(aHalfResDepth, vClosest + vec2(x,y));
    float fSampleAO = textureRect(aHalfResAO, vClosest + vec2(x,y));
    // Calculate bilinear weight (abs keeps the weight positive for all four samples)
    float fBilinearWeight = abs(x - vBilinearWeight.x) * abs(y - vBilinearWeight.y);
    // Calculate upsample weight based on how close the depth is to the main depth
    float fUpsampleWeight = max(0.00001, 0.1 - abs(fSampleDepth - fMainDepth)) * 30.0;
    // Apply weight and add to total sum
    fTotalAO += (fBilinearWeight + fUpsampleWeight) * fSampleAO;
    fTotalWeight += (fBilinearWeight + fUpsampleWeight);
}
// Divide by total sum to get final AO
float fAO = fTotalAO / fTotalWeight;
-------------------------------------------------------------------------------------
// Temporal Blur Code
//////////////////
// Get current frame depth and AO
vec2 vScreenPos = floor(gl_FragCoord.xy) + vec2(0.5);
float fAO = textureRect(aHalfResAO, vScreenPos.xy);
float fMainDepth = textureRect(aHalfResDepth, vScreenPos.xy);
//////////////////
// Convert to view space position
vec3 vPos = ScreenCoordToViewPos(vScreenPos, fMainDepth);
/////////////////////////
// Convert the current view position to the view position it
// would represent the last frame and get the screen coords
vPos = (a_mtxPrevFrameView * (a_mtxViewInv * vec4(vPos, 1.0))).xyz;
vec2 vTemporalCoords = ViewPosToScreenCoord(vPos);
//////////////
// Get the AO from the last frame
float fPrevFrameAO = textureRect(aPrevFrameAO, vTemporalCoords.xy);
float fPrevFrameDepth = textureRect(aPrevFrameDepth, vTemporalCoords.xy);
/////////////////
// Get the view space position of the temporal coords
vec3 vTemporalPos = ScreenCoordToViewPos(vTemporalCoords.xy, fPrevFrameDepth);
///////
// Get weight based on distance to last frame position (removes ghosting artifact)
float fWeight = distance(vTemporalPos, vPos) * 9.0;
////////////////////////////////
// And weight based on how different the amount of AO is (removes trailing artifact)
// Only works if both fAO and fPrevFrameAO are blurred
fWeight += abs(fPrevFrameAO - fAO) * 5.0;
////////////////
// Clamp to make sure at least 1.0 / FPS of a frame is blended
fWeight = clamp(fWeight, afFrameTime, 1.0);
fAO = mix(fPrevFrameAO, fAO, fWeight);
------------------------------------------------------------------------------