Compute shaders for IBL

Image Based Lighting

Image based lighting is widely known and used by now; there are plenty of good articles on it, like the one by Chetan Jaggi1 or the implementation used in UE42, so be sure to check those for the mathematics and other details behind IBL.

However, if you need to generate the IBL maps more often (e.g. when it's not possible to cache them), it's better to use compute shaders instead of fragment shaders (combined with render to texture). Using fragment shaders will be much slower, mainly due to the framebuffer switches (one for each face of the cube texture), which are expensive, especially on GPUs with tiled architectures.

Irradiance Map

For the convolution of the diffuse irradiance map there's just one change required for the best performance: avoid using a sampler, i.e. samplerCube (which may also not be possible on certain platforms), and use images instead, as they have better memory coherency for the compute shader execution:

  layout(binding = 0, rgba8) uniform readonly imageCube environmentMap;
  layout(binding = 1, rgba16f) uniform writeonly imageCube irradianceMap;
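
On the OpenGL side these would be bound with glBindImageTexture before dispatching; a minimal sketch, where the texture handles are illustrative:

  // layered = GL_TRUE binds all six faces, matching the imageCube declarations
  glBindImageTexture(0, envMapTexID, 0, GL_TRUE, 0, GL_READ_ONLY, GL_RGBA8);
  glBindImageTexture(1, irradianceMapTexID, 0, GL_TRUE, 0, GL_WRITE_ONLY, GL_RGBA16F);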

In Vulkan (and OpenGL) a cube texture is treated as an array of six 2D textures. Because of this, we have to convert from the compute invocation ID (i.e. gl_GlobalInvocationID) to cube coordinates, which is straightforward (note that we had to flip some of the UV coordinates; for more details see Table 3.11 from link):

  vec3 cubeCoordToWorld(ivec3 cubeCoord, vec2 cubemapSize)
  {
      vec2 texCoord = vec2(cubeCoord.xy) / cubemapSize;
      texCoord = texCoord * 2.0 - 1.0; // -1..1
      switch(cubeCoord.z)
      {
          case 0: return vec3(1.0, -texCoord.yx); // posx
          case 1: return vec3(-1.0, -texCoord.y, texCoord.x); // negx
          case 2: return vec3(texCoord.x, 1.0, texCoord.y); // posy
          case 3: return vec3(texCoord.x, -1.0, -texCoord.y); // negy
          case 4: return vec3(texCoord.x, -texCoord.y, 1.0); // posz
          case 5: return vec3(-texCoord.xy, -1.0); // negz
      }
      return vec3(0.0);
  }

We also need the inverse, converting a 3D direction (from which to sample) into 2D texel coordinates, clamped to the [0, size - 1] range, plus the face index in the array:

  // helper, since GLSL has no built-in max3
  float max3(vec3 v) { return max(max(v.x, v.y), v.z); }

  ivec3 texCoordToCube(vec3 texCoord, vec2 cubemapSize)
  {
      vec3 abst = abs(texCoord);
      texCoord /= max3(abst);

      float cubeFace;
      vec2 uvCoord;
      if (abst.x > abst.y && abst.x > abst.z)
      {
          // x major
          float negx = step(texCoord.x, 0.0);
          uvCoord = mix(-texCoord.zy, vec2(texCoord.z, -texCoord.y), negx);
          cubeFace = negx;
      }
      else if (abst.y > abst.z)
      {
          // y major
          float negy = step(texCoord.y, 0.0);
          uvCoord = mix(texCoord.xz, vec2(texCoord.x, -texCoord.z), negy);
          cubeFace = 2.0 + negy;
      }
      else
      {
          // z major
          float negz = step(texCoord.z, 0.0);
          uvCoord = mix(vec2(texCoord.x, -texCoord.y), -texCoord.xy, negz);
          cubeFace = 4.0 + negz;
      }
      uvCoord = (uvCoord + 1.0) * 0.5; // 0..1
      uvCoord = uvCoord * cubemapSize;
      uvCoord = clamp(uvCoord, vec2(0.0), cubemapSize - vec2(1.0));

      return ivec3(ivec2(uvCoord), int(cubeFace));
  }

Using cubeCoordToWorld we get the world direction for the current texel and build a tangent space basis around it:

  ivec3 cubeCoord = ivec3(gl_GlobalInvocationID);
  vec3 worldPos = cubeCoordToWorld(cubeCoord, cubemapSize);
  // tangent space from origin point
  vec3 normal = normalize(worldPos);
  vec3 up = vec3(0.0, 1.0, 0.0);
  vec3 right = normalize(cross(up, normal));
  up = cross(normal, right);

Then we can fetch samples via imageLoad using the texture array coordinates:

  // spherical to cartesian, in tangent space
  vec3 sphereCoord = vec3(sinTheta * cosPhi, sinTheta * sinPhi, cosTheta);
  // tangent space to world
  vec3 sampleVec = sphereCoord.x * right + sphereCoord.y * up + sphereCoord.z * normal; 
  // world to cube coord
  ivec3 sampleCoord = texCoordToCube(sampleVec, cubemapSize);

  irradiance += imageLoad(environmentMap, sampleCoord).rgb * cosTheta * sinTheta;
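
For context, these snippets sit inside a discrete convolution over the hemisphere around the normal; a minimal sketch of the surrounding loop, where the sample counts are my own illustrative choice:

  // illustrative sample counts; higher values reduce noise at higher cost
  const uint phiSamples = 128u;
  const uint thetaSamples = 32u;
  const float invTotalSamples = 1.0 / float(phiSamples * thetaSamples);

  vec3 irradiance = vec3(0.0);
  for (uint p = 0u; p < phiSamples; ++p)
  {
      float phi = float(p) * (2.0 * PI / float(phiSamples));
      float cosPhi = cos(phi);
      float sinPhi = sin(phi);
      for (uint t = 0u; t < thetaSamples; ++t)
      {
          // theta covers the hemisphere above the surface: [0, PI/2]
          float theta = float(t) * (0.5 * PI / float(thetaSamples));
          float cosTheta = cos(theta);
          float sinTheta = sin(theta);
          // ... accumulate via the sampleVec / imageLoad snippet above ...
      }
  }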

And finally we simply write via imageStore:

  irradiance *= PI * invTotalSamples;
  imageStore(irradianceMap, cubeCoord, vec4(irradiance, 1.0));

Pre-filtered Map

Compared to the irradiance map, the pre-filtered map uses mipmaps based on the number of roughness bins you want, so we have to set the target mipmap for the compute shader. For OpenGL we must use glBindImageTexture before dispatching:

  const int textureSize = 1024;
  const int maxMipLevels = 5;
  for (int mip = 0; mip < maxMipLevels; ++mip)
  {
    const int mipmapSize = textureSize >> mip;
    const float roughness = mip / (float)(maxMipLevels - 1);

    // note: GL_RGBA16F, as GL_RGB16F is not a valid image unit format
    glBindImageTexture(1, prefilteredMapTexID, mip, GL_TRUE, 0, GL_WRITE_ONLY, GL_RGBA16F);
    // then set the uniforms (i.e. roughness and mipmapSize) and dispatch, as sketched below
  }
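
The body of the loop after glBindImageTexture could then look something like this; the uniform names follow the shader snippets (u_roughness, u_mipmapSize), while the program handle and the rest are illustrative:

  // the compute program is assumed already bound via glUseProgram
  glUniform1f(glGetUniformLocation(prefilterProgram, "u_roughness"), roughness);
  glUniform2f(glGetUniformLocation(prefilterProgram, "u_mipmapSize"),
              (float)mipmapSize, (float)mipmapSize);
  const int workGroupSize = 32; // must match local_size in the shader
  glDispatchCompute(mipmapSize / workGroupSize, mipmapSize / workGroupSize, 6);
  // make the image writes visible before they are consumed
  glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);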

For Vulkan you must instead create an image view for the current mip level (the equivalent of glBindImageTexture) and set it in the descriptor set before dispatching:

  VkImageViewCreateInfo viewInfo = {};
  viewInfo.sType                 = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
  // set the other fields (image, viewType = VK_IMAGE_VIEW_TYPE_CUBE, format, ...) as usual
  viewInfo.subresourceRange.baseMipLevel   = baseMipLevel;
  viewInfo.subresourceRange.levelCount     = 1; // expose only the target mip
  viewInfo.subresourceRange.baseArrayLayer = 0;
  viewInfo.subresourceRange.layerCount     = 6;
  vkCreateImageView(m_device, &viewInfo, nullptr, &mipViewHandle);

  // then write it to the descriptor set
  VkDescriptorImageInfo imageInfo = {};
  imageInfo.sampler     = VK_NULL_HANDLE;
  imageInfo.imageView   = mipViewHandle;
  imageInfo.imageLayout = VK_IMAGE_LAYOUT_GENERAL; // required for storage images

  VkWriteDescriptorSet descriptorWrite = {};
  descriptorWrite.sType           = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
  descriptorWrite.descriptorType  = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
  descriptorWrite.descriptorCount = 1;
  descriptorWrite.pImageInfo      = &imageInfo;
  // also set dstSet and dstBinding, then call vkUpdateDescriptorSets

  // finally set the uniforms (i.e. roughness and mipmapSize), bind the descriptor set and dispatch
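
A minimal sketch of that final step, assuming the parameters are passed via push constants (the pipeline layout, descriptor set and struct layout are illustrative):

  // vec2 u_mipmapSize at offset 0, float u_roughness at offset 8 in the shader
  struct PrefilterPushConsts { float mipmapSize[2]; float roughness; };
  PrefilterPushConsts pc = { { (float)mipmapSize, (float)mipmapSize }, roughness };

  vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_COMPUTE,
                          m_pipelineLayout, 0, 1, &m_descriptorSet, 0, nullptr);
  vkCmdPushConstants(cmd, m_pipelineLayout, VK_SHADER_STAGE_COMPUTE_BIT, 0, sizeof(pc), &pc);
  vkCmdDispatch(cmd, mipmapSize / 32, mipmapSize / 32, 6);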

As in the previous case we use cubeCoordToWorld to get the world coordinates:

  ivec3 cubeCoord = ivec3(gl_GlobalInvocationID);
  vec3 worldPos = cubeCoordToWorld(cubeCoord, u_mipmapSize);
  // tangent space from origin point
  vec3 N = normalize(worldPos);
  // assume view direction always equal to outgoing direction
  vec3 R = N;
  vec3 V = N;

However, we have to use a samplerCube and sample via textureLod if we want the PDF-based filtering that gets rid of the aliasing artifacts, as described by Chetan Jaggi1:

  float totalWeight = 0.0;
  vec3 prefilteredColor = vec3(0.0);
  for(uint i = 0u; i < totalSamples; ++i)
  {
      // generate sample vector towards the alignment of the specular lobe
      vec2 Xi = hammersleySequence(i, totalSamples);
      vec3 H = importanceSampleGGX(Xi, N, u_roughness);
      float dotHV = dot(H, V);
      vec3 L = normalize(2.0 * dotHV * H - V);

      float dotNL = max(dot(N, L), 0.0);
      if(dotNL > 0.0)
      {
          float dotNH = max(dot(N, H), 0.0);
          dotHV = max(dotHV, 0.0);
          // sample from the environment's mip level based on roughness/pdf
          float D = d_ggx(dotNH, u_roughness);
          float pdf = D * dotNH / (4.0 * dotHV) + 0.0001;

          // originalSamples: number of texels in one face of the source environment map
          float saTexel  = 4.0 * PI / (6.0 * originalSamples);
          float saSample = 1.0 / (totalSamples * pdf + 0.0001);
          float mipLevel = u_roughness == 0.0 ? 0.0 : 0.5 * log2(saSample / saTexel);

          prefilteredColor += textureLod(environmentMap, L, mipLevel).rgb * dotNL;
          totalWeight += dotNL;
      }
  }
  prefilteredColor = prefilteredColor / totalWeight;

If the quality is acceptable without the PDF-based filtering, or you want a simpler filtering method, you can instead use an imageCube and texCoordToCube for sampling:

  if(dotNL > 0.0)
  {
      // the size passed here must match the dimensions of the bound environmentMap image
      ivec3 sampleCoord = texCoordToCube(L, mipmapSize);
      prefilteredColor += imageLoad(environmentMap, sampleCoord).rgb * dotNL;
      totalWeight += dotNL;
  }
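
In either variant, the accumulated color is normalized by totalWeight and then written out with imageStore, just like the irradiance map (the prefilteredMap binding name here is illustrative):

  imageStore(prefilteredMap, cubeCoord, vec4(prefilteredColor, 1.0));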

BRDF LUT Map

Using a compute shader for the integrated BRDF map is straightforward, especially as it's also computed only once. The only small note, as a general good practice (if performance matters most), is to use a constant for the texture size instead of textureSize (which on most platforms is implemented as a uniform behind the scenes), although the performance impact is negligible.

  const vec2 cubemapSize = vec2(1024.0, 1024.0);
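
For reference, a minimal sketch of the LUT kernel, assuming a standard integrateBRDF(NdotV, roughness) helper (the helper and the output binding are illustrative):

  layout(local_size_x = 32, local_size_y = 32, local_size_z = 1) in;
  layout(binding = 0, rg16f) uniform writeonly image2D brdfLUT;

  void main()
  {
      ivec2 coord = ivec2(gl_GlobalInvocationID.xy);
      // map the texel center to (NdotV, roughness) in 0..1
      vec2 uv = (vec2(coord) + 0.5) / cubemapSize;
      vec2 integratedBRDF = integrateBRDF(uv.x, uv.y);
      imageStore(brdfLUT, coord, vec4(integratedBRDF, 0.0, 0.0));
  }

As the LUT is a single 2D texture, it is dispatched as (size / 32, size / 32, 1) rather than with six layers.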

Shader Dispatch

To dispatch, use the texture size and the number of faces. Note, however, that the local size (which defaults to 1) affects performance and is hardware dependent; in my case, for a 1024 texture size, using 32 gave slightly better performance:

  // must match local_size in GLSL
  int workGroupSize = 32;
  // in Vulkan
  vkCmdDispatch(cmd, cubeMapSize / workGroupSize, cubeMapSize / workGroupSize, 6);
  // in OpenGL
  glDispatchCompute(cubeMapSize / workGroupSize, cubeMapSize / workGroupSize, 6);
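
The matching declaration on the GLSL side for the cube map kernels would be:

  layout(local_size_x = 32, local_size_y = 32, local_size_z = 1) in;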

For a better understanding of why the local size impacts performance I recommend reading DirectCompute Optimizations and Best Practices4. The limitation is that the local size is a compile-time constant, so you will have to precompile the shaders for multiple sizes and then pick the optimal one based on the hardware (in Vulkan, specialization constants can also be used to set the local size at pipeline creation time).
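
For OpenGL, one common way to do this without maintaining separate sources is to inject the size with a preprocessor define when compiling the shader; a minimal sketch, where LOCAL_SIZE is an illustrative name the host prepends (e.g. "#define LOCAL_SIZE 32") right after the #version line:

  layout(local_size_x = LOCAL_SIZE, local_size_y = LOCAL_SIZE, local_size_z = 1) in;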