Errors With Continuous Output. Also, Terrain.

Last time I said I would be talking about my work on terrain and some errors I ran into over the course of developing my framework for this year.  So, surprisingly, that’s what’s about to happen.

I’ll talk about terrain first because it’s visually more appealing (and I actually remembered to take screenshots of it!) and also likely infinitely less boring to people who aren’t me.  Right now I’m taking a fairly traditional approach: using a heightmap to generate a 2D grid of vertices, decoding each pixel’s color value into a height.  I’m still early into my implementation, so things are very basic but functional.  Here’s a screen capture:

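For the curious, here’s a minimal sketch of the decode step, assuming an 8-bit, single-channel heightmap already loaded into lPixels (row-major, lWidth * lHeight bytes); the names are illustrative, not my actual classes:

for (unsigned lZ = 0; lZ < lHeight; ++lZ)
{
  for (unsigned lX = 0; lX < lWidth; ++lX)
  {
    //Map the 0-255 pixel value into [0, lMaxHeight] and space the grid evenly.
    float lY = (lPixels[lZ * lWidth + lX] / 255.0f) * lMaxHeight;
    lVertices[lZ * lWidth + lX].m_position = XMFLOAT3(lX * lGridSpacing, lY, lZ * lGridSpacing);
  }
}
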
So, again, I’m still very early into my implementation; that screen capture represents approximately three days’ worth of work.  I still have a lot of work to do in spatial partitioning, LOD, and blending different detail maps to get something that looks and performs as well as I need it to.  I also found an Nvidia sample that uses the tessellation hardware to great effect, and I want to try to incorporate it.  However, I’m already able to generate 10 square kilometers’ worth of game world terrain while maintaining over 60fps, so I’m not unhappy with how things are going.  To get a sense of the scale, here are some images:

Now, unfortunately, these screenshots have different terrains in them, and that kind of ruins the effect of what I was trying to illustrate.  But the movement of the terrain is minuscule as the car moves out of view into the far distance.  I would have retaken them to fix this inconsistency, except I’ve made dramatic changes to the lighting model that I want to save for my next post.  Sorry!

And that brings me to the other issue I wanted to talk about: errors in the framework.  Some of them have subtle causes but obvious symptoms.  For example, setting the depth function to D3D11_COMPARISON_EQUAL can be a great way to re-use depth information and eliminate pixel overdraw on subsequent material passes.  However, if one pass multiplies the view-projection matrix against the world matrix on the CPU, and another does this on the GPU, you will not get the result you want because of differing precision.  It seems obvious as I type it, but it took me about 30 minutes to realize what was going on.  At least it was incredibly obvious that something WAS wrong, because the output was broken.
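
Here’s a minimal sketch of the mismatch, using DirectXMath on the CPU side (the variable names are mine):

//Pass A concatenates on the CPU and uploads a single matrix; its vertex
//shader does one mul(position, gWorldViewProj).
XMMATRIX lWorldViewProj = XMMatrixMultiply(lWorld, lViewProj);

//Pass B uploads lWorld and lViewProj separately, and its vertex shader does
//mul(mul(position, gWorld), gViewProj).  The intermediate rounding differs,
//so the two passes produce depths that differ in the low bits, and the
//D3D11_COMPARISON_EQUAL test rejects pixels seemingly at random.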

Things get tougher when your error produces continuous output and you don’t realize anything is wrong, and this can happen a lot in graphics.  A good example is the depth data used in my lighting and postprocessing.  The image in my last post was actually completely incorrect, but I didn’t realize it when it was initially taken.  Tweaking some data led me to realize that moving the camera far enough away caused illumination to zero out, which makes no sense.  But for a large range of distances it seemed correct, so it took me a while to even realize there was a problem.  A good way to simulate this (though it wasn’t my actual problem) is to bind the active depth buffer as the shader resource view my lighting samples, which ensures I get an all-black texture.  That’s obviously wrong, but the output can look right for the right data set.  This problem also manifested itself in my depth of field effect.  When I first implemented it, I happened upon values that worked against my incorrect depth data.  It wasn’t until I started tweaking focal distance and width that I realized something was very wrong.
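
If you want to reproduce that effect on purpose, it’s a two-line experiment.  With the debug layer on, D3D11 warns about the read/write conflict and force-nulls the SRV slot, so the shader silently samples zeros (variable names are illustrative):

lContext->OMSetRenderTargets(1, &lLightAccumRTV, lSceneDepthDSV);
lContext->PSSetShaderResources(0, 1, &lSceneDepthSRV);  //same texture as the bound DSV: reads all black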

Those problems might be specific to me, but I think the real takeaway is that when you implement something new in graphics, you need to test a wide range of data to ensure your effect is actually doing what you want, and not merely doing it by circumstance.  If you don’t, a fundamental error can persist in your code base for a long, long time without anyone realizing, and it becomes much harder to add new features when basic code additions don’t do what you should reasonably expect them to.

So, that’s it for this time.  I wish I’d taken some screenshots of the broken output from the errors I talked about, but I think the information is still pretty worthwhile without pretty pictures.  Next time (hopefully soon), I’ll talk about the start of my move to a physically based rendering model, beginning with replacing the specular reflection model.

Back to Square One?

So, in my last post, I said that my big goals were to thread my renderer and to do a better job of writing it against a public interface so that it could be a stand-alone library.  Good goals, but now that they’re basically finished I have a lot of new goals.  However, since that post was made a week-plus after the work it described, and it’s now been a month since that post, I think my primary goal is to get better at writing these more regularly.  It might mean that the technology changes in each update are less dramatic, but I think it’ll let me spend more time properly illustrating what I’m working on at the time and lead to better posts overall.  We’ll see how that turns out.

So, previously I had reworked my rendering pipeline to be multi-threaded, and managed to draw a triangle.  Which is great and terrible at the same time.  But since the rewrite was still 85% built on my framework from last year, once the core worked in the new thread-safe system it was much less work to re-implement everything else I had last year.  I was quickly able to bring in refactored versions of my previous material and lighting processes and get to a point where I could render a lit scene that was as complex as I cared to hard-code (since the editor wasn’t quite ready to go yet).  As you can tell from the image, I didn’t care to hard-code very much.

So, that’s what you can see; let’s talk about what you can’t.  Last year, I built my render target class to own a back buffer and a depth buffer, and created a system to copy a depth buffer from one render target into another.  I was told it was strange not to share a single depth buffer across my render targets, but there were cases where I wanted a process to start from the main geometry pass’s depth information and then modify it for its own use without changing it for the geometry pass.  There were also cases where I downscaled and upscaled render targets, which necessitated depth buffers of different sizes; it’s how I achieved my geometry-occluded outlines last year.  It worked for me, but it didn’t come without its difficulties.  And as I was re-implementing render targets this year, I realized that I had overlooked a lot of potential performance gains last year.

I was re-using depth information for geometry-occluded outlines, and I was using it to generate lighting information, but I wasn’t actually using it to cut down on overdraw.  And overdraw was a huge problem in Evac (especially the final level, which was huge).  So, I refactored my render target system to instead keep a pool of back and depth buffers; a render target now just holds associations to the buffers in the pool that it wants.  This allows easy sharing of buffers between targets (and without DirectX complaining about binding a buffer for read and write simultaneously!) in a way that let me add a Z pre-pass process, which builds the shared depth buffer but stays lightweight by only ever engaging the vertex shader.  Once that depth buffer is built, all processes that utilize it can set the depth function in their depth-stencil state to D3D11_COMPARISON_EQUAL to ensure that each screen pixel is shaded by a pixel shader only once.  And since the pixel shader is often where you find pipeline bottlenecks, optimizing its usage can be key to keeping a high-performing renderer.  So, ensuring that each process pass engages the pixel shader at most once per pixel is a huge boon, and I was very pleased with the results.
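
For reference, the state setup amounts to something like this (a sketch, not my exact code):

//Z pre-pass: depth writes on, standard LESS test, and no pixel shader bound.
D3D11_DEPTH_STENCIL_DESC lPrePassDesc = {};
lPrePassDesc.DepthEnable = TRUE;
lPrePassDesc.DepthWriteMask = D3D11_DEPTH_WRITE_MASK_ALL;
lPrePassDesc.DepthFunc = D3D11_COMPARISON_LESS;
lDevice->CreateDepthStencilState(&lPrePassDesc, &lPrePassState);
lContext->PSSetShader(nullptr, nullptr, 0);  //vertex shader only

//Material passes: reuse the shared depth buffer read-only with an EQUAL test,
//so each visible pixel runs its pixel shader exactly once.
D3D11_DEPTH_STENCIL_DESC lMaterialDesc = lPrePassDesc;
lMaterialDesc.DepthWriteMask = D3D11_DEPTH_WRITE_MASK_ZERO;
lMaterialDesc.DepthFunc = D3D11_COMPARISON_EQUAL;
lDevice->CreateDepthStencilState(&lMaterialDesc, &lMaterialState);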

The other thing you can tell from the image is that I’ve started work on my post-processing system.  It’s not going to be terribly different from last year’s base system, but I’m looking to implement a new set of effects to give the game a filmic look.  It starts with the depth of field effect you can see here, but there’s a lot more in the works to eventually make the game look good.  I hope.  Also, with the re-organization and re-optimization of the whole system, I’m hoping to have the frame time available to do a LOT more post-processing this year.

So, that’s all for this time.  There were actually a lot of errors going on behind the scenes around this time that looked correct enough that I didn’t notice them right away, but which created huge problems when I tried to tweak data.  I’ll probably talk about them next time, as I think they’re pretty interesting and might be useful to anyone implementing a similar system.  Also, I’ll be talking about my initial work on terrain, so look forward to that.  Hopefully it won’t take a month.

New Year, Old Problems

It’s a new school year, and my senior year, so of course I decided that the most reasonable course of action was to put together a new team to build a new game in a new custom engine.  And I’d heavily rebuild my rendering system, too!  I’m smart like that.  And with new work come new posts, so aren’t you all just so lucky?  Also, this post should have been made last Thursday as I’ve made pretty major strides since the information I’m about to talk about, but… school.

At the end of last year I was fairly happy with what I had accomplished, but there were also a lot of things I wanted to add and/or fix that I just never found the time for.  Story of everyone’s life.  My initial work has targeted two major refactor issues that are unsurprisingly intertwined: making the renderer a standalone library and multithreading.

Two years ago, I saw Reggie Meisler give a talk to the Game Engine Architecture Club about a basic render-thread system in DX9 (sorry, I don’t have a link to the slides offhand), and it got me thinking.  Then last year at GDC, I saw a great talk by Bryan Dudash about utilizing deferred contexts in DX11, and the gears really started turning.  I quickly realized that I could combine the knowledge from those two talks into my existing framework and get a lot of benefit, but I also knew that the messy interface (or lack thereof, really) into my renderer prevented me from realistically ensuring thread-safety.  And that’s why it never happened last year.

The general concept can be illustrated in this amazing diagram I drew in MS Paint.

There are two layers of parallelism in my current system, with the ability to add more later as time and performance dictate.  The first layer puts the actual rendering on a separate thread that constantly loops, consuming the last command list buffer sent to it.  That was the basis of Reggie’s talk and has been a common technique for a while.  The second layer utilizes deferred contexts to build the command lists for each pass process in parallel, and is the basic implementation discussed in Bryan Dudash’s GDC presentation.  Of course, currently I’m only utilizing those two layers to draw a triangle into a render target and then composite that render target into the final presentation buffer, which is super impressive and all, but it provides a successful proof of concept that I can move forward from.
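
In D3D11 terms, the round trip for one pass looks roughly like this (my variable names, error handling omitted):

ID3D11DeviceContext* lDeferredContext = nullptr;
lDevice->CreateDeferredContext(0, &lDeferredContext);

//Layer two: a pass thread records its draw calls on the deferred context.
lDeferredContext->Draw(lVertexCount, 0);
ID3D11CommandList* lCommandList = nullptr;
lDeferredContext->FinishCommandList(FALSE, &lCommandList);

//Layer one: the render thread later replays it on the immediate context.
lImmediateContext->ExecuteCommandList(lCommandList, FALSE);
lCommandList->Release();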

It turns out that DX11 does a pretty great job of keeping itself thread-safe as long as you don’t try to use a single context across two threads, so the major task in getting this working was keeping my own data thread-safe.  This necessitated three things: a simple public interface into the renderer, a transfer system for the command list buffers, and a transfer system for entities and assets.  The public interface was simple once I refactored and reorganized my classes, and it now allows the renderer to run as a standalone library, fulfilling one of my major initial desires for this refactor.

The command list buffer is a read/write/pivot transfer system: the gameloop side copies into the write layer, the renderloop side consumes from the read layer, and each side swaps its layer with the pivot to hand frames across.  This does introduce a lock into the system, but I used a Benaphore to keep it as lightweight as possible. Here’s the code:

/*
Project: Graphics
Purpose: Declaration of the command list buffer container
Coder/s: Matt Sutherlin (matt.sutherlin@digipen.edu)
Copyright: “All content © 2013 DigiPen (USA) Corporation, all rights reserved.”
*/

#pragma once

#include "../Definitions/ProcessEnums.hpp"
#include "../Definitions/DirectXIncludes.hpp"
#include "../Renderer/Benaphore.hpp"

struct CommandListLayer
{
  CommandListLayer() {
    m_bIsDirty = false;

    for (unsigned lI = 0; lI < Passes::NumberOf; ++lI)
    {
      m_commandLists[lI] = nullptr;
    }
  }
  ~CommandListLayer() {
    for (unsigned lI = 0; lI < Passes::NumberOf; ++lI)
    {
      SAFE_RELEASE(m_commandLists[lI]);
    }
  }

  bool                m_bIsDirty;
  ID3D11CommandList*  m_commandLists[Passes::NumberOf];
};

class CommandListBuffer
{
public:
  CommandListBuffer() {
    for (unsigned lI = 0; lI < CommandListLayers::NumberOf; ++lI)
    {
      m_layers[lI] = new CommandListLayer();
    }
  }
  ~CommandListBuffer() {
    for (unsigned lI = 0; lI < CommandListLayers::NumberOf; ++lI)
    {
      delete m_layers[lI];
    }
  }
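  //Consumer side: requesting the Read layer first swaps in the pivot layer
  //whenever the pivot holds a newer set of command lists.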
  ID3D11CommandList** GetLayerCommandLists(CommandListLayers::Layer pLayer) {
    if (pLayer == CommandListLayers::Read)
    {
      m_lock.Lock();

      if (m_layers[CommandListLayers::Pivot]->m_bIsDirty)
      {
        m_layers[CommandListLayers::Read]->m_bIsDirty = false;

        std::swap(m_layers[CommandListLayers::Read], m_layers[CommandListLayers::Pivot]);
      }

      m_lock.Unlock();
    }

    return m_layers[pLayer]->m_commandLists;
  }
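  //Producer side: called after the game loop fills the Write layer; marks it
  //dirty and rotates it into the pivot slot for the render thread to pick up.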
  void SwapWriteLayer() {
    m_lock.Lock();

    m_layers[CommandListLayers::Write]->m_bIsDirty = true;

    std::swap(m_layers[CommandListLayers::Write], m_layers[CommandListLayers::Pivot]);

    m_lock.Unlock();
  }
private:
  CommandListLayer*   m_layers[CommandListLayers::NumberOf];
  Benaphore           m_lock;
};

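To make the flow concrete, here’s roughly how the two sides drive the buffer; gCommandBuffer, Passes::Geometry, and lImmediateContext are illustrative names, not verbatim from the framework:

//Game side, after the pass threads have called FinishCommandList:
ID3D11CommandList** lWriteLists = gCommandBuffer.GetLayerCommandLists(CommandListLayers::Write);
lWriteLists[Passes::Geometry] = lGeometryCommandList;
gCommandBuffer.SwapWriteLayer();

//Render thread, every loop: consume the freshest complete frame.
ID3D11CommandList** lReadLists = gCommandBuffer.GetLayerCommandLists(CommandListLayers::Read);
for (unsigned lI = 0; lI < Passes::NumberOf; ++lI)
{
  if (lReadLists[lI])
  {
    lImmediateContext->ExecuteCommandList(lReadLists[lI], FALSE);
  }
}
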
The entity/asset system was by far the most complicated to implement, but it came down to a single container class that enqueued data changes (add, update, or remove) until a synchronization was called.  Here is the code:

/*
Project: Graphics
Purpose: Declaration of the transfer buffer container
Coder/s: Matt Sutherlin (matt.sutherlin@digipen.edu)
Copyright: “All content © 2013 DigiPen (USA) Corporation, all rights reserved.”
*/

#pragma once

#include <unordered_map>
#include <queue>
#include <type_traits>
#include <cstring>  //memcpy

//Q01:  How does this work?
//A01:  The game engine could potentially make requests for adding, updating, or
//      removing resources at any time.  While the ID3D11Device is free-threaded,
//      we still need to be careful with resource management.
//
//      Adds can create their new resources immediately (and need to do so to return 
//      a resource ID to the caller), but should not be added to traversal lists 
//      while process threads are running.  So we defer that until the next game loop.
//
//      Updates should only occur at the start of a new game loop, so they're deferred 
//      until the game thread calls for a synch.  We need to ensure the data we're 
//      traversing doesn't get changed out from under us, so we synch this before 
//      producer threads start for the frame.
//
//      Deletes have two levels of synchronization to deal with.  Removing the resource 
//      from traversal lists needs to happen at the next game loop and removing it 
//      from memory needs to happen at the next render loop after that.
//
//Q02:  What are the limitations?
//A02:  t_entry objects need to have an UpdateData function, and that function needs to 
//      take a t_data object.
//
//      t_id objects need to be able to be initialized by setting = 0, and need to 
//      properly increment when post-incremented.
//
//      t_entry and t_data objects MUST be pointer types.

template <typename t_entry, typename t_data, typename t_id>
class TransferBuffer
{
  typedef std::pair<t_id, t_entry>                              t_entryPair;
  typedef std::pair<t_id, t_data>                               t_dataPair;
  typedef typename std::unordered_map<t_id, t_entry>::iterator  t_iterator;
private:
  std::unordered_map<t_id, t_entry>   m_entries;
  std::queue<t_entryPair>             m_pendingAdditions;
  std::queue<t_id>                    m_markedDeletions;
  std::queue<t_entry>                 m_pendingDeletions;
  std::queue<t_dataPair>              m_pendingUpdates;
  t_id                                m_nextID;
public:
  TransferBuffer() {
    m_nextID = 0;
  }

  ~TransferBuffer() {

  }

  //This should only ever be called by the game engine!
  t_id AddEntry(t_data pData) {
    t_id lReturnID = m_nextID++;

    t_entry lEntry = new typename std::remove_pointer<t_entry>::type(pData);
    m_pendingAdditions.push(t_entryPair(lReturnID, lEntry));

    return lReturnID;
  }

  //This should only ever be called by the game engine!
  void RemoveEntry(t_id pID) {
    m_markedDeletions.push(pID);
  }

  //This should only ever be called by the game engine!
  void UpdateEntry(t_id pID, t_data pData) {
    t_data lData = new typename std::remove_pointer<t_data>::type();
    memcpy(lData, pData, sizeof(typename std::remove_pointer<t_data>::type));

    m_pendingUpdates.push(t_dataPair(pID, lData));
  }

  //This should only ever be called by parallel producer threads!
  t_iterator GetEntries() {
    return m_entries.begin();
  }

  t_iterator GetEnd() {
    return m_entries.end();
  }

  //This should only ever be called by the synchronous game thread!
  //Should get called once per game loop before threading deferred contexts
  void SynchAdd() {
    unsigned lSizeAdditions = m_pendingAdditions.size();

    for (unsigned lI = 0; lI < lSizeAdditions; ++lI)
    {
      t_entryPair lEntryPair = m_pendingAdditions.front();
      m_pendingAdditions.pop();
      m_entries.emplace(lEntryPair.first, lEntryPair.second);
    }
  }

  //This should only ever be called by the synchronous game thread!
  //Should get called directly after SynchAdd
  void SynchUpdate() {
    unsigned lSizeUpdates = m_pendingUpdates.size();

    for (unsigned lI = 0; lI < lSizeUpdates; ++lI)
    {
      t_dataPair lDataPair = m_pendingUpdates.front();
      m_pendingUpdates.pop();

      auto lIter = m_entries.find(lDataPair.first);

      if (lIter != m_entries.end())
      {
        lIter->second->UpdateData(lDataPair.second);
        delete lDataPair.second;
      }
    }
  }

  //This should only ever be called by synchronous game thread!
  //Should get called directly after SynchUpdate
  void SynchMarkedDelete() {
    unsigned lSizeDeletions = m_markedDeletions.size();

    for (unsigned lI = 0; lI < lSizeDeletions; ++lI)
    {
      t_id lID = m_markedDeletions.front();
      m_markedDeletions.pop();

      auto lIter = m_entries.find(lID);

      if (lIter != m_entries.end())
      {
        t_entry lEntry = lIter->second;
        m_pendingDeletions.push(lEntry);
        m_entries.erase(lIter);
      }
    }
  }

  //This should only ever be called by the parallel consumer thread!
  //Should only get called at the start of a render loop
  void SynchPendingDelete() {
    unsigned lSizeDeletions = m_pendingDeletions.size();

    for (unsigned lI = 0; lI < lSizeDeletions; ++lI)
    {
      t_entry lEntry = m_pendingDeletions.front();
      m_pendingDeletions.pop();
      delete lEntry;
    }
  }
};

And that’s how I solved my big thread-safety issue. The FAQ at the top of the file is worth reading, and while I plan for t_id to eventually be some kind of GUID functor, I’m just using unsigned int for that type in all cases right now. I welcome any questions, comments, or criticisms of my methodology, but try not to be too hard on me, as this is the first time I’ve actually posted code I’ve written.
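
To show what this looks like in use, here’s a hypothetical instantiation; Light and LightData are stand-ins for whatever entry/data pair you’d actually manage (per the FAQ, Light needs a Light(LightData*) constructor and an UpdateData(LightData*) member):

TransferBuffer<Light*, LightData*, unsigned> gLights;

//Game thread, any time:
unsigned lID = gLights.AddEntry(lNewLightData);
gLights.UpdateEntry(lID, lChangedLightData);
gLights.RemoveEntry(lID);

//Game thread, once per loop before the pass threads launch:
gLights.SynchAdd();
gLights.SynchUpdate();
gLights.SynchMarkedDelete();

//Render thread, at the top of its loop:
gLights.SynchPendingDelete();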

But that’s all for now. Like I said, I’ve actually managed to take the system much further in the last week, but I’m just swamped right now.  Hopefully I’ll have time to make the next post before that information is also out of date, but no promises.

GoodGraphics25.png, More Particles, Gloss Maps, And Starting Cleanup

I spent the last week down in San Francisco for GDC, and the amount of knowledge gained from the various talks I attended is staggering.  I can’t wait for the videos and slides to get uploaded to the Vault so I can start in earnest at getting some of those tricks and techniques integrated into my own graphics engine.  I’m thinking that a switch to generous usage of compute shaders and deferred contexts should improve performance quite nicely in my architecture.  Hopefully future posts will be able to show whether that pans out as expected.

However, until that happens, back in the real world I’m finishing up feature implementation and tying up loose ends heading into gold submission for the game.  Since I’ve been back (and, honestly, a little bit while I was in San Francisco), I’ve been working at improving particle systems, working with artists to get whatever else we can into the engine, and handling lingering technical issues.  Let’s start with particles!  First, a video of my more recent efforts.

Work has progressed pretty steadily on particle systems since they first got implemented.  I’ve fixed a pretty bad heap corruption that the memory manager was hiding (decrement index 0 and then calculate distance from camera!), I’ve actually added camera distance calculation to make alpha blending work properly, billboarded the sprites, written some extra particle operators to handle special-case functionality we want, and then spent a huge amount of time just tweaking values to find really good effects.  It’s the kind of thing that, given time, makes me want to write a genetic algorithm to find good effects of a certain type.  You know, rather than me spending hours and hours tinkering until I get a good fire, I can just go to sleep and wake up to some good options.  Dreams, right?  But the fire is really a test case and not something that’s necessarily planned to be in game, so let’s talk about something that is.  Here’s another video!

One way that we’re looking to add player feedback to the game is to use particle systems to “drain” power from power nodes and then to “push” that power into mechanics objects.  So, Hayden and I wrote an operator that takes in a variable control point and has particles update their velocities each frame to steer toward that point (a rough sketch of the idea follows).  The inspiration here is Team Fortress 2’s Medi Gun.  While we still have some work to do to tighten up the player tracking and to get a nice curve on it like TF2, I think it’s already looking pretty decent.
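
The core of the operator is simpler than it sounds.  Here’s that sketch; the Particle layout and names are illustrative, not our actual system (DirectXMath types from <DirectXMath.h>):

void SteerTowardPoint(Particle* pParticles, unsigned pCount,
                      const XMFLOAT3& pTarget, float pAccel, float pDt)
{
  for (unsigned lI = 0; lI < pCount; ++lI)
  {
    XMVECTOR lPos = XMLoadFloat3(&pParticles[lI].m_position);
    XMVECTOR lVel = XMLoadFloat3(&pParticles[lI].m_velocity);
    XMVECTOR lToTarget = XMVector3Normalize(XMVectorSubtract(XMLoadFloat3(&pTarget), lPos));

    //Nudge the velocity toward the control point a little each frame; raising
    //pAccel tightens the tracking, which is where the TF2-style curve tuning lives.
    lVel = XMVectorAdd(lVel, XMVectorScale(lToTarget, pAccel * pDt));
    XMStoreFloat3(&pParticles[lI].m_velocity, lVel);
  }
}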

The last “big” thing I’ve worked on since the last update is gloss maps.  My artists have wanted them for a while, and I finally worked with them to make it happen.  And it was pretty easy, too.  The idea is to have an additional map that contains the specular exponent (the Ns value in the specular equation), so a single material can have variable shininess across its surface, which is pretty cool.  I actually separated the depth texture out from the normals texture in the pre-light accumulation stage a while ago, so that the scene normals could store to an RGBA8_UNORM and the depth could be an R32_FLOAT, which made it easy to integrate the gloss maps.  Since they only store a single value per pixel, I was able to stuff them into the alpha channel of the normal maps, change three lines in my shaders, and everything worked.  Pretty simple!
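
The shader change itself boils down to a per-pixel exponent in the specular term.  As a CPU-side sketch of the math (the alpha-to-exponent scaling here is my assumption, not the actual mapping):

#include <math.h>

//pNDotH is saturate(dot(N, H)); pGlossAlpha is the normal map's alpha in [0, 1].
float SpecularTerm(float pNDotH, float pGlossAlpha)
{
  float lExponent = fmaxf(1.0f, pGlossAlpha * 255.0f);  //decode the per-pixel Ns value
  return powf(pNDotH, lExponent);
}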

Now I just need to implement texture animation (I know I mentioned I’d already have it finished, but expect it tomorrow?) and then I’m on to writing detection routines for things like available resolutions, MSAA levels, etc., to finish outstanding TCRs.  You know, stuff I should have already done a long time ago.  But I did also get 3D positional sound working in engine this week (have I mentioned I also do the audio programming for this project?), so that was also pretty exciting.  And maybe makes up for it?  I’m not sure.
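
For anyone facing similar TCRs, the detection side is mostly DXGI legwork.  A sketch, assuming lDevice and lOutput (an IDXGIOutput*) are already created and <vector> is included:

UINT lNumModes = 0;
lOutput->GetDisplayModeList(DXGI_FORMAT_R8G8B8A8_UNORM, 0, &lNumModes, nullptr);
std::vector<DXGI_MODE_DESC> lModes(lNumModes);
lOutput->GetDisplayModeList(DXGI_FORMAT_R8G8B8A8_UNORM, 0, &lNumModes, &lModes[0]);
//lModes now holds every resolution/refresh combination the output supports.

for (UINT lSamples = 1; lSamples <= D3D11_MAX_MULTISAMPLE_SAMPLE_COUNT; lSamples *= 2)
{
  UINT lQualityLevels = 0;
  lDevice->CheckMultisampleQualityLevels(DXGI_FORMAT_R8G8B8A8_UNORM, lSamples, &lQualityLevels);
  //lQualityLevels > 0 means lSamples-x MSAA is supported for this format.
}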

Anyway, that’s everything that’s been happening since the last update.  Oh, and you’ll notice in the second video and the screen capture that Max has finished his gravity flipping mechanic.  So, there should be some cool levels utilizing it soon.  While I plan to continue posting as I make progress on my graphics engine, I’m concerned that the content from here to the end of the semester won’t be too exciting.  Texture animation and bug fixing?  It needs to be done, but it isn’t very flashy.  Maybe I’ll find time to slide in some extra post processing.  We’ll see!

GoodGraphics24.png, Particles, Particles, PATICLEZ!

It seems like I just made one of these, huh?  And while this is just a first pass at the system, running on programmer art, I’m excited enough to post about it.  After that build-up, hopefully this post isn’t a huge letdown.

So, particle effect systems!  I put them off all year because I just had so much on my plate, and all of it seemed super important and core to just getting the game to display.  And a good portion of it was features that have since been deprecated due to changes in design direction.  Not that I’m mad; it’s the nature of the beast here at DigiPen and I accept that.  But now that we’re in the home stretch, it’s become a real crunch to finally get particles done for the huge polish factor they can add.  And after about a week at it (although I had to spend a fair amount of that week tracking down what turned out to be two major buffer underruns that were causing huge stability issues), I finally have it working and, I think, good enough to show people.  So, here you go!

I also need to give a huge shout out to Hayden Jackson for all the help he gave me while developing this.  The insight, feedback, and source code was absolutely invaluable, and I never would have designed a system that was nearly as elegant in the given time frame.  So, thanks dude!

Next up is finally integrating my spritesheet animation system from last year, and that might unfortunately be the last graphical feature I have time to add before I need to clean up loose ends and fix outstanding technical requirements ahead of submission.  We’ll see, as I’d really like to find the time to add HDR and SSAO, but I’m trying to be realistic here.  Either way, I’m off to GDC next week, and I’m glad I got particles in before that, or else I’d have spent all week being driven crazy by it.  Anyone else who’s going, feel free to hit me up for grabbing some drinks!  Otherwise, expect more posts on getting prepped for final submission after I get back.

Edit:  I realized that a screen capture is a terrible way to show off a particle system, so I took this short video of it in action.  Enjoy!