HLSL Syntax Highlighting for Sublime Text 3

Don’t worry, this’ll be a quick one.

If you write shaders, then you’ve certainly dealt with the question of what to write them in.  Since there’s no true shader IDE (And seriously, why isn’t there?  Shaders are mainstream enough at this point that I can’t imagine it’s still a niche market, which is what I assume is the reason for ignoring it.), you find a text editor that’s lightweight and customizable enough to help you get the job done as well as possible and you just deal with it.  In a lot of cases, someone eventually writes a basic syntax highlighter for your editor of choice, and most of us are so happy just to get anything that we embrace it and carry on.  I experienced it with NShader on Visual Studio, with MJP’s file for NotePad++, and with what I started with for Sublime Text 3 (sorry to whoever wrote it, I honestly don’t remember where it came from at this point).

In the year or so that I’ve been using ST3, I’ve found it easy enough to customize and extend that I’ve been highly motivated to actually do so.  Part of that work has been building my own syntax file for HLSL.  Now, my syntax file isn’t overly complex or even as complete as I’d like it to be, but… regex (Sidenote:  if someone much better at regex than me looks at this and laughs at my pitifulness, I fully welcome help making improvements!).  What I did do, however, was define scopes for function definitions and call sites, and that opens up an interesting possibility in ST3 beyond just more syntax highlighting: with that information, if you put your shader files into a sublime-project file, you get automatically hooked up to ST3’s Goto Definition framework.  Here’s what that looks like:

And, once you have this working, it’s a short jump to getting a library-wide autocomplete functioning.  Unfortunately, I cannot share my autocomplete indexing solution, but I can share my syntax file.  Here it is!  Hopefully you enjoy it, and again I fully welcome suggestions for improvements.  Also, here is my theme file, if you’re interested in a very HLSL-specific color coding, but it’s absolutely not required.

Enum Madness!

Last time I wrote a class for a bitfield and a macro that allowed us to automatically size the bitfield storage based on the size of the flag set and the platform’s storage sizes.  And that’s great and all, but we only went from something like Bitflag<unsigned int> m_flags to something like Bitflag<BITFLAG_SIZE(28)> m_flags.  It’s better from the perspective of what I was writing about in that last post, but you still have a hardcoded 28 sitting there, completely detached from the flag set it’s trying to describe.  Wouldn’t it be great if you could feed your flag set directly into the BITFLAG_SIZE macro, so that changes to the flag set automatically change the macro, which changes the type?  Why, yes, yes it would.

So, can we do it?  Absolutely!  But, before we get to that answer, we’re going to take a detour through enums.

I love enums.  Especially unscoped enums wrapped inside of their own namespaces.  I use them all the time.  Most of that usage is to describe entries in a statically sized array, along the lines of this:

namespace CommandListLayers
{
  enum Layer
  {
    Read,
    Pivot,
    Write,
    NumberOf
  };
}

I have a system that uses an array of 3 command lists to move render data between the game thread and the render thread: the read layer, the write layer, and the pivot layer.  I can initialize the array with CommandListLayers::NumberOf, and any code I write can deal with each of the layers via the associated enumerator.  It’s clean and it’s easy.  I really like the NumberOf trick, but it’s also kind of tedious to always insert it at the end.  Or, being the stupid human that I am, maybe I forget to do it.  Who knows!  So, one day I came up with this dumb macro.

#define ARRAY_ENUM_BUILDER(name1, name2, ...) \
  namespace name1 { \
    enum name2 { \
      __VA_ARGS__, \
      NumberOf \
    }; \
  }

And that allows the enum declaration to look like this.

ARRAY_ENUM_BUILDER(
  CommandListLayers, Layer, 
  Read,
  Pivot,
  Write
)

So, I declare the namespace and enum names along with all of the enumerator entries, and NumberOf gets tacked onto the end with the correct value.  Great!  So, what does this have to do with my Bitflag class and the BITFLAG_SIZE macro?  Well, given the enum stuff I was already doing in my code, the next logical step was to do something like this.

namespace ProcessToggleFlags
{
  enum Flags
  {
    Lighting = (1 << 0),
    DepthOfField = (1 << 1),
    Wireframe = (1 << 2),
    DebugLines = (1 << 3),
    DebugText = (1 << 4),
    NumberOf = 5
  };
}

So, this is a little less great.  I have to manually assign a value to NumberOf since it would otherwise take 17, the next integral value after (1 << 4), rather than the 5 that I actually want NumberOf to be in this case.  So, the macro as it exists for building enums that describe arrays isn’t going to cut it.  That’s where this template code comes in!

namespace enum_helpers
{
  template <unsigned long long T>
  struct enum_size
  {
    static const unsigned long long count = enum_size<(T >> 1)>::count + 1;
  };

  template <>
  struct enum_size<0>
  {
    static const unsigned long long count = 0;
  };
}

Using that template code, I can now create a new macro specifically to build enums for bitfields.  And that macro looks like this.

#define BITFLAG_ENUM_BUILDER(name1, name2, ...) \
  namespace name1 { \
    enum name2 : unsigned long long { \
      __VA_ARGS__, \
      Last, \
      NumberOf = enum_helpers::enum_size<Last - 1>::count \
    }; \
  }

Which allows my enum to become this.

BITFLAG_ENUM_BUILDER(
  ProcessToggleFlags, Flags, 
  Lighting = (1 << 0),
  DepthOfField = (1 << 1),
  Wireframe = (1 << 2),
  DebugLines = (1 << 3),
  DebugText = (1 << 4)
)

It works more or less the same as the ARRAY_ENUM_BUILDER macro, but it tacks on a Last entry whose sole purpose is to be the argument to the enum_size template, giving us the correct value for NumberOf when dealing with bit flags.  So, that’s great, right?  Now we can move from Bitflag<BITFLAG_SIZE(28)> m_flags to Bitflag<BITFLAG_SIZE(ProcessToggleFlags::NumberOf)> m_flags.  And that’s definitely better; more or less where I want to be with this.

However, there are still some things I’d like to improve.  It would be nice if I could get the element count from __VA_ARGS__ or iterate over its contents.  Then I could generate the type of the enum and insert the shift values (and any necessary suffixes on the 1s) from inside the macro.  It isn’t impossible, but it moves into the realm of really gross macro code.  It’s code I will probably end up writing for myself, but it’s not overly scalable, so I didn’t want to include it here.

But, that’s it.  This template and 2 macros make my life a lot easier.  Hopefully it will do the same for someone else.  Here is the download link if you want all of the source: EnumHelpers.hpp.  And if anyone has suggestions for improvements (especially for doing the things I mentioned in the previous paragraph!), I’d love to hear those.

Until next time!

Oh, Hey There: The Return (Also, I Do Things With Templates And Bitfields That May Or May Not Be Dumb)

Going long periods between blog posts has always been fairly par for the course for me, but this last break was pretty excessive even by my standards.  Two years is really just too long!  Of course, shortly after that last post I ended up getting a job at 343 Industries (which is, incidentally, amazing!) and I can’t really blog about what I do at work, which led to said hiatus.  What are you going to do?  Go two years between blog posts, apparently.

Anyway, the last two years have predominantly been spent writing HLSL, and that’s been great and I love my job, but recently I’ve started to worry that my C++ is getting rusty.  Can’t let that happen; how would I feel superior to other programmers if I’m not a pro at C++?  Right?  Right!  So, with Halo 5 shipped, I’ve had a little more free time, and a thought occurred to me that got me jump-started back into writing some “real” code.  It’s a little bit of utility around a bitfield class, and I think that it’s useful, but I also couldn’t find any real reference to anyone else having done it.  That either means that I’m an absolute genius, or there are very legitimate reasons why no one does what I’m about to show and I’m just not seeing them.  Which feels pretty likely, but you tell me!

So, I started with a bitfield class that just used an unsigned int as storage and had pretty basic bit and bulk getters and setters.  Nothing super fancy, but effective for what I was doing with it.  The next logical step was to template the storage type, and that was easy, which gave me the following code.

template <typename t_field>
class Bitflag
{
public:
  //Default Constructor
  Bitflag() : m_flags(0) {}

  //Initial Value Constructor
  Bitflag(t_field pFlags) : m_flags(pFlags) {}

  //Bit Get
  bool Get(t_field pIndex) const {
    return (m_flags & pIndex) != 0;
  }

  //Bulk Get
  t_field Get() const {
    return m_flags;
  }

  //Bit Set
  void Set(t_field pIndex, bool pState) {
    // Optimized based on information found at 
    // https://graphics.stanford.edu/~seander/bithacks.html#ConditionalSetOrClearBitsWithoutBranching
    // Safe to squelch this warning

    #pragma warning(push)
    #pragma warning(disable : 4804)

    m_flags = (m_flags & ~pIndex) | (-pState & pIndex);

    #pragma warning(pop)
  }

  //Bulk Set
  void Set(t_field pFlags) {
    m_flags = pFlags;
  }
private:
  t_field   m_flags;
};

And for the purpose of what I’m actually blogging about, the Bitflag class never gets any fancier or more complicated.  Instead, I looked at what I had and realized that rather than really using the flexibility of the templated type to optimize storage size for the class, I just got lazy and slapped every usage with unsigned int.  Which took me right back to where I was before I even templated the class.  The hell, right?  This led me to the question, “Could I write code that, given the size of the flag set I want to be able to store in a Bitflag, would always set the templated type to the smallest appropriate type?”  And if the answer was yes, then it could serve a few purposes: actually optimizing my storage size, automatically changing as the size of the represented flag set changed, and automatically changing as I compiled on other platforms where storage sizes might be different.  That sounded great, so I dove into it, and it turns out that the answer is indeed yes.

I’ll start with the code that I ended up writing, and then I’ll explain what it’s doing, why I had it do that, and where it might go next.

#define BITFLAG_SIZE(val) BitflagHelpers::bitflag_type_selector<val>::value_type

namespace BitflagHelpers
{
  static const int g_bits_per_byte        = 8;

  static const int g_undefined_ushort     = -1;
  static const int g_undefined_uint       = -2;
  static const int g_undefined_ulong      = -3;
  static const int g_undefined_ulonglong  = -4;

  template <int t>
  struct bitflag_type
  {
    typedef int type;
  };

  // Be careful with this case when it comes to serialization
  template <>
  struct bitflag_type<sizeof(unsigned char) * g_bits_per_byte>
  {
    typedef unsigned char type;
  };

  // We protect from doubled specialization in the case that a type is 
  // the same size as the previous type by setting that instantiation 
  // to a negative global value.  Remove the warning for specing a 
  // signed value into an unsigned type.  Do something better later?
  #pragma warning(push)
  #pragma warning(disable : 4309)

  template <>
  struct bitflag_type<(sizeof(unsigned short) != sizeof(unsigned char)) 
    ? (sizeof(unsigned short) * g_bits_per_byte) : (g_undefined_ushort)>
  {
    typedef unsigned short type;
  };

  template <>
  struct bitflag_type<(sizeof(unsigned int) != sizeof(unsigned short))
    ? (sizeof(unsigned int) * g_bits_per_byte) : (g_undefined_uint)>
  {
    typedef unsigned int type;
  };

  template <>
  struct bitflag_type<(sizeof(unsigned long) != sizeof(unsigned int))
    ? (sizeof(unsigned long) * g_bits_per_byte) : (g_undefined_ulong)>
  {
    typedef unsigned long type;
  };

  template <>
  struct bitflag_type<(sizeof(unsigned long long) != sizeof(unsigned long))
    ? (sizeof(unsigned long long) * g_bits_per_byte) : (g_undefined_ulonglong)>
  {
    typedef unsigned long long type;
  };

  #pragma warning(pop)

  template <int t>
  struct bitflag_type_selector
  {
    typedef 
      typename std::conditional<(t <= sizeof(unsigned char) * g_bits_per_byte), 
        bitflag_type<sizeof(unsigned char) * g_bits_per_byte>::type,
      typename std::conditional<(t <= sizeof(unsigned short) * g_bits_per_byte), 
        bitflag_type<sizeof(unsigned short) * g_bits_per_byte>::type,
      typename std::conditional<(t <= sizeof(unsigned int) * g_bits_per_byte), 
        bitflag_type<sizeof(unsigned int) * g_bits_per_byte>::type,
      typename std::conditional<(t <= sizeof(unsigned long) * g_bits_per_byte), 
        bitflag_type<sizeof(unsigned long) * g_bits_per_byte>::type,
      bitflag_type<sizeof(unsigned long long) * g_bits_per_byte>::type>::type>::type>::type>::type value_type;
  };
}

So, the idea is that when you have a Bitflag variable, rather than specifying a type, you give it the BITFLAG_SIZE macro with the size of the flag set you want to be able to store.  So, instead of something like Bitflag<unsigned int> flags, you’d write something like Bitflag<BITFLAG_SIZE(28)> flags.  Under the hood, the macro uses a set of templates that take advantage of the fact that the C/C++ standards don’t define exact sizes for the unsigned integral types, just minimums and relative ordering: sizeof(unsigned char) <= sizeof(unsigned short) <= sizeof(unsigned int) <= sizeof(unsigned long) <= sizeof(unsigned long long).  Everything else works because of those relationships.

The bitflag_type specializations all check to make sure that any two adjacent types don’t have the same size, and set the specialization to a special value in that case to prevent double specialization, which would cause a compile failure.  In the case of Win64, unsigned int and unsigned long are both 32 bits, so the unsigned long spec ends up being -3 instead of 32.  And then it never gets used as a result, which is perfectly fine.

The final piece was the bitflag_type_selector, which makes use of std::conditional to allow the template specializations to be assigned to ranges.  Without that, it’d be pretty tedious to write all the code that’d allow anything but exact size matches to the types to be paired to the proper specialization of bitflag_type.  So, yay for std::conditional!

One thing to watch out for here is data serialization for a Bitflag that’s using an unsigned char for its storage.  Take the case of a flag mask of 33.  That will get serialized as 33 by any basic serialization scheme for an unsigned short, unsigned int, unsigned long, or unsigned long long, which is great.  But for an unsigned char, the standard stream operators treat the value as a character, so a naive round trip sees ‘3’ and ‘3’ rather than 33, which probably isn’t what you wanted.  It’s solvable for sure, but I feel it’s worth mentioning.  I did look into using uint8_t, but it turns out that its presence is implementation-specific, and a lot of implementations just typedef it to unsigned char anyway.

While not a requirement of this setup by any means, I like to store my flag sets in enumerations.  So, for me, the next step was to be able to feed the enumeration into the BITFLAG_SIZE macro and always get the right size.  I ended up doing that (with more templates), and that will be the subject of the next blog post.  One that hopefully comes sooner than this one did!  I guess we’ll see, I tend to have a problem keeping up with my desired posting schedule.

But that is the end of this post.  Hopefully you found this useful, and hopefully I’m not insane and/or stupid.  I welcome any feedback, and you are certainly welcome to use the code provided in whatever project you want.  If you do, I’d love to know about it!  Here’s the file if you just want to download it instead of copy/pasting the various blocks I posted above: Bitflag.hpp.

I’d also like to thank Brennan Conroy and Robert Francis for dealing with a full day of my inane ramblings and providing useful insight while I worked on this.  I probably wouldn’t be making this post if it wasn’t for their help.

How I Spent My Summer Vacation

Last summer I got the opportunity to intern at Blizzard and while it’s something that I’ve mentioned, I haven’t really delved into the experience or what I worked on at all.  That is, until now!  Because features I worked on are now shipping and I am super stoked to see them and to share them.

Before I get started, I will say that in the interest of not violating anything I signed with Blizzard, I won’t be talking too deeply about technical implementation.  On the other hand, while I’m very proud of what I did during my internship, none of it is anything I would say is cutting edge and therefore it’s probably pretty easy to replicate without me walking you through it.

So, my internship was as an engine programmer on Team 1 (Starcraft 2/Heroes of the Storm), and my work was primarily in graphics.  I did a wide range of things: working on the core renderer, writing some HLSL, working new features into the art pipeline, and doing a lot of bug fixing.  It was a very rewarding experience, and an amazing time in my life.  One of the bigger features I implemented was occlusion-based unit outlining, and it’s something that’s seeing pretty widespread usage in the Heroes technical alpha right now.

The concept here is pretty straightforward.  Designers wanted to be able to tag objects as occluders or occludees (or neither), and then have any portion of an occludee that’s occluded by an occluder be outlined.  And they wanted it done in team color.  Up to this point I hadn’t ever used the stencil buffer in my work, but I was familiar enough with it to see it as a logical choice to handle this.  Occluders and occludees were each masked with different bits of the stencil buffer, the blur would then mask a third stencil bit, and the final copy/resolve would only operate where there was an overlap of the occluder/blur bits and no overlap with the occludee bit.  That made the effect work, but there were a lot of additional considerations regarding performance and how much it actually helped players that I won’t really be talking about.  And those may or may not be that interesting anyway.

Here’s a video where you can see my effect in action!

While I worked on a lot of other things during my internship, that was by far the biggest feature, and the one I can most readily point at and say, “I did that!”.  And it’s shipped.  I cannot stress enough how rad it is to see something I did be so prevalent in an amazing game like Heroes of the Storm.

So, that’s it for Blizzard work.  I have a number of posts to make about work on my own renderer, and those should be coming soon.  I made a lot of fixes, tweaks, and updates to shaders and general math that improved things a lot in the final month or so of the semester and I’m pretty excited to share them as well.  But, I’m also just getting started with summer classes to finish out my degree, so it might be a little while before I get them written up.  We’ll see!

I Have No Clue What I’m Doing! (Part 1 Of… A Lot)

So, this is a series that I imagine will continue for… ever.  But it will definitely be something that pops up a lot in the early post-school years.  And it would have popped up a lot during school if I had thought to approach it this way at the time.  But I didn’t.  Yet another way in which I have no clue what I’m doing.  Haha?  In all seriousness, though, the idea of this post (and the many that will follow) is to illustrate how things that I thought were correct or good ideas earlier in the blog turned out to be very bad ideas after implementation, thorough testing, and iteration.  Who would have guessed that I wasn’t going to get everything right the first time, always?  …Everyone.

This first entry in the series concerns my entity transfer buffer and my general threading model considerations, as described in a blog post from earlier this year.  In it, I devoted a bit of code to solving concurrency concerns as they related to updating renderable data from the core engine into my rendering library, and to maintaining data that might still be getting rendered after the core tried to delete it.  To the second point, more experience with how DirectX 11 command buffers store “in-flight” data quickly made it clear that I didn’t need to handle this at all.  To the first point, the code presented didn’t even fully solve the problem it was trying to address (the way I was buffering add/remove/update commands into the container still had data races), plus it introduced a memory leak in the buffered update, and in general presented a terrible calling convention where I expected the external caller to allocate a data struct that I would internally delete.  Just… truly horrific stuff.  But, on top of all of that, I’ve come to realize that a lot of this code was written to solve a problem I shouldn’t have been trying to deal with in the first place.

Our engine was built as a synchronous core with potentially asynchronous systems.  At least for this project, attempting to write a thread-safe interface into my rendering library was absolute overkill, since there weren’t any threading-related issues with transferring data from the core into my library.  By making my code only as complicated as it needed to be, rather than as complicated as it could be, I made it a lot more stable and functional.  Of course, this opens up another line of questioning: if I wanted to clean up my code and make my library publicly available, wouldn’t it be smart to have a thread-safe interface?  And to that, I’d say… maybe.  It might be nice, but unless you’re making a truly professional-grade, absolutely-everything-must-run-at-maximum-performance, tip-top engine, I’m not sure that making the core architecture asynchronous is a great idea.  For smaller scale engines like the one that we built this year, it makes a lot more sense to keep the core simple and let each system handle its own threading as it sees fit.  You still get a lot of good performance and generally decent scalability this way, without all of the headache and hassle of managing an entire engine’s worth of thread ordering, syncing, etc.

In the end, this is what my TransferBuffer class ended up being:


template <typename t_entry, typename t_data, typename t_id>
class TransferBuffer
{
  typedef typename std::unordered_map<t_id, t_entry>::iterator t_iterator;

private:
  std::unordered_map<t_id, t_entry> m_entries;
  t_id m_nextID;

public:
  TransferBuffer() {
    m_nextID = 0;
  }

  ~TransferBuffer() {
  }

  //This should only ever be called by the game engine!
  t_id AddEntry(t_data pData) {
    t_id lReturnID = m_nextID++;

    t_entry lEntry = pData->CreateEntity();
    m_entries.emplace(lReturnID, lEntry);

    return lReturnID;
  }

  //This should only ever be called by the game engine!
  void RemoveEntry(t_id pID) {
    auto lIter = m_entries.find(pID);

    if (lIter != m_entries.end())
    {
      t_entry lEntry = lIter->second;
      delete lEntry;
      m_entries.erase(lIter);
    }
  }

  //This should only ever be called by the game engine!
  void UpdateEntry(t_id pID, t_data pData) {
    auto lIter = m_entries.find(pID);

    if (lIter != m_entries.end())
    {
      lIter->second->UpdateData(pData);
    }
  }

  //This should only ever be called by parallel producer threads!
  t_iterator GetEntries() {
    return m_entries.begin();
  }

  t_iterator GetEnd() {
    return m_entries.end();
  }

  //This should only be called from the interface to expose mesh data for physics!
  t_entry GetFromID(t_id pID) {
    auto lIter = m_entries.find(pID);
    t_entry lEntry = nullptr;

    if (lIter != m_entries.end())
    {
      lEntry = lIter->second;
    }

    return lEntry;
  }
};

And then, of course, this class didn’t really need to be a template at all.  Since all entity and entity-data objects inherit from the same base IEntity and EntityData types, I was able to make an array of TransferBuffer<IEntity*, EntityData*, unsigned> to store my various entity types.  This allowed me to remove gross switch statements from each operation that my entity manager had to perform, instead accessing the array based on an enum that defines an index per entity type.  So, in the end, a lot less code, a lot more stability, and nothing really lost in the translation.

And, that’s it for the first installment of me being incredibly wrong about things.  In other news, I gave a talk about my actual, finished multi-threaded rendering system at Game Engine Architecture Club last month, so once that video gets uploaded to YouTube expect a post about it with links to my slides.  Also, I recently got into the Heroes of the Storm tech alpha and was delighted to see features that I wrote last summer in heavy use in the game (!!!), so also expect a post about that in the very near future.  Stay tuned for those updates; otherwise, it’s the final push through finals and into graduation, followed by plenty of sleep!