Run-time memory editing with PDB files
March 1, 2023
This blog post describes the use of PDB files to automatically generate type descriptions for use by our custom editor, replacing an arduous manual step that was required before.
You can read all about our custom run-time editor in a previous blog entry, which describes how we use type information to edit ECS data at run-time.
In a nutshell: our custom editor uses type description meta-data for all our component data types to draw those components when given the component’s memory address. The type description meta-data includes the type name, size, offset of each field, and more. Previously we authored this type description meta-data manually.
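To make the shape of that meta-data concrete, here is a minimal sketch of what a hand-authored type description could look like. The names field_desc, type_desc, transform_component, describe_transform and read_float are hypothetical and chosen for illustration only; they are not taken from our editor.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy
#include <string>
#include <vector>

// Hypothetical description of a single field inside a component.
struct field_desc {
    std::string name;       // field name as written in the source
    std::string type_name;  // e.g. "float"
    std::size_t offset;     // byte offset from the start of the component
    std::size_t size;       // size of the field in bytes
};

// Hypothetical description of a whole component type.
struct type_desc {
    std::string name;
    std::size_t size;
    std::vector<field_desc> fields;
};

// Example component, for illustration only.
struct transform_component {
    float m_x;
    float m_y;
    int   m_layer;
};

// The kind of table that has to be kept in sync by hand
// whenever the component changes.
inline type_desc describe_transform() {
    return type_desc{
        "transform_component",
        sizeof(transform_component),
        {
            {"m_x", "float", offsetof(transform_component, m_x), sizeof(float)},
            {"m_y", "float", offsetof(transform_component, m_y), sizeof(float)},
            {"m_layer", "int", offsetof(transform_component, m_layer), sizeof(int)},
        }};
}

// Read a float field from raw component memory using only the description.
inline float read_float(const void* component, const field_desc& field) {
    assert(field.size == sizeof(float));
    float value = 0.0f;
    std::memcpy(&value, static_cast<const char*>(component) + field.offset, sizeof(value));
    return value;
}
```

Given only a memory address and a type_desc, an editor can locate and display every field, which is why the offsets and sizes have to be exactly right.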
Author: Leon Lubking
What is a PDB file?
‘PDB’ stands for ‘Program Database’. It is a file optionally generated by the MSVC compiler during linking. Compiled code is generally unreadable for a human, since it is translated for optimal consumption by a computer. The PDB file provides information to help make sense of the compiled binary, and contains much of the information required for the debugger to work, such as:
Names and line numbers for data structures, functions, etc.
Type names, sizes, field names, field sizes, field offsets, etc.
…and much more!
This is precisely the information required for our meta-data generation step; we just need to be able to extract it. For this Microsoft provides the DIA SDK, which we’ll discuss shortly.
Note that this is specific to the MSVC compiler - basically Microsoft Visual Studio. While our engine should run cross-platform, we intend to use the editor features only while authoring, which we do exclusively on Windows. Other compilers do provide functionality similar to PDB files, but we did not investigate those. Potentially we could include them in the future.
Generating PDB files
Before a PDB file can be used, it needs to be generated during linking.
By default a PDB file is generated for projects built in the Debug configuration in Visual Studio - this is what allows the debugger to work fully on those. It is not generated for the Release configuration. Fortunately, PDB file generation can be enabled for any build configuration by using these compiler and linker arguments:
/Zi /DEBUG
The /Zi compiler argument will cause debug information to be generated in a PDB file. The /DEBUG linker argument will add additional debug symbols and information to the PDB file, which is required to generate the meta-data.
Using PDB files does not inherently affect the run-time performance of the compiled code, but specifying /DEBUG changes the linker's defaults for the /OPT option from REF and ICF to NOREF and NOICF, which makes debugging easier at the cost of some optimization. You can prevent this from happening by using the following linker arguments:
/OPT:REF /OPT:ICF
The /OPT:REF argument will eliminate functions and data that are never referenced, and /OPT:ICF will fold identical code into a single copy. These optimizations reduce the size of the generated code and can increase program speed. They are recommended for non-debug builds that generate PDB files; avoid them in debug builds, since they make stack traces harder to read.
With these arguments added, we observed no run-time performance impact when generating PDB files, which seems to be supported by others’ findings as well.
We noticed a negligible increase in build time for our Release build, but your mileage may vary.
Reading a PDB file
Microsoft has provided the Microsoft Debug Interface Access (DIA) SDK to parse PDB files.
The SDK should automatically be installed with Visual Studio, located, for the author, in the Program Files (x86) folder. See the ‘Getting Started’ page for more information on using it in a custom project.
Note that the mentioned msdiaXX.dll file may have a different version number than listed on the page.
To be able to compile we also needed to modify our Visual Studio installation to include an additional component. Namely, the ‘Visual C++ ATL Support' component in the ‘Individual Components’ tab, accessed via the 'Visual Studio Installer’ application.
Once this was done, we were able to compile the sample project, located in the ‘Samples’ folder of the SDK. Running the sample project is done by executing the Dia2Dump.exe file from the command line and providing the path of a PDB file.
If this fails with the message "CoCreateInstance failed - HRESULT = 80040154" then you will need to register the msdiaXXX.dll by running the following command from a command prompt with administrator rights:
"C:\Windows\SysWOW64\regsvr32.exe" "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\DIA SDK\bin\amd64\msdia140.dll"
Note that the exact paths and DLL name may differ based on your target platform, Visual Studio version, etc.
The sample project takes a PDB file and dumps all the information contained in it to a log. This provides a lot of insight about the data available in a PDB file, and the source code is also a good reference for using the SDK.
Putting Dia2Dump.exe to work on one of our PDB files yields a whopping 45 MB of text data when dumped to a text file. Interestingly, it includes information about ECS component structures that are used by the code.
Now familiar with the data that can be provided by the DIA SDK, it’s time to write a custom implementation. The documentation provides the information required to load the SDK and open PDB files, at which point the SDK exposes a massive amount of ‘symbols’ that can be queried for information.
Basically, all the data in the PDB is a ‘symbol’ - functions, types, enums, fields, variables, etc. Each symbol has different data associated with it, and each symbol can have other symbols related to it. For example the symbol for a field can have a type symbol associated with it, which describes the type for that field. The DIA SDK provides methods for querying a symbol for more information. Symbols can also have children. For example a symbol for a struct can have children for its fields.
Unfortunately the DIA SDK documentation does not do a very good job at explaining which information lives where, and so it becomes a process of trial and error to see what additional data can be extracted from a symbol. The PrintSymbol functionality from the samples is very useful for this purpose, as it can dump (nearly) all of this information for a given symbol.
Our implementation primarily makes use of the ‘global scope’ symbol returned when loading a PDB file (as described in the documentation), and then uses the 'findChildren()' method of that symbol to iterate over its child symbols.
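The parent, child and type relationships between symbols can be modelled with a small tree. The sketch below is a plain C++ stand-in and deliberately does not use the DIA SDK; the names sym_tag, symbol, make_symbol, find_children and describe_fields are hypothetical. In the real SDK the corresponding queries are made on IDiaSymbol (for example get_name, get_offset, get_type and findChildren).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the kinds of symbols found in a PDB.
enum class sym_tag { udt, data, base_type };

// Simplified stand-in for a PDB symbol. This is NOT the DIA SDK API.
struct symbol {
    sym_tag tag = sym_tag::data;
    std::string name;
    std::size_t length = 0;                          // size in bytes (types)
    std::size_t offset = 0;                          // byte offset (fields)
    std::shared_ptr<symbol> type;                    // related type symbol, if any
    std::vector<std::shared_ptr<symbol>> children;   // e.g. the fields of a struct
};

inline std::shared_ptr<symbol> make_symbol(sym_tag tag, std::string name) {
    auto s = std::make_shared<symbol>();
    s->tag = tag;
    s->name = std::move(name);
    return s;
}

// Return the children of `parent` that carry the requested tag.
inline std::vector<std::shared_ptr<symbol>> find_children(const symbol& parent, sym_tag tag) {
    std::vector<std::shared_ptr<symbol>> result;
    for (const auto& child : parent.children) {
        if (child->tag == tag) result.push_back(child);
    }
    return result;
}

// Produce one line per field of a struct: "<offset> <type name> <field name>".
inline std::vector<std::string> describe_fields(const symbol& udt) {
    assert(udt.tag == sym_tag::udt);
    std::vector<std::string> lines;
    for (const auto& field : find_children(udt, sym_tag::data)) {
        const std::string type_name = field->type ? field->type->name : "<unknown>";
        lines.push_back(std::to_string(field->offset) + " " + type_name + " " + field->name);
    }
    return lines;
}
```

Walking a struct symbol's data children and following each child's type symbol is enough to recover the name, offset and type of every field, which is the core of the meta-data the editor needs.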
Adding custom meta-data
In addition to the basic information about the data structures, it would also be useful to add custom drawing attributes to fields, so that those fields can be drawn in specific ways. For example, adding meta-data to a float field to have the editor limit the input to a min / max range.
Our editor manipulates run-time memory directly, so being able to limit input to values that will not crash the application is particularly valuable.
Adding additional meta-data that can be parsed from the PDB is not simple, though. The compiler strips any fields from the compiled binary that are not used, and the PDB only contains the absolute minimum information that is relevant for debugging, both in order to keep build times and file sizes to a minimum. Decades of iteration have made this a particularly thorough process.
In other words, any attempt to add additional information to the PDB file that isn’t actually used at run-time will be met with fierce resistance from the compiler.
Some experimentation revealed that 'static const int' fields with a valid field initializer are not stripped from the PDB. Even their integer value can be retrieved from the PDB. A 'static const int' field initialized with a constant is a compile-time constant that belongs to the type rather than to any instance, so nothing is initialized per object. Such fields take up no memory in whichever data structure they are placed, effectively making them free to add. While parsing the PDB file they are still found nested inside whatever structure or namespace they reside in.
This allows the 'static const int' fields to be used to describe additional information for other fields in the same scope. We’ve made macros for exactly that purpose. For example, these are the macros to limit a numeric field type to a min / max range in the editor:
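The claim that these fields are free can be checked at compile time. The following is a minimal sketch; plain_component, annotated_component and clamp_for_editor are hypothetical names, and the range 0 to 1000 is chosen for illustration.

```cpp
#include <cassert>

// A component without any annotations.
struct plain_component {
    float m_numeric_field_a = 123.456f;
};

// The same component with two 'static const int' annotation fields.
struct annotated_component {
    static const int m_meta_data__MIN__m_numeric_field_a = 0;
    static const int m_meta_data__MAX__m_numeric_field_a = 1000;
    float m_numeric_field_a = 123.456f;
};

// Static members are not stored per instance, so the size does not change.
static_assert(sizeof(annotated_component) == sizeof(plain_component),
              "static const int fields must not change the component size");

// Hypothetical editor-side helper: keep an edited value inside the annotated range.
inline float clamp_for_editor(float value) {
    const int lo = annotated_component::m_meta_data__MIN__m_numeric_field_a;
    const int hi = annotated_component::m_meta_data__MAX__m_numeric_field_a;
    assert(lo <= hi);
    if (value < static_cast<float>(lo)) return static_cast<float>(lo);
    if (value > static_cast<float>(hi)) return static_cast<float>(hi);
    return value;
}
```

In this sketch the range is read directly from the type; in the editor the same values would come from the PDB instead, so the editor does not need to know the component type at compile time.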
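// Min value the field can have in the editor UI.
// Note: no '##' after FIELD, since pasting FIELD onto '=' does not form a valid token.
#define EDIT_MIN(FIELD, VALUE) static const int m_meta_data__MIN__##FIELD = VALUE;
// Max value the field can have in the editor UI.
#define EDIT_MAX(FIELD, VALUE) static const int m_meta_data__MAX__##FIELD = VALUE;
// Usage inside a component structure:
EDIT_MIN(m_numeric_field_a, 0)
EDIT_MAX(m_numeric_field_a, 999.9999f)
float m_numeric_field_a = 123.456f;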
What this will effectively generate is the following:
static const int m_meta_data__MIN__m_numeric_field_a = 0;
static const int m_meta_data__MAX__m_numeric_field_a = 999.9999f; // stored as 999: the float literal is truncated to int
float m_numeric_field_a = 123.456f;
To extract the 'static const int' information from the PDB files we filter for 'SymTagData' symbols when iterating over the global scope children. The found symbols will include the added 'static const int' fields. We then parse out the target field name, attribute type (MIN, MAX, etc.) and the value and add this to our meta-data, which our custom editor can use to draw the target field in a specific way.
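The parsing step can be sketched as a small string function. This is an illustration under assumptions, not our actual implementation: field_attribute and parse_meta_symbol are hypothetical names, the symbol name and integer value are assumed to have already been read from the PDB, and the prefix is searched anywhere in the name in case the symbol name is reported with a qualifying scope.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result of parsing one annotation symbol.
struct field_attribute {
    bool valid = false;
    std::string attribute;  // e.g. "MIN" or "MAX"
    std::string field;      // e.g. "m_numeric_field_a"
    int value = 0;          // integer value stored in the 'static const int'
};

// Split a name of the form "m_meta_data__<ATTRIBUTE>__<FIELD>" into its parts.
// Returns an invalid result for names that do not follow that pattern.
inline field_attribute parse_meta_symbol(const std::string& symbol_name, int symbol_value) {
    static const std::string prefix = "m_meta_data__";
    field_attribute result;

    // Assumption: the name may carry a scope in front of the prefix.
    const std::size_t prefix_pos = symbol_name.find(prefix);
    if (prefix_pos == std::string::npos) return result;

    const std::size_t attr_begin = prefix_pos + prefix.size();
    const std::size_t attr_end = symbol_name.find("__", attr_begin);
    if (attr_end == std::string::npos || attr_end == attr_begin) return result;

    const std::size_t field_begin = attr_end + 2;
    if (field_begin >= symbol_name.size()) return result;

    result.attribute = symbol_name.substr(attr_begin, attr_end - attr_begin);
    result.field = symbol_name.substr(field_begin);
    result.value = symbol_value;
    result.valid = true;
    assert(!result.attribute.empty() && !result.field.empty());
    return result;
}
```

Because the attribute name is terminated by a double underscore, attribute names in this scheme must not contain "__" themselves; field names are free to do so.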
Defining additional meta-data for a field is especially useful for types that are defined via typedefs. Typedefs are fully resolved during compilation, and so any fields using the typedef type will show up as being the fully resolved type in the PDB.
For example, if you have defined 'typedef float color_t[4]' then a field 'color_t m_my_color;' will show up in the PDB as being an array of 4 values of type 'float' - not 'color_t'.
This provides no way to differentiate between the base type and the typedef type, and so the editor would show a float array with 4 values, instead of a color picker.
However, we can annotate the field using the technique mentioned above, to draw the field in a specific way. For this case we created a 'COLOR(m_my_color)' macro, which leaves a 'static const int' field that the editor picks up on and uses to draw 'm_my_color' using a color picker.
The downside of the 'static const int' approach is that the data that can be included in the 'static const int' field is limited to integer values, and whatever you can add to its field name without making that field name invalid. Unfortunately the 'static const' field trick doesn't seem to work for types like string; they won’t show up in the PDB or the value cannot be retrieved, so the integer field is the best we can do. It may also be a compiler-specific quirk, so if we try other compilers we’ll need to adapt. For now it works well though.
After switching to PDB files to generate type description meta-data, our custom editor now requires no manual work whenever data types are changed or added, making it much easier to use.
We can even include custom meta-data per field to draw the field in a specific way, using a minimalistic and localized macro implementation.
This does come at the cost of parsing the PDB files, which we now do the first time you open the editor window. This takes anywhere between 0.5 and 5 seconds, depending on the build configuration used.
In the future we can look into generating the meta-data from the PDB files as part of the build process, writing the meta-data to a custom file that can be loaded at run-time much quicker. This would also mean we don’t need to include the PDB files when we send someone an editor-enabled build; as long as they have the meta-data files, the editor can work for them.
In the future we may also look into supporting other compilers, once the need arises.