Run-time memory editing with PDB files

March 1, 2023 Technology

This blog post describes the use of PDB files to automatically generate type descriptions for use by our custom editor, replacing an arduous manual step that was required before.

You can read all about our custom run-time editor in a previous blog entry, which describes how we use type information to edit ECS data at run-time.

In a nutshell: our custom editor uses type description meta-data for all our component data types to draw those components when given the component’s memory address. The type description meta-data includes the type name, size, offset of each field, and more. Previously we authored this type description meta-data manually.

Author: Leon Lubking

[Image: the code that needed to be authored manually for each component]

[Image: the resulting component in the editor]

Each field in each custom data structure required adding a line to a special meta-data file. Omitting that line meant everything would still run, but our custom editor wouldn’t be able to interpret the component memory and would just draw it as binary - without being able to present it intelligibly. Since we are creatures of convenience, most of the time that meant the meta-data wasn’t authored until it was really needed, which limited the editor’s utility.
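
As a rough illustration of what such hand-written meta-data could look like, here is a minimal sketch. The component layout, the 'field_desc_t' and 'type_desc_t' structures and the 'DESCRIBE_FIELD' macro are all hypothetical names made up for this example; they only show the kind of per-field line that had to be kept in sync by hand.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical component; the field layout is made up for this example.
struct comp_window_t {
    int   m_width;
    int   m_height;
    float m_opacity;
};

// Hypothetical description of a single field inside a component.
struct field_desc_t {
    std::string name;
    std::string type;
    std::size_t offset;  // bytes from the start of the struct
    std::size_t size;    // size of the field in bytes
};

// Hypothetical description of a whole component type.
struct type_desc_t {
    std::string               name;
    std::size_t               size;
    std::vector<field_desc_t> fields;
};

// One of these lines had to be written by hand for every field.
#define DESCRIBE_FIELD(type, field, type_name) \
    field_desc_t{ #field, type_name, offsetof(type, field), sizeof(type::field) }

inline type_desc_t describe_comp_window() {
    return type_desc_t{
        "comp_window_t",
        sizeof(comp_window_t),
        {
            DESCRIBE_FIELD(comp_window_t, m_width,   "int"),
            DESCRIBE_FIELD(comp_window_t, m_height,  "int"),
            DESCRIBE_FIELD(comp_window_t, m_opacity, "float"),
        },
    };
}

// Reads a float field from raw component memory using only the description.
inline float read_float(const void* component, const field_desc_t& field) {
    float value = 0.0f;
    std::memcpy(&value, static_cast<const char*>(component) + field.offset, sizeof(value));
    return value;
}
```

Given only a memory address and a description like this, an editor can locate and interpret every field. Forgetting a 'DESCRIBE_FIELD' line leaves that field invisible to the editor, which is exactly the maintenance burden described above.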

Fortunately, one of our blog readers offered a great suggestion: why not use the PDB files to extract the required meta-data, instead of authoring it manually?

That turned out to be an excellent suggestion.

What is a PDB file?

‘PDB’ stands for ‘Program Database’. It is a file optionally generated by the MSVC toolchain during linking. When the code is compiled into a binary it generally becomes unreadable for a human, since it is translated for optimal consumption by a computer. The PDB file provides information to help make sense of the compiled binary, and contains much of the information required for the debugger to work.

It includes:

Names and line numbers for data structures, functions, etc.
Type names, sizes, field names, field sizes, field offsets, etc.
…and much more!

This is precisely the information required for our meta-data generation step; we just need to be able to extract it. For this Microsoft provides the DIA SDK, which we’ll discuss shortly.

Note that this is specific to the MSVC compiler - basically Microsoft Visual Studio. While our engine should run cross-platform, we intend to use the editor features only while authoring, which we do exclusively on Windows. Other compilers provide debug information formats with similar functionality to PDB files, but we did not investigate those. Potentially we could include them in the future.

Generating PDB files

Before a PDB file can be used, it needs to be generated during linking.

By default a PDB file is generated for projects built in the Debug configuration in Visual Studio - this is what allows the debugger to work fully on those. It is not generated for the Release configuration. Fortunately, PDB file generation can be enabled for any build configuration by using the following compiler argument (/Zi) and linker argument (/DEBUG):

/Zi /DEBUG

The /Zi argument makes the compiler emit complete debug information, such as type information and line numbers, for the PDB file. The /DEBUG argument makes the linker gather that debug information into the PDB file for the final binary, which is required to generate the meta-data.

Generating PDB files does not inherently affect the run-time performance of the compiled code, but specifying /DEBUG changes the linker's /OPT defaults to /OPT:NOREF and /OPT:NOICF, which disables certain optimizations in order to make debugging easier. You can prevent this from happening by using the following linker arguments:

/OPT:REF /OPT:ICF

The /OPT:REF argument will eliminate functions and data that are never referenced, and /OPT:ICF will fold identical functions into a single copy. The /OPT optimizations reduce the size of the generated code and increase program speed, but they are recommended only for non-debug builds that generate PDB files, since they make stack traces harder to read.

With these arguments added, we observed no run-time performance impact when generating PDB files, which seems to be supported by others’ findings as well.

We noticed a negligible increase in build time for our Release build, but your mileage may vary.

[Image: difference in build time in seconds]

Reading a PDB file

Microsoft has provided the Microsoft Debug Interface Access (DIA) SDK to parse PDB files.

The SDK should automatically be installed with Visual Studio, located, for the author, in the Program Files (x86) folder. See the ‘Getting Started’ page for more information on using it in a custom project.

Note that the mentioned msdiaXX.dll file may have a different version number than listed on the page.

To be able to compile, we also needed to modify our Visual Studio installation to include an additional component: the ‘Visual C++ ATL Support’ component in the ‘Individual Components’ tab of the ‘Visual Studio Installer’ application.

[Image caption: the latest version at the time of writing; newer versions may be available at the time of reading]

Once this was done, we were able to compile the sample project, located in the ‘Samples’ folder of the SDK.

Running the sample project is done by executing the Dia2Dump.exe file from the command line and providing the path of a PDB file. If this fails with the message "CoCreateInstance failed - HRESULT = 80040154" (class not registered), then you will need to register the msdiaXXX.dll by running the following command from a command prompt with administrator rights:

"C:\Windows\SysWOW64\regsvr32.exe" "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\DIA SDK\bin\amd64\msdia140.dll"

Note that the exact paths and DLL name may differ based on your target platform, Visual Studio version, etc.

The sample project takes a PDB file and dumps all the information contained in it to a log. This provides a lot of insight about the data available in a PDB file, and the source code is also a good reference for using the SDK.

Putting Dia2Dump.exe to work on one of our PDB files yields a whopping 45 MB of text data when dumped to a text file. Interestingly, it includes information about ECS component structures that are used by the code:

[Image: the data retrieved from a PDB file about the comp_window_t component, using Dia2Dump.exe]

Custom implementation

Now that we are familiar with the data that can be provided by the DIA SDK, it’s time to write a custom implementation.

The documentation provides the information required to load the SDK and open PDB files, at which point the SDK exposes a massive number of ‘symbols’ that can be queried for information.

Basically, all the data in the PDB is a ‘symbol’ - functions, types, enums, fields, variables, etc. Each symbol has different data associated with it, and each symbol can have other symbols related to it. For example the symbol for a field can have a type symbol associated with it, which describes the type for that field. The DIA SDK provides methods for querying a symbol for more information. Symbols can also have children. For example a symbol for a struct can have children for its fields.

Unfortunately the DIA SDK documentation does not do a very good job at explaining which information lives where, and so it becomes a process of trial and error to see what additional data can be extracted from a symbol. The PrintSymbol functionality from the samples is very useful for this purpose, as it can dump (nearly) all of this information for a given symbol.

Our implementation primarily makes use of the ‘global scope’ symbol returned when loading a PDB file (as described in the documentation), and then uses the DIA SDK's 'findChildren()' method to iterate over its child symbols.

When iterating over the global scope’s children, a symbol type can be provided as a filter. This is recommended given the potentially enormous number of child symbols. In this case ‘SymTagEnum’ is used as the filter, in order to parse enums.
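
To make the symbol hierarchy and the tag filter more concrete, the sketch below models them with plain in-memory structures. This is a simplified stand-in and not the DIA SDK API: 'symbol_t', 'tag_t', 'find_children' and 'make_example_scope' are hypothetical names, and the example symbols are made up. In the real SDK the equivalent query is made on an IDiaSymbol with a tag such as SymTagEnum.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the kinds of symbols found in a PDB file.
enum class tag_t { scope, udt, enumeration, data, function };

struct symbol_t {
    tag_t                 tag;
    std::string           name;
    std::vector<symbol_t> children;  // e.g. the fields of a struct
};

// Returns the direct children of 'parent' whose tag matches 'tag'.
inline std::vector<const symbol_t*> find_children(const symbol_t& parent, tag_t tag) {
    std::vector<const symbol_t*> matches;
    for (const symbol_t& child : parent.children) {
        if (child.tag == tag) {
            matches.push_back(&child);
        }
    }
    return matches;
}

// Builds a tiny, made-up global scope with one struct, one enum and one function.
inline symbol_t make_example_scope() {
    symbol_t window{ tag_t::udt, "comp_window_t", {
        symbol_t{ tag_t::data, "m_width",  {} },
        symbol_t{ tag_t::data, "m_height", {} },
    } };
    symbol_t mode{ tag_t::enumeration, "window_mode_t", {
        symbol_t{ tag_t::data, "WINDOWED",   {} },
        symbol_t{ tag_t::data, "FULLSCREEN", {} },
    } };
    symbol_t entry{ tag_t::function, "main", {} };
    return symbol_t{ tag_t::scope, "global scope", { window, mode, entry } };
}
```

Filtering at the query level, rather than visiting every child and checking its tag afterwards, keeps the iteration limited to the handful of symbol kinds the editor cares about.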

In a separate loop, not shown in the example, ‘SymTagUDT' is used to filter all user-defined types. This makes it possible to extract all relevant information related to data structures and their fields, including:

Struct and field names and their sizes

Field offsets from the start of the struct

Enum types, size, backing type, keys and values

Note that the process of extracting this information isn’t entirely straightforward. Liberal use of the PrintSymbol functionality is required to make sense of the available information. Some information is expressed inconsistently, for example the size may be returned in bits or bytes by the same SDK method, based on symbol context. Some of the type names will be formulated differently from those returned by, for example, 'typeid(SomeType).name()' and require some additional processing to fit your use case. If you have incremental linking enabled, some symbols may even appear more than once with different contents.
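
Two of these clean-up steps can be sketched as small helpers. This is only an illustration under stated assumptions: 'strip_type_keyword' and 'length_in_bytes' are hypothetical names, the first assumes that MSVC's 'typeid(SomeType).name()' yields names such as "struct comp_window_t" while the PDB yields "comp_window_t", and the second assumes the caller already knows from the symbol's context whether a length is expressed in bits.

```cpp
#include <cstddef>
#include <string>

// Removes a leading "struct ", "class ", "union " or "enum " keyword so that
// names from different sources can be compared.
inline std::string strip_type_keyword(const std::string& name) {
    static const char* const keywords[] = { "struct ", "class ", "union ", "enum " };
    for (const char* keyword : keywords) {
        const std::string prefix(keyword);
        if (name.compare(0, prefix.size(), prefix) == 0) {
            return name.substr(prefix.size());
        }
    }
    return name;
}

// Normalizes a reported length to whole bytes. 'is_bit_length' is an
// assumption of this sketch: the caller decides it from the symbol context.
inline std::size_t length_in_bytes(std::size_t length, bool is_bit_length) {
    return is_bit_length ? (length + 7) / 8 : length;
}
```

Normalizing everything in one place means the rest of the editor only ever sees one naming scheme and one unit of size.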

Fortunately, with some wrangling, we were able to extract all the information required for our editor implementation.

Adding custom meta-data

In addition to the basic information about the data structures, it would also be useful to add custom drawing attributes to fields, so that those fields can be drawn in specific ways. For example, adding meta-data to a float field to have the editor limit the input to a min / max range.

Our editor manipulates run-time memory directly, so being able to limit input to values that will not crash the application is particularly valuable.

Adding meta-data that can be parsed from the PDB is not simple, though. The compiler strips any fields from the compiled binary that are not used, and the PDB only contains the absolute minimum information that is relevant for debugging, both in order to keep build times and file sizes to a minimum. Decades of iteration have made this a particularly thorough process.

In other words, any attempt to add information to the PDB file that isn’t actually used at run-time will be met with fierce resistance from the compiler.

Some experimentation revealed that 'static const int' fields with a valid field initializer are not stripped from the PDB. Even their integer value can be retrieved from the PDB. Such fields are compile-time constants that belong to the type rather than to each instance, so they take up no memory in whichever data structure they are placed, effectively making them free to add. While parsing the PDB file they are still found nested inside whatever structure or namespace they reside in.

This allows the 'static const int' fields to be used to describe additional information for other fields in the same scope. We’ve made macros for exactly that purpose. For example, these are the macros to limit a numeric field type to a min / max range in the editor:

Usage:

What this will effectively generate is the following:

To extract the 'static const int' information from the PDB files we filter for 'SymTagData' symbols when iterating over the global scope children. The found symbols will include the added 'static const int' fields. We then parse out the target field name, attribute type (MIN, MAX, etc.) and the value and add this to our meta-data, which our custom editor can use to draw the target field in a specific way.
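
A minimal sketch of this idea, from macro to parsed attribute, could look as follows. Everything here is hypothetical and only illustrates the technique: the 'EDITOR_MIN' and 'EDITOR_MAX' macro names, the 'META_<ATTRIBUTE>_<field>' naming scheme, the 'comp_light_t' component and the 'parse_attribute' helper are assumptions, as is the symbol name arriving in the form "owner::name".

```cpp
#include <cstddef>
#include <string>

// Hypothetical macros: each leaves behind a 'static const int' whose name
// encodes the attribute and the target field, and whose value is the limit.
#define EDITOR_MIN(field, value) static const int META_MIN_##field = (value)
#define EDITOR_MAX(field, value) static const int META_MAX_##field = (value)

struct comp_light_t {
    float m_intensity;
    EDITOR_MIN(m_intensity, 0);    // static const int META_MIN_m_intensity = (0);
    EDITOR_MAX(m_intensity, 100);  // static const int META_MAX_m_intensity = (100);
};

// Static members are not stored per instance, so the layout is unchanged.
static_assert(sizeof(comp_light_t) == sizeof(float), "attributes must not add size");

struct field_attribute_t {
    std::string owner;      // e.g. "comp_light_t" (empty if unqualified)
    std::string attribute;  // e.g. "MIN"
    std::string field;      // e.g. "m_intensity"
    int         value;
};

// Parses a symbol name such as "comp_light_t::META_MIN_m_intensity".
// Returns false for symbols that do not follow the naming scheme.
inline bool parse_attribute(const std::string& symbol_name, int value, field_attribute_t& out) {
    const std::size_t scope = symbol_name.rfind("::");
    const bool qualified = scope != std::string::npos;
    const std::string local = qualified ? symbol_name.substr(scope + 2) : symbol_name;

    const std::string prefix = "META_";
    if (local.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }
    const std::size_t separator = local.find('_', prefix.size());
    if (separator == std::string::npos || separator == prefix.size() ||
        separator + 1 >= local.size()) {
        return false;
    }
    out.owner     = qualified ? symbol_name.substr(0, scope) : std::string();
    out.attribute = local.substr(prefix.size(), separator - prefix.size());
    out.field     = local.substr(separator + 1);
    out.value     = value;
    return true;
}
```

Encoding the attribute before the field name means the first underscore after the prefix always ends the attribute, so field names that contain underscores themselves, such as 'm_intensity', still parse unambiguously.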

Typedefs

Defining additional meta-data for a field is especially useful for types that are defined via typedefs. Typedefs are fully resolved during compilation, and so any fields using the typedef type will show up as being the fully resolved type in the PDB.

For example, if you have defined 'typedef float color_t[4]' then a field 'color_t m_my_color;' will show up in the PDB as being of type 'float[4]' - not 'color_t'.

This provides no way to differentiate between the base type and the typedef type, and so the editor would show a float array with 4 values, instead of a color picker.

However, we can annotate the field using the technique mentioned above, to draw the field in a specific way. For this case we created a 'COLOR(m_my_color)' macro, which leaves a 'static const int' field that the editor picks up on and uses to draw 'm_my_color' using a color picker.
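
The typedef problem and the marker approach can be checked at compile time with a small sketch. Only the macro name 'COLOR' comes from the text above; its definition here, the 'META_COLOR_' naming scheme and the 'comp_material_t' component are assumptions made for illustration.

```cpp
#include <type_traits>

typedef float color_t[4];

// Hypothetical definition: leaves a marker that names the target field.
#define COLOR(field) static const int META_COLOR_##field = 1

struct comp_material_t {
    color_t m_my_color;    // meant to be edited with a color picker
    float   m_weights[4];  // meant to be edited as four plain floats
    COLOR(m_my_color);
};

// The typedef is only an alias, so both fields have exactly the same type
// and cannot be told apart by their type alone.
static_assert(std::is_same<color_t, float[4]>::value, "color_t is float[4]");
static_assert(std::is_same<decltype(comp_material_t::m_my_color),
                           decltype(comp_material_t::m_weights)>::value,
              "both fields have type float[4]");

// The marker is a static member, so it does not change the struct layout.
static_assert(sizeof(comp_material_t) == 8 * sizeof(float), "layout unchanged");
```

Because the two fields are indistinguishable by type, the marker is the only piece of information that tells the editor to treat 'm_my_color' differently from 'm_weights'.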

Considerations

The downside of the 'static const int' approach is that the data that can be included in the 'static const int' field is limited to integer values, plus whatever you can add to its field name without making that field name invalid. Unfortunately the 'static const' field trick doesn't seem to work for types like string: either they don't show up in the PDB or their value cannot be retrieved, so the integer field is the best we can do.

It may also be a compiler specific quirk, so if we try other compilers we’ll need to adapt. For now it works well though.

Results

After switching to PDB files to generate type description meta-data, our custom editor now requires no manual work whenever data types are changed or added, making it much easier to use.

We can even include custom meta-data per field to draw the field in a specific way, using a minimalistic and localized macro implementation.

This does come at the cost of parsing the PDB files, which we now do the first time you open the editor window. This takes between 0.5 and 5 seconds, depending on the build configuration used.

In the future we can look into generating the meta-data from the PDB files as part of the build process, writing the meta-data to a custom file that can be loaded at run-time much quicker. This would also mean we don’t need to include the PDB files when we send someone an editor-enabled build; as long as they have the meta-data files, the editor can work for them.

In the future we may also look into supporting other compilers, once the necessity arises.
