Developing a Memory Leak Detector for Windows
Preface
Recently, I finished reading "The Self-Cultivation of Programmers: Linking, Loading, and Libraries" (referred to as "Linking" hereafter) and learned a lot from it. I wondered whether I could write some related code. Coincidentally, I had come across a memory leak detection tool for Windows called Visual Leak Detector. It tracks memory allocation and deallocation by replacing the DLL interfaces responsible for memory management in Windows. So I decided to use Visual Leak Detector (referred to as VLD hereafter) as a reference and build a simplified memory leak detection tool to deepen my understanding of DLL linking.
Background Knowledge
The book "Linking" explains in detail how executable files are linked under Linux and Windows. On Windows, the executable file format is called Portable Executable (PE). The book describes DLL files as follows:
DLL, short for Dynamic-Link Library, is the equivalent of a shared object in Linux. The DLL mechanism is used extensively in Windows; even the structure of the Windows kernel relies heavily on it. DLL files and EXE files in Windows are essentially the same thing, as both are binary files in PE format. The only difference is a flag in the PE file header that indicates whether the file is an EXE or a DLL. The extension of a DLL file is not necessarily .dll; it can also be something else, such as .ocx (OCX control) or .CPL (Control Panel program).
There are also Python extension files such as .pyd. The DLL concept that the memory leak detection discussed here relies on is the symbol export/import table.
Symbol Export Table
When a PE file provides some functions or variables for other PE files to use, we call this behavior symbol exporting.
Simply put, in Windows PE, all exported symbols are stored centrally in a structure called the Export Table, which provides a mapping between symbol names and symbol addresses. Symbols that need to be exported should be annotated with the modifier __declspec(dllexport).
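As a minimal sketch of what exporting looks like in source code, the snippet below marks one function for export. The macro name LD_SKETCH_EXPORT and the function add_numbers are hypothetical names chosen for illustration; on non-Windows compilers the macro falls back to the GCC/Clang visibility attribute only so that the snippet still builds there.

```cpp
// Hypothetical export macro: __declspec(dllexport) on Windows,
// default symbol visibility on GCC/Clang so the sketch builds elsewhere too.
#if defined(_WIN32)
#define LD_SKETCH_EXPORT __declspec(dllexport)
#else
#define LD_SKETCH_EXPORT __attribute__((visibility("default")))
#endif

// extern "C" gives the function a plain name instead of a C++-mangled one,
// which makes it easier to recognize in the export table.
extern "C" LD_SKETCH_EXPORT int add_numbers(int a, int b)
{
    return a + b;
}
```

When this is built into a DLL with MSVC, a tool such as dumpbin /exports (or Dependency Walker) should list add_numbers among the exported symbols.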
Symbol Import Table
The symbol import table is the key concept here; it is the counterpart of the symbol export table. Let's first look at how the book explains it:
If we use functions or variables from a DLL in a program, we refer to this behavior as symbol importing.
The structure in Windows PE that stores the symbols of variables and functions that modules need to import, as well as information about their location, is called the Import Table. When Windows loads a PE file, one of the tasks is to determine the addresses of all the functions that need to be imported and adjust the elements in the Import Table to the correct addresses. This allows the program to locate the actual addresses of the functions and make the necessary calls at runtime by querying the Import Table. The most important structure in the Import Table is the Import Address Table (IAT), which stores the actual addresses of the imported functions.
By now, you might have already guessed how we are going to implement the memory leak detection :) That's right, it's by hacking the import table. Specifically, we will modify the addresses of the memory allocation and deallocation functions in the import table of the modules we want to monitor, replacing them with our own custom functions. This way, we will be able to track every memory allocation and deallocation made by the modules, giving us the freedom to perform the desired checks.
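The idea can be illustrated with a toy model that runs anywhere. This is not a real PE import table: g_importTable is just an array of function pointers standing in for the IAT, and every name in the snippet is hypothetical. The "module" always calls through the table, so overwriting a slot redirects its calls.

```cpp
#include <cstddef>
#include <cstdlib>

using AllocFn = void* (*)(std::size_t);

std::size_t g_allocCalls = 0;  // how many allocations the hook has seen

void* realAlloc(std::size_t n) { return std::malloc(n); }

// Replacement: record the call, then forward to the real allocator.
void* countingAlloc(std::size_t n)
{
    ++g_allocCalls;
    return realAlloc(n);
}

// Toy stand-in for an IAT: one slot per imported function.
AllocFn g_importTable[] = { realAlloc };

// Find the slot that currently holds `original` and overwrite it.
bool patchSlot(AllocFn* table, std::size_t count,
               AllocFn original, AllocFn replacement)
{
    for (std::size_t i = 0; i < count; ++i) {
        if (table[i] == original) {
            table[i] = replacement;
            return true;
        }
    }
    return false;  // `original` is not in the table
}

// "Module code" never calls realAlloc directly, only through the table.
void* moduleAllocates(std::size_t n)
{
    return g_importTable[0](n);
}
```

After patchSlot(g_importTable, 1, realAlloc, countingAlloc) succeeds, every moduleAllocates call goes through countingAlloc without the calling code changing at all, which is the effect the real detector achieves on the PE import table.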
For more detailed knowledge about DLL linking, you can refer to the book "Linking" or other materials.
Memory Leak Detector
Now that we understand the principle, the next step is to implement memory leak detection based on it. The following explanation is based on my own implementation, which I have uploaded to my GitHub: LeakDetector.
Replace Function
Let's take a look at the key function, located in RealDetector.cpp:
Let's analyze this function. As the comment says, its purpose is to change the address of a certain function in the IAT to the address of another function. Let's look at lines 34-35:
idte = (IMAGE_IMPORT_DESCRIPTOR*)ImageDirectoryEntryToDataEx((PVOID)importModule,
    TRUE, IMAGE_DIRECTORY_ENTRY_IMPORT, &size, &section);
The ImageDirectoryEntryToDataEx function returns the address of a particular structure located through the file header of a module. IMAGE_DIRECTORY_ENTRY_IMPORT specifies the import table structure, so the returned idte points to the import table of the module.
Lines 36-40 check that idte is valid. In line 41, idte->FirstThunk points to the actual Import Address Table (IAT), so lines 41-48 search, by module name, for the module that contains the function to be replaced. If the module is not found, no function from that module is being called, so an error is reported and the function returns.
After finding the module, we naturally need to locate the function to be replaced. On lines 55-62, we open the module to which the function belongs, and on line 64, we find the address of the function. Since the IAT does not store names, we first have to locate the entry by the function's original address and then modify that address. Lines 68-80 do exactly that. Once the function is found, we simply change its address to replacement.
So far, we have successfully replaced the functions in the IAT.
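The steps described above can be sketched roughly as follows. This is my own minimal illustration, not the patchImport function from the repository: the name patchImportSketch, its parameters, and the use of GetModuleHandleA and VirtualProtect are assumptions, and the sketch assumes the exporting DLL is already loaded. It is Windows-only and needs Dbghelp.lib; the #ifdef is there only so the file compiles to nothing on other platforms.

```cpp
#ifdef _WIN32  // Windows-only sketch; link with Dbghelp.lib
#include <windows.h>
#include <dbghelp.h>
#include <string.h>

// Hypothetical helper (not the author's patchImport): in the IAT of
// importModule, redirect exportModuleName!functionName to replacement.
bool patchImportSketch(HMODULE importModule, const char* exportModuleName,
                       const char* functionName, void* replacement)
{
    ULONG size = 0;
    PIMAGE_SECTION_HEADER section = nullptr;
    auto* idte = static_cast<IMAGE_IMPORT_DESCRIPTOR*>(
        ImageDirectoryEntryToDataEx(importModule, TRUE,
            IMAGE_DIRECTORY_ENTRY_IMPORT, &size, &section));
    if (idte == nullptr)
        return false;  // no import table found

    BYTE* base = reinterpret_cast<BYTE*>(importModule);

    // 1. Find the import descriptor whose DLL name matches.
    //    The descriptor array ends with an all-zero entry.
    for (; idte->Name != 0; ++idte) {
        const char* name = reinterpret_cast<const char*>(base + idte->Name);
        if (_stricmp(name, exportModuleName) == 0)
            break;
    }
    if (idte->Name == 0)
        return false;  // importModule imports nothing from that DLL

    // 2. Resolve the function's current address in the exporting DLL.
    HMODULE exportModule = GetModuleHandleA(exportModuleName);
    if (exportModule == nullptr)
        return false;
    FARPROC original = GetProcAddress(exportModule, functionName);
    if (original == nullptr)
        return false;

    // 3. The IAT stores addresses only, so search by address and overwrite.
    auto* thunk = reinterpret_cast<IMAGE_THUNK_DATA*>(base + idte->FirstThunk);
    for (; thunk->u1.Function != 0; ++thunk) {
        if (thunk->u1.Function != reinterpret_cast<ULONG_PTR>(original))
            continue;
        DWORD oldProtect = 0;
        // The IAT page is normally not writable, so unprotect it first.
        if (!VirtualProtect(&thunk->u1.Function, sizeof(thunk->u1.Function),
                            PAGE_EXECUTE_READWRITE, &oldProtect))
            return false;
        thunk->u1.Function = reinterpret_cast<ULONG_PTR>(replacement);
        VirtualProtect(&thunk->u1.Function, sizeof(thunk->u1.Function),
                       oldProtect, &oldProtect);
        return true;
    }
    return false;  // the function is not imported by importModule
}
#endif  // _WIN32
```

Restoring the original function is the same operation with the two addresses swapped, so a real implementation would probably search for an explicit "old" address instead of always calling GetProcAddress.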
Module and Function Names
Although we have implemented patchImport, which replaces a function in the IAT, it requires the module name and the function name. How do we know which module and functions the program uses for memory allocation and deallocation? To find out, we can use the Dependency Walker tool on Windows. Create a new project in Visual Studio, use new to allocate memory in the main function, compile in Debug mode, and then open the compiled exe file with depends.exe. You will see an interface similar to the following (using my project LeakDetectorTest as an example):
It can be seen that LeakDetectorTest.exe uses the malloc and _free_dbg functions from ucrtbased.dll (not shown in the image). These two functions are the ones we need to replace. Note that the actual module and function names may vary with your Windows and Visual Studio versions; I am using Windows 10 and Visual Studio 2015. Use depends.exe to check which functions your own program actually calls.
Analyzing Call Stack
To record a memory allocation, we need to record the call stack at the time it happens. I don't intend to give a detailed guide here on how to obtain the current call stack on Windows. The relevant function is RtlCaptureStackBackTrace, and there are many resources about it online. You can also take a look at the printTrace function in my code here.
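For orientation, here is a minimal sketch of capturing and printing a call stack with RtlCaptureStackBackTrace and the DbgHelp symbol functions. It is not the printTrace function from the repository; the helper names captureStackSketch and printStackSketch and the 32-frame limit are assumptions. Like the previous sketch it is Windows-only and needs Dbghelp.lib.

```cpp
#ifdef _WIN32  // Windows-only sketch; link with Dbghelp.lib
#include <windows.h>
#include <dbghelp.h>
#include <cstdio>
#include <vector>

// Hypothetical helper: capture up to 32 return addresses of the caller.
std::vector<void*> captureStackSketch(unsigned long framesToSkip = 1)
{
    void* frames[32] = {};
    USHORT captured = RtlCaptureStackBackTrace(framesToSkip, 32, frames, nullptr);
    return std::vector<void*>(frames, frames + captured);
}

// Hypothetical helper: resolve each address to "symbol + offset" if possible.
void printStackSketch(const std::vector<void*>& frames)
{
    HANDLE process = GetCurrentProcess();
    SymInitialize(process, nullptr, TRUE);  // load symbols for loaded modules

    // SYMBOL_INFO is followed by a variable-length name buffer.
    alignas(SYMBOL_INFO) char buffer[sizeof(SYMBOL_INFO) + MAX_SYM_NAME] = {};
    auto* symbol = reinterpret_cast<SYMBOL_INFO*>(buffer);
    symbol->SizeOfStruct = sizeof(SYMBOL_INFO);
    symbol->MaxNameLen = MAX_SYM_NAME;

    for (void* frame : frames) {
        DWORD64 displacement = 0;
        if (SymFromAddr(process, reinterpret_cast<DWORD64>(frame),
                        &displacement, symbol)) {
            std::printf("%s + 0x%llx\n", symbol->Name,
                        static_cast<unsigned long long>(displacement));
        } else {
            std::printf("%p\n", frame);  // no symbol information available
        }
    }
    SymCleanup(process);
}
#endif  // _WIN32
```

In a detector it is usually cheaper to store only the raw addresses at allocation time and resolve symbols later, when the report is printed.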
Detecting Memory Leaks
At this point, we have collected all the Dragon Balls, and now it's time to officially summon Shenron.
I want to be able to detect memory leaks locally, within a chosen scope (this differs from VLD, which performs global detection and supports multi-threading). So, on top of RealDetector, the class that actually replaces the functions, I added another layer of encapsulation called LeakDetector and exposed its interface to the user. To use it, simply construct a LeakDetector, which replaces the functions and starts detecting memory leaks. When the LeakDetector is destroyed, it restores the original functions, stops the detection, and prints the results.
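The construct/destroy behavior described above is the usual RAII pattern. The following portable sketch shows only that pattern, under heavy simplification: two global function pointers (g_allocSlot, g_freeSlot) stand in for the patched IAT entries, and the class ScopedLeakDetectorSketch and every other name here are hypothetical, not the interface of LeakDetector or RealDetector.

```cpp
#include <cstddef>
#include <cstdlib>
#include <set>

using AllocFn = void* (*)(std::size_t);
using FreeFn = void (*)(void*);

void* plainAlloc(std::size_t n) { return std::malloc(n); }
void plainFree(void* p) { std::free(p); }

// Stand-ins for the two IAT entries that the real detector patches.
AllocFn g_allocSlot = plainAlloc;
FreeFn g_freeSlot = plainFree;

// Blocks allocated through the hook and not yet freed.
std::set<void*> g_liveBlocks;

void* trackedAlloc(std::size_t n)
{
    void* p = std::malloc(n);
    if (p != nullptr)
        g_liveBlocks.insert(p);
    return p;
}

void trackedFree(void* p)
{
    g_liveBlocks.erase(p);
    std::free(p);
}

// Installs the hooks on construction; restores the previous functions and
// reports the number of still-live blocks on destruction.
class ScopedLeakDetectorSketch {
public:
    explicit ScopedLeakDetectorSketch(std::size_t* leakCountOut)
        : leakCountOut_(leakCountOut),
          savedAlloc_(g_allocSlot),
          savedFree_(g_freeSlot)
    {
        g_liveBlocks.clear();
        g_allocSlot = trackedAlloc;
        g_freeSlot = trackedFree;
    }

    ~ScopedLeakDetectorSketch()
    {
        g_allocSlot = savedAlloc_;
        g_freeSlot = savedFree_;
        if (leakCountOut_ != nullptr)
            *leakCountOut_ = g_liveBlocks.size();
        g_liveBlocks.clear();
    }

    ScopedLeakDetectorSketch(const ScopedLeakDetectorSketch&) = delete;
    ScopedLeakDetectorSketch& operator=(const ScopedLeakDetectorSketch&) = delete;

private:
    std::size_t* leakCountOut_;
    AllocFn savedAlloc_;
    FreeFn savedFree_;
};
```

In this toy the std::set allocates through operator new, which does not pass through g_allocSlot, so the bookkeeping does not re-enter the hook. A detector that hooks the real malloc has to guard against exactly that kind of recursion.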
Test LeakDetector with the following code:
#include "LeakDetector.h"
#include <iostream>
using namespace std;
void new_some_mem()
{
    // Both allocations are intentionally never freed.
    char* c = new char[12];
    int* i = new int[4];
}
int main()
{
    // Detection starts here and ends when ld is destroyed at the end of main.
    auto ld = LDTools::LeakDetector("LeakDetectorTest.exe");
    new_some_mem();
    return 0;
}
The code allocates some memory with new and then exits without releasing it. The program prints the following result:
============== LeakDetector::start ===============
LeakDetector init success.
============== LeakDetector::stop ================
Memory Leak Detected: total 2
Num 1:
e:\program\github\leakdetector\leakdetector\realdetector.cpp (109): LeakDetector.dll!LDTools::RealDetector::_malloc() + 0x1c bytes
f:\dd\vctools\crt\vcstartup\src\heap\new_scalar.cpp (19): LeakDetectorTest.exe!operator new() + 0x9 bytes
f:\dd\vctools\crt\vcstartup\src\heap\new_array.cpp (15): LeakDetectorTest.exe!operator new[]() + 0x9 bytes
e:\program\github\leakdetector\leakdetectortest\leakdetectortest.cpp (12): LeakDetectorTest.exe!new_some_mem() + 0x7 bytes
e:\program\github\leakdetector\leakdetectortest\leakdetectortest.cpp (19): LeakDetectorTest.exe!main()
f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl (74): LeakDetectorTest.exe!invoke_main() + 0x1b bytes
f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl (264): LeakDetectorTest.exe!__scrt_common_main_seh() + 0x5 bytes
f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl (309): LeakDetectorTest.exe!__scrt_common_main()
f:\dd\vctools\crt\vcstartup\src\startup\exe_main.cpp (17): LeakDetectorTest.exe!mainCRTStartup()
KERNEL32.DLL!BaseThreadInitThunk() + 0x24 bytes
ntdll.dll!RtlUnicodeStringToInteger() + 0x253 bytes
ntdll.dll!RtlUnicodeStringToInteger() + 0x21e bytes
Num 2:
e:\program\github\leakdetector\leakdetector\realdetector.cpp (109): LeakDetector.dll!LDTools::RealDetector::_malloc() + 0x1c bytes
f:\dd\vctools\crt\vcstartup\src\heap\new_scalar.cpp (19): LeakDetectorTest.exe!operator new() + 0x9 bytes
f:\dd\vctools\crt\vcstartup\src\heap\new_array.cpp (15): LeakDetectorTest.exe!operator new[]() + 0x9 bytes
e:\program\github\leakdetector\leakdetectortest\leakdetectortest.cpp (11): LeakDetectorTest.exe!new_some_mem() + 0x7 bytes
e:\program\github\leakdetector\leakdetectortest\leakdetectortest.cpp (19): LeakDetectorTest.exe!main()
f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl (74): LeakDetectorTest.exe!invoke_main() + 0x1b bytes
f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl (264): LeakDetectorTest.exe!__scrt_common_main_seh() + 0x5 bytes
f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl (309): LeakDetectorTest.exe!__scrt_common_main()
f:\dd\vctools\crt\vcstartup\src\startup\exe_main.cpp (17): LeakDetectorTest.exe!mainCRTStartup()
KERNEL32.DLL!BaseThreadInitThunk() + 0x24 bytes
ntdll.dll!RtlUnicodeStringToInteger() + 0x253 bytes
ntdll.dll!RtlUnicodeStringToInteger() + 0x21e bytes
The program correctly identified the two instances where memory was allocated but not released, and printed out the complete call stack information. The required functionality has been completed up to this point.
Conclusion
If you are not familiar with program linking, loading, and libraries, you may be unsure how functions in shared libraries are found at all, let alone how to replace a library's functions with your own. Here, we took memory leak detection as an example and discussed how to replace functions imported from Windows DLLs. For a more detailed implementation, you can refer to the source code of VLD.
Also, I want to say that "The Self-Cultivation of Programmers: Linking, Loading, and Libraries" is really a good book. Just expressing my heartfelt admiration, not promotional content.
Original: https://wiki.disenone.site/en
This post is licensed under CC BY-NC-SA 4.0; please reproduce it with attribution.
This post was translated using ChatGPT; please send feedback if you find any omissions.