In this chapter, we'll learn how the page.o object file becomes the page.exe executable file.
link.exe is our Linker program on Windows. Although we can use Clang (which has its own Linker module) to create the page.exe file, we'll have more control and get to understand how the Linker works if we use the Linker program directly.
Let's give the page.o file to the Linker and see what it does:
C:\Users\teles\Desktop\C>link page.o
Microsoft (R) Incremental Linker Version 14.31.31104.0
Copyright (C) Microsoft Corporation. All rights reserved.
page.o : error LNK2019: unresolved external symbol __imp__GetSystemInfo@4 referenced in function _main
page.o : error LNK2019: unresolved external symbol ___acrt_iob_func referenced in function _printf
page.o : error LNK2019: unresolved external symbol ___stdio_common_vsprintf referenced in function __vsnprintf_l
page.o : error LNK2019: unresolved external symbol ___stdio_common_vfprintf referenced in function __vfprintf_l
LINK : error LNK2001: unresolved external symbol _mainCRTStartup
page.exe : fatal error LNK1120: 5 unresolved externals
link.exe first goes through the Symbol Table of page.o and checks for undefined external symbols. In the above output, it reports errors because it can't find a definition for those undefined external symbols anywhere.
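The check described above can be sketched in a few lines of C. This is only a toy model and not how link.exe is implemented: the function name count_unresolved and the flat arrays of symbol names are assumptions made for illustration. It counts how many undefined symbols have no matching definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears in the `defined` list, 0 otherwise. */
static int is_defined(const char *name,
                      const char *const *defined, size_t n_defined)
{
    for (size_t i = 0; i < n_defined; i++) {
        if (strcmp(name, defined[i]) == 0)
            return 1;
    }
    return 0;
}

/* Counts the undefined symbols that have no matching definition.
   Hypothetical helper: a real linker works on symbol tables, not on
   plain arrays of strings. */
size_t count_unresolved(const char *const *undefined, size_t n_undefined,
                        const char *const *defined, size_t n_defined)
{
    assert(undefined != NULL || n_undefined == 0);
    size_t unresolved = 0;
    for (size_t i = 0; i < n_undefined; i++) {
        if (!is_defined(undefined[i], defined, n_defined))
            unresolved++;
    }
    return unresolved;
}
```

Passing the five symbol names from the error output together with an empty list of definitions gives 5, the same count that the LNK1120 message reports.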
We need to provide another object file to link.exe where it can find these symbols. But we don't have any other object files, right!? We simply wrote a page.c source file and compiled it down to page.o. But we included all the necessary header files in our source code and that should suffice, right?
In most cases, the header files that we include simply tell the Compiler the function signature (the number of parameters, the parameter types and the return type), and sometimes they also tell the Compiler how to name the symbols in the Assembly source code that it creates. Header files don't actually provide the implementation (the definition) of the functions, just the declaration of those functions.
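To make the difference concrete, here is a small sketch in C. The functions pages_needed and pages_for_two_buffers are made up for this illustration and are not part of any real header. The first line is a declaration, the kind of thing a header file carries, and the body further down is the definition, the kind of thing an object file or a library has to supply.

```c
#include <assert.h>

/* Declaration: what a header file would carry. It gives the name, the
   parameter types and the return type, but no body. */
unsigned pages_needed(unsigned bytes, unsigned page_size);

/* This caller compiles with only the declaration in view. If the
   definition lived in another file, the object file for this code would
   list pages_needed as an undefined external symbol. */
unsigned pages_for_two_buffers(unsigned a, unsigned b, unsigned page_size)
{
    return pages_needed(a, page_size) + pages_needed(b, page_size);
}

/* Definition: the actual code. This is the part that a header does not
   provide and that the Linker has to find somewhere. */
unsigned pages_needed(unsigned bytes, unsigned page_size)
{
    assert(page_size != 0);
    /* Round up to a whole number of pages. */
    return bytes / page_size + (bytes % page_size != 0);
}
```

In this sketch both parts sit in one file so that it builds on its own. With GetSystemInfo we only ever get the declaration from the header files, which is why the Linker has to look elsewhere for the definition.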
Let's take the example of the GetSystemInfo() function that we've used in our page.c source file.
page.c:
#include <stdio.h>
#include <windows.h>

int main(void){
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    printf("The page size for this system is %u bytes.\n", si.dwPageSize);
    return 0;
}
The GetSystemInfo() function is from the Win32 APIs, so it must be declared somewhere in the windows.h header file or in some other header file that windows.h includes.
To find out, we can search for the "GetSystemInfo" string in the Preprocessor output of the page.c file.
There we'll see that the GetSystemInfo function is actually declared in the sysinfoapi.h file, which is included by winbase.h, which in turn is included by windows.h!
And the declaration of the GetSystemInfo function looks like this:
__declspec(dllimport) void __stdcall GetSystemInfo(
LPSYSTEM_INFO lpSystemInfo);
That's it. There is nothing more to find about GetSystemInfo function in header files.
The __declspec(dllimport) prefix is Windows-specific and is just an annotation on the function. Whenever the C Compiler on Windows sees this annotation, it adds the __imp_ prefix to the symbol name of that function to let the Linker know that this function needs to be imported from a DLL file. So now you know how the compiler came up with the name __imp__GetSystemInfo@4 in the Symbol Table. The @4 suffix comes from the __stdcall calling convention and says that the function's parameters take up 4 bytes in total (here, a single 4-byte pointer on 32-bit x86). __imp__GetSystemInfo@4 is a Decorated Name.
You can read about Decorated Names here: https://docs.microsoft.com/en-us/cpp/build/reference/decorated-names
And read about the __declspec(dllimport) annotation here: https://docs.microsoft.com/en-us/cpp/build/importing-function-calls-using-declspec-dllimport
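The naming rule described above can be sketched as a small C function. This is an illustration of the 32-bit x86 __stdcall decoration only, under the assumption that the function takes a fixed list of parameters. The helper name decorate_stdcall is made up for this sketch and is not something the Compiler or the Linker exposes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the 32-bit x86 __stdcall decorated name of a C function:
   "_" + name + "@" + total size of the parameters in bytes.
   When `dllimport` is non-zero, the "__imp_" prefix is put in front.
   Returns the length of the result, or -1 if `out` is too small.
   Hypothetical helper for illustration only. */
int decorate_stdcall(char *out, size_t out_size, const char *name,
                     unsigned param_bytes, int dllimport)
{
    assert(out != NULL && name != NULL && strlen(name) > 0);
    int n = snprintf(out, out_size, "%s_%s@%u",
                     dllimport ? "__imp_" : "", name, param_bytes);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

Calling it with "GetSystemInfo", 4 parameter bytes and dllimport set produces __imp__GetSystemInfo@4, the name we saw in the Symbol Table.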
So, all that the Linker has for the GetSystemInfo function name is the symbol name __imp__GetSystemInfo@4.
The Linker's job is to search for (resolve) that symbol. But as we saw earlier, the Linker generated an error when we gave it only the page.o file. The Linker doesn't know where to find that symbol. We need to give it a few more hints to help it resolve those unresolved symbols.
What file can we provide to the Linker to find our unresolved symbols?
Well, we can create another object file of our own that implements those unresolved symbols and give it to the Linker, or we can pick from the ready-made object files that Microsoft ships, such as those of the Microsoft C Runtime Library (CRT) (glibc plays a similar role on Linux)! We certainly don't want to implement our own printf or GetSystemInfo functions, so Microsoft's libraries are the way to go. But where do we find these standard object files from Microsoft?
Introducing, Static Libraries!
Static Libraries are a collection of object files. A Static Library is a file that literally contains an array of object files inside it. And we can use a special tool called lib.exe to create these Static Libraries or explore the contents of existing Static Libraries (and on Linux, we have a tool called ar which does the same job as lib.exe). And on Windows, Static Libraries have .lib extension (on Linux, they have .a extension which stands for archive).
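One easy way to recognise such a file is by its first bytes. Both .lib files and .a archives begin with the 8-byte signature "!<arch>" followed by a newline character. The sketch below checks a buffer for that signature. The function name looks_like_archive is made up for this example, and reading the bytes from the file is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARCHIVE_MAGIC     "!<arch>\n"
#define ARCHIVE_MAGIC_LEN 8

/* Returns 1 if the buffer starts with the archive signature used by
   .lib and .a files, 0 otherwise. Hypothetical helper: it only looks at
   the signature and does not validate the rest of the file. */
int looks_like_archive(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (len < ARCHIVE_MAGIC_LEN)
        return 0;
    return memcmp(data, ARCHIVE_MAGIC, ARCHIVE_MAGIC_LEN) == 0;
}
```

To try it on a real library, read the first 8 bytes of the file with fread and pass them to this function.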
Our link.exe program is a Static Linker. The job of this program is to prepare all headers, search for symbols and grab code from Static Libraries for unresolved symbols to create the final executable. This process is called Static Linking. Its counterpart is Dynamic Linking, where the Linking process happens when the executable actually starts running. Dynamic Linking is carried out by the Dynamic Linker, which is part of the Loader (Operating System). The libraries that the Dynamic Linker links to (at runtime) are called Dynamic Libraries or Shared Libraries (like .dll or .so files). Dynamic Libraries have a completely different file structure compared to Static Libraries. But our discussion here is about Static Libraries.
Static Libraries are essential if we want to build a functioning executable file. Even if we want to dynamically link to a DLL file, we need a special type of Static Library called an Import Library for that. Let's take the examples of COFF Standard Libraries and COFF Import Libraries to understand this a bit more!
Both COFF Standard Libraries and COFF Import Libraries have the .lib extension and a similar file structure. Both are considered Static Libraries because they are used by the Static Linker, but they serve different purposes during Linking.
In the previous chapter, if you remember, we had a Section in our page.o object file called .drectve. We had described it as "Compiler adds some Linker metadata in this Section". Let's see what Linker metadata is present in that .drectve Section.
dumpbin /section:.drectve /rawdata page.o
Output:
Microsoft (R) COFF/PE Dumper Version 14.31.31104.0
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file page.o
File Type: COFF OBJECT
SECTION HEADER #E
.drectve name
0 physical address
0 virtual address
2A size of raw data
5EA file pointer to raw data (000005EA to 00000613)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
100A00 flags
Info
Remove
1 byte align
RAW DATA #E
00000000: 20 2F 44 45 46 41 55 4C 54 4C 49 42 3A 75 75 69 /DEFAULTLIB:uui
00000010: 64 2E 6C 69 62 20 2F 44 45 46 41 55 4C 54 4C 49 d.lib /DEFAULTLI
00000020: 42 3A 75 75 69 64 2E 6C 69 62 B:uuid.lib
Summary
2A .drectve
We can see that there is "/DEFAULTLIB:uuid.lib" string in that Section. "/DEFAULTLIB:uuid.lib" is a command-line option to the link.exe program. More here about the /defaultlib option: https://docs.microsoft.com/en-us/cpp/build/reference/defaultlib-specify-default-library
"/DEFAULTLIB:uuid.lib" asks link.exe to use uuid.lib as a default Library.
If we use the /verbose option in link.exe, we can see exactly what it's doing under the hood. So let's pass only the page.o file to the Linker again, but with the /verbose command-line option this time:
link /verbose page.o
Output:
Microsoft (R) Incremental Linker Version 14.31.31104.0
Copyright (C) Microsoft Corporation. All rights reserved.
Starting pass 1
Processed /DEFAULTLIB:uuid.lib
Searching libraries
Searching C:\Program Files (x86)\Windows Kits\10\\lib\10.0.18362.0\\um\x86\uuid.lib:
Finished searching libraries
Searching libraries
Searching C:\Program Files (x86)\Windows Kits\10\\lib\10.0.18362.0\\um\x86\uuid.lib:
Finished searching libraries
Searching libraries
Searching C:\Program Files (x86)\Windows Kits\10\\lib\10.0.18362.0\\um\x86\uuid.lib:
Finished searching libraries
Finished pass 1
page.o : error LNK2019: unresolved external symbol __imp__GetSystemInfo@4 referenced in function _main
page.o : error LNK2019: unresolved external symbol ___acrt_iob_func referenced in function _printf
page.o : error LNK2019: unresolved external symbol ___stdio_common_vsprintf referenced in function __vsnprintf_l
page.o : error LNK2019: unresolved external symbol ___stdio_common_vfprintf referenced in function __vfprintf_l
LINK : error LNK2001: unresolved external symbol _mainCRTStartup
Unused libraries:
C:\Program Files (x86)\Windows Kits\10\\lib\10.0.18362.0\\um\x86\uuid.lib
page.exe : fatal error LNK1120: 5 unresolved externals
We see that link.exe is actually searching the uuid.lib file even though we haven't mentioned anything about it. It's doing so because it's picking up this Library name from the .drectve Section of the page.o file! But unfortunately, our unresolved symbols aren't defined in the uuid.lib file, so we still get the same errors that we saw previously. This time, though, we know which Libraries link.exe searches by default (the default library is the one mentioned in the .drectve Section of the object file).
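The contents of the .drectve Section are plain text, so picking the Library names out of them can be sketched with a few string functions. The sketch below is a simplification and is not how link.exe parses directives: it assumes upper-case /DEFAULTLIB: options separated by spaces and no quoted names, and the function name nth_defaultlib is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the library name of the index-th "/DEFAULTLIB:" option found in
   `directives` into `out`. Returns 1 on success, 0 if there is no such
   option or if `out` is too small.
   Assumption: options are upper-case, space-separated and unquoted. */
int nth_defaultlib(const char *directives, size_t index,
                   char *out, size_t out_size)
{
    static const char key[] = "/DEFAULTLIB:";
    const size_t key_len = sizeof key - 1;
    assert(directives != NULL && out != NULL && out_size > 0);

    const char *p = directives;
    while ((p = strstr(p, key)) != NULL) {
        p += key_len;                    /* skip past "/DEFAULTLIB:" */
        size_t len = strcspn(p, " ");    /* name ends at a space or at the end */
        if (index == 0) {
            if (len >= out_size)
                return 0;
            memcpy(out, p, len);
            out[len] = '\0';
            return 1;
        }
        index--;
        p += len;
    }
    return 0;
}
```

Given the raw data that dumpbin printed for page.o, indexes 0 and 1 both yield uuid.lib, and index 2 yields nothing because the Section holds only two options.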
As we mentioned earlier, lib.exe program can be used to explore these .lib files. And again, .lib files just contain a collection of object files inside them.
To see what object files uuid.lib Library file contains, we can use the following command:
lib /list uuid.lib
Output:
Microsoft (R) Library Manager Version 14.31.31104.0
Copyright (C) Microsoft Corporation. All rights reserved.
d:\os\obj\x86fre\onecore\admin\wsc\src\client\ui\wscapi\objfre\i386\iwscapi_i.obj
d:\os\obj\x86fre\onecore\base\diagnosis\pdi\pla\src\uuid\objfre\i386\interface.obj
d:\os\obj\x86fre\onecore\base\cluster\admin\uuid\objfre\i386\cluadmex_i.obj
d:\os\obj\x86fre\onecore\base\diagnosis\pdi\scripteddiag\engine\uuid\objfre\i386\interface.obj
d:\os\obj\x86fre\onecore\base\pnp\hotplug\uuid\objfre\i386\comhotplug_i.obj
...
...
We can even extract a copy of one of these object files like this:
lib /extract:d:\os\obj\x86fre\onecore\admin\wsc\src\client\ui\wscapi\objfre\i386\iwscapi_i.obj /out:iwscapi_i.obj uuid.lib
And by the way, the lib.exe and link.exe tools search for these libraries in the directories listed in the LIB environment variable.
You can view this list from the Developer Command Prompt like this:
echo %LIB%
For me the output was:
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.31.31103\lib\x86;C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x86;C:\Program Files (x86)\Windows Kits\10\\lib\10.0.18362.0\\um\x86
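The value of LIB is a list of directories separated by semicolons. A rough sketch of how a tool might turn that list into candidate file paths is shown below. It is an assumption-laden illustration and not the actual lookup code of link.exe or lib.exe, and the function name nth_candidate_path is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "<directory>\<file>" for the index-th directory of a
   semicolon-separated list such as the LIB variable.
   Returns 1 on success, 0 if there is no such directory or if `out` is
   too small. Hypothetical helper for illustration only. */
int nth_candidate_path(const char *lib_var, size_t index, const char *file,
                       char *out, size_t out_size)
{
    assert(lib_var != NULL && file != NULL && out != NULL);
    const char *p = lib_var;
    for (;;) {
        size_t len = strcspn(p, ";");   /* length of the current directory */
        if (index == 0) {
            if (len == 0)
                return 0;
            int n = snprintf(out, out_size, "%.*s\\%s", (int)len, p, file);
            return n >= 0 && (size_t)n < out_size;
        }
        if (p[len] == '\0')             /* ran out of directories */
            return 0;
        p += len + 1;                   /* skip the directory and the ';' */
        index--;
    }
}
```

A caller could walk the indexes from 0 upwards and try to open each candidate path until one of them exists.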
As we know now, we need to provide a library file to link.exe so that it can search that library and resolve our unresolved symbols.
And to our convenience, we have a bunch of .lib files to choose from!
Here is a list of Library files! This list includes both the Standard Libraries and the Import Libraries: https://docs.microsoft.com/en-us/cpp/c-runtime-library/crt-library-features
And the way our page.exe file is built (by the Linker) depends on what Library files we choose at this stage.
Let's take libcmt.lib and msvcrt.lib library files for example.
The above Microsoft docs description for the libcmt.lib file says that the libcmt.lib Library "statically links the native CRT startup into your code".
"Statically links" means the Linker will literally copy the relevant code from the object files present inside that Library and put it into our final executable (page.exe). The opposite of this would be the scenario where the Linker only stores a reference to a DLL file in the final executable and does not put the complete function implementation code into the executable. The import library helps the Linker do this.
As a result, a statically linked executable file is typically larger than an executable built with Import libraries.
Let's see what difference it makes when we build page.exe with libcmt.lib and with msvcrt.lib:
link /verbose libcmt.lib page.o
The above command creates our page.exe file! No "unresolved symbol" errors!
The output logs will be as shown below:
Microsoft (R) Incremental Linker Version 14.31.31104.0
Copyright (C) Microsoft Corporation. All rights reserved.
Starting pass 1
Processed /DEFAULTLIB:uuid.lib
Searching libraries
Searching C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.31.31103\lib\x86\libcmt.lib:
Found _mainCRTStartup
Loaded libcmt.lib(exe_main.obj)
Found _atexit
Referenced in libcmt.lib(exe_main.obj)
Loaded libcmt.lib(utility.obj)
Found ___security_init_cookie
Referenced in libcmt.lib(exe_main.obj)
Loaded libcmt.lib(gs_support.obj)
...
...
...
In the Linker logs, we can also find information about where the Linker found our unresolved symbols.
(View the full logs of the above command here!)
In the logs, we can see that linking against libcmt.lib also leads the Linker to search libucrt.lib, which has all the code related to printf(), and hence the Linker statically links that code into our page.exe file. The following log snippet shows exactly where the Linker found the printf() code implementation:
Searching C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x86\libucrt.lib:
Found ___acrt_iob_func
Referenced in page.o
Loaded libucrt.lib(_file.obj)
Found ___stdio_common_vsprintf
Referenced in page.o
Referenced in libvcruntime.lib(undname.obj)
Loaded libucrt.lib(output.obj)
But for GetSystemInfo(), the Linker finds the symbol in kernel32.lib, which points it to kernel32.dll. The following logs suggest that:
Searching C:\Program Files (x86)\Windows Kits\10\\lib\10.0.18362.0\\um\x86\kernel32.lib:
Found __imp__GetSystemInfo@4
Referenced in page.o
Loaded kernel32.lib(KERNEL32.dll)
GetSystemInfo() will be dynamically linked: page.exe only holds a reference into kernel32.dll, the Dynamic Library file that has the implementation of that function.
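One way to picture this is to treat the __imp_ symbol as a pointer slot that only receives the real address of the function when the program is loaded. The sketch below simulates that idea in plain C. Everything in it is a stand-in made up for illustration: dll_get_page_size plays the role of code inside a DLL, imp_get_page_size plays the role of the __imp_ slot, and fake_loader_bind plays the role of the Loader.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned (*page_size_fn)(void);

/* Stand-in for a function whose code lives inside a DLL. */
static unsigned dll_get_page_size(void)
{
    return 4096u;
}

/* Stand-in for the __imp_ slot: it does not hold the function's address
   until the program is loaded. */
static page_size_fn imp_get_page_size = NULL;

/* Stand-in for the Loader storing the real address in the slot. */
void fake_loader_bind(void)
{
    imp_get_page_size = dll_get_page_size;
}

/* The call site does not call the function directly; it calls through
   the pointer stored in the slot. */
unsigned call_imported(void)
{
    assert(imp_get_page_size != NULL);  /* the slot must be bound first */
    return imp_get_page_size();
}
```

This is only a mental model. In a real executable, the slot and the steps that fill it are part of the file format and the Loader, not of our C code.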
Let's check the size of the page.exe file using dir page.exe: it's around 99.8 kB. The Section sizes can be viewed using dumpbin page.exe, which gives the following output:
Microsoft (R) COFF/PE Dumper Version 14.31.31104.0
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file page.exe
File Type: EXECUTABLE IMAGE
Summary
2000 .data
7000 .rdata
1000 .reloc
11000 .text
Let's see what happens when we use msvcrt.lib Library file!
link /verbose msvcrt.lib page.o
View the complete logs of the above command here!
dir page.exe says that page.exe has a total size of 8.5 kB (as compared to 99.8 kB when built with the libcmt.lib Library)! And dumpbin page.exe gives the following output:
Microsoft (R) COFF/PE Dumper Version 14.31.31104.0
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file page.exe
File Type: EXECUTABLE IMAGE
Summary
1000 .data
1000 .rdata
1000 .reloc
1000 .text
You can see that the Section sizes and the overall file size have drastically reduced when using msvcrt.lib file as our Library file for building page.exe.
That's because msvcrt.lib gives references to DLL files for many of our unresolved symbols and doesn't provide the actual code implementation.
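To put a number on the difference, we can add up the Section sizes from the two dumpbin summaries. dumpbin prints these sizes in hexadecimal, so the sketch below parses them with strtoul. The function name sum_hex_sizes is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Adds up Section sizes given as hexadecimal strings, the way dumpbin
   prints them in its Summary. Hypothetical helper for illustration. */
unsigned long sum_hex_sizes(const char *const *sizes, size_t count)
{
    assert(sizes != NULL || count == 0);
    unsigned long total = 0;
    for (size_t i = 0; i < count; i++)
        total += strtoul(sizes[i], NULL, 16);
    return total;
}
```

For the libcmt.lib build (2000, 7000, 1000 and 11000), the total is 0x1B000, which is 110592 bytes. For the msvcrt.lib build (four Sections of 1000 each), it is 0x4000, which is 16384 bytes. These totals differ from the file sizes that dir reports, so treat them as a rough comparison only.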
printf() is linked to the following DLL files (as opposed to the previous example, where the printf() code was found in an object file):
Searching C:\Program Files (x86)\Windows Kits\10\lib\10.0.18362.0\ucrt\x86\ucrt.lib:
Found ___acrt_iob_func
Referenced in page.o
Loaded ucrt.lib(api-ms-win-crt-stdio-l1-1-0.dll)
Found ___stdio_common_vsprintf
Referenced in page.o
Loaded ucrt.lib(api-ms-win-crt-stdio-l1-1-0.dll)
GetSystemInfo() is of course linked to kernel32.dll file as done previously:
Searching C:\Program Files (x86)\Windows Kits\10\\lib\10.0.18362.0\\um\x86\kernel32.lib:
Found __imp__GetSystemInfo@4
Referenced in page.o
Loaded kernel32.lib(KERNEL32.dll)
But it's important to note that msvcrt.lib is not an Import Library, because it doesn't redirect the Linker to DLL files directly. It contains objects that redirect the Linker to Import Libraries, and those Import Libraries then redirect the Linker to DLL files. An example of an Import Library is the kernel32.lib file shown in the above log snippet. The whole purpose of the kernel32.lib file is to redirect the Linker to kernel32.dll. Import Libraries contain something called pseudo-object files, which we'll explore in the next chapter!
Now we know the subtle difference between Standard Libraries (containing objects) and Import Libraries (containing pseudo-objects that directly reference DLL files).
dumpbin also provides a lot of interesting information on Library files. For example, we can view the Symbol Table of each of the object files included in msvcrt.lib file:
dumpbin /symbols msvcrt.lib
The truncated output looks like:
...
COFF SYMBOL TABLE
000 00FD7862 ABS notype Static | @comp.id
001 00000011 ABS notype Static | @feat.00
002 00000000 UNDEF notype External | __vsnprintf
003 00000000 UNDEF notype WeakExternal | _vsnprintf
Default index 2 Alias record
String Table Size = 0x1B bytes
COFF SYMBOL TABLE
000 00FD7862 ABS notype Static | @comp.id
001 00000011 ABS notype Static | @feat.00
002 00000000 UNDEF notype External | __vsnprintf_s
003 00000000 UNDEF notype WeakExternal | _vsnprintf_s
Default index 2 Alias record
String Table Size = 0x1F bytes
COFF SYMBOL TABLE
000 00FD7862 ABS notype Static | @comp.id
001 00000011 ABS notype Static | @feat.00
002 00000000 UNDEF notype External | __mbslen
003 00000000 UNDEF notype WeakExternal | __tcsclen
Default index 2 Alias record
String Table Size = 0xE bytes
...
The word "Static" in the above Symbol Table output has a different meaning from the one used in this chapter. That "Static" in the Symbol Table is a storage class and has nothing to do with the "Static Linker". This was explained in the previous chapter when discussing the Symbol Table. For more on Storage Class, see: https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#storage-class
One distinguishing difference between a .lib standard library and an import library is the output of the lib /list <file> command. For a standard library, it lists a lot of object files, but for an import library, there will be a few object files and the rest will be DLL file names.
For example, vcruntime.lib is a DLL import library. The output of lib /list vcruntime.lib is:
(output truncated)
Microsoft (R) Library Manager Version 14.31.31104.0
Copyright (C) Microsoft Corporation. All rights reserved.
d:\a01\_work\38\s\Intermediate\vctools\vcruntime.nativeproj__1422463853\objr\x86\softmemtag.obj
d:\a01\_work\38\s\Intermediate\vctools\vcruntime.nativeproj__1422463853\objr\x86\chandler3_noexcept.obj
VCRUNTIME140.dll
VCRUNTIME140.dll
VCRUNTIME140.dll
VCRUNTIME140.dll
VCRUNTIME140.dll
VCRUNTIME140.dll
VCRUNTIME140.dll
VCRUNTIME140.dll
VCRUNTIME140.dll
VCRUNTIME140.dll
VCRUNTIME140.dll
...
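The distinction described above can be turned into a rough heuristic: count how many of the member names printed by lib /list end in .dll. The sketch below does that for a list of names. It is only a heuristic built on the observation in this chapter, not an official way to classify a Library, and the function name count_dll_members is made up.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` ends in ".dll" (ignoring case), 0 otherwise. */
static int has_dll_suffix(const char *name)
{
    static const char suffix[] = ".dll";
    size_t n = strlen(name);
    if (n < 4)
        return 0;
    const char *tail = name + n - 4;
    for (size_t i = 0; i < 4; i++) {
        if (tolower((unsigned char)tail[i]) != suffix[i])
            return 0;
    }
    return 1;
}

/* Counts the member names that look like DLL file names. A count above
   zero hints that the Library is an Import Library.
   Hypothetical helper: the names are assumed to come from "lib /list". */
size_t count_dll_members(const char *const *members, size_t count)
{
    assert(members != NULL || count == 0);
    size_t dlls = 0;
    for (size_t i = 0; i < count; i++) {
        if (has_dll_suffix(members[i]))
            dlls++;
    }
    return dlls;
}
```

For the vcruntime.lib listing the count is above zero, while for the uuid.lib members shown earlier in this chapter it is zero.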
In this chapter, we've learnt how to use link.exe to link page.o with other object files or DLL files using Standard and Import Libraries and how to use lib.exe to explore the content of these Libraries!
In the next chapter, we'll learn how exactly the Linker works and how it builds the page.exe file. We'll also learn about the internal structure of page.exe and how it's built!