Windows Process Memory Usage Demystified

“How much memory is your process using?” I bet you have been asked that question, or have asked it yourself, more times than you can remember. But what do you really mean by memory?

I never thought it would be hard to find a definitive resource on what the various memory usage counters mean for a Windows process. But try it: Google “windows task manager memory columns,” and you’ll find confusing, conflicting, inconsistent, and unclear explanations of what the different metrics represent. If we can’t even agree on what “working set” or “commit size” means, how can we ever monitor our Windows applications successfully?

First, we need a sample application that allocates various kinds of memory for our experiments. I’ve written one for this blog post: it is simply called Memory, and you can find it on GitHub. It currently supports multiple kinds of allocations: reserve, commit, shareable memory, and more.

To monitor application memory usage, we will use Sysinternals VMMap, a long-time favorite on my blog. It offers unparalleled insight into what your application is doing in terms of memory. Simply choose a process when launching VMMap, and view memory utilization categorized by type (private, shared, reserved, committed) and purpose (image, heap, stack, mapped file). You can also run it from the command line, for example to save a snapshot of a process’s memory to a CSV file:

vmmap.exe -p myapp output.csv


Armed with these tools, let’s get down to business and characterize the various kinds of memory usage in Windows processes. We must begin with the virtual memory size of the process: the amount of address space that is in use.

Virtual Memory

Windows applications do not access physical memory directly. Any address in your application is a virtual address that the CPU translates to a physical address when it is accessed. Although there is often more virtual memory available than RAM to back it, virtual memory is still limited. On 32-bit Windows with the default configuration, each process can allocate up to 2GB of virtual memory. On 64-bit Windows, each 64-bit process can allocate up to 128TB of virtual memory (before Windows 8.1, this limit was 8TB).

Each page of virtual memory can be in one of three states: free, reserved, or committed.

Free pages are available for subsequent allocations (excluding unusable pages, discussed later).

Reserved pages are not available for subsequent allocations, but they are not backed by physical memory. In other words, you may not access reserved pages, and you may not assume that the system will at some point have sufficient physical memory to back them. For example, try running memory.exe reserve 10000000 to reserve approximately 10TB of virtual memory. This should work just fine (on a 64-bit system, of course), although you probably don’t have enough physical memory to back 10TB of virtual addresses.

Committed pages may be accessed by your application. The system guarantees that when you access a committed page, there will be physical memory to back it. The physical memory is allocated on demand, when you first access the page. Even though the system doesn’t allocate physical memory immediately, this guarantee implies a system-wide limit on how much memory all processes can commit together. This limit is called the commit limit, and it is roughly the size of RAM plus the size of the page files. If an allocation would exceed the commit limit, the system does not satisfy it. Go ahead and try it: memory.exe commit 10000000.
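To make the distinction between committing and touching concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not how the Windows memory manager is implemented: the class name ToyCommitModel is hypothetical, every page is 4KB, committing only charges against a fixed commit limit, and a page becomes resident the first time it is touched.

```python
PAGE_SIZE = 4096  # small page size on x86/x64


class ToyCommitModel:
    """Hypothetical model: commit charges against a limit, RAM is assigned on first touch."""

    def __init__(self, commit_limit):
        self.commit_limit = commit_limit  # bytes; roughly RAM + page files on a real system
        self.commit_charge = 0            # bytes promised to committed pages
        self.committed = set()            # committed page numbers
        self.resident = set()             # page numbers that have a physical page

    def commit(self, first_page, count):
        # Re-committing an already committed page charges nothing extra.
        new_pages = set(range(first_page, first_page + count)) - self.committed
        needed = len(new_pages) * PAGE_SIZE
        if self.commit_charge + needed > self.commit_limit:
            # Like a failed commit: nothing is charged and nothing is committed.
            raise MemoryError("commit limit exceeded")
        self.commit_charge += needed
        self.committed |= new_pages

    def touch(self, page):
        if page not in self.committed:
            raise PermissionError(f"access violation at page {page}")
        self.resident.add(page)  # physical memory is allocated lazily, on first access

    @property
    def resident_bytes(self):
        return len(self.resident) * PAGE_SIZE


model = ToyCommitModel(commit_limit=1024 * PAGE_SIZE)
model.commit(0, 256)                              # 1MB committed, nothing resident yet
print(model.commit_charge, model.resident_bytes)  # 1048576 0
for page in range(256):
    model.touch(page)                             # first access assigns physical pages
print(model.commit_charge, model.resident_bytes)  # 1048576 1048576
```

A commit that would push the charge past the limit raises MemoryError and leaves the charge unchanged, which mirrors the failed memory.exe commit 10000000 allocation above.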

To further complicate things, committed memory can be shared with other processes. If two processes share 100MB of physical memory, the 100MB virtual region is committed in both processes, but it counts only once toward the commit limit.
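The sharing rule can be written as a short calculation. In this sketch, system_commit_charge is a hypothetical helper that assumes each process is described by its private commit plus the names of the shared regions it maps. A shared region is counted once, no matter how many processes map it:

```python
MB = 2**20


def system_commit_charge(processes, shared_regions):
    """Hypothetical helper: total commit charge across processes.

    processes      -- list of dicts with "private" (bytes) and "maps" (set of region names)
    shared_regions -- dict mapping a shared region name to its committed size in bytes
    """
    private_total = sum(p["private"] for p in processes)
    # Each shared region is charged once, regardless of how many processes map it.
    mapped = set().union(*(p["maps"] for p in processes))
    return private_total + sum(shared_regions[name] for name in mapped)


processes = [
    {"private": 50 * MB, "maps": {"shared"}},
    {"private": 50 * MB, "maps": {"shared"}},
]
print(system_commit_charge(processes, {"shared": 100 * MB}) // MB)  # 200, not 300
```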

It makes sense to examine the following aspects of a process’s virtual memory usage:

Committed bytes. This information is available in VMMap under Total > Committed and as the Process > Page File Bytes performance counter.

Reserved bytes. This information is available in VMMap as the difference between Total > Size and Total > Committed; in other words, it is the difference between non-free bytes and committed bytes.

Non-free bytes. This information is available in VMMap under Total > Size, or as the Process > Virtual Bytes performance counter.

Free bytes. This information is available in VMMap under Free > Size. It can also be deduced from the size of the virtual address space (2GB, 3GB, 4GB, 8TB, or 128TB, depending on the system configuration) and the non-free bytes value.
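The two derived values are simple arithmetic. Here is a minimal sketch, assuming you have already read committed and non-free bytes (for example from VMMap or the performance counters) and know the size of the address space. derive_virtual_metrics is a hypothetical helper:

```python
GB = 2**30
TB = 2**40


def derive_virtual_metrics(committed, non_free, address_space):
    """Derive reserved and free bytes from the data points listed above.

    committed     -- Total > Committed in VMMap
    non_free      -- Total > Size in VMMap (Process > Virtual Bytes)
    address_space -- size of the usable address space, e.g. 2 * GB or 128 * TB
    """
    if not 0 <= committed <= non_free <= address_space:
        raise ValueError("expected 0 <= committed <= non_free <= address_space")
    return {
        "reserved": non_free - committed,
        # Still includes any regions VMMap reports as unusable.
        "free": address_space - non_free,
    }


print(derive_virtual_metrics(committed=1 * GB, non_free=3 * GB, address_space=128 * TB))
```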

This tells almost the whole story. Here’s a statement that, at this point, might sound fairly accurate:

The free bytes value is exactly the amount of virtual memory that is available for subsequent allocations.

Unfortunately, it is not entirely accurate. The Windows memory manager guarantees (for historical reasons) that new allocations are aligned on a 64KB boundary. Therefore, if your allocation sizes are not all multiples of 64KB, the remainder of the last 64KB block of each allocation is lost for future allocations. VMMap calls these regions unusable, and it is the only tool that can reliably display them. To experiment, run memory.exe unusable 100. VMMap will report around 100MB of unusable virtual memory, which is theoretically free and invisible to any other tool. However, that memory cannot be used to satisfy future allocations, so it is as good as dead.
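Here is a back-of-the-envelope sketch of how much address space becomes unusable. It assumes each allocation is a separate VirtualAlloc-style region that starts on a 64KB boundary and occupies whole 4KB pages. The helper names are hypothetical, and the sketch illustrates the alignment rule rather than reproducing what memory.exe does:

```python
PAGE_SIZE = 4 * 1024                # small page on x86/x64
ALLOCATION_GRANULARITY = 64 * 1024  # new regions start on a 64KB boundary


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def unusable_bytes(allocation_size, allocation_count=1):
    """Hypothetical estimate of address space stranded by 64KB alignment.

    Each region occupies whole pages. The rest of its last 64KB block is free,
    but no new allocation can start there, so VMMap would show it as unusable.
    """
    used = align_up(allocation_size, PAGE_SIZE)
    claimed = align_up(allocation_size, ALLOCATION_GRANULARITY)
    return (claimed - used) * allocation_count


print(unusable_bytes(4 * 1024) // 1024)               # 60 (KB lost per 4KB region)
print(round(unusable_bytes(4 * 1024, 1707) / 2**20))  # 100 (MB lost by 1,707 such regions)
```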

Shareable Memory

As I noted earlier, physical memory can be shared across multiple processes: more than one process may have a virtual page mapped to the same physical page. Some of these shared pages are not under your direct control (for example, DLL code is shared across processes); other shared pages can be allocated directly by your code. Understanding shared memory usage matters because a page of shared memory might be mistakenly attributed to every process that shares it. Although the page definitely occupies a range of virtual addresses in each process, it is not duplicated in physical memory.

There’s also a matter of terminology to clarify here. All shared pages are committed, but not all committed pages can be shared. A shareable page must be allocated in advance as part of a section object, which is the kernel abstraction for memory-mapped files and for sharing memory pages across processes. So, to be precise, we can speak of two kinds of shareable memory:

Shareable memory that is shared: memory pages that are currently mapped into the virtual address space of at least two processes.

Shareable memory that is not currently shared: memory pages that may be shared in the future, but are currently mapped into the virtual address space of fewer than two processes.

Note: the terms “private” and “shared” (or “shareable”) memory refer only to committed memory. Reserved pages cannot be shared, so it makes no sense to ask whether they are private or shareable.

It makes sense to look at the following per-process data points to understand which part of a process’s virtual memory is shared (or shareable) with other processes:

Private bytes (memory that is neither shared nor shareable with other processes). This information is available in VMMap under Total > Private. It is also available as the Process > Private Bytes performance counter. Note that some of this committed memory may be backed by the page file and not currently resident in physical memory.

Shareable bytes. This information is available in VMMap under Shareable > Committed. You can’t tell which of these bytes are actually shared with other processes, unless you settle for the following two data points:

Shareable bytes currently resident. This information is available in VMMap under Total > Shareable WS, but it only includes pages that are resident in physical memory. It doesn’t include potentially shareable pages that happen to be paged out to disk, or that haven’t been accessed since being committed.

Shared bytes currently resident. This information is available in VMMap under Total > Shared WS, but again it only includes pages that are resident in physical memory.

Also note that VMMap’s Shareable category doesn’t include certain kinds of shareable memory, such as images (DLLs). These are represented separately by the Image category.

Try it out: run memory.exe shareable_touch 100. You’ll see private bytes unchanged and shareable bytes go up, even though the allocated memory isn’t currently shared with any other process. Shared bytes, on the other hand, should remain the same. You can also try memory.exe shareable 100: this time the Shareable WS and Shared WS values stay unchanged, because physical memory is not allocated unless the committed memory is also accessed.
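The data points above can be modeled per page. This is a hypothetical sketch, not VMMap’s algorithm. Each committed page records whether it belongs to a section (shareable), whether it is resident, and how many processes map it. The two lists at the bottom approximate what memory.exe shareable 100 and memory.exe shareable_touch 100 leave behind, assuming their argument is in MB:

```python
from dataclasses import dataclass

PAGE_SIZE = 4 * 1024
MB = 2**20


@dataclass
class CommittedPage:
    shareable: bool         # allocated as part of a section object
    resident: bool          # currently in this process's working set
    mapping_count: int = 1  # number of processes mapping the page


def sharing_metrics(pages):
    """Hypothetical per-process model of the data points listed above."""
    def total(predicate):
        return sum(PAGE_SIZE for page in pages if predicate(page))

    return {
        "private": total(lambda p: not p.shareable),
        "shareable": total(lambda p: p.shareable),
        "shareable_ws": total(lambda p: p.shareable and p.resident),
        "shared_ws": total(lambda p: p.shareable and p.resident and p.mapping_count >= 2),
    }


pages_per_100mb = 100 * MB // PAGE_SIZE  # 25,600 pages
# Approximates "shareable 100": committed in a section, never touched, one mapping.
untouched = [CommittedPage(shareable=True, resident=False) for _ in range(pages_per_100mb)]
# Approximates "shareable_touch 100": the same section, but every page was touched.
touched = [CommittedPage(shareable=True, resident=True) for _ in range(pages_per_100mb)]

print(sharing_metrics(untouched))
print(sharing_metrics(touched))
```

In both cases shareable bytes grow by 100MB and shared bytes stay at zero, because only one process maps the section. Only the touched variant shows up in the Shareable WS value.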

Physical Memory

So far, we have only discussed the state of virtual memory pages. Indeed, free, unusable, and reserved pages have no effect on the amount of physical memory used by the system (other than the data structures that must track reserved memory regions). But committed memory may consume physical memory, too. Windows tracks physical memory on a system-wide basis, but it also maintains per-process information about each process’s individual physical memory usage: the set of its committed virtual pages that are currently resident in physical memory, known as the working set.

Windows manages physical memory in a set of lists: active, standby, modified, free, and zero, to name a few. These lists are global to all processes on the system. They can be very important from a monitoring standpoint, but I’ll leave them for another time. If you’re really curious, there’s a great Sysinternals tool called RAMMap that you can explore.

We need to add the following data points related to process physical memory to our monitoring toolbox:

Private physical bytes. These are the physical pages mapped to private committed pages in our process, often called the process’s private working set. This information is available in VMMap under Total > Private WS. It is also available in Task Manager as Memory (private working set).

Shareable or shared physical bytes. Similarly, these are the physical pages mapped to shareable committed pages in our process. We discussed these metrics earlier when talking about shareable and shared memory (in VMMap, they are under Total > Shareable WS and Total > Shared WS).

Total physical bytes. Simply the sum of private and shareable physical bytes. You might be tempted to say that this is the amount of physical memory consumed by our process, which would be accurate if it weren’t for sharing. This information is available in VMMap under Total > Total WS, as the Process > Working Set performance counter, and in Task Manager as Working set (memory).

Committed bytes not yet mapped to any backing storage (RAM or page file). As I said before, Windows doesn’t allocate any physical memory when you commit a page of virtual memory. Only when the virtual page is first accessed does Windows handle the resulting hardware exception by lazily allocating a page of physical memory. So, you could have committed pages in your process that are backed by neither RAM nor the page file, simply because they have never been accessed. Unfortunately, I know of no easy way to get this information.

You can experiment with on-demand physical memory allocation by running memory.exe commit 1000. Even though the system-wide commit size is charged 1000MB, you won’t see any change in physical memory usage (e.g. in Task Manager). Now try memory.exe commit_touch 1000, which commits memory and makes sure to touch every single page. This time, both the commit size and physical memory usage should go up by 1000MB.

Committed bytes not currently resident. These are pages of committed memory that were paged out to disk. If you’re willing to ignore committed pages that haven’t been accessed yet, this metric can be calculated as the difference between VMMap’s Total > Committed and Total > Total WS values (or as the difference between the Process > Page File Bytes and Process > Working Set performance counters; recall that Process > Page File Bytes is really the commit size of the process).
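The physical-memory arithmetic above fits in a few lines. Here is a minimal sketch that takes VMMap-style totals in bytes. The helper name is hypothetical, and the numbers in the example calls are made up. Note that the last value lumps together paged-out pages and committed pages that were never touched, which is exactly the caveat mentioned above:

```python
MB = 2**20


def physical_metrics(committed, private_ws, shareable_ws):
    """Hypothetical helper combining VMMap-style totals (all in bytes).

    committed    -- Total > Committed
    private_ws   -- Total > Private WS
    shareable_ws -- Total > Shareable WS
    """
    total_ws = private_ws + shareable_ws  # corresponds to Total > Total WS
    # Paged-out pages plus never-touched committed pages; these totals alone
    # cannot tell them apart. Clamp at zero in case samples are skewed.
    not_resident = max(committed - total_ws, 0)
    return {"total_ws": total_ws, "committed_not_resident": not_resident}


before = physical_metrics(committed=400 * MB, private_ws=150 * MB, shareable_ws=50 * MB)
after = physical_metrics(committed=1400 * MB, private_ws=150 * MB, shareable_ws=50 * MB)
print(before["committed_not_resident"] // MB, after["committed_not_resident"] // MB)  # 200 1200
```

The second call mimics the effect of memory.exe commit 1000: the estimate grows by 1000MB even though nothing was paged out, because the newly committed pages were simply never touched.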

Kernel Memory

Finally, your process can indirectly affect the system’s memory usage, too. Kernel data structures for objects such as files, sockets, and mutexes are created and destroyed at your process’s request. Page tables that map virtual to physical addresses are allocated and populated as you access committed memory pages.

Although your process will rarely make a significant dent in the kernel’s memory usage, it’s important to monitor the following metrics:

Pool bytes. This refers to kernel memory directly attributable to your process, such as data structures for files or synchronization objects. Pool memory is further subdivided into paged pool and non-paged pool. The system-wide pool utilization values are available in Task Manager (on the Memory tab), or as the Memory > Pool Paged Bytes and Memory > Pool Nonpaged Bytes performance counters.

For some kernel objects, the pool allocation is also charged to the owning process. This is the case with I/O completion packets queued to an I/O completion port, which you can experiment with by running memory.exe nppool 10000 and inspecting the value of the Process > Pool Nonpaged Bytes performance counter. (To quickly inspect performance counters, run typeperf from a command prompt window. For example, typeperf "\Process(memory)\Pool Nonpaged Bytes" shows the counter’s value every second.)

Page table bytes. Mapping virtual addresses to physical addresses requires bookkeeping, provided by data structures called page tables, which are allocated in kernel memory. At a high level, mapping a small page of virtual memory (4KB on both x86 and x86_64) requires 4 or 8 bytes of page table space (8 bytes with PAE or on x86_64), plus some small additional overhead. Because Windows is lazy, it doesn’t construct page tables in advance when you reserve or even commit memory. Only when you actually access a page does Windows fill in its page table entry. Page table usage is available in VMMap as Page Table > Size. It is typically a fairly small value, even if you allocate a lot of memory.

Experiment by running memory.exe commit_touch 2000 (committing and touching almost 2GB of memory). On my Windows 10 x64 system, the resulting increase in page table bytes was approximately 4MB.
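As a rough cross-check of that number, here is the leaf page table arithmetic. It assumes 8-byte entries (x64), 4KB pages, and that memory.exe’s argument is in MB. It ignores higher-level page tables and other overhead, so it gives a lower bound rather than an exact figure:

```python
PAGE_SIZE = 4 * 1024
PTE_SIZE = 8  # bytes per page table entry on x64 (4 on legacy non-PAE x86)


def leaf_page_table_bytes(touched_bytes, pte_size=PTE_SIZE):
    """One page table entry per touched page; higher-level tables are ignored."""
    pages = -(-touched_bytes // PAGE_SIZE)  # ceiling division
    return pages * pte_size


estimate = leaf_page_table_bytes(2000 * 1024 * 1024)
print(estimate, estimate / (1024 * 1024))  # 4096000 3.90625
```

About 3.9MB of entries, which lines up with the roughly 4MB increase observed above.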

Note: because any virtual memory allocation can eventually require page table space, Windows used to charge reserved memory against the system commit limit, anticipating that reserved pages would eventually be committed and need actual page table space. In Windows 8.1 x64 and Windows 10 x64, a security mechanism called CFG (Control Flow Guard) requires a 2TB chunk of reserved memory for each process, and charging commit for that many pages would be impractical. Therefore, on newer versions of Windows, reserving memory does not charge commit. You can verify this by running memory.exe reserve 1000000 (to reserve almost 1TB of memory) and noting that the system-wide commit charge (typeperf "\Memory\Committed Bytes") doesn’t go up considerably.
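To get a feel for the scale, here is the same leaf page table arithmetic applied to the 2TB CFG reservation. It again assumes 4KB pages and 8-byte entries and ignores higher-level tables, so it only illustrates the order of magnitude:

```python
PAGE_SIZE = 4 * 1024
PTE_SIZE = 8  # bytes per page table entry on x64
TB = 2**40

cfg_reservation = 2 * TB
pages = cfg_reservation // PAGE_SIZE  # number of 4KB pages in the reservation
pte_bytes = pages * PTE_SIZE          # leaf page table space if every page were mapped
print(pages, pte_bytes / 2**30)       # 536870912 4.0
```

That is about 4GB of page table space for a single process’s CFG reservation, before counting the rest of its address space.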

Summary

Hopefully, this post explained the key memory monitoring metrics for Windows processes. There’s a lot more to say about the internals of Windows memory management, and I’m happy to refer you to Windows Internals, 6th Edition for more details. You might also find the Sysinternals Testlimit tool useful for checking just how far the memory manager is willing to stretch for you.

 

 

 

 
