
See, I wanted to measure memory usage of my C++ program. From inside the program, without profilers or process viewers, etc.

Why from inside the program?

  1. Measurements will be done thousands of times and must be automated; keeping an eye on Task Manager, top, or the like will not do
  2. Measurements are to be done during production runs; the performance degradation a profiler may cause is not acceptable, since the run times are non-negligible already (several hours for large problem instances)

note. Why measure at all? The only reason to measure used memory (as reported by the OS), as opposed to calculating “expected” usage in advance, is that I cannot directly, analytically “sizeof” how much memory my principal data structure uses. The structure itself is

unordered_map<bitset, map<uint16_t, int64_t> >

These are packed into a vector for all I care (a list would actually suffice as well, since I only ever need to access the “neighbouring” structures; without details on memory usage, I can hardly decide which to choose):

vector< unordered_map<bitset, map<uint16_t, int64_t> > >

so if anybody knows how to “sizeof” the memory occupied by such a structure, that would also solve the issue (though I'd probably have to fork the question or something).
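For what it's worth, one empiric way to “sizeof” such a structure is to thread a byte-counting allocator through the containers. The sketch below is illustrative, not code from my program: `CountingAlloc`, `bytesHeldBy`, and the `bitset<64>` key are my stand-ins, the tally ignores per-allocation malloc overhead, and the global counter is not thread-safe.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <map>
#include <new>
#include <unordered_map>
#include <utility>

// Running tally of bytes currently held through CountingAlloc (not thread-safe).
static std::size_t g_bytesAllocated = 0;

template <class T>
struct CountingAlloc {
    using value_type = T;
    CountingAlloc() = default;
    template <class U> CountingAlloc(const CountingAlloc<U>&) {}
    T* allocate(std::size_t n) {
        g_bytesAllocated += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        g_bytesAllocated -= n * sizeof(T);
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

// The nested structure with the counting allocator threaded through;
// bitset<64> stands in for whatever the real key type is.
using Key   = std::bitset<64>;
using Inner = std::map<std::uint16_t, std::int64_t, std::less<std::uint16_t>,
                       CountingAlloc<std::pair<const std::uint16_t, std::int64_t>>>;
using Outer = std::unordered_map<Key, Inner, std::hash<Key>, std::equal_to<Key>,
                                 CountingAlloc<std::pair<const Key, Inner>>>;

// Fill a structure with `entries` elements and report the bytes it holds.
std::size_t bytesHeldBy(std::size_t entries) {
    Outer supp;
    for (std::size_t i = 0; i < entries; ++i)
        supp[Key(i)][static_cast<std::uint16_t>(i % 100)] = static_cast<std::int64_t>(i);
    return g_bytesAllocated;  // sampled while supp is still alive
}
```

Dividing the reported bytes by the element count would give the per-state figure directly, without going through the OS at all.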

Environment: It may be assumed that the program runs alone on the given machine (along with the OS, etc., of course; either a PC or a supercomputer's node); it is certain to be the only program requiring large (say > 512 MiB) amounts of memory; this is a computational-experiment environment. The program is run either on my home PC (16 GiB RAM; Windows 7 or Linux Mint 18.1) or on the institution's supercomputer node (circa 100 GiB RAM, CentOS 7), and it may want to consume all that RAM. Note that the supercomputer effectively prohibits disk swapping of user processes, and my home PC has a smallish page file.

Memory usage pattern. The program can be said to sequentially fill a sort of table, each row of which is a vector<...> as specified above. Say the prime data structure is called supp. Then, for each integer k, filling supp[k] requires the data from supp[k-1]; once supp[k] is filled, it is used to initialize supp[k+1]. Thus, at any given time, the “this”, “prev”, and “next” table rows must be readily accessible. After the table is filled, the program does a relatively quick (compared with initializing and filling the table), non-exhaustive search through it, from which a solution is obtained. Note that memory is only allocated through the STL containers; I never call new or malloc explicitly.

Questions. Wishful thinking.

  1. What is the appropriate way to measure total memory usage (including swapped to disk) of a process from inside its source code (one for Windows, one for Linux)?
  2. This should probably be another question, or rather a good googling session, but still: what is the proper (or just easy) way to explicitly control (say, encourage or discourage) swapping to disk? A pointer to an authoritative book on the subject would be very welcome. Forgive my ignorance; I'd like a means to say something along the lines of “NEVER swap supp” or “swap supp[10]”, and then, when I need it, “unswap supp[10]”, all from the program's code. I thought I'd have to resort to serializing the data structures, explicitly storing them as a binary file, and then reversing the transformation.
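For the record, the closest Linux primitives I'm aware of are mlock(2)/munlock(2) (pin pages in RAM, i.e. “never swap this”) and madvise(2) (hint the kernel about usage). They operate on page ranges you map yourself, so applying them to supp would require a custom allocator over mmap'd regions. A minimal sketch on an illustrative region (`adviseDemo` is my name, not an established API):

```cpp
#include <sys/mman.h>
#include <cstddef>

// Map an anonymous region and hand the kernel a usage hint about it.
// Returns true if the mapping and the hint both succeeded.
bool adviseDemo(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    // "I will not need these pages soon": the kernel may reclaim them first.
    bool ok = (madvise(p, bytes, MADV_DONTNEED) == 0);
    // mlock(p, bytes) would be the "NEVER swap this" counterpart, but it
    // counts against RLIMIT_MEMLOCK, so it is not attempted here.
    munmap(p, bytes);
    return ok;
}
```

Note that MADV_DONTNEED on anonymous memory discards the contents, so for the “swap supp[10], unswap it later” use case the serialize-to-disk route may indeed be the honest one.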

On Linux, it appeared easiest to just take the program break via sbrk(0), cast it to a 64-bit unsigned integer, and compute the difference after the memory gets allocated; this approach produced plausible results (I have not done more rigorous tests yet).
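The sbrk(0) delta idea looks roughly like the sketch below (function name mine). One caveat worth stating: glibc serves large allocations via mmap, which the program break never sees, so this can under-report badly for big containers.

```cpp
#include <unistd.h>
#include <cstdint>
#include <cstdlib>

// Growth of the program break (bytes) across a burst of small mallocs.
std::int64_t brkGrowthDemo() {
    std::intptr_t before = reinterpret_cast<std::intptr_t>(sbrk(0));
    void* blocks[1024];
    // Small chunks: glibc serves these from the brk heap, not from mmap.
    for (int i = 0; i < 1024; ++i) blocks[i] = std::malloc(1024);
    std::intptr_t after = reinterpret_cast<std::intptr_t>(sbrk(0));
    for (int i = 0; i < 1024; ++i) std::free(blocks[i]);
    return static_cast<std::int64_t>(after - before);
}
```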

edit 5. Removed reference to HeapAlloc wrangling—irrelevant.

edit 4. Windows solution. This bit of code reports a working set that matches the one in Task Manager; that's about all I wanted. Tested on Windows 10 x64 (with allocations like new uint8_t[1024*1024], or rather, new uint8_t[1ULL << howMuch]; not in my “production” code yet). On Linux, I'd try getrusage or something similar to get the equivalent. The principal element is GetProcessMemoryInfo, as suggested by @IInspectable and @conio:

#include <Windows.h>
#include <Psapi.h> // link against Psapi.lib
//report this process' current working set, in bytes (0 on failure)
SIZE_T getWorkingSetBytes() {
    //pseudo-handle to this process; no need to CloseHandle it
    auto myHandle = GetCurrentProcess();
    //to fill in the process' memory usage details
    PROCESS_MEMORY_COUNTERS pmc;
    //return the usage (bytes), if I may
    if (GetProcessMemoryInfo(myHandle, &pmc, sizeof(pmc)))
        return pmc.WorkingSetSize;
    return 0;
}
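For the Linux side mentioned above, a sketch of the getrusage route (function names mine): ru_maxrss is the peak resident set in KiB, not the current one; a current figure can be read from the VmRSS line of /proc/self/status.

```cpp
#include <sys/resource.h>
#include <cstdint>
#include <fstream>
#include <string>

// Peak resident set size of this process, in bytes (0 on failure).
std::uint64_t peakRSSBytes() {
    rusage ru{};
    if (getrusage(RUSAGE_SELF, &ru) != 0) return 0;
    return static_cast<std::uint64_t>(ru.ru_maxrss) * 1024;  // Linux reports KiB
}

// Current resident set size in bytes, parsed from /proc/self/status (0 on failure).
std::uint64_t currentRSSBytes() {
    std::ifstream status("/proc/self/status");
    std::string token;
    while (status >> token) {
        if (token == "VmRSS:") {
            std::uint64_t kib = 0;
            status >> kib;
            return kib * 1024;
        }
    }
    return 0;
}
```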

edit 5. Removed reference to GetProcessWorkingSetSize as irrelevant. Thanks @conio.

  • Virtual memory is pretty complex. What particular statistic do you wish to measure? Commented Jan 24, 2017 at 14:19
  • I want the nearest thing to RAM usage as would be reported by Task Manager (“total physical memory reserved for an individual process”) or top. I do not intend to crawl into virtual memory; the usage should be reported for RAM. Commented Jan 24, 2017 at 14:40
  • For Windows, have a look at Process Memory Usage Information. Besides, your program doesn't consume RAM. It consumes address space. RAM is just a performance optimization. Commented Jan 24, 2017 at 14:44
  • If you pretend that the complexity of virtual memory does not exist then I doubt you'll get much useful. What are you going to do with the information. Which decisions will it inform? Commented Jan 24, 2017 at 14:48
  • Memory usage determines whether a given problem instance can be solved or not (the actual relevant statistic would be average memory usage per state, obtained by dividing memory usage by the number of states). It also could affect the choice of the program's inner structure (I could not calculate the memory usage of std::unordered_map<...> “analytically”, so I decided to go empiric). Commented Jan 24, 2017 at 15:06

2 Answers


On Windows, the GlobalMemoryStatusEx function gives you useful information about both your process and the whole system.

Based on this table you might want to look at MEMORYSTATUSEX.ullAvailPhys to answer "Am I getting close to hitting swapping overhead?" and changes in (MEMORYSTATUSEX.ullTotalVirtual – MEMORYSTATUSEX.ullAvailVirtual) to answer "How much RAM is my process allocating?"
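A minimal sketch of reading those two fields (the field names are from the MEMORYSTATUSEX documentation; the `MemSnapshot`/`snapshot` names are mine, and the sketch is guarded with a stub so it compiles off-Windows too):

```cpp
#include <cstdint>

// {available physical bytes, virtual address space in use by this process}.
struct MemSnapshot { std::uint64_t availPhys; std::uint64_t virtUsed; };

#ifdef _WIN32
#include <Windows.h>

bool snapshotSupported() { return true; }

MemSnapshot snapshot() {
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof(ms);          // must be set before the call
    if (!GlobalMemoryStatusEx(&ms)) return {0, 0};
    return {ms.ullAvailPhys, ms.ullTotalVirtual - ms.ullAvailVirtual};
}
#else
// Non-Windows stub so the sketch stays compilable anywhere.
bool snapshotSupported() { return false; }
MemSnapshot snapshot() { return {0, 0}; }
#endif
```

Sampling `snapshot()` before and after filling a table row would give both the "how close am I to swapping?" and "how much did that row cost?" numbers in one call.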




To know how much physical memory your process takes you need to query the process working set or, more likely, the private working set. The working set is (more or less) the number of physical pages in RAM your process uses; the private working set excludes memory shared with other processes.

See the Windows memory-management documentation for terminology and a little more detail.

There are performance counters for both metrics.

(You can also use QueryWorkingSet(Ex) and calculate that on your own, but that's just nasty in my opinion. You can get the (non-private) working set with GetProcessMemoryInfo.)


But the more interesting question is whether or not this helps your program to make useful decisions. If nobody's asking for memory or using it, the mere fact that you're using most of the physical memory is of no interest. Or are you worried about your program alone using too much memory?

You haven't said anything about the algorithms it employs or its memory usage patterns. If it uses lots of memory, but does so mostly sequentially, and comes back to old memory relatively rarely, it might not be a problem. Windows writes "old" pages to disk eagerly, before paging out resident pages becomes strictly necessary to satisfy demand for physical memory. If everything goes well, reusing those already-written pages for something else is really cheap.

If your real concern is memory thrashing ("virtual memory will be of no use due to swapping overhead"), then this is what you should be looking for, rather than trying to infer (or guess...) it from the amount of physical memory used. A more useful metric would be page faults per unit of time. It just so happens that there are performance counters for this too. See, for example, Evaluating Memory and Cache Usage.

I suspect this to be a better metric to base your decision on.
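As an aside for the Linux machines the question also targets, the analogous counters come from getrusage: ru_minflt (soft faults) and ru_majflt (faults that actually hit the disk). A sketch, with names of my own choosing:

```cpp
#include <sys/resource.h>

// Cumulative page-fault counts for this process.
struct Faults { long minor; long major; };

Faults faultCounts() {
    rusage ru{};
    getrusage(RUSAGE_SELF, &ru);
    return {ru.ru_minflt, ru.ru_majflt};
}
```

Sampling this once per table row filled would yield the "faults per computation" metric suggested in the comments, rather than faults per second.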

6 Comments

I do believe he's worried about his program alone exhausting memory, based on "The program itself surely loves its RAM (dynamic programming, lots of states to store), it will gladly chow through several GiB on certain problem instances.... If the problem eats through all available RAM, it is to terminate---virtual memory will be of no use due to swapping overhead."
I don't disagree, but my answer - the suggestion to monitor page faults rather than memory usage - holds regardless of whether he's the only one on the machine or he cares about other programs being able to work too.
Agreed that page faults are the problem, but rate (faults per time) may not be all that helpful, because thrashing and I/O contention. Measuring (faults per computation) would be much better, because it isn't defeated by the slowdown caused by the faults.
So the answer is, probably, Private Working Set; in view of the environment, I assume shared memory to be negligible (although I'd prefer an upper bound—the whole Working Set, just in case). Now, my real main concern is the actual memory usage; also, as far as I bothered to test, the program tends to be killed by the system as soon as RAM is exhausted anyway. That suits me; however, I will be really glad to know how to catch when it happens, to report it in the program's own log.
@BenVoigt: In principle you're right. In practice, I think that before you get so many PFs that you can't even execute the code that measures them, you'll see a rise in PFs per second. Tools like Process Explorer and Process Hacker show a "PF Delta", but I'm not aware of a performance counter that gives you the number of PFs up to this point (rather than a rate). Process Hacker is open source, so you can find its undocumented uses of NtQueryInformationProcess (with ProcessVmCounters) and NtQuerySystemInformation to get the process and system PF counts respectively.
