“Profile” is a familiar word to us, especially when working on C/C++ programs. There are several tools for analyzing a C/C++ program, such as CodeViz, doxygen, gprof, oprofile and so on. They can be separated into two categories:
- Static (analyze the source code without running it) – CodeViz, doxygen …
- Dynamic (measure the program while it runs) – gprof, oprofile …
When do you need to profile?
Usually we don’t need to care about performance, but in some demanding environments we have to put a lot of effort into optimizing it. Many of these cases are on the server side, where we want the server program to serve more users.
Some years ago, people did not care much about performance, because they thought machines were cheap and the problem could be solved by adding more machines. But today many people recognize that the number of machines may become very large, and that costs a lot of money. So performance has become important again.
Besides, if money can solve the problem, it is not really a problem. But there are many cases that money alone cannot solve. For example, architecture defects can lead to performance issues, and we have to profile the program to find and fix them.
So, if you have a lot of money and don’t care how much of it you waste, you needn’t care about it. If you don’t care about the user experience, or even about server crashes, you needn’t care about it. If you have no time to fix it and performance is not the top-priority issue, you needn’t care about it.
Now you know when you need to profile a program. I think that, as a rigorous developer, you should adjust your workflow as follows:
- Understand the requirements
- Unit testing
- Performance testing
Let’s do it
Today I’ll share some experience of profiling a program with gprof.
First, recompile your program with “-pg” added to CFLAGS/CXXFLAGS/LDFLAGS, and then run your unit tests or functional tests. When the program exits normally, you will get a “gmon.out” file, which gprof uses to generate a human-readable report.
Then run “gprof yourbin gmon.out > program.prof”.
OK, now you can open program.prof to read the result, find the parts that hurt performance, and try to fix them.
You see, that’s simple :D. But if you want to do more advanced things, you need to know a few facts:
- Normally, gprof cannot collect statistics for system libraries, so their call graph, accumulated time and call counts do not appear in the final result. If you want them, link against the profiling C library with “-lc_p” (if your system provides it). gprof helps most when your program spends most of its time in user space.
- gprof cannot fully collect information from multi-threaded programs; it only collects data for the main thread. If you want to collect data for the other threads, you need to make sure they are also signaled by “ITIMER_PROF”, for example by setting that timer at the start of each thread.
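- The default report generated by gprof may be hard to read, so you can use the “Gprof2Dot” tool to generate a nice call graph from gprof’s output, for example “gprof yourbin gmon.out | gprof2dot | dot -Tpng -o output.png” (this also needs Graphviz for “dot”).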
Finally, I think you should give it a try, because it helps you understand your program more deeply. Keep thinking and keep optimizing your program.