{"id":678,"date":"2018-08-07T00:53:04","date_gmt":"2018-08-07T07:53:04","guid":{"rendered":"http:\/\/finaldie.com\/blog\/?p=678"},"modified":"2018-08-07T23:02:38","modified_gmt":"2018-08-08T06:02:38","slug":"skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement","status":"publish","type":"post","link":"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/","title":{"rendered":"About memory measurement and tracing"},"content":{"rendered":"<p><a href=\"https:\/\/github.com\/finaldie\/skull\/releases\/tag\/v1.2.3\">Skull Engine v1.2.3<\/a> is finally here: a special tag number with a big story behind it.<\/p>\n<p>Here are the most important changes since v1.1:<\/p>\n<ul>\n<li><strong>New:<\/strong> Realtime memory tracing tool <code>skull-trace<\/code><\/li>\n<li><strong>New:<\/strong> Override libc malloc to better measure memory stats<\/li>\n<li><strong>New:<\/strong> Upgrade python2 to python3<\/li>\n<li><strong>Enhancement:<\/strong> Remove protobuf-c from the <em>Engine<\/em> dependencies<\/li>\n<li><strong>Enhancement:<\/strong> Add <code>google\/protobuf<\/code> as a submodule<\/li>\n<\/ul>\n<p>For the full change log, please refer to <a href=\"https:\/\/github.com\/finaldie\/skull\/blob\/master\/ChangeLog.md\">here<\/a>.<\/p>\n<p>When we say \u201cmemory\u201d, people usually think of memory pools, efficient memory allocation, and garbage collection. These all sound like pretty old concepts, and fewer people care about them now that RAM keeps getting cheaper. Not enough RAM? Just upgrade to a bigger machine. So it sounds like we can ignore memory-related issues, but can we?<\/p>\n<p>The reality is that, even today, memory management is still a serious topic during the development cycle. Whether we have a faster GC, an efficient memory pool, or a better programming language, managing memory well is still not that easy.<\/p>\n<p>In order to better manage it, we need a way to measure it first. 
So this time, let\u2019s talk about memory measurement.<\/p>\n<h1>Experimental Memory Measurement<\/h1>\n<pre><code>root@de925ca81ab4:\/workspace# echo memory | nc 0 7759\n======== Skull Memory Measurement ========\n      .        init        used: 6184\n      .        core        used: 274031720\n module       admin        used: 10624\n module     ranking        used: 0\n module     request        used: 0\n module     dns_ask        used: 0\n module    response        used: 0\nservice     ranking        used: 272\nservice         dns        used: 0\n\n================ Summary =================\nTotal Allocated: 274048800\nTrace Enabled: 0<\/code><\/pre>\n<p>Think about a normal feature release. When we verify whether the feature works, there are several kinds of problems we may need to deal with, such as:<\/p>\n<ul>\n<li>Exception in certain cases<\/li>\n<li>Crash<\/li>\n<li>Behavior changes<\/li>\n<li>Memory leak<\/li>\n<\/ul>\n<p>The first two usually fail at an early stage, and we can fix them immediately because the hints are direct. For the other two, we need a strong monitoring\/reporting mechanism to catch them after a long run.<\/p>\n<p>Take the last one, <strong>Memory Leak<\/strong>. Whichever programming language we use, even today (some may think this kind of issue no longer exists), it is still possible to leak buckets of memory somewhere. 
And usually we only observe the application\u2019s total memory utilization, so when a leak happens it is hard to say which part caused it. Rolling back is always the quickest action we can take, but reproducing and fixing the leak afterwards is still a headache. Here are some of the methods we can use:<\/p>\n<ul>\n<li>Review all the changes (maybe a bunch of PRs): low efficiency, and we may never find the cause<\/li>\n<li>Binary-search the PRs, installing builds until we locate the PR that caused the leak: low efficiency and long duration<\/li>\n<li>Use a Valgrind-like tool to reproduce and observe (for C\/C++ programs): impossible in production, since it is too slow<\/li>\n<\/ul>\n<p>And that is the lucky situation. The terrible one is when we take over a project from others that already leaks, and no one knows where. Restarting it on a fixed cycle (hours\/days) to avoid OOM is the first thing we can do. There are no PRs to review, just the entire code base, so what can we do?<\/p>\n<p>The root problem is that there is no clear signal telling us which part of the code caused the leak. If there were, these problems would be much easier. That\u2019s why this memory measurement feature was rolled out. It provides per-module and per-service memory monitoring: RAM usage and the number of calls to each allocation API (malloc, realloc, free). From it, we can directly see which component\u2019s memory keeps increasing, go back to the PRs related to that component, and solve the issue faster than before.<\/p>\n<p>This feature brings other benefits too. After every release, or during A\/B testing, we can compare against the old version to see whether the new code uses too much memory, or whether a memory optimization actually works. The judgment is now based on data instead of imagination. 
And, as mentioned above, the measurement is not just RAM usage; it also provides the call counts for each low-level API. With that signal, we can either reduce the memory pressure of a certain module\/service, or reduce the number of calls on a hot code path, in order to ease peak RAM or GC pressure. With it, we can understand our application better than ever.<\/p>\n<p>I\u2019ll write another post about how this feature was designed and implemented, the issues I ran into, and the trade-offs I made. Stay tuned :)<\/p>\n<h1>Low-level libc malloc tracing toolkit<\/h1>\n<p>The memory measurement feature above helps people locate a painful leak faster, but that is not enough; it is just a start. The next big question is how to solve it. As we know, solving a leak takes a lot of effort and time, and it is frustrating. Afterwards, people may say \u201cWhat a stupid problem that was\u201d\u2026<\/p>\n<p>Anyway, this part really is case by case:<\/p>\n<ul>\n<li>If we have located the code change that caused it, just fix it.<\/li>\n<li>If multiple PRs are involved, we can still use binary search to locate it.<\/li>\n<\/ul>\n<p>Problem solved, cheers!<\/p>\n<p>But wait\u2026 we are not finished yet. To keep the same issue from happening again, we need a postmortem to fully understand why\/when\/how it happened. Imagine what we would do: read a lot of related code, carefully identify which part allocates memory and which part releases it, draw the flow, cross-check that everything makes sense, and then reach a conclusion.<\/p>\n<p>Well, that is the ideal case. The truth is that, in most cases, people guess intuitively and explain it to others if the guess seems to make sense.<\/p>\n<p>Let\u2019s think about it another way: if we have a light-weight tracing system, the story is totally different. 
We trace the system while reproducing the problem, and from the trace log we know exactly which allocation isn\u2019t released after a well-defined scope. That means we fully understand the problem and can fix it precisely.<\/p>\n<p>So it is time to bring <code>skull trace<\/code> to the table. It had a few goals before implementation:<\/p>\n<ul>\n<li>Light-weight enough to be enabled in production<\/li>\n<li>On-demand realtime tracing<\/li>\n<li>Shows the exact code location with line number (if symbols are available) instead of an address<\/li>\n<li>Works well on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Address_space_layout_randomization\">ASLR<\/a>-enabled systems<\/li>\n<\/ul>\n<p>Below is a demo of the tracing output:<\/p>\n<p><iframe src=\"https:\/\/giphy.com\/embed\/1wpbkb5tiSDHRykv7R\" width=\"640\" height=\"246\" frameBorder=\"0\" class=\"giphy-embed\" allowFullScreen><\/iframe><\/p>\n<p><a href=\"https:\/\/giphy.com\/gifs\/skull-engine-1wpbkb5tiSDHRykv7R\">via GIPHY<\/a><\/p>\n<h1>Last<\/h1>\n<p>Beyond the above, please keep an open mind and think about a few more questions:<\/p>\n<ul>\n<li>How can we improve the system design by using the memory measurement mechanism?\n<ul>\n<li>What if a buffer is allocated in layer A, but released in layer B?<\/li>\n<li>What if we want to split different layers into separate servers?<\/li>\n<\/ul>\n<\/li>\n<li>Is it possible to detect a memory leak before release by leveraging the tracing toolkit in an automated way?<\/li>\n<\/ul>\n<p>OK, that\u2019s enough for today, but the story is still ongoing. With measurement and tracing, there will be more interesting places for us to play. Stay tuned :)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Finally we are here for Skull Engine v1.2.3, a special tag number, and a big story behind. 
Let\u2019s check how many important features since v1.1: New: Realtime memory tracing tool skull-trace New: Override libc malloc to better measure memory stats New: Upgrade python2 to python3 Enhancement: Remove protobuf-c from Engine dependency Enhancement: Add google\/protobuf as &#8230; <a title=\"About memory measurement and tracing\" class=\"read-more\" href=\"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/\" aria-label=\"More on About memory measurement and tracing\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":692,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[23],"tags":[39,40,24],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>About memory measurement and tracing - Final Blog<\/title>\n<meta name=\"description\" content=\"Skull Engine is a fast to start, easy to maintain, high productive serving framework.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"About memory measurement and tracing - Final Blog\" \/>\n<meta property=\"og:description\" content=\"Skull Engine is a fast to start, easy to maintain, high productive serving framework.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/\" \/>\n<meta property=\"og:site_name\" content=\"Final Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/hu.yuzhang\" \/>\n<meta property=\"article:published_time\" 
content=\"2018-08-07T07:53:04+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-08-08T06:02:38+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/finaldie.com\/blog\/wp-content\/uploads\/2018\/08\/IMG_0389.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"4032\" \/>\n\t<meta property=\"og:image:height\" content=\"3024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"final\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@hyzwowtools\" \/>\n<meta name=\"twitter:site\" content=\"@hyzwowtools\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"final\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/\",\"url\":\"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/\",\"name\":\"About memory measurement and tracing - Final Blog\",\"isPartOf\":{\"@id\":\"https:\/\/finaldie.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/finaldie.com\/blog\/wp-content\/uploads\/2018\/08\/IMG_0389.jpg\",\"datePublished\":\"2018-08-07T07:53:04+00:00\",\"dateModified\":\"2018-08-08T06:02:38+00:00\",\"author\":{\"@id\":\"https:\/\/finaldie.com\/blog\/#\/schema\/person\/2d4c840d6e8e197f8ade98af2bd2fab3\"},\"description\":\"Skull Engine is a fast to start, 
easy to maintain, high productive serving framework.\",\"breadcrumb\":{\"@id\":\"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/#primaryimage\",\"url\":\"https:\/\/finaldie.com\/blog\/wp-content\/uploads\/2018\/08\/IMG_0389.jpg\",\"contentUrl\":\"https:\/\/finaldie.com\/blog\/wp-content\/uploads\/2018\/08\/IMG_0389.jpg\",\"width\":4032,\"height\":3024},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/finaldie.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"About memory measurement and tracing\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/finaldie.com\/blog\/#website\",\"url\":\"https:\/\/finaldie.com\/blog\/\",\"name\":\"Final Blog\",\"description\":\"As simple as possible...\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/finaldie.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/finaldie.com\/blog\/#\/schema\/person\/2d4c840d6e8e197f8ade98af2bd2fab3\",\"name\":\"final\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/finaldie.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/4c720545b79ddb0f23b527e0bbcfd9bc?s=96&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/4c720545b79ddb0f23b527e0bbcfd9bc?s=96&r=g\",\"caption\":\"final\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"About memory measurement and tracing - Final Blog","description":"Skull Engine is a fast to start, easy to maintain, high productive serving framework.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/","og_locale":"en_US","og_type":"article","og_title":"About memory measurement and tracing - Final Blog","og_description":"Skull Engine is a fast to start, easy to maintain, high productive serving framework.","og_url":"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/","og_site_name":"Final Blog","article_publisher":"https:\/\/www.facebook.com\/hu.yuzhang","article_published_time":"2018-08-07T07:53:04+00:00","article_modified_time":"2018-08-08T06:02:38+00:00","og_image":[{"width":4032,"height":3024,"url":"https:\/\/finaldie.com\/blog\/wp-content\/uploads\/2018\/08\/IMG_0389.jpg","type":"image\/jpeg"}],"author":"final","twitter_card":"summary_large_image","twitter_creator":"@hyzwowtools","twitter_site":"@hyzwowtools","twitter_misc":{"Written by":"final","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/","url":"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/","name":"About memory measurement and tracing - Final Blog","isPartOf":{"@id":"https:\/\/finaldie.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/#primaryimage"},"image":{"@id":"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/#primaryimage"},"thumbnailUrl":"https:\/\/finaldie.com\/blog\/wp-content\/uploads\/2018\/08\/IMG_0389.jpg","datePublished":"2018-08-07T07:53:04+00:00","dateModified":"2018-08-08T06:02:38+00:00","author":{"@id":"https:\/\/finaldie.com\/blog\/#\/schema\/person\/2d4c840d6e8e197f8ade98af2bd2fab3"},"description":"Skull Engine is a fast to start, easy to maintain, high productive serving 
framework.","breadcrumb":{"@id":"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/#primaryimage","url":"https:\/\/finaldie.com\/blog\/wp-content\/uploads\/2018\/08\/IMG_0389.jpg","contentUrl":"https:\/\/finaldie.com\/blog\/wp-content\/uploads\/2018\/08\/IMG_0389.jpg","width":4032,"height":3024},{"@type":"BreadcrumbList","@id":"https:\/\/finaldie.com\/blog\/skull-engine-v1-2-3-is-out-lets-talk-about-memory-measurement\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/finaldie.com\/blog\/"},{"@type":"ListItem","position":2,"name":"About memory measurement and tracing"}]},{"@type":"WebSite","@id":"https:\/\/finaldie.com\/blog\/#website","url":"https:\/\/finaldie.com\/blog\/","name":"Final Blog","description":"As simple as possible...","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/finaldie.com\/blog\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/finaldie.com\/blog\/#\/schema\/person\/2d4c840d6e8e197f8ade98af2bd2fab3","name":"final","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/finaldie.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/4c720545b79ddb0f23b527e0bbcfd9bc?s=96&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4c720545b79ddb0f23b527e0bbcfd9bc?s=96&r=g","caption":"final"}}]}},"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"https:\/\/finaldie.com\/blog\/wp-content\/uploads\/2018\/08\/IMG_0389.jpg","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/finaldie.com\/blog\/wp-json\/wp\/v2\/posts\/678"}],"collection":[{"href":"https:\/\/finaldie.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/finaldie.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/finaldie.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/finaldie.com\/blog\/wp-json\/wp\/v2\/comments?post=678"}],"version-history":[{"count":17,"href":"https:\/\/finaldie.com\/blog\/wp-json\/wp\/v2\/posts\/678\/revisions"}],"predecessor-version":[{"id":698,"href":"https:\/\/finaldie.com\/blog\/wp-json\/wp\/v2\/posts\/678\/revisions\/698"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/finaldie.com\/blog\/wp-json\/wp\/v2\/media\/692"}],"wp:attachment":[{"href":"https:\/\/finaldie.com\/blog\/wp-json\/wp\/v2\/media?parent=678"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/finaldie.com\/blog\/wp-json\/wp\/v2\/categories?post=678"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/finaldie.com\/blog\/wp-json\/wp\/v2\/tags?post=678"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}