Closed Bug 704240 Opened 13 years ago Closed 11 years ago

Reference analysis

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bjacob, Unassigned)

References

()

Details

(Whiteboard: Go to comment 40 [MemShrink:P2] Being superseded by DMD, see bug 1014346)

Attachments

(18 obsolete files)

An important aspect of our codebase is references between objects, typically of the form: "A owns a refptr pointing to B". Understanding the graph of references is often important for understanding bugs. As an example, loops in this graph need to be explicitly declared to the cycle collector , or else we typically get a memory leak.

We don't seem to have any tool to investigate the graph of references between objects. We have tools to investigate reference counts, but we don't have tools that are able to trace _who_ owns who.

There are probably 2 approaches to implementing such a tool:

 A. The Right Way: at the level of the memory allocator, similar to how one would implement a GC. By instrumenting malloc to keep track of allocated blocks and returned pointers, and scanning memory for these pointers, and when by scanning we find a known pointer, check what block it is in to find who is the owner. In order to get type information, we would have to use debug info (e.g. DWARF).

 B. The Poor Man's Way: at the level of XPCOM. The idea is that most of our objects are XPCOM objects and most of our references are nsCOMPtr's and nsRefPtr's. By adding special markers e.g. in nsISupports and doing some memory scanning, one gets something that works more or less, but is not as reliable and complete as what approach A would give us. I am attaching a patch that does that.

Next steps would be to implement tools that parse the trace and extract useful information.

A no-brainer is to find cycles in the graph, and check if they have been declared to the XPCOM cycle collector.

I don't see this as being only interesting for cycles though. This could help us identify unexpected, overly complex ownership schemes.
Here is an implementation of approach B described in comment 0: tracing references (nsRefPtr and nsCOMPtr) between XPCOM objects.

Some hackery allows to also cover the case whey the refptrs are stored in arrays and hashtables, like nsTArray<nsCOMPtr>.

*** HOW TO USE IT ***

Usage: just apply this patch against mozilla-central (applies against 7dd46087e678) and build. Tested on linux x86_64 debug; should work elsewhere, in particular it should work fine in non-debug builds. Just run Firefox as usual, it will print the trace on stderr (see below).

By default, refptrs on the stack are not reported, as that adds a lot of noise. If you want to see them, define the REFTRACING_SHOW_STACK environment variable.

The trace consists of lines like this one:

REFTRACING: nsCOMPtr@0x1f324f8 : imgRequest@0x1f323a0 -> nsFileChannel@0x1f31cb0

That line means "a nsCOMPtr at address 0x1f324f8, belonging to a imgRequest at address 0x1f323a0, is now pointing to a nsFileChannel at address 0x1f31cb0"

So for example when you see these lines in the trace,

REFTRACING: nsCOMPtr@0x1f324f8 : imgRequest@0x1f323a0 -> nsFileChannel@0x1f31cb0
REFTRACING: nsCOMPtr@0x1f324c8 : imgRequest@0x1f323a0 -> nsSystemPrincipal@0x1894580
REFTRACING: nsCOMPtr@0x1f31e30 : nsFileChannel@0x1f31cb0 -> imgRequest@0x1f323a0

you know that there now is a reference cycle between that imgRequest and that nsFileChannel. Fortunately, this one doesn't seem to give an actual leak, as a bit further down we find:

REFTRACING: nsCOMPtr@0x1f31e30 : nsFileChannel@0x1f31cb0 -> nsDocShell::InterfaceRequestorProxy@0x1a4f5e0
REFTRACING: nsCOMPtr@0x1f31e38 : nsFileChannel@0x1f31cb0 -> NULL
REFTRACING: nsCOMPtr@0x1f32500 : imgRequest@0x1f323a0 -> NULL
REFTRACING: nsCOMPtr@0x1f324f8 : imgRequest@0x1f323a0 -> NULL


*** LIMITATIONS ***

This covers only XPCOM classes (children of nsISupports) and only nsRefPtr/nsCOMPtr references. Hackery was needed to support nsTArrays of refptrs and also pldhash tables; that doesn't seem to always work. There are occasional crashes. To avoid most crashes, memory scanning stops at page boundaries, but that means that it can fail to detect objects that overlap two pages. Detection of stack addresses is inaccurate too, regarding the address of the bottom of the stack of the main thread.

All of that could be overcome by switching to the better approach A mentioned in comment 0.

*** NEED AN ANALYZER TOOL ***

Also, no tool exists yet to intelligently analyze the traces. Such a tool would be useful regardless of the choice of approach A or B. A good starting point would be to parse the traces to construct the reference graph in memory, feed that to boost::graph and get analysis of connected components and strongly connected components (aka cycles). The user could simply truncate the trace to the point where it wants analysis to be performed, before running the analysis tool.

*** HOW THIS PATCH WORKS ***

To see how this patch patch, implementing approach B, works, open xpcom/base/nsISupportsBase.h and xpcom/glue/nsISupportsImpl.cpp and xpcom/glue/nsISupportsImpl.h.

Go to nsISupportsBase.h. A marker is inserted in nsISupports. See NSISUPPORTS_MARKER_MAGIC and nsISupports::ReftracingData::marker. Right after the marker, we store a pointer to self (so that we can safely discover |this| from scanning memory for the marker) and we also make some room to store the size of this type and a char* string pointer giving the type name. Normally we'd want size and type_name to be given by pure virtual methods in nsISupports, but due to XPT works (calling virtual methods by index), adding virtual methods to nsISupports is not an option.

Still in nsISupportsBase.h, two macros are defined: NS_DECL_REFTRACING and NS_IMPL_REFTRACING. In nsISupportsImpl.h, these are added to the usual NS_DECL_ISUPPORTS and friends, so that most XPCOM classes automatically get instrumented. NS_DECL_REFTRACING and NS_IMPL_REFTRACING ensure that when the object is constructed, memory is scanned to find the location of all its nsISupports bases and initialize their |size| and |type_name| fields (see above). That's how we work around the impossibility to add virtual methods to nsISupports.

At this stage, XPCOM objects are instrumented, so how do we use that? By instrumenting nsRefPtr and nsCOMPtr. Everytime their mRawPtr gets updated, we have them call reftracing::dumpRefPtrInfo, see nsISupportsImpl.cpp. This calls reftracing::Scanner to scan memory to discover who is the owner and who is the pointee, and print a trace line to stderr.

reftracing::Scanner primarily looks for nsISupports markers (see above), then checks that the corresponding |size| is consistent with the input address. For example, if, starting at address 0x100 I scan memory backwards and find a nsISupports at address 0x50 and its size is 0x10, then there is no way that the original address 0x100 belongs to this object (0x100 > 0x50 + 0x10) so I give up and declare the object to be 'NON-XPCOM' or 'Unknown'.

There's a bit more complication in reftracing::Scanner of course. Part of the complication is to avoid crashing by crossing page boundaries. Another part is to try to handle the case where the refptrs are stored in a nsTArray or a nsTHashtable. This is done by inserting a special marker at the beginning of nsTArray::Header, indicating to jump to the address of the actual nsTArray. This is RECURSION_MARKER_MAGIC.
OS: Linux → All
Hardware: x86_64 → All
This adds a new 'exp-refgrind' tool to Valgrind.

HOW TO BUILD:

Just apply this patch to the Valgrind source tree, and rebuild Valgrind (including the ./autogen.sh and ./configure steps)

$ svn co svn://svn.valgrind.org/valgrind/trunk valgrind
$ cd valgrind
$ patch -p0 < ~/refgrind.patch
$ ./autogen.sh
$ ./configure
$ make


HOW TO USE:

$ ./vg-in-place --smc-check=all --tool=exp-refgrind <application command line>

As usual with Valgrind, --smc-check=all allows to correctly handle JITted code in Firefox.

Once it's started, you can get a dump of the reference graph by raising a SIGINT (2) signal:

$ kill -INT <pid>

I know it's not very convenient. I have yet to figure how Callgrind gets the signal handling right.

In particular, in the current state of things, unless your application has a SIGINT handler, it's not possible to get more than one snapshot as the first SIGINT kills it.


FIREFOX EXAMPLE:

$ ./vg-in-place --smc-check=all --tool=exp-refgrind ~/build/firefox/dist/bin/firefox -P test -no-remote 2>&1 | tee log
==18134== Refgrind, a reference graph tool
==18134== NOTE: This is an Experimental-Class Valgrind Tool
==18134== Copyright (C) 2010-2011, and GNU GPL'd, by Mozilla Foundation
==18134== Using Valgrind-3.8.0.SVN and LibVEX; rerun with -h for copyright info
==18134== Command: /home/bjacob/build/firefox/dist/bin/firefox -P test -no-remote
==18134== 
nsStringStats
 => mAllocCount:              1
 => mReallocCount:            0
 => mFreeCount:               1
 => mShareCount:              0
 => mAdoptCount:              0
 => mAdoptFreeCount:          0
==18135== 
++DOCSHELL 0x17fa28b0 == 1
++DOCSHELL 0x1ad52600 == 2
++DOMWINDOW == 1 (0x1ad4f548) [serial = 1] [outer = (nil)]
++DOMWINDOW == 2 (0x1ad64ba8) [serial = 2] [outer = 0x1ad4f4d0]
==18134== ======== refgrind received signal (2) ========
==18134== overall statistics:
==18134==  total allocations: 47105 blocks / 14043008 bytes
==18134==  currently live:    30997 blocks / 7935102 bytes
==18134==  max memory usage:  30998 blocks / 7935259 bytes
==18134== graph of blocks and references between them:
==18134==  block 494
==18134==   size 64
==18134==   start address 0x59680A0
==18134==   created by instruction    15491219
==18134==   last read by instruction  121562322
==18134==   last write by instruction 121562330
==18134==   access totals: 176 bytes read, 88 bytes written
==18134==   created by allocation:
==18134==    at 0x40271B3: malloc (vg_replace_malloc.c:263)
==18134==    by 0x401245B: dl_open_worker (dl-open.c:393)
==18134==    by 0x400D925: _dl_catch_error (dl-error.c:178)
==18134==    by 0x4011899: _dl_open (dl-open.c:569)
==18134==    by 0x4C3EF65: dlopen_doit (dlopen.c:67)
==18134==    by 0x400D925: _dl_catch_error (dl-error.c:178)
==18134==    by 0x4C3F2EB: _dlerror_run (dlerror.c:164)
==18134==    by 0x4C3EEE0: dlopen@@GLIBC_2.2.5 (dlopen.c:88)
==18134==    by 0x403DAB: XPCOMGlueLoad(char const*, unsigned int (**)(XPCOMFunctions*, char const*)) (nsGlueLinkingDlopen.cpp:252)
==18134==    by 0x402C8E: XPCOMGlueStartup (nsXPCOMGlue.cpp:77)
==18134==    by 0x40199C: main (nsBrowserApp.cpp:242)
==18134==   references block 24
==18134==   references block 77
==18134==   references block 489
==18134==   references block 11889
==18134==   references block 11910
==18134==   references block 11924
==18134==  block 495
==18134==   size 64
==18134==   start address 0x5968100
==18134==   created by instruction    15491425
==18134==   last read by instruction  121564725
==18134==   last write by instruction 121564733
==18134==   access totals: 144 bytes read, 88 bytes written
==18134==   created by allocation:
==18134==    at 0x40271B3: malloc (vg_replace_malloc.c:263)
==18134==    by 0x401245B: dl_open_worker (dl-open.c:393)
==18134==    by 0x400D925: _dl_catch_error (dl-error.c:178)
==18134==    by 0x4011899: _dl_open (dl-open.c:569)
==18134==    by 0x4C3EF65: dlopen_doit (dlopen.c:67)
==18134==    by 0x400D925: _dl_catch_error (dl-error.c:178)
==18134==    by 0x4C3F2EB: _dlerror_run (dlerror.c:164)
==18134==    by 0x4C3EEE0: dlopen@@GLIBC_2.2.5 (dlopen.c:88)
==18134==    by 0x403DAB: XPCOMGlueLoad(char const*, unsigned int (**)(XPCOMFunctions*, char const*)) (nsGlueLinkingDlopen.cpp:252)
==18134==    by 0x402C8E: XPCOMGlueStartup (nsXPCOMGlue.cpp:77)
==18134==    by 0x40199C: main (nsBrowserApp.cpp:242)
==18134==   references block 69
==18134==   references block 77
==18134==   references block 489
==18134==   references block 11889
==18134==   references block 11910
==18134==   references block 11924

... (trimmed; the above is just the 2 first blocks).


EXPLANATION OF OUTPUT:

A "block" means the result of a single memory allocation.

Only blocks that have references to at least one block, are listed. If you don't see a block listed, that means it doesn't have any reference so isn't very interesting.

"block 495" opens a section describing a block. here, 495 is the creation number, meaning that that block was the 495th created one. There may be fewer actually live blocks at a given time, though.

"size 64" gives the block size in bytes

"created by instruction 15491425" gives the number of the instruction where the block got allocated. Think of it as a monotonic timestamp.

Ditto for "last read by instruction 121573555" and "last write by instruction 121573563". Higher values there mean that the block was accessed more recently.
Attachment #575942 - Attachment is obsolete: true
Peterv and Smaug are probably the people who spend the most time looking at cycle-collector-related leaks, so they may find this of interest.
This version has more useful logging; next version will identify strong references using markers (will require patching the application being debugged).
Attachment #580176 - Attachment is obsolete: true
This version of the Valgrind tool only works with specially patched mozilla to have markers to identify strong references: see next patch here.

The build / usage instructions in Comment 2 are still valid. Note that recent Valgrind SVN doesn't require --disable-jemalloc anymore. Just pass --soname-synonyms=somalloc=NONE to Valgrind.
Attachment #643243 - Attachment is obsolete: true
This patch adds markers in nsRefPtr and nsCOMPtr so that when we found such refptrs while scanning memory, we correctly identify them as strong references. 

There are some other strong references to care about such as nsCOMArray but it should handle the majority of cases already.

Applying this patch to mozilla-central is *mandatory* for running Refgrind against it.
Attached file Reference graph analysis tool (obsolete) —
This stand-alone program parses the logs produced by the Valgrind tool and produces information about all the strong reference cycles with call stacks to how they were created.

It requires the Boost Graph library. For example, on Debian-based distros:

  $ apt-get install libboost-graph-dev

Then it simply builds like this:

  $ c++ refgraph.cpp -O3 -o refgraph

Usage: refgraph reads from standard input and writes to standard output, so once you have recorded a log file from Refgrind, say refgrind.log, you can do:

  $ ./refgraph < refgrind.log | less
Attached file Example refgraph output (obsolete) —
Here's a sample refgraph run obtained as follows:

- apply mozilla-central patch above, build with --enable-valgrind
- apply latest valgrind patch, build valgrind as in comment 2
- download refgraph.cpp, build as in comment 7
- run firefox in valgrind as follows from the valgrind directory:

$ ./vg-in-place --smc-check=all-non-file --soname-synonyms=somalloc=NONE --tool=exp-refgrind ../mozilla-central/obj-firefox-valgrind/dist/bin/firefox -P test -no-remote http://lemonde.fr http://nytimes.com http://youtube.com 2>&1 | tee ~/refgrind.log

- run refgraph as in comment 7 on refgrind.log


Sample output:

There are 75466 blocks and 111450 strong references between them.

There are 322 reference cycles.


=========================================================

Reference cycle 0:

Contains 155 blocks, listed below:

block 2181746
has references to these blocks: 2181719, 2181932, 2182994
allocation call stack:
at 0x402B1FE: malloc (vg_replace_malloc.c:268)
by 0x403738B: PL_ArenaAllocate (plarena.c:200)
by 0x6CD5F8C: nsRuleNode::ComputeDisplayData(void*, nsRuleData const*, nsStyleContext*, nsRuleNode*, nsRuleNode::RuleDetail, bool) (nsRuleNode.cpp:4226)
by 0x6CD8419: nsRuleNode::WalkRuleTree(nsStyleStructID, nsStyleContext*) (nsStyleStructList.h:91)
by 0x6CD9936: nsRuleNode::GetStyleDisplay(nsStyleContext*, bool) (nsStyleStructList.h:91)
by 0x6CF0373: nsStyleContext::ApplyStyleFixups(nsPresContext*) (nsStyleContext.cpp:318)
by 0x6CF26B1: nsStyleContext::nsStyleContext(nsStyleContext*, nsIAtom*, nsCSSPseudoElements::Type, nsRuleNode*, nsPresContext*) (nsStyleContext.cpp:65)
by 0x6CF2717: NS_NewStyleContext(nsStyleContext*, nsIAtom*, nsCSSPseudoElements::Type, nsRuleNode*, nsPresContext*) (nsStyleContext.cpp:696)
by 0x6CF5F81: nsStyleSet::GetContext(nsStyleContext*, nsRuleNode*, nsRuleNode*, bool, bool, nsIAtom*, nsCSSPseudoElements::Type, bool, mozilla::dom::Element*) (nsStyleSet.cpp:579)
by 0x6CF77F1: nsStyleSet::ResolveStyleFor(mozilla::dom::Element*, nsStyleContext*, TreeMatchContext&) (nsStyleSet.cpp:955)
by 0x6B325E6: nsCSSFrameConstructor::ResolveStyleContext(nsStyleContext*, nsIContent*, nsFrameConstructorState*) (nsCSSFrameConstructor.cpp:4581)
by 0x6B326C1: nsCSSFrameConstructor::ResolveStyleContext(nsIFrame*, nsIContent*, nsFrameConstructorState*) (nsCSSFrameConstructor.cpp:4566)

[...]


Note that there is only one such huge (155 blocks) reference cycle. Others are much smaller. They are listed by decreasing order of size. The second cycle has 18 blocks, the majority of cycles have 2 -- 4 blocks.
Attachment #644176 - Attachment mime type: text/plain → application/x-xz
Known issues:
 - the strong reference markers only support direct nsRefPtr and nsCOMPtr ownership. arrays of pointers are not currently handled. nsCOMArray is not currently handled either.
 - does not understand nsIWeakReference.
 - possible issues with 32bit binaries.

Working on the above; meanwhile it also is clear that we don't really take advantage of valgrind here so we'd be better served by a hack-the-memory-allocator-to-introspect-blocks approach.
Updated refgrind tool. Handles nsTArray, nsCOMArray and PLD hashtables.

Forget what I said above about weak pointers, this is a non-issue. As nsWeakReference stores its pointer as a plain pointer, the tool ignores it.
Attachment #644169 - Attachment is obsolete: true
Corresponding updated mozilla-central patch adding markers.

Note that while all of this should work everywhere, it's only been tested on linux 64bit. 32bit and other OSes _should_ work.
Attachment #644170 - Attachment is obsolete: true
By the way, the tool found 2 reference cycles in netwerk/ which doesn't have any CYCLE macro... filed bug 777044.
Summary: Reference tracing → Reference analysis
Whiteboard: Instructions given in comment 8
Whiteboard: Instructions given in comment 8 → Instructions given in comment 8 [MemShrink]
Whiteboard: Instructions given in comment 8 [MemShrink] → Instructions given in comment 8 [MemShrink:P2]
Attached file Example refgraph output (obsolete) —
Attachment #644176 - Attachment is obsolete: true
New version: dumps all blocks, even those that don't have reference to other blocks. They are interesting after all for "what's holding on to that ref" uses.
Attachment #644172 - Attachment is obsolete: true
Attachment #645458 - Attachment is obsolete: true
Attachment #644176 - Attachment is obsolete: false
Attachment #644176 - Attachment is obsolete: true
Attachment #644172 - Attachment is obsolete: false
Depends on: 796999
Been trying to do that since the beginning. Indeed we are not really taking advantage of Valgrind here, as all we use it for is to enumerate all heap blocks; this is better done by instrumenting jemalloc, which this patch does.

This opens the door to much more convenient use -- my plan is to have a "about:something" page allowing to take snapshots at runtime and run analysis on them from within the browser.

This patch also turns on jemalloc 3 automatically -- don't be surprised.

Known caveats: slow, increased memory usage, about:memory is broken.
That sounds very similar at least in spirit to njn's "browser DMD":

"The only Valgrind features of note that DMD uses are:
- malloc interception
- stack trace recording (including getting line numbers) and matching"
> That sounds very similar at least in spirit to njn's "browser DMD":

Indeed, I also have a doesn't-use-Valgrind version of DMD in bug 717853 which uses the existing trace-malloc functionality for intercepting malloc and recording stack traces.
bjacob, if you are planning on recording stack traces, I suggest you look at trace-mallocs "callsite" struct, which is the basis of a space-efficient tree of stack traces that is rooted at |main|.
Thanks!

Nick:
I am very interested in recording stack traces, thanks for the pointer.

Andrew:
If I understand correctly, TraceMalloc works by recording a trace which can then be replayed to get the set of live blocks at a given time; whereas my jemalloc patch instruments blocks so they can be enumerated. So the approach is different, but of course the two approaches can be used to achieved the same thing.

I might not have pursued my approach if I had known about TraceMalloc and Nick's work, but now that it's done, it seems more suited to my task than TraceMalloc, because it doesn't have as much inherent overhead (I suppose that it's tricky to store/process that malloc trace as it might grow large over extended periods of time). My patch in its current form is quite slow because of mutex locking, but that mutex could be removed with moderate effort, and then the speed overhead would be really small. I was hoping to keep overhead small enough that Firefox testers/hackers would keep this instrumentation running during everyday usage.
Ah, I see now --- TraceMalloc is able to maintain a hash table of all blocks, immediately on malloc/free calls, without having to record a trace.
> Ah, I see now --- TraceMalloc is able to maintain a hash table of all
> blocks, immediately on malloc/free calls, without having to record a trace.

Yes.  And the storage of the stack traces is space-efficient because of the tree representation.
Depends on: replace-malloc
This build on the new replace_malloc stuff being developed in bug 804303.

Apply on top of bug 804303 patches and add this in your mozconfig:

  ac_add_options --enable-malloc-replace
  ac_add_options --enable-jemalloc

This builds a librefgraph_replace_malloc library that you can LD_PRELOAD. It uses NSPR (for PR_Lock) so you have to LD_PRELOAD NSPR as well. Like this:

LD_PRELOAD="`pwd`/obj-firefox-debug/dist/bin/libnspr4.so `pwd`/obj-firefox-debug/memory/replace/refgraph/librefgraph_replace_malloc.so" obj-firefox-debug/dist/bin/firefox -P foo -no-remote

This exposes an API to iterate over blocks and access block metadata (address and size) that you can find in the header file, refgraph_replace_malloc.h.

Specifically, everything is in a namespace refgraph, the list element struct is list_elem_t, the function to get the first list element is first_list_elem(), you get to the next list elem by using the elem->next pointer, and you can keep iterating as long as you don't hit the sentinel element returned by sentinel(). Other functions in this header allow to query the payload block address and size for a given list element.
Attachment #668730 - Attachment is obsolete: true
An update with what I currently have. Includes fixes, ability to dump ascii output to a file descriptor, ability to use STL containers thanks to a STL allocator bypassing the instrumentation, and dropped the dependency on NSPR by implementing our own mutexes.
Attachment #683601 - Attachment is obsolete: true
Comment on attachment 686550 [details] [diff] [review]
replace_malloc library allowing to iterate over blocks

Review of attachment 686550 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/replace/refgraph/refgraph.h
@@ +161,5 @@
> +struct list_elem_t
> +{
> +  // Marker allowing to identify instrumented blocks. Useful as a debugging helper.
> +  // Can be removed if memory usage is a concern.
> +  MOZ_ALIGNED_DECL(uint64_t, list_elem_alignment) marker;

Unless you're allocating on stack, MOZ_ALIGNED_DECL is useless here.

::: memory/replace/refgraph/refgraph_replace_malloc.cpp
@@ +117,5 @@
> +    return 0;
> +  }
> +
> +  while (alignment < sizeof(list_elem_t))
> +    alignment *= 2;

alignment = std::max(alignment, MOZ_ALIGNOF(list_elem_t));

@@ +288,5 @@
> +size_t replace_malloc_good_size(size_t size)
> +{
> +  const size_t extra_space = sizeof(list_elem_t);
> +
> +  return gMallocFuncsTable->malloc_good_size(size) + extra_space;

This seems wrong. This should probably be gMallocFuncsTable->malloc_good_size(size + extra_space) or gMallocFuncsTable->malloc_good_size(size), 
depending how transparent you want this replacement library to be (since malloc_good_size has an influence on the subsequently requested size).
(In reply to Mike Hommey [:glandium] from comment #25)
> Comment on attachment 686550 [details] [diff] [review]
> replace_malloc library allowing to iterate over blocks
> 
> Review of attachment 686550 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: memory/replace/refgraph/refgraph.h
> @@ +161,5 @@
> > +struct list_elem_t
> > +{
> > +  // Marker allowing to identify instrumented blocks. Useful as a debugging helper.
> > +  // Can be removed if memory usage is a concern.
> > +  MOZ_ALIGNED_DECL(uint64_t, list_elem_alignment) marker;
> 
> Unless you're allocating on stack, MOZ_ALIGNED_DECL is useless here.

Not true :-) What I want is for sizeof(list_elem_t) to be a multiple of 16 bytes, so that by sticking a list_elem_t before by heap blocks, I implicitly preserve any alignment property that they had up to 16 bytes alignment (which is the highest that I know malloc to have on any platform). By adding this alignment attribute on a field in this struct, I force the size of the struct to be a multiple of that alignment, otherwise arrays of that struct couldn't honor that alignment requirement. More generally, for that reason, size is always a multiple of alignment.

Example:

#include <iostream>

using namespace std;

struct a {
  char c;
};

struct b {
  __attribute__((aligned(16))) char c;
};

int main() {
  cout << "size of a is " << sizeof(a) << endl;
  cout << "size of b is " << sizeof(b) << endl;
};

$ g++ b.cpp -o b && ./b
size of a is 1
size of b is 16


> 
> ::: memory/replace/refgraph/refgraph_replace_malloc.cpp
> @@ +117,5 @@
> > +    return 0;
> > +  }
> > +
> > +  while (alignment < sizeof(list_elem_t))
> > +    alignment *= 2;
> 
> alignment = std::max(alignment, MOZ_ALIGNOF(list_elem_t));

Thanks!

> 
> @@ +288,5 @@
> > +size_t replace_malloc_good_size(size_t size)
> > +{
> > +  const size_t extra_space = sizeof(list_elem_t);
> > +
> > +  return gMallocFuncsTable->malloc_good_size(size) + extra_space;
> 
> This seems wrong. This should probably be
> gMallocFuncsTable->malloc_good_size(size + extra_space) or
> gMallocFuncsTable->malloc_good_size(size), 
> depending how transparent you want this replacement library to be (since
> malloc_good_size has an influence on the subsequently requested size).

Ah, thanks. I didn't have a precise idea of how that works.
(In reply to Benoit Jacob [:bjacob] from comment #26)
> (In reply to Mike Hommey [:glandium] from comment #25)
> > > +  while (alignment < sizeof(list_elem_t))
> > > +    alignment *= 2;
> > 
> > alignment = std::max(alignment, MOZ_ALIGNOF(list_elem_t));
> 
> Thanks!

Wait, no! What I want here is the smallest power of two that is >= sizeof(list_elem_t).
> Wait, no! What I want here is the smallest power of two that is >= sizeof(list_elem_t).

You still probably don't want to write this as a loop, since that's hard for the compiler to optimize statically.

How about http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2 ?
(In reply to Benoit Jacob [:bjacob] from comment #27)
> Wait, no! What I want here is the smallest power of two that is >=
> sizeof(list_elem_t).

Do you really need that?
(In reply to Justin Lebar [:jlebar] from comment #28)
> > Wait, no! What I want here is the smallest power of two that is >= sizeof(list_elem_t).
> 
> You still probably don't want to write this as a loop, since that's hard for
> the compiler to optimize statically.
> 
> How about
> http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2 ?

This is absolutely not performance-sensitive :-)

(In reply to Mike Hommey [:glandium] from comment #29)
> (In reply to Benoit Jacob [:bjacob] from comment #27)
> > Wait, no! What I want here is the smallest power of two that is >=
> > sizeof(list_elem_t).
> 
> Do you really need that?

Yes. This value is not only the alignment that we are going to pass to posix_memalign, it is also the amount of extra space that we are going to allocate. So if list_element_t has alignment 16 and size 48, we really want to pass 64 here (smallest POT >= 48), not 16, as 16 bytes wouldn't be enough room to store the list_elem_t.
> So if list_element_t has alignment 16 and size 48, we really want to
> pass 64 here (smallest POT >= 48), not 16, as 16 bytes wouldn't be enough
> room to store the list_elem_t.

If list_elem_t has alignment 16 and size 48, ptr + 16, ptr + 32, ptr + 48, etc. still have an alignment of 16.

For posix_memalign, alignment must be a power of two and a multiple of sizeof(void *). For memalign, a power of two. Likewise for aligned_alloc, and for valloc alignment is PAGE_SIZE. So alignment is always a power of two.

If the requested alignment is smaller than 16, an alignment of 16 for the whole will just work. Because ptr + 48 is still aligned for any power of 2 below 16.

For alignments greater than the next power of two after sizeof(list_elem_t) (32 on x86, 64 on x86-64), that alignment will work if you place the data at ptr + alignment.

For alignments between 16 and the next power of two after sizeof(list_elem_t) (32 on x86-64), an alignment of 32 works if you place the data at ptr + 64 (smallest multiple of 32 greater than 48, which is sizeof(list_elem_t) on x86-64).

So an alignment of std::max(alignment, MOZ_ALIGNOF(list_elem_t)) always works.
The offset at which you need to place the data varies depending on that value:
<= 16, it is sizeof(list_elem_t)
= 32, it is 32 on x86, and 64 on x86-64
> 32, it is the alignment.
OK, but currently the code is doing:

 gMallocFuncsTable->aligned_alloc(alignment, size + alignment)

So alignment is used both for the alignment and for the amount of extra space, which is why I wanted a value for alignment that would satisfy both. That makes things simpler AFAICS, but if I'm missing something and what you propose is actually simple, a patch is welcome. I am not really trying to minimize memory or speed overhead here, as both are already negligible at least in the case of Firefox. I am more trying to get to the interesting stuff fast.

Moreover, does MOZ_ALIGNOF work for you? Here when I try to use it, the linker complains that it can't find mozilla::AlignmentFinder<link_elem_t>::alignment. I am using GCC 4.6.
Attached patch part 1: dump graph of references (obsolete) — Splinter Review
This implements enough to be able to dump the whole graph of references, to a file or file descriptor.

To build, apply first the patches in bug 804303, and configure with --enable-replace-malloc.

Tested only on Linux x86-64. Should work as-is on 32bit Linux. Mac and Android are untested. Windows not supported at the moment.

To run, LD_PRELOAD librefgraph_replace_malloc.so, e.g.:

  LD_PRELOAD="/path/to/firefox_object_directory/memory/replace/refgraph/librefgraph_replace_malloc.so"

The public API is:

void Dump(int fileDescriptor);
void DumpToFile(const char* filename);

For example in GDB, try this:

(gdb) call refgraph::DumpToFile("/home/bjacob/log")

The resulting file looks like a series of patterns like this:

b 7fffcdf08a50 b0
a 7ffff776eded
a 7ffff2f53a53
a 7ffff2f527e1
a 7ffff270ff67
w 7fffcdf08950
s 7fffcdf23490
w 7fffce1777f0
w 7fffce177cf0
s 7fffce18c8d0
s 7fffce18d7f0
h 7fffce1da8d0
w 7fffce1fc940
w 7fffce446c90
w 7fffce446e10
w 7fffce4f0fd0

How to read this:
 - all numbers are in hex
 - b X Y means we are starting to describe a new block at address X, size Y
 - a X gives an allocation call stack frame. We give only 4 frames, as that is enough to identify types, which is enough for the present tool.
 - w X gives a weak reference (typically, a plain raw pointer) to the block at address X
 - s X gives a strong reference (e.g. nsRefPtr) to the block at address X
 - h X gives a hereditary strong reference i.e. all weak references found in the referenced block must actually be considered as strong references. This is only useful to account for nsCOMArray. The nsCOMArray has a hereditary strong reference to its array storage, so that the nsISupports* pointers found there are correctly accounted for as strong references.

I have a detailed plan for what's next, will post here asap.
Attachment #644172 - Attachment is obsolete: true
Attachment #645461 - Attachment is obsolete: true
Attachment #645572 - Attachment is obsolete: true
Attachment #649935 - Attachment is obsolete: true
Attachment #686550 - Attachment is obsolete: true
Whiteboard: Instructions given in comment 8 [MemShrink:P2] → Go to comment 33 [MemShrink:P2]
Attached patch part 1: dump graph of references (obsolete) — Splinter Review
forgot hg add
Attachment #688635 - Attachment is obsolete: true
Attached patch part 1: dump graph of references (obsolete) — Splinter Review
Woohoo! This new version has many improvements. The most notable are:
 * Collects type information by instrumenting the places where we call AddRef, as these are places where we know the C++ type of the object stored at a given address. Requires RTTI.
 * Instrumentation of pointer/container classes now complete AFAICT.
 * Tweaks arenas so that they aren't actually arenas, i.e. put each object into a separate heap block, so that they don't get it the way of instrumentation.
 * Dumps offsets into blocks, when dumping references, so that we can better understand any 'arena' issue (multiple objects in same block), but I don't know that any such issue remains.
 * Dumps a legend at the beginning, and overall statistics at the end.
Attachment #688637 - Attachment is obsolete: true
Attachment #690118 - Attachment is patch: true
By the way, the build requirements are:
 - apply the patches in bug 804303
 - configure with --enable-replace-malloc --enable-cpp-rtti
Attached patch part 1: dump graph of references (obsolete) — Splinter Review
Some more improvements, especially: using pthreads mutexes instead of homemade spinlock.

I tried again NSPR, but got very strange crashes, which I can only explain as follows: the library we preload links to system NSPR and upon preloading it, it will drag in system NSPR, and some memory allocation will take place before our instrumentation, leading to a mismatch. pthreads seems to be free of such problems so far.

I also tried keeping the homemade mutex and using sched_yield to make it not busy-wait too much, but I got linking errors that I couldn't figure out.
Attachment #690118 - Attachment is obsolete: true
> I tried again NSPR, but got very strange crashes

FYI, if you want to make this work on B2G, your library needs not to depend on libxul at all (so can't link with -lxul).  There's chatter in the newdmd bug about this if you're curious why.
Sure; I didn't think of linking to libxul, just linking to NSPR --- but even that seems not easy to do, so currently it links to neither.
Attachment #690163 - Attachment is obsolete: true
This work has moved to github: the 'refgraph' branch in my mozilla-central fork

https://github.com/bjacob/mozilla-central/tree/refgraph

HOWTO:

0. This probably only works on desktop linux for now due to a couple of non-portable calls. Should be easy to port to every other platform. Easiest should be Mac OSX. The design should work well for mobile platforms as the data collection is separate from the UI (we could dump to a file, adb pull, and examine on a desktop machine).
1. git clone git@github.com:bjacob/mozilla-central.git
2. go to the refgraph branch
3. build with these configure options:
      --enable-replace-malloc
      --enable-cpp-rtti
4. go to about:refgraph. It takes a snapshot immediately.
5. search for a type, or a pattern matched by a type name, like "Image". It will show the types matching this pattern, and for the top hit, will dump a small part of the graph around a few objects of that type. Yes, a better UI is planned.
6. The "traversed by CC" part is based on prior CC runs. Does not yet automatically trigger the CC. So if you want that part to work for now you have to manually trigger the CC (e.g. in about:memory) and then reload about:refgraph.
7. It is then able to correctly detect non-cycle-collected cycles, see bug 827541 comment 16.
Oh yes, and when running this you still have to LD_PRELOAD the library as explained in comment 33.
Whiteboard: Go to comment 33 [MemShrink:P2] → Go to comment 40 [MemShrink:P2]
Just to let you know, stuff is up and running there:

https://github.com/bjacob/mozilla-central

Instructions are on the wiki:

https://github.com/bjacob/mozilla-central/wiki

I probably won't update this bug much anymore, so I'll mark as FIXED. There's more to come though. Currently only Linux and B2G are known to work. Other platforms should be easy to support though. The worst part was supporting both 32bit and 64bit, but that's done now.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Hmm, wait.  Are you going to land this work?  I think it is extremely useful!
It is relatively large and intrusive, so, before talking of landing it, let's prove its usefulness by using it to find and fix leaks. After a certain number of leaks fixed, we would have a stronger case.

It's not even 100% clear to me that we need to land it soon, as the current solution of a github fork is easy enough to use and git seems to be good enough at merging.

Let's use [found-by-refgraph] in the status whiteboard.

Query:
https://bugzilla.mozilla.org/buglist.cgi?status_whiteboard_type=allwordssubstr;query_format=advanced;status_whiteboard=found-by-refgraph;list_id=5454103

Found one already in B2G: bug 834470
Anyone still interested in this should now be insterested in DMD / bug 1014346 instead.
Whiteboard: Go to comment 40 [MemShrink:P2] → Go to comment 40 [MemShrink:P2] Being superseded by DMD, see bug 1014346
See Also: → 1014346
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: