
Need API to Free memory back to OS #278

Closed
alk opened this issue Aug 22, 2015 · 12 comments

Comments


alk commented Aug 22, 2015

Originally reported on Google Code with ID 275

Our application runs on 64-bit SUSE Linux Enterprise Server 10.

The application allocates around 2-3 GB of data for specific calls.

We are linking against TCMALLOC version 1.5 and we see that memory keeps growing and is
not returned to the OS.

We would like a way to return unused memory to the OS at periodic intervals. The process
cannot be recycled because it is a production system and we are communicating with other
live applications.

We tried ReleaseFreeMemory() but it is not reducing the heap size. Is there any way
to return memory to the OS?
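
For reference, a minimal sketch of this kind of periodic release using gperftools' MallocExtension interface; the timer thread and interval below are illustrative, not something from the original report:

```cpp
#include <google/malloc_extension.h>  // gperftools 1.x; newer releases install <gperftools/malloc_extension.h>

#include <chrono>
#include <thread>

// Illustrative sketch: periodically ask tcmalloc to return the free pages in
// its page heap to the OS (tcmalloc releases them via madvise on freed spans).
void StartPeriodicRelease(int interval_seconds) {
  std::thread([interval_seconds] {
    for (;;) {
      std::this_thread::sleep_for(std::chrono::seconds(interval_seconds));
      MallocExtension::instance()->ReleaseFreeMemory();
    }
  }).detach();
}
```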

Regards
Sundari.

Reported by sundarij on 2010-10-11 10:08:02


alk commented Aug 22, 2015

When you say it's using a lot of memory, do you mean virtual memory or physical memory?

Has there been any actual problem due to the memory growing?  Or just something you're
concerned about?

The best way to figure out what's going on is to put in a call to MallocExtension::instance()->GetStats(),
and print out the resulting buffer.  This will tell us where tcmalloc thinks the memory
is.
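
A minimal sketch of such a call, assuming the gperftools 1.x header layout (the buffer size is arbitrary):

```cpp
#include <google/malloc_extension.h>  // newer releases: <gperftools/malloc_extension.h>
#include <cstdio>

// Illustrative sketch: fill a buffer with tcmalloc's human-readable stats
// (heap size, page-heap free/unmapped bytes, cache sizes, ...) and print it.
void DumpTcmallocStats() {
  char buf[8192];
  MallocExtension::instance()->GetStats(buf, sizeof(buf));
  fputs(buf, stdout);
}
```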

Reported by csilvers on 2010-10-11 20:21:01

  • Labels added: Type-Defect, Priority-Medium


alk commented Aug 22, 2015

Let me explain a little more.

We have multiple instances of the same process running on the Linux server. Our server
has 32 GB RAM.

Each process in turn reads large files, gigabytes in size. When we read these large files,
we cannot avoid the huge memory allocation, so the process ends up allocating
around 1 GB of physical memory and 2 GB of virtual memory (a typical example).

Once we are done with the necessary operation, the process doesn't really need to
occupy that much memory.

We need this because we have other processes that need the physical and virtual
memory. Not all processes are active at the same time, but we expect a process to
give up memory once it is done so that the memory can be used by other processes.

We typically have anywhere between 6 and 8 processes; each can consume an average
of 3 GB per read loop.

What we have observed is that with GPT we are not able to give up memory, and the
process keeps using more memory. So 3 GB becomes 6 GB and higher, and as a result
the other processes just eat up the swap and sometimes hang our systems.

I will run GetStats() and get an output for further discussion.

Sundari.

Reported by sundarij on 2010-10-12 14:31:04


alk commented Aug 22, 2015

Most of the bytes are in the unmapped page heap after the ReleaseFreeMemory() call. Given
below is the output of GetStats() before and after the ReleaseFreeMemory() call.

Initial Heap Size : 47448064

 Stats : ------------------------------------------------
MALLOC:     47448064 (   45.2 MB) Heap size
MALLOC:      2246192 (    2.1 MB) Bytes in use by application
MALLOC:     32825344 (   31.3 MB) Bytes free in page heap
MALLOC:     10260480 (    9.8 MB) Bytes unmapped in page heap
MALLOC:       939456 (    0.9 MB) Bytes free in central cache
MALLOC:            0 (    0.0 MB) Bytes free in transfer cache
MALLOC:      1176592 (    1.1 MB) Bytes free in thread caches
MALLOC:          998              Spans in use
MALLOC:            1              Thread heaps in use
MALLOC:      5373952 (    5.1 MB) Metadata allocated
MALLOC:         4096              Tcmalloc page size
------------------------------------------------
 Stats After ReleaseFreeMemory: ------------------------------------------------
MALLOC:     47448064 (   45.2 MB) Heap size
MALLOC:      2246192 (    2.1 MB) Bytes in use by application
MALLOC:            0 (    0.0 MB) Bytes free in page heap
MALLOC:     43085824 (   41.1 MB) Bytes unmapped in page heap
MALLOC:       939456 (    0.9 MB) Bytes free in central cache
MALLOC:            0 (    0.0 MB) Bytes free in transfer cache
MALLOC:      1176592 (    1.1 MB) Bytes free in thread caches
MALLOC:          997              Spans in use
MALLOC:            1              Thread heaps in use
MALLOC:      5373952 (    5.1 MB) Metadata allocated
MALLOC:         4096              Tcmalloc page size
------------------------------------------------
Heap Size After ReleaseFreeMemory: 47448064

Reported by sundarij on 2010-10-12 14:54:58


alk commented Aug 22, 2015

Using lots of virtual memory shouldn't be a problem: I think you have 48 bits of virtual
memory on an x86_64 system?  Or maybe 47?  That's plenty.

Physical memory could be an issue.  However, GetStats is showing that the heap size
is the same before and after ReleaseFreeMemory.  So it looks like it's doing what it
ought to.  tcmalloc thinks the app has less than 100 MB of memory mapped.  Is top (or
ps, or whatever you're using) showing more?

One issue could be overhead due to sampling.  In tcmalloc 1.6, I changed the default
to not sample at all.  Do you want to try upgrading to tcmalloc 1.6 and see if that
fixes the problems you're seeing?  You could also just try running with the
environment variable TCMALLOC_SAMPLE_PARAMETER=0.

Reported by csilvers on 2010-10-12 22:12:25


alk commented Aug 22, 2015

Thanks for the analysis. I am using version 1.6; whatever I shared is from 1.6, and my
problem is that I want to release the heap because I don't need it in the process!

Sundari.

Reported by sundarij on 2010-10-13 04:54:42


alk commented Aug 22, 2015

Unmapped bytes *are* released.  We'll clean up this wording in the next release to make
it clearer -- I admit it's really confusing right now.  These are bytes that have been
released to the OS (via a madvise call).  The stats you're showing me indicate everything
is working like it should.

Just to be clear, when you said you were using tcmalloc 1.5 at the top of this bug
report, that was a typo?  You're actually using 1.6?

} What we have observed is that with GPT we are not able to give up memory, and the
} process keeps using more memory. So 3 GB becomes 6 GB and higher, and as a result
} the other processes just eat up the swap and sometimes hang our systems.

Are you certain that's what's happening: processes are swapping because of the memory
demands of the binaries?  Or is that just a hypothesis you have right now?  I want
to be clear, because from what I'm seeing, that shouldn't be happening.

Reported by csilvers on 2010-10-13 05:27:33


alk commented Aug 22, 2015

One possibility is that madvise() is failing for you, so the bytes aren't actually being
returned to the system properly.  You can test this by looking at src/system_alloc.cc,
at the madvise call.  Right now we ignore the return value, but you can check whether
it's -1 (and not EAGAIN), and maybe print something out then.  If you see that printout
when you're running, then that's an interesting tidbit.
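
A hedged sketch of the kind of check being described; this is not the actual gperftools source, just an illustration of testing madvise()'s return value on the release path:

```cpp
#include <sys/mman.h>

#include <cerrno>
#include <cstddef>
#include <cstdio>

// Illustrative only: roughly what a temporary diagnostic around the madvise()
// call in src/system_alloc.cc could look like.
static void ReleasePagesToSystem(void* start, size_t length) {
  if (madvise(start, length, MADV_DONTNEED) == -1 && errno != EAGAIN) {
    fprintf(stderr, "tcmalloc: madvise(MADV_DONTNEED) failed, errno=%d\n", errno);
  }
}
```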

Reported by csilvers on 2010-10-13 05:48:14


alk commented Aug 22, 2015

Answering both of your responses in this one reply.

Our software is using GPT 1.5, but I created a utility to simulate this issue. The
utility uses 1.6. I forgot to mention that in my posts; sorry about that.

Today we did some more characterization using the 1.6 version and the utility.

We verified that the madvise() call is indeed returning 0, which shows there are no failures.

We also profiled a real large data allocation and free.

We had always been looking at the HEAP SIZE value from GPT and did not pay attention to
the top output. Today we tried to correlate the two.

Here is what we observed: the HEAP SIZE we see in GPT is very close to the VIRT value
reported in top for the process.

Our problem is that we see a LARGE amount of VIRTUAL memory that the process is holding
on to after the release call. Is there a way to free that?

Given below is the data:

Memory allocated in the process:

Top output:
-------------------------------
Virt : 1240m
RES : 1.1g

GPT output:
--------------------------------

Heap Size : 1267204096

 Stats : ------------------------------------------------
MALLOC:   1267204096 ( 1208.5 MB) Heap size
MALLOC:      2263680 (    2.2 MB) Bytes in use by application
MALLOC:   1260503040 ( 1202.1 MB) Bytes free in page heap
MALLOC:            0 (    0.0 MB) Bytes unmapped in page heap
MALLOC:      1777056 (    1.7 MB) Bytes free in central cache
MALLOC:        62464 (    0.1 MB) Bytes free in transfer cache
MALLOC:      2597856 (    2.5 MB) Bytes free in thread caches
MALLOC:         2205              Spans in use
MALLOC:            1              Thread heaps in use
MALLOC:     11141120 (   10.6 MB) Metadata allocated
MALLOC:         4096              Tcmalloc page size
------------------------------------------------

AFTER CALLING RELEASEFREEMEMORY

Top output:
------------------------------------------------
RES : 19m
Virt : 1240m


MALLOC:   1267204096 ( 1208.5 MB) Heap size
MALLOC:      2263680 (    2.2 MB) Bytes in use by application
MALLOC:            0 (    0.0 MB) Bytes free in page heap
MALLOC:   1260503040 ( 1202.1 MB) Bytes unmapped in page heap
MALLOC:      1777056 (    1.7 MB) Bytes free in central cache
MALLOC:        62464 (    0.1 MB) Bytes free in transfer cache
MALLOC:      2597856 (    2.5 MB) Bytes free in thread caches
MALLOC:         2205              Spans in use
MALLOC:            1              Thread heaps in use
MALLOC:     11141120 (   10.6 MB) Metadata allocated
MALLOC:         4096              Tcmalloc page size

We see that RES memory did come down after the release call. We would also like to
release the ~1 GB of VIRT memory that the process is still holding on to.

Regards
Sundari.

Reported by sundarij on 2010-10-13 09:05:59


alk commented Aug 22, 2015

OK, sounds like things are working as they ought.  Even when we release the memory back
to the system, it stays in our virtual address space for the kernel's accounting
purposes.  However, no physical memory is used, and it should not cause any problems.

Are you actually seeing problems in practice (with tcmalloc 1.6)?  Or are you just
seeing these big numbers and being concerned?  If you are seeing problems, what problems
are you seeing, precisely?

Reported by csilvers on 2010-10-13 19:20:36


alk commented Aug 22, 2015

Thanks Silver. All problems we saw were with GPT 1.4. We have not upgraded to GPT 1.6
as yet.

We started this exercise because one of our Linux servers, which was running 8 processes,
hung after running out of swap space.

Our application uses GPT 1.4, and we also don't call ReleaseFreeMemory(). We started
this exercise to see if there are ways to reduce the memory footprint per process.

Initially we were not even sure where the problem was (whether there were memory leaks).

As these are production systems, a GPT upgrade might not be possible immediately. We
want to keep the version at 1.4 if possible for this product family.

I ran the same process with GPT 1.4; I don't see the mapped and unmapped bytes
after free. GetStats() in GPT 1.4 just reports free bytes in the heap, but the memory
definitely goes down, similar to GPT 1.6.

I will introduce the ReleaseFreeMemory() call in our application so that we give
memory back to the OS.

One question I still have is about the bytes remaining in VIRTUAL ADDRESS SPACE. Our
servers have 6 GB of swap allocated and 32 GB of RAM.

If we run 8 processes and each process reserves 1 GB of SWAP space, will we run out
of swap? I would like to understand the implications of this scenario.

ReleaseFreeMemory() seems to solve the issue with respect to physical memory. 

Thanks a ton for your immediate response and support! Greatly appreciated!

Regards
Sundari


Reported by sundarij on 2010-10-14 08:37:30


alk commented Aug 22, 2015

Just to be clear, virtual memory is not the same as swap.  Assuming you're on a 64-bit
machine, you have (I think) 64000 gigabytes of virtual memory, so you're not likely
to be running out of it.

The tcmalloc stats report virtual memory use (which is what userspace typically gets
to see).  The stuff in 'unmapped in page heap' is definitely *not* taking physical
memory.  If you're seeing lots of physical memory being used, it must be from the other
numbers.

} I will introduce the ReleaseFreeMemory() call in our application so that we give
} memory back to the OS.

That's a good idea.  We should emphasize that more in the docs.  I'll try to figure
out the right wording.

} As these are production systems, a GPT upgrade might not be possible immediately. We
} want to keep the version at 1.4 if possible for this product family.

That should be fine.  You can try setting the environment variable TCMALLOC_SAMPLE_PARAMETER=0
before running your program, and see if that helps.

Reported by csilvers on 2010-10-14 21:29:38


alk commented Aug 22, 2015

Closing this bug -- I don't think tcmalloc is doing anything wrong here.  The wording
of the memory-use message has been improved since perftools 1.6, to make it clearer
that virtual memory use isn't causing any problems.

I suspect the sampling is what's really causing issues here, since it doesn't show
up in the tcmalloc memory use output.  Since we turn off sampling by default in the
latest perftools, that could be considered resolved now too. :-)

Reported by csilvers on 2011-09-01 01:53:46

  • Status changed: NotABug
