Apparent memory leak #6876

Closed
rustyrussell opened this issue Oct 23, 2015 · 26 comments

@rustyrussell
Contributor

See http://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-October/011511.html and thread.

@pstratem
Contributor

#6872

@TheBlueMatt
Contributor

#6872 is likely unrelated. The issue seems to be something stemming from getblocktemplate-related memory usage.

@older-c-coder

Has anyone tried the Qt version to see if it has the same problem?

And BTW, are the release versions (of the daemon & Qt) compiled in debug mode?

Last question: can a unit test be cobbled up to test for this memory issue?

Ron

@jtoomim

#6872 may be related. CreateNewBlock() makes fairly extensive use of CCoinsViewCache. I ran a short (4-hour) test with valgrind --tool=massif, and CCoinsViewCache came up as using a lot of memory. A much longer run is needed for this apparent memory leak to show up clearly; I'm about 24 hours into what I plan to be a 40- or 50-hour massif test, and I'll post those results when I have them. It might take even longer than 50 hours for massif to get good results: it's taking bitcoind under valgrind about 30 seconds to get through a getblocktemplate call, rather than the usual <= 1 sec. I don't know if the leak is related simply to the number of times GBT is called, or if it depends on the uniqueness of the mempool each time GBT is called.

@older-c-coder I have not tested with Qt. A unit test would likely be difficult to write, since the issue is only apparent in terms of RSS usage after calling GBT frequently for around 24 hours. It's possible that this process could be accelerated (e.g. create a fake mempool from block X, run GBT/CNB, increment X, and repeat). Even then, such a test is likely to take a while, and we don't yet know whether that approach would actually trigger the leak.

@morcos
Member

There has been IRC discussion of this in bitcoin-core-dev on the 20th and 21st.
I do not think there is an actual "leak". The increased memory usage is mostly a result of the effectively unlimited size of CCoinsViewCache discussed in #6872, combined with poor memory-allocator behavior.
The problem is triggered by calling getblocktemplate, which calls CreateNewBlock. In CreateNewBlock, every single tx in your memory pool is pulled into the in-memory cache of both pcoinsTip and a cache view on top of that. The memory allocator also places these allocations in a different arena for every thread that calls getblocktemplate. Although this memory is freed at the end of CreateNewBlock, there seems to be enough fragmentation that some memory is still in use towards the end of the heap, so the bulk of the memory cannot be returned to the OS. This can cause up to the number of RPC threads (default: 4) times the memory needed for one full memory pool's worth of chainstate to remain resident. In addition, depending on the size of your dbcache, if one of the calls to getblocktemplate causes a rehash of the unordered map in your pcoinsTip cache, that rehash can also take up space in another memory arena, because it has now been allocated from a different thread.

I'm working on a short term smaller rewrite of CreateNewBlock which will not require pulling the entire memory pool worth of txs into the cache, which should make the problem much better.

@jtoomim

http://toom.im/files/massif.out.27357.log
Or the raw output, if you want to run ms_print on it with your own options:
http://toom.im/files/massif.out.27357

(For best viewing, disable word wrap, like with less -S massif.out.27357.log.)

The highest memory usage appears to be at timestamp 47. Search for "47 68" to jump there. It looks like at that time, a total of 46.64% of the heap (411 MB) was in use by CCoinsViewCache via HTTPReq_JSONRPC and CreateNewBlock. The two offending lines in miner.cpp appear to be 175, if (!view.HaveCoins(txin.prevout.hash)), and 348, if (!TestBlockValidity(state, *pblock, pindexPrev, false, false)), the latter of which results in a HaveCoins call later.

@laanwj
Member

TL;DR from IRC: this seems specific to the glibc multi-thread arena memory allocator. The application deallocates the memory properly, but pages are not given back to the system after deallocation (probably due to fragmentation from allocating lots of small objects). The workaround we determined on IRC is export MALLOC_ARENA_MAX=1, see https://gist.github.com/laanwj/efe29c7661ce9b6620a7#linux-specific

@pstratem
Contributor

Can confirm the work around works. ( export MALLOC_ARENA_MAX=1 )

@jtoomim

Seems to be working for me too: 724 MB RSS after 10 hours. Ten hours isn't long enough to be certain, though; I'll chime in tomorrow if it's still a problem.

Edit: it's a day later, and RSS is at 1020 MB. Normally, I would expect memory usage near 1500 MB by now. Seems that MALLOC_ARENA_MAX is working.

@flound1129

2.9G RSS after restarting with export MALLOC_ARENA_MAX=1 this morning (about 14 hours ago). Still undesirably high, IMO.

@TheBlueMatt
Contributor

@flound1129 what platform are you on, and what is your -maxmempool setting and -dbcache setting?

@flound1129

uname -a

Linux p1 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty

haven't changed maxmempool or dbcache setting from defaults.

@TheBlueMatt
Contributor

Wait, are you running master with the mempool limiting stuff in, or a released version?


@flound1129

This is on 0.11.1

@sipa
Member

sipa commented Oct 28, 2015 via email

@flound1129

So you're saying this issue is fixed in master? Or you would like me to test with master? My mempool is only around 162 megs.

@jtoomim

No, sipa is saying there are two possible reasons why your memory consumption could be high. One is that you are running into the multi-arena allocation/fragmentation issue described in this issue (#6876). The other is that your mempool is getting bloated by the flood of 14780-byte transactions we're seeing right now.

Try sticking minrelaytxfee=0.0002 into your ~/.bitcoin/bitcoin.conf file, or running with -minrelaytxfee=0.0001 on the command line. That should reduce your mempool size by about 161 MB.
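For reference, the config-file version of this suggestion would look like the fragment below (note the comment above gives two slightly different values, 0.0002 for the config file and 0.0001 for the command line; 0.0002 is shown here, and the unit is BTC/kB):

```
# ~/.bitcoin/bitcoin.conf
# Raise the relay fee floor so low-fee spam transactions are not
# accepted into the mempool (value in BTC per kB).
minrelaytxfee=0.0002
```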

@flound1129

I'm not worried about having a large mempool, or really even worried about 3GB of total usage. I'm only concerned if there is an actual leak.

@TheBlueMatt
Contributor

I don't think we've seen any evidence of an actual leak, just bloated mempools and poor memory-allocation algorithms.

@jameshilliard
Contributor

I was seeing this as well on my pool's node; it would appear setting limitfreerelay=0 resolves it. This bug appears to cause severe GBT latency that gets worse the longer your node is up. Using my pool block-update benchmarking tool, my pool went from near the bottom of the list to the top simply by setting limitfreerelay=0. My guess is there is a bug or leak related to mempool handling of free transactions.

@sipa
Member

sipa commented Nov 7, 2015 via email

@jameshilliard
Contributor

I noticed there was a massive discrepancy between "currentblocksize": 473630 from getmininginfo and "bytes": 6934459 from getmempoolinfo, even though the transaction count was roughly the same between those two RPC calls. Setting limitfreerelay=0 resolved that, and they now match. BTW, I also run minrelaytxfee=0.0001.

@jtoomim

I believe jameshilliard's issue to be different from the memory fragmentation issue reported in this thread. As far as I understand it from IRC, he was getting a lot of the 14790-byte txns into his mempool via the default limitfreerelay, like these two (from his getrawmempool RPC):

https://www.blocktrail.com/BTC/tx/951761082b38167ca92a94cce9f0783157434ffc66ff39715b1ac30a72cd6611

https://www.blocktrail.com/BTC/tx/5935e25d931734e42e0fff881df79ea9e0d3f4c9d552e9ac20f8825b29049f0a

These two transactions were never getting evicted from his mempool, and were not getting incorporated into blocks by other miners. They were 2 of the 4 transactions that I checked, indicating that very roughly 50% of his mempool was filled with spam. Many more of them were in his mempool than could fit in his 5% priority block space, so his GBT block sizes were kept small even though his mempool was bloated. He was complaining about mempool size, not RSS usage. I do not think he was encountering any bugs, just features that he did not understand and that may not have been designed optimally in terms of UX.

@jtoomim

"Using my pool block update bench-marking tool my pool went from near the bottom of the list to the top by simply setting limitfreerelay=0." -- Actually, it's probably just that you restarted your server, which clears out the mempool. Setting limitfreerelay=0 would change the long-term slope of your GBT latency graph and mempool size to be closer or equal to zero, but the y intercept will be low in any case.

@jameshilliard
Contributor

Actually, it's probably just that you restarted your server, which clears out the mempool.

I've let it run for long enough that I don't think that's the issue; before setting that flag, GBT performance would degrade in a matter of hours. After changing limitfreerelay to 0, "currentblocksize" from getmininginfo and "bytes" from getmempoolinfo are now the same. The transaction counts were always roughly the same between getmininginfo and getmempoolinfo before and after this change, so I guess it's possible that there were some extremely large free transactions taking up all the space.

Setting limitfreerelay=0 would change the long-term slope of your GBT latency graph and mempool size to be closer or equal to zero, but the y intercept will be low in any case.

The issue wasn't the y intercept; it was that the latency was sloping up over time.

@maflcko
Member

This has likely been fixed since 0.12.

bitcoin locked this issue as resolved and limited conversation to collaborators on Sep 8, 2021