Page MenuHomePhabricator

Reduce size of growthexperiments_user_impact.geui_data json blobs
Closed, ResolvedPublic4 Estimated Story Points

Description

We (DBAs) are seeing a massive jump in size of binlogs in x1: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=db2096&var-datasource=thanos&var-cluster=mysql&viewPanel=28&from=1692957346437&to=1700733346437

grafik.png (864×1 px, 72 KB)

Looking at the binlogs I'm seeing a lot of rows being set like this: {P53803}.

This doesn't look right for many reasons, mostly bots shouldn't have user impact, impact by day is extremely permissive (with lots of zeros, it can simply not store days with zeros), they are not compressed and lastly we really shouldn't store large json blobs in core mariadb databases (unless they are explicitly tuned to hold them, like ExternalStorage) but the last one is much harder to fix, so that's fine but please reduce the size of your blobs ASAP.

Event Timeline

Ladsgroup renamed this task from Don't set user impact for bots to Reduce size of growthexperiments_user_impact.geui_data json blobs.Nov 23 2023, 4:31 PM
Ladsgroup triaged this task as Unbreak Now! priority.

Paucabot isn't a bot. It's "Pau Cabot", not "Pauca Bot"

In T351898#9355883, Pppery wrote:

Paucabot isn't a bot. It's "Pau Cabot", not "Pauca Bot"

Fixed, my apologies.

This is most likely a fallback from T336203: Positive reinforcement: Deploy the new Impact module to all Wikipedias, where we enabled the feature on all Wikipedias. The serialization format was not changed recently, but deploying this everywhere (esp enwiki I guess) probably made this issue way more visible. I'll look into this and upload some patches to make the blob size smaller.

@Ladsgroup I can't see P53803 from my on-duty account (with @Urbanecm, it works fine). Would you mind adding @Urbanecm_WMF to WMF-NDA as well, so account switching isn't needed?

@Ladsgroup I can't see P53803 from my on-duty account (with @Urbanecm, it works fine). Would you mind adding @Urbanecm_WMF to WMF-NDA as well, so account switching isn't needed?

Done

Change 977141 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] UserImpact: Make generated JSON blobs smaller

https://gerrit.wikimedia.org/r/977141

Hi @Ladsgroup, I've uploaded a patch if you want to review. I think that omitting the days with zeros is going to be the biggest change in that patch, since impact data aren't calculated for a lot of bots already anyway. On enwiki, we apparently have user impact data generated only for nine bots in total (P53863). We can certainly explicitly exclude bots from the automated generation (they'd still be included for manual requests via /w/rest.php/growthexperiments/v0/user-impact/%23USERID, and I think that's behaviour that we should keep).

Compressing the data makes sense as well, but I think that would require a data migration, making the change not-so-easy to make, unfortunately.

Change 977224 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/GrowthExperiments@master] Compress geui_data json blobs

https://gerrit.wikimedia.org/r/977224

Compressing the data makes sense as well, but I think that would require a data migration, making the change not-so-easy to make, unfortunately.

I'm not following, you can keep backwards compatibility, you don't need to make any data migration. I made the patch that literally implements it ^

Compression makes a massive difference, in the first binlog entry, the size of 8979 bytes, with compression it goes to 1491 bytes. 17% of the original size.


Also random thing I found is that upsert is causing doubling the query size as well because the query is like this:

INSERT /* GrowthExperiments\UserImpact\DatabaseUserImpactStore::setUserImpact  */ INTO `growthexperiments_user_impact` (geui_user_id,geui_data,geui_timestamp) VALUES (123, ****REALLY LARGE BLOB****, '20231123155340') ON DUPLICATE KEY UPDATE geui_data = ****THE SAME REALLY LARGE BLOB AGAIN****

There are ways to avoid upsert as much as possible, check replica first for example and do update if you're sure it's going to cause duplicate key. If not, then upsert.

Change 977265 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] DatabaseUserImpactStore: Avoid doing an upsert

https://gerrit.wikimedia.org/r/977265

Change 977265 abandoned by Urbanecm:

[mediawiki/extensions/GrowthExperiments@master] DatabaseUserImpactStore: Avoid doing an upsert

Reason:

squashed into previous

https://gerrit.wikimedia.org/r/977265

FYI: T351888: Flow integration tests block GrowthExperiments merges with an unexpected ContainerDisabledException kind of blocks this, since tests failing (not sure why).

Compressing the data makes sense as well, but I think that would require a data migration, making the change not-so-easy to make, unfortunately.

I'm not following, you can keep backwards compatibility, you don't need to make any data migration. I made the patch that literally implements it ^
Compression makes a massive difference, in the first binlog entry, the size of 8979 bytes, with compression it goes to 1491 bytes. 17% of the original size.

Ignore me, for some reason the "just try decompressing" approach didn't occur to me at all and I didn't realize how easily this can be done. Thanks for the patch!


Also random thing I found is that upsert is causing doubling the query size as well because the query is like this:

INSERT /* GrowthExperiments\UserImpact\DatabaseUserImpactStore::setUserImpact  */ INTO `growthexperiments_user_impact` (geui_user_id,geui_data,geui_timestamp) VALUES (123, ****REALLY LARGE BLOB****, '20231123155340') ON DUPLICATE KEY UPDATE geui_data = ****THE SAME REALLY LARGE BLOB AGAIN****

There are ways to avoid upsert as much as possible, check replica first for example and do update if you're sure it's going to cause duplicate key. If not, then upsert.

Done in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/977141.

Change 977224 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Compress geui_data json blobs

https://gerrit.wikimedia.org/r/977224

Thanks Martin and Amir for fixing this quickly. The changes in 977141 could be slightly changing the articles list output since "old edited" articles with 0 page views would added to recentEditsWithoutPageviews. The problem is rather semantic than breaking the functionality but it could be confusing for uses if they get to see it. Since 0-views edited articles is almost impossible scenario I don't think this is blocker. Filed as T352013.

Change 977613 had a related patch set uploaded (by Urbanecm; author: Amir Sarabadani):

[mediawiki/extensions/GrowthExperiments@wmf/1.42.0-wmf.5] Compress geui_data json blobs

https://gerrit.wikimedia.org/r/977613

Change 977629 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@wmf/1.42.0-wmf.5] UserImpact: Make smaller SQL queries

https://gerrit.wikimedia.org/r/977629

Change 977141 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] UserImpact: Make smaller SQL queries

https://gerrit.wikimedia.org/r/977141

Change 977613 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.42.0-wmf.5] Compress geui_data json blobs

https://gerrit.wikimedia.org/r/977613

Change 977629 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.42.0-wmf.5] UserImpact: Make smaller SQL queries

https://gerrit.wikimedia.org/r/977629

Mentioned in SAL (#wikimedia-operations) [2023-11-27T12:56:41Z] <urbanecm@deploy2002> Started scap: Backport for [[gerrit:977613|Compress geui_data json blobs (T351898)]], [[gerrit:977645|User impact: timezone cleanup (T329700)]], [[gerrit:977629|UserImpact: Make smaller SQL queries (T351898)]]

Mentioned in SAL (#wikimedia-operations) [2023-11-27T13:04:18Z] <urbanecm@deploy2002> Finished scap: Backport for [[gerrit:977613|Compress geui_data json blobs (T351898)]], [[gerrit:977645|User impact: timezone cleanup (T329700)]], [[gerrit:977629|UserImpact: Make smaller SQL queries (T351898)]] (duration: 07m 37s)

Urbanecm_WMF lowered the priority of this task from Unbreak Now! to High.Nov 27 2023, 2:09 PM
Urbanecm_WMF moved this task from Code Review to QA on the Growth-Team (Sprint 3 (Growth Team)) board.

Hi @Ladsgroup, this should now be done in production. One last huge batch of upserts is to be expected (because of an unrelated issue, we've bumped the version constant, resulting in regenerating all user impact data). Once that finishes, the size of queries should be considerably smaller. Leaving it open for you to verify – happy to make further changes if needed.

Urbanecm_WMF set the point value for this task to 4.Nov 28 2023, 11:56 AM

Thanks. I keep an eye on binlog sizes

Just to be clear, what we should be looking at is that it doesn't grow that much. The drop in space isn't a consequence of the fix, the fix won't reduce it like that, it will just make it grow slower :-)
That massive drop is basically us (DBAs) doing an emergency delete of all the binlogs.

Thanks for fixing this!

Yeah, we basically need to look at the derivative and for the sake of reporting/completeness, before the fix it was growing 100GB/day but after the fix it's 10GB a day (total binlog size would reduce from 3TB to 0.3TB)

Change 979097 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@master] User impact: sort datestring keys to ascending alphanumeric order

https://gerrit.wikimedia.org/r/979097

Change 979097 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] User impact: sort datestring keys to ascending alphanumeric order

https://gerrit.wikimedia.org/r/979097

Change 979690 had a related patch set uploaded (by Urbanecm; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@wmf/1.42.0-wmf.7] User impact: sort datestring keys to ascending alphanumeric order

https://gerrit.wikimedia.org/r/979690

Change 979690 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.42.0-wmf.7] User impact: sort datestring keys to ascending alphanumeric order

https://gerrit.wikimedia.org/r/979690

Mentioned in SAL (#wikimedia-operations) [2023-12-04T12:25:44Z] <urbanecm@deploy2002> Started scap: Backport for [[gerrit:979690|User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898)]]

Mentioned in SAL (#wikimedia-operations) [2023-12-04T12:28:19Z] <urbanecm@deploy2002> urbanecm: Backport for [[gerrit:979690|User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2023-12-04T12:35:28Z] <urbanecm@deploy2002> Finished scap: Backport for [[gerrit:979690|User impact: sort datestring keys to ascending alphanumeric order (T352349 T351898)]] (duration: 09m 43s)