
Stop logging and clean up auto review logs
Closed, Resolved, Public

Description

Half of the >36 GB logging table in ruwiki is just autoreview logs. This logging mirrors what we used to do for autopatrolled actions, which we have since removed (T189596). FlaggedRevs needs to follow suit.

This will reduce the size of the logging tables on wikis such as dewiki and ruwiki. I'm not adding the DBA tag, but I'm subscribing them for visibility.

Event Timeline

Thanks for taking a look at this issue.

Change 702235 had a related patch set uploaded (by Ladsgroup; author: Ladsgroup):

[mediawiki/extensions/FlaggedRevs@master] Stop logging auto review actions

https://gerrit.wikimedia.org/r/702235

Change 702235 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] Stop logging auto review actions

https://gerrit.wikimedia.org/r/702235

An entry in this week's Tech News would be great before I start cleaning them out of the database.

Auto-review actions in FlaggedRevs (pending changes) will no longer be logged, and all previous auto-review log entries will be removed.

Edit mercilessly.

Mentioned in SAL (#wikimedia-operations) [2021-06-30T21:43:46Z] <Amir1> deleting auto-review logs from test2wiki (T285608)

36K rows deleted from test2wiki (22%)

@Ladsgroup Added to https://meta.wikimedia.org/wiki/Tech/News/2021/27 – tried to contextualise and explain why, let me know if I made any mistakes.

Mentioned in SAL (#wikimedia-operations) [2021-07-01T09:55:33Z] <Amir1> start of clean up of autoreview logs in ruwiki (T285608)

In T285608#7189042, @Johan wrote:

@Ladsgroup Added to https://meta.wikimedia.org/wiki/Tech/News/2021/27 – tried to contextualise and explain why, let me know if I made any mistakes.

Looks perfect. Thanks!

https://ru.wikipedia.org/w/index.php?title=Special:Log&type=review&subtype=autoaccept
It looks like this isn't done yet: autoreview logs dating back to 2008 still exist, and new autoreviews from July 4 were added just now.

It would also be good to remove the old/native reviewing (patrolling) logs: https://ru.wikipedia.org/w/index.php?title=Special:Log&type=patrol

In T285608#7196536, @MBH wrote:

https://ru.wikipedia.org/w/index.php?title=Special:Log&type=review&subtype=autoaccept
It looks like this isn't done yet: autoreview logs dating back to 2008 still exist

I deleted 35M log entries on ruwiki, but five million remain for various reasons. For example, since the Tech News entry hasn't gone out yet, I wanted to avoid surprises, so I didn't delete the 2021 log entries.

and new autoreviews from July 4 were added just now.

That's because the patch to stop logging hasn't been deployed yet :D It'll be deployed the week after next. I won't delete much more for now, as DBA presence will be limited next week (due to the WMF week-long holiday), but I'll continue the week of July 12.

It would also be good to remove the old/native reviewing (patrolling) logs: https://ru.wikipedia.org/w/index.php?title=Special:Log&type=patrol

These are manual patrol actions and will stay forever; the autopatrol ones have already been deleted.

In T285608#7414739, @MBH wrote:

@Ladsgroup, will this be continued?

It doesn't create new logs anymore. Cleanup is mostly done for ruwiki. Other wikis don't need it as much as ruwiki did. Doing it for arwiki/dewiki/etc. is on my radar for November. Do you have any specific issue in mind?

I'm waiting until the automatic events are cleaned up completely so that I can remove the notes about them from our local documentation: https://ru.wikipedia.org/wiki/Project:Патрулирование#Обозначения_действий_патрулирования_в_API

Mentioned in SAL (#wikimedia-operations) [2021-11-17T06:38:14Z] <Amir1> start of deleting auto-review logs in arwiki (T285608) deleting 23M rows

Mentioned in SAL (#wikimedia-operations) [2021-11-17T07:20:18Z] <Amir1> start of clean up of autreview logs of ruwiki, deleting 3.5M rows (T285608)

@MBH, ruwiki is fully cleaned up now:

mysql:research@s6-analytics-replica.eqiad.wmnet [ruwiki]> select log_action, count(*) from logging where log_type = 'review' group by log_action limit 50;
+------------+----------+
| log_action | count(*) |
+------------+----------+
| approve    |  6605348 |
| approve-i  |  2067314 |
| approve2   |      517 |
| approve2-i |        4 |
| unapprove  |   244060 |
| unapprove2 |       17 |
+------------+----------+
6 rows in set (6.636 sec)
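A minimal sketch for running the same check on other wikis. It assumes the GROUP BY query above is run in batch mode (for example through the same `sql <wiki> -- -ss -e ...` wrapper used in the cleanup script later in this task), so that each row arrives as a tab-separated `log_action count` line with no header. The function name `check_no_auto_review` is hypothetical, and treating `approve-a` and `approve-ia` as the auto-review actions is an assumption taken from the cleanup script and its follow-up note; other action names can be passed as arguments.

```shell
#!/bin/bash
# Hypothetical helper: reads "log_action<TAB>count" lines on stdin and
# reports any auto-review action that still has rows left.
# Assumption: approve-a and approve-ia are the auto-review actions;
# pass other action names as arguments to override that list.
# Returns 0 if nothing is left, 1 otherwise.
check_no_auto_review() {
  local actions=("$@")
  if [ "${#actions[@]}" -eq 0 ]; then
    actions=(approve-a approve-ia)
  fi
  local action count a found=0
  # "|| [ -n "$action" ]" also handles a final line without a newline.
  while IFS=$'\t' read -r action count || [ -n "$action" ]; do
    for a in "${actions[@]}"; do
      if [ "$action" = "$a" ] && [ "$count" -gt 0 ]; then
        echo "$action still has $count rows"
        found=1
      fi
    done
  done
  return "$found"
}
```

Fed the ruwiki result above (which has no `approve-a` or `approve-ia` rows), it prints nothing and returns 0.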

@Ladsgroup, thanks. So maybe the autoreview filter can be removed from the site interface?

https://ru.wikipedia.org/wiki/Special:Log?type=review&subtype=autoaccept

Change 739682 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/FlaggedRevs@master] Drop autoaccept from log filtering

https://gerrit.wikimedia.org/r/739682

In T285608#7512105, @MBH wrote:

@Ladsgroup, thanks. So maybe the autoreview filter can be removed from the site interface?

https://ru.wikipedia.org/wiki/Special:Log?type=review&subtype=autoaccept

Good catch. I made a patch for it.

Change 739682 merged by jenkins-bot:

[mediawiki/extensions/FlaggedRevs@master] Drop autoaccept from log filtering

https://gerrit.wikimedia.org/r/739682

To be honest, I can't believe how many there are. The wikis are full of them, and for whatever reason MySQL doesn't use the correct index, so I have to delete them year by year. So I had to write this bash script:

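#!/bin/bash
# Delete FlaggedRevs auto-review log entries (log_type 'review',
# log_action 'approve-a') in batches of 1000, one year at a time,
# because MySQL doesn't pick the right index for a single big delete.
# Usage: deleter-complex.sh wiki1 [wiki2 ...]
for wiki in "$@"
do
  echo "Cleaning up autoreview logs for $wiki"
  for j in {2008..2021}
  do
    for i in {0..10000}
    do
      echo "Deleting batch $i for year $j"
      sql "$wiki" --write -- -e "delete from logging where log_type = 'review' and log_action = 'approve-a' and log_timestamp like '$j%' limit 1000;"
      echo "sleeping...zzz"
      # Pause between batches to limit load and replication lag.
      sleep 5
      counts=$(sql "$wiki" -- -ss -e "select count(*) from logging where log_type = 'review' and log_action = 'approve-a' and log_timestamp like '$j%';")
      echo "$counts left for $j"
      if [ "$counts" -eq 0 ]; then
        echo "Done for year $j"
        break
      fi
    done
  done
done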

and ran it with expanddblist flaggedrevs | xargs bash deleter-complex.sh.
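Note that `xargs` appends as many input lines as fit onto each command line by default, so this invocation hands the script many wiki names at once. A script that only reads `$1` would then clean up just the first wiki of each batch. A minimal sketch of the difference, with `echo run` standing in for `bash deleter-complex.sh` and placeholder wiki names instead of real `expanddblist` output:

```shell
#!/bin/bash
# Placeholder wiki names; in practice these come from expanddblist.
wikis() { printf '%s\n' wiki_a wiki_b wiki_c; }

# Default xargs: all names are appended to a single invocation,
# so the command sees them as $1, $2, $3.
batched() { wikis | xargs echo run; }

# -n 1: one invocation per wiki, so $1 is each wiki in turn.
per_wiki() { wikis | xargs -n 1 echo run; }
```

`batched` prints a single `run wiki_a wiki_b wiki_c` line, while `per_wiki` prints one `run` line per wiki. Either `xargs -n 1 bash deleter-complex.sh` or a script that loops over `"$@"` processes every wiki.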

After this is all done, I need to run it again with approve-ia as well.
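Since the same loop has to run again for `approve-ia`, one option is to pass the log action in rather than hard-coding `approve-a`. A minimal sketch of the two statement builders, with hypothetical function names (`delete_batch_sql`, `remaining_sql`, `valid_action`). They only print SQL, which would then go to `sql <wiki> --write -- -e` and `sql <wiki> -- -ss -e` exactly as in the script above.

```shell
#!/bin/bash
# Hypothetical helpers that build the statements used in each iteration
# of the cleanup loop, for a given FlaggedRevs review log action.
# Arguments: log action (e.g. approve-a or approve-ia), year, batch size.
delete_batch_sql() {
  local action=$1 year=$2 limit=${3:-1000}
  printf "delete from logging where log_type = 'review' and log_action = '%s' and log_timestamp like '%s%%' limit %d;" \
    "$action" "$year" "$limit"
}

remaining_sql() {
  local action=$1 year=$2
  printf "select count(*) from logging where log_type = 'review' and log_action = '%s' and log_timestamp like '%s%%';" \
    "$action" "$year"
}

# Only allow the expected action names to be interpolated into SQL.
valid_action() {
  case $1 in
    approve-a|approve-ia) return 0 ;;
    *) return 1 ;;
  esac
}
```

In the loop, the two `sql` calls would then take `"$(delete_batch_sql "$action" "$j")"` and `"$(remaining_sql "$action" "$j")"`, with `action` read from a script argument and checked with `valid_action` first.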