
Stop sending change notification email if edit is done by a bot
Closed, Resolved · Public

Description

Bots make a lot of edits, which can easily drown the notified person in email.

We already hide bot edits from the default view, and users can always check their watchlist to catch up on what was missed.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 997861 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] mail: Stop sending notification email if edit is done by a bot

https://gerrit.wikimedia.org/r/997861

Hello @Ladsgroup! For Tech News, what wording would you suggest as the content, and when should it be included? Thanks!

Hi, something along the lines of:

If you have the "Email me when a page or a file on my watchlist is changed" option enabled, bot edits will no longer trigger notification emails. You can still see bot edits in [[Special:Watchlist]].

It should be included ASAP. We are planning to deploy this really soon.

@Ladsgroup Is this change happening next week?

@Ladsgroup there is a comment in the Tech News: 2024-07 talk page concerning this change after I added it to the Tech News:

Hi. Is this a new bug? When will it be fixed? It could potentially be a good thing, but only after a problem that has existed for many years is fixed: the behavior should change to "wait until the first non-bot edit and then trigger a notification" instead of the current "never notify if there was at least one bot edit, leaving users unaware of any edits, bot or non-bot".

Please can someone reply to this comment or add the response here for me to reply? Thank you!

I responded there and also added a link to this ticket in Tech News.

Change 997861 merged by jenkins-bot:

[mediawiki/core@master] mail: Stop sending notification email if edit is done by a bot

https://gerrit.wikimedia.org/r/997861

Ladsgroup claimed this task.

Change 998983 had a related patch set uploaded (by Tacsipacsi; author: Tacsipacsi):

[mediawiki/core@master] Revert "mail: Stop sending notification email if edit is done by a bot"

https://gerrit.wikimedia.org/r/998983

Tacsipacsi triaged this task as Unbreak Now! priority.

The revert should be merged before it hits production and edits by bot accounts start (irreversibly) being handled as seen.

Ladsgroup lowered the priority of this task from Unbreak Now! to Medium. Feb 12 2024, 8:56 AM

If you disagree with a change, it doesn't mean it has to be reverted.

Tacsipacsi raised the priority of this task from Medium to Unbreak Now!. Feb 12 2024, 9:01 AM

This is BROKEN and causes DATA LOSS. I can work with you on a better solution, but not if you’re not willing to accept that it’s not just my personal taste, but you got it wrong. And I’m not the only one who thinks it’s unacceptable.

It doesn't lead to data loss...

Then what do you call the fact that the database write necessary to highlight the bot edit in the watchlist doesn’t happen?

If that's data loss, we have been having data loss since T29884.

No, because bot edits used to be correctly highlighted on Special:Watchlist before your patch. The data was there, it just wasn’t delivered in one particular way. Now, it’s not persisted and thus not accessible in any way.

Change 1002391 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] mail: Still update the notification timestamp for bot edits

https://gerrit.wikimedia.org/r/1002391

Joe lowered the priority of this task from Unbreak Now! to Medium. Feb 12 2024, 9:55 AM
Joe subscribed.

@Tacsipacsi I would ask you to keep emotions in check, it's very hard to collaborate (which is what we are supposed to be doing here) when someone is that confrontational. Specifically:

  • yelling doesn't get your point through better, quite the opposite. In fact, it took me quite a while to understand what you were trying to say.
  • raising tasks to UBN! because you're in disagreement also won't get your point through better.

Let me also add, not making this change has serious potential operational consequences. So the change of preventing sending emails on bot edits needs to happen.

As I understand it, this change would prevent sending emails on bot edits, but also avoid bolding the item because the bolding happens in the email sending job. We can improve the change to make the UX better for editors, but this is not a data loss, nor an unbreak now! issue.

On the other hand, if we don't make this change, we'll have potentially situations where emails will not get delivered, and not just for bot edits, for anything from login recovery to user-to-user emails to watchlist notifications (of any nature).

I've asked to retrieve my bot flag, which I temporarily gave away a while ago due to lack of time. I know pretty well what is going on now on Watchlist in different modes, because I wrote a huge gadget, Whatchlist Manager, years ago. So, I'm planning to create various use cases to check what is the state after the deployment. Starting from this one:

  1. Open some watched page at 23:59.
  2. Close it.
  3. Go away for an hour.
  4. During this period, other accounts edit this page: three non-bot edits, a bot edit, three more non-bot edits.
  5. Return to the screen and open the Watchlist in group mode, showing new edits only.

I expect to see a new group with seven edits. As far as I understand from today's Tech News, I will see none: the earlier ones because of the new change, since marking an edit as seen automatically marks all the previous ones, and the later ones because of the mentioned old bug under the new conditions. The best result would be if I see all seven.
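To make the scenario above easier to reason about, here is a toy Python model. It is only a sketch under stated assumptions, not MediaWiki's actual code: it assumes each watched page keeps a single notification timestamp that is set by the first edit after the last visit, that at most one email is sent until the page is visited again, and that every edit at or after that timestamp is shown as unseen. The class, function, and policy names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Edit = Tuple[int, bool]  # (timestamp, made_by_bot)


@dataclass
class WatchedPage:
    # None means "everything seen" (assumed model of the watchlist row).
    notification_ts: Optional[int] = None
    emails_sent: List[int] = field(default_factory=list)


def record_edit(page: WatchedPage, ts: int, is_bot: bool, policy: str) -> None:
    """Apply one edit under a hypothetical policy name."""
    if is_bot and policy == "skip_bot_entirely":
        return  # bot edit leaves no trace: no timestamp, no email
    if page.notification_ts is None:
        page.notification_ts = ts  # first unseen edit sets the marker
        if not (is_bot and policy == "timestamp_only_for_bots"):
            page.emails_sent.append(ts)  # one email until the next visit


def unseen_edits(page: WatchedPage, edits: List[Edit]) -> List[int]:
    """Edits at or after the marker are shown as unseen (bold)."""
    if page.notification_ts is None:
        return []
    return [ts for ts, _ in edits if ts >= page.notification_ts]


def simulate(edits: List[Edit], policy: str) -> Tuple[List[int], List[int]]:
    page = WatchedPage()
    for ts, is_bot in edits:
        record_edit(page, ts, is_bot, policy)
    return unseen_edits(page, edits), page.emails_sent


# Three non-bot edits, one bot edit, three more non-bot edits.
SEVEN_EDITS: List[Edit] = [(1, False), (2, False), (3, False), (4, True),
                           (5, False), (6, False), (7, False)]
# A bot edit followed by a human edit.
BOT_FIRST: List[Edit] = [(1, True), (2, False)]
```

In this model the seven-edit scenario shows all seven edits as unseen under every policy, because the first non-bot edit sets the marker. The policies only differ when a bot edit comes first: skipping the bot edit entirely leaves it unhighlighted, while updating the timestamp without mailing keeps it highlighted but means the later human edit sends no email either.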

I don't think what you're describing will happen. Please try it after the deploy and let us know.

Change 1002391 merged by jenkins-bot:

[mediawiki/core@master] mail: Still update the notification timestamp for bot edits

https://gerrit.wikimedia.org/r/1002391

Change 998983 abandoned by Tacsipacsi:

[mediawiki/core@master] Revert "mail: Stop sending notification email if edit is done by a bot"

Reason:

In favor of I7f1b6f2adcbb22703f52d7ac0e4322379f81ebdc

https://gerrit.wikimedia.org/r/998983

I call this resolved. Let me know if you have any concerns. I continue to work on improvements.

In T356984#9532727, @Joe wrote:

@Tacsipacsi I would ask you to keep emotions in check

I tried to, but it’s very upsetting that I try to prevent a very bad commit (although done in good faith) from hitting production, and all I get is comments and −2’s that say that it’s wrong, or comments that designate my arguments as “disagreement”.

Let me also add, not making this change has serious potential operational consequences. So the change of preventing sending emails on bot edits needs to happen.

What are those serious operational consequences? What is too high? The number of jobs? The database size? The mail traffic? Please note that the two changes (9dc68fb, f40a495) only marginally affected the number of outgoing mails, as most bot edits already generated no mail due to the condition

!$minorEdit || ( $config->get( MainConfigNames::EnotifMinorEdits ) && !$editor->isAllowed( 'nominornewtalk' ) )

i.e. users with nominornewtalk right (this includes bots) don’t send notification mails on minor edits – and most bot edits are minor.
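For readers who want to check the claim about the quoted condition, the following is a direct Python rendering of that boolean expression. The parameter names are illustrative stand-ins for `$minorEdit`, the `EnotifMinorEdits` setting, and the `nominornewtalk` right check; nothing else about the surrounding MediaWiki code is assumed.

```python
from itertools import product
from typing import Dict, Tuple


def passes_minor_edit_gate(minor_edit: bool,
                           enotif_minor_edits: bool,
                           editor_has_nominornewtalk: bool) -> bool:
    """Mirror of: !$minorEdit || ( EnotifMinorEdits && !isAllowed('nominornewtalk') )."""
    return (not minor_edit) or (enotif_minor_edits and not editor_has_nominornewtalk)


def truth_table() -> Dict[Tuple[bool, bool, bool], bool]:
    """Evaluate the gate for all eight input combinations."""
    return {combo: passes_minor_edit_gate(*combo)
            for combo in product([False, True], repeat=3)}
```

The table confirms the point made above: a minor edit by a user holding the `nominornewtalk` right never passes the gate, regardless of the setting, while any non-minor edit always passes it.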

@Ladsgroup also wrote on Gerrit that “There are lots of reasons to make this change. I can provide you those reasons privately.” Why can’t those reasons be stated publicly?

As I understand it, this change would prevent sending emails on bot edits, but also avoid bolding the item because the bolding happens in the email sending job. We can improve the change to make the UX better for editors, but this is not a data loss

If the data necessary for bolding those lines isn’t written to the database, then it hasn’t been written, and the information for that time range is lost.

On the other hand, if we don't make this change, we'll have potentially situations where emails will not get delivered, and not just for bot edits, for anything from login recovery to user-to-user emails to watchlist notifications (of any nature).

Again, without any public details, I don’t know how big the risk is. What I know is that watchlist notifications not being delivered is not only potential, but a real thing (T29884, T40874), and has been for ages. This means two things:

  • This should be handled in some way, hopefully sooner rather than later.
  • Editors are used to this issue, probably have developed workarounds (e.g. check Special:Watchlist regularly): that “sooner rather than later” doesn’t mean ASAP, there’s no rush.

So when I noticed the new bug on Saturday, i.e. one workday before the train branch cut, I felt that the priority and the realistically possible thing was reverting now and fixing later. Now that it has been (largely) reverted, I’m happy to work together on a solution that satisfies all parties: operators, users wanting to be notified about bot edits and users not wanting to miss emails about non-bot edits.

Well, I've just tried four most significant scenarios, and it works fine.

@UOzurumba Hi, this was already announced in the last tech news, why announcing it again?

Well, I've just tried four most significant scenarios, and it works fine.

Tried the rest, they are ok too.

@UOzurumba Hi, this was already announced in the last tech news, why announcing it again?

It wasn't: https://meta.wikimedia.org/wiki/Tech/News/2024/07.

Since the train arrived, I’ve missed

All this because of obscure reasons like “serious potential operational consequences”.

All this because of obscure reasons like “serious potential operational consequences”.

"Why can’t those reasons be stated publicly?" this essentially always means it is a security sensitive topic and disclosing it publicly could have impact. While I realise that it is annoying that you are not privy to all details, that is exactly how it should be. The fact that this has to be spelled out to a very seasoned editor and regular code contributor, drawing more attention to the problem is honestly kinda problematic.

In T356984#9554287, @TheDJ wrote:

"Why can’t those reasons be stated publicly?" this essentially always means it is a security sensitive topic and disclosing it publicly could have impact.

Or that the people who are in the know don’t take time to determine exactly what details are sensitive and disclose parts that aren’t.

While I realise that it is annoying that you are not privy to all details, that is exactly how it should be.

It’s not just the details, it’s at least a rough understanding of the problem. And it’s not just annoying, it makes it impossible to work together on a solution that helps with this problem without making the wikis less usable. For example, if the problem is the amount of outbound mail traffic, my proposal that tries to reduce the number of unnecessarily scheduled jobs won’t help. If the problem is the amount of jobs, optimizing DB queries won’t help. I’ve spent a bit of time over the weekend understanding how the code works, and I have several ideas for improvement (mail traffic, job count, response time, database query optimization), but these often contradict each other, and I can’t propose solutions if I don’t know what the problem is. I can keep uploading patches that try to restore the previous status quo, but that won’t really bring us forward.

The issue I've had is that, whenever a bot edits a page or file, I don't receive any notification emails for that page or file from that point forward, which includes any subsequent edits by humans.

Good god. First we can't fix T250856, and now we're making the problem worse??? I might be in the minority of wanting to get every edit made to my watchlist to my inbox, but I feel like this goes over the top to fix what is not necessarily an issue.

Please tell me I'm missing something here.

@Ladsgroup the underlying concern for this task is the volume of bot edits to Wikidata specifically, right? In that case, it seems far more reasonable to disable bot edit enotifs for WD edits specifically, rather than breaking/removing functionality for all wikis, especially if the concern is for these edits causing enotifs to users on non-WD wikis (in that case, a general "email me also for edits to connected wikis" setup seems good, which would even give an explicit handle for blanket-disabling enotifs for specific types of edits from connected wikis).

(Assuming I'm right about the motivation,) this feels spiritually related to the option to view category membership changes in your watchlist as well - I'd be surprised if very many people are interested in getting enotifs for those changes, and in any case it was added as a trackable type of change only relatively recently, so it's much more reasonable to choose not to even allow getting enotifs for those changes in the first place.

Now I can explain about this a bit more (cleared by the security team). Yes. When I said the discussion happened privately, I meant it happened in a security ticket.

This feature was sending ~200,000 emails a day on a normal day, but that in itself wasn't the issue. The issue was that bots don't get rate limited. A couple of weeks ago, a bot started editing fast enough to bring down all of our mail infrastructure, so badly that important mails such as login notifications, password resets and email address confirmations were getting dropped. So this is not just a concern or a hypothetical scenario. It happened in January this year. Most importantly, this is not just an accidental issue: it can easily be weaponized to bring our mail servers down or to use our infra to DoS another mail provider. The only thing you need is a bot account (or a compromised one). We don't go around breaking important user features due to hypothetical scenarios; this case literally brought us down before.

You might say "buy more hardware", and that would fix the mail servers being overloaded. But one reason the outage got worse was that we were sending so many mails that Gmail stopped accepting email from us altogether. We can't fix that by adding more hardware. One way to tackle this is to set up a dedicated second lane of mail servers (call them "high volume" or something like that) and teach MediaWiki to send notification emails through them, so that if Gmail or others start throttling or rejecting them, security-sensitive emails wouldn't be affected.
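The "second lane" idea can be sketched roughly as follows. This is a hypothetical Python illustration, not a description of any existing MediaWiki or mail server configuration: the lane names, the message kinds, and the bounded-queue behaviour are all assumptions chosen to show why a flood on one lane need not affect the other.

```python
from collections import deque
from enum import Enum
from typing import Deque, Dict


class Lane(Enum):
    TRANSACTIONAL = "transactional"  # security-sensitive mail
    HIGH_VOLUME = "high_volume"      # watchlist notifications and similar


# Hypothetical message kinds, taken from the examples named in the thread.
SECURITY_SENSITIVE = {"login_notification", "password_reset", "email_confirmation"}


def pick_lane(kind: str) -> Lane:
    """Route security-sensitive mail to its own lane; everything else is bulk."""
    return Lane.TRANSACTIONAL if kind in SECURITY_SENSITIVE else Lane.HIGH_VOLUME


class LaneQueue:
    """Bounded queue: when full, new mail is dropped instead of piling up."""

    def __init__(self, max_pending: int) -> None:
        self.max_pending = max_pending
        self.pending: Deque[str] = deque()
        self.dropped = 0

    def enqueue(self, message: str) -> bool:
        if len(self.pending) >= self.max_pending:
            self.dropped += 1
            return False
        self.pending.append(message)
        return True


def route(kind: str, message: str, lanes: Dict[Lane, LaneQueue]) -> bool:
    """Return True if the message was accepted by its lane."""
    return lanes[pick_lane(kind)].enqueue(message)
```

With separate queues, a burst of watchlist notifications can only fill (and overflow) the high-volume lane; a password reset sent during the burst is still accepted. As noted in the thread, this does not help if a mail provider throttles both lanes together.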

I know this is disruptive, and I apologize for that. On top of that, the current state of watchlist code is not helping either (especially regarding unseen watchlist entries getting highlighted being tied to how emails are sent...) but we really didn't have a choice. I hope to make improvements on the watchlist infra later.

In the meantime, I suggested creating a tool on Toolforge to do the work for you. You could potentially create a generalized one that users could sign up for by providing their watchlist token. So one person would need to create and maintain the tool and others could just use it (and wouldn't need to reinvent the wheel).

My apologies again for this but we don't have any other choice.

Thanks for the explanation! So the concern is the amount of outgoing mail, specifically the amount of mail going out from the production mail server (or amount of mail having the sender wiki@wikimedia.org?).

I think a plan could be:

  1. Restore sending mails about non-bot edits (i.e. edits without bot flag in recent changes). This would restore mails about the most important edits by bot users while keeping the increase in the amount of outgoing mail relatively low, and doesn’t sound very complicated.
    • Check the bot flag instead of the bot group membership when deciding whether to reject the email sending.
    • Rate-limit all non-flagged edits regardless of whether the user belongs to the bot group (if it’s not already rate-limited). This would not only avoid excessively large spikes in outgoing mails, but also avoid recent changes being cluttered. If a bot needs to edit so quickly that it runs into the rate limit, it should use the bot flag.
  2. Restore sending mails about flagged bot edits. Flagged bot edits could use different infrastructure, which delivers mails on a best-effort basis, designed so that it drops mails rather than putting the infrastructure that sends other mails at risk. I don’t know how complicated implementing this would be; I hope it would be manageable.
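The first phase of this plan could look roughly like the Python sketch below. It is only an illustration under assumptions: the `Edit` fields, the function names, and the token-bucket parameters are hypothetical, and the token bucket is one common rate-limiting technique swapped in for illustration, not necessarily what MediaWiki's rate limiter does.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    author_in_bot_group: bool  # the account belongs to the bot group
    bot_flag: bool             # this particular edit carries the bot flag


def should_email_group_check(edit: Edit) -> bool:
    """Deployed behaviour as described in the thread: reject by group membership."""
    return not edit.author_in_bot_group


def should_email_flag_check(edit: Edit) -> bool:
    """Proposed behaviour: reject only edits that carry the bot flag."""
    return not edit.bot_flag


class TokenBucket:
    """Simple token bucket; `now` is passed in so the example is deterministic."""

    def __init__(self, capacity: int, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def accept_edit(edit: Edit, bucket: TokenBucket, now: float) -> bool:
    """Unflagged edits are rate-limited even for bot-group accounts;
    flagged edits bypass the limit (they would not send mail anyway)."""
    if edit.bot_flag:
        return True
    return bucket.allow(now)
```

Under the flag check, an unflagged edit by a bot account would send mail again, which is why the proposal pairs it with a limit on unflagged edits: the number of mail-triggering edits per account stays bounded.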

Thanks for the explanation! So the concern is the amount of outgoing mail, specifically the amount of mail going out from the production mail server (or amount of mail having the sender wiki@wikimedia.org?).

I think a plan could be:

  1. Restore sending mails about non-bot edits (i.e. edits without bot flag in recent changes). This would restore mails about the most important edits by bot users while keeping the increase in the amount of outgoing mail relatively low, and doesn’t sound very complicated.
    • Check the bot flag instead of the bot group membership when deciding whether to reject the email sending.
    • Rate-limit all non-flagged edits regardless of whether the user belongs to the bot group (if it’s not already rate-limited). This would not only avoid excessively large spikes in outgoing mails, but also avoid recent changes being cluttered. If a bot needs to edit so quickly that it runs into the rate limit, it should use the bot flag.

Yeah, the second part is something that we need to make sure happens before we can allow non-bot-flagged bot edits to go through.

  1. Restore sending mails about flagged bot edits. Flagged bot edits could use different infrastructure, which delivers mails on a best-effort basis, designed so that it drops mails rather than putting the infrastructure that sends other mails at risk. I don’t know how complicated implementing this would be; I hope it would be manageable.

It's not trivial. You need to set up dedicated DNS, dedicated hw, etc., and in the end mail providers might just throttle the whole AS (while I admit it's unlikely) and break everything again. Before that, we should probably finish the migration off exim4.

Yeah, the second part is something that we need to make sure happens before we can allow non-bot-flagged bot edits to go through.

Does this mean that you’re open to implementing this “phase 1” now provided that the first bullet point is not merged before the second one? (Since rate limiting potentially breaks bots – at least malfunctioning ones –, it seems like a risky patch and something that’s worth riding a train before the actual mail sending change. It should also get its own Phabricator task and be announced well in advance.)

It's not trivial. You need to set up dedicated DNS, dedicated hw, etc., and in the end mail providers might just throttle the whole AS (while I admit it's unlikely) and break everything again.

Is it more risky than sending the same mails from Toolforge?

Before that, we should probably finish the migration off exim4.

Is there any proposed timeline or Phabricator task about it?

Yeah, the second part is something that we need to make sure happens before we can allow non-bot-flagged bot edits to go through.

Does this mean that you’re open to implementing this “phase 1” now provided that the first bullet point is not merged before the second one? (Since rate limiting potentially breaks bots – at least malfunctioning ones –, it seems like a risky patch and something that’s worth riding a train before the actual mail sending change. It should also get its own Phabricator task and be announced well in advance.)

I'd be happy to help there but given the amount of work I have on my plate (and this will have to be in my volunteer capacity), I don't think I can take the lead on this.

Also, we can start by decoupling email notification and highlighting. It should be rather easy and fruitful to the community.

It's not trivial. You need to set up dedicated DNS, dedicated hw, etc., and in the end mail providers might just throttle the whole AS (while I admit it's unlikely) and break everything again.

Is it more risky than sending the same mails from Toolforge?

Fair, Toolforge is in the same AS but a different CIDR range.

Before that, we should probably finish the migration off exim4.

Is there any proposed timeline or Phabricator task about it?

T232343: Consider Postfix as MTA for our MXes (and OTRS/Mailman/Phab)