[Platform Events] Event retry can take longer than expected (retry expected within seconds, actual retry takes minutes)

Apex, API, Performance

Last updated 2021-08-05 · Reference W-8911044 · Reported by 12 users

In Review

Summary
Intermittently, the retry of previously failed platform events takes longer than expected.

Example
The next retry of an event can take 5 to 17 minutes.

Expected
The next retry of an event should happen within seconds.

ref:
https://developer.salesforce.com/docs/atlas.en-us.platform_events.meta/platform_events/platform_events_subscribe_apex_refire.htm

https://developer.salesforce.com/docs/atlas.en-us.platform_events.meta/platform_events/platform_events_subscribe_apex.htm

Repro
1) Create a custom object.
- API name: myObj__c
- Add a custom field: replayId__c, Text(255)

2) Create a custom platform event.
- API name: myPE__e
- Add a custom field: Name__c, Text(10)

3) Create a trigger similar to the one below.
ResendEventsTrigger
===========
trigger ResendEventsTrigger on myPE__e (after insert) {
    System.debug('Start');
    for (myPE__e event : Trigger.New) {
        if (!event.Name__c.contains('ERR')) {
            // "Good" event: store its name and replay ID in the custom object.
            myObj__c obj = new myObj__c(Name = event.Name__c, replayId__c = event.ReplayId);
            insert obj;
            System.debug('myObj__c - insert');
        } else {
            // Ensure we don't retry the trigger more than 2 times.
            if (EventBus.TriggerContext.currentContext().retries < 2) {
                // Condition isn't met, so try again later.
                System.debug('EventBus.RetryableException');
                throw new EventBus.RetryableException();
            } else {
                // Retry limit reached: do nothing.
                System.debug('donothing');
            }
        }
    }
}
==========

In the example above, when an event whose name contains "ERR" is received, the trigger intentionally throws an EventBus.RetryableException. The trigger is retried twice; on the third execution it does nothing.

All other events are inserted as records of the custom object.

=======================

1. Publish a "bad" platform event with the name "ERR". The example below follows:
https://developer.salesforce.com/docs/atlas.en-us.platform_events.meta/platform_events/platform_events_publish_apex.htm

List<myPE__e> myPEEvents = new List<myPE__e>();
myPEEvents.add(new myPE__e(Name__c='ERR'));
List<Database.SaveResult> results = EventBus.publish(myPEEvents);

for (Database.SaveResult sr : results) {
    if (sr.isSuccess()) {
        System.debug('Successfully published event.');
    } else {
        for (Database.Error err : sr.getErrors()) {
            System.debug('Error returned: ' +
                err.getStatusCode() +
                ' - ' +
                err.getMessage());
        }
    }
}



2. Wait for a minute, then publish "good" platform events with other names.

List<myPE__e> myPEEvents = new List<myPE__e>();
myPEEvents.add(new myPE__e(Name__c='abc'));
myPEEvents.add(new myPE__e(Name__c='def'));
myPEEvents.add(new myPE__e(Name__c='123'));
myPEEvents.add(new myPE__e(Name__c='456'));
List<Database.SaveResult> results = EventBus.publish(myPEEvents);

....

3. Optionally, repeat step 2. The events are not processed and appear to be stuck.

4. Check the debug logs to see when the Apex trigger was retried twice and when it processed those events.
The logs show a wait of several minutes before the Apex trigger processes the "good" events.
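
The subscriber state can also be inspected alongside the debug logs by querying EventBusSubscriber. The following is a minimal sketch, meant to be run as anonymous Apex. It assumes the event name myPE__e from the repro steps, and the Tip value may not be populated for every event type.

```apex
// List the Apex trigger subscribers of myPE__e together with their retry state.
List<EventBusSubscriber> subs = [
    SELECT Name, Position, Retries, LastError, Status, Tip
    FROM EventBusSubscriber
    WHERE Topic = 'myPE__e' AND Type = 'ApexTrigger'
];
for (EventBusSubscriber sub : subs) {
    // Position: replay ID of the last event the subscriber processed.
    // Retries: number of retries since the last successful execution.
    System.debug('Subscriber ' + sub.Name +
        ' | status=' + sub.Status +
        ' | position=' + sub.Position +
        ' | tip=' + sub.Tip +
        ' | retries=' + sub.Retries +
        ' | lastError=' + sub.LastError);
}
```

While the trigger is waiting to be retried, Position would be expected to stay at the same value even though new events keep being published. Running the query a few times during the wait shows whether that is the case.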


=========
Example timed test run
At 12:10 pm, an event with the name "abc" is published; the custom object record is created and the debug log shows a successful result.
At 12:11 pm, an event with the name "ERR" is published; the debug log shows that the Apex trigger failed with "Script-thrown exception".
- After this transaction, EventBus.TriggerContext.currentContext().retries == 1
At 12:15 pm, an event with the name "bca" is published; nothing happens. This is expected, because the Apex trigger is still waiting in the queue after the previous RetryableException.
At 12:29 pm, the Apex shared subscriber resumes and tries to process the batch of events again; it fails with "Script-thrown exception".
- After this transaction, EventBus.TriggerContext.currentContext().retries == 2
At 12:35 pm, an event with the name "123" is published; nothing happens. This is expected, because the Apex shared subscriber is still waiting in the queue after the previous RetryableException.

Workaround
Removing the retry logic (that is, the code that throws EventBus.RetryableException) should avoid the issue.

However, this might not suit every customer's use case.
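
As an illustration of the workaround, the repro trigger could be rewritten without the retry branch. This is a sketch under assumptions, not a recommended design: it skips "ERR" events and only logs them, so those events are never reprocessed. It also adds a null check on Name__c and collects the records into a single insert; neither change is part of the original repro.

```apex
// Sketch only: the repro trigger with the RetryableException logic removed.
trigger ResendEventsTrigger on myPE__e (after insert) {
    List<myObj__c> toInsert = new List<myObj__c>();
    for (myPE__e event : Trigger.New) {
        if (event.Name__c != null && event.Name__c.contains('ERR')) {
            // The repro threw EventBus.RetryableException here.
            // Logging and skipping keeps the subscriber out of the retry queue.
            System.debug(LoggingLevel.WARN,
                'Skipping event with replay ID ' + event.ReplayId);
            continue;
        }
        toInsert.add(new myObj__c(Name = event.Name__c, replayId__c = event.ReplayId));
    }
    // allOrNone = false: a single bad record does not fail the whole batch.
    for (Database.SaveResult sr : Database.insert(toInsert, false)) {
        if (!sr.isSuccess()) {
            System.debug(LoggingLevel.ERROR, 'Insert failed: ' + sr.getErrors());
        }
    }
}
```

Because nothing is thrown, later events are processed in the same or the next trigger execution instead of waiting behind the failed one. If the failed events must be reprocessed later, they would have to be stored somewhere by the trigger, which this sketch does not do.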

Is it Fixed?

AP0 AP3 AP4 AP5 AP6 AP7 AP8 AP9 AP10 AP11 AP12 AP13 AP14 AP15 AP16 AP17 AP18 AP19 AP20 AP21 AP22 AP24 AP25 AP26 AP27 AP28 AUS1 AUS11 AUS2S AUS27 AUS26S AUS25 AUS24S AUS23 AUS22S AUS3 AUS43 AUS4S AUS5 AUS6S AUS7 AUS9 BRA1 BRA2S BRA4S CAN17S CAN1 CAN11S CAN2S CAN4S CAN6S CAN8S CS1 CS2 CS4 CS5 CS6 CS7 CS8 CS9 CS10 CS109 CS108 CS107 CS106 CS105 CS102 CS101 CS100 CS115 CS119 CS110 CS117 CS114 CS113 CS112 CS111 CS11 CS116 CS123 CS122 CS121 CS126 CS127 CS129 CS128 CS125 CS124 CS137 CS138 CS133 CS132 CS14 CS148 CS142 CS159 CS152 CS151 CS15 CS162 CS16 CS169 CS165 CS160 CS173 CS17 CS174 CS18 CS189 CS194 CS192 CS193 CS190 CS191 CS199 CS197 CS19 CS198 CS196 CS195 CS20 CS209 CS200 CS202 CS201 CS203 CS21 CS219 CS218 CS217 CS216 CS215 CS214 CS213 CS212 CS211 CS210 CS22 CS220 CS23 CS234 CS24 CS25 CS26 CS27 CS28 CS29 CS31 CS32 CS33 CS34 CS35 CS36 CS37 CS40 CS41 CS42 CS43 CS44 CS45 CS46 CS47 CS49 CS50 CS53 CS57 CS58 CS59 CS60 CS61 CS62 CS63 CS64 CS65 CS66 CS67 CS68 CS69 CS72 CS73 CS74 CS75 CS76 CS77 CS78 CS79 CS80 CS81 CS84 CS86 CS87 CS88 CS89 CS90 CS91 CS92 CS94 CS95 CS96 CS97 CS98 CS999 CS99 DEU1 DEU2S DEU4S EU16 EU17 EU18 EU19 EU25 EU26 EU27 EU28 EU29 EU30 EU31 EU32 EU33 EU34 EU35 EU36 EU37 EU38 EU39 EU40 EU41 EU42 EU43 EU44 EU45 EU46 EU47 EU48 FRA1 FRA2S FRA4S IND1 IND16S IND15 IND11 IND18S IND13 IND2S IND23 IND3S IND4S IND5 IND6S IND7 IND9 JPN1 JPN2S JPN3 JPN4S NA104 NA107 NA109 NA100 NA101 NA103 NA102 NA105 NA119 NA116 NA110 NA118 NA112 NA111 NA115 NA114 NA113 NA117 NA125 NA124 NA122 NA120 NA126 NA127 NA123 NA129 NA121 NA128 NA138 NA134 NA133 NA136 NA135 NA132 NA131 NA130 NA137 NA139 NA140 NA142 NA141 NA149 NA146 NA147 NA148 NA154 NA158 NA159 NA153 NA151 NA155 NA152 NA150 NA156 NA161 NA163 NA167 NA160 NA166 NA165 NA169 NA164 NA168 NA162 NA172 NA170 NA174 NA171 NA173 NA196 NA202 NA204 NA21 NA218 NA214 NA217 NA215 NA64 NA65 NA66 NA68 NA69 NA70 NA71 NA72 NA73 NA74 NA75 NA76 NA77 NA80 NA81 NA82 NA83 NA84 NA85 NA86 NA87 NA88 NA89 NA90 NA91 NA92 NA93 NA94 NA95 NA96 NA97 NA98 NA99 
UM1 UM2 UM3 UM4 UM5 UM6 UM7 UM8 UM9 USA1 USA2S USA3S USA4S
