[GH-ISSUE #466] expectedNextCheckin miscalculated occasionally #2499

Open
opened 2026-03-20 21:07:13 +01:00 by sascha_hemi · 2 comments
Owner

Originally created by @skiphansen on GitHub (May 22, 2025).
Original GitHub issue: https://github.com/OpenEPaperLink/OpenEPaperLink/issues/466

While investigating #459 it was discovered that expectedNextCheckin is occasionally set incorrectly
which causes expected periodic updates to be skipped.

This results in the AP returning a AvailDataInfo packet with nextCheckIn == 0 until the next update is generated.

When a tag receives an zero for nextCheckIn it calculates it the sleep time based on how many retries on average it takes for the tag to get a response from the AP. If the connection is good this time is 40 seconds.
Polling the AP every 40 seconds for new data needlessly is believed to be the cause of #459.

Debug logging added to the AP code showed:

6:38:52.469: ADR>processDataReq#569: 4467090000013753 expectedNextCheckin set to 06:39:52
6:39:23.245: contentRunner: Set expectedNextCheckin for 4467090000013753 to 07:08:23
7:09:35.755: contentRunner: Set expectedNextCheckin for 4467090000013753 to 07:38:35
7:39:16.477: AD>processDataReq#569: 4467090000013753 expectedNextCheckin set to 07:40:16
7:39:47.656: contentRunner: Set expectedNextCheckin for 4467090000013753 to 08:08:47
8:09:28.299: ADR>processDataReq#569: 4467090000013753 expectedNextCheckin set to 08:10:28
8:09:59.777: contentRunner: Set expectedNextCheckin for 4467090000013753 to 08:37:59
8:38:40.180: ADR>processDataReq#578: 4467090000013753 expectedNextCheckin set to 09:06:40
8:38:50.707: contentRunner: Not updating 4467090000013753, expectedNextCheckin 09:06:40
8:38:51.714: contentRunner: Not updating 4467090000013753, expectedNextCheckin 09:06:40
8:38:50.707: contentRunner: Not updating 4467090000013753, expectedNextCheckin 09:06:40
8:38:51.714: contentRunner: Not updating 4467090000013753, expectedNextCheckin 09:06:40
...
9:01:40.123: contentRunner: Not updating 4467090000013753, expectedNextCheckin 09:06:40
9:01:41.967: drawNew: Updating 4467090000013753

So expectedNextCheckin gets updated in two places always by contentRunner() and sometimes by processDataReq(). The update in processDataReq() sometimes advances expectedNextCheckin past nextupdate which causes the problem.

Full AP log: 5_21_25_ap.log
Full tag log: 5_21_25_tag.log

Originally created by @skiphansen on GitHub (May 22, 2025). Original GitHub issue: https://github.com/OpenEPaperLink/OpenEPaperLink/issues/466 While investigating #459 it was discovered that expectedNextCheckin is occasionally set incorrectly which causes expected periodic updates to be skipped. This results in the AP returning a AvailDataInfo packet with nextCheckIn == 0 until the next update is generated. When a tag receives an zero for nextCheckIn it calculates it the sleep time based on how many retries on average it takes for the tag to get a response from the AP. If the connection is good this time is 40 seconds. Polling the AP every 40 seconds for new data needlessly is believed to be the cause of #459. Debug logging added to the [AP code](https://github.com/skiphansen/OpenEPaperLink/tree/debug_issue_459) showed: ``` 6:38:52.469: ADR>processDataReq#569: 4467090000013753 expectedNextCheckin set to 06:39:52 6:39:23.245: contentRunner: Set expectedNextCheckin for 4467090000013753 to 07:08:23 7:09:35.755: contentRunner: Set expectedNextCheckin for 4467090000013753 to 07:38:35 7:39:16.477: AD>processDataReq#569: 4467090000013753 expectedNextCheckin set to 07:40:16 7:39:47.656: contentRunner: Set expectedNextCheckin for 4467090000013753 to 08:08:47 8:09:28.299: ADR>processDataReq#569: 4467090000013753 expectedNextCheckin set to 08:10:28 8:09:59.777: contentRunner: Set expectedNextCheckin for 4467090000013753 to 08:37:59 8:38:40.180: ADR>processDataReq#578: 4467090000013753 expectedNextCheckin set to 09:06:40 8:38:50.707: contentRunner: Not updating 4467090000013753, expectedNextCheckin 09:06:40 8:38:51.714: contentRunner: Not updating 4467090000013753, expectedNextCheckin 09:06:40 8:38:50.707: contentRunner: Not updating 4467090000013753, expectedNextCheckin 09:06:40 8:38:51.714: contentRunner: Not updating 4467090000013753, expectedNextCheckin 09:06:40 ... 9:01:40.123: contentRunner: Not updating 4467090000013753, expectedNextCheckin 09:06:40 9:01:41.967: drawNew: Updating 4467090000013753 ``` So expectedNextCheckin gets updated in two places always by contentRunner() and sometimes by processDataReq(). The update in processDataReq() sometimes advances expectedNextCheckin past nextupdate which causes the problem. Full AP log: [5_21_25_ap.log](https://github.com/user-attachments/files/20393866/5_21_25_ap.log) Full tag log: [5_21_25_tag.log](https://github.com/user-attachments/files/20393868/5_21_25_tag.log)
sascha_hemi added the bug label 2026-03-20 21:07:13 +01:00
Author
Owner

@nlimper commented on GitHub (May 28, 2025):

I tried to look in to it in more detail. I see the problem in your logs, but I cannot reproduce it myself. But anyway I have some clues how to fix.

First some explanation about expectedNextCheckin:
expectedNextCheckin gets updated by contentRunner() when a prepareIdleReq is prepared. That update is just a 'temporary' estimate, because the sleep time of the tag only really starts once it is sent to the tag. So, pendingIdle is set, and as soon as a tag checks in (processDataReq() is called), A more accurate expectedNextCheckin is set.
(the root cause of this complicated stuff is that the esp32-s3 always have to think one step ahead: the esp32-c6 needs to know the next sleep time before the tag checks in, as once a tag checks in, it needs to be answered immediately, with no time for the esp32-s3 to react. So, the request is prepared about 30 seconds before the tag is expected to check in).

But maybe the real problem could be a rounding error or a slight mis-timing: at 8:38:40.180, just 10 seconds before the new image is generated, the tag is set to sleep for 30 minutes. Which is a bit weird. The sleep is calculated by taking into account taginfo->nextupdate which is the time a next update is expected. The nextupdate time is set when a new image is generated, but I now see, on most content types this is done just before the actual rendering of the image. For example, for the weather content:

            drawWeather(filename, cfgobj, taginfo, imageParams);
            taginfo->nextupdate = now + interval;
            updateTagImage(filename, mac, interval / 60, taginfo, imageParams);

So the image is drawn, taginfo->nextupdate is set, and only then, the buffer is created and queued, which takes a few seconds. That timespan might be the problem. Simply updating taginfo->nextupdate after the call to updateTagImage might solve it.

<!-- gh-comment-id:2916018928 --> @nlimper commented on GitHub (May 28, 2025): I tried to look in to it in more detail. I see the problem in your logs, but I cannot reproduce it myself. But anyway I have some clues how to fix. First some explanation about expectedNextCheckin: expectedNextCheckin gets updated by `contentRunner()` when a prepareIdleReq is prepared. That update is just a 'temporary' estimate, because the sleep time of the tag only really starts once it is sent to the tag. So, pendingIdle is set, and as soon as a tag checks in (`processDataReq()` is called), A more accurate expectedNextCheckin is set. (the root cause of this complicated stuff is that the esp32-s3 always have to think one step ahead: the esp32-c6 needs to know the next sleep time *before* the tag checks in, as once a tag checks in, it needs to be answered immediately, with no time for the esp32-s3 to react. So, the request is prepared about 30 seconds before the tag is expected to check in). But maybe the real problem could be a rounding error or a slight mis-timing: at 8:38:40.180, just 10 seconds before the new image is generated, the tag is set to sleep for 30 minutes. Which is a bit weird. The sleep is calculated by taking into account `taginfo->nextupdate` which is the time a next update is expected. The nextupdate time is set when a new image is generated, but I now see, on most content types this is done just before the actual rendering of the image. For example, for the weather content: ``` drawWeather(filename, cfgobj, taginfo, imageParams); taginfo->nextupdate = now + interval; updateTagImage(filename, mac, interval / 60, taginfo, imageParams); ``` So the image is drawn, taginfo->nextupdate is set, and only then, the buffer is created and queued, which takes a few seconds. That timespan might be the problem. Simply updating taginfo->nextupdate *after* the call to updateTagImage might solve it.
Author
Owner

@skiphansen commented on GitHub (May 28, 2025):

Thanks Nic. I'll do some more testing. It seems to be reproduceable here, but it only happens once or twice a day so it's not exactly easy to recreate.

I've also implemented #467, but haven't tested it yet. The scheduling changes make a lot of sense during normal operation, but they make testing more difficult / time consuming.

<!-- gh-comment-id:2916345105 --> @skiphansen commented on GitHub (May 28, 2025): Thanks Nic. I'll do some more testing. It seems to be reproduceable here, but it only happens once or twice a day so it's not exactly easy to recreate. I've also implemented #467, but haven't tested it yet. The scheduling changes make a lot of sense during normal operation, but they make testing more difficult / time consuming.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/OpenEPaperLink#2499