Major
Detail
TL-Sync detects duplicate messages and skips them automatically. Missing messages, however, are not yet detected and can lead to complicated technical problems.
Improvement
TL-Sync should detect when a message is missing. It should then stop receiving, write an error to the log, and indicate the problem in the monitor view.
Implementation
TL-Sync transmits the revision of the sending system in every message. However, it transmits only the relevant revisions, so these numbers are not gapless and a gap in the numbering alone does not indicate a lost message. If each message additionally carries either the number of the last sent revision or the first not sent revision, the receiver has enough data to detect a missing message.
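The receiver-side check described above could look roughly like the following sketch. It is not the actual TL-Sync code: the class `GapDetector`, the enum `Result`, and the parameter `previousSent` (the variant in which the message carries the number of the last sent revision) are hypothetical names chosen for illustration.

```java
/** Hypothetical receiver-side check; a sketch, not the actual TL-Sync implementation. */
final class GapDetector {

    /** Possible outcomes for an incoming message. */
    enum Result { PROCESS, SKIP_DUPLICATE, MISSING_MESSAGES }

    /** Revision of the last processed changeset, or -1 if nothing was received yet. */
    private long lastProcessed = -1;

    /**
     * @param revision     sender revision of the incoming changeset
     * @param previousSent revision the sender sent directly before this one (assumed message field)
     */
    Result check(long revision, long previousSent) {
        if (lastProcessed < 0) {
            // First message ever: nothing to compare against, just remember the revision.
            lastProcessed = revision;
            return Result.PROCESS;
        }
        if (revision <= lastProcessed) {
            // Already processed, for example after a retransmission.
            return Result.SKIP_DUPLICATE;
        }
        if (previousSent != lastProcessed) {
            // The sender sent something in between that never arrived here.
            return Result.MISSING_MESSAGES;
        }
        lastProcessed = revision;
        return Result.PROCESS;
    }
}
```

With the last processed revision at 7, an incoming changeset 12 that names 9 as its predecessor yields `MISSING_MESSAGES`, because the receiver never saw changeset 9. Once changeset 9 is retransmitted, changesets 9 and 12 are both accepted in turn.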
Technical background
TL-Sync uses Kafka, which guarantees that no messages are lost, so in principle this feature would be unnecessary. In practice, however, customers repeatedly configure or administer Kafka incorrectly. This leads to data loss and then to technical problems, some of which are very difficult to fix, while the customer at the same time exerts a lot of pressure to have them fixed within the shortest possible time. This feature is therefore useful after all.
Recognizing this situation
On the monitor page of the application there is a line for TL-Sync receiver with the following message:
Failed.
Cause: Processor class com.top_logic.kafka.knowledge.service.importer.KBDataProcessor failed.
Cause: LOG MARK: 'in-tl-sync-context' = 'true'.
Cause: Detected that messages are missing, when receiving changeset 12. The last processed changeset was 7. But the new message states that the last processed changeset should have been 9.
Correction of this situation
The sending application must retransmit the missing data. It can also just retransmit everything. TL-Sync will detect the already processed revisions and skip them.
Before the sending application sends its data again, all "too new" revisions must be removed from Kafka. Typically, any remaining data in Kafka can be deleted. Applications with "chunking" must also delete their stored chunks.
Code Migration
Background
- The TL-Sync message format changes with this migration.
- The updated receiver can also process old messages. But old receivers cannot process the new messages and log error messages.
- The sender can be configured to continue sending messages of the old type. In that case, however, the missing message detection is disabled.
- If messages of the new type are sent, all receiving applications must also be switched to it. Otherwise the reception will fail and log error messages.
Necessary migration
- Either all receiving applications must be updated to the new version at the same time.
- Or all updated sending applications must be configured to send messages of the old type. See: ChangeSetSerializer.Config.getMessageVersion()
Test
Preparation
- Start ZooKeeper, Kafka and TL Kafka Demo.
- Create an object so that a message is sent.
- This initializes the transmission: the receiver now remembers which revision it received last.
Make sure that a changeset is lost
- Set a breakpoint in the consumer so that no more messages can be received. For example, at the beginning of: ConsumerDispatcher.poll(int, int)
- Create another object to send another message.
- Shut down the application without letting the consumer consume this message.
- Shut down Kafka and ZooKeeper.
- Delete the ZooKeeper and Kafka data in the "tmp" folder.
- The location of this folder depends on the operating system.
- Restart ZooKeeper, Kafka and TL Kafka Demo.
Check that the transfer is stopped
- Create another object to send another message.
- This object must not arrive. The monitor page must show that there is a problem for the TL-Sync receiver.
Restore transmission
- Stop the tl:KBDataProducerTask in the SchedulerGui.
- Stop TL Kafka Demo, Kafka and ZooKeeper.
- Delete the ZooKeeper and Kafka data in the "tmp" folder.
- Restart ZooKeeper, Kafka and TL Kafka Demo.
- In the database table TL_PROPERTIES delete the entries from the sender:
{{{#!sql
delete from TL_PROPERTIES where "propKey" like 'TLSync.lastSentRevisionAtDate%'
}}}
- Start the tl:KBDataProducerTask again in the SchedulerGui.
Check that the transfer is working again
- Check that the two missing objects arrive within 15 seconds.
- If the application was not terminated while the Kafka data was deleted, it may take up to 11 minutes. In that case the exponential backoff keeps running, which delays the next attempt to receive so as not to flood the log.
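To illustrate why the wait can grow this long, here is a minimal sketch of an exponential backoff. It is not the TL-Sync implementation: the class `Backoff`, its methods, and the delay values in the usage note are assumptions made for this example, and the actual parameters behind the 11 minutes are not stated here.

```java
import java.time.Duration;

/** Hypothetical exponential backoff; initial and maximum delay are assumed parameters. */
final class Backoff {

    private final Duration initial;
    private final Duration max;
    private Duration next;

    Backoff(Duration initial, Duration max) {
        this.initial = initial;
        this.max = max;
        this.next = initial;
    }

    /** Returns the delay before the next attempt and doubles the following delay, capped at max. */
    Duration onFailure() {
        Duration current = next;
        Duration doubled = next.multipliedBy(2);
        next = doubled.compareTo(max) > 0 ? max : doubled;
        return current;
    }

    /** Resets the delay after a successful attempt. */
    void onSuccess() {
        next = initial;
    }
}
```

With an assumed initial delay of 1 second and a cap of 5 seconds, successive failures yield delays of 1, 2, 4, 5 and 5 seconds. A receiver that has been failing for a while is therefore already waiting at the longest delay and does not notice right away that the sender is transmitting again.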