dlbeer / dhara

NAND flash translation layer for low-memory systems

Map Sync Time

gshera opened this issue

We are implementing a time-sensitive logging application in which we store periodic sensor data to multiple files. Our implementation connects FatFS (ChaN, 0.14b) with Dhara. Our flash is a 4Gb NAND with a 4kB page size. In general the system performs well and is robust. Every minute we perform an f_sync to minimize the risk of data loss. We are storing data to at least 6 files, and the files stay open for the duration of the logging session. After a period of time (2 hours, for example), we would occasionally see an f_sync take longer than 1 second to execute. Our system does not have enough memory to buffer sensor data for that long, so the sensor data buffer overflows and data is lost.

Our analysis shows that garbage collection performed by dhara_map_sync inside the while (!dhara_journal_is_clean(&m->journal)) loop is the primary cause of the delay. Our mitigation is to exit the loop early if the execution time exceeds our defined performance tolerance. We only enable this restriction on execution time while logging.

Is there any unforeseen risk to the integrity of the journal with this approach?

//returns 0=success, -1=fail
int dhara_map_sync(struct dhara_map *m, dhara_error_t *err)
{
#if LIMITED_SYNC_EXECUTION_TIME
	TickType_t start_time = xTaskGetTickCount();
#endif
	while (!dhara_journal_is_clean(&m->journal)) {
#if LIMITED_SYNC_EXECUTION_TIME
		// Exit early if we are spending too much time here.
		if (system_is_logging()) {
			if ((xTaskGetTickCount() - start_time) > MAX_MAP_SYNC_MSEC) {
				*err = MAP_SYNC_TIMEOUT;
				return -1;
			}
		}
#endif
		dhara_page_t p = dhara_journal_peek(&m->journal);
		dhara_error_t my_err;
		int ret;

		if (p == DHARA_PAGE_NONE) {
			ret = pad_queue(m, &my_err);
		} else {
			ret = raw_gc(m, p, &my_err);
			if (!ret)
				dhara_journal_dequeue(&m->journal);
		}

		if ((ret < 0) && (try_recover(m, my_err, err) < 0))
			return -1;
	}

	return 0;
}

There won't be any risk to journal integrity -- it just means that your sync() will occasionally be a no-op. If this is acceptable, do you even need to sync? Every 2**log2_ppc sector writes from FatFS will result in an implicit sync anyway.
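For reference, the interval of this implicit sync can be read back at run time. The following is only a minimal sketch, assuming the log2_ppc field of struct dhara_journal (declared in dhara/journal.h) is accessible from application code and that the FatFS sector size equals the NAND page size, as in a typical Dhara/FatFS glue layer:

#include "dhara/map.h"

/* Sketch: a checkpoint group holds 2**log2_ppc pages, one of which carries
 * the group's metadata, so after roughly this many sector writes the journal
 * flushes a checkpoint on its own.
 */
unsigned int implicit_sync_interval(const struct dhara_map *m)
{
	return 1u << m->journal.log2_ppc;
}

The figure of 15 sector writes mentioned below for a 2kB page would correspond to the data pages of one such group.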

Thank you for confirming! With FatFS, we have found that an f_write will write the data to flash but will not update the file size, so if the file is not explicitly closed or f_sync is not called and the system crashes, the file will exist but will have size zero and the data will not be recoverable. http://elm-chan.org/fsw/ff/doc/sync.html
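For illustration, the write pattern in question looks roughly like this (a sketch only; the file handle and helper names are placeholders, not our actual application code):

#include "ff.h"

FIL log_file;	/* one of the ~6 files kept open during a logging session */

/* f_write alone pushes data through the FTL, but only f_sync (or f_close)
 * updates the directory entry, so a crash before the next sync leaves the
 * file at its old (possibly zero) size.
 */
FRESULT write_sample(const void *sample, UINT len)
{
	UINT written;

	return f_write(&log_file, sample, len, &written);
}

/* Called once per minute: flush cached data and update the directory entry
 * so everything written so far becomes recoverable after a crash.
 */
FRESULT minute_sync(void)
{
	return f_sync(&log_file);
}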

Sorry, that was probably ambiguous of me. What I meant to suggest was continuing to call f_sync as you are, but dropping the call to dhara_map_sync. This way the FatFS metadata will continue to be written out at regular intervals, and your data will persist at regular intervals too due to the implicit sync that occurs every 15 sector writes (with a 2kB page size).

The downside is that not every call to f_sync is guaranteed to persist, but that's already the case if you are sometimes abandoning dhara_map_sync.
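For what it's worth, here is a minimal sketch of what dropping the explicit sync might look like in the FatFS glue, assuming the common pattern of calling dhara_map_sync from the CTRL_SYNC case of disk_ioctl; the map instance name (nand_map) and the system_is_logging() predicate from the original post are placeholders for whatever your project actually uses:

#include "ff.h"
#include "diskio.h"
#include "dhara/map.h"

extern struct dhara_map nand_map;	/* assumed: defined by the NAND glue */
extern int system_is_logging(void);	/* predicate from the original post */

DRESULT disk_ioctl(BYTE pdrv, BYTE cmd, void *buff)
{
	(void)pdrv;
	(void)buff;

	switch (cmd) {
	case CTRL_SYNC:
		/* While logging, skip the explicit sync so f_sync() never
		 * blocks on garbage collection; the journal still checkpoints
		 * itself every 2**log2_ppc sector writes.
		 */
		if (!system_is_logging()) {
			dhara_error_t err;

			if (dhara_map_sync(&nand_map, &err) < 0)
				return RES_ERROR;
		}
		return RES_OK;

	/* ... GET_SECTOR_COUNT, GET_SECTOR_SIZE, etc. unchanged ... */

	default:
		return RES_PARERR;
	}
}

The split is: f_sync still flushes FatFS data and metadata through disk_write, while the Dhara journal is left to checkpoint itself during logging; a full dhara_map_sync can still be issued when the logging session ends.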

Dropping the call to dhara_map_sync is an interesting idea. We'll try what we have for now, and review your suggestion in a future iteration. I sincerely appreciate your input and responsiveness on this issue.