dlbeer / dhara

NAND flash translation layer for low-memory systems

Map Sync Time

gshera opened this issue

We are implementing a time-sensitive logging application in which we store periodic sensor data to multiple files. Our implementation connects FatFS (ChaN, 0.14b) with Dhara. Our flash is a 4Gb NAND with a 4kB page size. In general the system performs well and is robust. Every minute we perform an f_sync to minimize the risk of data loss. We are storing data to at least 6 files, and the files stay open for the duration of the logging session. After a period of time (2 hours, for example), we would occasionally see an f_sync take longer than 1 second to execute. Our system does not have enough memory to buffer sensor data for that long, so the sensor data buffer overflows and data is lost.

Our analysis shows that garbage collection performed by dhara_map_sync inside the while (!dhara_journal_is_clean(&m->journal)) loop is the primary cause of the delay. Our mitigation is to exit the loop early if the execution time exceeds our defined performance tolerance. We only enable this restriction on execution time while logging.

Is there any unforeseen risk to the integrity of the journal with this approach?

//returns 0=success, -1=fail
int dhara_map_sync(struct dhara_map *m, dhara_error_t *err)
{
#if LIMITED_SYNC_EXECUTION_TIME
	TickType_t start_time = xTaskGetTickCount();
#endif
	while (!dhara_journal_is_clean(&m->journal)) {
#if LIMITED_SYNC_EXECUTION_TIME
		// Exit early if we are spending too much time here.
		if (system_is_logging()) {
			if ((xTaskGetTickCount() - start_time) > MAX_MAP_SYNC_MSEC) {
				*err = MAP_SYNC_TIMEOUT;
				return -1;
			}
		}
#endif
		dhara_page_t p = dhara_journal_peek(&m->journal);
		dhara_error_t my_err;
		int ret;

		if (p == DHARA_PAGE_NONE) {
			ret = pad_queue(m, &my_err);
		} else {
			ret = raw_gc(m, p, &my_err);
			if (!ret)
				dhara_journal_dequeue(&m->journal);
		}

		if ((ret < 0) && (try_recover(m, my_err, err) < 0))
			return -1;
	}

	return 0;
}

There won't be any risk to journal integrity -- it just means that your sync() will occasionally be a no-op. If this is acceptable, do you even need to sync? Every 2**log2_ppc sector writes from FatFS will result in an implicit sync anyway.
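For reference, the interval of this implicit sync can be read back at run time. The following is only a minimal sketch, assuming the log2_ppc field of struct dhara_journal (declared in dhara/journal.h) is accessible from application code and that the FatFS sector size equals the NAND page size, as in a typical Dhara/FatFS glue layer:

#include "dhara/map.h"

/* Sketch: a checkpoint group holds 2**log2_ppc pages, one of which carries
 * the group's metadata, so after roughly this many sector writes the journal
 * flushes a checkpoint on its own.
 */
unsigned int implicit_sync_interval(const struct dhara_map *m)
{
	return 1u << m->journal.log2_ppc;
}

The figure of 15 sector writes mentioned below for a 2kB page would correspond to the data pages of one such group.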

Thank you for confirming! With FatFS, we have found that an f_write will write the data to flash but will not update the file size, so if the file is not explicitly closed or f_sync is not called and the system crashes, the file will exist but will have size zero and the data will not be recoverable. http://elm-chan.org/fsw/ff/doc/sync.html
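For illustration, the write pattern in question looks roughly like this (a sketch only; the file handle and helper names are placeholders, not our actual application code):

#include "ff.h"

FIL log_file;	/* one of the ~6 files kept open during a logging session */

/* f_write alone pushes data through the FTL, but only f_sync (or f_close)
 * updates the directory entry, so a crash before the next sync leaves the
 * file at its old (possibly zero) size.
 */
FRESULT write_sample(const void *sample, UINT len)
{
	UINT written;

	return f_write(&log_file, sample, len, &written);
}

/* Called once per minute: flush cached data and update the directory entry
 * so everything written so far becomes recoverable after a crash.
 */
FRESULT minute_sync(void)
{
	return f_sync(&log_file);
}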

Sorry, that was probably ambiguous of me. What I meant to suggest was continuing to call f_sync as you are, but dropping the call to dhara_map_sync. This way the FatFS metadata will continue to be written out at regular intervals, and your data will persist at regular intervals too due to the implicit sync that occurs every 15 sector writes (with a 2kB page size).

The downside is that not every call to f_sync is guaranteed to persist, but that's already the case if you are sometimes abandoning dhara_map_sync.
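For what it's worth, here is a minimal sketch of what dropping the explicit sync might look like in the FatFS glue, assuming the common pattern of calling dhara_map_sync from the CTRL_SYNC case of disk_ioctl; the map instance name (nand_map) and the system_is_logging() predicate from the original post are placeholders for whatever your project actually uses:

#include "ff.h"
#include "diskio.h"
#include "dhara/map.h"

extern struct dhara_map nand_map;	/* assumed: defined by the NAND glue */
extern int system_is_logging(void);	/* predicate from the original post */

DRESULT disk_ioctl(BYTE pdrv, BYTE cmd, void *buff)
{
	(void)pdrv;
	(void)buff;

	switch (cmd) {
	case CTRL_SYNC:
		/* While logging, skip the explicit sync so f_sync() never
		 * blocks on garbage collection; the journal still checkpoints
		 * itself every 2**log2_ppc sector writes.
		 */
		if (!system_is_logging()) {
			dhara_error_t err;

			if (dhara_map_sync(&nand_map, &err) < 0)
				return RES_ERROR;
		}
		return RES_OK;

	/* ... GET_SECTOR_COUNT, GET_SECTOR_SIZE, etc. unchanged ... */

	default:
		return RES_PARERR;
	}
}

The split is: f_sync still flushes FatFS data and metadata through disk_write, while the Dhara journal is left to checkpoint itself during logging; a full dhara_map_sync can still be issued when the logging session ends.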

Dropping the call to dhara_map_sync is an interesting idea. We'll try what we have for now, and review your suggestion in a future iteration. I sincerely appreciate your input and responsiveness on this issue.