`.download()` : compare local and remote files by timestamps before downloading
hugolpz opened this issue
Timestamp
The API's `timestamp` property could be compared with the timestamp of the existing local file.
If the API timestamp is smaller (older) than the local file's timestamp, skip the download.
The imageinfo value `"timestamp": "2021-04-25T15:49:00Z"` indeed matches the file description page, which shows:
| | Date/Time | Thumbnail | Dimensions | User | Comment |
|---|---|---|---|---|---|
| current | 15:49, 25 April 2021 | | 1.1 s (99 KB) | Kitel WP (talk \| contribs) | |
After verification: for files with several uploads, the API returns by default the timestamp of the last upload (default: 1 revision, the latest).
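The comparison described in this section could be sketched roughly as below. This is only an illustration under assumptions: `isLocalUpToDate` and `localMtimeMs` are hypothetical helper names (not part of CeJS or any library), and the local file's modification time is assumed to reflect when it was last downloaded.

```javascript
const fs = require('node:fs');

// Hypothetical helper: true when the remote upload is older than the local copy,
// i.e. the download can be skipped.
// remoteTimestamp: ISO 8601 string such as imageinfo's "timestamp".
// localMs: local modification time in milliseconds, or null if there is no local file.
function isLocalUpToDate(remoteTimestamp, localMs) {
  if (localMs === null) return false; // no local file: must download
  const remoteMs = Date.parse(remoteTimestamp);
  if (Number.isNaN(remoteMs)) return false; // unreadable timestamp: download to be safe
  return remoteMs < localMs; // "smaller (older)" than the local timestamp
}

// Hypothetical helper: modification time of a local file, or null if it is missing.
function localMtimeMs(filePath) {
  if (!fs.existsSync(filePath)) return null;
  return fs.statSync(filePath).mtimeMs;
}

// Example with the timestamp quoted above and a made-up local date.
const remote = '2021-04-25T15:49:00Z';
console.log(isLocalUpToDate(remote, Date.parse('2021-05-01T00:00:00Z'))); // true
console.log(isLocalUpToDate(remote, Date.parse('2021-04-01T00:00:00Z'))); // false
```

In real use, the second argument would come from something like `localMtimeMs('./some-file.ogg')`, where the path is whatever the download directory and file name happen to be.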
Q4-related (✅ #51)
When a file already exists locally, it could be skipped faster.
Given `a` the time per download, `x` the number of files to download, and `b` the initial categorymember query time (estimated at `b = 60 sec`), the second attempt (update) could take about `a*x + b = 2.7*14 + 60 = 97.8 sec` instead of 540 sec.
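The estimate above can be reproduced with a couple of lines. This is just the arithmetic from the paragraph; the function name `estimateUpdateSeconds` is made up for illustration.

```javascript
// a: seconds per download, x: number of files to download,
// b: initial categorymember query time in seconds (estimated, per the text above).
function estimateUpdateSeconds(a, x, b) {
  return a * x + b;
}

const estimate = estimateUpdateSeconds(2.7, 14, 60);
console.log(estimate.toFixed(1)); // "97.8"
```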
Q5-related (✅ #51)
@Poslovitch pointed out that filenames alone are not enough: some version check is needed so that files recently updated on Commons are indeed re-downloaded. (Discord server invitation)
Ciencia-Al-Poder pointed out "First of all, you should avoid redownload files that you downloaded on a previous run. The api will return you the file modification/creation time. Use it to check if the file has been updated." (Discord link)
@kanasimi posted:
Hi. Maybe you will be interested in:
```javascript
const file_data_list = await wiki.download('Category:name', { directory: './' });
```
The concepts are implemented now, and I am using a generator. The code makes as few calls as possible.
You may try it yourself and find the code at https://github.com/kanasimi/CeJS/blob/master/application/net/wiki/page.js#L3893
`.download()` benchmark
Before
- Initial attempt:
  - categorymembers=369
  - downloads=369
  - running time: 16 min or 960 sec --> 2.7 s/file
- Removed 14 files from local directory
- Update attempt:
  - categorymembers=369
  - downloads=14
  - running time: 540 sec --> 38.6 s/file
  - downloads: est. 2.7 s/download (same)
  - comparisons: est. 1.3 s/comparison
After
- Initial attempt:
  - categorymembers=829
  - downloads=829
  - running time: 760 sec --> 0.9 s/download
- Removed 20 files from local directory
- Update attempt:
  - categorymembers=829
  - downloads=20
  - running time: ~20 sec
  - downloads: est. 0.9 s/download (same)
  - comparisons: est. <0.01 s/comparison (200+ times faster!)
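A rough sketch of how the per-comparison estimates can be backed out from the benchmark figures above. It assumes the update run time is the download time plus one comparison per category member, and it ignores the initial category query, so the "before" value comes out slightly above the 1.3 s quoted. The function name `perComparisonSeconds` is hypothetical.

```javascript
// Assumed model: total = downloads * secondsPerDownload + members * secondsPerComparison
function perComparisonSeconds(totalSeconds, downloads, secondsPerDownload, members) {
  return (totalSeconds - downloads * secondsPerDownload) / members;
}

const before = perComparisonSeconds(540, 14, 2.7, 369);
const after = perComparisonSeconds(20, 20, 0.9, 829);

console.log(before.toFixed(2)); // "1.36"
console.log(after.toFixed(4)); // "0.0024"
console.log(Math.round(before / after)); // 564
```

Under this simplified model the speed-up per comparison is well above the "200+" figure, which is consistent with the "<0.01 s/comparison" upper bound given in the list.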
Conclusion
@kanasimi :
Local comparisons without api.php calls are 200+ times faster! 🙀 😮 👩🏼‍🚀 🚀
UPDATE IS DAMN FASTER INDEED !!! 🎆
Well, I think this issue is solved.
200 times faster 😻🙉🤤