Put generated output in its own folder(s)
lenaschimmel opened this issue
Using this tool on a large Twitter account that has existed for some years generates a lot of `*-Tweet-Archive-*.md` files, and now that DMs and group DMs are processed, also a lot of `DMs-Archive-*.md` files.
Also, some future features might need to save additional files for internal reuse that are not relevant to the end user.
I'd like to have a nicer organization, like this:
- `<archive_folder>/`
  - assets/
  - data/
  - media/
  - parser-result/
    - tweets/
      - TweetArchive.html
      - `*-Tweet-Archive-*.md`
    - dms/
      - `DMs-Archive-*.md`
  - parser-data/
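To make the proposed layout concrete, here is a minimal Python sketch that decides which subfolder a generated file would go to. The rules table and `target_path` are hypothetical names; the glob patterns are only the filenames mentioned above, and anything the rules don't recognize is left where it is.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical rules table for the proposed layout: (filename glob, subfolder).
# The globs are the generated filenames mentioned in this issue.
LAYOUT_RULES = [
    ("TweetArchive.html", "parser-result/tweets"),
    ("*-Tweet-Archive-*.md", "parser-result/tweets"),
    ("DMs-Archive-*.md", "parser-result/dms"),
]


def target_path(filename: str) -> Optional[PurePosixPath]:
    """Return the path, relative to <archive_folder>, for a generated file.

    Returns None for files the rules don't recognize, so they stay where they are.
    """
    for pattern, folder in LAYOUT_RULES:
        # Case-sensitive match, so the behaviour is the same on every OS.
        if fnmatch.fnmatchcase(filename, pattern):
            return PurePosixPath(folder) / filename
    return None
```

For example, `target_path("DMs-Archive-123.md")` gives `PurePosixPath('parser-result/dms/DMs-Archive-123.md')`. Keeping the rules in one table would make it easy to add new output types later without touching the sorting logic.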
What do you think?
Also, I'm not sure whether an HTML file at `archive_folder/parser-result/tweets/TweetArchive.html` can reference images higher up in the hierarchy via relative paths.
Agreed that it needs thinking about.
I'd prefer references not to use relative paths that move up the hierarchy, since that makes moving things harder.
I'd quite like our consumable output to go in a separate folder. Maybe:
- `<archive_folder>`
  - Your archive.html
  - data/
  - assets/
  - parser-output/
    - `*.md`, `*.html`
    - media/
  - parsing-info/
    - logs (`*.txt`)
    - cache files (`*.json`, `*.txt`)
It still leaves the `*.md` and `*.html` files mixed together, but at least it declutters the root folder of the archive.
So this would mean that we have to copy or move files from `<archive_folder>/media` to `<archive_folder>/parser-output/media/`, right? Or use symlinks on operating systems that support them?
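The "symlink where supported, otherwise copy" option could look roughly like the Python sketch below. `link_or_copy` is a hypothetical helper, not existing project code. On Windows, creating symlinks usually needs administrator rights or Developer Mode, so the fallback to a plain copy matters there.

```python
import os
import shutil
from pathlib import Path


def link_or_copy(src: Path, dst: Path) -> str:
    """Place src at dst, preferring a symlink and falling back to a copy.

    Returns "exists", "symlink" or "copy" so callers can log what happened.
    """
    if dst.exists() or dst.is_symlink():
        return "exists"  # never clobber something already in place
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Relative link target, so the link survives moving the whole archive folder.
        os.symlink(os.path.relpath(src, dst.parent), dst)
        return "symlink"
    except (OSError, NotImplementedError):
        # e.g. Windows without the privilege to create symlinks
        shutil.copy2(src, dst)
        return "copy"
```

A symlink saves disk space, but a copy is the safer choice if users are expected to move `parser-output/` somewhere else on its own.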
I'm really not sure whether going up the hierarchy, as in `<img src="../../media/34534634364576-SEgrgrdGertgSrd.jpg">`, is a problem when opening an HTML file via a `file://` URL. I've found conflicting information about that, but I probably haven't found the right search terms yet.
When running the script for the first time, we would just copy media into `parser-output/media/` instead of into `/media` as the current code does. The exception is people who have already downloaded high-resolution versions: they would need to copy them over manually to avoid downloading them all again. Maybe that's a pain.
Going up the hierarchy is fine in HTML and Markdown, as long as the file exists at the relative location. The problem comes when someone wants to move all the files into their blog, for example. They would need to replicate the same folder structure so that the relative paths still work.
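A relative `src` is resolved against the URL of the page containing it, for `file://` URLs as much as for `http://` ones. The Python sketch below uses `urllib.parse.urljoin` to mimic that resolution for the example path above; the archive location `/home/me/archive` and the blog path are made up for illustration, and `resolve_src` / `src_for` are hypothetical helper names.

```python
import posixpath
from urllib.parse import urljoin


def resolve_src(page_url: str, src: str) -> str:
    """Resolve an <img src> relative to the page URL, as a browser would."""
    return urljoin(page_url, src)


def src_for(page_path: str, media_path: str) -> str:
    """Compute the relative src a page at page_path needs to reach media_path."""
    return posixpath.relpath(media_path, posixpath.dirname(page_path))


img = "../../media/34534634364576-SEgrgrdGertgSrd.jpg"

# Inside the archive, the two ".." steps climb out of tweets/ and parser-result/,
# so the image is found in the archive's own media/ folder.
page = "file:///home/me/archive/parser-result/tweets/TweetArchive.html"
print(resolve_src(page, img))

# Copy only the HTML into a blog folder and the same src now points at
# /srv/media/..., which doesn't exist unless the folder structure is replicated.
moved = "file:///srv/blog/posts/TweetArchive.html"
print(resolve_src(moved, img))
```

This is why the `../..` form works as long as the layout is kept intact, and breaks as soon as the HTML travels without the folders above it.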
OK, I hadn't even noticed that the current code already copies the media 😄
Then I'd suggest that the script create the directories `./parser-output` and `./parsing-info` if needed, and then move any output from older versions into them. That way, users who upgrade don't need to do or worry about anything.
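The upgrade step suggested above could be sketched in Python like this. It assumes older versions wrote their `.md`/`.html` output and the `media/` copy directly into the archive root; `migrate_legacy_output` and the pattern list are hypothetical, not existing project code. Moving the whole `media/` folder would also carry over any already-downloaded high-resolution versions.

```python
import shutil
from pathlib import Path

# Assumed outputs of older script versions, written straight into the archive root.
LEGACY_OUTPUT_PATTERNS = ["*-Tweet-Archive-*.md", "DMs-Archive-*.md", "TweetArchive.html"]
LEGACY_MEDIA_DIR = "media"


def migrate_legacy_output(archive: Path) -> list:
    """Create parser-output/ and parsing-info/, then move old output into place.

    Safe to run on every start: migrated files are no longer in the root,
    so a second run finds nothing. Returns the new paths of moved items.
    """
    output_dir = archive / "parser-output"
    info_dir = archive / "parsing-info"  # logs and cache files would go here
    output_dir.mkdir(exist_ok=True)
    info_dir.mkdir(exist_ok=True)

    moved = []
    for pattern in LEGACY_OUTPUT_PATTERNS:
        for old in sorted(archive.glob(pattern)):  # root only, not recursive
            new = output_dir / old.name
            if not new.exists():  # never overwrite newer output
                shutil.move(str(old), str(new))
                moved.append(new)

    # Move the old media copy (including any high-res downloads) as a whole.
    old_media = archive / LEGACY_MEDIA_DIR
    new_media = output_dir / "media"
    if old_media.is_dir() and not new_media.exists():
        shutil.move(str(old_media), str(new_media))
        moved.append(new_media)
    return moved
```

Because the function is idempotent, it could simply be called at every start-up before any output is written.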
(I'd like to start coding this, because the more than 400 files in the top-level directory of my archive really slow me down and annoy me while implementing and testing other features.)
Yes, good suggestion.
Please do go ahead and implement this. Thank you!