Put generated output in its own folder(s)

lenaschimmel opened this issue 2 years ago

Using this tool on a large Twitter account that has existed for some years generates a lot of *-Tweet-Archive-*.md files, and now that DMs and group DMs are processed, also a lot of DMs-Archive-*.md files.

Also, some of the future features might need to save some additional files for internal re-use, which are not relevant to the end user.

I'd like to have a nicer organization, like this:

<archive_folder>/
- assets/
- data/
- media/
- parser-result/
  - tweets/
    - TweetArchive.html
    - *-Tweet-Archive-*.md
  - dms/
    - DMs-Archive-*.md
- parser-data/

What do you think?

Also, I'm not sure whether an HTML file at archive_folder/parser-result/tweets/TweetArchive.html could reference images that are higher up in the hierarchy via relative paths.

Tim Hutton · Answer 1 · Tue Nov 22 2022 19:16:36 GMT+0800 (China Standard Time)

Agreed that it needs thinking about.

I'd prefer that references not include relative paths that move up the hierarchy, since that makes moving things harder.

I'd quite like for our consumable output product to go in a separate folder. Maybe:

<archive_folder>
- Your archive.html
- data/
- assets/
- parser-output/
  - *.md *.html
  - media/
- parsing-info/
  - logs *.txt
  - cache files *.json *.txt

It still leaves *.md and *.html mixed up but at least it declutters the root folder of the archive.

Lena Schimmel · Answer 2 · Tue Nov 22 2022 20:16:23 GMT+0800 (China Standard Time)

So this would mean that we have to copy or move files from <archive_folder>/media to <archive_folder>/parser-output/media/, right? Or use symlinks on operating systems that support them?

I'm really not sure whether going up the hierarchy like <img src="../../media/34534634364576-SEgrgrdGertgSrd.jpg"> is a problem when opening an HTML file via a file:// URL. I've found conflicting information about that, and it seems I haven't yet found the right search terms to turn up anything useful.

Tim Hutton · Answer 3 · Wed Nov 23 2022 04:05:59 GMT+0800 (China Standard Time)

When the script runs for the first time, we would just copy media into parser-output/media/ instead of into /media as it does now. People who have already downloaded hi-res versions would need to copy them manually to avoid downloading them all again. Maybe that's a pain.

Going up the hierarchy is fine in HTML and Markdown, as long as the file exists at the relative location. The problem comes when someone wants to move all the files into their blog, for example. They would need to replicate the same folder structure so that the relative paths still work.
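To make the trade-off concrete, here is a minimal Python sketch that computes the link an output file would need in order to reach a media file. The helper names `relative_media_link` and `climbs_hierarchy` are hypothetical and not part of the existing script; the folder names come from the two layouts proposed in this thread.

```python
import os


def relative_media_link(output_file: str, media_file: str) -> str:
    """Return the relative link from output_file's folder to media_file.

    Hypothetical helper for illustration only.
    """
    start = os.path.dirname(output_file)
    rel = os.path.relpath(media_file, start)
    # HTML and Markdown links use forward slashes on every platform.
    return rel.replace(os.sep, "/")


def climbs_hierarchy(link: str) -> bool:
    """True if the link has to move up at least one folder level."""
    return link == ".." or link.startswith("../")


# First proposal: media stays at the archive root, so the link climbs two levels.
nested = relative_media_link(
    "archive/parser-result/tweets/TweetArchive.html",
    "archive/media/a.jpg",
)

# Second proposal: media sits below the output folder, so the link never climbs.
flat = relative_media_link(
    "archive/parser-output/TweetArchive.html",
    "archive/parser-output/media/a.jpg",
)

print(nested, climbs_hierarchy(nested))  # ../../media/a.jpg True
print(flat, climbs_hierarchy(flat))      # media/a.jpg False
```

With the second layout, parser-output/ can be moved as a single unit and its links keep working, which is the portability argument made above.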

Lena Schimmel · Answer 4 · Thu Nov 24 2022 02:41:52 GMT+0800 (China Standard Time)

Ok, I had not even noticed that the media is already copied by the current code 😄

Then I'd suggest that the script creates the directories ./parser-output and ./parsing-info if needed, and then moves output from older versions there, if it exists. Then users who upgrade do not need to do or worry about anything.

(I'd like to start coding on this, because the > 400 files in the top level directory of my archive really slow me down and annoy me while implementing / testing other features.)
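The upgrade step suggested above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `migrate_old_output` and the pattern list are hypothetical, the patterns are taken from the file names mentioned in this thread, and the actual script may produce other files that would also need moving.

```python
import shutil
from pathlib import Path
from typing import List

# Assumed patterns, based only on the file names mentioned in this thread.
OLD_OUTPUT_PATTERNS = [
    "*-Tweet-Archive-*.md",
    "DMs-Archive-*.md",
    "TweetArchive.html",
]


def migrate_old_output(archive_dir: Path) -> List[Path]:
    """Create the new folders and move output of older versions into them.

    Returns the new locations of all files that were moved.
    """
    output_dir = archive_dir / "parser-output"
    info_dir = archive_dir / "parsing-info"
    # exist_ok=True makes repeated runs harmless.
    output_dir.mkdir(exist_ok=True)
    info_dir.mkdir(exist_ok=True)

    moved = []
    for pattern in OLD_OUTPUT_PATTERNS:
        # Path.glob without "**" only matches the top level of archive_dir.
        for old_file in sorted(archive_dir.glob(pattern)):
            if not old_file.is_file():
                continue
            target = output_dir / old_file.name
            if target.exists():
                # Never overwrite output that is already in the new location.
                continue
            shutil.move(str(old_file), str(target))
            moved.append(target)
    return moved
```

Because the function skips files that already exist at the target and tolerates existing folders, it could run on every start of the script, so users who upgrade would not need to do anything by hand.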

Tim Hutton · Answer 5 · Thu Nov 24 2022 02:55:15 GMT+0800 (China Standard Time)

Yes, good suggestion.

Please do go ahead and implement this. Thank you!