CCA-Public / diskimageprocessor

Tool for automated processing of disk images in BitCurator

Add tsk_recover offset option

zvowell opened this issue · comments

Would it be possible to provide diskimageprocessor the option to take in a value for the tsk_recover sector offset argument? Similar to the way brunnhilde can work as documented here: tw4l/brunnhilde#20

For disks like those described in that brunnhilde issue, diskimageprocessor.py (and the GUI) currently cannot carve out files and, consequently, cannot produce any of the brunnhilde reports.

Hi Zach! Thanks for opening the ticket. I can certainly do this in the diskimageprocessor.py and diskimageanalyzer.py scripts -- add some flags, and then document that if you need to pass specific arguments to tsk_recover, you should run the script on only one disk image at a time.

Not sure how to add this to the GUI though, given that the assumption is that the source being processed could have any number of disk images in it. If you have ideas, I'm all ears!

Ah, good point, I had not taken the multiple disk capabilities of diskimageprocessor (scripts or GUI) into account. If you could add it to the scripts, it would be a big help in my workflow (which I'd be happy to share, btw) at least. It will allow me to have uniform structures for the SIPs. And if it's in the scripts only, perhaps it will be less liable to cause confusion? I agree a GUI option seems to be more trouble than it's worth, but maybe as we process an elegant solution will emerge (we can always hope!). Thanks!

Okay, sounds good! Maybe it would help if you shared a bit about your workflow? (here or through email) Then it might be a bit easier to understand the use case I'm building for. Thanks!

Hi Zach - getting around to following up on this.

I was thinking - in order to keep the multi-disk functionality, what if I added an option to pass along a CSV file with the disk image input that would specify filenames and tsk_recover options? If you had one disk, you would just make a one-line CSV with the relevant info (fs type, img type, sector offset) in consistent order as appropriate. If you had multiple disk images, you could pass along whatever information you have for them or leave columns blank to default to auto.

Does that seem like too much overhead for your use case?
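A minimal sketch of the CSV idea above, not the code that ended up in the repository. It assumes a header-less CSV whose columns are filename, fs type, img type, and sector offset, in that order, and turns each row into a tsk_recover argument list. The function name `build_tsk_recover_commands`, the `output_root` parameter, and the choice of `-a` (allocated files only) are illustrative assumptions; `-f`, `-i`, and `-o` are the standard tsk_recover flags for file system type, image type, and sector offset.

```python
import csv
import io
import os


def build_tsk_recover_commands(csv_text, output_root="carved"):
    """Return one tsk_recover argument list per non-empty CSV row.

    Assumed column order: filename, fs type, img type, sector offset.
    A blank cell means "let tsk_recover auto-detect", so no flag is added.
    """
    commands = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        # Pad short rows so missing trailing columns count as blank.
        image, fstype, imgtype, offset = (
            cell.strip() for cell in (row + [""] * 4)[:4]
        )
        cmd = ["tsk_recover", "-a"]  # -a is an assumption for this sketch
        if fstype:
            cmd += ["-f", fstype]
        if imgtype:
            cmd += ["-i", imgtype]
        if offset:
            cmd += ["-o", offset]
        # Hypothetical layout: one output directory per image, named after it.
        name = os.path.splitext(os.path.basename(image))[0]
        cmd += [image, os.path.join(output_root, name)]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    example = "disk1.img,fat,raw,63\ndisk2.E01,,,\n"
    for command in build_tsk_recover_commands(example):
        print(" ".join(command))
```

Each list could be handed to `subprocess.run` as-is. Building argument lists rather than shell strings keeps filenames with spaces safe.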

Hi Tim,
Here are the basics of my workflow now: 1) acquire the image with Guymager, 2) take a photograph of the disk, 3) run Disk Image Processor in Processing mode (python script or GUI, either way would be okay) against the image, the Guymager .info file, and the photograph file, and 4) transfer the bagged SIPs to staging for preservation storage. My workaround without a sector offset argument in Disk Image Processor would be to add a step 3b where I identify the SIPs that have no files in the /data/objects/files/ directory, and then move them to a new location where I can run brunnhilde on them separately.
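The "identify the SIPs that have no files" part of step 3b could be scripted along these lines. This is a sketch only: it assumes all SIPs sit as subdirectories of one output folder and that each keeps carved files under `data/objects/files/`, as described above. The function name `sips_without_files` is hypothetical.

```python
from pathlib import Path


def sips_without_files(sips_root):
    """Return names of SIP directories whose data/objects/files/ holds no files."""
    empty = []
    for sip in sorted(Path(sips_root).iterdir()):
        if not sip.is_dir():
            continue
        files_dir = sip / "data" / "objects" / "files"
        # A missing directory, or one containing only empty subdirectories,
        # is treated the same as an empty one.
        has_files = files_dir.is_dir() and any(
            p.is_file() for p in files_dir.rglob("*")
        )
        if not has_files:
            empty.append(sip.name)
    return empty


if __name__ == "__main__":
    import sys

    for name in sips_without_files(sys.argv[1]):
        print(name)
```

The printed names are the disk images that likely need a second pass with explicit tsk_recover options.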

I like the CSV idea, especially the idea of blank cells representing an auto value. And I think it would be a good replacement for the 3b step. I'm not aware of a way to avoid a 2nd pass with tsk_recover (whether it's Disk Image Processor or just brunnhilde); that is, I can't see how to anticipate that a disk needs the sector offset for tsk_recover to work. Is it just FAT raw images? (that's not so much a question for you, just wondering out loud). So, no, I don't think it would be too much overhead. Thank you!

Hi Zach,

So I started working in the direction we discussed, and then had a hard time fitting the functionality into the existing scope/purpose of the diskimageprocessor script. Eventually I decided to make a new script, "process_with_tsk_options.py", which you can see in this dev branch. The script needs to be in the same directory as the other diskimageprocessor files after installation to work, but I have updated the install.sh script in that branch accordingly.

Unfortunately, this would mean you'd still have a bit of a workaround to do in that 3b step, but at least from there you can easily get a standard output for those disk images that require extra work, after you've done some research and know what the sector offset, fstype, and so on are.

Would you mind cloning or downloading the files from the dev branch, installing on your end, and trying it out? Looking forward to any feedback you might have (even if it's that you think I went in the wrong direction here!)

(also, I haven't yet updated the readme in that branch, but if you call the script with -h, I think the inputs should be pretty self-explanatory -- I've tried to make it as close as possible to the existing diskimageprocessor options)

Hey Tim,
Just went through the workflow using process_with_tsk_options.py for step 3b, and it worked like a charm. The thing I like about it is that it retains the structure of the bags that are created with normal diskimageprocessor.py (or GUI), which wasn't happening when I was having to resort to brunnhilde for step 3b. Every image (so far!) having a consistent structure is nice.

Maybe one other thing to note about the workflow --just to have documented in this issue, not really asking you to modify the script to address this-- is that I have to run mmls on the disk image to determine the offset value, before running process_with_tsk_options.py.
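For the mmls step, a rough sketch of pulling candidate sector offsets out of mmls text output. It assumes the usual column layout (slot number, slot, start, end, length, description). The function names and the sample text are illustrative; the sample is not output from a real disk.

```python
import re

# One partition-table row: "002:  000:000   0000000063   ...   DOS FAT16 (0x06)"
PARTITION_LINE = re.compile(
    r"^\s*(\d+):\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.*)$"
)


def parse_mmls(text):
    """Parse mmls text output into a list of dicts, one per table row."""
    rows = []
    for line in text.splitlines():
        match = PARTITION_LINE.match(line)
        if match:
            _, slot, start, end, length, description = match.groups()
            rows.append({
                "slot": slot,
                "start": int(start),
                "end": int(end),
                "length": int(length),
                "description": description.strip(),
            })
    return rows


def candidate_offsets(text):
    """Return start sectors of rows that look like real partitions.

    Rows whose slot is not of the form NN:NN (for example Meta or
    unallocated entries) are skipped.
    """
    return [
        row["start"]
        for row in parse_mmls(text)
        if re.fullmatch(r"\d+:\d+", row["slot"])
    ]


# Illustrative text in the mmls layout, not taken from a real image.
SAMPLE = """DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors

      Slot      Start        End          Length       Description
000:  Meta      0000000000   0000000000   0000000001   Primary Table (#0)
001:  -------   0000000000   0000000062   0000000063   Unallocated
002:  000:000   0000000063   0000208844   0000208782   DOS FAT16 (0x06)
"""

if __name__ == "__main__":
    print(candidate_offsets(SAMPLE))
```

In practice the text would come from running mmls on the image, for example by capturing its standard output with `subprocess.run`, and the chosen start sector would then be passed on as the offset value.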

I've only run through it once, but will keep trying in the next few weeks and will let you know if anything comes up. Thanks!

Hi Zach - I'm so glad to hear it's working! Yes, I anticipated that you'd have to run mmls first (that's exactly what I did when testing here). There might be a way to automate that, but I think that's a project for another day. I'm going to go ahead and merge this branch into master and consider it the 0.7.0 release. I'm closing the ticket for now but feel free to re-open if you have any feedback from testing!