berggren / foorep

Forensics/Malware repository

Delete sample after insert

dkovar opened this issue · comments

Good evening,

Have you considered modifying foorep to relocate all samples to its own filesystem? At the moment, it appears that it leaves the samples in place. Various other tools do this:

  1. Hash the sample
  2. Copy the sample to a filesystem dedicated to the tool, naming the sample based on the hash
  3. Do the database work, referencing the sample in the tool's filesystem.

Duplicates are detected at ingestion time and, if the sample's name differs from that of the existing sample, a record is created (or adjusted) to note the multiple sample names.
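
The ingestion flow described above can be sketched roughly as follows. This is only an illustration, not foorep's code or that of any particular tool: the `ingest` function, the `store_dir` layout, and the SQLite `sample_names` table are all hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import shutil
import sqlite3
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in blocks so large samples are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


def ingest(sample: Path, store_dir: Path, db: sqlite3.Connection) -> str:
    """Hypothetical ingest: hash, copy into the store, then record the name."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS sample_names ("
        "sha256 TEXT, name TEXT, UNIQUE (sha256, name))"
    )
    digest = sha256_of(sample)               # step 1: hash the sample
    target = store_dir / digest
    if not target.exists():                  # duplicate content is stored once
        store_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(sample, target)      # step 2: copy, named by hash
    # step 3: database work; a new name for a known hash adds another row
    db.execute(
        "INSERT OR IGNORE INTO sample_names (sha256, name) VALUES (?, ?)",
        (digest, sample.name),
    )
    db.commit()
    return digest
```

Ingesting the same bytes under two different filenames leaves one file in the store and two rows in `sample_names`, which is the "multiple sample names" record mentioned above.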

-David

foorep stores the samples in GridFS, a filesystem within MongoDB. At the moment it leaves the original samples in place after the import, but I will add an option to the CLI to remove them afterwards.
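
A "remove after import" option might look roughly like the sketch below. It is not foorep's actual CLI: `import_sample`, the `store` callable, and the `remove_after` flag are hypothetical names. The `store` argument stands in for whatever writes the bytes to the repository, and the original file is deleted only after that call returns without raising.

```python
from pathlib import Path
from typing import Callable


def import_sample(
    path: Path,
    store: Callable[[bytes, str], object],
    remove_after: bool = False,
) -> object:
    """Store a sample, then optionally delete the original file.

    `store` is a stand-in for the repository write (hypothetical signature:
    raw bytes plus the original filename). If it raises, the file is kept.
    Reading the whole file at once keeps the sketch short; very large
    samples would need a streaming write instead.
    """
    data = path.read_bytes()
    stored_id = store(data, path.name)   # may raise; nothing deleted yet
    if remove_after:
        path.unlink()                    # only reached after a successful store
    return stored_id
```

With PyMongo, `store` could be something like `lambda data, name: fs.put(data, filename=name)` for a `gridfs.GridFS` instance `fs`; that wiring is an assumption about how the pieces would fit, not a description of foorep's internals.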

Greetings,

If you import 1TB of malware samples, does the database grow by 1TB? In other words, will GridFS scale well over time?

-David

Yes, if you import 1TB of data into GridFS, the database will grow by 1TB. GridFS works by splitting each file across several "documents" in its internal structure. I think it will scale pretty well, since you can add more database servers and shard the data, but I need to test this in the real world first.
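
To make the "several documents" point concrete, here is a simplified model of how GridFS lays out one file: a metadata document plus a sequence of chunk documents, each carrying a `files_id`, a sequence number `n`, and a slice of `data`. The `split_into_chunks` and `chunk_count` helpers are hypothetical and only mimic the layout in plain Python. The 255 KiB chunk size is the default in current drivers and is treated here as an assumption, since older releases used a different default.

```python
CHUNK_SIZE = 255 * 1024  # assumed default; GridFS lets you set this per file


def split_into_chunks(file_id: str, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return a (metadata, chunks) pair shaped like GridFS's two collections."""
    metadata = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunks = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return metadata, chunks


def chunk_count(total_bytes: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunk documents needed for a file of the given size."""
    return (total_bytes + chunk_size - 1) // chunk_size
```

Because each chunk holds a raw slice of the file, the stored payload is about the size of the imported data plus a small per-chunk overhead, which is consistent with the answer above.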

Greetings,

I've got about 1.5TB of malware samples coming in this week. I'll get everything set up next week and will feed them all in and see what happens.

-David

Interesting! Please report any issues you run into. I will also run a similar test.
