Python script to download Salesforce Files (aka ContentDocument). It uses ProcessPoolExecutor to run downloads in parallel, which makes the experience nicer if you have a large number of files.
In a very unscientific test, 1614 files (3.23 GB) were downloaded in under 6 minutes.
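To illustrate the parallel approach, here is a minimal sketch of fanning work out with ProcessPoolExecutor. It is not the code from download.py: the function names `download_file` and `download_all`, the `max_workers` value, and the sample Ids are hypothetical, and the worker only returns a string where a real one would fetch the file from Salesforce and write it to disk.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def download_file(content_version_id):
    # Hypothetical worker: a real one would fetch the file body from
    # Salesforce and write it to disk. Here it only reports completion.
    return f"{content_version_id}: done"


def download_all(content_version_ids, max_workers=4):
    # Submit one task per file and collect results as workers finish.
    results = []
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(download_file, cv_id)
            for cv_id in content_version_ids
        ]
        for future in as_completed(futures):
            results.append(future.result())
    return results


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    print(download_all(["068A", "068B", "068C"]))
```

Results arrive in completion order, not submission order, so anything that depends on ordering should sort or key the results itself.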
My change to the script that snorf made was to have it query specifically from Tasks, which is problematic because Tasks can't be elements of a "LinkedEntity IN ()" query. That's why this (admittedly hacky) solution constructs the query in chunks of 200 records, each with a big concatenation of "Id1 OR Id2 OR Id3".
In a smoke test done for a customer, we were able to download files from over 1000 tasks.
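The chunking idea can be sketched roughly as follows. This is an illustration under assumptions rather than the script's actual code: the helper names `chunked` and `build_link_queries`, the selected field, and the generated sample Ids are all hypothetical; only the chunk size of 200 and the OR concatenation come from the description above.

```python
def chunked(items, size=200):
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_link_queries(task_ids, chunk_size=200):
    # Build one SOQL query per chunk, joining the Ids with OR
    # instead of using an IN () clause.
    queries = []
    for chunk in chunked(task_ids, chunk_size):
        condition = " OR ".join(
            f"LinkedEntityId = '{task_id}'" for task_id in chunk
        )
        queries.append(
            "SELECT ContentDocumentId FROM ContentDocumentLink WHERE "
            + condition
        )
    return queries


# Hypothetical sample data: 450 made-up Task Ids.
task_ids = [f"00T{n:015d}" for n in range(450)]
queries = build_link_queries(task_ids)
print(len(queries))  # 3 queries: 200 + 200 + 50 Ids
```

Each query is then run separately and the resulting ContentDocumentIds are merged before downloading.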
Download the script, satisfy requirements.txt and you're good to go!
simple-salesforce (https://github.com/simple-salesforce/simple-salesforce)
- Copy download.ini.template to download.ini and fill it out
- Launch the script
usage: download.py [-h] -q query [-o object] [-f filenamepattern]
Export ContentVersion (Files) from Salesforce
optional arguments:
  -h, --help            show this help message and exit
  -q query, --query query
                        SOQL to limit the valid ContentDocumentIds. Must
                        return the Ids of parent objects.
  -o object, --object object
                        How are the ContentDocument selected, via
                        'ContentDocumentLink' (default) or directly from
                        'ContentDocument'
  -f filenamepattern, --filenamepattern filenamepattern
                        Specify the filename pattern for the output, available
                        values are:{0} = output_directory, {1} =
                        content_document_id, {2} title, {3} file_extension,
                        Default value is: {0}{1}-{2}.{3} which will be
                        /path/ContentDocumentId-Title.fileExtension
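As a small illustration of how the filename pattern expands, the sketch below applies Python's positional `str.format` to the default pattern. That the script uses `str.format` internally is an assumption based on the `{0}`..`{3}` placeholders; the helper name `build_filename` and the sample values are hypothetical.

```python
def build_filename(pattern, output_directory, content_document_id,
                   title, file_extension):
    # Positional placeholders: {0} = output_directory,
    # {1} = content_document_id, {2} = title, {3} = file_extension.
    return pattern.format(
        output_directory, content_document_id, title, file_extension
    )


default_pattern = "{0}{1}-{2}.{3}"
filename = build_filename(
    default_pattern, "/path/", "069000000000001AAA", "Invoice", "pdf"
)
print(filename)  # /path/069000000000001AAA-Invoice.pdf
```

A pattern such as `{0}{2}.{3}` would drop the Id from the filename, at the risk of collisions when two files share a title.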
python download.py -q "SELECT Id FROM Custom_Object__c WHERE Status__c = 'Approved'"
You can also select directly from the ContentDocument table. In that case, pass only the WHERE clause for the ContentDocument query:
python download.py -o ContentDocument -q "WHERE Title LIKE '1:%' AND FileExtension = 'json' AND Description = 'Something to filter your ContentDocuments on'"
This was a small implementation for a customer that I decided to clean up and put on GitHub. I guess there are tons of bugs in here, so please feel free to contact me if you find any.