dbs / SwiftBulkUploader

Script that reads through an entire directory, records paths of all files in a mysql database and uploads all the files. The advantage of recording the paths of all the files is it allows the uploading process to be cancelled and continued. This was necessary for directories with millions of files.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Swift Bulk Upload

These scripts assist in uploading an entire directory onto swift. They were intended for directories containing millions of files up to several terabytes large.

Requirements

  • Python 2.7+
  • python-mysqldb
  • Python Swiftclient
  • MySQL database
  • The following environment variables
  • OS_AUTH_URL
  • OS_USERNAME
  • OS_TENANT_NAME
  • MYSQL_HOST
  • MYSQL_USER
  • MYSQL_PASSWD
  • MYSQL_DB

Usage

Step 1. Index target directory with prepareupload.py

$ python prepareupload.py PathTodirectory MysqlTableName

This creates a table MysqlTableName and populates it with paths to all files in PathToDirectory. It outputs the following log files:

  • MysqlTableName.prepare.error.log # Will log any file path that failed when written to the database.
  • MysqlTableName.prepare.out # A real time log file as file paths are being parsed.

While the above command is running, in a new tab run the following command to watch the progress of the parsing:

$ tail -f MysqlTableName.prepare.out

Step 2. Upload files as stored in step 1.

$ python bulkupload.py containername MysqlTableName 3

This creates 3 processes that reads from MysqlTableName and uploads files into the container containername. If the upload process is stopped, it can be re-run and continue uploading without reuploading already uploaded files. Increase 3 to an appropriate number that your CPU can handle for faster speeds.

Output

This script outputs the following files:

  • MysqlTableName.upload.out # Real time progress of upload
  • MysqlTableName.error.log # Logs failed uploads
  • MysqlTableName.report.log # Created when upload is complete with summary of results.

To check the progress of the upload, run the following command:

$ tail -f MysqlTableName.upload.out

Optional Arguments

####path-cutoff

Example:

$ python bulkupload.py containername MysqlTableName 3 path-cutoff

When uploading a directory from your filesystem, the folder structure is maintained. But sometimes you may not need the entire path. Say you have files in /Users/John/Doe/assets. By using Doe as your path-cutoff, only the directory structure under assets will be maintained.

About

Script that reads through an entire directory, records paths of all files in a mysql database and uploads all the files. The advantage of recording the paths of all the files is it allows the uploading process to be cancelled and continued. This was necessary for directories with millions of files.


Languages

Language:Python 100.0%