AppleEpf LOOKING FOR MAINTAINER.
Installation
gem 'apple_epf'
Setup
Put this in your initializer.rb if you are using Rails.
AppleEpf.configure do |config|
config.apple_id = 'username'
config.apple_password = 'password'
config.download_retry_count = 3 #
config.keep_tbz_after_extract = false
config.extract_dir = '' # where to extract to
config.files_matrix = {} # {popularity: ['application_popularity_per_genre']}
#config.files_matrix = {itunes: [], pricing: [], popularity: []}
config.download_processor = AppleEpf::AriaDownloadProcessor
config.concurrent_downloads = 16
config.log_file = "#{Rails.root}/log/apple_epf_#{Rails.env}.log"
#config.archiver = :gnutar
#config.archiver_path = "/usr/local/bin/tar"
end
All of this can be redefined for every downloader.
Manual manipulations
# Manually download one file
downloader = AppleEpf::Downloader.new('incremental', 'popularity', Date.parse('17-01-2013'))
downloader.download #=> will return local filepath to downloaded file or fire exception
# Manually extract one archive
extractor = AppleEpf::Extractor.new(filename, files_to_extract)
# filename - full path to local file
# files_to_extract - Files to be extracted from Archive (application, application_detail)
file_entry = extractor.perform #=> will return instance of FileEntry
file_entry.tbz_file #=> original file that was parsed. It is removed after untaring
file_entry.extracted_files #=> newly created(unpacked) files
#Manually parse file
parser = AppleEpf::Parser.new(filename)
# filename - full local path to file
parser.process_rows do |r|
puts "row is #{r}"
end
Download and Extract
If you want to combine downloading and extracting your can use one of following methonds. My personal feeling is to parsing should we something live alone and should not be combined in one stack with download and extract. And of cource it is better to download and extract files one by one.
manager = AppleEpf::Incremental.new('10-10-2012',
{ popularity: ['application_popularity_per_genre'] })
manager = AppleEpf::Full.new('10-10-2012',
{ popularity: ['application_popularity_per_genre'] })
manager.download_all_files
# will download all files for this date
# for all keys "popularity", 'pricing', 'itunes' etc
manager.download_and_extract_all_files
#will first download and than extract all files
manager.download_and_extract('itunes', ['application', 'application_detail'])
# will download only 'itunes' and extract only ['application', 'application_detail'].
# This actually ignores matrix passed to initializer
manager.download('itunes') #will only download file
You can omit where to store files by setting it directly to downloader instance
manager.store_dir = '/whatever_dir_you_like'
manager.download('itunes')
OR
downloader = AppleEpf::Downloader.new('incremental', 'popularity', Date.parse('17-01-2013'))
downloader.dirpath = '/whatever_dir_you_like'
downloader.download
You can also omit if you want to store initial tbz files after they will be unpacked
extractor.keep_tbz_after_extract = true
OR
manager.keep_tbz_after_extract = true
Get list of current files avaliable for download
AppleEpf::Incremental.get_current_list #=> current incremental files
AppleEpf::Full.get_current_list #=> current full files
Make sure to try AriaDownloadProcessor
There are 2 downloaders avaliable for use:
CurbDownloadProcessor
- default oneAriaDownloadProcessor
- we use in production
I suggest using last one, as in can do download in parallel. I sugest set
config.concurrent_downloads = 16 or 8
. If you chose to use aria, make sure
you have aria2c
in your PATH
.
And of cource you write your own processor.
Make sure to try lbzip2
lbzip2 uncompress files in parallel. Thanks @funkyboy for his help.
Just set config.use_lbzip2 = true
and make sure you have it in PATH
. On mac you can use brew install lbzip2
for it.