JohannesWiesner / demetrius

A repository for finding and copying files while preserving their folder structure


Create separate function for duplicate check

JohannesWiesner opened this issue

The code would be more readable if the duplicate checks had their own functions with their own docstrings:

demetrius/demetrius.py

Lines 166 to 178 in 1d4c6d3

```python
# find literal duplicates and modify the respective destination directories
for _,dir_name in dst_dirs_df.groupby('src_dir_name'):
    if not dir_name['src_dir_path'].nunique() == 1:
        for idx,(_,src_dir_path) in enumerate(dir_name.groupby('src_dir_path'),start=1):
            dst_dirs_df.loc[src_dir_path.index,'dst_dir_path'] = dst_dirs_df.loc[src_dir_path.index,'dst_dir_path'] + '_' + str(idx)
# find pseudo duplicates and modify the respective destination directories
dst_dirs_df['dst_dir_path_lower_case'] = dst_dirs_df['dst_dir_path'].map(str.lower)
for _,dst_dir_path in dst_dirs_df.groupby('dst_dir_path_lower_case'):
    if dst_dir_path['src_dir_path'].nunique() != 1:
        for idx,(_,dir_name) in enumerate(dst_dir_path.groupby('src_dir_name'),start=1):
            dst_dirs_df.loc[dir_name.index,'dst_dir_path'] = dst_dirs_df.loc[dir_name.index,'dst_dir_path'] + ' (' + str(idx) + ')'
```
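A minimal sketch of the proposed split, assuming `dst_dirs_df` has the `src_dir_name`, `src_dir_path` and `dst_dir_path` columns used in the excerpt. The function names are hypothetical and not part of demetrius. Two deliberate differences from the excerpt: each function returns a modified copy instead of editing the DataFrame in place, and the pseudo-duplicate check groups by a lower-cased Series computed on the fly rather than adding a `dst_dir_path_lower_case` column.

```python
import pandas as pd


def rename_literal_duplicates(dst_dirs_df: pd.DataFrame) -> pd.DataFrame:
    """Disambiguate source folders that share a name but live at different paths.

    Rows are grouped by 'src_dir_name'. If a group contains more than one
    distinct 'src_dir_path', all rows of the n-th path (n = 1, 2, ...) get
    '_n' appended to their 'dst_dir_path'.

    Returns a modified copy; the input DataFrame is left unchanged.
    """
    df = dst_dirs_df.copy()
    for _, same_name in df.groupby('src_dir_name'):
        if same_name['src_dir_path'].nunique() != 1:
            for idx, (_, same_path) in enumerate(same_name.groupby('src_dir_path'), start=1):
                rows = same_path.index
                df.loc[rows, 'dst_dir_path'] = df.loc[rows, 'dst_dir_path'] + '_' + str(idx)
    return df


def rename_pseudo_duplicates(dst_dirs_df: pd.DataFrame) -> pd.DataFrame:
    """Disambiguate destination paths that differ only in letter case.

    Example: '/dst/Data' vs. '/dst/data', presumably because such paths would
    collide on a case-insensitive file system. Rows are grouped by the
    lower-cased 'dst_dir_path'; if a group stems from more than one
    'src_dir_path', the rows of the n-th 'src_dir_name' get ' (n)' appended.

    Returns a modified copy; the input DataFrame is left unchanged.
    """
    df = dst_dirs_df.copy()
    lower_case = df['dst_dir_path'].str.lower()  # temporary grouping key, not stored
    for _, same_lower in df.groupby(lower_case):
        if same_lower['src_dir_path'].nunique() != 1:
            for idx, (_, same_name) in enumerate(same_lower.groupby('src_dir_name'), start=1):
                rows = same_name.index
                df.loc[rows, 'dst_dir_path'] = df.loc[rows, 'dst_dir_path'] + ' (' + str(idx) + ')'
    return df
```

Calling `rename_pseudo_duplicates(rename_literal_duplicates(dst_dirs_df))` would keep the order of the two checks in the excerpt.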

It would then also be easier, and more readable, to define what the appended index should look like (e.g. `_idx` or `(idx)`).
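Both loops share the same shape: group the rows, check whether a group stems from more than one `src_dir_path`, split it, and append a numbered suffix. One way to make the index format configurable is therefore a single helper that takes a format string. This is a hypothetical sketch: `append_group_index` and its `suffix_fmt` parameter are illustrative names, not part of demetrius, and the format string relies on Python's `str.format` with an `{idx}` placeholder.

```python
import pandas as pd


def append_group_index(df, group_key, split_col, suffix_fmt):
    """Hypothetical shared helper for both duplicate checks.

    Groups `df` by `group_key` (a column name or a Series aligned with `df`).
    Within each group that stems from more than one distinct 'src_dir_path',
    rows are split by `split_col`, and the n-th subgroup gets
    `suffix_fmt.format(idx=n)` appended to 'dst_dir_path'.

    Returns a modified copy; the input DataFrame is left unchanged.
    """
    out = df.copy()
    for _, group in out.groupby(group_key):
        if group['src_dir_path'].nunique() != 1:
            for idx, (_, sub) in enumerate(group.groupby(split_col), start=1):
                rows = sub.index
                out.loc[rows, 'dst_dir_path'] = (
                    out.loc[rows, 'dst_dir_path'] + suffix_fmt.format(idx=idx)
                )
    return out
```

With this helper, the literal check would be `append_group_index(df, 'src_dir_name', 'src_dir_path', '_{idx}')`, and the pseudo check, applied to that result, `append_group_index(df, df['dst_dir_path'].str.lower(), 'src_dir_name', ' ({idx})')`. Switching to another style such as `'-{idx}'` then only means changing one argument.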