vbrandl / hoc

Generate Hits-of-Code badges for GitHub repositories

Home Page: https://hitsofcode.com/

Counting

Masynchin opened this issue

Hello, @vbrandl!

I am struggling to understand what the count_repositories function does:

hoc/src/count.rs

Lines 4 to 27 in ecbc63f

#[instrument]
pub(crate) fn count_repositories<P>(repo_path: P) -> Result<usize>
where
    P: AsRef<Path> + std::fmt::Debug,
{
    trace!("Counting repositories");
    std::fs::create_dir_all(&repo_path)?;
    Ok(read_dir(repo_path)?
        .filter_map(StdResult::ok)
        .filter(|entry| entry.file_type().map(|ft| ft.is_dir()).unwrap_or(false))
        .map(|entry| read_dir(entry.path()))
        .filter_map(StdResult::ok)
        .flat_map(|dir| {
            dir.filter_map(StdResult::ok)
                .filter(|entry| entry.file_type().map(|ft| ft.is_dir()).unwrap_or(false))
        })
        .map(|entry| read_dir(entry.path()))
        .filter_map(StdResult::ok)
        .flat_map(|dir| {
            dir.filter_map(StdResult::ok)
                .filter(|entry| entry.file_type().map(|ft| ft.is_dir()).unwrap_or(false))
        })
        .count())
}

There are two issues:

  1. What repositories does this function count? What is a repository, if it is a directory path?
  2. What is the exact algorithm? I can see that the function counts 5-level nested directories, but why? Why not 4 or 6 levels? Does this number come from some limitation?

I may want to refactor this function, if you would clarify these issues. Thanks!

That function is used for the "currently serving X repositories" stat in the footer. It is the total number of repos that were queried using the service.

> The total number of repos that were queried using the service.

In what format are they saved? I don't understand why count_repositories counts nested directories like that.

The on-disk layout for served repos is <service>/<user>/<repo>, so to get the number of repos, I just have to count everything matching */*/*.
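To make the layout concrete, here is a minimal sketch that builds a small sample tree and counts the directories exactly three levels below the root. The function name `count_repo_dirs`, the sample service, user, and repo names, and the temporary directory name are all hypothetical and not taken from hoc. Unlike the function quoted in the issue, this sketch propagates I/O errors instead of skipping them, and `Path::is_dir` follows symlinks.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Count directories exactly three levels below `root`,
/// i.e. entries matching `<service>/<user>/<repo>`.
/// (Hypothetical helper for illustration, not hoc's implementation.)
fn count_repo_dirs(root: &Path) -> io::Result<usize> {
    let mut count = 0;
    for service in fs::read_dir(root)? {
        let service = service?.path();
        if !service.is_dir() {
            continue;
        }
        for user in fs::read_dir(&service)? {
            let user = user?.path();
            if !user.is_dir() {
                continue;
            }
            for repo in fs::read_dir(&user)? {
                if repo?.path().is_dir() {
                    count += 1;
                }
            }
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Sample tree in a scratch directory (names are made up).
    let root = std::env::temp_dir().join(format!("hoc_layout_demo_{}", std::process::id()));
    let _ = fs::remove_dir_all(&root);
    for repo in ["github.com/alice/foo", "github.com/alice/bar", "gitlab.com/bob/baz"] {
        fs::create_dir_all(root.join(repo))?;
    }
    // A plain file at the repo level is not a directory, so it is not counted.
    fs::write(root.join("github.com/alice/notes.txt"), b"")?;

    println!("{}", count_repo_dirs(&root)?); // prints 3
    fs::remove_dir_all(&root)?;
    Ok(())
}
```

The three nested loops correspond one-to-one to the three path components, which is why the depth is 3 rather than any other number.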

> Disk layout for served repos is <service>/<user>/<repo>

Thanks for the explanation. So, I think this code can be simplified using WalkDir:

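trace!("Counting repositories");
std::fs::create_dir_all(&repo_path)?;

// Depth 3 below repo_path is exactly <service>/<user>/<repo>.
Ok(WalkDir::new(repo_path)
    .min_depth(3)
    .max_depth(3)
    .into_iter()
    // WalkDir yields Result<DirEntry, Error>; skip entries that failed to read.
    .filter_map(StdResult::ok)
    .filter(|entry| entry.file_type().is_dir())
    .count())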

What do you think? If you agree and this code works as I expect, I can make a PR.

While WalkDir does look nice, I'm not sure pulling in another dependency just for counting directories is worth it.
I agree the code could be tidied up, e.g. by deduplicating the .filter(|entry| entry.file_type().map(|ft| ft.is_dir()).unwrap_or(false)) calls.
I'll have to look into WalkDir once I'm home again and will give you a heads-up.

> I'm not sure pulling in another dependency just for counting directories is worth it.

> The code could be tidied up, e.g. by deduplicating the .filter(|entry| entry.file_type().map(|ft| ft.is_dir()).unwrap_or(false)) calls.

I came up with this pseudocode solution:

iter::once(repo_path)
    .flat_map(sub_directories)
    .flat_map(sub_directories)
    .flat_map(sub_directories)
    .count()

This eliminates the need to add another dependency. I will try not to bother you again during your holidays. Cheers!
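The thread does not show `sub_directories`, so the following is only one possible sketch of how such a helper could look using the standard library alone. The helper body, the simplified signature of `count_repositories` (returning `usize` directly, without `create_dir_all`, tracing, or the generic `P` parameter), and the sample directory names are assumptions, not the code from the eventual PR.

```rust
use std::fs;
use std::iter;
use std::path::{Path, PathBuf};

/// Hypothetical helper: yield the immediate sub-directories of `dir`.
/// Unreadable directories and entries are skipped rather than reported,
/// mirroring the `filter_map(StdResult::ok)` calls in the quoted function.
fn sub_directories(dir: PathBuf) -> impl Iterator<Item = PathBuf> {
    fs::read_dir(dir)
        .into_iter() // Result<ReadDir> becomes an iterator of zero or one ReadDir
        .flatten() // iterate over the entries of that ReadDir, if any
        .filter_map(Result::ok)
        .filter(|entry| entry.file_type().map(|ft| ft.is_dir()).unwrap_or(false))
        .map(|entry| entry.path())
}

/// Simplified stand-in for the function discussed in the issue:
/// one `flat_map` per level of `<service>/<user>/<repo>`.
fn count_repositories(repo_path: &Path) -> usize {
    iter::once(repo_path.to_path_buf())
        .flat_map(sub_directories)
        .flat_map(sub_directories)
        .flat_map(sub_directories)
        .count()
}

fn main() -> std::io::Result<()> {
    // Scratch tree with made-up names, just to exercise the sketch.
    let root = std::env::temp_dir().join(format!("hoc_subdirs_demo_{}", std::process::id()));
    let _ = fs::remove_dir_all(&root);
    fs::create_dir_all(root.join("github.com/alice/foo"))?;
    fs::create_dir_all(root.join("gitlab.com/bob/baz"))?;

    println!("{}", count_repositories(&root)); // prints 2
    fs::remove_dir_all(&root)?;
    Ok(())
}
```

Because the directory filter lives in one place, the duplicated `.filter(...)` closures from the quoted function disappear, and changing the layout depth would only mean adding or removing a `flat_map` call.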

Your approach definitely looks better than the current implementation. I'd be open to reviewing and merging a PR if you are willing to give it a shot; otherwise I will implement it myself.

> I'd be open to reviewing and merging a PR if you are willing to give it a shot

Thanks, I already have a tested implementation, but I will also need your help. I am about to open a PR.