zserge / glob-grep

A little experiment: compare the languages aimed to replace C

Rust implementation could be more idiomatic

coriolinus opened this issue

Running example on Rust Playground.

Note: the example above uses two external libraries. This adds some complexity in the event that you want to avoid cargo for some reason.

Summary of changes

Use byte literals instead of casting byte to char.

-            match pattern[p] as char {
-                '*' => {
+            match pattern[p] {
+                b'*' => {
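
As a small illustration of the point above, here is a sketch of matching directly on `u8` values with byte literals. The helper name `kind` and its return strings are made up for this example and are not part of the repository.

```rust
/// Hypothetical helper: classify a single pattern byte.
/// `b'*'` is a `u8` literal, so no `as char` cast is needed.
fn kind(byte: u8) -> &'static str {
    match byte {
        b'*' => "wildcard: any run of bytes",
        b'?' => "wildcard: exactly one byte",
        _ => "literal byte",
    }
}
```

Matching on bytes also keeps the comparison in the same type as the `&[u8]` slices the matcher already works with.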

Use an existing path walker instead of rolling our own. Yours works fine, but using an existing function is more in line with e.g. the Go implementation.

-     for entry in fs::read_dir(dir)? {
-         let path = entry?.path();
-         if path.is_dir() {
-             walk(&path, &pattern)?;
-             continue;
-         }
+    for entry in WalkDir::new(".") {
+        let entry = entry?;
+        if entry.file_type().is_dir() {
+            continue;
+        }

Use standard convenience combinators:

-            let line = match line {
-                Ok(s) => s,
-                Err(_) => "".to_string(),
-            };
+            let line = line.unwrap_or_default();
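
To show what the combinator does in context, here is a minimal sketch that reads lines from an in-memory buffer. The function name `lines_or_empty` and the use of `Cursor` are assumptions made for the example; the real program reads from files.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical helper: read every line of `input`, substituting an empty
/// string for any line that cannot be read (for example, invalid UTF-8).
fn lines_or_empty(input: &[u8]) -> Vec<String> {
    Cursor::new(input)
        .lines()
        // `Result<String, io::Error>::unwrap_or_default()` yields `String::new()` on `Err`.
        .map(|line| line.unwrap_or_default())
        .collect()
}
```

This behaves like the original `match`, since `String::default()` is the empty string.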

Show an approximation of the path instead of an empty string if the path contains non-unicode characters:

-                println!("{}:{}\t{}", path.to_str().unwrap_or(""), lineno, line);
+                println!("{}:{}\t{}", path.display(), lineno, line);
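
The difference only shows up with a file name that is not valid UTF-8. Here is a sketch that builds one on Unix; it relies on the Unix-only `OsStrExt` trait, and the names `label_old`, `label_new` and `sample_path` are invented for the example.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: build an OsStr from raw bytes
use std::path::Path;

/// Old behaviour: an empty string when the path is not valid UTF-8.
fn label_old(path: &Path) -> String {
    path.to_str().unwrap_or("").to_string()
}

/// New behaviour: a lossy rendering, with U+FFFD replacing undecodable bytes.
fn label_new(path: &Path) -> String {
    path.display().to_string()
}

/// A file name containing the lone byte 0xE9, which is not valid UTF-8.
fn sample_path() -> &'static Path {
    Path::new(OsStr::from_bytes(b"caf\xe9.txt"))
}
```

With `sample_path()`, `label_old` returns an empty string, while `label_new` keeps the readable parts of the name.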

Print any error message to stderr and set an appropriate exit code instead of panicking, printing to stdout, etc.

-    match walk(Path::new("."), &argv[1]) {
-        Ok(()) => (),
-        Err(error) => panic!("Error: {:?}", error),
-    };
+    let exit_code = match main_inner() {
+        Ok(_) => 0,
+        Err(err) => {
+            eprintln!("{}", err);
+            1
+        }
+    };
+    std::process::exit(exit_code);
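
The same idea can be factored into a function that is easy to test. This is only a sketch: the helper `exit_code` and its writer parameter are assumptions added for the example, not code from the issue.

```rust
use std::fmt::Display;
use std::io::Write;

/// Hypothetical helper: write any error to `err_out` and return a process
/// exit status (0 on success, 1 on failure).
fn exit_code<E: Display>(result: Result<(), E>, mut err_out: impl Write) -> i32 {
    match result {
        Ok(()) => 0,
        Err(err) => {
            // If writing the message fails there is nowhere else to report it.
            let _ = writeln!(err_out, "{}", err);
            1
        }
    }
}

// In a real binary, `main` could end with something like:
//     std::process::exit(exit_code(main_inner(), std::io::stderr()));
```

Passing the writer in means a test can capture the message in a `Vec<u8>` instead of reading stderr.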

Run each unit test case independently.

-    fn test_glob() {
-        assert!(glob(b"", b""));
-        assert!(glob(b"hello", b"hello"));
+    tests! {
+        empty(b"", b"");
+        literal(b"hello", b"hello");

The last arm of match pattern[p] can bind c instead of using _, so pattern[p] doesn't have to be repeated.
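
For illustration, here is a sketch of an index-based matcher with the final arm bound to `c`. It assumes the matcher has the loop-and-backtrack structure suggested by the diff above; the variable names follow the ones used in this thread, but the body is a reconstruction, not the repository's code.

```rust
/// Glob matcher sketch: `*` matches any run of bytes, `?` exactly one byte.
fn glob(pattern: &[u8], text: &[u8]) -> bool {
    let (mut p, mut t) = (0usize, 0usize);
    // Restart positions recorded at the most recent `*`.
    let (mut np, mut nt) = (0usize, 0usize);

    while p < pattern.len() || t < text.len() {
        if p < pattern.len() {
            match pattern[p] {
                b'*' => {
                    // Try matching zero bytes first; remember where to retry.
                    np = p;
                    nt = t + 1;
                    p += 1;
                    continue;
                }
                b'?' => {
                    if t < text.len() {
                        p += 1;
                        t += 1;
                        continue;
                    }
                }
                // Binding the byte as `c` avoids indexing `pattern[p]` again.
                c => {
                    if t < text.len() && text[t] == c {
                        p += 1;
                        t += 1;
                        continue;
                    }
                }
            }
        }
        // Mismatch: backtrack to the last `*` and let it absorb one more byte.
        if nt > 0 && nt <= text.len() {
            p = np;
            t = nt;
            continue;
        }
        return false;
    }
    true
}
```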

If the logic doesn't have to be a direct port, you can also use pattern matching instead of multiple conditionals; it may be easier to understand.

fn glob(pattern: &[u8], text: &[u8]) -> bool {
    let mut p: usize = 0;
    let mut t: usize = 0;
    // Restart positions recorded at the most recent `*`.
    let mut np: usize = 0;
    let mut nt: usize = 0;

    loop {
        match (pattern.get(p), text.get(t)) {
            (Some(b'*'), _) => {
                np = p;
                nt = t + 1;
                p += 1;
            }
            // `?` needs a byte of text to consume, so require `Some(_)` here.
            (Some(b'?'), Some(_)) => {
                p += 1;
                t += 1;
            }
            (Some(pc), Some(tc)) if pc == tc => {
                p += 1;
                t += 1;
            }
            // Mismatch: backtrack to the last `*` if there is one.
            (Some(_), _) | (_, Some(_)) if nt > 0 && nt <= text.len() => {
                p = np;
                t = nt;
            }
            (Some(_), _) | (_, Some(_)) => return false,
            // Both pattern and text are exhausted.
            _ => return true,
        }
    }
}
