noloader / rufus

The Reliable USB Formatting Utility

Home Page: https://rufus.ie

RUFUS testing environments

noloader opened this issue

@pbatard ,

Could you describe your testing environments, please? I can set up a Windows 7, 8 or 10 VM for testing. But I don't work in MinGW, so I don't have a testing environment for that platform.

On the plus side, I have a Celeron J3455 machine with SHA acceleration. I also have a 10th gen Core i7, which provides SHA, too. I can perform testing on real hardware.

I'm also interested in how to enable the debug path in checksum.c. Is it as simple as defining _DEBUG? It looks like you have some SHA self tests there.

Windows 10 should be fine. I no longer test for Windows 7. If you install Visual Studio 2022 Community Edition, you'll have my main testing environment replicated (just click Local Windows Debugger after selecting the type of build and arch).

For MinGW, you need to install msys2 from here, and then install mingw-w64-i686-toolchain mingw-w64-x86_64-toolchain base-devel autotools git upx using pacman (if in doubt, refer to what we do in our GitHub Actions MinGW build script).

For the checksum tests, you'll get them automatically when building the DEBUG version in Visual Studio, so you don't have to define anything. What you do need to do, however, is call TestChecksum() from our convenient test shortcut (Ctrl-T), which means that, right now, you need to replace the lines at https://github.com/noloader/rufus/blob/master/src/rufus.c#L3781-L3782 with TestChecksum();.

Then, when running the DEBUG version, you can just press Ctrl-T and you'll see the following in the log:

Test MD5    0: PASS
Test MD5    1: PASS
Test MD5    2: PASS
Test MD5    3: PASS
Test SHA1   0: PASS
Test SHA1   1: PASS
Test SHA1   2: PASS
Test SHA1   3: PASS
Test SHA256 0: PASS
Test SHA256 1: PASS
Test SHA256 2: PASS
Test SHA256 3: PASS
Test SHA512 0: PASS
Test SHA512 1: PASS
Test SHA512 2: PASS
Test SHA512 3: PASS
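
For readers unfamiliar with this kind of self-test, here is a minimal sketch of a known-answer test loop that logs lines in the same shape as the output above. It is not the code from checksum.c: 32-bit FNV-1a stands in for the real digests to keep the example short, and the names fnv1a32 and run_self_test are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in digest: 32-bit FNV-1a (NOT one of the algorithms Rufus tests). */
uint32_t fnv1a32(const char *data, size_t len)
{
    uint32_t h = 0x811c9dc5u;            /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)data[i];
        h *= 0x01000193u;                /* FNV prime */
    }
    return h;
}

/* Known-answer vectors: an input and the digest it must produce. */
static const struct { const char *msg; uint32_t want; } vectors[] = {
    { "",   0x811c9dc5u },
    { "a",  0xe40c292cu },
    { "ab", 0x4d2505cau },
};

/* Runs every vector, logs PASS or FAIL, and returns the number of failures. */
int run_self_test(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof(vectors) / sizeof(vectors[0]); i++) {
        uint32_t got = fnv1a32(vectors[i].msg, strlen(vectors[i].msg));
        int ok = (got == vectors[i].want);
        printf("Test %-6s %d: %s\n", "FNV1A", (int)i, ok ? "PASS" : "FAIL");
        failures += !ok;
    }
    return failures;
}
```

Returning the failure count, rather than only logging, makes the same routine usable from a shortcut handler or from an automated check.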

Hope this helps.

Thank you, sir!

Thanks again @pbatard,

So I was able to build Rufus on Windows 10 x64 (21H2) on a machine with SHA acceleration. However, I am having trouble getting into the debug screen. When I pressed Ctrl+T, nothing happened. I am not sure if that is due to VirtualBox eating the keystrokes.

rufus-in-vm

I was thinking... It may be easier to supply a command on the command line, like rufus.exe /c or rufus.exe --checksum. What do you think of this:

jwalton@coffee:~/rufus$ cat rufus-cmdline.txt 
diff --git a/src/rufus.c b/src/rufus.c
index 71eff7c7..6c7236ed 100755
--- a/src/rufus.c
+++ b/src/rufus.c
@@ -56,6 +56,11 @@
 #include "../res/grub/grub_version.h"
 #include "../res/grub2/grub2_version.h"
 
+/* For testing, like https://github.com/noloader/rufus/issues/2 */
+#if defined(_DEBUG) || defined (DEBUG)
+extern int TestChecksum(void);
+#endif
+
 enum bootcheck_return {
        BOOTCHECK_PROCEED = 0,
        BOOTCHECK_CANCEL = -1,
@@ -3194,7 +3199,7 @@ static void PrintUsage(char* appname)
        char fname[_MAX_FNAME];
 
        _splitpath(appname, NULL, NULL, fname, NULL);
-       printf("\nUsage: %s [-x] [-g] [-h] [-f FILESYSTEM] [-i PATH] [-l LOCALE] [-w TIMEOUT]\n", fname);
+       printf("\nUsage: %s [-x] [-g] [-c] [-h] [-f FILESYSTEM] [-i PATH] [-l LOCALE] [-w TIMEOUT]\n", fname);
        printf("  -x, --extra-devs\n");
        printf("     List extra devices, such as USB HDDs\n");
        printf("  -g, --gui\n");
@@ -3208,6 +3213,8 @@ static void PrintUsage(char* appname)
        printf("  -w TIMEOUT, --wait=TIMEOUT\n");
        printf("     Wait TIMEOUT tens of seconds for the global application mutex to be released.\n");
        printf("     Used when launching a newer version of " APPLICATION_NAME " from a running application.\n");
+       printf("  -c, --checksum\n");
+       printf("     Test checksum algorithms. Only available in Debug builds.\n");
        printf("  -h, --help\n");
        printf("     This usage guide.\n");
 }
@@ -3308,6 +3315,7 @@ int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine
                {"locale",     required_argument, NULL, 'l'},
                {"filesystem", required_argument, NULL, 'f'},
                {"wait",       required_argument, NULL, 'w'},
+               {"checksum",   no_argument,       NULL, 'c'},
                {0, 0, NULL, 0}
        };
 
@@ -3487,6 +3495,13 @@ int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine
                                case 'w':
                                        wait_for_mutex = atoi(optarg);
                                        break;
+                               case 'c':
+#if defined(_DEBUG) || defined(DEBUG)
+                                       TestChecksum();
+#else
+                                       printf("\nChecksums are only available in Debug builds.");
+#endif
+                                       goto out;
                                case 'h':
                                        PrintUsage(argv[0]);
                                        goto out;

An additional benefit of a command line argument is that it is something you can easily test in a CI/CD pipeline.


Off-topic, the "Itanic" cracked me up. It almost put milk through my nose:

static const char* arch_name[ARCH_MAX] = {
	"unknown", "x86_32", "x86_64", "ARM", "ARM64", "Itanic", "RISC-V 32", "RISC-V 64", "RISC-V 128", "EBC" };

RIP...

@pbatard,

The change I made for the checksum option did not work. When I launch rufus.exe from the command line with the -c or /c option, I get an "invalid option" message. Now the odd thing is, Dbgview shows the -c option being parsed. (Dbgview can be used to capture OutputDebugString messages.)

Do you see anything wrong with the checksum option change?

rufus-checksum-option
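
One possible cause, offered only as a guess since the diff above shows the long_options table but not the option string: with getopt_long(), a long_options entry makes --checksum work, but the short form -c is only accepted if the character c also appears in the optstring argument. The sketch below demonstrates that general behaviour with the standard getopt_long() from <getopt.h>. The helper parse_one and the option strings "w:h" and "w:ch" are hypothetical and are not taken from the Rufus source, which bundles its own getopt.

```c
#include <getopt.h>
#include <stddef.h>

/* Parses a single command-line argument and returns what getopt_long()
 * reports for it: the option character, or '?' for an unknown option.
 * 'optstring' lists the accepted short options. */
int parse_one(const char *optstring, const char *arg)
{
    static const struct option long_options[] = {
        {"wait",     required_argument, NULL, 'w'},
        {"checksum", no_argument,       NULL, 'c'},
        {"help",     no_argument,       NULL, 'h'},
        {0, 0, NULL, 0}
    };
    /* getopt does not modify the strings, so the casts are safe here. */
    char *argv[] = { (char *)"demo", (char *)arg, NULL };

    optind = 1;  /* restart scanning at the first argument */
    opterr = 0;  /* keep getopt from printing its own diagnostic */
    return getopt_long(2, argv, optstring, long_options, NULL);
}
```

Under these assumptions, parse_one("w:h", "--checksum") yields 'c', parse_one("w:h", "-c") yields '?', and parse_one("w:ch", "-c") yields 'c'.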

I'd rather not have a checksum commandline option.

There are a few things you need to consider.

  1. Ctrl-T will only work when running the DEBUG version of Rufus. If you are using the RELEASE version, this will not work (but you can make it work if you remove the #if defined(_DEBUG)).
  2. The output appears in the log, so you need to press Ctrl-L or the small log button (left of START) to open the log. If using Visual Studio, you will also see all the log messages in the Output tab at the bottom of your screen.

Thanks @pbatard,

I was using a Debug build, so that was not the problem. I did not check the log; that was the problem. I originally looked for rufus.log in PWD, but that was incorrect. I needed to look in %APPDATA%.

So after all that... I found that VirtualBox intercepts the cpuid instruction and masks out the SHA features. Arg...

I think I'm going to hardcode a TRUE for testing on this machine. I'll report back shortly.

No worries, feel free to take your time, as I won't really mind if this gets delayed to get it right.

Just a couple more points that may be of interest when testing:

  1. Obviously running the DEBUG version does add overhead compared to the RELEASE, so when testing actual speed improvements, you do want to run the RELEASE. It shouldn't matter that much that you can no longer use the self-test then, as the data being digested is far too small to provide an efficient speed test. Instead, you probably want to work with a known large ISO (4 GB or more), preferably residing on the fastest disk you have around (NVMe SSD) and compute the checksums repeatedly. Rufus will provide you the computation time in the lower right corner so you should be able to find if your enhancements manage to shave some time there. Oh, and you obviously want to test with the 64-bit rather than the 32-bit version while you're at it.
  2. For even more immediate results, while the checksums are being computed, you can press Alt repeatedly to see the actual processing speed (in MB of data being checksummed per second). This should also give you a good idea if your improvements are making a significant difference.

Image1
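
For measuring outside the Rufus UI, a rough timing harness along these lines could report the same kind of MB/s figure. This is only a sketch under stated assumptions: toy_digest is a placeholder byte sum rather than a real checksum, measure_mb_per_s and the digest_fn type are invented names, and clock() is a coarse timer, so a real comparison should still use a large ISO and many passes as described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Signature assumed for the routine being measured (hypothetical). */
typedef uint32_t (*digest_fn)(const unsigned char *data, size_t len);

/* Placeholder "digest": a plain byte sum, standing in for SHA-1/SHA-256. */
uint32_t toy_digest(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Digests the buffer 'passes' times and returns the throughput in MB/s
 * (1 MB = 1024 * 1024 bytes). Returns 0.0 if the run was too short for
 * clock() to register. The last digest is stored in *out when non-NULL. */
double measure_mb_per_s(digest_fn fn, const unsigned char *data,
                        size_t len, int passes, uint32_t *out)
{
    volatile uint32_t sink = 0;   /* keeps the calls from being discarded */
    clock_t start = clock();

    for (int i = 0; i < passes; i++)
        sink = fn(data, len);

    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (out != NULL)
        *out = sink;
    if (secs <= 0.0)
        return 0.0;
    return ((double)len * passes) / (1024.0 * 1024.0) / secs;
}
```

Running the same buffer through the plain C routine and the accelerated one, in a RELEASE build, gives two numbers that can be compared directly.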

@pbatard,

Awesome, thanks for the hints.

Things tested fine once I hardcoded TRUE. I kind of expected it to be the case. I just needed to see it with my own two eyes.

Do you want a PR based on the code in this GitHub repo? Since you are a collaborator, you are free to modify the code before a PR. That saves us time going back and forth about things - you can simply make the change to ensure it is to your liking. It also avoids the squash-merges and such in your GitHub repo.

rufus-sha-accel-ok

@pbatard,

Ok, so I just performed a release run of Rufus. I wrote Win10_21H2_English_x32.iso (4191506432 bytes) to a thumb drive over USB 3.0. It took 3:15 to complete.

I tried (repeatedly) pressing ALT to see the processing speed, but I was not able to get "Computing Image Checksums" to display (as shown in your image). I also watched the status bar and never saw the program create checksums. In fact, I even inspected the log and did not see any mention of checksums there, either.

Maybe I am doing something wrong? Or maybe VirtualBox is eating the ALT key?

I know using SHA instructions will achieve about 1.5 to 2.0 cycles-per-byte (cpb) for SHA-1, and about 3.5 to 4.0 cpb for SHA-256. That's going to be 6x to 10x faster than a typical C/C++ implementation.

So I guess my question is, if checksums are being computed 7x or 8x faster, would we even have an opportunity to see the checksums computed? What formerly took 20 or 30 seconds could be completed in a few seconds - as fast as the SSD can supply the data.

To move from cpb to MB/s, and to give you an idea of how fast that is... On my Core i5-1035G1 (10th gen) with a base freq of 1.0 GHz and turbo freq of 3.5 GHz, SHA-1 runs at 1.65 GB/s and SHA-256 runs at 1.30 GB/s. Yes, that's Gigabytes per second.
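
The conversion itself is just clock frequency divided by cycles per byte. A small sketch, assuming 1 GB = 10^9 bytes and a core that actually holds the stated frequency for the whole run (the function names are made up for illustration):

```c
/* Throughput in GB/s (1 GB = 1e9 bytes) for a core clocked at freq_ghz
 * that spends cycles_per_byte cycles on each input byte. */
double cpb_to_gb_per_s(double freq_ghz, double cycles_per_byte)
{
    return freq_ghz / cycles_per_byte;
}

/* Inverse: cycles per byte implied by a measured throughput in GB/s. */
double gb_per_s_to_cpb(double freq_ghz, double gb_per_s)
{
    return freq_ghz / gb_per_s;
}
```

For example, 2.0 cpb at 3.5 GHz works out to 1.75 GB/s. Going the other way, 1.65 GB/s and 1.30 GB/s would correspond to roughly 2.1 and 2.7 cpb if the core sustained 3.5 GHz throughout.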

Here's the log file: rufus.log.zip

> never saw the program create checksums.

You need to click the (✓) button, next to the SELECT button to compute checksums once you have selected an image. Checksums are NOT computed automatically when creating media.

Also speed will only be displayed for operations where we read the image block by block, which isn't the case when creating a boot media in ISO mode (default) where we copy each file individually.

> if checksums are being computed 7x or 8x faster, would we even have an opportunity to see the checksums computed?

Then, if that is really the case, you would test using a RAMDisk and a large ISO (just duplicate the install.wim from a Windows ISO until you have a 16 GB one or something).

> Do you want a PR based on the code in this GitHub?

I'd prefer one, so that there's some reference, but it's up to you. What's going to happen though is that I'm going to pick your code and integrate it as I see fit and, if you think it can be improved, you can submit a new PR. I really want to limit all this back and forth if possible.

I'm in the process of integrating your code (sadly, it doesn't look like I have any CPU that's modern enough to have SHA acceleration 😢), and the way I'm planning to do it is to remove the need for a static initializer, since that doesn't seem to sit too well with MSVC...

I haven't seen where exactly the need for a static initializer, listed in the 4 preliminary conditions for detection, comes from and, looking at the code, I'd be quite surprised if MSVC (that relies on __cpuid()/__cpuidex()) or gcc (__builtin_cpu_supports()) CPU feature detection is going to fail if not using static initialization. At least, the examples given for these calls don't seem to show anything that would limit their use at any time during the application flow. Can't say I care much about Intel or Clang CC if they are the limiting factors there...

Right now, I'm planning to just perform SHA accel detection during the Rufus init code, but, since I can't validate detection, I am still curious as to where the static initializer condition you listed comes from in the first place, and whether dropping it could actually hurt the detection process.

Hi @pbatard,

> Right now, I'm planning to just perform SHA accel detection during the Rufus init code

Yes, this should work. If I knew Rufus code better, I would have offered some code to do it. Due to my unfamiliarity, I just stuffed it into a static initializer.

Also keep in mind... you're cutting in x86 code, but I also have ARM64 code. The problem I have with ARM64 is that I don't own any Microsoft test devices, so I can't test it. As soon as someone offers a test machine, I can provide the code for testing. ARM64 is fine under Linux with GCC, but we really need a hands-on test on a Microsoft platform.

> I am still curious as to where the static initializer condition you listed comes from in the first place

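That's a C++ language feature (plain C only allows constant initializers for statics). Non-local static variables in different translation units are initialized in an unspecified order relative to one another; only the order within a single translation unit is defined. You can control the order of the static initializers using extensions like MS's init_seg and GCC's init_priority attribute.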

> and whether dropping it [static init] could actually hurt the detection process

Yeah, this should be fine as long as you detect the feature one time before use.

The thing you want to avoid is running the cpuid code for each call to HasSHA1() or HasSHA256(). That will hurt performance.
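
A minimal sketch of the "detect once, then read a cached flag" approach, for illustration only and not the code from the repository. It assumes the x86 SHA extensions are reported in CPUID leaf 7, sub-leaf 0, EBX bit 29, and uses __cpuidex() on MSVC or __get_cpuid_count() from <cpuid.h> on GCC/Clang. The names cpu_has_sha, detect_cpu_features and has_sha_ext are hypothetical.

```c
#include <stdbool.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_X86_CPUID 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* Hypothetical cached state: written once by detect_cpu_features(). */
static bool cpu_has_sha = false;
static bool cpu_detect_done = false;

/* Queries CPUID leaf 7, sub-leaf 0 and tests EBX bit 29 (SHA extensions). */
static bool query_sha_bit(void)
{
#if defined(HAVE_X86_CPUID) && defined(_MSC_VER)
    int regs[4];
    __cpuid(regs, 0);                 /* regs[0] = highest supported leaf */
    if (regs[0] < 7)
        return false;
    __cpuidex(regs, 7, 0);
    return ((regs[1] >> 29) & 1) != 0;
#elif defined(HAVE_X86_CPUID)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;                 /* leaf 7 not available */
    return ((ebx >> 29) & 1) != 0;
#else
    return false;                     /* not an x86 build */
#endif
}

/* Call once during application init; later calls do nothing. */
void detect_cpu_features(void)
{
    if (!cpu_detect_done) {
        cpu_has_sha = query_sha_bit();
        cpu_detect_done = true;
    }
}

/* Cheap accessor for the hashing hot path: no cpuid instruction here. */
bool has_sha_ext(void)
{
    return cpu_has_sha;
}
```

Since a hypervisor can mask this bit, as VirtualBox did earlier in this thread, a result of false inside a VM says nothing about the host CPU.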

> it doesn't look like I have any CPU that's modern enough to have SHA acceleration

If you put your changes on a feature branch, then I can test it for you.

If you want an inexpensive machine for testing, then try something from AMD with the Zen3 architecture, like a Ryzen 3 or Ryzen 5.

(I just bought my mother a Beelink mini-pc with a Ryzen 5 5560U. Its throughput for SHA-1 is around 2.1 GB/s, and SHA-256 is about 1.8 GB/s. It cost $310 USD during Amazon's Black Friday sale, https://www.amazon.com/dp/B0B2RHXLDK ).

I have integrated your proposal in pbatard@36f4716.

You'll see that I have hacked away a few things when it comes to the detection, starting with just disabling compilation for compilers that don't support the features we need, since I couldn't care less about people who use compilers that aren't up to date. I also removed the SHA-512 templates altogether, since there doesn't seem to be any x86 or ARM support for the time being, and we can just add support when/if the extensions happen.

As mentioned above, I also moved detection into Rufus init and just used a couple global booleans.

If you want to test, you can find working (VS2022) binaries that include these changes in the artifacts from https://github.com/pbatard/rufus/actions/runs/3647826731. There's also the 32-bit only MinGW binary at https://github.com/pbatard/rufus/actions/runs/3647826732. Note that Ctrl-T should work with these executables, as I have now enabled the feature for ALPHA builds.

Hi @pbatard,

I hope to check this tonight. I'll run Rufus on one of the AMD machines with native SHA support.

I'll report back tonight or tomorrow.

So I'm still having trouble with VirtualBox. I can't get VBox to enable SHA in the guest VM. I've got an open question on the VBox forums, titled "modifyvm --cpu-profile host does not provide host cpu features".