MarioFuzzy

A fuzz tool for super mario game.

Usage

Transform

python main.py transform <input_data>

<input_data>: Input data used for the transformation operation.

Fuzz

python main.py fuzz <seed_path> <crash_path>

<seed_path>: Path to the seed data for the fuzzing operation.
<crash_path>: Path to the crash data for saving crash reports during the fuzzing process.

Requirements

Game: Mario-Level-1

Required Packages: Please refer to the requirements.txt file for the list of required packages.

Notes

Please make sure to modify the game_path in the config.yaml file located in the action folder to your own game's absolute path.

Overview

南京大学软件学院软件测试课程代码作业，选题方向为程序分析-模糊测试。

实现命令行工具形式的基于变异的模糊器，设计思路如下：

定义测试输入：一组游戏操作的序列，五元组={left, right, up, jump, fire}
定义输出：o={is_mario, gold, score}， is_mario=mario是否存在画面中；gold=金币数量；score=游戏得分
种子调度，基于feedback
1. 种子优先级排序：得分更高序列优先
2. 能量分配：根据feedback结果，对照基准线分配energy
测试生成：六种变异算子，并使用退火算法进行变异算子调度
游戏执行，并截图保存
结果分析：使用图像识别训练模型，识别分析状态返回结果
结果保存：保存所有产生唯一状态的测试输入

本项目特点如下：

面向新的模糊目标：测试目标为经典游戏超级马里奥
添加新机制
- 更多样的变异算子：新增Shuffle变异算子
- 变异算子调度：使用退火算法实现
- 新颖的输出分析策略：训练图像识别模型，对游戏结果截图进行图像识别后获得游戏分数等信息

Design

本项目架构如下：

action: 包含与执行相关的文件
- config.yaml: 配置文件
- fuzz.py: Fuzz操作
- run.py: 游戏运行
- transform.py: Transform操作
test: 测试相关的文件
util: 工具类
- mutator_schedule: 变异算子调度工具
- output_analysis: 输出分析工具
- preprocess: 预处理工具
- seed_schedule: 种子调度工具
command.py: 命令解析
main.py: 执行入口
README.md: 项目的说明文档
requirements.txt: 项目所需的依赖包列表

Main

main.py

coloredlogs.install(level='INFO', fmt='%(asctime)s - %(levelname)s - %(message)s')


def main():
    logging.info("Welcome to mario fuzz!")
    parse_and_run()


if __name__ == '__main__':
    main()

CLI

command.py

使用argparse对命令行选项、参数进行解析。

增加了operation参数，并分为transform与fuzz命令。

def parse_and_run():
    parser = argparse.ArgumentParser(description='A fuzz tool for super mario game.')
    subparsers = parser.add_subparsers(dest='operation', help='Operation to perform')

    # 添加 transform 子命令
    transform_parser = subparsers.add_parser('transform',
                                             help='Transform the seed with one specific rules')
    transform_parser.add_argument('input_data', help='Input data for the transform operation')
    transform_parser.set_defaults(func=transform_cmd)

    # 添加 fuzz 子命令
    fuzz_parser = subparsers.add_parser('fuzz', help='Fuzz the game')
    fuzz_parser.add_argument('seed_path', help='Path of seeds for the fuzz operation')
    fuzz_parser.add_argument('crash_path', help='Path of crash data for the fuzz operation')
    fuzz_parser.set_defaults(func=fuzz_cmd)

    # 解析命令行参数并执行相应的操作
    args = parser.parse_args()
    args.func(args)

transform操作用于对指定的种子进行特定变异操作，尤其在前期开发对变异算子部分测试使用。

fuzz操作用于模糊测试，需要用户指定种子路径与crash存放路径。

Action

本模块包括Fuzz, Transform, Run三种操作

Fuzz

fuzz.py

初始化使用用户定义的target_path和crash_path来指定种子路径和崩溃路径。

def __init__(self, target_path, crash_path):
    self.target_path = target_path
    self.crash_path = crash_path

首先遍历种子文件收集初始分数，然后调用SeedSchedule进行种子调度，并根据游戏分数对种子进行能量分配，最后调用MutatorSchedule进行变异算子调度，并对变异后返回结果分析进行对应的保存。

def run(self):
    file_list = self.get_file_list()
    seed_score_pairs = []
    for i in tqdm(range(len(file_list)), "Loading seeds"):
        tmp_ops = read_file_content(file_list[i])
        _, gold, score = run.play_game(tmp_ops, len(tmp_ops))
        seed_score_pairs.append((tmp_ops, score))

    while True:
        seed_schedule = SeedSchedule(seed_score_pairs)
        selected_tuple = seed_schedule.schedule()

        energy = min(len(selected_tuple[0]), c.OP_COUNT_BASELINE * (selected_tuple[1] / c.SCORE_BASELINE))

        mutation_schedule = MutatorSchedule(selected_tuple, energy)
        output_data, score, is_crash = mutation_schedule.schedule()

        if is_crash:
            self.save_crash_data(output_data)
            continue
        else:
            seed_score_pairs.append((output_data, score))
            self.save_output_data(output_data)

Transform

transform.py

使用枚举规定Transform种类，除规定实现的CharFlip, CharIns, CharDel, Havoc, Splice五种算子，增加Shuffle变异算子：

class TransformKind(Enum):
    """Transforms for the action."""
    CHAR_FLIP = 0
    CHAR_INS = 1
    CHAR_DEL = 2
    HAVOC = 3
    SPLICE = 4
    SHUFFLE = 5

新增的Shuffle变异算子：对操作序列本身乱序

def shuffle(self):
    self.input_data = ''.join(random.sample(self.input_data, len(self.input_data)))
    return self.input_data

Run

run.py

使用pynput库将字符串的操作序列转换为游戏中的输入

class KeyboardActions:
    def __init__(self):
        self.keyboard = Controller()
        self.time_interval = 0.5

    def press_a_key(self):
        self.keyboard.press('a')
        time.sleep(self.time_interval)
        self.keyboard.release('a')
    def press_up_key(self):
        self.keyboard.press(Key.up)
        time.sleep(self.time_interval)
        self.keyboard.release(Key.up)
    def press_s_key(self):
        self.keyboard.press('s')
        time.sleep(self.time_interval)
        self.keyboard.release('s')

    def press_left_key(self):
        self.keyboard.press(Key.left)
        time.sleep(self.time_interval)
        self.keyboard.release(Key.left)

    def press_right_key(self):
        self.keyboard.press(Key.right)
        time.sleep(self.time_interval)
        self.keyboard.release(Key.right)

    def press_down_key(self):
        self.keyboard.press(Key.down)
        time.sleep(self.time_interval)
        self.keyboard.release(Key.down)

    def press_enter_key(self):
        self.keyboard.press(Key.enter)
        time.sleep(self.time_interval)
        self.keyboard.release(Key.enter)

对游戏窗口截图，交由识别部分进行图像识别，实时得到结果

def take_screenshot(window):
    # 获取窗口位置和大小
    window_x, window_y, window_width, window_height = window.left, window.top, window.width, window.height

    # 截取窗口图像
    screenshot = pyautogui.screenshot(region=(window_x + 20, window_y, window_width - 20, window_height - 20))
    # 将Pillow图像对象转换为OpenCV图像对象
    opencv_image = cv2.cvtColor(np.array(screenshot), cv2.COLOR_RGB2BGR)
    # 将OpenCV图像对象转换为灰度图像
    gray_image = cv2.cvtColor(opencv_image, cv2.COLOR_BGR2GRAY)
    boolValue = analysis.check_image(gray_image)
    gold = analysis.extract_gold(opencv_image)
    score = analysis.extract_score(opencv_image)

    # 可以根据需要返回截图对象或者其他信息
    return boolValue, gold, score

创建游戏进程，开始游戏操作

    pygame_process = subprocess.Popen(['python', game_path + '/mario_level_1.py'], stdin=subprocess.PIPE)

Utilities

PreProcess

preprocess/

Mario

mario/

存储三种状态下的mario图像素材

Resources

resources/

存储原始的mario游戏素材

getMarioPics.py

对原始mario游戏素材进行切片，得到三种状态下的mario图像素材

def main():
    mario_sprites = MarioSprites()

    # 保存所有图片
    save_images(mario_sprites.right_frames, "./mario/small_normal", "right_small_normal")
    save_images(mario_sprites.left_frames, "./mario/small_normal", "left_small_normal")

    save_images(mario_sprites.left_big_normal_frames, "./mario/big_normal", "left_big_normal")
    save_images(mario_sprites.right_big_normal_frames,"./mario/big_normal", "right_big_normal")

    save_images(mario_sprites.right_fire_frames, "./mario/big_fire", "right_big_fire")

Output Analyzer

output_analysis/

NumberTrain

numberTrain.py

对游戏的数字图像进行训练，通过图像中的数字轮廓，将每个数字提取出来，并将这些数字的分类结果和压平图像分别保存到 classifications.txt 和 flattened_images.txt中。

 # 遍历所有轮廓
 for npaContour in npaContours:
     # 如果轮廓面积大于阈值
     if cv2.contourArea(npaContour) > MIN_CONTOUR_AREA:
         [intX, intY, intW, intH] = cv2.boundingRect(npaContour)

         # 在原始图像上绘制红色矩形
         cv2.rectangle(imgTrainingNumbers, (intX, intY), (intX + intW, intY + intH), (0, 0, 255), 2)

         imgROI = imgThresh[intY:intY + intH, intX:intX + intW]
         imgROIResized = cv2.resize(imgROI, (RESIZED_IMAGE_WIDTH, RESIZED_IMAGE_HEIGHT))

         # 显示原始字符区域和调整大小后的字符区域
         cv2.imshow("imgROI", imgROI)
         cv2.imshow("imgROIResized", imgROIResized)
         cv2.imshow("training_numbers.png", imgTrainingNumbers)

         # 获取键盘输入
         intChar = cv2.waitKey(0)

         # 如果按下ESC键，退出程序
         if intChar == 27:
             sys.exit()
         # 如果按下有效字符，添加到分类列表
         elif intChar in intValidChars:
             intClassifications.append(intChar)

             # 压平字符区域并添加到数组
             npaFlattenedImage = imgROIResized.reshape((1, RESIZED_IMAGE_WIDTH * RESIZED_IMAGE_HEIGHT))
             npaFlattenedImages = np.append(npaFlattenedImages, npaFlattenedImage, 0)

 fltClassifications = np.array(intClassifications, np.float32)
 npaClassifications = fltClassifications.reshape((fltClassifications.size, 1))

 print("\n训练完成！！\n")
 np.savetxt("classifications.txt", npaClassifications)
 np.savetxt("flattened_images.txt", npaFlattenedImages)

NumberExtract

numberExtract.py

通过训练好的KNN模型对输入的数字图像进行识别提取。

def number_get(test_image):
  
    ......
    
    # 创建KNN对象
    kNearest = cv2.ml.KNearest_create()

    # 训练KNN模型
    kNearest.train(npaFlattenedImages, cv2.ml.ROW_SAMPLE, npaClassifications)

    ......
    
    # 最终识别结果字符串
    strFinalString = ""

    # 遍历有效轮廓
    for contourWithData in validContoursWithData:
        # 获取数字区域
        imgROI = imgThresh[contourWithData.intRectY: contourWithData.intRectY + contourWithData.intRectHeight,
                 contourWithData.intRectX: contourWithData.intRectX + contourWithData.intRectWidth]

        # 调整数字区域大小
        imgROIResized = cv2.resize(imgROI, (RESIZED_IMAGE_WIDTH, RESIZED_IMAGE_HEIGHT))

        # 将数字区域展平为一维numpy数组
        npaROIResized = imgROIResized.reshape((1, RESIZED_IMAGE_WIDTH * RESIZED_IMAGE_HEIGHT))

        # 将数据类型转换为float32
        npaROIResized = np.float32(npaROIResized)

        # 使用KNN进行识别
        retval, npaResults, neigh_resp, dists = kNearest.findNearest(npaROIResized, k=1)

        # 获取识别结果字符
        strCurrentChar = str(chr(int(npaResults[0][0])))
        strFinalString = strFinalString + strCurrentChar
    return strFinalString

OutputAnalysis

outputAnalysis.py

提取mario图像的特征点，将特征点保存在train_des.pkl

def extract_mario_des(self):
    template_image_path = '../preprocess/mario/small_normal/right_small_normal_0.png'
    arr_des = []
    arr_kp = []
    arr_template = []
    # 遍历文件
    PATH_TO_TEST_IMAGES_DIR = '../preprocess/mario'
    for pidImage in glob.glob(PATH_TO_TEST_IMAGES_DIR + "/*/*.png"):
        template_image_path = pidImage
        template_image_path = template_image_path.replace('\\', '/')
        # print(template_image_path)

        # 读取mario和背景
        template = cv2.imread(template_image_path, 0)
        if template is None:
            print('Could not open or find the images!')
            exit(0)
        template = cv2.resize(template, None, fx=2, fy=2)
        # 创建SIFT对象
        sift = cv2.xfeatures2d.SIFT_create()
        # 提取特征点
        kp1, des1 = sift.detectAndCompute(template, None)
        arr_des.append(des1)
        arr_kp.append(kp1)
        arr_template.append(template)
    save_data('train_des.pkl', arr_des)
    return arr_des, arr_kp, arr_template

提取游戏窗口截图的特征点，与已提取的mario特征点进行匹配，返回输入图像中是否包含mario

def check_image(self, screenshot):
    train_des = load_des('train_des.pkl')
    if screenshot is None:
        print('Could not open or find the images!')
        exit(0)
    # 创建SIFT对象
    sift = cv2.SIFT_create()
    # 提取特征点
    kp2, des2 = sift.detectAndCompute(screenshot, None)
    bf = cv2.BFMatcher()
    for i in range(len(train_des)):
        # 匹配特征点
        matches = bf.knnMatch(train_des[i], des2, k=2)
        good = []
        for m, n in matches:
            if m.distance < 0.75 * n.distance:
                good.append([m])
        # 若匹配超过3点则认为存在
        if len(good) > 3:
            return True
    return False

提取游戏窗口截图中的游戏分数

def extract_score(self, screen):
    imgROI = screen[165:219, 140:429]
    # cv2.imshow("ROI_WIN", imgROI)
    # cv2.waitKey(0)
    res = number_get(imgROI)
    if len(res) > 6:
        res = res[0:6]
    # print("score:"+res)
    return res

提取游戏窗口截图中的金币数

def extract_gold(self, screen):
    imgROI = screen[165:219, 635:730]
    # cv2.imshow("ROI_WIN", imgROI)
    # cv2.waitKey(0)
    res = number_get(imgROI)
    if len(res) > 2:
        res = res[0:2]
    # print("gold:"+res)
    return res

Mutator Scheduler

mutator_schedule/

实现了一个可以根据种子和能量进行游戏运行的函数。

def get_score(seed, energy):
    is_mario, _, score = action.run.play_game(seed, energy)
    return is_mario, score

以模拟退火算法为载体进行变异调度，先选中操作种子，并获取其得分。如果发生崩溃，停止操作并返回崩溃。在算法中，通过在迭代循环中在当前种子附近进行变异，并计算新种子的得分，同时监测程序是否崩溃，然后根据Metropolis准则计算接受概率，在接受概率内或得到了更高分的种子-得分对的情况下，会更新种子-得分对，循环直到迭代结束。最后函数会返回迭代完成后的最佳种子-得分对。

def schedule(self):
    current_op = self.seed_score_pairs[0]
    is_mario, current_score = get_score(current_op, len(current_op))

    if not is_mario:
        return current_op, current_score, False

    best_op = current_op
    best_score = current_score

    for _ in tqdm(range(self.iterations)):
        self.temperature *= self.cooling_rate

        # 在当前字符串附近进行变化
        transform_action = Transform(current_op)
        new_op = transform_action.transform()

        # 计算新字符串的得分
        is_mario, new_score = get_score(new_op, self.energy)

        if not is_mario:
            return new_op, current_score, False

        # 计算接受概率
        acceptance_probability = math.exp((new_score - current_score) / self.temperature)

        # 根据概率决定是否接受新字符串
        if new_score > current_score or np.random.uniform(low=0, high=1) < acceptance_probability:
            current_op = new_op
            current_score = new_score

        # 更新最佳字符串
        if current_score > best_score:
            best_op = current_op
            best_score = current_score

    return best_op, best_score, True

Seed Scheduler

seed_schedule/

实现了一个排序算法，用于对种子序列进行排序。

def sort_tuples(tuples_list):
    sorted_tuples = sorted(tuples_list, key=lambda x: x[1], reverse=True)
    return sorted_tuples

实现了一个选择算法，用于对种子进行有一定权重的选择。在这里，我们将排好序的种子-分数对按一定比例切分成高分组和低分组，并在低分组中随机选取一个同高分数组一同进行随机选择。

def select_tuples(sorted_tuples, high_ratio):
    total_tuples = len(sorted_tuples)
    high_count = int(total_tuples * high_ratio)

    high_tuples = sorted_tuples[:high_count]
    random_tuple = random.choice(sorted_tuples[high_count:])

    selected_tuple = random.choice(high_tuples + [random_tuple])
    return selected_tuple

结果分析

应用六种变异算子，生成53个种子，详细可见product/seed

QRXqrx / MarioFuzzy

MarioFuzzy

Usage

Transform

Fuzz

Requirements

Notes

Overview

Design

Main

CLI

Action

Fuzz

Transform

Run

Utilities

PreProcess

Mario

Resources

Output Analyzer

NumberTrain

NumberExtract

OutputAnalysis

Mutator Scheduler

Seed Scheduler

结果分析

About

Languages