How to run the code with a smaller checkpoint such as opt-6.7B

Question

czq693497091 opened this issue 5 months ago

With limited GPU resources, how can I just run the code with opt-6.7b?
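As a rough guide to why a smaller checkpoint helps, the weights of an OPT model stored in fp16 take 2 bytes per parameter. The sketch below is back-of-the-envelope arithmetic only: it uses the nominal parameter counts from the model names and ignores activations, the KV cache, and any data collected for the sparsity predictors. `fp16_weight_gib` is a hypothetical helper, not part of the repository.

```python
def fp16_weight_gib(num_params: float) -> float:
    """Approximate GPU memory (GiB) needed just to hold the weights in fp16."""
    bytes_per_param = 2  # float16 stores each parameter in 2 bytes
    return num_params * bytes_per_param / 2**30

# Nominal parameter counts taken from the model names.
for name, n in [("opt-6.7b", 6.7e9), ("opt-66b", 66e9)]:
    print(f"{name}: ~{fp16_weight_gib(n):.1f} GiB of fp16 weights")
```

This gives about 12.5 GiB for opt-6.7b versus about 123 GiB for opt-66b, before counting any runtime overhead.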

Zhenqian Chen · Answer 1 · Mon Apr 08 2024 19:35:36 GMT+0800 (China Standard Time)

I successfully ran collect_sp_data.sh with opt-6.7b and opt-66b. But when I use sparse_predictor to train the MLP predictor for both opt-6.7b and opt-66b, both show an all-zero y, which means that mlp_label_0.mmap is all zeros, and the training process ends. Is this problem widespread, and how can it be solved?
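A quick way to confirm this symptom before training is to count how many rows of the label file actually contain non-zero values. The sketch below assumes the file layout used in the fix later in this thread (float16, 400000 pre-allocated rows, 4 * d columns); `inspect_label_mmap` is a hypothetical helper, not part of DejaVu. It scans the memmap in chunks so the whole file is never loaded into memory at once.

```python
import numpy as np

def inspect_label_mmap(path, d, max_rows=400000, chunk=10000):
    """Count the rows of an MLP label .mmap file that contain any non-zero value.

    Assumed layout (hypothetical helper): float16, shape (max_rows, 4 * d).
    Returns (number of non-zero rows, index just past the last non-zero row).
    """
    label = np.memmap(path, dtype="float16", mode="r", shape=(max_rows, 4 * d))
    nonzero = np.zeros(max_rows, dtype=bool)
    # Scan in chunks so a multi-GB file is never materialized in full.
    for start in range(0, max_rows, chunk):
        block = label[start:start + chunk]
        nonzero[start:start + chunk] = (block != 0).any(axis=1)
    count = int(nonzero.sum())
    end = int(np.flatnonzero(nonzero)[-1]) + 1 if count else 0
    return count, end
```

For opt-6.7b (hidden size 4096) you would call something like `inspect_label_mmap("path/to/mlp_label_0.mmap", d=4096)`, where the path is a placeholder. A count of 0 means nothing was written; a small count with `end == count` means the data is a written prefix followed by unused zero rows.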

xubaizhou · Answer 2 · Wed Apr 17 2024 18:48:55 GMT+0800 (China Standard Time)

The predictor also needs to be trained. Are you running with a single GPU?

Zhenqian Chen · Answer 3 · Wed Apr 17 2024 18:51:24 GMT+0800 (China Standard Time)

I trained the predictor with more than one GPU. But when I tried to train the MLP predictor, it shows all zeros. I don't know how to solve it.

Ke Cheng · Answer 4 · Mon Apr 22 2024 11:03:50 GMT+0800 (China Standard Time)

I encountered the same problem, but I don't know how to solve it. Have you solved it yet?

Zhenqian Chen · Answer 5 · Mon Apr 22 2024 22:04:29 GMT+0800 (China Standard Time)

Still not. I emailed the author, but she said it should not be all zeros.

Ke Cheng · Answer 6 · Mon Apr 22 2024 22:26:48 GMT+0800 (China Standard Time)

Sad... (T＿T) Thanks for your response.

Ke Cheng · Answer 7 · Fri Apr 26 2024 11:52:39 GMT+0800 (China Standard Time)

Hey, I have found the cause of the problem. The issue is with fp_label: the author allocates an fp_label.mmap file of size [400000, 4 * hidden_size] to store fp_label, but in reality fp_label never contains that much data. When splitting off the validation set, the code takes the last 0.05 * len(fp_label) rows of the fp_label file, so it reads empty rows. As a result, the MLP predictor cannot be trained effectively. To fix this, you can modify the get_data(args, l) function in ./DejaVu/sparse_predictor/main_mlp.py:
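To see why the validation split ends up empty, here is a toy reproduction of the mechanism described above. `tail_split` is a hypothetical stand-in for the repository's split, assuming (as the thread says) that the last 5% of the rows become the validation set; the buffer sizes are made up for illustration.

```python
import numpy as np

def tail_split(label, valid_frac=0.05):
    """Hypothetical stand-in for the split described in the thread:
    the last valid_frac of the rows become the validation set."""
    n_valid = max(1, int(valid_frac * len(label)))
    return label[:-n_valid], label[-n_valid:]

# Toy buffer: 1000 pre-allocated rows, but only the first 300 were ever written.
label = np.zeros((1000, 16), dtype=np.float16)
label[:300] = 1

# Splitting the whole buffer: the validation rows are all unused (zero) rows.
_, valid_raw = tail_split(label)
print(len(valid_raw), bool((valid_raw != 0).any()))  # -> 50 False

# Trimming to the written rows first gives a meaningful validation set.
num_written = int((label != 0).any(axis=-1).sum())
_, valid_trimmed = tail_split(label[:num_written])
print(len(valid_trimmed), bool((valid_trimmed != 0).any(axis=-1).all()))  # -> 15 True
```

Trimming the buffer to the rows that were actually written, which is what the modified get_data does, is what makes the split meaningful again.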

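```python
import numpy as np

# Modified get_data(args, l) for ./DejaVu/sparse_predictor/main_mlp.py.
# CONFIG and DATA come from the surrounding main_mlp.py module; only the
# "bylayer" branch is shown here.
def get_data(args, l):
    if CONFIG[args.model]['ckt_storage'] == "bylayer":
        # path = f"{DATA[args.model][args.dataset]}/mlp_x_{l}.mmap"
        path = f"{DATA[args.model][args.dataset]}/mlp_sp_x_{l}.mmap"
        print(f"Reading query from {path}")
        query = np.array(np.memmap(path, dtype='float16', mode='r', shape=(400000, CONFIG[args.model]['d']))[: CONFIG[args.model]['N']])
        path = f"{DATA[args.model][args.dataset]}/mlp_label_{l}.mmap"
        print(f"Reading MLP label from {path}")
        label = np.array(np.memmap(path, dtype='float16', mode='r', shape=(400000, CONFIG[args.model]['d'] * 4))[: CONFIG[args.model]['N']])

        # The files are pre-allocated with 400000 rows, but only the leading rows
        # are actually written. Keep only rows whose label is non-zero so the
        # validation split (taken from the tail of the data) is not all zeros.
        num_valid = (label.sum(-1) > 0).sum()
        print(num_valid)
        return query[:num_valid], label[:num_valid]
        # return query, label
```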

This is my solution; I hope it helps 😀
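The fix above slices with `[:num_valid]`, which silently assumes that every non-zero label row comes before every all-zero row, i.e. that rows are written sequentially from row 0. A minimal sketch of a guarded version follows; `trim_to_written_prefix` is a hypothetical helper, not repository code. Note that a genuinely all-zero label row inside the written region would also trip the check, which is probably worth knowing about anyway.

```python
import numpy as np

def trim_to_written_prefix(query, label):
    """Drop trailing all-zero rows, verifying that written rows form a prefix.

    Assumption: collection writes rows sequentially from row 0, so every
    non-zero label row precedes every all-zero (unused) row.
    """
    written = (label != 0).any(axis=-1)
    num_valid = int(written.sum())
    if not written[:num_valid].all():
        raise ValueError("non-zero label rows are not a contiguous prefix; "
                         "slicing [:num_valid] would drop real data")
    return query[:num_valid], label[:num_valid]
```

In get_data, `return trim_to_written_prefix(query, label)` could replace the last three lines of the "bylayer" branch.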

Zhenqian Chen · Answer 8 · Fri Apr 26 2024 12:00:56 GMT+0800 (China Standard Time)

Cool, thanks for your solution. I will try it 😀