liyucheng09 / Selective_Context

Compress your input to ChatGPT or other LLMs so they can process 2x more content while saving 40% of memory and GPU time.


python main.py

lqcStar opened this issue · comments

Where do we get the six parameters: arxiv_path, news_path, conversation_path, save_to_path, num_articles, model_name = sys.argv[1:]?

arxiv_dir = /mnt/fast/nobackup/users/yl02706/llm_memorize/arxiv/500arxiv
news_dir = /mnt/fast/nobackup/users/yl02706/llm_memorize/news
conversation_dir = /mnt/fast/nobackup/users/yl02706/llm_memorize/conversation

These are the paths where the datasets are loaded and cached.

You can find download links for the three datasets in the README.

save_to_path is the path where you save your experimental results.
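For illustration, here is a hypothetical sketch of how main.py's six positional arguments line up with sys.argv. All paths below are placeholder values, not the actual dataset locations; num_articles and model_name use the values discussed in this thread:

```python
import sys

# Hypothetical invocation sketch: main.py unpacks six positional arguments.
# The paths are placeholders for illustration only.
sys.argv = [
    "main.py",
    "data/arxiv",           # arxiv_path: directory of arXiv JSON files
    "data/news",            # news_path: news dataset directory
    "data/conversation",    # conversation_path: conversation dataset directory
    "results",              # save_to_path: where experimental results are written
    "300",                  # num_articles (argv values are strings)
    "huggyllama/llama-7b",  # model_name: a Hugging Face model id
]

arxiv_path, news_path, conversation_path, save_to_path, num_articles, model_name = sys.argv[1:]
print(model_name)
```

Equivalently, from the shell this would be `python main.py data/arxiv data/news data/conversation results 300 huggyllama/llama-7b`.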

I will update main.py shortly to provide more implementation details. Stay tuned.

@lqcStar Check the updated reproduction guideline. Let me know if you have any further questions.

Thanks for your reply. However, when I unzip datasets_dumps.zip, it contains three directories: arxiv, conversion, and news. arxiv contains 500 JSON files, conversion contains an empty JSON file, and news is an empty directory.

Also, how should the last parameter, model_name, be specified? Does the model need to be downloaded to the server? And what value is appropriate for num_articles?

Try again with the same code.

The conversion directory is no longer empty now.

news is intended to be empty, so that's not a problem.

Set num_articles to 300.

model_name can be any Hugging Face model, e.g. huggyllama/llama-7b.

Thank you very much for your answer. My problem is solved.
Best wishes!