SeungyounShin / Llama2-Code-Interpreter

Make Llama2 use Code Execution, Debug, Save Code, Reuse it, Access to Internet


Is it able to read a CSV dataset and perform data analysis?

hhy-joseph opened this issue · comments

And run code for data visualizations?

You can indeed include user content in the prompt by using the syntax f"{user_content}\nUser Uploaded File: {'./tmp/file.csv'}". I am working on constructing datasets to support Supervised Fine-Tuning (SFT) of Llama2 with data generated by GPT-4. #1
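For concreteness, here is a minimal, self-contained illustration of how that f-string expands at runtime; the question text is made up:

```python
# Illustrative only: expanding the f-string quoted above.
user_content = "Summarize the dataset."  # hypothetical user question
prompt = f"{user_content}\nUser Uploaded File: {'./tmp/file.csv'}"
print(prompt)
# Summarize the dataset.
# User Uploaded File: ./tmp/file.csv
```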

Can we limit the internet access ability? Let's say, make the model search for answers in my DB instead of on the internet. Thank you!
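For illustration, what I have in mind is roughly something like the following hypothetical lookup, swapping a web-search tool for a local database (the table schema is made up, and this is not a feature of this repo):

```python
# Hedged sketch: answer lookups against a local SQLite DB instead of the web.
# The `faq` table and its columns are hypothetical.
import sqlite3

def search_local_db(query: str, db_path: str = "./my_knowledge.db") -> list[str]:
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT answer FROM faq WHERE question LIKE ?", (f"%{query}%",)
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```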

hey @SeungyounShin I've recently implemented a similar library that's basically baked into llama.cpp, and it uses a lot of the same techniques you have implemented here.

Here is the project in case you are interested in a possible collab-

Also-- I'm curious to know how you handled the prompts that require first examining a file's content and ingesting a sample of it for a more accurate/tailored completion.

Hi, @itsPreto

I'm genuinely intrigued by your project and am open to collaboration. It seems we're venturing into the same territory with our respective projects.

Regarding your query about handling prompts that involve examining a file's content: I believe the essence lies in prompt engineering. Many individuals have inquired about incorporating files. However, from my perspective, this doesn't necessarily require fine-tuning or intensive data curation. I typically use the following format:

### User : (... user question goes here ....)
User Uploaded File Path : "./tmp/file.csv"

### Assistant : (... assistant answer goes here ...)

This structure enables the model (in our case, LLM) to grasp the context of a user-uploaded file.

To ensure the model comprehends the content more effectively, I've implemented a "chain of thought" strategy:

Let's do step-by-step.

By introducing this, the LLM's output tends to resemble code execution in a Jupyter notebook environment. It's a mechanism for the model to correct its early-stage inaccuracies and to better understand the underlying content.
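Putting the two pieces together, here is a minimal sketch of the prompt assembly. Only the section markers and the "Let's do step-by-step." line come from this thread; the helper name and example values are illustrative:

```python
# Minimal sketch of assembling the file-aware prompt described above.
def build_file_prompt(user_question: str, file_path: str) -> str:
    return (
        f"### User : {user_question}\n"
        f'User Uploaded File Path : "{file_path}"\n'
        "Let's do step-by-step.\n\n"
        "### Assistant : "
    )

print(build_file_prompt("Plot monthly revenue from the CSV.", "./tmp/file.csv"))
```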

However, as with all models, it's not flawless. I'm in the process of training Llama2 using a dataset generated by GPT-4. I'm hopeful this will provide more profound insights. I'll certainly keep you posted on any significant discoveries. Thank you for reaching out!


@SeungyounShin that makes a lot of sense! I'll be looking to implement the same this weekend.

I was also thinking about the possibility of training or fine-tuning a smaller model (similar to replit-v2-3B) that's specialized in function calling.

This way, the ability to open and inspect files could be treated as an available tool! I think it would complement the bigger models extremely well.
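A rough sketch of what I mean, with a made-up schema loosely modeled on common function-calling formats (nothing here is from either repo):

```python
# Hypothetical tool registry: file inspection exposed as a callable "tool"
# that a small function-calling model could select.
import json

TOOLS = {
    "inspect_file": {
        "description": "Read the first N lines of a file so the model sees a sample.",
        "parameters": {"path": "string", "n_lines": "integer"},
    }
}

def inspect_file(path: str, n_lines: int = 5) -> str:
    with open(path) as f:
        return "".join(f.readline() for _ in range(n_lines))

def dispatch(call_json: str) -> str:
    # The model would emit something like:
    # {"tool": "inspect_file", "arguments": {"path": "./tmp/file.csv", "n_lines": 5}}
    call = json.loads(call_json)
    if call["tool"] == "inspect_file":
        return inspect_file(**call["arguments"])
    raise ValueError(f"Unknown tool: {call['tool']}")
```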

Sorry for asking again, but might I know in which file I should put f"{user_content}\nUser Uploaded File: {'./tmp/file.csv'}" to allow users to upload a CSV? Thank you!

@hhy-joseph

To integrate the CSV upload feature using the mentioned format, you'd want to modify chatbot.py at line 27 (columns 49-56) and adjust the msg variable accordingly.

However, it's crucial to remember that revising this line alone might not be enough. Full integration will require adjustments to the Gradio UI to handle CSV file uploads. I'm planning to update this aspect in the near future to make it more seamless.
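For reference, here is a minimal Gradio sketch of what that integration could look like. The component layout and the respond function are illustrative, not the repo's actual code; only the prompt format is taken from this thread:

```python
# Hedged sketch: wiring a CSV upload into the prompt via Gradio.
import gradio as gr

def build_prompt(user_content: str, file_path: str | None) -> str:
    # Mirrors the format discussed above: append the uploaded file path, if any.
    if file_path:
        return f"{user_content}\nUser Uploaded File: {file_path}"
    return user_content

def respond(message, uploaded_file):
    # Gradio may hand back a path string or a temp-file object depending on version.
    path = uploaded_file if isinstance(uploaded_file, str) else getattr(uploaded_file, "name", None)
    prompt = build_prompt(message, path)
    # ... pass `prompt` to the model here ...
    return prompt  # echoed back for demonstration

with gr.Blocks() as demo:
    csv_file = gr.File(label="Upload CSV", file_types=[".csv"])
    question = gr.Textbox(label="Question")
    answer = gr.Textbox(label="Prompt sent to the model")
    question.submit(respond, inputs=[question, csv_file], outputs=answer)

demo.launch()
```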

Thanks for bringing this to attention, and stay tuned for the updates!

@itsPreto
Sorry for writing here, but I don't know how else to contact you. Your project is good, but publishing it on GitHub in its current form has huge disadvantages. Since it is a fork, you don't have issues or discussions enabled, so how can a person ask a question or suggest an improvement? I even have to write my current message in the issues of a third-party project, and if I had not accidentally stumbled upon you here, I would not have been able to write to you at all. It is also almost impossible to find your project in GitHub search results, both because it is a fork and because you did not change the original description from llama.cpp at all. No tags.
In general, I suggest that you create a new repository with a suitable description and a set of tags, or reopen the old one. In that case, I think you will succeed in attracting people to the project.

@Jipok thanks for the feedback-- and sorry for hijacking this thread @SeungyounShin

  • Why a fork? Simply put, llama.cpp provides the best out-of-the-box inferencing for CPU/GPU across many operating systems. Forking it lets me almost seamlessly pull in any updates from upstream, as llama.cpp is under heavy development and constantly getting improvements/breaking changes.

  • Unchanged project description: this is just because it's not possible to alter the project description of a forked project-- I'm not sure if there are workarounds to this limitation, but I'm not aware of any.

I'll look into refactoring the project to just use the published/released tags instead of baking my project directly into their library.

EDIT: Okay so I just double checked and I AM able to edit the project description so thank you for that!

EDIT#2: I put some more effort into it and found out about git submodules, so as you suggested I've unarchived the old one and added llama.cpp as a standalone submodule, which will still let me track it and get all the goodies periodically. The project should be functioning again; just make sure to build the llama.cpp module first before trying to use baby-code. You can now open issues and discussions as you please.

> Why a fork? Simply put, llama.cpp provides the best out-of-the-box inferencing for CPU/GPU across many operating systems. Forking it lets me almost seamlessly pull in any updates from upstream, as llama.cpp is under heavy development and constantly getting improvements/breaking changes.

I get it. I'm not saying anything about the implementation of your code. I meant: why are you using the fork feature on GitHub itself? Publish it as a separate project (or reopen the old one) and just use a basic git remote add upstream git@github.com:ggerganov/llama.cpp.git locally, then git pull upstream master && git push origin master.

Closing temporarily.