InternLM / InternLM

Official release of InternLM2 7B and 20B base and chat models. 200K context support

Home Page: https://internlm.intern-ai.org.cn/

[Bug] InternLM2 int4 repeats itself and repeats leading content (the system prompt)

sanbuphy opened this issue · comments

Describe the bug

I deployed an InternLM2 int4 pipeline with lmdeploy to translate English text. After many pipe calls (the pipe is initialized once and reused; each call is a separate pipe(input)), output like the following appears:

(screenshot of the repeated output)

But when I run that same sentence through inference on its own, everything is fine and nothing repeats. How can I avoid this problem? Thanks. It is also unlikely to be a precision issue, because plenty of other users have reported the same thing with both the original and fine-tuned models.

Another curious phenomenon: here is my code:

from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig

class InternLM2():
    def __init__(self, model_path="", max_batch_size=1, session_len=4096):
        """
        Initialize the model and set the initial parameters.
        """
        self.model_path = model_path
        self.model = self._load_model(max_batch_size, session_len)

    def _load_model(self, max_batch_size, session_len):
        engine_config = TurbomindEngineConfig(model_format='awq', max_batch_size=max_batch_size, session_len=session_len)
        if self.model_path:
            pipe = pipeline(self.model_path, backend_config=engine_config)
        else:
            pipe = pipeline("internlm/internlm2-chat-7b-4bits", backend_config=engine_config)
        return pipe

    def infer(self, system_prompt, src_text: str, gen_config: GenerationConfig) -> str:
        response = self.model([system_prompt + src_text], gen_config)
        return response


internLM2 = InternLM2(session_len=2048)
gen_config = GenerationConfig(top_k=20, top_p=0.3, temperature=0.1)
translator_system_prompt = """
    把下列文字翻译成中文,只返回给我结果:
    """

with open(translate_filename, 'w', encoding='utf-8') as file:
    for chunk in new_paragraphs:
        chunk_translate = internLM2.infer(translator_system_prompt, f"{chunk}", gen_config)
        print(chunk, '\n', chunk_translate[0].text)
        file.write(chunk_translate[0].text + '\n')

But I find that it sometimes repeats the translator_system_prompt back to me, and no matter how I adjust the prompt I cannot get it to return only the result:
(screenshot: the system prompt is echoed in the output)
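For reference, one commonly used knob against repetition is a repetition penalty. A minimal sketch of the sampling settings above with it added, assuming the installed lmdeploy release exposes repetition_penalty on GenerationConfig:

from lmdeploy import GenerationConfig

# Sketch only: same sampling settings as in the report, plus a repetition penalty.
# Assumes this lmdeploy version exposes `repetition_penalty` on GenerationConfig.
gen_config = GenerationConfig(
    top_k=20,
    top_p=0.3,
    temperature=0.1,
    repetition_penalty=1.05,  # values slightly above 1.0 discourage verbatim repeats
)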

Environment

sys.platform: linux
Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3060
CUDA_HOME: //usr/local/cuda-12
NVCC: Cuda compilation tools, release 12.3, V12.3.52
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.1.0+cu121
PyTorch compiling details: PyTorch built with:

  • GCC 9.3
  • C++ Version: 201703
  • Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 12.1
  • NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  • CuDNN 8.9.2
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.16.0+cu121
LMDeploy: 0.2.4+f5bc455
transformers: 4.36.2
gradio: 3.36.1
fastapi: 0.109.2
pydantic: 2.6.1

Other information

No response

Text to be translated:


Hello, everybody. I am Eli Euregas. I am an engineering manager at Meta. I support the PyTorch developer infrastructure team, and we're going to be giving a talk about what's new for PyTorch developer infrastructure along with Omkar, who is also an engineer on the team. As a note, Sergey is also on the team as well, so we work on a lot of different things. So first and foremost, I kind of want to introduce PyTorch developer infrastructure. This is the first time I think as a team we're giving a talk at the conference, so PyTorch developer infrastructure is everybody here that's pictured except for Nikita, who I think is in the room today, back there. He might have requested changes on some of your pull requests at some point, so I'm sure you'll see him if you contribute to PyTorch. So first and foremost, I kind of want to go over what is the mission for developer infrastructure. So it's kind of changed over the past couple of years, and I think it's going to change in the future as well. So first and foremost, in 2022,
we focused on achieving stability. So how do you get green CI? We wanted to make sure that we wanted to integrate some things about, like, flaky test detection and disablement in order to be able to ensure that our developers are getting the most green CI and the most reliable CI. On top of that, for this year, our main focus has been towards going faster. So how can we provide, you know, quicker time to signal through things like target determination, test reordering? We use a lot of heuristics in order to be able to actually determine what tests to run when. And then finally, I think looking towards the future, what we're really trying to do is we're trying to integrate generative AI. It's like the new hotness, the new crazy thing to kind of implement, and so we want to be able to figure out how to use that within our workloads and to be able to increase developer productivity. And we think that actually some of those use cases might apply even outside of PyTorch. So some high-level goals that we kind of go about when attempting to create developer infrastructure.
So first and foremost, we want to have a high confidence and trust in testing infrastructure. What we find is that when people have this trust and confidence in our testing infrastructure, they usually are able to kind of submit PRs quicker. At the same time, we want to have easy-to-use tools providing visibility and stability. So if you can understand, you can see your signal, you can usually get better results out of that. And then at the end of the day, we also want to maintain high velocity with reusable components. If you have a component that can be reused in a lot of different places, maybe generic, we want to be able to do that as well. And at the end of the day, this all goes towards increasing developer productivity. So ensuring that we can kind of maintain a high velocity with a high level of confidence as well. And so for the fun part, so going over some of the tools that we've built, and I kind of want to make a note that all of this tooling is open source.
The test info repository that Sergey mentioned in his talk about TorchFix contains all of the code for all of these things. So if you're interested in kind of rolling these out to your own code bases, you can also do that as well. So hud.pytorch.org is our flagship tool. This is the tool where most of the developers will interact with the signal here. Typically, historically, it's been used as an on-call tool to be able to see when breakages occur. This is our timeline view. So this is all the commits that go onto main of PyTorch. And we can see all of the signals they roll in. Through this tool, we can also control the developer experience that PyTorch developers might have. Even if we transition to a different CI system, be that CircleCI or GitHub Actions, we always have HUD to be able to tell people what their signal actually is. On top of that, we also collect a ton of metrics. I have a high confidence in saying this. We probably collect the most amount of metrics about our CI for any open source project that exists today. Not vetted.
Don't take me at my word for that, but I'm pretty sure that's the case. What we do, actually, is we collect a ton of metrics for every interaction that you have on GitHub within the PyTorch organization. And we use those metrics to kind of inform our decision making and to showcase to users what's happening on the CI at the time. The other thing is at PyTorch bot merge. This is probably one of the biggest changes that's occurred over the years. I'm sure you guys have seen this if you guys have contributed to PyTorch at some point. At PyTorch bot merge, for historical context, the way that it used to work at the PyTorch organization was that you would submit a PR, that PR would get merged into the Facebook monorepo, and then that would end up getting published back to GitHub. But now we can merge directly on GitHub, which is a great innovation. One of the better things about this is we actually implemented some features that GitHub had not yet introduced, such as merge on green. And then as well,
we can also selectively disable or selectively ignore things like flaky tests, which is the next thing I'm going to talk about. Flaky test detection and disablement. One of the really cool things that we've done over the past year is really key in on this system. The way that it works is that whenever a test runs on PyTorch CI, it'll actually run a couple of times if it fails. If it passes a couple times and it fails a couple times, obviously that's a flaky test and we are going to disable it globally. The great thing about this system is that you actually don't need a code change in order to disable the test. This test will be disabled once the GitHub issue is posted and all subsequent runs will automatically disable this test by default. Another really cool thing is that we have a test run that goes once per day that actually tests all the flaky tests again to determine if they're still flaky or not and will automatically re-enable them once they reach a stable state.
Three more things to talk about for developer infrastructure that are really important and that I really want to key in on. OSSCI Infra, basically where all of the CI jobs run. This is a cluster of about 3,500 machines that reaches a max peak every single day that tests every single code change that goes into PyTorch. Since PyTorch has such a wide variety of hardware requirements and such a wide variety of OS requirements, obviously the infrastructure that runs all of these jobs is going to be very diverse. We put a lot of work into making OSSCI Infra better. And so we need to meet the needs of the growing AI community. So we try to do that. As well, we have reusable workflow components. We understand that people who write machine learning frameworks aren't necessarily always the people that write the best CI. And so we want to be able to ensure that we have a set of reusable workflow components that do a bunch of things like set up a GPU, set up a CPU,
be able to set up a Python environment and allow them to be able to write things as simple as a bash script for their CI. And we also have a set of nightly binaries and validation. And we spent a lot of work in kind of automating that portion. And so to talk about that, I'm going to hand it over to Omkar. Cool. Thanks, Eli. Hey, everybody. I'm Omkar. I'm a software engineer at Meta on the PyTorch Dev Infra team. So Eli just talked about a number of tools that we built to make the developer experience on PyTorch better. One of the big things that developer infra also includes is getting a healthy, correct, and performant version of PyTorch in the hands of our users. And we do this using our release infra. So we actually release at pretty large scale. Every night, we probably have around 500 build, test, and upload workflows running. And this number is really large because we support a pretty sweeping build matrix across different Python versions, different CUDA versions, for AMD GPU support, different operating systems, CPU architectures, et cetera.
And we don't just release the PyTorch PyTorch core repo. We actually release the entire ecosystem, which includes on the order of 10 other ecosystem projects. So extrapolated over the course of the entire year, we're publishing around 200,000 binaries. And so previously when we did this, we had a lot of logic strewn across the different ecosystem libraries. We didn't have any standardization in terms of the platforms that each of these libraries need to support. So this made releases really difficult. So we set out to change this. And so what we did is we created these modular reusable workflows built on top of GitHub Actions. So the motivation here is to get any project, existing or new project in the PyTorch ecosystem, up and running with their CICD in about 20 lines of config or code like this workflow right here. So what this does for you is it lets you opt into a particular platform that you want to build for. It sets up a clean build environment for you, builds the wheel or other type of binary,
supports hooks for custom pre-build or post-build steps that you might want to run, as well as any arbitrary smoke testing you want to run to validate the binary, and uploads it to the channel of choice. And with this, any library can get up and running with their CICD relatively quickly, be compliant with the entire PyTorch ecosystem, and essentially have a hands-free process for running both their nightly release and their official releases. So this is just the interface that a new project owner kind of has to implement, but what really happens under the hood? So we have this nightly cron trigger, and what that does is it takes all the commits made to a particular project over the course of the previous day, squashes them into one, and pushes that commit to the nightly branch. And so this triggers a bunch of the workflows that were defined in the config file from the previous slide. And this config allows you to opt into a variety of different platforms, particularly operating systems and package types. So we support wheels and conda builds across Linux, Mac M1, and Windows,
as well as new support for Linux ARM64 wheels and iOS and Android binaries. So each of these jobs triggers matrix generation. So that matrix generation essentially specifies which Python versions, CUDA and Rockham versions, that that platform needs to support. And it creates sub-jobs for each of these, and those sub-jobs are launched onto our self-hosted AWS cluster via GitHub Actions. Each of these sub-jobs also has their own unique hardware requirements. For example, a Linux wheels job for building GPU binaries will require an instance with GPUs, Windows machines, Windows jobs will require Windows instances, and so on. So our self-hosted cluster supports all of these different SKUs. And this is the same logical cluster that we maintain that runs all of our CI jobs and benchmarking jobs across the entire ecosystem. So once these jobs are launched onto the cluster, the machines there are pre-configured with custom-built AMIs, where appropriate. And for Linux jobs, we also build custom Docker images, and all the build and testing happens inside those containers. And once the binaries are built and verified, for conda binaries, they're uploaded to our Anaconda PyTorch nightly channel.
And for wheels binaries, they're uploaded to our self-hosted PyPI index, which is backed by S3 on the same AWS cluster. And this is the backend for the download.pytorch.org website that you guys can use. So this is all for one repository, right? We have to do this across every single repository in the PyTorch ecosystem. And we also need to do this in a dependency-aware fashion. Because, for example, torch vision nightlies for a particular day will depend on PyTorch nightlies from that same day. So we need to ensure that the PyTorch nightlies are built, tested, and uploaded before we can even start the torch vision builds. So all these builds are staggered in this dependency-aware fashion. And once they're all built, tested, and uploaded, we fetch them into ecosystem-wide validation workflows that run metadata checks and ensure that all these binaries can work well together. And that's the story of how PyTorch binaries are built and published to our users, usually with a pretty high success rate. If anything goes wrong, we trust that you will file GitHub issues and let us know. So this is all great,
right? We have tooling to make development on this large, complex project easier. We have well-defined systems for CICD. We have basically a reasonably automated process to get PyTorch to our users, both for nightlies and for official releases. So what's next? There's been a lot of activity and research around large language models for software engineering applications, right? For the rest of the conference, you've gathered that PyTorch is driving a lot of innovation in LLM research. So we're thinking of how we can close the loop to use LLMs to improve PyTorch, or particularly the PyTorch development experience. And there's plenty of interesting data that we have to rely on, including commits, the actual code, logs, metadata, et cetera. There's a large number of models, such as Code Llama, that are performant, that are fine-tuned on code. And are performant on things like code completion and multiline infilling. They're also reasonably small, so they can be inferenced on a single GPU with strict latency requirements.
And our idea is to use this data and these models fine-tuned on specific tasks to provide an improved developer experience for PyTorch developers. So let me provide some numbers to motivate one specific problem that we have. On every push to a PyTorch PR, we run around 2.3 million tests. Now, that's a really large number. There's a lot of tests replicated across this entire support matrix that we have. And because of this, it takes around four hours for any developer to get full signal on any change to their PR. And this leads to a relatively degraded experience for iterating on your code. I see a lot of nods here, so I guess everybody's kind of faced this issue. So that's the developer experience perspective. If you extrapolate that number to the entire year, we run around a trillion tests on PyTorch CI in the entire year, right? So that comes out of somebody's budget. And intuitively, not all the changes affect all the tests that are being run. So it's clear that we're overrunning tests.
So how can we take some information from the code being changed and use that to determine which tests are relevant? And so this practice is known as target determination, right? Can we figure out which tests are relevant based on what code is changed? And the PyTorch architecture is great for developer usage, great for building new features, but it's kind of bad for traditional target determination. There's a lot of generated code, complex interdependencies between modules, code that cuts across Python and PyBinding to C++, CUDA, et cetera. So traditional target determination has been pretty difficult. And we've tried. We've tried hard-coded rules about whether changes to one particular module should not run tests in another module or having explicit dependency graph, kind of like Buck or Bazel build systems have, or even using past failure rates. But they've all essentially had some issues and we've kind of had to revert these. So we had this idea. Can we treat this as an information retrieval problem, right? We look at the traditional two-tower model approach that's commonly used for search applications.
And instead of trying to use a user-supplied query to search for relevant documents, we take the code changed in a PR and use it to search for relevant unit tests. So we employed the Codelama 7b Python model and we parsed the AST of the PyTorch codebase to identify all the unit test functions and all functions that they call transitively and use Codelama to generate embeddings for each unit test function. So with this, we have an index of embeddings for all the unit tests. And then when a new PR comes in, we run the same AST parsing. We parse all the functions that are changed in the PR and generate embeddings for those using the same model. And then we can compare them with the index of unit test embeddings using something like cosine similarity. So the idea is that code embeddings will be similar to test embeddings for relevant tests and they will score higher. So at the end of this retrieval process, we essentially have a ranked list of tests from highest scoring for most relevant tests to lowest scoring for least relevant test. And eventually,
we can get to a system where we start to filter out the least relevant tests and over time, perhaps only run the most relevant tests. And this brings down the CI load as well as time to signal for developers. And the early results of this are pretty promising. It's pretty good at detecting tests. The test it flags as least relevant across a number of sample PRs are, in fact, unrelated to the actual change. And indexing and retrieving are both done in very reasonable time bounds. So we tested a change on the above function. This is a PR that changed torch.distributed fully sharded data parallel, or FSDP. In fact, it changed the init function of that module. And we found that all the tests flagged as most relevant are FSDP tests. And all the tests that were flagged as least relevant were from ONNX, JIT, Functorch, NamedTensor. And in fact, were unrelated to the change. So we're continuing to iterate on the system to prune out irrelevant functions, adding context about call stacks and dependencies to the sequences that we're encoding to generate embeddings and potentially doing dimensionality reduction for the embeddings.
And we're planning on rolling this out to run on PRs from a small set of users that opt in just to gather data about where this model performs well and where it doesn't. And so tying this into our overall vision for LLMs for developer tooling. The fast indexing and retrieving times came after extensive optimization of this end-to-end system. We're working on upstreaming some of our improvements, such as disabling the KV cache in the Code Llama inference code when using Code Llama as an embedding model, or efficiently retrieving intermediate activations from arbitrary layers in the model. And so the goal here is not only to improve the PyTorch developer experience, but also find opportunities to come back and improve PyTorch in the process or other inference code bases upstream. We also see that LLMs are a part of the overall system, but not necessarily the entire answer, right? They could be combined with heuristics like past failures. There's a body of work around using things like test correlations about where pairs of tests frequently pass or fail together.
That could be combined with the scores that we get from our LLM-powered approach. And we have a number of additional use cases in the pipeline that could deliver quality-of-life improvements. For example, identifying the exact error line from a large log, especially in PyTorch when we're running that many number of unit tests, you can end up with kilobytes, megabytes of logs. So concrete quality-of-life improvement there. Or finding out whether a job is flaky or not, right? Often there's a completely unrelated failure on your PR or blocking release, and that kind of prevents you from moving fast. So we're looking at more kind of LLM-powered approaches for each of these problems, as well as potentially generating unit tests further down the road. But we've tried to solve these problems in the past using other approaches, such as naively regex for error logs to retries, base revision retries, and similarity search on the chunk logs for flakiness detection. And so we're bringing that context from using these past approaches as we iterate on these LLM-powered approaches.
And the hope is that not only do we find improvements for PyTorch in the process, but we deliver these products and be able to deliver essentially an improved developer experience for all of you. And yeah, with that, that's it.

Complete code:

# Use InternLM2 to summarize the file and write the formatted output to the target location
import re
import os
from pathlib import Path
from whispertranslator.llm import InternLM2
from lmdeploy import GenerationConfig
src_path = "原文.txt"
export_dir = './'

def split_text(text, max_word_count):

    def count_words(text):
        words = re.findall(r'\b\w+\b', text)
        return len(words)

    sentences = re.split(r'(?<=[,.])\s', text)  # split the text on commas and periods
    new_paragraphs = []
    current_paragraph = ''
    current_word_count = 0

    for sentence in sentences:
        sentence_word_count = count_words(sentence)
        if current_word_count + sentence_word_count <= max_word_count:
            current_paragraph += sentence + ' '
            current_word_count += sentence_word_count
        else:
            if current_word_count > 0:
                new_paragraphs.append(current_paragraph.strip())
            current_paragraph = sentence + ' '
            current_word_count = sentence_word_count

    if current_paragraph != '':
        new_paragraphs.append(current_paragraph.strip())

    return new_paragraphs

with open(src_path,'r') as file:
    full_text = file.read()

new_paragraphs = split_text(full_text, max_word_count=150)

translate_filename = os.path.basename(src_path) + '_translate_new' + '.txt'
translate_filename = Path(export_dir) / translate_filename

internLM2 = InternLM2(session_len=2048)
gen_config = GenerationConfig(top_k=20, top_p=0.3, temperature=0.1)
translator_system_prompt = """
    把下列文字翻译成中文,只返回给我结果:
    """
summary_system_prompt = f"""
    你现在是一个总结专家,请你帮我把下列文字用中文语言总结成一段话,并且分章节给出不同部分的总结:
    """
with open(translate_filename, 'w', encoding='utf-8') as file:
    for chunk in new_paragraphs:
        chunk_translate = internLM2.infer(translator_system_prompt, f"{chunk}", gen_config)
        print(chunk, '\n', chunk_translate[0].text)
        file.write(chunk_translate[0].text + '\n')
        
# with open(chunk_filename, 'r', encoding='utf-8') as file:
#     content = file.read()
#     content = internLM2.infer(summary_system_prompt,
#                                 content)[0].text + '\n' + content
# with open(chunk_filename, "w") as file:
#     file.write(content)

If the extra whitespace is avoided, the behavior seems to improve.
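To illustrate that observation (my sketch, not from the thread): the triple-quoted prompt above starts with a newline and indentation, which is sent to the model verbatim; stripping it keeps that whitespace out of the request:

# The triple-quoted literal carries a leading newline plus indentation;
# strip it so only the instruction text reaches the model.
translator_system_prompt = """
    把下列文字翻译成中文,只返回给我结果:
    """.strip()

prompt = translator_system_prompt + "\n" + chunk.strip()  # `chunk` as in the loop above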

For the issue of repeating the translator_system_prompt, how about trying it this way: put the system prompt into the system role, and also strengthen the instruction:

prompts = [[
    {
        'role': 'system',
        'content': '把下列文字翻译成中文,只返回给我翻译结果,不要输出任何额外内容'
    },
    {
        'role': 'user',
        'content': '待翻译的文本'
    },
]]
response = self.model(prompts, gen_config)
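Folded into the InternLM2 class from the report, that suggestion might look roughly like the following replacement for the infer method (a sketch assuming the pipeline accepts message lists exactly as called above):

def infer(self, system_prompt: str, src_text: str, gen_config: GenerationConfig):
    # Chat-style request: the instruction goes into the system role,
    # the text to translate into the user role.
    prompts = [[
        {'role': 'system', 'content': system_prompt.strip()},
        {'role': 'user', 'content': src_text.strip()},
    ]]
    response = self.model(prompts, gen_config)
    return response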

Still no improvement, unfortunately; the same kind of behavior persists.

(screenshot: the repetition still occurs)

(screenshot of a different problematic output)
Sometimes this kind of problem also shows up.

My feeling is that there may have been some overfitting during RLHF, which made the model overly "helpful". This typically shows up as too much extra content being added before and after the answer, so the model cannot strictly follow the instruction.
The translated name turning into 书生浦语 is probably also caused by overfitting: too much identity-awareness data was added during training, so the probability of "书生浦语" appearing after the tokens "我的名字是" ("my name is") has become too high.
If the chat model really cannot be corrected, you could consider switching to the chat-sft model, which has not been through RL, and see whether that helps. Though I am not sure it will actually get better.
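As a rough, hypothetical illustration of that suggestion: assuming internlm/internlm2-chat-7b-sft is the intended checkpoint, the setup from the report can simply be pointed at it, dropping model_format='awq' since those weights are not quantized. Note the unquantized 7B model needs far more VRAM than the 4-bit one, so it may not fit on the RTX 3060 listed in the environment above.

from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig

# Hypothetical: load the non-RL SFT chat checkpoint instead of the 4-bit RLHF one.
# model_format='awq' is omitted because these weights are not AWQ-quantized.
engine_config = TurbomindEngineConfig(max_batch_size=1, session_len=2048)
pipe = pipeline("internlm/internlm2-chat-7b-sft", backend_config=engine_config)

gen_config = GenerationConfig(top_k=20, top_p=0.3, temperature=0.1)
response = pipe([[
    {'role': 'system', 'content': '把下列文字翻译成中文,只返回给我翻译结果,不要输出任何额外内容'},
    {'role': 'user', 'content': 'text to translate'},
]], gen_config)
print(response[0].text)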

I guess we will have to wait for the next version?

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 7 days if the stale label is not removed or if there is no further response.

This issue is closed because it has been stale for 7 days. Please open a new issue if you have similar issues or you have any new updates now.