Flexget / Flexget

The official FlexGet repository

Home Page:http://www.flexget.com

filter seen doesn't work if fields changed

trim21 opened this issue

Expected behaviour:

The seen plugin is expected to reject entries that have already been fetched.

However, I found that after the config is updated, it may never remember any entry.

For example, I have an RSS feed whose torrent URL is dynamic:

<enclosure url="http://127.0.0.1:8745/torrent?q={...}" length="1024" type="application/x-bittorrent"/>

The {...} part may change between fetches, but the title and guid never change.
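
To make the feed behaviour concrete, here is a minimal sketch using only the Python standard library. The `make_item` and `fields` helpers and the `example-guid` value are illustrative assumptions, not part of FlexGet or of the real feed. The sketch only shows that two fetches of the same item differ in the enclosure URL while the title and guid stay the same.

```python
import secrets
import xml.etree.ElementTree as ET


def make_item() -> str:
    """Hypothetical helper: build one RSS <item> with a per-fetch token in the URL."""
    token = secrets.token_hex(12)  # changes on every call, like {...} in the feed
    return (
        "<item>"
        "<title>example title</title>"
        '<guid isPermaLink="false">example-guid</guid>'  # made-up constant guid
        f'<enclosure url="http://127.0.0.1:8745/torrent?q={token}" '
        'length="1024" type="application/x-bittorrent"/>'
        "</item>"
    )


def fields(item_xml: str) -> dict:
    """Extract the values that a field-based duplicate check could compare."""
    item = ET.fromstring(item_xml)
    return {
        "title": item.findtext("title"),
        "guid": item.findtext("guid"),
        "url": item.find("enclosure").get("url"),
    }


first, second = fields(make_item()), fields(make_item())
print(first["title"] == second["title"])  # True: the title is stable
print(first["guid"] == second["guid"])    # True: the guid is stable
print(first["url"] == second["url"])      # False: the token differs per fetch
```

This is why the URL alone is a poor key for duplicate detection with such a feed, and why the config below also lists guid.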

So I could use a config like this:

tasks:
  test2:
    limit:
      amount: 20
      from:
        rss: http://127.0.0.1:8745/rss?q=1
    accept_all: true

    seen:
      fields:
        - original_url
#        - title
        - guid
    transmission: ...

But if I change seen.fields, the seen plugin becomes a no-op.

For example, if I change from this (config 1):

    seen:
      fields:
        - title

to this (config 2):

    seen:
      fields:
        - guid

and then run the task multiple times, the entry is always passed on to seen_info_hash; the seen plugin never rejects it.

Actual behaviour:

The seen plugin never rejects the entry; it is only rejected later by seen_info_hash.

Steps to reproduce:

  • use config 1
  • execute the task
  • switch to config 2
  • execute the task again
  • execute the task again (the entry should be rejected by the seen plugin, not by seen_info_hash)
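
The expectation behind these steps can be written down as a small toy model. This is only a sketch of what the reporter expects to happen. The `run_task` function and the `remembered` set are hypothetical names that do not mirror FlexGet's internals, and the model deliberately leaves out seen_info_hash.

```python
# Toy model only: hypothetical names, not FlexGet's actual implementation.
def run_task(entry: dict, seen_fields: list, remembered: set) -> str:
    """Reject if any configured field value was remembered, else accept and remember."""
    keys = {(field, entry[field]) for field in seen_fields if field in entry}
    if keys & remembered:
        return "rejected by seen"
    remembered.update(keys)  # assumption: values are learned only for accepted entries
    return "accepted"


remembered = set()
entry = {
    "title": "example title",
    "guid": "63a7820e3abee02347b07d8a0473db7ee49af2d1",
}

results = [
    run_task(entry, ["title"], remembered),  # config 1
    run_task(entry, ["guid"], remembered),   # config 2, first run
    run_task(entry, ["guid"], remembered),   # config 2, second run
]
print(results)  # ['accepted', 'accepted', 'rejected by seen']
```

In this model the guid is learned on the first run with config 2, so the next run is rejected by seen. The report is that the real plugin never gets to that rejection.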

Additional information:

  • FlexGet version: current develop dee678c

Test server used to reproduce the issue:

import random
from pathlib import Path
from typing import Annotated

import fastapi
from fastapi import Query
from loguru import logger
from starlette.responses import Response


app = fastapi.FastAPI(debug=True)


@app.get("/rss")
def generate_rss():
    token = random.randbytes(12).hex()
    rss = f"""
 <rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
    <channel>
        <title>TT tt tt</title>
        <link>https://example.com</link>
        <description>desc</description>
        <dc:creator>tt</dc:creator>
        <item>
            <title>example title</title>
            <link>https://example.com/777581</link>
            <description>
                <![CDATA[ hi ]]></description>
            <enclosure
                    url="http://127.0.0.1:8745/torrent?q={token}"
                    length="1024"
                    type="application/x-bittorrent"/>
            <pubDate>Wed, 24 Apr 2024 14:30:42 GMT</pubDate>
            <comments>https://example.com</comments>
            <guid isPermaLink="false">63a7820e3abee02347b07d8a0473db7ee49af2d1</guid>
            <dc:creator>N/A</dc:creator>
            <dc:date>2024-04-24T14:30:42Z</dc:date>
        </item>
    </channel>
</rss>
    """

    logger.info("generate rss with torrent token query q={}", token)

    return Response(content=rss.encode(), media_type="application/xml")


@app.get("/torrent")
def torrent_download(q: Annotated[str, Query()]):
    logger.info("torrent downloaded q={}", q)
    # Placeholder: remove this raise and point Path(...) at a real .torrent file.
    raise ValueError("please provide a valid torrent here")
    return Response(content=Path(...).read_bytes())


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, port=8745)

This can also be worked around by running flexget seen forget --task test2 '*'

Entries end up rejected by seen_info_hash because it runs before seen:

$ flexget plugin | grep "seen"
| seen               | task               | filter(255),       | doc, builtin |
| seen_info_hash     | task               | filter(180),       | doc, builtin |
| seen_movies        | task               | filter(-255),      | doc          |

You could either override the plugin priority or disable seen_info_hash altogether.

EDIT: Turns out I misremembered it, priority goes High to Low, not the other way around.
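
As a quick illustration of the corrected ordering, the sketch below sorts the three filter priorities from the flexget plugin output quoted above. It assumes, as the EDIT states, that a higher number runs earlier; the variable names are illustrative only.

```python
# Priorities copied from the `flexget plugin | grep "seen"` output quoted above.
filter_priorities = {"seen": 255, "seen_info_hash": 180, "seen_movies": -255}

# Assumption taken from the EDIT above: higher priority runs first.
run_order = sorted(filter_priorities, key=filter_priorities.get, reverse=True)
print(run_order)  # ['seen', 'seen_info_hash', 'seen_movies']
```

Under that ordering, seen (255) already runs before seen_info_hash (180) in the filter phase with the default priorities.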

No, that doesn't work.

I used the same config 1 as in the issue description and the following as config 2. Torrents are still downloaded and then rejected by seen_info_hash:

    seen:
      priority: 10
      fields:
        - original_url
        - title
#        - guid

No, it indeed doesn't work...

If I execute the task with a config that has no seen section, then edit the config to add seen with fields, the seen plugin doesn't work.

    plugin_priority:
      seen: 170
    seen:
      fields:
        - original_url
        - guid

I added logging to these two plugins:

2024-04-25 00:02:04 VERBOSE  task_queue                    There are 1 tasks to execute. Shutdown will commence when they have completed.
2024-04-25 00:02:04 VERBOSE  rss           test2           Bozo error <class 'xml.sax._exceptions.SAXParseException'> while parsing feed, but entries were produced, ignoring the error.
2024-04-25 00:02:04 VERBOSE  details       test2           Produced 1 entries.
2024-04-25 00:02:04 INFO     seen          test2           handle task 'example title'
2024-04-25 00:02:04 INFO     seen          test2           handle task 'example title'
2024-04-25 00:02:04 INFO     seen          test2           handle task 'example title'
2024-04-25 00:02:04 VERBOSE  task          test2           ACCEPTED: `example title` by accept_all plugin
2024-04-25 00:02:04 INFO     download      test2           Downloading: example title
2024-04-25 00:02:04 VERBOSE  details       test2           Summary - Accepted: 1 (Rejected: 0 Undecided: 0 Failed: 0)
2024-04-25 00:02:04 INFO     seen          test2           handle task 'example title'
2024-04-25 00:02:04 INFO     seen          test2           handle task 'example title'
2024-04-25 00:02:04 INFO     remember_rej  test2           Remembering rejection of `example title`
2024-04-25 00:02:04 VERBOSE  task          test2           REJECTED: `example title` by seen_info_hash plugin because entry with torrent_info_hash `D3CB4E9FBC394993E6EF11F16287F8C2B39E75F5` is already marked seen in the task test2 at 2024-04-24 23:56
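
To check which plugin made each decision without reading the log by eye, a small sketch like the following can help. The regular expression is an assumption based only on the two decision lines shown in this log, not on a documented FlexGet log format, and the second sample line is shortened after the plugin name.

```python
import re

# Two decision lines copied from the log above; the second is cut off after "because".
log_lines = [
    "2024-04-25 00:02:04 VERBOSE  task          test2           "
    "ACCEPTED: `example title` by accept_all plugin",
    "2024-04-25 00:02:04 VERBOSE  task          test2           "
    "REJECTED: `example title` by seen_info_hash plugin because ...",
]

# Captures the decision, the entry title in backticks, and the deciding plugin.
pattern = re.compile(r"(ACCEPTED|REJECTED): `(.+?)` by (\w+) plugin")

decisions = [m.groups() for line in log_lines if (m := pattern.search(line))]
for decision, title, plugin in decisions:
    print(decision, title, plugin)
```

For the lines above this shows accept_all accepting the entry and seen_info_hash rejecting it, with no decision attributed to seen.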

NOTE: this doesn't happen from a clean state. To reproduce this bug, you must first run the task without a seen config, then edit the config to add seen with fields.

You want to disable seen_info_hash? You can disable built-in plugins with the disable plugin.

disable:
  - seen_info_hash

Or, you could explicitly configure it as off:

seen_info_hash: no

Or have I misinterpreted the issue?

The seen plugin doesn't reject the entry as expected.