filter seen doesn't work if fields changed
trim21 opened this issue · comments
Expected behaviour:
The seen plugin is expected to reject an entry that has already been fetched.
However, I found that after the config is updated, it may never remember any entry.
For example, I have an RSS feed with a dynamic torrent URL:
<enclosure url="http://127.0.0.1:8745/torrent?q={...}" length="1024" type="application/x-bittorrent"/>
The {...} part may change between fetches, but title and guid never change.
So I could have a config like this:
tasks:
  test2:
    limit:
      amount: 20
      from:
        rss: http://127.0.0.11:8745/rss?q=1
    accept_all: true
    seen:
      fields:
        - original_url
        # - title
        - guid
    transmission: ...
But if I change seen.fields, the seen plugin becomes a no-op.
For example, from this (config 1):
seen:
  fields:
    - title
to this (config 2):
seen:
  fields:
    - guid
If I then run the task multiple times, the entry always passes through to seen_info_hash; the seen plugin never rejects it.
Actual behaviour:
The seen plugin should reject the entry, but it never does; the entry is only rejected later by seen_info_hash.
Steps to reproduce:
- use config 1
- execute task
- use config 2
- execute task again
- execute task again (you should see the entry rejected by the seen plugin, not by seen_info_hash)
Additional information:
- FlexGet version: current develop dee678c
# Minimal FastAPI server that serves the test feed for this issue.
import random
from pathlib import Path
from typing import Annotated

import fastapi
from fastapi import Query
from loguru import logger
from starlette.responses import Response

app = fastapi.FastAPI(debug=True)


@app.get("/rss")
def generate_rss():
    # A fresh token per request makes the enclosure URL change on every fetch,
    # while title and guid stay the same.
    token = random.randbytes(12).hex()
    rss = f"""
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
<channel>
<title>TT tt tt</title>
<link>https://example.com</link>
<description>desc</description>
<dc:creator>tt</dc:creator>
<item>
<title>example title</title>
<link>https://example.com/777581</link>
<description>
<![CDATA[ hi ]]></description>
<enclosure
url="http://127.0.0.1:8745/torrent?q={token}"
length="1024"
type="application/x-bittorrent"/>
<pubDate>Wed, 24 Apr 2024 14:30:42 GMT</pubDate>
<comments>https://example.com</comments>
<guid isPermaLink="false">63a7820e3abee02347b07d8a0473db7ee49af2d1</guid>
<dc:creator>N/A</dc:creator>
<dc:date>2024-04-24T14:30:42Z</dc:date>
</item>
</channel>
</rss>
"""
    logger.info("generate rss with torrent token query q={}", token)
    return Response(content=rss.encode(), media_type="application/xml")


@app.get("/torrent")
def torrent_download(q: Annotated[str, Query()]):
    logger.info("torrent downloaded q={}", q)
    # Placeholder: remove the raise and point Path(...) at a real torrent file.
    raise ValueError("please provide a valid torrent here")
    return Response(content=Path(...).read_bytes())


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, port=8745)
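To double-check which fields stay stable between two fetches of a feed like this, something along these lines could help. It is only a sketch: `item_fields`, `stable_fields` and `sample_feed` are helper names made up for this comment, and `sample_feed` is a cut-down stand-in for the server response above, not FlexGet code.

```python
import xml.etree.ElementTree as ET


def item_fields(rss_text):
    """Return title, guid and enclosure URL of the first <item> in an RSS document."""
    item = ET.fromstring(rss_text).find("./channel/item")
    return {
        "title": item.findtext("title"),
        "guid": item.findtext("guid"),
        "enclosure_url": item.find("enclosure").get("url"),
    }


def stable_fields(first, second):
    """Names of the fields whose values are identical in both fetches."""
    return sorted(name for name in first if first[name] == second[name])


def sample_feed(token):
    """Hypothetical stand-in for one response of the /rss endpoint."""
    return f"""<rss version="2.0"><channel><item>
<title>example title</title>
<guid isPermaLink="false">63a7820e3abee02347b07d8a0473db7ee49af2d1</guid>
<enclosure url="http://127.0.0.1:8745/torrent?q={token}" length="1024" type="application/x-bittorrent"/>
</item></channel></rss>"""


first = item_fields(sample_feed("aaaa"))
second = item_fields(sample_feed("bbbb"))
print(stable_fields(first, second))  # ['guid', 'title']
```

With the real server running, the two `sample_feed(...)` calls would be replaced by the text of two separate responses from the /rss endpoint.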
This can also be fixed by running flexget seen forget --task test2 '*'
Entries end up rejected by seen_info_hash because it runs before seen
$ flexget plugin | grep "seen"
| seen | task | filter(255), | doc, builtin |
| seen_info_hash | task | filter(180), | doc, builtin |
| seen_movies | task | filter(-255), | doc |
You could either override the plugin priority or disable seen_info_hash altogether
EDIT: Turns out I misremembered it, priority goes High to Low, not the other way around.
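To make the ordering concrete, here is a tiny sketch that sorts the filter priorities from the table above from high to low. `execution_order` is a made-up helper for illustration, not a FlexGet function; the numbers are copied from the `flexget plugin` output.

```python
# Filter priorities copied from the `flexget plugin` output above.
filter_priorities = {"seen": 255, "seen_info_hash": 180, "seen_movies": -255}


def execution_order(priorities):
    """Order plugin names from highest to lowest priority."""
    return sorted(priorities, key=priorities.get, reverse=True)


default_order = execution_order(filter_priorities)
print(default_order)  # ['seen', 'seen_info_hash', 'seen_movies']

# Lowering seen below 180 moves it behind seen_info_hash.
overridden_order = execution_order({**filter_priorities, "seen": 170})
print(overridden_order)  # ['seen_info_hash', 'seen', 'seen_movies']
```

So with the default priorities, seen already runs before seen_info_hash.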
No, it doesn't work.
I used the same config 1 as in the issue description and the following as config 2. Torrents are still downloaded and still rejected by seen_info_hash:
seen:
  priority: 10
  fields:
    - original_url
    - title
    # - guid
No, it doesn't work indeed...
If I execute the task with a config that has no seen section, then edit the config to add seen with fields, the seen plugin doesn't work.
plugin_priority:
  seen: 170
seen:
  fields:
    - original_url
    - guid
I added logging to these two plugins:
2024-04-25 00:02:04 VERBOSE task_queue There are 1 tasks to execute. Shutdown will commence when they have completed.
2024-04-25 00:02:04 VERBOSE rss test2 Bozo error <class 'xml.sax._exceptions.SAXParseException'> while parsing feed, but entries were produced, ignoring the error.
2024-04-25 00:02:04 VERBOSE details test2 Produced 1 entries.
2024-04-25 00:02:04 INFO seen test2 handle task 'example title'
2024-04-25 00:02:04 INFO seen test2 handle task 'example title'
2024-04-25 00:02:04 INFO seen test2 handle task 'example title'
2024-04-25 00:02:04 VERBOSE task test2 ACCEPTED: `example title` by accept_all plugin
2024-04-25 00:02:04 INFO download test2 Downloading: example title
2024-04-25 00:02:04 VERBOSE details test2 Summary - Accepted: 1 (Rejected: 0 Undecided: 0 Failed: 0)
2024-04-25 00:02:04 INFO seen test2 handle task 'example title'
2024-04-25 00:02:04 INFO seen test2 handle task 'example title'
2024-04-25 00:02:04 INFO remember_rej test2 Remembering rejection of `example title`
2024-04-25 00:02:04 VERBOSE task test2 REJECTED: `example title` by seen_info_hash plugin because entry with torrent_info_hash `D3CB4E9FBC394993E6EF11F16287F8C2B39E75F5` is already marked seen in the task test2 at 2024-04-24 23:56
NOTE: this doesn't happen from a clean state. You must run the task without the seen config first, then edit the config to add seen fields, to reproduce this bug.
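One possible reading of this, not confirmed anywhere in this thread: if the values of the configured fields are only remembered for entries that end up accepted, then an entry that is always rejected by seen_info_hash never gets its new field values recorded. The toy model below is a sketch under that assumption only. `run_task` and its arguments are invented for illustration and are not FlexGet's implementation; it covers only the title to guid change from the issue description.

```python
def run_task(entry, seen_fields, seen_values, seen_hashes):
    """Simulate one run; return the rejecting filter's name, or None if accepted."""
    # Toy 'seen' filter: looks only at the currently configured fields.
    if any(entry[field] in seen_values for field in seen_fields):
        return "seen"
    # Toy 'seen_info_hash' filter: looks at the torrent info hash.
    if entry["info_hash"] in seen_hashes:
        return "seen_info_hash"
    # ASSUMPTION: values are learned only when the entry is accepted.
    seen_values.update(entry[field] for field in seen_fields)
    seen_hashes.add(entry["info_hash"])
    return None


entry = {
    "title": "example title",
    "guid": "63a7820e3abee02347b07d8a0473db7ee49af2d1",
    "info_hash": "D3CB4E9FBC394993E6EF11F16287F8C2B39E75F5",
}
seen_values, seen_hashes = set(), set()

results = [
    run_task(entry, ["title"], seen_values, seen_hashes),  # config 1
    run_task(entry, ["guid"], seen_values, seen_hashes),   # config 2
    run_task(entry, ["guid"], seen_values, seen_hashes),   # config 2 again
]
print(results)  # [None, 'seen_info_hash', 'seen_info_hash']
```

In this model the guid is never stored, because the entry is rejected before anything is learned, which would match seen never rejecting it.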
You want to disable seen_info_hash? You can disable built-in plugins with the disable plugin.
disable:
  - seen_info_hash
Or, you could explicitly configure it as off:
seen_info_hash: no
Or have I misinterpreted the issue?
seen plugin doesn't reject task as expected