Strict search
mazzespazze opened this issue
Perhaps there is a way, but I could not find how to perform a strict search.
from pyalex import Works
search = "\"java JVM compiler\""
res, meta = Works().search(search).get(return_meta=True)
It returns results that contain the word java, some that contain jvm, and some with compiler.
How do you "force" an AND operator on all the words?
Hi, see https://github.com/J535D165/pyalex#logical-expressions.
In [1]: from pyalex import Works
In [2]: Works().search(["java", "JVM", "compiler"]).count()
Out[2]: 6904
In [3]: Works().search(["java JVM compiler"]).count()
Out[3]: 6904
I think the logical AND is used by default. However, maybe the words are found in different fields (search searches multiple fields, afaik). search_filter is more specific.
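To make the difference between search and search_filter concrete, here is a minimal sketch of the two kinds of OpenAlex requests. It assumes the public endpoint https://api.openalex.org/works, the search parameter (which matches several fields at once) and per-field filters of the form display_name.search:... (one field each). works_url is a hypothetical helper, not part of pyalex, and pyalex's exact URL encoding may differ.

```python
from urllib.parse import urlencode

# Assumption: the public OpenAlex works endpoint.
BASE = "https://api.openalex.org/works"


def works_url(search=None, field_searches=None):
    """Build an OpenAlex works URL (hypothetical helper, not part of pyalex).

    search: free text for the `search` parameter, which matches several fields.
    field_searches: dict such as {"display_name": "java JVM"}; each entry becomes
        a `<field>.search:<text>` filter that only looks at that one field.
    """
    params = {}
    if search is not None:
        params["search"] = search
    if field_searches:
        # Several filters are joined with commas into a single `filter` value.
        params["filter"] = ",".join(
            f"{field}.search:{text}" for field, text in field_searches.items()
        )
    # urlencode percent-encodes ':' and ',' and turns spaces into '+'.
    return f"{BASE}?{urlencode(params)}"


# All-field search vs. a title-only search for the same words:
print(works_url(search="java JVM"))
print(works_url(field_searches={"display_name": "java JVM"}))
```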
Sorry for the delay, I lost the notification.
What I meant was mostly the phrase. A search for "Java JVM" ends up searching for Java and JVM separately across the literature: perhaps one reference mentions Java in the title and JVM in the abstract. But what I am searching for is the exact phrase "Java JVM".
Basically, performing a search like this:
https://scholar.google.com/scholar?hl=sv&as_sdt=0%2C5&q=%22java+JVM%22&btnG=
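The q parameter of the Google Scholar link above is simply the phrase wrapped in double quotes and URL-encoded: %22 is an encoded double quote and + is an encoded space. A quick check with Python's standard urllib.parse (nothing here is specific to Scholar or OpenAlex) also shows that, when a client URL-encodes the query, a literal + typed by the user (as in "java+JVM") becomes %2B and is not a phrase marker.

```python
from urllib.parse import quote_plus, urlencode

phrase = '"java JVM"'

# Double quotes are encoded as %22 and the space as '+'.
print(quote_plus(phrase))          # %22java+JVM%22

# This reproduces the q=... part of the Scholar URL.
print(urlencode({"q": phrase}))    # q=%22java+JVM%22

# A literal '+' typed into the query is a different character: it becomes %2B.
print(quote_plus("java+JVM"))      # java%2BJVM
```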
@J535D165 any update on this?
I can't find the answer in the OpenAlex docs, but I gave it a try myself:
In [1]: from pyalex import Works
In [2]: Works().search(["java JVM"]).count()
Out[2]: 13958
In [3]: Works().search(["java", "JVM"]).count()
Out[3]: 13958
In [4]: Works().search("java+JVM").count()
Out[4]: 13958
In [5]: Works().search("java AND JVM").count()
Out[5]: 13958
In [6]: Works().search("\"java JVM\"").count()
Out[6]: 393
In [9]: Works().search_filter(display_name="java JVM").count()
Out[9]: 63
In [10]: Works().search_filter(abstract="java JVM").count()
Out[10]: 1769
In [11]: Works().search_filter(display_name="\"java JVM\"").count()
Out[11]: 7
In [12]: Works().search_filter(abstract="\"java JVM\"").count()
Out[12]: 22
In [13]: Works().search_filter(fulltext="\"java JVM\"").count()
Out[13]: 374
We can add this abstraction to pyalex:
Works().search_filter(abstract="java JVM", strict=True).count()
and
Works().search("java JVM", strict=True).count()
Let me know what you think!
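A minimal sketch of what the proposed strict option might do internally, assuming it only wraps the value in double quotes, exactly like the manual quoting in In [6] and In [11] to In [13] above. quote_phrase is a hypothetical name, not an existing pyalex function; the pyalex calls in the trailing comments are the ones used in this thread and need network access, so they are not run here.

```python
def quote_phrase(text: str, strict: bool = True) -> str:
    """Return text as an exact-phrase query when strict is True.

    Hypothetical helper illustrating the proposed strict=True option:
    it just wraps the text in double quotes. Not part of pyalex.
    """
    text = text.strip()
    if not strict:
        return text
    # Drop quotes the caller may already have added so they are not doubled.
    text = text.strip('"')
    return f'"{text}"'


print(quote_phrase("java JVM"))                # "java JVM"
print(quote_phrase('"java JVM"'))              # "java JVM" (not double-wrapped)
print(quote_phrase("java JVM", strict=False))  # java JVM

# Usage with pyalex (needs network access, so not run here):
# from pyalex import Works
# Works().search(quote_phrase("java JVM")).count()
# Works().search_filter(abstract=quote_phrase("java JVM")).count()
```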
Thank you @J535D165! A strict boolean would simplify much of it, but the examples you gave are great!
You can close this issue for the moment.