J535D165 / pyalex

A Python library for OpenAlex (openalex.org)



Strict search

mazzespazze opened this issue.

Perhaps there is a way, but I could not find how to perform a strict (exact-phrase) search.

from pyalex import Works

search = "\"java JVM compiler\""
res, meta = Works().search(search).get(return_meta=True)

It returns some results that contain the word java, some that contain JVM, and some that contain compiler.

How do you "force" an AND operator on all the words?

Hi, see https://github.com/J535D165/pyalex#logical-expressions.

In [1]: from pyalex import Works

In [2]: Works().search(["java", "JVM", "compiler"]).count()
Out[2]: 6904

In [3]: Works().search(["java JVM compiler"]).count()
Out[3]: 6904

I think the logical AND is used by default. However, maybe the words are found in different fields (search searches multiple fields, afaik)? search_filter is more specific.
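To see why search and search_filter behave differently, it helps to look at the OpenAlex request each one roughly corresponds to. The sketch below assumes that search() maps onto OpenAlex's broad `search` query parameter and that search_filter(field=...) maps onto a single-field `<field>.search` filter; the helper `works_url` is hypothetical and is not how pyalex builds its URLs internally.

```python
from urllib.parse import urlencode

# Base endpoint of the OpenAlex works API.
OPENALEX_WORKS = "https://api.openalex.org/works"


def works_url(search=None, **field_searches):
    """Hypothetical helper: build an OpenAlex /works URL.

    `search` becomes the broad `search` parameter (matched across several
    fields); each keyword in `field_searches` becomes a `<field>.search`
    filter restricted to that one field.
    """
    params = {}
    if search is not None:
        params["search"] = search
    if field_searches:
        # In OpenAlex filter syntax, comma-separated filters are ANDed.
        params["filter"] = ",".join(
            f"{field}.search:{value}" for field, value in field_searches.items()
        )
    query = urlencode(params)  # spaces become '+', quotes become %22
    return f"{OPENALEX_WORKS}?{query}" if query else OPENALEX_WORKS


print(works_url(search="java JVM compiler"))
print(works_url(abstract='"java JVM"'))
```

The first URL lets the terms match anywhere in the searched fields; the second restricts matching to the abstract, which is why the search_filter counts later in this thread are smaller.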

Sorry for the delay, lost the notification.

What I meant was mostly the exact phrase. A query like "Java JVM" ends up searching for Java and JVM separately across the literature, so one reference might mention Java in the title and JVM in the abstract. What I want is to match the phrase "Java JVM" strictly.

Basically performing a search like:
https://scholar.google.com/scholar?hl=sv&as_sdt=0%2C5&q=%22java+JVM%22&btnG=
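The q= value in that URL is just the phrase wrapped in double quotes and URL-encoded (%22 is a double quote, + is a space). A quick check with Python's standard library, nothing pyalex-specific:

```python
from urllib.parse import quote_plus, unquote_plus

# The q= value taken from the Google Scholar URL above.
encoded = "%22java+JVM%22"

# Decoding shows the query is the phrase wrapped in double quotes.
phrase = unquote_plus(encoded)
print(phrase)

# Encoding the quoted phrase reproduces the same URL fragment.
assert quote_plus(phrase) == encoded
```

So the "strict" behaviour comes from the quotes themselves, not from any special operator.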

@J535D165 any update on this?

I can't find the answer in the OpenAlex docs, but I gave it a try myself:

In [1]: from pyalex import Works

In [2]: Works().search(["java JVM"]).count()
Out[2]: 13958

In [3]: Works().search(["java", "JVM"]).count()
Out[3]: 13958

In [4]: Works().search("java+JVM").count()
Out[4]: 13958

In [5]: Works().search("java AND JVM").count()
Out[5]: 13958

In [6]: Works().search("\"java JVM\"").count()
Out[6]: 393

In [9]: Works().search_filter(display_name="java JVM").count()
Out[9]: 63

In [10]: Works().search_filter(abstract="java JVM").count()
Out[10]: 1769

In [11]: Works().search_filter(display_name="\"java JVM\"").count()
Out[11]: 7

In [12]: Works().search_filter(abstract="\"java JVM\"").count()
Out[12]: 22

In [13]: Works().search_filter(fulltext="\"java JVM\"").count()
Out[13]: 374

We can add this abstraction to pyalex:

Works().search_filter(abstract="java JVM", strict=True).count()

and

Works().search("java JVM", strict=True).count()

Let me know what you think!
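A minimal sketch of what the proposed strict=True option could do, written as a standalone helper because pyalex (as of this thread) has no such flag: it wraps the query in double quotes, which is what produced the narrower counts in the session above. The names `as_phrase` and `strict_search` are hypothetical; `works` is assumed to be any object with pyalex-style search() and search_filter(**kwargs) methods, such as Works().

```python
def as_phrase(text):
    """Wrap text in double quotes so it is searched as an exact phrase."""
    text = text.strip()
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text  # already a quoted phrase, leave it unchanged
    if '"' in text:
        # Embedded quotes would make the query ambiguous; refuse rather than guess.
        raise ValueError("query contains stray double quotes")
    return f'"{text}"'


def strict_search(works, query, strict=False, field=None):
    """Hypothetical stand-in for the proposed `strict=True` option.

    Without `field`, calls works.search(query); with `field` (e.g.
    "abstract"), calls works.search_filter(**{field: query}).
    """
    if strict:
        query = as_phrase(query)
    if field is None:
        return works.search(query)
    return works.search_filter(**{field: query})
```

With pyalex installed, strict_search(Works(), "java JVM", strict=True).count() passes exactly the same string as Works().search("\"java JVM\"").count(), and field="abstract" mirrors Works().search_filter(abstract="\"java JVM\"").count(). Keeping the quoting in one small function makes it easy to reuse for both search and search_filter.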

Thank you, @J535D165! The strict boolean would simplify a lot of this, but the examples you gave are great!

You can close this issue for the moment.