microlinkhq / metascraper

Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.

Home Page: https://metascraper.js.org

[metascraper-image] sometimes image url contains [object%20Object]

fasenderos opened this issue

Prerequisites

  • I'm using the latest version.
  • My Node.js version matches the one declared in package.json.

Subject of the issue

Sometimes imageRules returns an incorrect image URL containing [object%20Object]. The example below uses only two URLs, but I can provide many others if necessary.

After debugging your code, I found that the problem happens in the urlObject function when args contains an object instead of a string:

const urlObject = (...args) => {
  try {
    return new URL(...args)
  } catch (_) {
    return { toString: () => '' }
  }
}

For the two URLs provided, args contains:

## FIRST URL
[ 
  { '@id': 'https://www.milanocittastato.it/#/schema/logo/image/' }, 
  'https://www.milanocittastato.it/'
]

## SECOND URL
[
  { '@id': 'https://www.belfusto.com/#/schema/logo/image/' },
  'https://www.belfusto.com/tendencia/ergowear/'
]

I don't know where the { '@id': '....' } object comes from.
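To show why the try/catch in urlObject does not help here, this is a minimal standalone sketch of the coercion, using only Node's built-in URL class and the first args pair from above. The variable names are mine, not the library's. The URL constructor converts its first argument to a string instead of throwing, so the catch branch is never reached.

```javascript
// Standalone reproduction of the coercion; no metascraper code is involved.
const base = 'https://www.milanocittastato.it/'
const input = { '@id': 'https://www.milanocittastato.it/#/schema/logo/image/' }

// A plain object stringifies to "[object Object]".
const asString = String(input)

// new URL() applies the same string conversion to its first argument,
// then resolves the result as a relative path against the base.
// The space is percent-encoded, and no error is thrown.
const resolved = new URL(input, base).toString()

console.log(asString) // [object Object]
console.log(resolved) // https://www.milanocittastato.it/[object%20Object]
```

The resolved value matches the broken image URL reported for the first site.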

Steps to reproduce

Note: You can reproduce the code using interactive Node.js shell by Runkit.

const metascraper = require('metascraper')([
  require('metascraper-image')(),
])

const { fetch } = require('undici')

const siteUrls = [
  'https://www.milanocittastato.it/',
  'https://www.belfusto.com/tendencia/ergowear/'
];

const getMetadata = async (siteUrl) => {
  const { html, url } = await fetch(siteUrl).then(async res => ({
    url: res.url,
    html: await res.text()
  }))
  const metadata = await metascraper({ html, url })
  return metadata;
}

;(async () => {
  // siteUrls is a plain array, so a regular for...of loop is enough
  for (const siteUrl of siteUrls) {
    const metadata = await getMetadata(siteUrl);
    console.log(metadata);
  }
})()

Expected behaviour

It should find at least one image or return null.

Actual behaviour

The response is

{ image: "https://www.milanocittastato.it/[object%20Object]" }
{ image: "https://www.belfusto.com/tendencia/ergowear/[object%20Object]" }
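One possible way to get the expected behaviour is to reject non-string values before they reach the URL constructor. This is only a sketch of the idea: safeAbsoluteUrl is a hypothetical helper name, not part of metascraper, and I am not claiming this is how the library should or does handle it.

```javascript
// Hypothetical guard (not metascraper's API): resolve only non-empty strings.
const safeAbsoluteUrl = (value, base) => {
  // Objects such as { '@id': '...' } are rejected instead of being
  // silently coerced to "[object Object]".
  if (typeof value !== 'string' || value.trim() === '') return null
  try {
    return new URL(value, base).toString()
  } catch (_) {
    // Invalid URL, or a relative URL without a base
    return null
  }
}

const base = 'https://www.milanocittastato.it/'
const fromObject = safeAbsoluteUrl(
  { '@id': 'https://www.milanocittastato.it/#/schema/logo/image/' },
  base
)
const fromString = safeAbsoluteUrl('/logo.png', base)

console.log(fromObject) // null
console.log(fromString) // https://www.milanocittastato.it/logo.png
```

With a guard like this, a rule that receives an object would yield null and the next rule could be tried, instead of producing a URL ending in [object%20Object].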

@fasenderos Thanks for reporting.

It should be fixed in v5.29.13, which I just released. Can you test it?

@Kikobeats I can confirm that now it works. Thanks for the fast fix.