AngleSharp / AngleSharp.Css

:angel: Library to enable support for cascading stylesheets in AngleSharp.

Home Page: https://anglesharp.github.io

GetInnerText() not available in PowerShell 7?

GruberMarkus opened this issue

Prerequisites

  • Can you reproduce the problem in a MWE?
  • Are you running the latest version of AngleSharp.Css?
  • Did you check the FAQs to see if that helps you?
  • Are you reporting to the correct repository? (there are multiple AngleSharp libraries, e.g., AngleSharp.Xml for Xml support)
  • Did you perform a search in the issues?

Description

Dear AngleSharp.Css team,

I am looking for a way to convert HTML to plain text, as it would be rendered in a browser.

On Windows, the HTMLFile COM object's .body.innerText delivers the expected result, but HTMLFile can't be used on Linux and macOS.

With AngleSharp.Css, .body.GetInnerText() delivers the expected result in C#, as you can see in this .NET Fiddle: https://dotnetfiddle.net/gx14nq

I now need to convert the C# code from above to PowerShell 7. Everything seems to work fine except that GetInnerText() is not available. The error is: Method invocation failed because [AngleSharp.Html.Dom.HtmlBodyElement] does not contain a method named 'GetInnerText'.

I would be very thankful if you could give me a hint about what I am doing wrong. Thanks in advance!

Here is the PowerShell code, it fails on the last line:

#Requires -Version 7

Set-Location $PSScriptRoot

Import-Module '.\AngleSharp.dll' # NuGet, v1.2.0-beta.410, netstandard2.0
Import-Module '.\AngleSharp.Css.dll' # NuGet, v1.0.0-beta.139, netstandard2.0


$htmlContent = @'
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>

<h1>This is a Heading</h1>

<p>This is a paragraph.</p>

</body>
</html>
'@


$config = [AngleSharp.CssConfigurationExtensions]::WithCss([AngleSharp.Configuration]::Default)
$ctx = [AngleSharp.BrowsingContext]::New($config)
$htmlParser = $ctx.GetService[AngleSharp.Html.Parser.IHtmlParser]()
$doc = $htmlParser.ParseDocument($htmlContent)
$doc.Body.GetInnerText()

Steps to Reproduce

Please see above.

Expected Behavior

Please see above.

Actual Behavior

Please see above.

Possible Solution / Known Workarounds

No response

I am not sure why you label this a bug.

GetInnerText is an extension method. I have no knowledge of PowerShell, so you'll either need to find someone who knows PowerShell better or learn how to use .NET from PowerShell scripts.

Most likely extension methods are not supported and you will need to call GetInnerText from the underlying static class.
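
A minimal sketch of that suggestion: PowerShell does not resolve C# extension methods on an instance, so the method has to be invoked as an ordinary static method with the element passed as its first argument. To avoid hard-coding a class name, the sketch looks the method up by reflection in the AngleSharp.Css assembly. The variable names $cssAssembly, $flags and $getInnerText are illustrative, and the class name mentioned in the last comment (AngleSharp.Dom.ElementCssExtensions) is an assumption that should be checked against the installed version.

```powershell
#Requires -Version 7

# Assumes both DLLs sit in the current directory, as in the script from the question.
Import-Module '.\AngleSharp.dll'
Import-Module '.\AngleSharp.Css.dll'

# Same setup as in the question, with a shorter HTML string
$config = [AngleSharp.CssConfigurationExtensions]::WithCss([AngleSharp.Configuration]::Default)
$ctx = [AngleSharp.BrowsingContext]::New($config)
$htmlParser = $ctx.GetService[AngleSharp.Html.Parser.IHtmlParser]()
$doc = $htmlParser.ParseDocument('<h1>This is a Heading</h1><p>This is a paragraph.</p>')

# Search the public static classes of AngleSharp.Css for an
# extension method named GetInnerText
$cssAssembly = [AngleSharp.CssConfigurationExtensions].Assembly
$flags = [System.Reflection.BindingFlags]'Public, Static'
$getInnerText = $cssAssembly.GetExportedTypes() |
    Where-Object { $_.IsAbstract -and $_.IsSealed } |
    ForEach-Object { $_.GetMethods($flags) } |
    Where-Object {
        $_.Name -eq 'GetInnerText' -and
        $_.IsDefined([System.Runtime.CompilerServices.ExtensionAttribute], $false)
    } |
    Select-Object -First 1

if (-not $getInnerText) { throw 'GetInnerText was not found in AngleSharp.Css.' }

# Print the static class that declares the method
# (assumed, not verified here, to be AngleSharp.Dom.ElementCssExtensions)
$getInnerText.DeclaringType.FullName

# Static call: no target instance, the element is the first argument
$getInnerText.Invoke($null, @($doc.Body))
```

Once the declaring class is known, the reflection lookup can be replaced by a direct static call of the form [Namespace.ClassName]::GetInnerText($doc.Body), using the full name printed by the script.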

Thanks for answering this fast!

The bug label was added automatically, and I could not remove it.