What are the limitations/requirements for executing C# code in a notebook?
hjy1210 opened this issue
The package and version I'm asking about:
Polyglot Notebooks v1.0.5208010
Question
What are the limitations/requirements for executing C# code in a notebook?
I can run a simple .NET 8.0 C# console app correctly in VS 2022.
But when I copy the code into a notebook, an error occurs on execution. What is missing when the code is ported to a notebook?
The code in the notebook is as follows:
#r "nuget:itext7"
#r "nuget:itext7.font-asian"
using iText.Kernel.Pdf.Canvas.Parser.Listener;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf;
using System.Text;
string ExtractText(string filePath)
{
    var pdfReader = new PdfReader(filePath);
    var pdfDoc = new PdfDocument(pdfReader);
    StringBuilder sb = new StringBuilder();
    for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++)
    {
        var page = pdfDoc.GetPage(i);
        LocationTextExtractionStrategy strategy = new LocationTextExtractionStrategy();
        sb.AppendLine(PdfTextExtractor.GetTextFromPage(page, strategy));
    }
    pdfDoc.Close();
    var data = sb.ToString();
    return data;
}
Console.WriteLine(ExtractText(@"c:\lucenedata\documentsroot\2007-1.pdf"));
Console.WriteLine("Press any key to close app");
Console.ReadKey();
The error message is:
Error: iText.IO.Exceptions.IOException: The CMap iText.IO.Font.Cmap.UniCNS-UTF16-H was not found.
at iText.IO.Font.Cmap.CMapLocationResource.GetLocation(String location)
at iText.IO.Font.Cmap.CMapParser.ParseCid(String cmapName, AbstractCMap cmap, ICMapLocation location, Int32 level)
at iText.IO.Font.Cmap.CMapParser.ParseCid(String cmapName, AbstractCMap cmap, ICMapLocation location)
at iText.IO.Font.CjkResourceLoader.ParseCmap[T](String name, T cmap)
at iText.IO.Font.CjkResourceLoader.GetUni2CidCmap(String uniMap)
at iText.Kernel.Font.FontUtil.GetToUnicodeFromUniMap(String uniMap)
at iText.Kernel.Font.PdfType0Font..ctor(PdfDictionary fontDictionary)
at iText.Kernel.Font.PdfFontFactory.CreateFont(PdfDictionary fontDictionary)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.GetFont(PdfDictionary fontDict)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.SetTextFontOperator.Invoke(PdfCanvasProcessor processor, PdfLiteral operator, IList`1 operands)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.InvokeOperator(PdfLiteral operator, IList`1 operands)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.ProcessContent(Byte[] contentBytes, PdfResources resources)
at iText.Kernel.Pdf.Canvas.Parser.PdfCanvasProcessor.ProcessPageContent(PdfPage page)
at iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(PdfPage page, ITextExtractionStrategy strategy, IDictionary`2 additionalContentOperators)
at iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(PdfPage page, ITextExtractionStrategy strategy)
at Submission#3.ExtractText(String filePath)
at Submission#4.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)
The following is the PDF file referenced in the code:
2007-1.pdf
This might be an issue with this specific package. Do you happen to know what location it's looking for? For example, if it's looking in a build output location, it won't find it, since there's no build output for the C# Script.
Unrelated to the exception, console input calls such as Console.ReadLine and Console.ReadKey won't work in the notebook. Input gestures are documented here: https://github.com/dotnet/interactive/blob/main/docs/input-prompts.md
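As a rough illustration of the input gestures mentioned above, a notebook cell could ask for the file path through a prompt instead of reading from the console. This is a minimal sketch that assumes the `Kernel.GetInputAsync` helper described in the linked document is available in the C# kernel; the prompt text and the `path` variable are made up for illustration.

```csharp
using System;
using Microsoft.DotNet.Interactive;

// Ask for the PDF path through the notebook's input prompt rather than the console.
// Assumption: Kernel.GetInputAsync behaves as described in the linked input-prompts doc.
var path = await Kernel.GetInputAsync("Path to the PDF file:");

// The value can then be passed to a function such as ExtractText from the question.
Console.WriteLine($"Will read: {path}");
```

With this approach no trailing Console.ReadKey() call is needed, since the cell simply finishes when its code completes.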
@jonsequitur
Regarding "Do you happen to know what location it's looking for?":
What does that mean?
I was referring to this from your exception details:
Error: iText.IO.Exceptions.IOException: The CMap iText.IO.Font.Cmap.UniCNS-UTF16-H was not found.
at iText.IO.Font.Cmap.CMapLocationResource.GetLocation(String location)
My guess is that this is a file in the package that the build would normally copy to the build output (in a normal C# project build). The code is probably looking for this file in that location. But C# Script doesn't do a build and so the file isn't in the expected location (but it is in the NuGet package cache).
This would be something that this package would need to account for in order to work correctly in C# Script / .NET Interactive.
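One way to look into the guess above from inside the notebook is to list which iText assemblies the kernel has loaded and where each one was loaded from. The following is a minimal sketch using only standard .NET reflection. The "itext" name filter is an assumption based on the package names, and the code only diagnoses; it does not fix the exception.

```csharp
using System;
using System.Linq;

// Collect loaded, non-dynamic assemblies whose simple name starts with "itext".
// The "itext" prefix is an assumption based on the package names in the question.
var loaded = AppDomain.CurrentDomain.GetAssemblies()
    .Where(a => !a.IsDynamic)
    .Where(a => (a.GetName().Name ?? "").StartsWith("itext", StringComparison.OrdinalIgnoreCase))
    .OrderBy(a => a.GetName().Name, StringComparer.OrdinalIgnoreCase)
    .ToList();

// Print each assembly name together with the file it was loaded from.
foreach (var assembly in loaded)
{
    Console.WriteLine($"{assembly.GetName().Name} -> {assembly.Location}");
}

if (loaded.Count == 0)
{
    Console.WriteLine("No iText assemblies are loaded yet.");
}
```

Run this in a cell after the `#r "nuget:..."` lines and after the failing call. If the printed paths point into the NuGet package cache rather than a build output folder, that would be consistent with the explanation above, though it would not by itself prove it.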
@jonsequitur
The Visual Studio C# project build output directory contains the following files. When I run the executable RxNetPuzzle.exe, the program executes as expected.
I still do not know how to fix the problem. Thanks for your time.
itext.barcodes.dll
itext.bouncy-castle-connector.dll
itext.commons.dll
itext.font_asian.dll
itext.forms.dll
itext.io.dll
itext.kernel.dll
itext.layout.dll
itext.pdfa.dll
itext.pdfua.dll
itext.sign.dll
itext.styledxmlparser.dll
itext.svg.dll
Microsoft.DotNet.PlatformAbstractions.dll
Microsoft.Extensions.DependencyInjection.Abstractions.dll
Microsoft.Extensions.DependencyInjection.dll
Microsoft.Extensions.DependencyModel.dll
Microsoft.Extensions.Logging.Abstractions.dll
Microsoft.Extensions.Logging.dll
Microsoft.Extensions.Options.dll
Microsoft.Extensions.Primitives.dll
Newtonsoft.Json.dll
RxNetPuzzle.deps.json
RxNetPuzzle.dll
RxNetPuzzle.exe
RxNetPuzzle.pdb
RxNetPuzzle.runtimeconfig.json