Sicos1977 / ChromiumHtmlToPdf

Convert HTML to PDF with a Chromium-based browser

Could you please support specifying file names during batch conversion?

2-3-5-7 opened this issue

I have some URIs to convert, and I would like the input list to support specifying the output file name for each entry. The URL and the file name would be separated by a tab character, as shown below. If a line has no tab character, the default file name would be used.

https://nextjs.org/learn/foundations/about-nextjs	1. about-nextjs
https://nextjs.org/learn/foundations/about-nextjs/what-is-nextjs	1.1 what is nextjs
https://nextjs.org/learn/foundations/from-javascript-to-react	1.2 from-javascript-to-react
https://nextjs.org/learn/foundations/from-javascript-to-react/updating-ui-with-javascript

I also found two possible bugs:

  • When the input is a web URL rather than a local file, Path.ChangeExtension should not be used, because it treats everything after the last dot in the name as the extension and replaces it. Instead, it should be outputFile = inputUri.IsFile ? Path.ChangeExtension(outputFile, ".pdf") : outputFile + ".pdf".
  • The output seems to be treated as a file rather than a directory, so I changed the conversion info to be written inside it: using var output = File.OpenWrite(Path.Combine(options.Output, "conversion_info.txt"));
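To illustrate the first point: Path.ChangeExtension replaces everything after the last dot, so a name that contains a dot but no real extension gets truncated, and different entries can collide. The sketch below uses the custom names from the list above; ToPdfName is a hypothetical helper that mirrors the suggested fix, not code taken from ChromiumHtmlToPdf.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of ChromiumHtmlToPdf) mirroring the suggested fix:
// local files get their extension swapped, anything else gets ".pdf" appended.
static string ToPdfName(string name, bool inputIsFile) =>
    inputIsFile ? Path.ChangeExtension(name, ".pdf") : name + ".pdf";

// Path.ChangeExtension cuts at the LAST dot, so both names collapse to "1.pdf".
Console.WriteLine(Path.ChangeExtension("1.1 what is nextjs", ".pdf")); // 1.pdf
Console.WriteLine(Path.ChangeExtension("1. about-nextjs", ".pdf"));    // 1.pdf

// With the suggested fix, non-file inputs keep their full name.
Console.WriteLine(ToPdfName("1.1 what is nextjs", inputIsFile: false)); // 1.1 what is nextjs.pdf
Console.WriteLine(ToPdfName("1. about-nextjs", inputIsFile: false));    // 1. about-nextjs.pdf

// Local files still get their real extension replaced.
Console.WriteLine(ToPdfName("index.html", inputIsFile: true));          // index.pdf
```

Appending instead of replacing keeps every custom name unique, at the cost of possibly producing names like "page.html.pdf" if a URL-derived name already ends in an extension.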

I'm a C# beginner and I've tried to make some changes based on my understanding. Please forgive me if there are any mistakes.

if (options.InputIsList)
{
	_itemsToConvert = new ConcurrentQueue<ConversionItem>();
	_itemsConverted = new ConcurrentQueue<ConversionItem>();

	WriteToLog($"Reading input file '{options.Input}'");
	var lines = File.ReadAllLines(options.Input);
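	foreach (var line in lines)
	{
		// Skip blank lines, e.g. a trailing newline at the end of the list
		if (string.IsNullOrWhiteSpace(line))
			continue;

		// Each line is "<uri>" or "<uri><TAB><output file name>"
		var arr = line.Split('\t', 2);

		var inputUri = new ConvertUri(arr[0]);
		var outputPath = Path.GetFullPath(options.Output);
		string outputFile;

		// No file name (or an empty one) after the tab: fall back to the default name
		if (arr.Length == 1 || string.IsNullOrWhiteSpace(arr[1]))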
			outputFile = inputUri.IsFile
			? Path.GetFileName(inputUri.AbsolutePath)
			: FileManager.RemoveInvalidFileNameChars(inputUri.ToString());
		else
			outputFile = arr[1];

		outputFile = inputUri.IsFile ? Path.ChangeExtension(outputFile, ".pdf") : outputFile + ".pdf";

		_itemsToConvert.Enqueue(new ConversionItem(inputUri,
			// ReSharper disable once AssignNullToNotNullAttribute
			Path.Combine(outputPath, outputFile)));
	}

	WriteToLog($"{_itemsToConvert.Count} items read");

	if (options.UseMultiThreading)
	{
		_workerTasks = new List<Task>();

		WriteToLog($"Starting {maxTasks} processing tasks");
		for (var i = 0; i < maxTasks; i++)
		{
			var i1 = i;
			_workerTasks.Add(_taskFactory.StartNew(() =>
				ConvertWithTask(options, (i1 + 1).ToString())));
		}

		WriteToLog("Started");

		Task.WaitAll(_workerTasks.ToArray());
	}
	else
		ConvertWithTask(options, null).GetAwaiter().GetResult();

	// Write conversion information to output file
	using var output = File.OpenWrite(Path.Combine(options.Output, "conversion_info.txt"));
	foreach (var itemConverted in _itemsConverted)
	{
		var bytes = new UTF8Encoding(true).GetBytes(itemConverted.OutputLine);
		output.Write(bytes, 0, bytes.Length);
	}
}

Finally, thank you for writing this program!

commented

That should be possible. I'll try to look into this this weekend.

I like your pacman logo :-)

commented

I added the option, but I used a pipe character (|) instead of a tab, so your input file would look like this:

inputfile1.html|outputfile1.pdf
inputfile2.html|outputfile2.pdf
inputfile3.html|outputfile3.pdf

Etc.
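A minimal sketch of how such a list could be parsed, assuming one entry per line with an optional |output part and a fall-back to the default name when it is missing. ParseLine and ParseList are hypothetical names; the actual parsing in ChromiumHtmlToPdf may differ (for example in trimming or in how blank lines are handled).

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical parser for the list format above:
//   input|output  -> explicit output file name
//   input         -> Output is null, so the caller uses the default name
static (string Input, string? Output) ParseLine(string line)
{
    // Split on the first '|' only, so the output part is kept whole
    var parts = line.Split('|', 2);
    var input = parts[0].Trim();
    var output = parts.Length == 2 && parts[1].Trim().Length > 0 ? parts[1].Trim() : null;
    return (input, output);
}

static IEnumerable<(string Input, string? Output)> ParseList(IEnumerable<string> lines)
{
    foreach (var line in lines)
    {
        // Skip blank lines instead of treating them as empty inputs
        if (string.IsNullOrWhiteSpace(line))
            continue;
        yield return ParseLine(line);
    }
}

var entries = ParseList(new[]
{
    "inputfile1.html|outputfile1.pdf",
    "inputfile2.html|outputfile2.pdf",
    "inputfile3.html",
});

foreach (var (input, output) in entries)
    Console.WriteLine($"{input} -> {output ?? "(default name)"}");
```

A pipe is a convenient separator here because it is not allowed in Windows file names, so it should not appear inside the output part.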