tutorcruncher / pydf

PDF generation in python using wkhtmltopdf for heroku and docker

Trouble getting Windows version going....

MikeTheWatchGuy opened this issue

Hi there

Could be me being new to Python, but I do think I've got plenty of decades of programming to make up for that bit of ignorance.

I was attracted to PYDF because it can run async.

I'm already up and saving PDF files using the PDFKIT module.

I was sorta surprised that I needed to modify the source code to get pydf to run on Windows.

I needed to add an environment variable pointing to the wkhtmltopdf.exe file. OK, not such a bad thing, but it would have been nice to see that in the Windows installation notes. I also changed the wkhtmltopdf.py file because the executable path did not have the required .exe on the end of the filename.
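Both workarounds above are about the same thing: finding the wkhtmltopdf binary, which on Windows is named wkhtmltopdf.exe. A minimal sketch of a lookup that handles both, assuming a hypothetical helper `find_wkhtmltopdf` and an environment variable named `WKHTMLTOPDF_PATH` (an assumption here; check pydf's wkhtmltopdf.py for the name it actually reads):

```python
import os
import shutil
import sys
from typing import Optional


def find_wkhtmltopdf(env_var: str = "WKHTMLTOPDF_PATH") -> Optional[str]:
    """Return a usable path to the wkhtmltopdf executable, or None.

    Hypothetical helper for illustration; the environment variable name
    is an assumption, not a documented pydf setting.
    """
    candidate = os.environ.get(env_var)
    if candidate:
        # On Windows the binary is wkhtmltopdf.exe; add the suffix if missing.
        if sys.platform == "win32" and not candidate.lower().endswith(".exe"):
            candidate += ".exe"
        if os.path.isfile(candidate):
            return candidate
    # Fall back to searching PATH. shutil.which honours PATHEXT on Windows,
    # so "wkhtmltopdf" also matches "wkhtmltopdf.exe" there.
    return shutil.which("wkhtmltopdf")
```

Checking the environment variable first and PATH second means a Windows user can either set one variable or simply put the install directory on PATH, without editing any package source.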

OK, I got past those initial install problems, but I'm still crashing.

So, I'm taking a moment to log an issue to see whether perhaps I've bitten off more than I can chew. I was looking for something off the shelf that I didn't need to modify the package source for.

Maybe I didn't read the docs carefully enough??

Sorry to be a moron. It's my 3rd week on Python and GitHub, etc. Still, I've come a very long way!!

==========================================

Curious what IDE folks use on these projects. I've been using PyCharm and I LOVE it! Wow, what a treat to have a full debugger with breakpoints and a great variable browser. I tried Thromber, but it didn't work quite as well.

==========================================

Thank you to the authors of this package just the same! And I appreciate anyone that takes the time to answer stupid questions.

Update on this problem....

I managed to run the test harness that is supposed to benchmark the async and sync calls.

It WORKED for the synchronous calls!! VICTORY of sorts, despite having to hardcode some stuff that I mentioned earlier. I'll figure out how to unwind that I suppose.

I got this beautiful output!

C:\Anaconda3\python.exe C:/Users/mike/PycharmProjects/SaveAsPDF/TestPYDF.py
000: 1230723
001: 1230723
002: 1230723
003: 1230723
004: 1230723
005: 1230723
006: 1230723
007: 1230723
008: 1230723
009: 1230723
sync, time taken per pdf: 46.644s

And the timing was even correct :-)

Nice !

But then it raised a NotImplementedError when the rest of the script ran, trying to do the async test.

Still... this was significant progress from where I started. It even saved the page that I was hoping to save, an eBay listing!

Traceback (most recent call last):
  File "C:/Users/mike/PycharmProjects/SaveAsPDF/TestPYDF.py", line 58, in <module>
    count = loop.run_until_complete(go_async())
  File "C:\Anaconda3\lib\asyncio\base_events.py", line 467, in run_until_complete
    return future.result()
  File "C:/Users/mike/PycharmProjects/SaveAsPDF/TestPYDF.py", line 52, in go_async
    await asyncio.gather(*coros)
  File "C:/Users/mike/PycharmProjects/SaveAsPDF/TestPYDF.py", line 44, in gen
    margin_right='8mm',
  File "C:\Anaconda3\lib\site-packages\pydf\wkhtmltopdf.py", line 65, in generate_pdf
    loop=self.loop
  File "C:\Anaconda3\lib\asyncio\subprocess.py", line 225, in create_subprocess_exec
    stderr=stderr, **kwds)
  File "C:\Anaconda3\lib\asyncio\base_events.py", line 1191, in subprocess_exec
    bufsize, **kwargs)
  File "C:\Anaconda3\lib\asyncio\coroutines.py", line 210, in coro
    res = func(*args, **kw)
  File "C:\Anaconda3\lib\asyncio\base_events.py", line 340, in _make_subprocess_transport
    raise NotImplementedError
NotImplementedError

Process finished with exit code 1

(I've added back quotes so I can read the errors)

If you look at the asyncio subprocess docs here

It starts off by saying you need to use a different event loop on Windows; I'd try that first. You can see that the NotImplementedError came from inside the Python standard library's asyncio code, so it's not an issue directly with pydf.

I would try modifying the benchmark code to add

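import asyncio
import sys

# The default event loop cannot spawn subprocesses on Windows; the proactor loop can.
if sys.platform == 'win32':
    loop = asyncio.ProactorEventLoop()
    asyncio.set_event_loop(loop)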

and see how you get on.
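To see the suggested fix end to end, here is a minimal, self-contained sketch that picks the proactor loop on Windows and then runs several subprocesses concurrently with `asyncio.gather`, the same pattern the benchmark uses. The child process is a stand-in (a tiny Python command), not wkhtmltopdf, and `run_one`, `run_all` and `main` are hypothetical names:

```python
import asyncio
import sys
from typing import List


async def run_one(i: int) -> int:
    # Stand-in for one wkhtmltopdf call: a child Python process that writes
    # i + 1 bytes to stdout. Only the number of bytes received is returned.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", f"import sys; sys.stdout.write('x' * {i + 1})",
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return len(stdout)


async def run_all(n: int) -> List[int]:
    # Start n subprocesses concurrently and wait for all of them;
    # gather returns the results in the order the coroutines were passed.
    return await asyncio.gather(*(run_one(i) for i in range(n)))


def main(n: int = 5) -> List[int]:
    if sys.platform == "win32":
        # Older Pythons default to a selector loop on Windows, which raises
        # NotImplementedError for subprocesses; the proactor loop supports them.
        loop = asyncio.ProactorEventLoop()
    else:
        loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        return loop.run_until_complete(run_all(n))
    finally:
        loop.close()
```

`main(3)` should return `[1, 2, 3]`. On Python 3.8 and later the proactor loop is already the default on Windows, so there `asyncio.run(run_all(5))` alone should be enough.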

More generally

I'm no fan of Windows; I haven't used it seriously in 10 years and have no intention of starting.

However, if you want to submit a pull request to improve the docs or code for Windows (e.g. adding the above code to the examples and the benchmark), I'll happily review, merge, and release an update.

Thanks for the very helpful response!

I'm no python expert (yet), but if I do manage to hack my way through and can clean it up or redesign from what I learn I will most certainly pass over to you.

It sounds serious, however, if the Python standard library needs Windows support added.

I need to look more at your response, try the code you suggest, and go from there.

I appreciate the help! (A LOT)

  • unpaid endorsement -

If you do have to use Windows, I highly recommend PyCharm as an IDE. There's a free version that's downright amazing, and it's making learning Python and the several projects I'm working on with it MUCH more enjoyable to write and especially to debug. I'm shocked at these Python tools and libraries.

You've missed my point: the standard library is fine. It's just that Windows' asynchronous support is awful, so you have to add those two lines to get asynchronous process communication working.

The STD lib docs are very clear, read them.

I know about pycharm, I have the commercial version.

Got it now! Told ya I needed to re-read your email.

Working on it now.

Glad I picked a good IDE that others are using. I tried about 6 of them before landing there.
I appreciate you sharing. I'll look into commercial version now too.

It's been a while since I've used an IDE... I come from the embedded world and early development on Windows. I recall toolchains that looked and worked like PyCharm easily costing $500 / seat.

BTW... running now. Previously I was seeing identical values printed (as you can see in my initial post).

This time they're different.

It just finished with the output below.
I'll continue to study your code, the calls you recommend and I'll RTFM too.
Thanks again for your time

C:\Anaconda3\python.exe "C:\Program Files\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 55459 --file C:/Users/mike/PycharmProjects/SaveAsPDF/TestPYDF.py
pydev debugger: process 23544 is connecting

Connected to pydev debugger (build 172.3968.37)
000: 1215956
001: 1228537
002: 1373094
003: 1228097
004: 1240619
005: 1228097
006: 1238250
007: 1372292
008: 1228537
009: 1235539
Traceback (most recent call last):
  File "C:\Program Files\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1599, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\PyCharm Community Edition 2017.2.3\helpers\pydev\pydevd.py", line 1026, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "C:\Program Files\PyCharm Community Edition 2017.2.3\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/mike/PycharmProjects/SaveAsPDF/TestPYDF.py", line 57, in <module>
    if sys.platform == 'win32':
NameError: name 'sys' is not defined
sync, time taken per pdf: 35.921s

Process finished with exit code 1

Oh, and sorry, I hadn't debugged that snippet of code you provided and added the import... stupid mistakes.

working on it now...

OK, I'm now debugging further....

The benchmark code is starting up the correct number of processes, but I'm not seeing any output from WKHTMLTOPDF and they never seem to complete.

The synchronous code's runs of WKHTMLTOPDF do run to completion and aren't leaving any zombie-like processes around.

I'm going to need more study and debug time, reading and learning about that async library.

It's odd that the CPU time goes to almost zero for the processes and they're not completing.
I'm going to let them run for a long time to see if they will eventually finish.
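The thread never pins down why the async children sit at almost zero CPU. Two generic causes of that symptom (assumptions, not a diagnosis of pydf) are a child blocked waiting for EOF on its stdin, and a child blocked writing to an output pipe that nobody is reading. While debugging, a timeout at least turns an endless hang into an error. A minimal sketch using a hypothetical helper `communicate_with_timeout`; the command and timeout are placeholders, not pydf's actual wkhtmltopdf invocation:

```python
import asyncio
from typing import Optional, Sequence


async def communicate_with_timeout(
    args: Sequence[str], input_bytes: Optional[bytes] = None, timeout: float = 30.0
) -> bytes:
    """Run a command, optionally feed it input_bytes, and return its stdout.

    If it has not finished within `timeout` seconds, kill it and raise
    asyncio.TimeoutError instead of waiting forever.
    """
    proc = await asyncio.create_subprocess_exec(
        *args,
        # With no input, give the child an empty stdin so it sees EOF at once.
        stdin=asyncio.subprocess.PIPE if input_bytes is not None else asyncio.subprocess.DEVNULL,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # communicate() writes the input and then closes stdin, and it drains
        # stdout/stderr concurrently so neither pipe can fill up and block.
        stdout, _ = await asyncio.wait_for(proc.communicate(input_bytes), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the child so it does not linger
        raise
    return stdout
```

If the timeout fires every time while the same command finishes in the synchronous version, that points at how input and output are being passed to the child rather than at the PDF rendering itself.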

I'm gonna punt on this.

I've already spent half a day debugging code that's not mine. I have 1450 pages to go in the Python book I'm reading, two machine learning books, and the eBay API to figure out.
I don't have it in me to learn how the Windows async API works under Python, regardless of how well the documentation is written. I was looking for a library I could use, not a lesson on Windows APIs and architecture.

Maybe make a note on this project that it's broken for Windows? Or maybe it's already there and I didn't read enough of the docs to see it.

I'll just keep using the pdfkit library that is working.

Thanks for your help.