emcniece / ha_pdf

Home Assistant PDF File Sensor

New install, not working?

Chacsam opened this issue

commented

Hello,

I have just tried to install ha_pdf as per the documentation.

I cloned the Git repository, and the files are in the correct folder:

/Home_Assistant/config/custom_components$ ls ha_pdf -l
total 24
-rw-rw-r-- 1 frederic frederic   30 May  9 11:11 __init__.py
-rw-rw-r-- 1 frederic frederic  191 May  9 11:11 manifest.json
drwxr-xr-x 2 root     root     4096 May  9 11:18 __pycache__
-rw-rw-r-- 1 frederic frederic 2803 May  9 11:11 README.md
-rw-rw-r-- 1 frederic frederic 4485 May  9 11:11 sensor.py

Configured my Home Assistant:

homeassistant:
  packages: !include_dir_named packages
  whitelist_external_dirs:
    - /config/www
    - /config/files-fred
  allowlist_external_dirs:
    - /config/files-fred/downloads/energy

sensor:
  - platform: pdf
    name: "Test PDF Sensor"
    file_path: /config/files-fred/downloads/energy/ipsum.pdf

The PDF file, ipsum.pdf, contains only plain text.

However, no "Test PDF Sensor" entity is to be found anywhere.

Feedback in log:

Logger: homeassistant.components.sensor
Source: custom_components/ha_pdf/sensor.py:117
Integration: Sensor (documentation, issues)
First occurred: 11:30:27 (1 occurrences)
Last logged: 11:30:27

pdf: Error on device update!
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 521, in _async_add_entity
    await entity.async_device_update(warning=False)
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 784, in async_device_update
    await coro
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/ha_pdf/sensor.py", line 117, in update
    text = page.extractText()
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_page.py", line 1899, in extractText
    deprecation_with_replacement("extractText", "extract_text", "3.0.0")
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_utils.py", line 369, in deprecation_with_replacement
    deprecation(DEPR_MSG_HAPPENED.format(old_name, removed_in, new_name))
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_utils.py", line 351, in deprecation
    raise DeprecationError(msg)
PyPDF2.errors.DeprecationError: extractText is deprecated and was removed in PyPDF2 3.0.0. Use extract_text instead.

Am I doing anything wrong?

Hi! Thanks for the detailed notes, that helps a lot.

You're not doing anything wrong. As the traceback says, the underlying PyPDF2 library removed the deprecated extractText method in version 3.0.0, and extract_text is its replacement.

The documentation for this method is here: https://pypdf2.readthedocs.io/en/stable/user/extract-text.html

The only occurrence of this method in this repo is here: https://github.com/emcniece/ha_pdf/blob/master/sensor.py#L117

Does it work if you modify line 117 of sensor.py to use page.extract_text()? I don't have an installation running right now for easy testing, but I might be able to investigate more in the next few days.

Give this a shot and let's see what happens 😄
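
For anyone who needs sensor.py to keep working against both older and newer PyPDF2 releases, a small wrapper along these lines might help. This is only a sketch and not what the repo does: extract_page_text and the two stand-in page classes are made-up names for illustration, and only extractText and extract_text come from PyPDF2 itself.

```python
def extract_page_text(page):
    """Return the text of a PyPDF2 page object across library versions."""
    # PyPDF2 3.0.0 removed extractText(); extract_text() is the replacement.
    if hasattr(page, "extract_text"):
        return page.extract_text()
    # Fallback for older PyPDF2 releases that only expose extractText().
    return page.extractText()


# Stand-in page classes (hypothetical, for illustration only) so the helper
# can be exercised without a PDF file or PyPDF2 installed.
class NewStylePage:
    def extract_text(self):
        return "lorem ipsum"


class OldStylePage:
    def extractText(self):
        return "lorem ipsum"


print(extract_page_text(NewStylePage()))
print(extract_page_text(OldStylePage()))
```

In sensor.py, the call on line 117 would then read text = extract_page_text(page), assuming the helper is defined in the same file.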

commented

Hi,
Thanks for the quick response.

So I changed line 117 of sensor.py in the /config/custom_components/ha_pdf folder to:

                except IndexError:
                    _LOGGER.error("PDF Page %s does not exist in file: %s", self._pdf_page, self._file_path)
                text = page.extract_Text()

I restarted Home Assistant, but there is still no PDF sensor to be seen.

The updated log now shows a different error:

Logger: homeassistant.components.sensor
Source: custom_components/ha_pdf/sensor.py:117
Integration: Sensor (documentation, issues)
First occurred: 21:03:11 (1 occurrences)
Last logged: 21:03:11

pdf: Error on device update!
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 521, in _async_add_entity
    await entity.async_device_update(warning=False)
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 784, in async_device_update
    await coro
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/ha_pdf/sensor.py", line 117, in update
    text = page.extract_Text()
AttributeError: 'PageObject' object has no attribute 'extract_Text'

commented

Sorry, my mistake: I just realized I left the capital 'T' in the page.extract_Text() line.
With a lowercase 't', page.extract_text() does work.
That seems to be the fix.

Merged a PR and tagged https://github.com/emcniece/ha_pdf/releases/tag/v1.1.2 - feel free to keep your changes locally or check out the repo at master or v1.1.2 👍

Thank you for your help testing this fix!

commented

Thanks for the quick fix. Now I can start the complex task of finding the right RegEx to parse my electricity provider's monthly rate card :)