AccentuSoft / LinkScope_Client

Repository for the LinkScope Client software.

Slow building and compiling

D3vil0p3r opened this issue · comments

Hello,
I'm trying to build the application by running build.sh, but the build is slow. During the build I get warnings such as:

Nuitka:WARNING: Using very slow fallback for ordered sets, please install 'ordered-set' PyPI package for best Python
Nuitka:WARNING: compile time performance.
Nuitka-Plugins:WARNING: numpy: This plugin has been deprecated, do not enable it anymore.

even though I installed python-ordered-set. I also get the following warnings:

Nuitka-Plugins:WARNING: anti-bloat: Unwanted import of 'unittest' encountered.               
Nuitka-Plugins:WARNING: anti-bloat: Unwanted import of 'unittest.result' encountered.
Nuitka-Plugins:WARNING: anti-bloat: Unwanted import of 'unittest.case' encountered.
Nuitka-Plugins:WARNING: anti-bloat: Unwanted import of 'unittest.suite' encountered.
Nuitka-Plugins:WARNING: anti-bloat: Unwanted import of 'unittest.loader' encountered.
Nuitka-Plugins:WARNING: anti-bloat: Unwanted import of 'unittest.main' encountered.
Nuitka-Plugins:WARNING: anti-bloat: Unwanted import of 'unittest.runner' encountered.
Nuitka-Plugins:WARNING: anti-bloat: Unwanted import of 'unittest.signals' encountered.
Nuitka-Plugins:WARNING: anti-bloat: Unwanted import of 'unittest.async_case' encountered.
Nuitka-Plugins:WARNING: anti-bloat: Unwanted import of 'unittest.util' encountered.
Nuitka-Plugins:WARNING: anti-bloat: Unwanted import of 'unittest._log' encountered.

and also:

Nuitka-Plugins:WARNING: pyside6: Unwanted import of 'tkinter' that is redundant with 'PySide6' encountered. Use         
Nuitka-Plugins:WARNING: '--nofollow-import-to=tkinter' or uninstall it for best compatibility with pure Python execution.

I'm working on Arch Linux. Is there a way to fix these warnings and speed up the compilation?

In the build script:
python${PYTHON_VER} -m pip install --upgrade wheel pip nuitka orderedset
Should be:
python${PYTHON_VER} -m pip install --upgrade wheel pip nuitka ordered-set
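
One way to confirm that the corrected package is visible to the build is to ask the interpreter directly. This is only a sketch, not part of build.sh: the helper name module_available is made up for this example, and it assumes the ordered-set PyPI package is imported as ordered_set. A system-wide python-ordered-set package is normally not visible from inside a virtual environment, which may be why the warning appeared even though it was installed.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the running interpreter."""
    return importlib.util.find_spec(name) is not None


# Run this with the same interpreter that runs Nuitka (for example the
# one inside buildEnv), otherwise the answer may not match the build.
for candidate in ("ordered_set", "orderedset"):
    status = "importable" if module_available(candidate) else "missing"
    print(f"{candidate}: {status}")
```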

The anti-bloat warnings are benign - excluding the unittests used to break one of the libraries. We will push revisions to the build script for the next major version (1.5.0).

The tkinter warning should not be there, unless you've added a module that uses it. You can try including the parameter given in the warning, though LinkScope should build properly either way.

Regarding the long build time - the software has a lot of modules, and we're constantly adding more. Building the software with the latest build script compiles everything from Python into C, which is a lot of work. If you don't need all the modules included, you can omit them, so you're not compiling everything. There's no other way around this for now, unfortunately. In the future, we might try splitting off non-core modules into a marketplace of sorts, to let users download whatever modules they want.

Thank you for the answer @AccentuSoft

Regarding your point that the tkinter warning should not be there unless I've added a module that uses it:

I didn't edit build.sh or any other file. If tkinter should not be there, what should I do to avoid this warning? Do I need to edit some LinkScope file?

I don't have the python-tkinter package installed.

You can try changing this line:
--enable-plugin=pyside6 --enable-plugin=numpy --enable-plugin=trio --assume-yes-for-downloads --remove-output \
to this:
--enable-plugin=pyside6 --nofollow-import-to=tkinter --enable-plugin=trio --assume-yes-for-downloads --remove-output \

You could also add the --noinclude-unittest-mode=nofollow option, if you want to test that LinkScope works with it enabled on Arch Linux.
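
The change to that line can be sketched as a small transformation on the flag list. This is an illustration only: the function name revise_nuitka_flags is hypothetical, and the flags are simply the ones quoted above.

```python
def revise_nuitka_flags(flags: list[str]) -> list[str]:
    """Drop the deprecated numpy plugin and stop Nuitka following tkinter."""
    revised = [flag for flag in flags if flag != "--enable-plugin=numpy"]
    if "--nofollow-import-to=tkinter" not in revised:
        # Insert right after the pyside6 plugin when present, else append.
        try:
            position = revised.index("--enable-plugin=pyside6") + 1
        except ValueError:
            position = len(revised)
        revised.insert(position, "--nofollow-import-to=tkinter")
    return revised


original = [
    "--enable-plugin=pyside6",
    "--enable-plugin=numpy",
    "--enable-plugin=trio",
    "--assume-yes-for-downloads",
    "--remove-output",
]
print(" ".join(revise_nuitka_flags(original)))
```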

Regarding why it shows up for you: Can you check if you have the python3-tk package installed (or the corresponding one for Arch Linux)? In theory, if you're just cloning this repository and running the build script, Nuitka should not be aware of the presence of the tkinter package if it's installed elsewhere, unless it's system-wide. Otherwise, maybe a package on Arch Linux used by LinkScope has a different version than on Ubuntu, and the Arch Linux version utilizes tkinter.

The python3-tk package does not seem to be installed on my clean Arch Linux system.

Do any of these queries find anything?
find / -iname "tkinter"
find / -iname "tk"
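
A narrower alternative to searching the whole filesystem is to look only where the interpreter itself would import from. The sketch below is an assumption-based illustration (the helper name find_on_sys_path is made up here). A hit under the interpreter's own standard library directory would suggest that tkinter came with Python itself rather than from a separately installed package.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Optional


def find_on_sys_path(name: str, search_path: Optional[Iterable[str]] = None) -> List[Path]:
    """Return every `<entry>/<name>` directory or `<entry>/<name>.py` file found."""
    entries = sys.path if search_path is None else search_path
    hits: List[Path] = []
    for entry in entries:
        base = Path(entry or ".")  # an empty entry means the current directory
        for candidate in (base / name, base / f"{name}.py"):
            if candidate.exists() and candidate not in hits:
                hits.append(candidate)
    return hits


# Run with the interpreter used for the build to see what Nuitka can reach.
for hit in find_on_sys_path("tkinter"):
    print(hit)
```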

Yes, it found the following:

$ sudo find / -iname "tkinter"

/usr/lib/python3.10/tkinter

At this point I will either delete it, if no other application uses it, or use the argument you suggested above.

Furthermore, I'm preparing a PKGBUILD for LinkScope so that it can be installed with pacman on Arch-based systems, and, if you agree, I would like to publish the package in the BlackArch repository (which collects InfoSec tools).

The PKGBUILD should not contain a Python virtual environment, and I won't use pip to install dependencies. I need to list each package name explicitly in the depends= variable of the PKGBUILD. Most of them are already in the Arch Linux repositories. The following packages are not: playwright (in AUR), docx2python, PyPDF2 (in AUR), python-Wappalyzer, vtapi3, exif (in AUR), snscrape (I can create a package for the BlackArch repo), aiosmtplib (not sure whether the python-aiosmtpd package provides it), name-that-hash (in AUR).

A further question: in the replacement line you proposed above, why did you remove --enable-plugin=numpy?

It's deprecated and no longer does anything. You can see this in the output in the original post:
Nuitka-Plugins:WARNING: numpy: This plugin has been deprecated, do not enable it anymore.

Regarding the PKGBUILD for the BlackArch repository: sounds good, we're happy to help with that. None of us is very familiar with Arch Linux packaging, however, so I'm afraid the help we can provide is limited. I would like to ask about your statement that the PKGBUILD should not contain a Python virtual environment and that you won't use pip to install dependencies.

Is this a rule or a best practice? The Python Playwright package in the AUR, for example, uses pip:
https://aur.archlinux.org/packages/python-playwright

Relevant bit:

package() {
  PIP_CONFIG_FILE=/dev/null pip install --isolated --root="$pkgdir" --ignore-installed --no-deps *.whl
  python -O -m compileall "${pkgdir}"
}

This thread from last year shows someone successfully creating a PKGBUILD file with venv:
https://bbs.archlinux.org/viewtopic.php?id=274796

Also, note that the following system packages are required for the software to work (names are for the Ubuntu versions): graphviz, libopengl0 and libmagic1.

I have already prepared the PKGBUILD; I only need to work out how to handle the AUR packages and the other Python packages. It should not be hard.

On whether avoiding pip is a rule or a best practice: it is a BlackArch best practice, not a mandatory rule. You can write a PKGBUILD that uses pip, but BlackArch prefers listing the required dependencies in the depends= variable.

The system packages you mention (graphviz, libopengl0 and libmagic1) are already in the PKGBUILD.

One more piece of information that might be useful:

LinkScope can depend on PyPDF2 (according to requirements.txt, under Extra packages), but https://pypi.org/project/PyPDF2/ states:

NOTE: The PyPDF2 project is going back to its roots. PyPDF2==3.0.X will be the last version of PyPDF2. Development will continue with [pypdf==3.1.0].

Should LinkScope refer to pypdf instead of PyPDF2?

Thank you for pointing this out. pypdf is not backward compatible, so we revised the resolution that used the module. We just pushed the change.

In build.sh you execute the following command:

echo "" > "buildEnv/lib/python${PYTHON_VER}/site-packages/snscrape/modules/__init__.py"

so the content of that file is deleted, which could affect other processes that use snscrape. What happens if I compile and run LinkScope without executing the command above?

That file is run when snscrape is initialized, and it tries to import the modules in its own directory. Compilation changes the names, contents and locations of files, so the module files might not be in the directory that this script expects, and there might be Python files there that are not the modules it expects.

That line was added because building without it meant that resolutions using snscrape didn't work. Generally, things that use stuff like __path__, __file__ or __main__ are not going to work without modification.
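
To make the pattern concrete, here is a small self-contained sketch of a package whose __init__.py discovers sibling modules through __path__. It is not snscrape's actual code; the package names, module names and the discovered list are invented for this example. It shows that discovery depends on module files sitting next to __init__.py, which may no longer hold after compilation, and that an emptied __init__.py simply skips discovery.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical package initialiser that scans its own directory via __path__.
INIT_SOURCE = textwrap.dedent(
    '''
    import importlib
    import pkgutil

    discovered = []
    for _finder, _name, _is_pkg in pkgutil.iter_modules(__path__):
        importlib.import_module(f"{__name__}.{_name}")
        discovered.append(_name)
    '''
)


def build_demo_package(root: Path, package: str, init_source: str) -> None:
    """Write `package` under `root` with two trivial modules and the given __init__."""
    pkg_dir = root / package
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(init_source)
    (pkg_dir / "alpha.py").write_text("VALUE = 'alpha'\n")
    (pkg_dir / "beta.py").write_text("VALUE = 'beta'\n")


def discovered_modules(package: str, init_source: str) -> list:
    """Import a throwaway copy of the package and report what its __init__ found."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp), package, init_source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(package)
            return sorted(getattr(module, "discovered", []))
        finally:
            sys.path.remove(tmp)
            # Forget the throwaway package and its submodules again.
            for key in [k for k in sys.modules if k == package or k.startswith(package + ".")]:
                del sys.modules[key]


print(discovered_modules("demo_scan", INIT_SOURCE))  # directory scan via __path__
print(discovered_modules("demo_empty", ""))          # emptied __init__.py, nothing scanned
```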

So if that snscrape file is emptied, does standalone snscrape continue to work correctly?

The standalone package, no. This is just about the compiled version.

So you mean that this change is made only during compilation, and a user who installs the package on another machine does not need to edit that snscrape file, right?

Yes. If someone just installs snscrape, they should not edit the file. The edit happens in the venv used for compiling, to make snscrape work inside linkscope resolutions after compiling.

Thank you for the answer. I'm finishing pushing the last dependencies to the remote repositories, and in your README I see that compiling requires the following line:

FIREFOX_VER=$(python -c "from pathlib import Path;x=Path(\"buildEnv/lib/python${PYTHON_VER}/site-packages/playwright/driver/package/.local-browsers\");print(list(x.glob(\"firefox*/firefox\"))[0].parent.name.split(\"-\")[1])")

It searches in .local-browsers. This folder usually appears only if the Playwright Python library is installed as a standard user, not with sudo.

In particular:

  • if I install Playwright with pip install playwright, I get the .local-browsers folder in $HOME/.local/lib/python3.10/site-packages/playwright/driver/package;
  • if I install it with sudo pip install playwright, there is no .local-browsers in /usr/lib/python3.10/site-packages/playwright/driver/package.

A PKGBUILD, of course, cannot access home folders, because it cannot know the user's name. So I cannot use the command above in the PKGBUILD. One possibility is to hardcode the current Firefox version in the PKGBUILD as FIREFOX_VER=1369, but that is not a good choice because each user can have a different version. The only good solution is to determine the browser version from files that are outside the $HOME folder.
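
One possible way to avoid hardcoding a home directory is to derive the path from wherever the playwright package is actually installed. This is a sketch under assumptions: the helper names are invented, the driver/package/.local-browsers layout is taken from the paths quoted in this thread, and whether that folder exists at all depends on how the browsers were installed.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(package: str) -> Optional[Path]:
    """Return the directory of an installed top-level package, or None if absent."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


def local_browsers_dir(package: str = "playwright") -> Optional[Path]:
    """Guess where a package-local browser download would live (assumed layout)."""
    base = package_dir(package)
    if base is None:
        return None
    # Layout copied from the paths quoted in this thread; it is not verified here.
    return base / "driver" / "package" / ".local-browsers"


# Prints None when playwright is not installed for this interpreter.
print(local_browsers_dir())
```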

Furthermore, it refers to the Firefox browser. So my questions are:

  • Must Firefox be installed?
  • If I use another browser, must I change the command above?
  • What is the purpose of the Firefox version? What happens if I don't set it? I see it is used in --include-data-dir="buildEnv/lib/python${PYTHON_VER}/site-packages/playwright/driver/package/.local-browsers/firefox-${FIREFOX_VER}/firefox=playwright/driver/package/.local-browsers/firefox-${FIREFOX_VER}/firefox". What does this option do?
  • Is it needed only for compilation, or also for LinkScope to work?
  • Suppose my Firefox updates from 1369 to 1370. Do I need to recompile and reinstall LinkScope?
  • Since the FIREFOX_VER "issue" (for the PKGBUILD) should matter only at compile time: if I compile by pointing directly at the folder in my home directory, with my Firefox version, and another user who has a different Firefox version (or another browser) installs the resulting package, is LinkScope's functionality affected?

No browser has to be installed; part of the installation is doing playwright install so that the playwright library specific browsers are installed.

The browsers are downloaded (ordinarily, at least) in the building venv (i.e. buildEnv), and then included into the final package. The reason that the string is there is because the path to the Firefox binaries includes the current version of the playwright Firefox binary, and we get that dynamically using code.
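
For readability, the README one-liner can be unpacked into a small function. This is a sketch of the same idea rather than the code LinkScope ships: the function name is made up, and the example path assumes PYTHON_VER is 3.10.

```python
from pathlib import Path
from typing import Optional


def playwright_firefox_version(browsers_dir: Path) -> Optional[str]:
    """Return the build number from a `firefox-<number>/firefox` entry, if any."""
    if not browsers_dir.is_dir():
        return None
    matches = sorted(browsers_dir.glob("firefox*/firefox"))
    if not matches:
        return None
    # matches[0] looks like ".../firefox-1369/firefox"; its parent is "firefox-1369".
    folder_name = matches[0].parent.name
    _prefix, _sep, version = folder_name.partition("-")
    return version or None


# Example path for a build venv named buildEnv with Python 3.10 (an assumption).
browsers = Path(
    "buildEnv/lib/python3.10/site-packages/playwright/driver/package/.local-browsers"
)
print(playwright_firefox_version(browsers))
```

Unlike the one-liner, this returns None instead of raising when no Playwright Firefox folder is present.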

The Firefox browser is used for resolutions, importing tabs, downloading websites etc. - it's a core part of the application.

It's not necessary to update LinkScope whenever the playwright Firefox binary updates. The playwright Firefox binary lags behind the current Firefox version (at least temporarily, for new releases) anyways.

The version of whatever browser(s) the user has installed on their machine does not matter, because we use browser binaries from playwright for most of the browser-related functionality. The only functionality that cares about installed browsers is importing browser tabs from open browser windows, but the logic for that should have the user covered, as long as they have browsers installed in the default locations (i.e. the paths to the browser binaries are the same as they would be on Ubuntu).

Has the issue been resolved?

I talked with the Playwright developers, and they said that Playwright is not supported on Arch. However, I have finished building all the dependencies LinkScope needs. I still need to test its Arch package and check that it works.

I'm assuming that the blocker for this is not anything we can help with at this stage?
Please confirm so that we know to close the issue.

Hello, here is the PKGBUILD, in case it is useful to you for creating an Arch Linux package:

pkgname=linkscope
_pkgname=LinkScope_Client
pkgver=v1.4.0.r23.gf535f5f
pkgrel=1
pkgdesc='Perform online investigations by representing information as discrete pieces of data, called Entities.'
arch=('any')
groups=('blackarch' 'blackarch-forensic' 'blackarch-recon')
url='https://github.com/AccentuSoft/LinkScope_Client'
license=('GPL3')
depends=('graphviz' 'holehe' 'libglvnd' 'pyside6' 'python' 'python-aiohttp' 'python-aiohttp-socks' 'python-aiosmtplib' 'python-beautifulsoup4' 'python-bs4' 'python-cchardet' 'python-cryptography' 'python-dateutil' 'python-defusedxml' 'python-dnspython' 'python-docker' 'python-docx2python' 'python-email-validator' 'python-exif' 'python-folium' 'python-httpx' 'python-ipwhois' 'python-jellyfish' 'python-lxml' 'python-lz4' 'python-magic' 'python-msgpack' 'python-name-that-hash' 'python-networkx' 'python-odfpy' 'python-openpyxl' 'python-pandas' 'python-pillow' 'python-playwright' 'python-pycountry' 'python-pydot' 'python-pypdf' 'python-python-wappalyzer' 'python-pytz' 'python-reportlab' 'python-requests' 'python-requests-futures' 'python-requests-html' 'python-shodan' 'python-social-analyzer' 'python-svglib' 'python-tldextract' 'python-tweepy' 'python-urllib3' 'python-vtapi3' 'python-xlrd' 'python-xmltodict' 'qt6-charts' 'qt6-svg' 'qt6-webengine' 'snscrape')
makedepends=('git' 'nuitka' 'patchelf' 'python-ordered-set' 'python-pip' 'python-wheel')
source=("git+https://github.com/AccentuSoft/$_pkgname.git")
sha512sums=('SKIP')

pkgver() {
  cd $_pkgname

  git describe --long --tags | sed 's/\([^-]*-g\)/r\1/;s/-/./g'
}

prepare() {
  cd $_pkgname

  PYTHON_VER=3.10
  echo "" | sudo tee "/usr/lib/python${PYTHON_VER}/site-packages/snscrape/modules/__init__.py"
  PLAYWRIGHT_BROWSERS_PATH=0 python${PYTHON_VER} -m playwright install
  LOCAL_DIR="<path-to-lib>"
  FIREFOX_VER="<type-needed-firefox-version>"  # placeholder: replace with the needed Firefox version
}

build() {
  cd $_pkgname

  python${PYTHON_VER} -m nuitka --follow-imports --standalone --noinclude-pytest-mode=nofollow \
--noinclude-setuptools-mode=nofollow --noinclude-custom-mode=setuptools:error --noinclude-IPython-mode=nofollow \
--enable-plugin=pyside6 --nofollow-import-to=tkinter --enable-plugin=trio --assume-yes-for-downloads --remove-output \
--disable-console --include-data-dir="Resources=Resources" --include-plugin-directory=Modules --include-package=Core \
--include-data-dir="Core/Entities=Core/Entities" --include-data-dir="Core/Resolutions/Core=Core/Resolutions/Core" \
--include-data-dir="Modules=Modules" --warn-unusual-code --show-modules --include-data-files="Icon.ico=Icon.ico" \
--linux-icon="Icon.ico" \
--include-package-data=playwright \
--include-package-data=folium --include-package-data=branca \
--include-package=social-analyzer --include-package-data=langdetect --include-package-data=tld \
--include-package=Wappalyzer --include-package-data=Wappalyzer \
--include-package=dns \
--include-package=holehe.modules \
--include-package=snscrape \
--include-package=docker --include-package-data=docker \
--include-package-data=pycountry \
--include-package=jellyfish \
--include-package=ipwhois \
--include-package=tweepy \
--include-data-dir="$LOCAL_DIR/lib/python${PYTHON_VER}/site-packages/playwright/driver/package/.local-browsers/firefox-${FIREFOX_VER}/firefox=playwright/driver/package/.local-browsers/firefox-${FIREFOX_VER}/firefox" \
LinkScope.py
}

package() {
  cd $_pkgname

  install -dm 755 "$pkgdir/usr/bin"
  install -Dm 644 requirements.txt "$pkgdir/usr/share/$pkgname/requirements.txt"
  install -Dm 644 -t "$pkgdir/usr/share/doc/$pkgname/" *.md

  rm -rf LICENSE *.md .gitignore

  cp -a * "$pkgdir/usr/share/$pkgname/"

  cat > "$pkgdir/usr/bin/$pkgname" << EOF
#!/bin/sh
exec python /usr/share/$pkgname/LinkScope.py "\$@"
EOF

  chmod a+x "$pkgdir/usr/bin/$pkgname"
}

Currently, some dependencies are stored in the BlackArch repository, so the BlackArch repository must be added first by running the strap.sh script.

Note that in the PKGBUILD above you must replace the FIREFOX_VER placeholder with the actual Firefox version needed, otherwise the build does not work.
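
As a side note on the pkgver() function in the PKGBUILD above, the sed expression can be mirrored in a few lines to see what it produces. This is only an illustration: the function name is invented, and the sample input is an assumed git describe --long --tags output chosen to match the pkgver value shown in the PKGBUILD.

```python
import re


def pkgver_from_describe(describe: str) -> str:
    """Mimic the sed expression in pkgver(): add an `r` prefix, then use dots."""
    # First substitution: prefix the commit count that precedes "-g<hash>".
    with_revision = re.sub(r"([^-]*-g)", r"r\1", describe, count=1)
    # Second substitution: replace every hyphen with a dot.
    return with_revision.replace("-", ".")


print(pkgver_from_describe("v1.4.0-23-gf535f5f"))  # v1.4.0.r23.gf535f5f
```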