WPAA is a tool for analyzing and visualizing the HTML architecture of web pages. It renders the DOM of both static and dynamic (JavaScript-rendered) pages as a tree, which makes the structure easier to inspect and compare.
- Tree Visualization: Hierarchical representation of HTML structure
- Change Detection: Automatic detection and comparison of webpage structure changes
- Web Interface: Intuitive web UI for easy analysis
- Multiple Export Formats: Support for SVG, interactive HTML, CSV, and Markdown
- Performance Optimization: Asynchronous processing, caching, and memory optimization
- Static/Dynamic Analysis: Support for JavaScript-rendered web pages
- Custom Filtering: CSS selector and attribute filtering capabilities
- Performance Monitoring: Track execution time, memory usage, and cache efficiency
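As a rough illustration of what a hierarchical representation of HTML involves, the sketch below builds a tag tree with Python's standard `html.parser` and prints it with indentation. It is not WPAA's implementation (which uses the anytree library); `Node`, `TreeBuilder`, `build_tree`, and `render` are hypothetical names, and the `exclude` and `max_depth` parameters only mimic the idea behind the tool's filtering options.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element in the tree: a tag name plus its child elements."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree, skipping excluded tags together with their subtrees."""

    def __init__(self, exclude=()):
        super().__init__()
        self.exclude = set(exclude)
        self.root = Node("document")
        self.stack = [self.root]
        self.skip_depth = 0  # greater than 0 while inside an excluded subtree

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or tag in self.exclude:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
            return
        # Pop back to the matching open tag; stray closing tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def build_tree(html, exclude=()):
    """Parse an HTML string and return the root Node of the resulting tree."""
    builder = TreeBuilder(exclude)
    builder.feed(html)
    builder.close()
    return builder.root


def render(node, max_depth=None, depth=0):
    """Return indented lines for the tree, descending at most max_depth levels."""
    lines = ["  " * depth + node.tag]
    if max_depth is None or depth < max_depth:
        for child in node.children:
            lines.extend(render(child, max_depth, depth + 1))
    return lines
```

Calling `print("\n".join(render(build_tree(html, exclude=["script"]))))` on an HTML string prints one tag per line, indented by nesting level.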
Python 3.7+
pip install -r requirements.txt
Required External Programs:
- Graphviz Installation:
  - Download installer from official website
  - Add bin directory to system PATH (e.g., C:\Program Files\Graphviz\bin)
- ChromeDriver Installation (for dynamic page analysis):
  - Download from ChromeDriver website
  - Save to appropriate location and update path in code:
    service = Service('your/path/to/chromedriver')
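Before running an analysis, it can help to confirm that the external programs are reachable. The sketch below uses only the standard library and is not part of WPAA; `find_external_tools` and `missing_tools` are hypothetical helpers. It assumes Graphviz is detected through its `dot` executable and that ChromeDriver is named `chromedriver`. Since the ChromeDriver path is set in code, ChromeDriver may legitimately be absent from PATH.

```python
import shutil


def find_external_tools(names=("dot", "chromedriver")):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=("dot", "chromedriver")):
    """Return the names of executables that could not be found on PATH."""
    return [name for name, path in find_external_tools(names).items() if path is None]
```

For example, `missing_tools()` returning `["dot"]` would suggest that the Graphviz bin directory has not been added to PATH yet.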
python run_web_interface.py
Access http://127.0.0.1:5000 in your browser for intuitive web-based analysis.
Web Interface Features:
- User-friendly web UI
- Real-time analysis progress display
- Download various output formats
- Change comparison functionality
- Performance statistics
Basic usage:
python wpaa_run.py --urls https://example.com
Advanced options:
python wpaa_run.py --urls https://example.com https://test.com \
--exclude script style \
--include-attrs class href \
--custom-filter "div.content" \
--max-depth 3 \
--export-html \
--compare-changes \
--show-performance
- --urls: List of webpage URLs to analyze (required)
- --use-selenium: Use Selenium for dynamic content fetching
- --exclude: HTML tags to exclude (e.g., script style)
- --include-attrs: HTML attributes to include in nodes (e.g., class id href)
- --custom-filter: Filter specific elements using CSS selectors (e.g., div.classname)
- --max-depth: Limit maximum tree depth
- --include-text: Include text content
- --output: Choose output format (text or json)
- --visualize: Visualize tree structure as PNG file
- --export-svg: Export to SVG format
- --export-html: Export to interactive HTML
- --export-csv: Export to CSV format
- --export-markdown: Export to Markdown format
- --compare-changes: Compare with previous version
- --show-performance: Display performance report
- --optimize-tree: Optimize tree structure
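To show how the options above fit together, here is a sketch of how such a command line could be declared with `argparse`. It is an illustration and not the actual source of `wpaa_run.py`: the `build_parser` function is hypothetical, and the `text` default for `--output` and the empty defaults for the list options are assumptions.

```python
import argparse


def build_parser():
    """Declare the documented flags; defaults are assumptions, not WPAA's real ones."""
    parser = argparse.ArgumentParser(prog="wpaa_run.py")
    parser.add_argument("--urls", nargs="+", required=True,
                        help="webpage URLs to analyze")
    parser.add_argument("--use-selenium", action="store_true",
                        help="fetch dynamic content with Selenium")
    parser.add_argument("--exclude", nargs="*", default=[],
                        help="HTML tags to exclude, e.g. script style")
    parser.add_argument("--include-attrs", nargs="*", default=[],
                        help="HTML attributes to include, e.g. class id href")
    parser.add_argument("--custom-filter",
                        help="CSS selector such as div.classname")
    parser.add_argument("--max-depth", type=int,
                        help="maximum tree depth")
    parser.add_argument("--include-text", action="store_true",
                        help="include text content")
    parser.add_argument("--output", choices=["text", "json"], default="text",
                        help="output format")
    # The remaining options are simple on/off switches.
    for flag in ("--visualize", "--export-svg", "--export-html", "--export-csv",
                 "--export-markdown", "--compare-changes", "--show-performance",
                 "--optimize-tree"):
        parser.add_argument(flag, action="store_true")
    return parser
```

With this declaration, `build_parser().parse_args(["--urls", "https://example.com", "--max-depth", "3"])` yields a namespace whose `urls` is a list and whose `max_depth` is the integer 3.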
python wpaa_run.py --urls https://news.ycombinator.com
python wpaa_run.py --urls https://www.example.com --use-selenium
python wpaa_run.py --urls https://www.example.com --exclude script style meta link --visualize
python wpaa_run.py --urls https://www.example.com --include-attrs class id href --output json
python wpaa_run.py --urls https://www.example.com --export-html --compare-changes --show-performance
python wpaa_run.py --urls https://www.example.com --export-svg --export-csv --export-markdown
- Caching: Performance optimization for repeated URL analysis
- Asynchronous Processing: Concurrent analysis of multiple URLs
- Error Handling: Consistent error handling through decorators
- Tree Structure: HTML DOM visualization using anytree library
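The caching, asynchronous processing, and decorator-based error handling listed above can be combined roughly as in the following sketch. It uses only the standard library and is not WPAA's code: `handle_errors`, `CachedAnalyzer`, and `fake_fetch` are hypothetical names, the "analysis" is a placeholder that records the page length, and the fetch function stands in for a real HTTP or Selenium request.

```python
import asyncio
import functools


def handle_errors(func):
    """Decorator: return (result, None) on success or (None, message) on failure."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs), None
        except Exception as exc:
            return None, f"{type(exc).__name__}: {exc}"
    return wrapper


class CachedAnalyzer:
    """Analyzes URLs concurrently and remembers results for repeated URLs."""

    def __init__(self, fetch):
        self._fetch = fetch  # async callable: url -> HTML string
        self._cache = {}
        self.hits = 0
        self.misses = 0

    @handle_errors
    async def analyze(self, url):
        if url in self._cache:
            self.hits += 1
            return self._cache[url]
        self.misses += 1
        html = await self._fetch(url)
        result = {"url": url, "length": len(html)}  # placeholder analysis
        self._cache[url] = result
        return result

    async def analyze_all(self, urls):
        """Run all analyses concurrently; results keep the order of urls."""
        return await asyncio.gather(*(self.analyze(url) for url in urls))


async def fake_fetch(url):
    """Stand-in for a network request; fails for URLs containing 'bad'."""
    await asyncio.sleep(0)
    if "bad" in url:
        raise ValueError("unreachable")
    return f"<html><body>{url}</body></html>"
```

Because every call returns a `(result, error)` pair, one failing URL does not abort the batch. Only successful results are cached, so a failed URL is retried on the next run.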
MK-II_2523: Feature improvements completed
- Tree comparison functionality for detecting site changes
- Web interface implementation
- Support for more output formats (SVG, interactive HTML)
- Performance optimization and memory usage improvements
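One simple way to think about tree comparison for change detection is to reduce each tree to a set of element paths and diff the two sets. The sketch below illustrates that idea only; it is not the comparison algorithm WPAA uses. It assumes trees are nested `(tag, children)` tuples, and `element_paths` and `compare_trees` are hypothetical helpers.

```python
from collections import Counter


def element_paths(node, path=None):
    """List a path such as 'html/body[0]/div[1]' for every node in the tree."""
    tag, children = node
    path = path or tag
    paths = [path]
    seen = Counter()  # position of each child among siblings with the same tag
    for child in children:
        child_tag = child[0]
        index = seen[child_tag]
        seen[child_tag] += 1
        paths.extend(element_paths(child, f"{path}/{child_tag}[{index}]"))
    return paths


def compare_trees(old, new):
    """Report element paths that appear in only one of the two trees."""
    old_paths = set(element_paths(old))
    new_paths = set(element_paths(new))
    return {
        "added": sorted(new_paths - old_paths),
        "removed": sorted(old_paths - new_paths),
    }
```

A path-based diff like this reports inserted and deleted elements but treats a moved element as one removal plus one addition.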