maguowei / app-crawler

Crawling apps with uiautomator2 & mitmproxy

app crawler

Uses URL_SCHEMA deep links to jump straight to pages inside Douyin and crawl users' videos and video comments.

URL_SCHEMA_MAP = {
    'home': "snssdk1128://feed?refer=web",
    'user': 'snssdk1128://user/profile/{uid}?refer=web',
    'detail': 'snssdk1128://aweme/detail/{aweme_id}?refer=web',
    'challenge': 'snssdk1128://challenge/detail/{challenge_id}?refer=web',
    'music': 'snssdk1128://music/detail/{music_id}?refer=web',
    'live': 'snssdk1128://live?room_id={room_id}&user_id={user_id}&from=webview&refer=web',
    'poi': 'snssdk1128://poi/?id={poi_id}',
    'webview': 'snssdk1128://webview?url={url}&from=webview&refer=web',
    'webview_fullscreen': 'snssdk1128://webview?url={url}&from=webview&hide_nav_bar=1&refer=web',
    'poidetail': 'snssdk1128://poi/detail?id={id}&from=webview&refer=web',
    'forward': 'snssdk1128://forward/detail/{id}',
    'billboard_word': 'snssdk1128://search/trending',
    'billboard_video': "snssdk1128://search/trending?type=1",
    'billboard_music': "snssdk1128://search/trending?type=2",
    'billboard_positive': "snssdk1128://search/trending?type=3",
    'billboard_star': "snssdk1128://search/trending?type=4",
}
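The templates above are plain str.format patterns. Below is a minimal sketch of filling one in and opening it on a device. It assumes the app handles the scheme through a standard VIEW intent. `build_schema_url` and `adb_open_cmd` are hypothetical helpers, not part of this repo, and percent-encoding the nested `url` parameter is an assumption about what the webview route expects.

```python
import shlex
from typing import List, Optional
from urllib.parse import quote

# Subset of URL_SCHEMA_MAP above, copied so the example is self-contained.
URL_SCHEMA_MAP = {
    'user': 'snssdk1128://user/profile/{uid}?refer=web',
    'detail': 'snssdk1128://aweme/detail/{aweme_id}?refer=web',
    'webview': 'snssdk1128://webview?url={url}&from=webview&refer=web',
}


def build_schema_url(name: str, **params: str) -> str:
    """Fill a URL_SCHEMA_MAP template (hypothetical helper)."""
    if 'url' in params:
        # Assumption: a nested http(s) URL must be percent-encoded so its own
        # '?', '&' and '=' do not merge into the outer query string.
        params['url'] = quote(params['url'], safe='')
    return URL_SCHEMA_MAP[name].format(**params)


def adb_open_cmd(url: str, serial: Optional[str] = None) -> List[str]:
    """Build an `adb ... am start` command that fires a VIEW intent for url."""
    cmd = ['adb']
    if serial:
        cmd += ['-s', serial]
    # adb joins the shell arguments into one remote command line, so quote
    # the URL to keep '&' and '?' away from the device's shell.
    cmd += ['shell', 'am', 'start', '-a', 'android.intent.action.VIEW',
            '-d', shlex.quote(url)]
    return cmd
```

For example, `subprocess.run(adb_open_cmd(build_schema_url('user', uid='<uid>'), serial='xxxxx'), check=True)` would ask the device to open that user's profile page.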

Installing dependencies

Download Android platform-tools and extract it to get adb.

# List connected devices (Developer options and USB debugging must be enabled on the device)
adb devices
pipenv install
pipenv shell
uiautomator2 init
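When scripting against several phones, it helps to read the same `adb devices` listing programmatically. A minimal sketch that parses its tab-separated `serial<TAB>state` lines and keeps only devices in the `device` state; the helper names are hypothetical, and the longer `adb devices -l` format is not handled.

```python
from typing import List, Tuple


def parse_adb_devices(output: str) -> List[Tuple[str, str]]:
    """Parse `adb devices` output into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip the header, blank lines and daemon notices such as
        # "* daemon started successfully".
        if not line or line.startswith('List of devices') or line.startswith('*'):
            continue
        serial, _, state = line.partition('\t')
        devices.append((serial, state.strip()))
    return devices


def ready_serials(output: str) -> List[str]:
    """Serials that adb reports as usable (state == 'device')."""
    return [serial for serial, state in parse_adb_devices(output) if state == 'device']
```

Feed it `subprocess.run(['adb', 'devices'], capture_output=True, text=True).stdout`; entries in states such as `offline`, `unauthorized` or `no permissions ...` are filtered out.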

Installing Douyin

  • Install an older version of the Douyin app from Wandoujia (versions below v7.5.0 still trust user-installed CA certificates)
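To confirm that the installed build is old enough, read its version from `dumpsys package`. A minimal sketch follows. The package name `com.ss.android.ugc.aweme` is an assumption to verify on your device, and the helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumption: Douyin's Android package name; confirm with
# `adb shell pm list packages | grep aweme` before relying on it.
DOUYIN_PACKAGE = 'com.ss.android.ugc.aweme'
DUMPSYS_CMD: List[str] = ['adb', 'shell', 'dumpsys', 'package', DOUYIN_PACKAGE]
# Per the note above, versions below 7.5.0 still trust user CA certificates.
FIRST_UNTRUSTING_VERSION = (7, 5, 0)


def parse_version_name(dumpsys_output: str) -> Optional[Tuple[int, int, int]]:
    """Extract versionName from `dumpsys package` output as a 3-tuple."""
    match = re.search(r'versionName=(\d+(?:\.\d+)*)', dumpsys_output)
    if match is None:
        return None
    parts = [int(p) for p in match.group(1).split('.')][:3]
    parts += [0] * (3 - len(parts))  # treat '7.5' as 7.5.0
    return parts[0], parts[1], parts[2]


def trusts_user_ca(dumpsys_output: str) -> bool:
    """True if the installed version is below 7.5.0."""
    version = parse_version_name(dumpsys_output)
    return version is not None and version < FIRST_UNTRUSTING_VERSION
```

Padding to three components matters because tuple comparison would otherwise rank `(7, 5)` below `(7, 5, 0)`.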

weditor

Inspect and locate UI elements through a web interface

python -m weditor
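What weditor displays is the device's UI hierarchy. uiautomator2's `d.dump_hierarchy()` returns the same tree as XML, where each `node` element carries attributes such as `resource-id`, `text` and `bounds`. A minimal sketch of locating a node by `resource-id` and computing its tap point. `find_center` is hypothetical, and in practice a selector such as `d(resourceId=...).click()` does this for you.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# bounds look like "[left,top][right,bottom]" in screen pixels.
_BOUNDS = re.compile(r'\[(\d+),(\d+)\]\[(\d+),(\d+)\]')


def find_center(hierarchy_xml: str, resource_id: str) -> Optional[Tuple[int, int]]:
    """Return the center of the first node with the given resource-id."""
    root = ET.fromstring(hierarchy_xml)
    for node in root.iter('node'):
        if node.get('resource-id') == resource_id:
            match = _BOUNDS.fullmatch(node.get('bounds', ''))
            if match:
                left, top, right, bottom = map(int, match.groups())
                return (left + right) // 2, (top + bottom) // 2
    return None
```

On a connected device this could be combined as `d.click(*find_center(d.dump_hierarchy(), rid))`, with `rid` copied from weditor.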

mitmproxy

Installing and trusting the certificate
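mitmproxy loads addons from a script passed with `-s`. An addon is a plain object whose `response(flow)` hook runs for each completed response. Below is a minimal sketch of collecting JSON bodies from the app's API traffic. The watched path fragments are placeholders, not confirmed Douyin endpoints, and this is not the repo's own addon, whose contents are not shown here.

```python
import json
from typing import Any, Dict, List

# Placeholder path fragments (assumption): replace them with the endpoints
# you actually see in mitmproxy's flow list.
WATCHED_PATHS = ('/aweme/v1/aweme/post/', '/aweme/v1/comment/list/')


class JsonCollector:
    """mitmproxy addon that keeps JSON responses from watched API paths."""

    def __init__(self) -> None:
        self.items: List[Dict[str, Any]] = []

    def response(self, flow) -> None:
        # mitmproxy calls this hook once the full response has been read.
        path = flow.request.path
        if not any(fragment in path for fragment in WATCHED_PATHS):
            return
        text = flow.response.get_text()
        if not text:
            return
        try:
            data = json.loads(text)
        except ValueError:
            return  # not JSON (e.g. an error page); skip it
        self.items.append({'path': path, 'data': data})


# mitmproxy picks up addon instances from this module-level list.
addons = [JsonCollector()]
```

Run it with `mitmdump -s collector.py` (hypothetical filename). A real crawler would write `data` to the database instead of keeping it in memory.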

Usage

cp .env.tpl .env
cp -r .mitmproxy ~/.mitmproxy
make run-mitmproxy

# Start the database
make up

# Import test data
./app/tools/simple_data_import.py

# Crawl user profiles and video lists on a specific device
./dy.py crawler_users --max_num=200 --device_serial=xxxxx
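The `--max_num` flag caps how much is collected. Below is one common way to enforce such a cap while scrolling a list on screen. It assumes each scroll yields a batch of items that may overlap with the previous batch. `collect_until` and its `max_stale` stop condition are hypothetical and are not taken from `dy.py`.

```python
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar('T')


def collect_until(fetch_page: Callable[[], Iterable[T]], max_num: int,
                  key: Callable[[T], Hashable], max_stale: int = 3) -> List[T]:
    """Call fetch_page (e.g. scroll, then read the screen) until max_num
    unique items are gathered or max_stale pages in a row add nothing new."""
    seen: Dict[Hashable, T] = {}  # dicts keep insertion order
    stale = 0
    while len(seen) < max_num and stale < max_stale:
        added = 0
        for item in fetch_page():
            k = key(item)
            if k not in seen:
                seen[k] = item
                added += 1
                if len(seen) >= max_num:
                    break
        # Several empty pages in a row usually means the end of the list.
        stale = 0 if added else stale + 1
    return list(seen.values())
```

Deduplicating by key matters because consecutive scrolls usually leave part of the previous screen visible.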

# Crawl video comments on a specific device
./dy.py crawler_comments --device_serial=xxxxxxx

# Crawl user followers on a specific device
./dy.py crawler_follower --device_serial=72bf965

# Crawl user followers across multiple devices
./crawler.py crawler_follower --max_num=200

# Crawl user profiles and comments across multiple devices
./crawler.py run
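The multi-device commands in `crawler.py` run the same job on every connected phone. Below is a minimal sketch of that fan-out using one thread per device serial. It assumes the per-device job is I/O-bound: uiautomator2 drives the phone over HTTP, so threads are sufficient. `run_on_devices` and the `task` callable are hypothetical and are not the repo's implementation. A real `task` might start with `uiautomator2.connect(serial)`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_on_devices(serials: List[str],
                   task: Callable[[str], object]) -> Dict[str, object]:
    """Run task(serial) for every device in parallel; collect results or errors."""
    results: Dict[str, object] = {}
    # One worker per device so a slow phone does not block the others.
    with ThreadPoolExecutor(max_workers=max(len(serials), 1)) as pool:
        futures = {serial: pool.submit(task, serial) for serial in serials}
        for serial, future in futures.items():
            try:
                results[serial] = future.result()
            except Exception as exc:  # one failing device should not stop the rest
                results[serial] = exc
    return results
```

Returning the exception instead of raising it lets the caller log failing devices and retry only those.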

Process management on the deployment machine

sudo cp frp/systemd/app-crawler.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl start app-crawler.service

# Restart the service
sudo systemctl restart app-crawler.service

# Start automatically on boot
sudo systemctl enable app-crawler.service

FAQ

  1. Device not found
adb kill-server
adb start-server

If it still doesn't work, try restarting the phone.

  2. adb devices reports: no permissions (user in plugdev group; are your udev rules wrong?)
  3. weditor fails on opening with adbutils.errors.AdbError: device not found. This happens after switching devices; clear Chrome's LocalStorage to fix it.
  4. Tested devices
  • Xiaomi Mi 6
  • Redmi Note 8

About

License: MIT License


Languages

Python 99.6%, Makefile 0.4%