roceys

Geek Repo

Company: @google

Location: U.S.A.

Home Page: https://roceys.org

roceys's repositories

ConvertImg4Wechat

Automatically convert WeChat encrypted images (.dat) to jpg | png | gif

Language: Python | License: Apache-2.0 | Stargazers: 6 | Issues: 1 | Issues: 1

AutoRclone

AutoRclone: rclone copy/move/sync (automatically) with thousands of service accounts

Stargazers: 0 | Issues: 0 | Issues: 0

AutoSeed

Fully automatic torrent publishing bot [flowchart: https://www.processon.com/view/link/5c088855e4b0ca4b40c93a49 ]

License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

baiyue_onekey

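Baiyue Buluo (佰阅部落) one-click script toolbox: bundles 25+ quality open-source projects, sets everything up in one step, and uses Chinese interactive prompts throughout, so even users who cannot code can easily deploy many programs

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0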

Bar_Chart_Race

Basic bar chart race code in Python

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

bilibili-img-uploader

Another version update. Chrome extension "bilibili img uploader": uploads images to bilibili for use as an image host

Stargazers: 0 | Issues: 0 | Issues: 0

bilibili-live-tools

A bilibili live-streaming assistant implemented in Python

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

bilibiliupload

Upload videos to bilibili from the command line

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

BiliDrive

☁️ Bilibili cloud drive: full-speed upload and download of arbitrary files

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0
Stargazers: 0 | Issues: 0 | Issues: 0

calendar-import-export

Import/export your Android calendars as ics files without using the Google cloud

Language: Java | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

CVX

Computer vision: reading, writing & learning

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

excesizepy

What can Python do?

1. Desktop applications. A commonly used GUI library is wx.
2. Games. A simple "plane shooter" game can be written in fewer than 300 lines of code.
3. Web applications. As a scripting language, Python has to support web development; common web frameworks include django and flask.
4. Servers. No problem at all: Instagram, an app with more than 500 million monthly active users, runs its server side on Python.
5. Crawlers. This is the most common use and one that is very well suited to beginners.

Writing code teaches more than talking about it, so here is a step-by-step walk through some simple crawler programs.

Example 1: fetch weather data from a weather website

Weather and air-quality site: http://www.pm25.com/shanghai.html

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    site = 'http://www.pm25.com/shanghai.html'
    html = urlopen(site)
    soup = BeautifulSoup(html, "lxml")

    # Each field is located by the CSS class of its element.
    quality = soup.find("span", {"class": "bi_aqiarea_wuran"})
    city = soup.find(class_='bi_loaction_city')
    aqi = soup.find("a", {"class": "bi_aqiarea_num"})
    desc = soup.find("div", {"class": "bi_aqiarea_bottom"})

Example 2: fetch all article titles from the home page of csdn user xx

Home page of user xx: http://blog.csdn.net/u014351782

    # soup is built from the page above, in the same way as in example 1.
    soup.select('div #article_list .list_item div.article_title span.link_title')[0].text
    for item in soup.select('div #article_list .list_item div.article_title span.link_title'):
        print(item.text)

The examples above are fairly simple. The next one is a little harder.

Example 3: crawl the Douban Top 250 movies

Movie URL: https://movie.douban.com/top250?start=25&filter=

Inspecting the page's HTML source shows that the required fields can be selected as follows:

    item = soup.select('ol.grid_view div.item')[1]
    print(item.select_one('em').text)                                # rank
    print(item.select_one('div.pic img').attrs['alt'])               # title
    print(item.select_one('div.info div.bd p').text.strip())         # basic description
    print(item.select_one('div.info div.bd span.rating_num').text)   # rating
    print(item.select('div.info div.bd div.star span')[3].text)      # number of reviews

With the selectors worked out, the complete program can be written. First, an entity class, MovieInfo, holds the basic information for each movie:

    class MovieInfo:
        def __init__(self, rank, title, desc, stars, commentcount):
            self.rank = rank
            self.title = title
            self.desc = desc
            self.stars = stars
            self.commentcount = commentcount

Next comes the class that crawls the movies. It has five methods: the constructor __init__(self), getPageData(self) to fetch the soup for each page, getMovie(self) to extract the movies, writeToFile(self) to write the file, and main(self) to run everything.

    import csv
    from urllib.error import URLError
    from urllib.request import urlopen

    from bs4 import BeautifulSoup


    class MovieInfo:
        def __init__(self, rank, title, desc, stars, commentcount):
            self.rank = rank
            self.title = title
            self.desc = desc
            self.stars = stars
            self.commentcount = commentcount


    class Movie250:
        def __init__(self):
            self.start = 0
            self.param = '&filter=&type='
            self.movieList = []
            self.pageNum = 0
            self.filePath = '/Users/yuzhuo/myfile/fetchpic/movie/dbtop250.csv'
            # self.filePath = 'dbtop250.csv'

        def getPageData(self):
            # Fetch the current page and advance to the next one.
            # Returns the parsed soup, or None if the request fails.
            site = 'https://movie.douban.com/top250?start=' + str(self.start) + self.param
            self.pageNum = (self.start + 25) // 25
            self.start += 25
            try:
                html = urlopen(site)
            except URLError as e:
                print(e.reason)
                return None
            print("Fetching page " + str(self.pageNum))
            return BeautifulSoup(html, "lxml")

        def getMovie(self):
            while self.start <= 225:
                movieData = self.getPageData()
                if movieData is None:
                    continue  # skip pages that failed to load
                for item in movieData.select('ol.grid_view div.item'):
                    rank = item.select_one('em').text
                    title = item.select_one('div.pic img').attrs['alt']
                    desc = item.select_one('div.info div.bd p').text.replace("\n", "").strip()
                    stars = item.select_one('div.info div.bd span.rating_num').text
                    commentcount = item.select('div.info div.bd div.star span')[3].text
                    self.movieList.append(MovieInfo(rank, title, desc, stars, commentcount))
            return self.movieList

        def writeToFile(self):
            # csv.writer quotes fields, so commas inside a description do not break columns.
            with open(self.filePath, "w", newline="", encoding="utf-8") as fo:
                writer = csv.writer(fo)
                writer.writerow(["Rank", "Title", "Description", "Rating", "Total reviews"])
                for movieInfo in self.movieList:
                    writer.writerow([movieInfo.rank, movieInfo.title, movieInfo.desc,
                                     movieInfo.stars, movieInfo.commentcount])
            print('File written successfully!')

        def main(self):
            self.getMovie()
            self.writeToFile()


    if __name__ == '__main__':
        dbmovie = Movie250()
        dbmovie.main()

Example 4: download the images from website xx

http://www.nphoto.net/news/2012-02/20/b143d88f8f937f69.shtml

Single-threaded image download:

    for imgurl in self.imageList:
        self.download(imgurl)

Multi-threaded image download, using a thread pool from the threadpool package:

    import threadpool

    pool = threadpool.ThreadPool(10)
    requests = threadpool.makeRequests(self.download, self.imageList)
    for req in requests:
        pool.putRequest(req)
    pool.wait()

Appendix: installing bs4

bs4 can be installed with pip (pip install beautifulsoup4 lxml).

Some recommended learning documents and websites:

Liao Xuefeng's tutorial blog: https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000
Python API documentation in Chinese: http://www.runoob.com/python/python-tutorial.html
bs4 official documentation: http://beautifulsoup.readthedocs.io/zh_CN/latest/
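
The thread-pool download in example 4 relies on the third-party threadpool package. As a rough alternative, the same fan-out can be sketched with the standard library's concurrent.futures. This is only an illustration under stated assumptions, not code from the repository: download_all, fake_download and the example.com URLs are hypothetical names, and fake_download stands in for a real download method such as the self.download used above.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(download, urls, workers=10):
    """Call download(url) for every URL on a pool of worker threads.

    Results are returned in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(download, urls))


def fake_download(url):
    # Hypothetical stand-in for a real download function:
    # it only returns the file name at the end of the URL.
    return url.rsplit("/", 1)[-1]


if __name__ == "__main__":
    urls = ["http://example.com/a.jpg", "http://example.com/b.jpg"]
    print(download_all(fake_download, urls))  # ['a.jpg', 'b.jpg']
```

Leaving the with block waits for all submitted work to finish, which plays the role of pool.wait() in the threadpool version.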

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

FeHelper

😍 FeHelper: web front-end assistant (Awesome! Chrome & Firefox extension, all-in-one toolbox!)

Language: JavaScript | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Jackett

API Support for your favorite torrent trackers.

Language: C# | License: GPL-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

lede

Lean's OpenWrt source

Language: C | License: GPL-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

lx-music-desktop

A music application based on Electron

Language: JavaScript | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

OpenWrt-CI

OpenWrt CI: an online continuous-integration environment for automated builds

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

PT-Plugin-Plus

PT Assistant Plus, a browser extension (Web Extensions) for Google Chrome and Firefox, mainly used to help download torrents from PT sites.

Language: Vue | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

remove-bg

A Python API wrapper for removing background using remove.bg's API

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

ruTorrent

Yet another web front-end for rTorrent

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

SourceCodeOfBook

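Companion source code for the book 《Python爬虫开发 从入门到实战》 (Python Crawler Development: From Beginner to Practice).

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0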

TaiChi-Magisk

A Magisk module providing magic power over TaiChi.

Stargazers: 0 | Issues: 0 | Issues: 0

termux-app

Android terminal and Linux environment - app repository.

Language: Java | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

toml

Tom's Obvious, Minimal Language

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

watermark-webinars-janus

A web system for broadcasting HTML5 webinars (not Flash); it consists of yii2 + js/jq and a system core built on janus-gateway (see README)

License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

we-media

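This project covers automatic video upload through the YouTube Data API, automatic download of source material with a crawler, video compositing, watermarking, and more

Language: HTML | Stargazers: 0 | Issues: 0 | Issues: 0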

wenyan-lang

wenyan (Classical Chinese) programming language: a programming language for the ancient Chinese.

Language: JavaScript | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0