A Python approach to reading URLs one by one and scraping data with XPath

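from lxml import etree
import requests

def readalight(address):
    # Fetch the page source and decode it as UTF-8
    # (the timeout keeps one slow site from stalling the whole run)
    html = requests.get(address, timeout=10).content.decode('utf-8')
    # Parse the HTML into an element tree
    dom_tree = etree.HTML(html)
    # XPath matching: body paragraphs and the summary text
    links = dom_tree.xpath('//div[@id="mainCnt"]/p/text()')
    summary = dom_tree.xpath('//p[@class="summary"]/text()')
    # Print the summary first, then each body paragraph wrapped in <p> tags
    for i in summary:
        print(i)
    for i in links:
        print("<p>"+i+"</p>")
    return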

# Read url.txt one line at a time; each line holds one URL
with open('url.txt', 'r', encoding='UTF-8') as f:
    for line in f:
        # Strip the trailing newline so it is not passed along as part of the URL
        url = line.strip()
        if not url:
            continue  # skip blank lines
        print(url)
        readalight(url)

The above is a Python approach to reading URLs one at a time and scraping their data with XPath.
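
In practice a hand-maintained url.txt tends to pick up blank lines, duplicates, and the odd malformed entry. As a rough sketch (not part of the original script), the lines could be filtered before they reach readalight. The helper name clean_urls and the convention that lines starting with "#" are comments are both assumptions made for illustration.

```python
from urllib.parse import urlparse


def clean_urls(lines):
    """Yield unique, well-formed http(s) URLs from an iterable of raw lines.

    Hypothetical helper: treating lines that start with '#' as comments
    is an assumption, not something the original url.txt format defines.
    """
    seen = set()
    for raw in lines:
        url = raw.strip()
        # Skip blank lines and (assumed) comment lines
        if not url or url.startswith('#'):
            continue
        parts = urlparse(url)
        # Keep only entries that have an http(s) scheme and a host
        if parts.scheme not in ('http', 'https') or not parts.netloc:
            continue
        # Drop exact duplicates so a page is not fetched twice
        if url in seen:
            continue
        seen.add(url)
        yield url


if __name__ == '__main__':
    sample = [
        'https://example.com/a\n',
        '\n',
        '# a note\n',
        'https://example.com/a\n',
        'not a url\n',
    ]
    for url in clean_urls(sample):
        print(url)
```

Because an open file is itself an iterable of lines, the file object from open('url.txt', 'r', encoding='UTF-8') could be passed straight to clean_urls, with readalight called on each URL it yields.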
