Python Sitemap - 八零岁月

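Python Sitemap

The script below starts from the home page of www.80sy.com, recursively follows every link that points back to the site, and at the end prints each collected URL together with the total count.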

from urllib.request import urlopen
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://www.80sy.com"
pages = set()  # every link collected so far

def getLinks(pageUrl):
    # Fetch the page that was passed in; urljoin turns the initial "" into BASE_URL
    html = urlopen(urljoin(BASE_URL, pageUrl))
    bsObj = BeautifulSoup(html, features="html.parser")
    for link in bsObj.find_all("a", href=True):
        href = link.attrs['href']
        # Follow only links on this site that have not been seen yet
        if '80sy' in href and href not in pages:
            print(href)
            pages.add(href)  # record before recursing so no page is crawled twice
            getLinks(href)

getLinks("")
for i in pages:
    print(i)
print(len(pages))
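A recursive crawler like the one above can hit Python's default recursion limit (1000 frames) on a large site, and its `'80sy' in href` test both skips relative links and accepts any URL that merely contains that string. Below is a minimal sketch of an iterative, breadth-first alternative that uses only the standard library. It is one possible variant, not the author's method: `crawl`, `LinkCollector` and `fetch_with_urllib` are hypothetical names, the page fetcher is passed in as a function so it can be swapped or stubbed, pages are assumed to be UTF-8, and only links on the same host as the start URL are kept.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Set
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_with_urllib(url: str) -> str:
    # Assumption: pages are UTF-8; undecodable bytes are replaced, not fatal
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 1000) -> Set[str]:
    """Breadth-first crawl restricted to the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}            # URLs already queued or visited
    queue = deque([start_url])    # FIFO queue gives breadth-first order
    while queue:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:           # URLError and HTTPError are OSError subclasses
            continue              # skip pages that fail to load
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop "#fragment" so each page appears once
            absolute, _ = urldefrag(urljoin(url, href))
            parsed = urlparse(absolute)
            if (parsed.scheme in ("http", "https")
                    and parsed.netloc == host
                    and absolute not in seen
                    and len(seen) < max_pages):
                seen.add(absolute)
                queue.append(absolute)
    return seen


# Usage (needs network access):
# pages = crawl("https://www.80sy.com/", fetch_with_urllib)
# for url in sorted(pages):
#     print(url)
# print(len(pages))
```

Passing the home page with a trailing slash (for example `https://www.80sy.com/`) keeps it from being recorded twice, once with and once without the slash. The `max_pages` cap is an arbitrary safety limit, not something the original script had.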
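The script only prints the collected links. If the goal is a sitemap file that search engines can read, one option is the sitemaps.org XML format, in which each page becomes a `<url><loc>URL</loc></url>` entry inside a `<urlset>` root element. The sketch below assumes the URLs are already absolute and leaves out optional tags such as `<lastmod>`; `build_sitemap` is a hypothetical helper, not part of the original post.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str]) -> str:
    """Return a sitemaps.org <urlset> document with one <url><loc> per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):                # de-duplicate and give a stable order
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url   # ElementTree escapes & and <
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Usage, e.g. after the crawl has filled `pages`:
# with open("sitemap.xml", "w", encoding="utf-8") as f:
#     f.write(build_sitemap(pages))
```

The protocol caps a single sitemap file at 50,000 URLs, so a very large crawl would need to be split across several files.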

When reposting this article, please credit the source: 八零岁月 » Python Sitemap
