
Based on the earlier experts' homework solutions, I reworked it on my own.


邹lh

#1 Created at ...
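from html.parser import HTMLParser
from urllib import request


class MyHtmlParser(HTMLParser):

    def __init__(self):
        super().__init__()
        self.da = ''    # field the next text chunk belongs to: 'title', 'time' or 'address'
        self.flag = 0   # 1 while inside <ul class="list-recent-events menu">
        self.res = []   # one dict per event; per instance, so parsers don't share results

    # Opening tags
    def handle_starttag(self, tag, attrs):
        if tag == 'ul':
            for att in attrs:
                # att is a (name, value) tuple; att[1] is None for valueless attributes
                if att[0] == 'class' and att[1] and 'list-recent-events menu' in att[1]:
                    self.flag = 1
        if tag == 'time' and self.flag == 1:
            self.da = 'time'

        if tag == 'span' and self.flag == 1:
            self.da = 'address'

        if tag == 'a' and self.flag == 1:
            self.da = 'title'

    # Closing tags: leave the event list at its </ul>
    def handle_endtag(self, tag):
        if tag == 'ul' and self.flag == 1:
            self.flag = 0

    # Text between tags
    def handle_data(self, data):
        if self.flag == 1 and self.da != '':
            if self.da == 'title':
                # each event title (<a>) starts a new record
                self.res.append({'title': data, 'time': '', 'address': ''})
            elif self.res:
                # time/address belong to the most recent event
                self.res[-1][self.da] = data
            self.da = ''


parser = MyHtmlParser()

with request.urlopen('https://www.python.org/events/python-events/') as f:
    data = f.read().decode('utf-8')

parser.feed(data)
for item in parser.res:
    print('---------------------')
    for k, v in item.items():
        print('%s: %s' % (k, v))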

Question: in "for att in attrs: if 'list-recent-events menu' in att[1]:", why is it att[1]? What does attrs represent?

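attrs is a list whose elements are tuples, so "for att in attrs" hands att one tuple at a time: att[0] is the attribute name and att[1] is the attribute value. For <ul class="list-recent-events menu">, attrs is [('class', 'list-recent-events menu')]. Just print(attrs) yourself and it will make sense quickly.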
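To see this without fetching the page, you can feed one hand-written tag to a throwaway parser. The class name AttrsDemo and the sample tag are made up for illustration; the tag only mimics the ul the code above looks for, plus a valueless "hidden" attribute to show what att[1] looks like in that case.

```python
from html.parser import HTMLParser


class AttrsDemo(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples, in source order
        self.seen.append((tag, attrs))


demo = AttrsDemo()
demo.feed('<ul class="list-recent-events menu" hidden>')
demo.close()

tag, attrs = demo.seen[0]
print(attrs)  # [('class', 'list-recent-events menu'), ('hidden', None)]
for att in attrs:
    # att[0] is the attribute name, att[1] its value (None if none was given)
    print('name:', att[0], '| value:', att[1])
```

Note the None for "hidden": a bare check like 'list-recent-events menu' in att[1] would raise a TypeError on such an attribute, which is why it helps to test att[1] before using "in" on it.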

