Chapter 14 Zhihu crawler: does it look up following/follower relations at one level or at N levels? #119

Open
yjshi2015 opened this issue Oct 30, 2019 · 0 comments

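In the crawler's parsing module (zhihu_com.py), the code has already obtained the current user's following/followers. It then uses each of those following/followers as a new entry point and queries their following/followers in turn. That makes it an N-level relationship lookup, not a lookup of the currently logged-in user's own relations.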

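    # Parse the basic profile information of the currently logged-in user
    def parse_user_info(self,response):
        ......

    # Parse the people who have a relation with me
    def parse_relation(self,response):
        ......
        # Profile pages of the following/followers
        relations_url = response.xpath("//*[@class='zh-general-list clearfix']/div/a/@href").extract()
        ......
        # !!!!!!!!!!!!!! This is the part I have doubts about !!!!!!!!!!!!!!
        # Here we re-enter each following/follower's profile page and parse their
        # relations again. Iterating like this ends up crawling my friends,
        # my friends' friends, and so on......
        # If the goal is to crawl only my own friends, this loop should not be here
        for url in relations_url:
            yield Request(response.urljoin(url=url),
                          meta={'cookiejar': response.meta['cookiejar']},
                          callback=self.parse_user_info,
                          errback=self.parse_err)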

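So the logic of this demo queries N-level relationships.
Is my understanding wrong, or is the demo really meant to query N-level relationships? Could the author please clarify? @qiyeboy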
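To make the difference concrete, here is a minimal sketch in plain Python (no Scrapy, no network) that contrasts a one-level lookup with an unbounded one. The `RELATIONS` graph, the user names, and the `crawl` function are all hypothetical stand-ins for the relation pages the spider would fetch; this is not the book's code, only an illustration of what the extra loop does.

```python
from collections import deque

# Hypothetical follow graph: user -> users linked from that user's relation page.
RELATIONS = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["me", "dave"],
    "carol": [],
    "dave": ["erin"],
    "erin": [],
}


def crawl(start, max_depth=None):
    """Breadth-first walk from `start`.

    max_depth=1 visits only direct relations; max_depth=None keeps
    following links until no unseen user is left (the N-level case).
    """
    seen = {start}               # users already queued, to avoid loops
    queue = deque([(start, 0)])  # (user, distance from start)
    visited = []                 # everyone reached, excluding start
    while queue:
        user, depth = queue.popleft()
        if depth > 0:
            visited.append(user)
        # Stop expanding once the depth limit is reached.
        if max_depth is not None and depth >= max_depth:
            continue
        for friend in RELATIONS.get(user, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return visited


direct_only = crawl("me", max_depth=1)  # my friends only
everyone = crawl("me")                  # friends, friends of friends, ...
```

With the depth limit, only the users linked directly from "me" are visited; without it, the walk reaches every user connected to "me", which is what the loop in `parse_relation` appears to do. In the real spider, a similar bound could probably be had from Scrapy's `DEPTH_LIMIT` setting or by checking `response.meta.get('depth')` before yielding new requests, though I have not verified either against this demo.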
