Provide an example of crawling images #29

Closed
nonacosa opened this issue May 11, 2017 · 3 comments

@nonacosa
Collaborator

Provide an example of crawling images in bulk.

@nonacosa nonacosa self-assigned this May 11, 2017
@wangtonghe
Member

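@pkwenda For bulk image crawling, see HuabanImgDemo. Also, for crawling images, videos, and similar files, I suggest keeping the approach as consistent as possible with how page content is crawled. The crawl step is only responsible for finding the image or video URLs; once found, they are handed straight to a dedicated file download processor. That also keeps the design clear.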
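
The split described above could be sketched roughly as follows. This is only an illustration, not the project's code: FileDownloadHandler and ImageUrlFinder are hypothetical names, and the regular expression is a simplification (a real crawler would more likely use an HTML parser).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical interface: the dedicated processor that receives media URLs.
interface FileDownloadHandler {
    void download(String fileUrl);
}

// Hypothetical crawl step: it only finds image URLs and never fetches the files.
class ImageUrlFinder {
    // Simplistic pattern for <img ... src="...">; an assumption for this sketch.
    private static final Pattern IMG_SRC =
            Pattern.compile("<img[^>]*\\ssrc=[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Collect the src value of every <img> tag found in the HTML.
    static List<String> findImageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    // Hand each discovered URL to the handler; downloading is not done here.
    static void dispatch(String html, FileDownloadHandler handler) {
        for (String url : findImageUrls(html)) {
            handler.download(url);
        }
    }
}
```

With this shape, the crawler can be tested without any network access by passing a handler that just records the URLs it receives.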

@nonacosa
Collaborator Author

nonacosa commented May 12, 2017

I thought your idea over carefully, and I think it works.

URLConnection urlConnection = destUrl.openConnection();
InputStream inputStream = urlConnection.getInputStream();

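A stream obtained through URL this way has no content-type, so the file extension cannot be matched automatically from the MIME type, which defeats the goal of automation. It also conflicts with the earlier Setting. On my side I will refactor Task and provide a newRequest function that relies on HttpClient, so that your dedicated file download processor can correctly build requests from a URL internally.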
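
To illustrate what matching an extension from the MIME type might look like, here is a minimal sketch. It is not the code linked in this thread: MimeExtensions and extensionFor are hypothetical names, and the lookup table covers only a few common image types as an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a Content-Type header value to a file extension.
class MimeExtensions {
    // Small illustrative table; a real one would cover many more types.
    private static final Map<String, String> EXT = new HashMap<>();
    static {
        EXT.put("image/jpeg", ".jpg");
        EXT.put("image/png", ".png");
        EXT.put("image/gif", ".gif");
        EXT.put("image/webp", ".webp");
    }

    // Returns the extension for a header such as "image/png; charset=binary",
    // or an empty string when the type is missing or unknown.
    static String extensionFor(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semi = contentType.indexOf(';');
        String mime = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim().toLowerCase(Locale.ROOT);
        String ext = EXT.get(mime);
        return ext == null ? "" : ext;
    }
}
```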

  • I put the code that automatically determines the MIME type here
if (destPath.endsWith("/")) {
    destPath = destPath.substring(0, destPath.length() - 1);
}
byte[] buffer = new byte[1024];

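I'll take care of this part. I will pull out whatever I think can be pulled out, without affecting your functionality, because it will probably be needed in many places.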
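
One possible shape for such an extracted helper, based only on the two fragments quoted above (the trailing-slash trim and the 1024-byte buffer), is sketched here. DownloadUtil and its method names are hypothetical and not taken from the project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical shared utility holding the pieces that several downloaders could reuse.
class DownloadUtil {
    // Remove a single trailing "/" so that a file name can be appended consistently.
    static String trimTrailingSlash(String destPath) {
        if (destPath.endsWith("/")) {
            return destPath.substring(0, destPath.length() - 1);
        }
        return destPath;
    }

    // Copy the whole stream through a 1024-byte buffer and return the byte count.
    // The caller remains responsible for closing both streams.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```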

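From now on, let's discuss download issues in #33. The same goes for any future questions: I think we should discuss them on GitHub as much as possible, since several people are collaborating and this keeps a feature from diverging early on.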
@wangtonghe @trto1987 @biezhi
cheer up 😄

@wangtonghe
Member

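@pkwenda OK. I only just saw this. Yesterday I also found that downloading directly from the URL does not give the file type directly, so as a temporary measure I determined the file type by checking the first bytes of the stream. The handling is along these lines. It doesn't feel very elegant, though. Since you are providing a solution, I'll just use yours. We can discuss further later.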
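
For readers unfamiliar with the leading-bytes approach mentioned above, a minimal sketch follows. It is not the code referred to in this comment: MagicBytes and detectExtension are hypothetical names, and only the well-known JPEG, PNG, and GIF signatures are checked.

```java
// Hypothetical helper: guesses an image extension from the first bytes of a file.
class MagicBytes {
    // Returns ".jpg", ".png", or ".gif" when the signature matches, else "".
    static String detectExtension(byte[] head) {
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) {
            return ".jpg"; // JPEG start-of-image marker
        }
        if (startsWith(head, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return ".png"; // 8-byte PNG signature
        }
        if (startsWith(head, 0x47, 0x49, 0x46, 0x38)) {
            return ".gif"; // ASCII "GIF8" (GIF87a or GIF89a)
        }
        return "";
    }

    // Compare unsigned byte values against the expected signature.
    private static boolean startsWith(byte[] data, int... signature) {
        if (data == null || data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off is that every supported format needs its own signature entry, which may be part of why this approach feels less elegant than reading the type from the response headers.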
