Provide an example of crawling images in batch
@pkwenda For batch image crawling, see HuabanImgDemo. Also, for images, videos, and the like, I suggest keeping the approach as consistent as possible with how regular content is crawled: the crawler is only responsible for finding the image or video URLs, and once found, it hands them straight to a dedicated file-download handler. That keeps the design clear.
I've thought your idea over carefully, and I think it works.
```java
URLConnection urlConnection = destUrl.openConnection();
InputStream inputStream = urlConnection.getInputStream();
```
A stream obtained through a plain URL like this carries no content-type, so the MIME-type suffix can't be matched automatically, which defeats the goal of automation; it also conflicts with the earlier Setting. I'll refactor Task and provide a newRequest function backed by HttpClient, so that your dedicated file-download handler can correctly build requests from a URL internally.
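The project's own HttpClient and the proposed newRequest function aren't shown in this thread, but the idea can be sketched with the JDK's built-in `java.net.http.HttpClient` (Java 11+): unlike a bare `URLConnection` stream, the response object exposes headers, so the `Content-Type` can drive the saved file's extension. The URL and the MIME-to-extension table here are illustrative assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ContentTypeProbe {
    // Map a MIME type (possibly carrying parameters such as "; charset=...")
    // to a file extension. Unrecognized types fall back to ".bin".
    static String extensionFor(String contentType) {
        switch (contentType.split(";")[0].trim().toLowerCase()) {
            case "image/jpeg": return ".jpg";
            case "image/png":  return ".png";
            case "image/gif":  return ".gif";
            default:           return ".bin";
        }
    }

    // Fetch a resource and report the extension implied by its Content-Type.
    // (Not called from main, to keep the example runnable offline.)
    static String probe(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url)) // hypothetical image URL
                .build();
        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        String contentType = response.headers()
                .firstValue("Content-Type")
                .orElse("application/octet-stream");
        return extensionFor(contentType);
    }

    public static void main(String[] args) {
        System.out.println(extensionFor("image/jpeg; charset=utf-8"));
    }
}
```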
```java
if (destPath.endsWith("/")) {
    destPath = destPath.substring(0, destPath.length() - 1);
}
```

```java
byte[] buffer = new byte[1024];
```
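Put together, the two fragments above amount to a buffered stream copy. A minimal sketch of the full download step, assuming `destUrl`, `destPath`, and `fileName` are supplied by the caller as in the snippets:

```java
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

public class StreamDownload {
    // Copy the remote stream to destPath/fileName with a fixed-size buffer.
    public static void download(URL destUrl, String destPath, String fileName)
            throws Exception {
        // Normalize a trailing slash so the joined path has exactly one "/".
        if (destPath.endsWith("/")) {
            destPath = destPath.substring(0, destPath.length() - 1);
        }
        try (InputStream in = destUrl.openConnection().getInputStream();
             OutputStream out = new FileOutputStream(destPath + "/" + fileName)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}
```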
I'll take care of this part. Without breaking your existing functionality, I'll extract whatever can reasonably be extracted, since it's likely to be needed in many places.
From now on, let's discuss download issues in #33. And for all future questions, I think we should discuss things on GitHub as much as possible; with several people collaborating, that helps us avoid a feature forking in different directions early on. @wangtonghe @trto1987 @biezhi cheer up 😄
@pkwenda Sounds good, just saw this. Yesterday I also found that downloading directly via URL can't determine the file type, so for now I worked around it by inspecting the first bytes of the stream to detect the type, roughly along those lines. It doesn't feel very elegant, though. Since you're providing one, I'll just use yours. Let's discuss further.
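The workaround described above, detecting the type from the leading bytes of the stream, might look something like this sketch (the exact helper and format list here are assumptions, not the thread's actual code). The signatures checked are the well-known JPEG, PNG, and GIF magic numbers:

```java
public class MagicBytes {
    // Guess an image type from the first bytes of the data ("magic numbers").
    public static String sniff(byte[] head) {
        if (head.length >= 2
                && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xD8) {
            return "jpg";                 // JPEG starts with FF D8
        }
        if (head.length >= 4
                && (head[0] & 0xFF) == 0x89 && head[1] == 'P'
                && head[2] == 'N' && head[3] == 'G') {
            return "png";                 // PNG starts with 89 50 4E 47
        }
        if (head.length >= 3
                && head[0] == 'G' && head[1] == 'I' && head[2] == 'F') {
            return "gif";                 // GIF starts with 47 49 46
        }
        return "unknown";
    }
}
```

Compared with reading the Content-Type header, this needs no cooperation from the server, but every supported format has to be listed by hand.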