feat: Fetch with limited concurrency #47

Draft
wants to merge 9 commits into
base: main

YDX-2147483647 commented Oct 9, 2023

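  • fetch_all_sources no longer issues every request at once. Requests are now started a few at a time, with an upper limit on how many run concurrently.
  • The config file gains a new option: fetch: { concurrency: number, sleep: number }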

Resolves #45

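- `fetch_all_sources` no longer issues every request at once. Requests are now started a few at a time, with an upper limit on how many run concurrently.
- The config file gains a new option, `fetch_concurrency: number`.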

Resolves #45
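
For illustration, the idea of starting requests a few at a time can be sketched without any dependency. The diff in this PR imports `p-map`; the `mapWithLimit` helper below is a hypothetical stand-in written only for this example, not the code in the PR.

```typescript
/**
 * Map `items` through `worker`, keeping at most `concurrency` calls in flight.
 * Results are returned in input order. (Hypothetical helper, not from the PR.)
 */
async function mapWithLimit<T, R>(
  items: readonly T[],
  worker: (item: T, index: number) => Promise<R>,
  concurrency: number,
): Promise<R[]> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError('concurrency must be a positive integer')
  }
  const results = new Array<R>(items.length)
  let next = 0 // index of the next item that has not been claimed yet

  // Each lane claims one item at a time and only moves on once it settles.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index] as T, index)
    }
  }

  // Start no more lanes than there are items.
  const laneCount = Math.min(concurrency, items.length)
  await Promise.all(Array.from({ length: laneCount }, () => lane()))
  return results
}
```

With `concurrency` set to 1 this degenerates to a serial run. If any `worker` call rejects, the returned promise rejects as well, so per-source error handling would have to live inside `worker`.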

YDX-2147483647 commented Oct 9, 2023

It still doesn't work well on the server.

$ npm run update-server

> [email protected] update-server
> node dist/examples/server-cli.js

info: Signed in successfully.(proxy)
Fetching notices ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0% | 0/48 | elapsed 0s, ETA 0s
info: Found 48 notice sources. (cli)
Fetching notices ████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 10% | 5/48 | elapsed 2m11s, ETA 9s
error: Request to “特立” (ETIMEDOUT) timed out, po
Fetching notices █████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 12% | 6/48 | elapsed 2m11s, ETA 15m20s
error: Request to “马克思” (ETIMEDOUT) tim
error: Request to “知艺” (ETIMEDOUT) timed out, possibly because of too-frequent requests. Ignoring. (cli)
error: Request to “留学生” (ETIMEDOUT) timed out, possibly because of too-frequent requests. Ignoring. (cli)
error: Request to “北京书院” (ETIMEDOUT) timed out, possibly because of too-frequent requests. Ignoring. (cli)
Fetching notices ███████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 27% | 13/48 | elapsed 2m33s, ETA 8m60s
warn: No notices were fetched from “创新创业”
Fetching notices ███████████████░░░░░░░░░░░░░░░░░░░░░░░░░ 37% | 18/48 | elapsed 4m22s, ETA 3m40s
error: Request to “教学中心” (ETIMEDOUT)
Fetching notices ████████████████░░░░░░░░░░░░░░░░░░░░░░░░ 39% | 19/48 | elapsed 4m22s, ETA 7m5s
error: Request to “计算机” (ETIMEDOUT) timed out
error: Request to “教务部” (ETIMEDOUT) timed out, possibly because of too-frequent requests. Ignoring. (cli)
Fetching notices ██████████████████░░░░░░░░░░░░░░░░░░░░░░ 43% | 21/48 | elapsed 4m45s, ETA 6m10s
error: Request to “光电” (ETIMEDOUT) timed out
Fetching notices ███████████████████░░░░░░░░░░░░░░░░░░░░░ 47% | 23/48 | elapsed 5m28s, ETA 6m30s
error: Request to “网信” (ETIMEDOUT) timed out
Fetching notices ██████████████████████░░░░░░░░░░░░░░░░░░ 54% | 26/48 | elapsed 6m56s, ETA 7m5s
error: Request to “校医院” (ETIMEDOUT) timed out
Fetching notices ███████████████████████░░░░░░░░░░░░░░░░░ 58% | 28/48 | elapsed 7m4s, ETA 5m40s
error: Request to “国际交流” (ETIMEDOUT) tim
Fetching notices ████████████████████████░░░░░░░░░░░░░░░░ 60% | 29/48 | elapsed 7m39s, ETA 5m40s
error: Request to “网安” (ETIMEDOUT) timed out
Fetching notices █████████████████████████░░░░░░░░░░░░░░░ 62% | 30/48 | elapsed 7m39s, ETA 6m35s
error: Request to “信电” (ETIMEDOUT) timed out
error: Request to “新生” (ETIMEDOUT) timed out, possibly because of too-frequent requests. Ignoring. (cli)
Fetching notices ███████████████████████████░░░░░░░░░░░░░ 66% | 32/48 | elapsed 9m5s, ETA 4m55s

Running the requests serially doesn't work either:

$ npm run update-server

> [email protected] update-server
> node dist/examples/server-cli.js

info: Signed in successfully.(proxy)
Fetching notices ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0% | 0/48 | elapsed 0s, ETA 0s
info: Found 48 notice sources. (cli)
Fetching notices ███░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 8% | 4/48 | elapsed 2m57s, ETA 8m20s
error: Request to “经管” (ETIMEDOUT) timed out,
Fetching notices ████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 20% | 10/48 | elapsed 5m11s, ETA 16m55s

Next step

Add a sleep inside fetch_each


YDX-2147483647 commented Oct 26, 2023

Even with a 10-second wait after each fetch, two sources still time out. The same happens with 20 seconds.

@@ -1,3 +1,4 @@
+import { setTimeout } from 'node:timers/promises'
 import pMap from 'p-map'

 import type { HookCollectionType } from '../hooks_type.js'
@@ -28,6 +29,8 @@ export async function fetch_all_sources ({
             // First create a non-hook version.
             async function fetch_each ({ source }: { source: Source }): Promise<{ notices: Notice[] }> {
                 const notices = await source.fetch_notice({ _hook })
+                // Sleep for 10 seconds
+                await setTimeout(10000)
                 return { notices }
             }
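
The hard-coded 10-second pause in the diff could be turned into a parameter, in the spirit of the `sleep` field of the `fetch` config option. This is a minimal sketch under assumptions: the unit is taken to be milliseconds (the PR does not state it), and `sleep` and `withCooldown` are hypothetical names that do not appear in the PR.

```typescript
// Promise-based sleep built on the global timer.
// (The diff uses `setTimeout` from 'node:timers/promises' for the same purpose.)
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

/**
 * Wrap an async task so that it pauses for `sleepMs` after it resolves,
 * before handing back its result. As in the diff, a task that rejects
 * skips the pause and the error propagates immediately.
 */
function withCooldown<A, R>(
  task: (arg: A) => Promise<R>,
  sleepMs: number,
): (arg: A) => Promise<R> {
  return async (arg: A): Promise<R> => {
    const result = await task(arg)
    await sleep(sleepMs) // keeps this slot busy, spacing out the next request
    return result
  }
}
```

Because the pause happens inside the wrapped task, a concurrency-limited runner treats the slot as occupied during the wait, so the next request in that slot starts only after the pause ends.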

Development

Successfully merging this pull request may close these issues.

FetchError: ETIMEDOUT