```python
import os

# Every file in gs_output is named after a crawled domain, with the dots
# replaced by underscores, so reversing that gives the finished domains.
domains_ok = []
for filename in os.listdir('gs_output'):
    domain_ok = filename.replace('_', '.')
    domains_ok.append(domain_ok)

# Collect the alive URLs whose domain has no output file yet; those still
# need to be crawled.
domains_not_ok = set()
with open('sub_alive.txt', 'r') as f:
    urls = f.read().split()
for url in urls:
    flag = True
    for domain_ok in domains_ok:
        if domain_ok in url:
            flag = False
            break
    if flag:
        domains_not_ok.add(url)

with open('gs_continue.txt', 'w') as f:
    f.write('\n'.join(domains_not_ok))
```
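One caveat with the substring check above: `domain_ok in url` can match unintended hosts (for example, `example.com` would also match `https://notexample.com/`). A minimal sketch of a stricter check, assuming the URLs in sub_alive.txt carry a scheme, could look like this; `needs_crawl` is a hypothetical helper, not part of the original script:

```python
from urllib.parse import urlparse

def needs_crawl(url, done_domains):
    """Return True if no finished domain matches this URL's hostname."""
    host = urlparse(url).hostname or ''  # hostname is None without a scheme
    # Count a URL as done only on an exact hostname match, or when the
    # host is a subdomain of a finished domain.
    return not any(host == d or host.endswith('.' + d) for d in done_domains)
```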
I want to crawl about 2000+ URLs, so I ended up writing this script to produce a file of the URLs that have not been crawled yet, because gospider keeps getting killed.

Could this be improved so it stops getting killed by the system? Rerunning it this way, I have restarted roughly 10 times already and these 2000 URLs are still not finished.

For now I am working around it with systemd: the service auto-restarts, and the list of uncrawled domains is refreshed on each restart (see the sketch below). Looking forward to a code update.
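For reference, the systemd workaround could look roughly like the unit below. This is only a sketch under assumptions: the unit name, the paths, and the wrapper script `make_continue.py` (the Python script above saved to a file) are all hypothetical; `-S` (sites list file) and `-o` (output folder) are gospider's documented flags.

```ini
# /etc/systemd/system/gospider-crawl.service  (hypothetical name and paths)
[Unit]
Description=Resumable gospider crawl over the remaining URLs

[Service]
WorkingDirectory=/opt/crawl
# Rebuild gs_continue.txt from gs_output before every (re)start, so each
# restart only crawls what is still missing.
ExecStartPre=/usr/bin/python3 /opt/crawl/make_continue.py
ExecStart=/usr/bin/gospider -S gs_continue.txt -o gs_output
# Restart whenever gospider exits abnormally, e.g. after an OOM kill.
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```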