
OK/NG classnet issue: #35

Open
2 tasks done
lpj0822 opened this issue Jan 11, 2021 · 31 comments
Assignees
Labels
H High Priority

Comments

@lpj0822
Collaborator

lpj0822 commented Jan 11, 2021

  • 1. Create a separate task
  • 2. ROC curve evaluation and threshold selection
@lpj0822
Collaborator Author

lpj0822 commented Jan 24, 2021

The code has been added.

@lpj0822 lpj0822 changed the title from "OK/NG issue:" to "OK/NG classnet issue:" Feb 23, 2021
@vitahlin vitahlin added the M Middle Priority label Mar 14, 2021
@lpj0822
Collaborator Author

lpj0822 commented Oct 19, 2021

Testing.

@lpj0822 lpj0822 added H High Priority and removed M Middle Priority labels Oct 24, 2021
@foww-0001
Collaborator

ModuleNotFoundError: No module named 'imgaug'
Failed to start easy_ai

Add imgaug to the requirements.

@foww-0001
Collaborator

foww-0001 commented Oct 26, 2021

Running in the Docker workspace fails with:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/easy_data/easy_ai/easy_tools/easy_ai.py", line 12, in <module>
    from easy_tools.model_train.ai_train import EasyAiModelTrain
  File "/easy_data/easy_ai/easy_tools/model_train/ai_train.py", line 9, in <module>
    from easyai.train_task import TrainTask
  File "/easy_data/easy_ai/easyai/train_task.py", line 11, in <module>
    from easyai.tasks.utility.task_registry import REGISTERED_TRAIN_TASK
  File "/easy_data/easy_ai/easyai/tasks/__init__.py", line 1, in <module>
    from . import cls
  File "/easy_data/easy_ai/easyai/tasks/cls/__init__.py", line 3, in <module>
    from . import classify
  File "/easy_data/easy_ai/easyai/tasks/cls/classify.py", line 7, in <module>
    from easyai.tasks.utility.base_inference import BaseInference
  File "/easy_data/easy_ai/easyai/tasks/utility/base_inference.py", line 18, in <module>
    from easyai.torch_utility.torch_model_process import TorchModelProcess
  File "/easy_data/easy_ai/easyai/torch_utility/torch_model_process.py", line 11, in <module>
    from easyai.model.utility.model_factory import ModelFactory, ModelWeightInit
  File "/easy_data/easy_ai/easyai/model/__init__.py", line 1, in <module>
    from . import cls
  File "/easy_data/easy_ai/easyai/model/cls/__init__.py", line 1, in <module>
    from . import binary_classnet
  File "/easy_data/easy_ai/easyai/model/cls/binary_classnet.py", line 10, in <module>
    from easyai.model_block.base_block.common.utility_layer import FcLayer
  File "/easy_data/easy_ai/easyai/model_block/__init__.py", line 1, in <module>
    from . import backbone
  File "/easy_data/easy_ai/easyai/model_block/backbone/__init__.py", line 1, in <module>
    from . import cls
  File "/easy_data/easy_ai/easyai/model_block/backbone/cls/__init__.py", line 1, in <module>
    from . import resnet
  File "/easy_data/easy_ai/easyai/model_block/backbone/cls/resnet.py", line 9, in <module>
    from easyai.model_block.base_block.common.residual_block import ResidualBlock
  File "/easy_data/easy_ai/easyai/model_block/base_block/common/residual_block.py", line 5, in <module>
    from torchvision.ops import DeformConv2d
ModuleNotFoundError: No module named 'torchvision.ops'
Failed to start easy_ai

The installed torchvision is too old. Use torch==1.6.0 with torchvision==0.7.0.
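Both import failures so far (`imgaug` missing, `torchvision.ops` missing) only surface once easy_ai starts importing its modules. A small preflight check could report them up front. This is a hypothetical helper, not part of easy_ai, and the `REQUIRED` list is an assumption built from the two errors in this thread.

```python
import importlib
from typing import List, Optional, Sequence, Tuple

# Hypothetical requirement list built from the two import errors in this thread:
# (module to import, attribute that must exist on it, or None).
REQUIRED: Sequence[Tuple[str, Optional[str]]] = (
    ("imgaug", None),
    ("torchvision.ops", "DeformConv2d"),
)


def missing_dependencies(
    required: Sequence[Tuple[str, Optional[str]]] = REQUIRED,
) -> List[str]:
    """Return one message per requirement that cannot be satisfied."""
    problems: List[str] = []
    for module_name, attr in required:
        try:
            module = importlib.import_module(module_name)
        except Exception as exc:  # ModuleNotFoundError, or e.g. RuntimeError from a broken install
            problems.append(f"{module_name}: {exc}")
            continue
        if attr is not None and not hasattr(module, attr):
            problems.append(f"{module_name}.{attr}: attribute not found")
    return problems


if __name__ == "__main__":
    for problem in missing_dependencies():
        print("missing:", problem)
```

It could be run inside the Docker workspace before starting easy_ai, so that an outdated or missing package is reported by name instead of through a long traceback.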

@lpj0822
Collaborator Author

lpj0822 commented Oct 27, 2021

ROC curve evaluation and threshold selection have not been tested yet.

@foww-0001
Collaborator

The OK_NG script has been uploaded. Run it with the following command:

./easy_tools/train_scripts/NG_OK.sh /easy_data/dataset/blt_classify/ImageSets/train.txt /easy_data/dataset/blt_classify/ImageSets/val.txt

@foww-0001
Collaborator

foww-0001 commented Oct 27, 2021

Running the script:

python3 easyai/inference_task.py -t classify -i /easy_data/dataset/blt_classify/ImageSets/val.txt -m binarynet -w .easy_log/snapshot/cls_best.pt -c .easy_log/config/classify_config.json 

fails with:

2021-10-27 02:21:57,648 ERROR   [inference_task.py, 39] Traceback (most recent call last):
  File "easyai/inference_task.py", line 37, in infer
    task.process(self.input_path, self.data_type, self.is_show)
  File "/easy_data/easy_ai/easyai/tasks/cls/classify.py", line 29, in process
    class_index, class_confidence = self.single_image_process(batch_data)
  File "/easy_data/easy_ai/easyai/tasks/cls/classify.py", line 49, in single_image_process
    class_index, class_confidence = self.result_process.post_process(prediction)
  File "/easy_data/easy_ai/easyai/tasks/cls/classify_result_process.py", line 19, in post_process
    class_index, class_confidence = self.process_func(prediction)
  File "/easy_data/easy_ai/easyai/tasks/cls/post_process/max_post_process.py", line 19, in __call__
    assert output_count > 1
AssertionError

2021-10-27 02:21:57,648 ERROR   [inference_task.py, 40] 
2021-10-27 02:21:57,651 INFO    [inference_task.py, 51] Inference process end!

@lpj0822
Collaborator Author

lpj0822 commented Oct 27, 2021

This requires configuring the post-processing.

@lpj0822
Collaborator Author

lpj0822 commented Oct 27, 2021

You need to use a config file.

@foww-0001
Copy link
Collaborator

"post_process": {
"type": "BinaryPostProcess"
},
That is, set the post_process type to BinaryPostProcess.
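The earlier AssertionError comes from `max_post_process.py` asserting `output_count > 1`: an argmax-style post-process expects one output per class, while a binary network presumably emits a single output. The plain-Python sketch below contrasts the two. It assumes the binary head produces one logit that goes through a sigmoid and is compared with a 0.5 threshold; the function names and the (class index, score) return convention are illustrative assumptions, not easyai's actual `BinaryPostProcess`.

```python
import math
from typing import Sequence, Tuple


def max_post_process(logits: Sequence[float]) -> Tuple[int, float]:
    """Multi-class case: softmax over the outputs, then argmax."""
    # Mirrors the `output_count > 1` assertion seen in the traceback.
    assert len(logits) > 1, "argmax post-processing needs one output per class"
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def binary_post_process(logits: Sequence[float], threshold: float = 0.5) -> Tuple[int, float]:
    """Binary case: one logit -> sigmoid -> compare with a threshold.

    Returns (predicted class, positive-class probability). Keeping the
    positive-class score, rather than the winning-class score, is what an
    ROC threshold sweep needs.
    """
    assert len(logits) == 1, "binary post-processing expects a single output"
    x = logits[0]
    # Numerically stable sigmoid: avoids overflow in math.exp for large |x|.
    if x >= 0:
        prob = 1.0 / (1.0 + math.exp(-x))
    else:
        z = math.exp(x)
        prob = z / (1.0 + z)
    return (1 if prob >= threshold else 0), prob
```

Exposing `threshold` as a parameter is what would let a value picked from the ROC curve replace the default 0.5.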

@lpj0822
Collaborator Author

lpj0822 commented Oct 27, 2021

ai_train.py contains the network name binarynet. Create a new config for it, modeled on the classnet config file.

@lpj0822
Collaborator Author

lpj0822 commented Oct 27, 2021

The task name is in there as well.

@lpj0822
Copy link
Collaborator Author

lpj0822 commented Oct 27, 2021

ai_train.py is under easy_tools; its arguments are passed the same way as for train_task.

@lpj0822
Copy link
Collaborator Author

lpj0822 commented Oct 27, 2021

The code in easy_tools is simple; you can take a look yourself.

@foww-0001
Copy link
Collaborator

Okay. Do I need to generate a new config file?

@lpj0822
Copy link
Collaborator Author

lpj0822 commented Oct 27, 2021

Each network gets its own config file, and each config file should set as many options explicitly as possible.

@foww-0001
Copy link
Collaborator

Where is the code that actually runs the ROC evaluation?

@lpj0822
Copy link
Collaborator Author

lpj0822 commented Oct 27, 2021

In easyai/tools/eval_tool/evaluation_show.py.

@foww-0001
Copy link
Collaborator

Running:

python3 easyai/tools/eval_tool/evaluation_show.py -t classify -i /easy_data/dataset/blt_classify/ImageSets/val.txt -r .easy_log/classify_result.txt -c .easy_log/config/classify_config.json

fails with the following error:

start...
Traceback (most recent call last):
  File "easyai/tools/eval_tool/evaluation_show.py", line 72, in <module>
    main()
  File "easyai/tools/eval_tool/evaluation_show.py", line 67, in main
    test.roc_show()
  File "easyai/tools/eval_tool/evaluation_show.py", line 31, in roc_show
    class_roc_dict = classify_roc.eval(self.result_path, self.target_path)
  File "/easy_data/easy_ai/easyai/evaluation/utility/classify_pr.py", line 19, in eval
    result_data_list = self.get_result(result_path)
  File "/easy_data/easy_ai/easyai/evaluation/utility/classify_pr.py", line 73, in get_result
    class_index = int(split_datas[1])
ValueError: invalid literal for int() with base 10: '[1]'

@foww-0001
Copy link
Collaborator

The class index in the output results is wrapped in square brackets:
001NG_1163.png [1] 0.55357
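`classify_pr.py` calls `int(split_datas[1])`, so a bracketed index such as `[1]` cannot be parsed. The thread does not show how this was fixed (the writer may simply have stopped emitting the brackets). As an illustration only, a tolerant reader for lines of the form `<image> <class_index> <confidence>` could look like this; `parse_result_line` is a hypothetical helper.

```python
from typing import Tuple


def parse_result_line(line: str) -> Tuple[str, int, float]:
    """Parse '<image_name> <class_index> <confidence>'.

    The class index may be written as '1' or as '[1]'. Splitting from the
    right keeps image names that contain spaces intact.
    """
    parts = line.strip().rsplit(None, 2)
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    image_name, raw_index, raw_confidence = parts
    class_index = int(raw_index.strip("[]"))  # '[1]' -> '1' -> 1
    return image_name, class_index, float(raw_confidence)
```

Fixing the writer so that it never emits brackets is the cleaner option; a tolerant reader only helps when old result files still have to be evaluated.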

@lpj0822
Copy link
Collaborator Author

lpj0822 commented Oct 27, 2021

Fixed.

@MiniBullLab
Copy link
Owner

(Image: Figure_1-1)
The results are out, but they look wrong; this needs further investigation.

@foww-0001
Copy link
Collaborator

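Every training run ends with the same accuracy, prec1: 60.268. The log shows that the model parameters are not frozen (they are being updated).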

@foww-0001
Copy link
Collaborator

foww-0001 commented Oct 28, 2021

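So far the problem does not appear to be related to the model parameters or the config parameters.
classnet reaches 100% training accuracy,
whereas the binarynet loss does not converge.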

@foww-0001
Copy link
Collaborator

I still cannot pinpoint the problem, but the overall pipeline now runs end to end.

@lpj0822
Copy link
Collaborator Author

lpj0822 commented Oct 29, 2021

Let's look at it together.

@MiniBullLab
Copy link
Owner

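When training a neural network with PyTorch, the training loss sometimes does not go down. There are many possible causes, such as a learning rate that is too small or input data that has not been normalized. Besides these common causes, there is one that is hard to spot: the dimensions of the prediction and target tensors do not match when the loss is computed.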
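As an illustration of that mismatch, assume the network outputs shape `(N, 1)` while the targets have shape `(N,)`. Under NumPy/PyTorch broadcasting rules the two combine into `(N, N)`, so every prediction is compared with every target instead of only its own; PyTorch's `nn.MSELoss`, for example, only emits a warning in this case rather than raising an error. The plain-Python sketch below uses a squared-error loss purely for simplicity (the thread does not show what `ce2d_loss` computes internally). It reproduces the broadcasting rule and shows that, once broadcast, even perfect predictions yield a non-zero loss.

```python
from itertools import zip_longest
from typing import List, Sequence, Tuple


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """Result shape of an elementwise op under NumPy/PyTorch broadcasting rules."""
    out: List[int] = []
    # Compare trailing dimensions first; a missing dimension behaves like size 1.
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError(f"shapes {tuple(a)} and {tuple(b)} are not broadcastable")
    return tuple(reversed(out))


def mse_aligned(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Correct pairing: prediction i is compared only with target i."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def mse_broadcast(preds: Sequence[float], targets: Sequence[float]) -> float:
    """What (N, 1) predictions vs (N,) targets compute: an N x N grid of pairs."""
    return sum((p - t) ** 2 for p in preds for t in targets) / (len(preds) * len(targets))


if __name__ == "__main__":
    print(broadcast_shape((4, 1), (4,)))  # (4, 4)
    preds = [1.0, 0.0, 1.0]  # perfect predictions
    targets = [1.0, 0.0, 1.0]
    print(mse_aligned(preds, targets))  # 0.0
    print(mse_broadcast(preds, targets))  # about 0.444, so the loss can never reach 0
```

Because the broadcast loss pulls each prediction toward all targets at once, the optimizer has no way to fit individual samples, which matches the symptom of a loss that does not converge.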

@MiniBullLab
Copy link
Owner

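After adding these two lines to ce2d_loss, training works normally:

# Align the target shape with the network output, e.g. (N,) -> (N, 1)
if targets.dim() < input_data.dim():
    targets = targets.unsqueeze(1)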

The remaining question is whether this fix belongs in the dataloader or in the loss.

@MiniBullLab
Copy link
Owner

(Image: ROC result plot)

Projects
None yet
Development

No branches or pull requests

4 participants