reproduce your one/few-shot results on nas-bench 201 #1
Comments
Good catch, and thank you for pointing it out. Actually, the only difference between "FEW-SHOT-SUPERNET.config" and "ONE-SHOT-SUPERNET.config" is the number of training epochs, and this script is only used for evaluation, so using FEW-SHOT-SUPERNET.config should work as desired. But to eliminate the confusion, I will change this line to ONE-SHOT-SUPERNET.config later. Thank you again for your attention.
I have changed FEW-SHOT-SUPERNET.config to ONE-SHOT-SUPERNET.config; it was a typo. Thanks again for pointing it out, and you can now run this script directly. If you have further questions, please let me know.
@aoiang can you please list the exact steps to reproduce the results? Can you please run your code on your end to ensure everything is okay with those steps? After you have done that, please update this thread.
Thanks for your kind reply! The current test-geno is Structure(4 nodes with |avg_pool_3x3 However, I do NOT find the Kendall tau in the output. Could you provide the steps to reproduce your ranking-accuracy result? As far as I know, evaluating the supernet is just a loop that obtains the proxy accuracy of each architecture in the supernet and then computes the rank correlation between the proxy accuracies and the ground-truth accuracies. Another question: I cannot understand why the eval script should be run 4 times for the one-shot supernet (and not in parallel for speedup)?
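The ranking check described in this question boils down to a Kendall's Tau rank correlation between the proxy-accuracy list and the ground-truth list. A minimal self-contained sketch (toy accuracy values, not from the repo; the real pipeline reads them from the evaluated supernet and the NAS-Bench-201 tables):

```python
def kendall_tau(x, y):
    """Naive O(n^2) Kendall rank correlation between two equal-length score lists."""
    assert len(x) == len(y)
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1  # the pair is ordered the same way in both lists
            elif s < 0:
                discordant += 1  # the pair is ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)

# proxy accuracies from the supernet vs. ground-truth accuracies (toy values)
proxy = [0.61, 0.55, 0.72, 0.40]
gt = [0.90, 0.88, 0.93, 0.70]
print(kendall_tau(proxy, gt))  # 1.0: the proxy ranking matches the GT ranking exactly
```

A tau near 1 means the supernet ranks architectures almost exactly as the ground truth does; near 0 means the ranking is uninformative.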
@aoiang I need you to list ALL your specific steps to answer the above question!
Sure, I will double-check my code on my side and write down every specific step for addressing this issue and reproducing the results. Thank you for your kind reminder.
Thank you for asking. Before answering your questions, I would like to list the specific steps to run our few-shot NAS on NAS-Bench-201.
Back to your questions. First, I would like to say that the output printed to the screen was as desired; in other words, the script was running properly. For the first question (Kendall tau): you can follow the steps I list above, and after running step 6 the Kendall tau will be printed to your screen.

The second question is about the eval script. The total number of architectures in NAS-Bench-201 is 15625, with 5 different operator types: skip_connection, average pooling, convolution 3x3, convolution 1x1, and none. I converted all architectures to a JSON file and then split it into 5 files based on the first operator (the files are located in few-shot-NAS/Few-Shot_NasBench201/search_pool/). I did this because it is convenient for training the few-shot models; each file therefore contains 3125 architectures. Back to the evaluation script: we need to evaluate all 15625 architectures in NAS-Bench-201 to get the proxy accuracies, and we run the eval script 5 times because the architectures to be evaluated come from the 5 files mentioned above (you can open them under few-shot-NAS/Few-Shot_NasBench201/search_pool/ for the details). In other words, each run evaluates 3125 architectures, and we run it 5 times to cover all 15625. I am sorry for the confusion with the eval script, and thank you again for asking.
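The split described above can be sketched as follows. OPS is the NAS-Bench-201 operation set; the grouping key (operation on the first edge) is taken from the comment above, while the file names/format of search_pool/ are not reproduced here:

```python
from itertools import product

# The five candidate operations on each of the six edges of a NAS-Bench-201 cell.
OPS = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]

# One architecture = one operation choice per edge, so 5**6 = 15625 in total.
archs = list(product(OPS, repeat=6))

# Group the pool by the operation on the first edge, mirroring the five
# files under Few-Shot_NasBench201/search_pool/.
pools = {op: [a for a in archs if a[0] == op] for op in OPS}

assert len(archs) == 15625
assert all(len(group) == 3125 for group in pools.values())  # 5 files x 3125 archs
```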
Thanks!
Thanks for your effort to replicate our results. We're looking forward to them.
I cannot reproduce your few-shot results. I ran the few-shot scripts 5 times following your README. In each training script I see:
FLOP = 61.52 M, Params = 1.42 MB
[Search the 000-150-th epoch] Time Left: [00:00:00], LR=0.025
so the few-shot supernet is inherited from the one-shot supernet as expected. After running, I obtain these files:
/home/checkpoint_few_shot_nas/few-shot
drwxrwxrwx 2 root root 4096 Jul 15 02:50 checkpoint
Then I ran few-shot/eval.sh and got the file few-shot-supernet. Finally, I ran rank.sh and got a Kendall tau of 0.54295. I ran the code with PyTorch 1.6.0 and CUDA 10.1.
Thank you for reproducing the few-shot model. We will double-check our code to see if there is any difference between our implementations. Once we finish the check on our side, I will let you know here.
@Jxu-Thu thank you. Did you run each method only once? Yiyang is investigating, and we'd like to confirm this first. Thank you.
@linnanwang Yeah, I only ran the experiments once.
I have a question. Previous work needs to re-tune the BN (batch norm) parameters for each architecture when evaluating the performance of the architectures in the search space [re-calculate the BN statistics by forwarding the whole validation set]. It seems that you do not re-tune the BN in the script exps/supernet/one-shot-supernet_eval.py?
Thank you for asking. Referring to Table 5 in the original NAS-Bench-201 paper (https://arxiv.org/pdf/2001.00326.pdf), there are two blocks in the table representing different BN types: the first block uses track_running_stats, while the second does not keep running estimates and always uses batch statistics. In our scripts, taking one-shot as an example (bash ./supernet/one-shot/train.sh cifar10 0 NASBENCH201_PATH), the third parameter "0" controls the BN type and corresponds to the second BN type in the table (no running estimates; always use batch statistics). That is why we do not re-tune the BN in the script.
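For intuition, here is a tiny pure-Python sketch of the two BN modes (this is not the repo's code, which uses PyTorch's `track_running_stats` flag on `BatchNorm2d`): with batch statistics (BN=0) nothing architecture-specific is stored, so there is nothing to re-tune, whereas with running estimates (BN=1) a stale statistic inherited from the supernet would first have to be re-calibrated:

```python
def batchnorm(batch, running_mean=None, running_var=None, eps=1e-5):
    """Normalize a 1-D batch.  With running statistics supplied
    (track_running_stats=1) the stored values are used; otherwise
    (track_running_stats=0) the batch's own mean/variance are recomputed."""
    if running_mean is None:
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
    else:
        mean, var = running_mean, running_var
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

batch = [1.0, 2.0, 3.0]
# BN=0: self-normalizing on every forward pass, so each architecture is
# evaluated correctly as-is, with no per-architecture re-tuning.
fresh = batchnorm(batch)
# BN=1 with statistics inherited from the supernet: the output is shifted
# wrongly until the running stats are re-calibrated on the validation set.
stale = batchnorm(batch, running_mean=10.0, running_var=1.0)
print(fresh[1], stale[1])  # 0.0 vs. a large negative value
```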
@aoiang Many thanks! You are right. I found that some code with BN=0 still re-tunes BN, which is not necessary.
Any update on the few-shot results?
Hi, I have trained and tuned the few-shot supernet in recent days, and the results have improved significantly; they are currently clearly better than the one-shot model. This is a positive signal. I will upload the checkpoints of my few-shot model later today, and you are free to test them directly on your side. Thank you.
Hi there. First, sorry for the late reply: I am currently working a full-time job and have been busy, and due to limited compute resources the experimental results come out slowly. So far, the Kendall tau is:
We have uploaded the checkpoints of the few-shot model to Google Drive, see here (https://drive.google.com/drive/folders/13sZBqPxQsaoxxsJqDA6moPPgMI_udiHL?usp=sharing). You are free to download and evaluate them yourself. There is a README in the folder; you can follow it to evaluate the models and get the Kendall tau. With further tuning of the models, the Kendall tau can be much higher. Thank you for your patience.
@aoiang great job Yiyang, thank you for the effort!
I ran two experiments.
seed 0:
seed 1:
I fail to reproduce your results based on your code, even though I have tried my best to reproduce the numbers in the paper. Still, the exercise validates the effectiveness of the few-shot concept. Anyway, thanks for your kind replies and nice work :)
What are you talking about? First, you have cost us nearly two weeks helping you get the checkpoint attached above, which gives you a rank correlation of 0.6 based on our implementation. You need to respect our time and read the results above. I don't know what might be wrong; there must be something fishy in your code. Second, it looks like we have established common ground that few-shot NAS is effective: you independently replicated few-shot NAS and tested it on NAS-Bench-201. Here is the rank correlation:
Alright, to others who might have a similar question in the future, this is a great thread to read. It looks like we have all nailed down the conclusion that few-shot NAS is effective, so let's move forward and focus on the right things. We will leave the thread open as a reference for other people. Thank you all, and great job Yiyang!
Hi aoiang, thanks for your nice work and the code. Great thanks and best wishes.
@ShunLu91 @aoiang The Kendall's Tau of 0.653 reported in this paper is a mean value, and there are 6 choices of which edge to split when generating the 5 sub-supernets. I guess that if I chose a different edge (any of the six) to split, it might give an even better Kendall's Tau, i.e. above 0.653. In your training code, can the user change which edge to split? OPERATION_TO_SPLIT (0-4) only selects among the operations on ONE edge; another edge might be better.
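To make the 6-way choice concrete: splitting on any one of the six edges partitions the same 15625 architectures into five sub-supernets of 3125 each, so the edge is a free hyperparameter of the split. A quick sanity check (operation names as in NAS-Bench-201):

```python
from itertools import product

OPS = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]
archs = list(product(OPS, repeat=6))  # all 5**6 = 15625 NAS-Bench-201 cells

# Whichever of the 6 edges is split, the pool divides evenly: 5 groups of 3125.
for edge in range(6):
    sizes = [sum(1 for a in archs if a[edge] == op) for op in OPS]
    assert sizes == [3125] * 5
print("6 candidate edges; each split yields 5 sub-supernets of 3125 archs")
```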
@Pcyslist Thanks for your constructive suggestions. I concur with you that different
@ShunLu91 I have checked your few-shot training log and found that when you trained the five sub-supernets, you transferred different model weights to them. This is incorrect: you should always transfer the model weights obtained by one-shot training to each sub-supernet.
@ShunLu91 In addition, may I ask what type of GPU you use? Why is it so slow for me to train the one-shot supernet with two 3090 GPUs?
@Pcyslist Sorry, but I have to clarify that you might have some misunderstandings about my training. I followed the official instructions here to conduct the few-shot training rather than the one-shot training: I split the one-shot supernet into 5 sub-supernets, so for evaluation I transferred different model weights to each sub-network from its corresponding sub-supernet. As for the transfer learning, the authors say "For this experiment, we train sub-supernets by skipping the transfer learning described in Section 3.2." in Section 4.1.1 of their paper; therefore, the experiments on NAS-Bench-201 do not include the transfer-learning part. Besides, I trained each sub-supernet on a single NVIDIA V100 GPU. I am glad to discuss further.
Regarding "I followed the official instructions here to conduct the few-shot training rather than the one-shot training": before you conduct the few-shot training, you must conduct one-shot training at least once, because one-shot training generates the weights of the original supernet, which are then transferred to each sub-supernet. Only then can you conduct the few-shot training of the initialized sub-supernets, five times.
Regarding "Thus for evaluation, I transferred different model weights for each sub-network from their corresponding sub-supernet": you should transfer the same weights (the one-shot supernet's) to every sub-supernet, because the one-shot supernet is the parent of the 5 sub-supernets.
Regarding "For this experiment, we train sub-supernets by skipping the transfer learning described in Section 3.2." in Section 4.1.1 of the paper: no, that is not entirely right. Only in the gradient-based algorithm experiments do the authors skip transfer learning; for the search-based algorithms, they do use transfer learning, as clearly shown in Figure 6.
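The weight-transfer point above can be sketched with plain dicts standing in for PyTorch state_dicts (key names here are hypothetical; in PyTorch this would typically be done with `state_dict()` / `load_state_dict()`):

```python
# Plain dicts stand in for PyTorch state_dicts; key names are hypothetical.
one_shot_weights = {"stem.weight": 0.1, "cell.conv.weight": 0.2, "head.weight": 0.3}

def init_sub_supernet(parent):
    # Every sub-supernet is initialized from the SAME parent (one-shot) weights;
    # copying ensures later training of one copy does not touch the others.
    return dict(parent)

sub_supernets = [init_sub_supernet(one_shot_weights) for _ in range(5)]

# Correct few-shot setup: all five start identical to the one-shot parent...
assert all(sub == one_shot_weights for sub in sub_supernets)

# ...and only diverge afterwards, during their own few-shot training.
sub_supernets[0]["cell.conv.weight"] = 0.25
assert sub_supernets[1]["cell.conv.weight"] == 0.2
```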
I conducted one-shot training for 19 hours using two NVIDIA 3090 GPUs, far from the 6.8 hours reported in the paper. What can I do to accelerate training?
@Pcyslist Thanks for pointing out this error. I notice that a transfer-learning procedure was missing from my training, and I will repeat this experiment. In summary, I should first transfer the same weights from a pre-trained one-shot supernet to every sub-supernet and then train the few-shot sub-supernets; after the few-shot training, I can use the corresponding few-shot sub-supernet to evaluate each subnetwork. Is my understanding correct now? In this way, the training cost is obviously higher than conventional one-shot NAS because of this post-training, i.e. the transfer learning followed by few-shot training. As for the training speed, I have checked my previous one-shot training logs on one NVIDIA V100 GPU and found the training time is nearly the same as the reported 6.8 hours. Maybe you can increase the
@ShunLu91 Yes, you can have a try. I have also solved the problem of how to accelerate training: I found that using a single GPU is much faster than using two GPUs.
@Pcyslist OK, great thanks!
@ShunLu91 Hi! I wonder how long it took you to evaluate the 15625 networks? I found that evaluating the 15625 networks (about 15 hours) takes longer than training a supernet (about 6 hours). Is this normal?
@Pcyslist I adopted a large validation batch size of 512 with num_workers=16 when evaluating each sub-network. It only took 6.5 hours for me to evaluate all 15625 sub-networks. I suggest you find the most time-consuming step and then optimize it.
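The effect of the larger validation batch size is easy to quantify with a back-of-the-envelope batch count (NAS-Bench-201's CIFAR-10 validation split has roughly 25,000 images; per-batch latency is hardware-dependent and omitted here):

```python
# Rough batch-count arithmetic for evaluating every architecture (illustrative).
NUM_ARCHS = 15625
VAL_IMAGES = 25_000  # approximate size of the NAS-Bench-201 CIFAR-10 valid split

def total_batches(batch_size):
    # ceil(VAL_IMAGES / batch_size) forward batches are needed per architecture.
    per_arch = -(-VAL_IMAGES // batch_size)
    return per_arch * NUM_ARCHS

# Going from batch 256 to 512 halves the number of forward passes, which is
# roughly consistent with the ~15h vs. ~6.5h evaluation times reported above
# (larger num_workers also helps keep the GPU fed).
print(total_batches(256), total_batches(512))  # 1531250 765625
```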
Thank you. I have completed all the steps of the reproduction experiment, and the results obtained by running rank.py are as follows:
@Pcyslist Thanks for your results, which are an important reference for me and other users.
I tried to reproduce your one-shot results on NAS-Bench-201.
By running the code
https://github.com/aoiang/few-shot-NAS/blob/main/Few-Shot_NasBench201/supernet/one-shot/run.sh
and eval
https://github.com/aoiang/few-shot-NAS/blob/main/Few-Shot_NasBench201/supernet/one-shot/eval.sh
However, your eval.sh is:
for (( c=0; c<=4; c++ ))
do
  OMP_NUM_THREADS=4 python ./exps/supernet/one-shot-supernet_eval.py \
    --save_dir ${save_dir} --max_nodes ${max_nodes} --channel ${channel} --num_cells ${num_cells} \
    --dataset ${dataset} --data_path ${data_path} \
    --search_space_name ${space} \
    --arch_nas_dataset ${benchmark_file} \
    --config_path configs/nas-benchmark/algos/FEW-SHOT-SUPERNET.config \
    --track_running_stats ${BN} \
    --select_num 100 \
    --output_dir ${OUTPUT} \
    --workers 4 --print_freq 200 --rand_seed 0 --edge_op ${c}
done
Why does it use the FEW-SHOT-SUPERNET.config?