Previous Updates (January 2019)
Bahadir Sahin edited this page Feb 4, 2019
On this page, I keep a record of previous updates so that both the visitors and I can keep track of them.
- No bug-fix/update commits.
- I said I would commit smaller updates more often, but I failed to follow through =)
- I am still trying to develop/fix the NER training process.
- I found minor bugs in several places in the project.
- Hopefully, a big update will come by the end of this week.
- Experiment result update.
- I am improving my PyTorch skills while developing this repository. The LSTM/GRU tutorials on PyTorch's website used two optimizers, one for the encoder and one for the decoder, which is why I separated my training flow. However, I eventually learned that the same thing can be done with a single forward() and a single optimizer. Hence, I added a forward() method to the "ConvDeconvCNN" object, and its trainer will be initialized as "single_model_trainer".
- I may remove "multiple_model_trainer" and its respective evaluator, but I am not sure about it yet.
- The CRF's forward() method is updated, and a boolean "reduce" parameter is added. If it is true, the returned negative log-likelihood is averaged.
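For reference, a linear-chain CRF's negative log-likelihood is log Z(x) minus the score of the gold tag path, where log Z(x) is computed with the forward algorithm. The following minimal pure-Python sketch is not the repository's CRF class. It shows what a `reduce` flag like the one above does: with `reduce=True` the per-sequence losses are averaged, and otherwise the list is returned. The function names, the plain-list score layout and the omission of start/end transition scores are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Union

Matrix = List[List[float]]

def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sequence_score(emissions: Matrix, transitions: Matrix, tags: List[int]) -> float:
    """Unnormalized score of one tag path: emission scores plus transition scores."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score

def log_partition(emissions: Matrix, transitions: Matrix) -> float:
    """Forward algorithm: log of the sum of exp(score) over every possible tag path."""
    num_tags = len(emissions[0])
    alpha = list(emissions[0])  # alpha[j]: log-sum of all paths ending in tag j
    for t in range(1, len(emissions)):
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            + emissions[t][j]
            for j in range(num_tags)
        ]
    return log_sum_exp(alpha)

def crf_nll(batch_emissions: List[Matrix], transitions: Matrix,
            batch_tags: List[List[int]], reduce: bool = True) -> Union[float, List[float]]:
    """Negative log-likelihood per sequence; averaged over the batch when reduce=True."""
    losses = [
        log_partition(em, transitions) - sequence_score(em, transitions, tags)
        for em, tags in zip(batch_emissions, batch_tags)
    ]
    return sum(losses) / len(losses) if reduce else losses
```

Here `emissions[t][j]` is the score of tag j at position t, and `transitions[i][j]` is the score of moving from tag i to tag j.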
- Due to the differences between the classification and NER training flows, I am implementing a new trainer/evaluator. I will also implement performance metric calculators for NER (such as precision, recall and F1 score).
- My initial plan is to push the new trainer/evaluator within a week.
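Entity-level precision, recall and F1 for NER are usually computed on spans rather than on individual tags. Below is a minimal sketch that assumes BIO-encoded tags (e.g. "B-PER", "I-PER", "O") and counts a prediction as correct only when both the span boundaries and the entity type match exactly. The function names are hypothetical, and this is not the repository's evaluator.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)

def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence such as ["B-PER", "I-PER", "O"]."""
    spans: Set[Span] = set()
    start, label = 0, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, kind = tag.partition("-")
        # An open entity ends on "O", on a new "B-", or on an "I-" of another type.
        if label is not None and (prefix != "I" or kind != label):
            spans.add((label, start, i))
            label = None
        # A stray "I-" without an open entity is treated as the start of one.
        if prefix == "B" or (prefix == "I" and label is None):
            start, label = i, kind
    return spans

def precision_recall_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level scores over all sentences."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)  # exact span and type match
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, predicting ORG where the gold tag is LOC yields one false positive and one false negative for that span.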
- This weekend (starting tonight), I will finalize the last 3 experiments on Google Cloud.
- A Conditional Random Field (CRF) class is added to the project. I have not tested it yet, so I am pretty sure it has lots of bugs =) (wait for future updates).
- A new property, "training_properties/task", is defined in config.json. Details are in "/config/README.md".
- The dataset reader code is updated to handle NER datasets. The previous version read the sentence and category columns of the dataset while ignoring the ner column. Now, if the "task" property is "ner", it reads the NER column, assigns it to the respective field, and builds the NER vocabulary.
- Finally, I made some changes in main.py. I added the CRF to the model creation method, but only for testing; I don't plan to keep it there.
- NER counterparts of the category-related actions are added to main.py.
- Again, the CRF is not tested! In the near future, I will spend some time on basic tests to identify bugs, missing pieces and possible improvements.
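Here is a minimal standard-library sketch of the task-dependent reading described above. Several details are assumptions for illustration:

- the tab-separated layout;
- the column names `sentence`, `category` and `ner`;
- the space-separated tag format;
- the returned dictionary keys;
- "classification" as an example non-NER task value.

The repository's reader assigns columns to fields, which this sketch does not model.

```python
import csv
import io
from typing import Any, Dict

def read_dataset(text: str, config: Dict[str, Any]) -> Dict[str, Any]:
    """Read a tab-separated dataset; use the NER column only when task == "ner"."""
    task = config["training_properties"]["task"]
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    data: Dict[str, Any] = {"sentences": [row["sentence"].split() for row in rows]}
    if task == "ner":
        # One tag per token, space-separated, aligned with the sentence tokens.
        tags = [row["ner"].split() for row in rows]
        data["ner"] = tags
        data["ner_vocab"] = sorted({tag for seq in tags for tag in seq})
    else:
        # Classification path: the ner column is ignored, as in the previous version.
        data["category"] = [row["category"] for row in rows]
        data["category_vocab"] = sorted({row["category"] for row in rows})
    return data
```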
- The first (but not the last) batch of bug fixes has been pushed.
- All problems related to DatasetLoader have been fixed (check this commit for details).
- The second bug-fix update of the day. Note that I keep pushing small bug fixes like these so that I can revert them easily.
- CRF initialization-related bugs have been fixed (check this commit for details).
- All print()-based logs are converted to loggers based on the logging library.
- /config/config.logger is added as the logger configuration file.
- README.md changes:
- Table of contents added.
- Format changes (title revisions, section replacements, etc.).
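Python's `logging.config.fileConfig()` reads an INI-style configuration file such as /config/config.logger. The contents below are hypothetical; the repository's actual file may define different loggers, handlers and formats. The sketch loads the configuration from a string so it runs on its own, and it shows a `print()` call replaced with a logger call.

```python
import io
import logging
import logging.config

# Hypothetical contents for a file like /config/config.logger.
LOGGER_CONFIG = """\
[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=simple

[logger_root]
level=INFO
handlers=console

[handler_console]
class=StreamHandler
level=INFO
formatter=simple
args=(sys.stdout,)

[formatter_simple]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
"""

def setup_logging(config_text: str = LOGGER_CONFIG) -> logging.Logger:
    """Apply the INI-style configuration and return a module-level logger."""
    # fileConfig accepts a filename or a file-like object; a string buffer is used here.
    logging.config.fileConfig(io.StringIO(config_text), disable_existing_loggers=False)
    return logging.getLogger(__name__)

if __name__ == "__main__":
    logger = setup_logging()
    logger.info("Training started")  # instead of print("Training started")
```

In the project itself, the file path would be passed instead, e.g. `logging.config.fileConfig("config/config.logger")`.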
- Thanks to the Tesla V100, I got the latest experiment results in 20 hours (yay!).
- I found out that the "Padam" optimizer works flawlessly compared to the usual Adam. It is more robust at each step, and I have not encountered any of the weird numerical problems I saw a lot while using Adam. So, if you are reading this and forking/copy-pasting this library to train your own models, I strongly suggest using Padam as your optimizer.
- I do not have any development/fix updates.
- However, I am working on the CRF and on the code to plug the CRF layer in/out (did I mention I hate CRF?).
- I am also replacing print()-based logs with the logging library.
- Finally, I got another test score (it took 1 month to finish 20 epochs on a workstation-grade CPU =)).
- Currently, I have no development and/or fix update.
- Instead, I am trying to find a solution to my resource bottleneck. For the last 3 days, I have been struggling to understand Google Cloud and its Compute Engine, at the expense of my mental health. After 3 painful, soul-crushing days (GPU quota problem, GPU quota ticket problem, ssh problem, python problem, library problem, pip problem, fucking 'No module named "xyz"' problem), I could finally start a training run on a machine with a Tesla V100 (every poor human being's dream card).
- Hopefully, by opening lots of new Google accounts (to leverage the initial $300 credit, until I run out of unique credit cards), I will be able to get several test results faster.
- I added two new properties to config.json/dataset_properties (min_freq and fixed_length) to reduce memory consumption. You can still use a dynamic input size and keep every seen word in your vocabulary if you have enough memory. Check config/README.md for detailed information.
- Sadly, I ran into the worst CUDA OOM problem in PyTorch: reloading a model increases memory consumption =/ In short, I could start a training process (English dataset/non-static/zeroes oov/text_cnn), and it ran for 2 epochs without any problem (stable memory consumption with 1.5GB of free GPU memory). Then I saved the model to continue the process later. However, right after I loaded the model, the code raised a CUDA OOM error. I tried the fixes suggested in PyTorch's forums, but none of them helped. Things that I found and tried:
- I tried deleting the checkpoint reference after loading the model (https://discuss.pytorch.org/t/gpu-memory-usage-increases-by-90-after-torch-load/9213)
- I tried catching the OOM error and freeing some memory afterwards (https://discuss.pytorch.org/t/how-to-clean-gpu-memory-after-a-runtimeerror/28781/2?u=ptrblck)
- In conclusion, if you have a spare computer that can run your training to the end, I am 100% sure that this repository does not have a memory leak. As long as your input and model sizes are reasonable, it will train. However, if you do not have such a luxury, I can't do anything about it. But if you have any suggestions, I'd be really happy to hear and apply them =)
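The min_freq and fixed_length properties described above trade vocabulary coverage and input length for memory. The following pure-Python sketch shows their usual semantics. It assumes that tokens seen fewer than min_freq times map to `<unk>` and that shorter sentences are padded with `<pad>`. The helper names and special tokens are hypothetical, and config/README.md, not this sketch, describes the repository's exact behavior.

```python
from collections import Counter
from typing import Dict, List, Optional

PAD, UNK = "<pad>", "<unk>"

def build_vocab(sentences: List[List[str]], min_freq: int = 1) -> Dict[str, int]:
    """Map tokens seen at least min_freq times to ids; rarer tokens will become <unk>."""
    counts = Counter(tok for sent in sentences for tok in sent)
    itos = [PAD, UNK] + sorted(tok for tok, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}

def encode(sentence: List[str], vocab: Dict[str, int],
           fixed_length: Optional[int] = None) -> List[int]:
    """Convert tokens to ids; truncate or pad to fixed_length when it is set."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in sentence]
    if fixed_length is None:
        return ids  # dynamic input size
    return ids[:fixed_length] + [vocab[PAD]] * max(0, fixed_length - len(ids))
```

A smaller vocabulary shrinks the embedding matrix, and a fixed length bounds the size of each batch's activations. torchtext's (legacy) `Field` offers similar settings: `fix_length` on the field and `min_freq` in `build_vocab()`.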
- I created a README for config.json. It can be found in the newly created config folder.
- Last night, I did some research, some basic math (to calculate the model size) and some experiments on possible memory leaks to prevent CUDA OOM errors. I could not find any memory leak in either CPU memory or GPU memory. In conclusion, my model (for English) is too big to train on my own GPU.
- Rather than tweaking the model parameters to reduce its size, I decided to reduce it at the dataset level.
- Until now, I had not fixed the sentence length and had used all words in my vocabularies (min_freq=1). In the Turkish experiments, I did not face any problems because the dataset is not big, but it is a totally different story in English.
- I am currently testing the fixed_length and min_freq parameters to control my model size. So far, the tests are going well. Depending on the results, I will put these two parameters into config.json.
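The "basic math" for model size can be sketched as follows. An embedding table has vocab_size × embedding_dim parameters, stored as 4-byte float32 values. If the embeddings are trainable (non-static), training also keeps a gradient of the same size plus the optimizer's state: two buffers for Adam, and three for AMSGrad-style optimizers such as Padam. The function names and example sizes below are hypothetical and are not measurements from this project. Activations, which fixed_length bounds, are not included.

```python
def embedding_params(vocab_size: int, embedding_dim: int) -> int:
    """An embedding table stores one embedding_dim vector per vocabulary entry."""
    return vocab_size * embedding_dim

def estimate_memory_mib(num_params: int, trainable: bool = True,
                        optimizer_buffers: int = 2, bytes_per_value: int = 4) -> float:
    """Rough memory in MiB for float32 parameters, ignoring activations.

    Trainable parameters also need one gradient copy plus optimizer_buffers
    extra copies (2 for Adam's first and second moments).
    """
    copies = 1 + ((1 + optimizer_buffers) if trainable else 0)
    return num_params * copies * bytes_per_value / 2**20

# Hypothetical comparison: a full vocabulary vs. one trimmed with a higher min_freq.
for vocab_size in (400_000, 100_000):
    print(vocab_size, round(estimate_memory_mib(embedding_params(vocab_size, 300)), 1))
# 400000 1831.1
# 100000 457.8
```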
- After I found out that vocabulary caching has bugs that I could not fix, I removed the vocabulary caching functionality from the code (both the save and load parts).
- Even though saving is not a problem, loading a Vocab object requires far too many workarounds. I wasted 6 hours trying to make it work, with no luck. Vocab objects can be loaded with pickle, but all dataset iterators also want to hold a Vocab object inside, which is normally done by calling the build_vocab() method during the regular dataset reading process. If you load external, cached vocabularies, you skip this step and cannot feed the iterators with Vocab objects; in other words, you cannot train because the iterators are missing their Vocab objects.
- I will wait for torchtext to provide native support for saving/loading vocabularies.
- I will spend some time monitoring and optimizing the GPU memory usage of my models/training flows. On my laptop, I am limited to 3GB of GPU memory, so I cannot train big models (I have to say that I did not face such problems in TensorFlow with the same model/dataset/parameter sets).
- Final fixes are applied to the transformer model, and it is trainable.
- However, depending on the parameters and model size, it can produce CUDA OOM (out of memory) errors pretty easily.
- Related to the memory error, PyTorch somehow seems unable to handle CUDA memory as well as TensorFlow does. I will do some research in the following days to optimize GPU memory usage (calling torch.cuda.empty_cache() in the training steps for this purpose isn't enough).
- There are some minor updates to the training process (in both the single and multiple trainers).
- Since NoamOptimizer does not inherit from PyTorch's optimizer base class, I added checks for this optimizer to the trainers wherever the .zero_grad(), .step(), .save() and .load() functions are called on the optimizer object.
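The Noam schedule from "Attention Is All You Need" sets lr = factor · d_model^(-0.5) · min(step^(-0.5), step · warmup^(-1.5)): a linear warmup followed by inverse-square-root decay. The sketch below computes this rate and wraps an inner optimizer so that it exposes the same zero_grad()/step()/state_dict()/load_state_dict() methods as a PyTorch optimizer. That is one alternative to per-call type checks in the trainers. It is not the repository's NoamOptimizer, which uses .save()/.load(). The class name, `factor` and the dict-based state are assumptions, and the inner optimizer only needs a `param_groups` list like torch.optim.Optimizer's plus those methods.

```python
from typing import Any, Dict

def noam_rate(step: int, d_model: int, warmup: int, factor: float = 1.0) -> float:
    """Linear warmup for `warmup` steps, then decay proportional to 1/sqrt(step)."""
    step = max(step, 1)  # avoid 0 ** -0.5 before the first update
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

class NoamWrapper:
    """Hypothetical wrapper that sets the learning rate before each inner step."""

    def __init__(self, optimizer: Any, d_model: int, warmup: int, factor: float = 1.0):
        self.optimizer = optimizer
        self.d_model, self.warmup, self.factor = d_model, warmup, factor
        self._step = 0

    def step(self) -> None:
        self._step += 1
        rate = noam_rate(self._step, self.d_model, self.warmup, self.factor)
        for group in self.optimizer.param_groups:
            group["lr"] = rate
        self.optimizer.step()

    def zero_grad(self) -> None:
        self.optimizer.zero_grad()

    def state_dict(self) -> Dict[str, Any]:
        # Save the schedule position too, so a resumed run continues at the right rate.
        return {"step": self._step, "optimizer": self.optimizer.state_dict()}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self._step = state["step"]
        self.optimizer.load_state_dict(state["optimizer"])
```

With this shape, a trainer can call `optimizer.step()` or `optimizer.state_dict()` without knowing whether the Noam wrapper is in use.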
- A new optimizer is added to custom_optimizer: "Padam". The reference paper is "Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks".
- Yesterday, I was reading reddit/ML about Adam-related problems and came across this paper. I have not yet tested it in terms of optimality or training/test results, but I will give it a shot.
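Padam ("partially adaptive momentum estimation") keeps Adam's first and second moment estimates and AMSGrad's running maximum of the second moment. Instead of dividing by the square root of that maximum, it divides by the maximum raised to a partial exponent p in (0, 1/2]. p = 1/2 corresponds to AMSGrad, and smaller p behaves more like SGD with momentum. The pure-Python sketch below is for illustration only and is not the repository's custom_optimizer implementation. The small eps and the defaults lr=0.1 and p=1/8 are assumptions.

```python
from typing import List, Sequence, Tuple

class Padam:
    """Pure-Python Padam over a list of scalar parameters (illustration only)."""

    def __init__(self, params: Sequence[float], lr: float = 0.1,
                 betas: Tuple[float, float] = (0.9, 0.999),
                 partial: float = 0.125, eps: float = 1e-8) -> None:
        if not 0.0 < partial <= 0.5:
            raise ValueError("partial must be in (0, 0.5]")
        self.params: List[float] = list(params)
        self.lr, self.p, self.eps = lr, partial, eps
        self.beta1, self.beta2 = betas
        n = len(self.params)
        self.m = [0.0] * n      # first moment (momentum)
        self.v = [0.0] * n      # second moment
        self.v_hat = [0.0] * n  # running max of v, as in AMSGrad

    def step(self, grads: Sequence[float]) -> None:
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            self.v_hat[i] = max(self.v_hat[i], self.v[i])
            # Partial adaptivity: divide by v_hat ** p instead of sqrt(v_hat).
            self.params[i] -= self.lr * self.m[i] / (self.v_hat[i] ** self.p + self.eps)

def minimize_quadratic(steps: int = 1000) -> float:
    """Usage example: minimize (x - 3) ** 2 starting from x = 0."""
    opt = Padam([0.0])
    for _ in range(steps):
        x = opt.params[0]
        opt.step([2.0 * (x - 3.0)])  # gradient of (x - 3) ** 2
    return opt.params[0]
```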
- I started working on the transformer_google model. Obviously, it cannot be trained in its current state.
- I have fixed several major bugs.
- The classifier block's keep_prob parameter was missing, so it has been added to config.json as well as to the model flow.
- Nobody told me that in MultiHeadedAttention, the model dimension must be divisible by the number of heads (parallel attention layers). This lack of knowledge cost me 2 hours, but it is fixed (and the condition will be checked inside the model).
- Tests are ongoing (not unit tests, obviously).
- README.MD changes.
- The MIT License is added.
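Both fixes above come down to simple configuration checks. The sketch below uses hypothetical function names. It computes the per-head size d_k = d_model / num_heads, which is why d_model must be divisible by the number of heads. It also converts keep_prob to the drop probability expected by PyTorch's `nn.Dropout(p)`, assuming keep_prob follows TensorFlow's convention (the probability of keeping a unit).

```python
def head_dimension(d_model: int, num_heads: int) -> int:
    """Per-head size d_k: multi-head attention splits d_model evenly across heads."""
    if num_heads <= 0 or d_model % num_heads != 0:
        raise ValueError(
            f"d_model ({d_model}) must be divisible by num_heads ({num_heads})"
        )
    return d_model // num_heads

def dropout_probability(keep_prob: float) -> float:
    """nn.Dropout(p) takes the probability of zeroing a unit, i.e. 1 - keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return 1.0 - keep_prob
```

Running checks like these when the config is loaded turns a shape error deep inside the attention layer into an immediate, readable message.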
I stopped being lazy and changed how the code is executed:
- All hard-coded, property holding dictionaries inside main.py are removed.
- Instead, a "config.json" file is created, and the main code expects this file's path as an argument to run the project properly.
- A detailed description of this file will be added to this README (until I write it, you can always open the file; believe me, it is not too complicated =)).
- In line with the new property handling, I changed every related variable/argument initialization in the main and model files.
- A complete README.MD overhaul is on its way. (Done!)
- I still have not tested the Transformer code. Don't be mad at me if you copy-paste it and can't get results for your homework(s) =)
- Tests are really, really slow on the CPU workstation, and I still play games on my daily-use computer instead of running experiments.
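Here is a minimal sketch of how a main script can take the config path as a command-line argument and load it. The `--config` flag name and the check for the `dataset_properties` and `training_properties` sections are assumptions for illustration. See /config/README.md for the real layout and for how main.py is actually invoked.

```python
import argparse
import json
from typing import Any, Dict, List, Optional

def load_config(path: str) -> Dict[str, Any]:
    """Read the JSON config and fail early if a top-level section is missing."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    for section in ("dataset_properties", "training_properties"):
        if section not in config:
            raise KeyError(f"config.json is missing the '{section}' section")
    return config

def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train a model from a config file.")
    parser.add_argument("--config", required=True, help="path to config.json")
    return parser.parse_args(argv)

def main(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Entry point: parse the config path, then build everything from the config."""
    args = parse_args(argv)
    config = load_config(args.config)
    # Dataset readers, models and trainers would be created from `config` here.
    return config
```

A script would then end with `if __name__ == "__main__": main()` and be run as, for example, `python main.py --config config/config.json` (hypothetical invocation).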