
Low-resource finetune #6

Open

yiwei0730 opened this issue Jul 23, 2024 · 3 comments

Comments


yiwei0730 commented Jul 23, 2024

Sorry to bother you.
I want to move this model from zero-shot to fine-tuning with a small amount of data.
I trained with 4 samples (each repeated about 10 times, i.e. 40 samples in total), but it keeps raising the error below and I don't know how to solve it.
"""
text_tokens, durations, latents = map(list, zip(*batch))
ValueError: not enough values to unpack (expected 3, got 0)
"""

@yiwei0730 (Author)

ERROR MESSAGE:
[2024-07-23 17:46:00,103][pflow_encodec.data.sampler][INFO] - [rank: 0] Created 3 buckets
[2024-07-23 17:46:00,103][pflow_encodec.data.sampler][INFO] - [rank: 0] Boundaries: [3.0, 5.0, 7.0, 10.0]
[2024-07-23 17:46:00,103][pflow_encodec.data.sampler][INFO] - [rank: 0] Bucket sizes: [0, 66, 22]
/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers
which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=31` in the `DataLoader` to improve performance.
/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (20) is smaller than the logging
interval `Trainer(log_every_n_steps=50)`. Set a lower value for `log_every_n_steps` if you want to see logs for the training epoch.
4
4
0
Epoch 0/1999 ━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/20 0:00:17 • 0:00:03 8.77it/s v_num: 16.000 train/enc_loss_step: 0.098 train/duration_loss_step: 0.349
train/flow_matching_loss_step: 0.223 train/latent_loss_step: 0.321 train/loss_step: 0.671
other/grad_norm: 5.177
[2024-07-23 17:46:17,798][pflow_encodec.utils.utils][ERROR] - [rank: 0]
Traceback (most recent call last):
File "/workspace/yiwei/pflow-encodec/pflow_encodec/utils/utils.py", line 68, in wrap
metric_dict, object_dict = task_func(cfg=cfg)
File "/workspace/yiwei/pflow-encodec/pflow_encodec/train.py", line 89, in train
trainer.fit(model=model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 987, in _run
results = self._run_stage()
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1033, in _run_stage
self.fit_loop.run()
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 205, in run
self.advance()
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 363, in advance
self.epoch_loop.run(self._data_fetcher)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 140, in run
self.advance(data_fetcher)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 212, in advance
batch, _, __ = next(data_fetcher)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fetchers.py", line 133, in next
batch = super().next()
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/loops/fetchers.py", line 60, in next
batch = next(self.iterator)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/utilities/combined_loader.py", line 341, in next
out = next(self._iterator)
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/lightning/pytorch/utilities/combined_loader.py", line 78, in next
out[i] = next(self.iterators[i])
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in next
data = self._next_data()
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/root/anaconda3/envs/pflow-encodec/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/workspace/yiwei/pflow-encodec/pflow_encodec/data/datamodule.py", line 83, in _collate
text_tokens, durations, latents = map(list, zip(*batch))
ValueError: not enough values to unpack (expected 3, got 0)

@seastar105 (Owner)

@yiwei0730
it seems there's a bug in BucketSampler: the error suggests the sampler does not delete empty buckets properly.
Could you modify the sampler's code as below and rerun?

    # current version: the cleanup loop counts down only to index 1,
    # so an empty buckets[0] (as in your log) is never removed
    def _create_bucket(self):
        buckets = [[] for _ in range(len(self.boundaries) - 1)]
        for i in range(len(self.durations)):
            length = self.durations[i]
            idx_bucket = self._bisect(length)
            if idx_bucket != -1:
                buckets[idx_bucket].append(i)
        for i in range(len(buckets) - 1, 0, -1):
            if len(buckets[i]) == 0:
                buckets.pop(i)
                self.boundaries.pop(i + 1)
        return buckets
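A standalone demonstration of the off-by-one in the cleanup loop above, using toy data shaped like the log output (`Bucket sizes: [0, 66, 22]`); the variable names here are illustrative, not from the repo:

```python
# range(len(buckets) - 1, 0, -1) iterates i = 2, 1 and never reaches index 0,
# so an empty first bucket survives the cleanup.
buckets = [[], [1, 2], [3]]               # bucket 0 is empty, as in the log
boundaries = [3.0, 5.0, 7.0, 10.0]

for i in range(len(buckets) - 1, 0, -1):  # i = 2, 1
    if len(buckets[i]) == 0:
        buckets.pop(i)
        boundaries.pop(i + 1)

print(buckets)     # [[], [1, 2], [3]] -- the empty bucket is still there
print(boundaries)  # [3.0, 5.0, 7.0, 10.0] -- boundaries unchanged
```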

    def _create_bucket(self):
        buckets = [[] for _ in range(len(self.boundaries) - 1)]
        for i in range(len(self.durations)):
            length = self.durations[i]
            idx_bucket = self._bisect(length)
            if idx_bucket != -1:
                buckets[idx_bucket].append(i)

        empty_indices = []
        for idx, bucket in enumerate(buckets):
            if len(bucket) == 0:
                empty_indices.append(idx)

        for i in sorted(empty_indices, reverse=True):
            del buckets[i]
            del self.boundaries[i+1]

        return buckets
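A quick standalone check (a sketch with a minimal `_Sampler` class and a simplified `_bisect`, not the repo's full sampler) that the revised cleanup also removes an empty first bucket and drops the matching boundary:

```python
# Standalone sketch: _Sampler and this _bisect are simplified stand-ins for
# the repo's BucketSampler, kept only to exercise the revised _create_bucket.
class _Sampler:
    def __init__(self, durations, boundaries):
        self.durations = durations
        self.boundaries = boundaries

    def _bisect(self, length):
        # find the bucket whose (lo, hi] range contains length, else -1
        for i in range(len(self.boundaries) - 1):
            if self.boundaries[i] < length <= self.boundaries[i + 1]:
                return i
        return -1

    def _create_bucket(self):
        buckets = [[] for _ in range(len(self.boundaries) - 1)]
        for i in range(len(self.durations)):
            idx_bucket = self._bisect(self.durations[i])
            if idx_bucket != -1:
                buckets[idx_bucket].append(i)

        # revised cleanup: collect ALL empty buckets, including index 0
        empty_indices = [idx for idx, b in enumerate(buckets) if len(b) == 0]
        for i in sorted(empty_indices, reverse=True):
            del buckets[i]
            del self.boundaries[i + 1]
        return buckets

# durations chosen so nothing lands in (3, 5]: the first bucket starts empty
s = _Sampler(durations=[5.5, 6.0, 8.0], boundaries=[3.0, 5.0, 7.0, 10.0])
print(s._create_bucket())  # [[0, 1], [2]]
print(s.boundaries)        # [3.0, 7.0, 10.0]
```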

By the way, did the mixed Chinese/English training go well?

@yiwei0730 (Author)

The zero-shot results are not very good, and the speaker similarity is poor.
I tried low-resource fine-tuning (4 sentences) myself, but the results were very poor: both similarity and naturalness were bad. Moreover, the model's language handling gets confused: when I input Japanese or Korean, the output becomes a strange language.
