Can the transformers version in the dependencies be changed? I suspect the error below is a dependency issue #524
Labels
environment: related to third-party dependency, DJ-pypi, DJ-docker, etc.
question: Further information is requested
Before Asking
I have read the README carefully.
I have pulled the latest code of the main branch and run it again, and the problem still exists.
Search before asking
Question
Traceback (most recent call last):
  File "/data/xinqiuqing/data-juicer/data_juicer/ops/base_op.py", line 60, in wrapper
    return method(samples, *args, **kwargs)
  File "/data/xinqiuqing/data-juicer/data_juicer/ops/mapper/generate_qa_from_text_mapper.py", line 113, in process_batched
    model, _ = get_model(self.model_key, rank, self.use_cuda())
  File "/data/xinqiuqing/data-juicer/data_juicer/utils/model_utils.py", line 807, in get_model
    MODEL_ZOO[model_key] = model_key(device=device)
  File "/data/xinqiuqing/data-juicer/data_juicer/utils/model_utils.py", line 377, in prepare_huggingface_model
    pipe = transformers.pipeline(task=pipe_task,
  File "/root/miniconda3/envs/xinqiuqing/lib/python3.10/site-packages/transformers/pipelines/__init__.py", line 1178, in pipeline
    return pipeline_class(model=model, framework=framework, task=task, **kwargs)
  File "/root/miniconda3/envs/xinqiuqing/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 106, in __init__
    if self.prefix is not None:
AttributeError: 'TextGenerationPipeline' object has no attribute 'prefix'
generate_qa_from_text_mapper_process: 100%|##########| 129/129 [01:34<00:00, 1.36 examples/s]
2024-12-26 16:54:19 | INFO | data_juicer.core.data:226 - [7/7] OP [generate_qa_from_text_mapper] Done in 95.495s. Left 0 samples.
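Since the question is whether the transformers version is at fault, it may help to record which version is actually installed in the environment shown in the traceback. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and is not part of Data-Juicer or transformers.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not part of Data-Juicer or transformers.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print the version of the package suspected in this issue.
    print("transformers", installed_version("transformers"))
```

Running this inside the same conda environment (`xinqiuqing` in the paths above) and including the output in the report would make it easier to tell a version mismatch apart from other causes.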
Additional information
This is my YAML configuration file:
project_name: 'test01'
dataset_path: 'data/test/'
export_path: 'outputs/test/test.jsonl'
export_shard_size: 0
export_in_parallel: false
np: 4
suffixes: ['.txt']
process:
# Split the text into chunks
max_len: 1000
split_pattern: '\n\n'
overlap_len: 200
tokenizer: 'qwen2.5-72b-instruct'
trust_remote_code: True
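# Remove links, e.g. those starting with http or ftp
# Remove HTML tags and return the plain text of all nodes
# Remove the bibliography from TeX documents
# Remove TeX document headers, e.g. titles, section numbers/names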
drop_no_head: true
# Remove duplicate sentences within a sample
lowercase: false
ignore_special_character: true
min_repeat_sentence_length: 2
# Generate question-answer pairs from text
hf_model: 'impira/layoutlm-document-qa'
output_pattern: null
enable_vllm: false
model_params: {}
sampling_params: {}
mem_required: '10GB'