diff --git a/README.MD b/README.MD
index c20954f..3fa16a6 100644
--- a/README.MD
+++ b/README.MD
@@ -150,6 +150,9 @@ Please try downgrading the ```protobuf``` dependency package to 3.20.3, or set e
 **If the dependency package error after updating, please double clicking ```repair_dependency.bat``` (for Official ComfyUI Protable) or ```repair_dependency_aki.bat``` (for ComfyUI-aki-v1.x) in the plugin folder to reinstall the dependency packages.
 
+* Commit [JoyCaption2Split](#JoyCaption2Split) and [LoadJoyCaption2Model](#LoadJoyCaption2Model) nodes. Sharing one model across multiple JoyCaption2 nodes improves efficiency.
+* [SegmentAnythingUltra](#SegmentAnythingUltra) and [SegmentAnythingUltraV2](#SegmentAnythingUltraV2) add the ```cache_model``` option, making it easier to manage VRAM usage flexibly.
+
 * Due to the high version requirements of the [LlamaVision](#LlamaVision) node for ```transformers```, which affects the loading of some older third-party plugins, so the LayerStyle plugin has lowered the default requirement to 4.43.2. If you need to run LlamaVision, please upgrade to 4.45.0 or above on your own.
 * Commit [JoyCaption2](#JoyCaption2) and [JoyCaption2ExtraOptions](#JoyCaption2ExtraOptions) nodes. New dependency packages need to be installed.
@@ -952,6 +955,34 @@ Node Options:
 * temperature: The temperature parameter of LLM.
 * cache_model: Whether to cache the model.
 
+### JoyCaption2Split
+The split version of JoyCaption2. It separates model loading from inference, so that when multiple JoyCaption2 nodes are used, the model can be shared to improve efficiency.
+
+Node Options:
+![image](image/joycaption2_split_node.jpg)
+
+* image: Image input.
+* joy2_model: The JoyCaption model input.
+* extra_options: Input for the extra_options.
+* caption_type: Caption type options, including: "Descriptive", "Descriptive (Informal)", "Training Prompt", "MidJourney", "Booru tag list", "Booru-like tag list", "Art Critic", "Product Listing", "Social Media Post".
+* caption_length: The length of the caption.
+* user_prompt: User prompt words for the LLM model. If there is content here, it will override all the settings for caption_type and extra_options.
+* max_new_tokens: The max_new_tokens parameter of LLM.
+* do_sample: The do_sample parameter of LLM.
+* top_p: The top_p parameter of LLM.
+* temperature: The temperature parameter of LLM.
+
+### LoadJoyCaption2Model
+The model loading node for JoyCaption2, used in conjunction with JoyCaption2Split.
+
+Node Options:
+![image](image/load_joycaption2_model_node.jpg)
+
+* llm_model: There are two LLM models to choose from, Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2 and unsloth/Meta-Llama-3.1-8B-Instruct.
+* device: Model loading device. Currently, only CUDA is supported.
+* dtype: Model loading precision, nf4 or bf16.
+* vlm_lora: Whether to load text_model.
+
 ### JoyCaption2ExtraOptions
 The extra_options parameter node of JoyCaption2.
 
@@ -2065,6 +2096,7 @@ Node options:
 * white_point: Edge white sampling threshold.
 * process_detail: Set to false here will skip edge processing to save runtime.
 * prompt: Input for SAM's prompt.
+* cache_model: Set whether to cache the model.
 
 ### SegmentAnythingUltraV2
diff --git a/README_CN.MD b/README_CN.MD
index ab2bd2b..847a65a 100644
--- a/README_CN.MD
+++ b/README_CN.MD
@@ -127,6 +127,8 @@ If this call came from a _pb2.py file, your generated code is out of date and mu
 ## 更新说明
 **如果本插件更新后出现依赖包错误,请双击运行插件目录下的```install_requirements.bat```(官方便携包),或 ```install_requirements_aki.bat```(秋叶整合包) 重新安装依赖包。
 
+* 添加 [JoyCaption2Split](#JoyCaption2Split) 和 [LoadJoyCaption2Model](#LoadJoyCaption2Model) 节点,在多个JoyCaption2节点时共用模型提高效率。
+* [SegmentAnythingUltra](#SegmentAnythingUltra) 和 [SegmentAnythingUltraV2](#SegmentAnythingUltraV2) 增加 ```cache_model``` 参数,便于灵活管理显存。
 * 鉴于[LlamaVision](#LlamaVision)节点对 ```transformers``` 的要求版本较高而影响某些旧版第三方插件的加载,LayerStyle 插件已将默认要求降低到4.43.2, 如有运行LlamaVision的需求请自行升级至4.45.0以上。
 * 添加 [JoyCaption2](#JoyCaption2) 和 [JoyCaption2ExtraOptions](#JoyCaption2ExtraOptions) 节点,使用JoyCaption-alpha-two模型生成提示词。 请从 [百度网盘](https://pan.baidu.com/s/1dOjbUEacUOhzFitAQ3uIeQ?pwd=4ypv) 以及 [百度网盘](https://pan.baidu.com/s/1mH1SuW45Dy6Wga7aws5siQ?pwd=w6h5) ,
@@ -849,6 +851,35 @@ ImageScaleByAspectRatio的V2升级版
 * temperature: LLM的temperature参数。
 * cache_model: 是否缓存模型。
 
+### JoyCaption2Split
+JoyCaption2 的分离式节点,将模型加载与推理分离,使用多个JoyCaption2节点时可共用模型提高效率。
+
+节点选项说明:
+![image](image/joycaption2_split_node.jpg)
+
+* image: 图片输入。
+* joy2_model: JoyCaption模型输入。
+* extra_options: extra_options参数输入。
+* caption_type: caption类型选项, 包括"Descriptive"(正式语气描述), "Descriptive (Informal)"(非正式语气描述), "Training Prompt"(SD训练描述), "MidJourney"(MJ风格描述), "Booru tag list"(标签列表), "Booru-like tag list"(类标签列表), "Art Critic"(艺术评论), "Product Listing"(产品列表), "Social Media Post"(社交媒体风格)。
+* caption_length: 描述长度。
+* user_prompt: LLM模型的用户提示词。如果这里有内容将覆盖caption_type和extra_options的所有设置。
+* max_new_tokens: LLM的max_new_tokens参数。
+* do_sample: LLM的do_sample参数。
+* top_p: LLM的top_p参数。
+* temperature: LLM的temperature参数。
+
+### LoadJoyCaption2Model
+JoyCaption2 的模型加载节点,与JoyCaption2Split配合使用。
+
+节点选项说明:
+![image](image/load_joycaption2_model_node.jpg)
+
+* llm_model: 目前有 Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2 和 unsloth/Meta-Llama-3.1-8B-Instruct 两种LLM模型可选择。
+* device: 模型加载设备。目前仅支持cuda。
+* dtype: 模型加载精度,有nf4 和 bf16 两个选项。
+* vlm_lora: 是否加载text_model。
+
+
 ### JoyCaption2ExtraOptions
 JoyCaption2的extra_options参数节点。
 
@@ -1843,6 +1874,7 @@ mask为可选输入项,如果这里输入遮罩,将作用于输出结果。
 * white_point: 边缘白色采样阈值。
 * process_detail: 此处设为False将跳过边缘处理以节省运行时间。
 * prompt: SAM的prompt输入。
+* cache_model: 是否缓存模型。
 
 ### SegmentAnythingUltraV2
 SegmentAnythingUltra的V2升级版,增加了VITMatte边缘处理方法。
diff --git a/image/joycaption2_split_node.jpg b/image/joycaption2_split_node.jpg
new file mode 100644
index 0000000..4eccf6f
Binary files /dev/null and b/image/joycaption2_split_node.jpg differ
diff --git a/image/load_joycaption2_model_node.jpg b/image/load_joycaption2_model_node.jpg
new file mode 100644
index 0000000..7b6ae05
Binary files /dev/null and b/image/load_joycaption2_model_node.jpg differ
diff --git a/image/segment_anything_ultra_node.jpg b/image/segment_anything_ultra_node.jpg
index a50adb1..9a9e7fc 100644
Binary files a/image/segment_anything_ultra_node.jpg and b/image/segment_anything_ultra_node.jpg differ
diff --git a/image/segment_anything_ultra_v2_node.jpg b/image/segment_anything_ultra_v2_node.jpg
index fa12110..ea31e86 100644
Binary files a/image/segment_anything_ultra_v2_node.jpg and b/image/segment_anything_ultra_v2_node.jpg differ
diff --git a/py/joycaption_alpha_2.py b/py/joycaption_alpha_2.py
index 9ca73aa..bfb60a3 100644
--- a/py/joycaption_alpha_2.py
+++ b/py/joycaption_alpha_2.py
@@ -524,13 +524,115 @@ def joycaption2(self, image, llm_model, device, dtype, vlm_lora, caption_type, c
         return (ret_text,)
 
+class LS_LoadJoyCaption2Model:
+
+    CATEGORY = '😺dzNodes/LayerUtility'
+    FUNCTION = "load_joycaption2_model"
+    RETURN_TYPES = ("JoyCaption2_Model",)
+    RETURN_NAMES = ("joy2_model",)
+    OUTPUT_IS_LIST = (True,)
+
+    def __init__(self):
+        self.NODE_NAME = 'LoadJoyCaption2Model'
+
+    @classmethod
+    def INPUT_TYPES(self):
+        llm_model_list = ["Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2", "unsloth/Meta-Llama-3.1-8B-Instruct"]
+        device_list = ['cuda']
+        dtype_list = ['nf4','bf16']
+        vlm_lora_list = ['text_model', 'none']
+
+        return {
+            "required": {
+                "llm_model": (llm_model_list,),
+                "device": (device_list,),
+                "dtype": (dtype_list,),
+                "vlm_lora": (vlm_lora_list,),
+            },
+            "optional": {
+            }
+        }
+
+    def load_joycaption2_model(self, llm_model, device, dtype, vlm_lora):
+        llm_model_path = download_hg_model(llm_model, "LLM")
+        model = load_models(llm_model_path, dtype, vlm_lora, device)
+
+        return ([[model, device]],)
+
+class LS_JoyCaption2Split:
+
+    CATEGORY = '😺dzNodes/LayerUtility'
+    FUNCTION = "joycaption2split"
+    RETURN_TYPES = ("STRING",)
+    RETURN_NAMES = ("text",)
+    OUTPUT_IS_LIST = (True,)
+
+    def __init__(self):
+        self.NODE_NAME = 'JoyCaption2Split'
+        self.previous_model = None
+
+    @classmethod
+    def INPUT_TYPES(self):
+        caption_type_list = ["Descriptive", "Descriptive (Informal)", "Training Prompt", "MidJourney",
+                             "Booru tag list", "Booru-like tag list", "Art Critic", "Product Listing",
+                             "Social Media Post"]
+        caption_length_list = ["any", "very short", "short", "medium-length", "long", "very long"] + [str(i) for i in range(20, 261, 10)]
+
+        return {
+            "required": {
+                "image": ("IMAGE",),
+                "joy2_model": ("JoyCaption2_Model",),
+                "caption_type": (caption_type_list,),
+                "caption_length": (caption_length_list,),
+                "user_prompt": ("STRING", {"default": "", "multiline": False}),
+                "max_new_tokens": ("INT", {"default": 300, "min": 8, "max": 4096, "step": 1}),
+                "top_p": ("FLOAT", {"default": 0.9, "min": 0, "max": 1, "step": 0.01}),
+                "temperature": ("FLOAT", {"default": 0.6, "min": 0, "max": 1, "step": 0.01}),
+            },
+            "optional": {
+                "extra_options": ("JoyCaption2ExtraOption",),
+            }
+        }
+
+    def joycaption2split(self, image, joy2_model, caption_type, caption_length,
+                         user_prompt, max_new_tokens, top_p, temperature,
+                         extra_options=None):
+
+        model, device = joy2_model
+        # device = "cuda"
+        ret_text = []
+        extra = []
+        character_name = ""
+        if extra_options is not None:
+            extra, character_name = extra_options
+
+        for img in image:
+            img = tensor2pil(img.unsqueeze(0)).convert('RGB')
+            # log(f"{self.NODE_NAME}: caption_type={caption_type}, caption_length={caption_length}, extra={extra}, character_name={character_name}, user_prompt={user_prompt}")
+            caption = stream_chat([img], caption_type, caption_length,
+                                  extra, character_name, user_prompt,
+                                  max_new_tokens, top_p, temperature, 1,
+                                  model, device)
+            log(f"{self.NODE_NAME}: caption={caption[0]}")
+            ret_text.append(caption[0])
+
+        del joy2_model
+        del model, device
+        clear_memory()
+
+        return (ret_text,)
+
 NODE_CLASS_MAPPINGS = {
+    "LayerUtility: LoadJoyCaption2Model": LS_LoadJoyCaption2Model,
+    "LayerUtility: JoyCaption2Split": LS_JoyCaption2Split,
     "LayerUtility: JoyCaption2": LS_JoyCaption2,
     "LayerUtility: JoyCaption2ExtraOptions": LS_JoyCaptionExtraOptions
 }
 
 NODE_DISPLAY_NAME_MAPPINGS = {
+    "LayerUtility: LoadJoyCaption2Model": "LayerUtility: Load JoyCaption2 Model",
+    "LayerUtility: JoyCaption2Split": "LayerUtility: JoyCaption2 Split",
     "LayerUtility: JoyCaption2": "LayerUtility: JoyCaption2",
     "LayerUtility: JoyCaption2ExtraOptions": "LayerUtility: JoyCaption2 Extra Options"
 }
\ No newline at end of file
diff --git a/py/segment_anything_ultra.py b/py/segment_anything_ultra.py
index e687c43..caf1e21 100644
--- a/py/segment_anything_ultra.py
+++ b/py/segment_anything_ultra.py
@@ -3,12 +3,12 @@
 
 NODE_NAME = 'SegmentAnythingUltra'
 
-SAM_MODEL = None
-DINO_MODEL = None
-
 class SegmentAnythingUltra:
     def __init__(self):
-        pass
+        self.SAM_MODEL = None
+        self.DINO_MODEL = None
+        self.previous_sam_model = ""
+        self.previous_dino_model = ""
 
     @classmethod
     def INPUT_TYPES(cls):
@@ -24,6 +24,7 @@ def INPUT_TYPES(cls):
                 "white_point": ("FLOAT", {"default": 0.99, "min": 0.02, "max": 0.99, "step": 0.01}),
                 "process_detail": ("BOOLEAN", {"default": True}),
                 "prompt": ("STRING", {"default": "subject"}),
+                "cache_model": ("BOOLEAN", {"default": False}),
             },
             "optional": {
             }
@@ -36,11 +37,16 @@ def INPUT_TYPES(cls):
 
     def segment_anything_ultra(self, image, sam_model, grounding_dino_model, threshold,
                                detail_range, black_point, white_point, process_detail,
-                               prompt, ):
-        global SAM_MODEL
-        global DINO_MODEL
-        if SAM_MODEL is None: SAM_MODEL = load_sam_model(sam_model)
-        if DINO_MODEL is None: DINO_MODEL = load_groundingdino_model(grounding_dino_model)
+                               prompt, cache_model):
+
+        if self.previous_sam_model != sam_model or self.SAM_MODEL is None:
+            self.SAM_MODEL = load_sam_model(sam_model)
+            self.previous_sam_model = sam_model
+        if self.previous_dino_model != grounding_dino_model or self.DINO_MODEL is None:
+            self.DINO_MODEL = load_groundingdino_model(grounding_dino_model)
+            self.previous_dino_model = grounding_dino_model
+
         ret_images = []
         ret_masks = []
@@ -48,10 +54,10 @@ def segment_anything_ultra(self, image, sam_model, grounding_dino_model, thresho
             i = torch.unsqueeze(i, 0)
             i = pil2tensor(tensor2pil(i).convert('RGB'))
             item = tensor2pil(i).convert('RGBA')
-            boxes = groundingdino_predict(DINO_MODEL, item, prompt, threshold)
+            boxes = groundingdino_predict(self.DINO_MODEL, item, prompt, threshold)
             if boxes.shape[0] == 0:
                 break
-            (_, _mask) = sam_segment(SAM_MODEL, item, boxes)
+            (_, _mask) = sam_segment(self.SAM_MODEL, item, boxes)
             _mask = _mask[0]
             if process_detail:
                 _mask = tensor2pil(mask_edge_detail(i, _mask, detail_range, black_point, white_point))
@@ -66,6 +72,13 @@ def segment_anything_ultra(self, image, sam_model, grounding_dino_model, thresho
             empty_mask = torch.zeros((1, height, width), dtype=torch.uint8, device="cpu")
             return (empty_mask, empty_mask)
 
+        if not cache_model:
+            self.SAM_MODEL = None
+            self.DINO_MODEL = None
+            self.previous_sam_model = ""
+            self.previous_dino_model = ""
+            clear_memory()
+
         log(f"{NODE_NAME} Processed {len(ret_masks)} image(s).", message_type='finish')
         return (torch.cat(ret_images, dim=0), torch.cat(ret_masks, dim=0),)
diff --git a/py/segment_anything_ultra_v2.py b/py/segment_anything_ultra_v2.py
index 9b60f13..d60e7b8 100644
--- a/py/segment_anything_ultra_v2.py
+++ b/py/segment_anything_ultra_v2.py
@@ -3,13 +3,15 @@
 
 NODE_NAME = 'SegmentAnythingUltra V2'
 
-SAM_MODEL = None
-DINO_MODEL = None
-previous_sam_model = ""
-previous_dino_model = ""
+
+
 class SegmentAnythingUltraV2:
     def __init__(self):
+        self.SAM_MODEL = None
+        self.DINO_MODEL = None
+        self.previous_sam_model = ""
+        self.previous_dino_model = ""
         pass
 
     @classmethod
@@ -32,6 +34,7 @@ def INPUT_TYPES(cls):
                 "prompt": ("STRING", {"default": "subject"}),
                 "device": (device_list,),
                 "max_megapixels": ("FLOAT", {"default": 2.0, "min": 1, "max": 999, "step": 0.1}),
+                "cache_model": ("BOOLEAN", {"default": False}),
             },
             "optional": {
             }
@@ -45,24 +48,23 @@ def INPUT_TYPES(cls):
 
     def segment_anything_ultra_v2(self, image, sam_model, grounding_dino_model, threshold,
                                   detail_method, detail_erode, detail_dilate, black_point, white_point,
                                   process_detail, prompt,
-                                  device, max_megapixels
+                                  device, max_megapixels, cache_model
                                   ):
-        global SAM_MODEL
-        global DINO_MODEL
-        global previous_sam_model
-        global previous_dino_model
 
         if detail_method == 'VITMatte(local)':
             local_files_only = True
         else:
             local_files_only = False
 
-        if previous_sam_model != sam_model:
-            SAM_MODEL = load_sam_model(sam_model)
-            previous_sam_model = sam_model
-        if previous_dino_model != grounding_dino_model:
-            DINO_MODEL = load_groundingdino_model(grounding_dino_model)
-            previous_dino_model = grounding_dino_model
+        if self.previous_sam_model != sam_model or self.SAM_MODEL is None:
+            self.SAM_MODEL = load_sam_model(sam_model)
+            self.previous_sam_model = sam_model
+        if self.previous_dino_model != grounding_dino_model or self.DINO_MODEL is None:
+            self.DINO_MODEL = load_groundingdino_model(grounding_dino_model)
+            self.previous_dino_model = grounding_dino_model
 
         ret_images = []
         ret_masks = []
@@ -70,10 +72,10 @@ def segment_anything_ultra_v2(self, image, sam_model, grounding_dino_model, thre
             i = torch.unsqueeze(i, 0)
             i = pil2tensor(tensor2pil(i).convert('RGB'))
             _image = tensor2pil(i).convert('RGBA')
-            boxes = groundingdino_predict(DINO_MODEL, _image, prompt, threshold)
+            boxes = groundingdino_predict(self.DINO_MODEL, _image, prompt, threshold)
             if boxes.shape[0] == 0:
                 break
-            (_, _mask) = sam_segment(SAM_MODEL, _image, boxes)
+            (_, _mask) = sam_segment(self.SAM_MODEL, _image, boxes)
             _mask = _mask[0]
             detail_range = detail_erode + detail_dilate
             if process_detail:
@@ -97,6 +99,13 @@ def segment_anything_ultra_v2(self, image, sam_model, grounding_dino_model, thre
             empty_mask = torch.zeros((1, height, width), dtype=torch.uint8, device="cpu")
             return (empty_mask, empty_mask)
 
+        if not cache_model:
+            self.SAM_MODEL = None
+            self.DINO_MODEL = None
+            self.previous_sam_model = ""
+            self.previous_dino_model = ""
+            clear_memory()
+
         log(f"{NODE_NAME} Processed {len(ret_masks)} image(s).", message_type='finish')
         return (torch.cat(ret_images, dim=0), torch.cat(ret_masks, dim=0),)
diff --git a/pyproject.toml b/pyproject.toml
index 5b69cf5..cbf149a 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,7 +1,7 @@
 [project]
 name = "comfyui_layerstyle"
 description = "A set of nodes for ComfyUI it generate image like Adobe Photoshop's Layer Style. the Drop Shadow is first completed node, and follow-up work is in progress."
-version = "1.0.78"
+version = "1.0.79"
 license = "MIT"
 dependencies = ["numpy", "pillow", "torch", "matplotlib", "Scipy", "scikit_image", "scikit_learn", "opencv-contrib-python", "pymatting", "segment_anything", "timm", "addict", "yapf", "colour-science", "wget", "mediapipe", "loguru", "typer_config", "fastapi", "rich", "google-generativeai", "diffusers", "omegaconf", "tqdm", "transformers", "kornia", "image-reward", "ultralytics", "blend_modes", "blind-watermark", "qrcode", "pyzbar", "transparent-background", "huggingface_hub", "accelerate", "bitsandbytes", "torchscale", "wandb", "hydra-core", "psd-tools", "inference-cli[yolo-world]", "inference-gpu[yolo-world]", "onnxruntime", "peft"]