Commit JoyCaption2Split, LoadJoyCaption2Model nodes, SegmentAnythingUltra nodes add cache_model option
chflame163 committed Oct 17, 2024
1 parent c7c61ff commit 999346d
Showing 10 changed files with 217 additions and 29 deletions.
32 changes: 32 additions & 0 deletions README.MD
@@ -150,6 +150,9 @@ Please try downgrading the ```protobuf``` dependency package to 3.20.3, or set e

<font size="4">**If you get dependency package errors after updating, please double-click ```repair_dependency.bat``` (for official ComfyUI Portable) or ```repair_dependency_aki.bat``` (for ComfyUI-aki-v1.x) in the plugin folder to reinstall the dependency packages.**</font><br />

* Commit [JoyCaption2Split](#JoyCaption2Split) and [LoadJoyCaption2Model](#LoadJoyCaption2Model) nodes. Sharing the model across multiple JoyCaption2 nodes improves efficiency.
* [SegmentAnythingUltra](#SegmentAnythingUltra) and [SegmentAnythingUltraV2](#SegmentAnythingUltraV2) add a ```cache_model``` option, making it easier to manage VRAM usage flexibly.

* Because the [LlamaVision](#LlamaVision) node requires a high ```transformers``` version, which affects the loading of some older third-party plugins, the LayerStyle plugin has lowered the default requirement to 4.43.2. If you need to run LlamaVision, please upgrade to 4.45.0 or above yourself.

* Commit [JoyCaption2](#JoyCaption2) and [JoyCaption2ExtraOptions](#JoyCaption2ExtraOptions) nodes. New dependency packages need to be installed.
@@ -952,6 +955,34 @@ Node Options:
* temperature: The temperature parameter of LLM.
* cache_model: Whether to cache the model.

### <a id="table1">JoyCaption2Split</a>
This node separates JoyCaption2 model loading from inference; when multiple JoyCaption2 nodes are used, the loaded model can be shared to improve efficiency.

Node Options:
![image](image/joycaption2_split_node.jpg)

* image: Image input.
* joy2_model: The JoyCaption model input.
* extra_options: Input the extra_options.
* caption_type: Caption type options, including: "Descriptive", "Descriptive (Informal)", "Training Prompt", "MidJourney", "Booru tag list", "Booru-like tag list", "Art Critic", "Product Listing", "Social Media Post".
* caption_length: The length of the caption.
* user_prompt: User prompt for the LLM. If there is content here, it will override all caption_type and extra_options settings.
* max_new_tokens: The max_new_tokens parameter of LLM.
* do_sample: The do_sample parameter of LLM.
* top_p: The top_p parameter of LLM.
* temperature: The temperature parameter of LLM.

### <a id="table1">LoadJoyCaption2Model</a>
JoyCaption2's model loading node, used in conjunction with JoyCaption2Split; a usage sketch follows the option list below.

Node Options:
![image](image/load_joycaption2_model_node.jpg)

* llm_model: There are two LLM models to choose from: Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2 and unsloth/Meta-Llama-3.1-8B-Instruct.
* device: Model loading device. Currently, only CUDA is supported.
* dtype: Model precision; options are nf4 and bf16.
* vlm_lora: Whether to load text_model.
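
The two nodes are meant to be wired together in the ComfyUI graph; purely for illustration, here is a rough Python sketch of the same flow using the classes this commit adds in ```py/joycaption_alpha_2.py```. The import path, the random image tensor, and the option values are assumptions, not part of the plugin's documented API:

```python
# Hypothetical sketch: load the JoyCaption2 model once, then reuse it for captioning.
import torch
from joycaption_alpha_2 import LS_LoadJoyCaption2Model, LS_JoyCaption2Split  # assumed import path

loader = LS_LoadJoyCaption2Model()
# load_joycaption2_model returns ([[model, device]],); the inner pair is the joy2_model input.
(joy2_models,) = loader.load_joycaption2_model(
    llm_model="Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2",
    device="cuda", dtype="nf4", vlm_lora="text_model")
joy2_model = joy2_models[0]

image = torch.rand(2, 512, 512, 3)  # placeholder batch in ComfyUI's (B, H, W, C) layout
splitter = LS_JoyCaption2Split()
(captions,) = splitter.joycaption2split(
    image=image, joy2_model=joy2_model,
    caption_type="Descriptive", caption_length="medium-length",
    user_prompt="", max_new_tokens=300, top_p=0.9, temperature=0.6)
for caption in captions:
    print(caption)
```

Because the model is passed around as an object instead of being loaded inside the caption node, any number of JoyCaption2Split nodes can reuse the same joy2_model without reloading the LLM.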

### <a id="table1">JoyCaption2ExtraOptions</a>
The extra_options parameter node of JoyCaption2.

@@ -2065,6 +2096,7 @@ Node options:
* white_point: Edge white sampling threshold.
* process_detail: Setting this to false will skip edge processing to save runtime.
* prompt: Input for SAM's prompt.
* cache_model: Whether to keep the model cached between runs; set it to False to release VRAM after each run (see the sketch below).
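
The caching behaviour mirrors what this commit adds to ```py/segment_anything_ultra.py```; the following simplified, self-contained sketch shows the pattern (```load_sam_model``` and ```clear_memory``` here are stand-ins for the plugin's real helpers):

```python
import gc
import torch

def load_sam_model(name):
    """Stand-in for the plugin's SAM loader."""
    return f"loaded:{name}"

def clear_memory():
    """Stand-in for the plugin's clear_memory(): release cached CUDA memory."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

class SegmentNodeSketch:
    def __init__(self):
        self.SAM_MODEL = None
        self.previous_sam_model = ""

    def run(self, sam_model: str, cache_model: bool):
        # Reload only when the requested checkpoint changed or nothing is cached yet.
        if self.previous_sam_model != sam_model or self.SAM_MODEL is None:
            self.SAM_MODEL = load_sam_model(sam_model)
            self.previous_sam_model = sam_model
        result = f"segmented with {self.SAM_MODEL}"  # segmentation would happen here
        if not cache_model:
            # cache_model=False: drop the references after the run so VRAM can be freed.
            self.SAM_MODEL, self.previous_sam_model = None, ""
            clear_memory()
        return result
```

With cache_model enabled the model stays resident between runs (faster, but VRAM stays occupied); with it disabled, the model is reloaded on every run and VRAM is released afterwards.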

### <a id="table1">SegmentAnythingUltraV2</a>

32 changes: 32 additions & 0 deletions README_CN.MD
@@ -127,6 +127,8 @@ If this call came from a _pb2.py file, your generated code is out of date and mu
## Update Notes
<font size="4">**If dependency package errors occur after updating this plugin, please double-click ```install_requirements.bat``` (official portable package) or ```install_requirements_aki.bat``` (ComfyUI-aki package) in the plugin directory to reinstall the dependency packages.

* Add [JoyCaption2Split](#JoyCaption2Split) and [LoadJoyCaption2Model](#LoadJoyCaption2Model) nodes; when multiple JoyCaption2 nodes are used, the model can be shared to improve efficiency.
* [SegmentAnythingUltra](#SegmentAnythingUltra) and [SegmentAnythingUltraV2](#SegmentAnythingUltraV2) add a ```cache_model``` option for flexible VRAM management.
* Because the [LlamaVision](#LlamaVision) node requires a high ```transformers``` version, which affects the loading of some older third-party plugins, the LayerStyle plugin has lowered the default requirement to 4.43.2. If you need to run LlamaVision, please upgrade to 4.45.0 or above yourself.
* Add [JoyCaption2](#JoyCaption2) and [JoyCaption2ExtraOptions](#JoyCaption2ExtraOptions) nodes, which use the JoyCaption-alpha-two model to generate prompts.
Please download from [Baidu Netdisk](https://pan.baidu.com/s/1dOjbUEacUOhzFitAQ3uIeQ?pwd=4ypv) and [Baidu Netdisk](https://pan.baidu.com/s/1mH1SuW45Dy6Wga7aws5siQ?pwd=w6h5)
@@ -849,6 +851,35 @@ The V2 upgraded version of ImageScaleByAspectRatio
* temperature: The temperature parameter of the LLM.
* cache_model: Whether to cache the model.

### <a id="table1">JoyCaption2Split</a>
The split version of the JoyCaption2 node, separating model loading from inference; when multiple JoyCaption2 nodes are used, the model can be shared to improve efficiency.

Node options:
![image](image/joycaption2_split_node.jpg)

* image: Image input.
* joy2_model: The JoyCaption model input.
* extra_options: Input for the extra_options parameter.
* caption_type: Caption type options, including "Descriptive" (formal-tone description), "Descriptive (Informal)" (informal-tone description), "Training Prompt" (SD training caption), "MidJourney" (MJ-style caption), "Booru tag list", "Booru-like tag list", "Art Critic" (art critique), "Product Listing", "Social Media Post".
* caption_length: The length of the caption.
* user_prompt: User prompt for the LLM. If there is content here, it will override all caption_type and extra_options settings.
* max_new_tokens: The max_new_tokens parameter of the LLM.
* do_sample: The do_sample parameter of the LLM.
* top_p: The top_p parameter of the LLM.
* temperature: The temperature parameter of the LLM.

### <a id="table1">LoadJoyCaption2Model</a>
The model loading node for JoyCaption2, used together with JoyCaption2Split.

Node options:
![image](image/load_joycaption2_model_node.jpg)

* llm_model: There are two LLM models to choose from: Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2 and unsloth/Meta-Llama-3.1-8B-Instruct.
* device: Model loading device. Currently only cuda is supported.
* dtype: Model loading precision; options are nf4 and bf16.
* vlm_lora: Whether to load text_model.


### <a id="table1">JoyCaption2ExtraOptions</a>
The extra_options parameter node of JoyCaption2.

@@ -1843,6 +1874,7 @@ mask is an optional input; if a mask is provided here, it will be applied to the output.
* white_point: Edge white sampling threshold.
* process_detail: Setting this to False will skip edge processing to save runtime.
* prompt: Input for SAM's prompt.
* cache_model: Whether to cache the model.

### <a id="table1">SegmentAnythingUltraV2</a>
The V2 upgraded version of SegmentAnythingUltra, adding the VITMatte edge processing method.
Binary file added image/joycaption2_split_node.jpg
Binary file added image/load_joycaption2_model_node.jpg
Binary file modified image/segment_anything_ultra_node.jpg
Binary file modified image/segment_anything_ultra_v2_node.jpg
102 changes: 102 additions & 0 deletions py/joycaption_alpha_2.py
@@ -524,13 +524,115 @@ def joycaption2(self, image, llm_model, device, dtype, vlm_lora, caption_type, c

return (ret_text,)

class LS_LoadJoyCaption2Model:

CATEGORY = '😺dzNodes/LayerUtility'
FUNCTION = "load_joycaption2_model"
RETURN_TYPES = ("JoyCaption2_Model",)
RETURN_NAMES = ("joy2_model",)
OUTPUT_IS_LIST = (True,)

def __init__(self):
self.NODE_NAME = 'LoadJoyCaption2Model'

@classmethod
def INPUT_TYPES(self):
llm_model_list = ["Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2", "unsloth/Meta-Llama-3.1-8B-Instruct"]
device_list = ['cuda']
dtype_list = ['nf4','bf16']
vlm_lora_list = ['text_model', 'none']

return {
"required": {
"llm_model": (llm_model_list,),
"device": (device_list,),
"dtype": (dtype_list,),
"vlm_lora": (vlm_lora_list,),
},
"optional": {
}
}

def load_joycaption2_model(self, llm_model, device, dtype, vlm_lora):
llm_model_path = download_hg_model(llm_model, "LLM")
model = load_models(llm_model_path, dtype, vlm_lora, device)

return ([[model,device]],)

class LS_JoyCaption2Split:

CATEGORY = '😺dzNodes/LayerUtility'
FUNCTION = "joycaption2split"
RETURN_TYPES = ("STRING",)
RETURN_NAMES = ("text",)
OUTPUT_IS_LIST = (True,)

def __init__(self):
self.NODE_NAME = 'JoyCaption2split'
self.previous_model = None

@classmethod
def INPUT_TYPES(self):
caption_type_list = ["Descriptive", "Descriptive (Informal)", "Training Prompt", "MidJourney",
"Booru tag list", "Booru-like tag list", "Art Critic", "Product Listing",
"Social Media Post"]
caption_length_list = ["any", "very short", "short", "medium-length", "long", "very long"] + [str(i) for i in range(20, 261, 10)]

return {
"required": {
"image": ("IMAGE",),
"joy2_model": ("JoyCaption2_Model",),
"caption_type": (caption_type_list,),
"caption_length": (caption_length_list,),
"user_prompt": ("STRING", {"default": "","multiline": False}),
"max_new_tokens": ("INT", {"default": 300, "min": 8, "max": 4096, "step": 1}),
"top_p": ("FLOAT", {"default": 0.9, "min": 0, "max": 1, "step": 0.01}),
"temperature": ("FLOAT", {"default": 0.6, "min": 0, "max": 1, "step": 0.01}),
},
"optional": {
"extra_options": ("JoyCaption2ExtraOption",),
}
}

def joycaption2split(self, image, joy2_model, caption_type, caption_length,
user_prompt, max_new_tokens, top_p, temperature,
extra_options=None):

model, device = joy2_model
# device = "cuda"
ret_text = []
extra = []
character_name = ""
if extra_options is not None:
extra, character_name = extra_options

for img in image:
img = tensor2pil(img.unsqueeze(0)).convert('RGB')
# log(f"{self.NODE_NAME}: caption_type={caption_type}, caption_length={caption_length}, extra={extra}, character_name={character_name}, user_prompt={user_prompt}")
caption = stream_chat([img], caption_type, caption_length,
extra, character_name, user_prompt,
max_new_tokens, top_p, temperature, 1,
model, device)
log(f"{self.NODE_NAME}: caption={caption[0]}")
ret_text.append(caption[0])

del joy2_model
del model, device
clear_memory()

return (ret_text,)


NODE_CLASS_MAPPINGS = {
"LayerUtility: LoadJoyCaption2Model": LS_LoadJoyCaption2Model,
"LayerUtility: JoyCaption2Split": LS_JoyCaption2Split,
"LayerUtility: JoyCaption2": LS_JoyCaption2,
"LayerUtility: JoyCaption2ExtraOptions": LS_JoyCaptionExtraOptions
}

NODE_DISPLAY_NAME_MAPPINGS = {
"LayerUtility: LoadJoyCaption2Model": "LayerUtility: Load JoyCaption2 Model",
"LayerUtility: JoyCaption2Split": "LayerUtility: JoyCaption2 Split",
"LayerUtility: JoyCaption2": "LayerUtility: JoyCaption2",
"LayerUtility: JoyCaption2ExtraOptions": "LayerUtility: JoyCaption2 Extra Options"
}
35 changes: 24 additions & 11 deletions py/segment_anything_ultra.py
@@ -3,12 +3,12 @@

NODE_NAME = 'SegmentAnythingUltra'

SAM_MODEL = None
DINO_MODEL = None

class SegmentAnythingUltra:
def __init__(self):
pass
self.SAM_MODEL = None
self.DINO_MODEL = None
self.previous_sam_model = ""
self.previous_dino_model = ""

@classmethod
def INPUT_TYPES(cls):
@@ -24,6 +24,7 @@ def INPUT_TYPES(cls):
"white_point": ("FLOAT", {"default": 0.99, "min": 0.02, "max": 0.99, "step": 0.01}),
"process_detail": ("BOOLEAN", {"default": True}),
"prompt": ("STRING", {"default": "subject"}),
"cache_model": ("BOOLEAN", {"default": False}),
},
"optional": {
}
@@ -36,22 +37,27 @@ def INPUT_TYPES(cls):

def segment_anything_ultra(self, image, sam_model, grounding_dino_model, threshold,
detail_range, black_point, white_point, process_detail,
prompt, ):
global SAM_MODEL
global DINO_MODEL
if SAM_MODEL is None: SAM_MODEL = load_sam_model(sam_model)
if DINO_MODEL is None: DINO_MODEL = load_groundingdino_model(grounding_dino_model)
prompt, cache_model):

if self.previous_sam_model != sam_model or self.SAM_MODEL is None:
self.SAM_MODEL = load_sam_model(sam_model)
self.previous_sam_model = sam_model
if self.previous_dino_model != grounding_dino_model or self.DINO_MODEL is None:
self.DINO_MODEL = load_groundingdino_model(grounding_dino_model)
self.previous_dino_model = grounding_dino_model


ret_images = []
ret_masks = []

for i in image:
i = torch.unsqueeze(i, 0)
i = pil2tensor(tensor2pil(i).convert('RGB'))
item = tensor2pil(i).convert('RGBA')
boxes = groundingdino_predict(DINO_MODEL, item, prompt, threshold)
boxes = groundingdino_predict(self.DINO_MODEL, item, prompt, threshold)
if boxes.shape[0] == 0:
break
(_, _mask) = sam_segment(SAM_MODEL, item, boxes)
(_, _mask) = sam_segment(self.SAM_MODEL, item, boxes)
_mask = _mask[0]
if process_detail:
_mask = tensor2pil(mask_edge_detail(i, _mask, detail_range, black_point, white_point))
@@ -66,6 +72,13 @@ def segment_anything_ultra(self, image, sam_model, grounding_dino_model, thresho
empty_mask = torch.zeros((1, height, width), dtype=torch.uint8, device="cpu")
return (empty_mask, empty_mask)

if not cache_model:
self.SAM_MODEL = None
self.DINO_MODEL = None
self.previous_sam_model = ""
self.previous_dino_model = ""
clear_memory()

log(f"{NODE_NAME} Processed {len(ret_masks)} image(s).", message_type='finish')
return (torch.cat(ret_images, dim=0), torch.cat(ret_masks, dim=0),)

43 changes: 26 additions & 17 deletions py/segment_anything_ultra_v2.py
@@ -3,13 +3,15 @@

NODE_NAME = 'SegmentAnythingUltra V2'

SAM_MODEL = None
DINO_MODEL = None
previous_sam_model = ""
previous_dino_model = ""



class SegmentAnythingUltraV2:
def __init__(self):
self.SAM_MODEL = None
self.DINO_MODEL = None
self.previous_sam_model = ""
self.previous_dino_model = ""
pass

@classmethod
@@ -32,6 +34,7 @@ def INPUT_TYPES(cls):
"prompt": ("STRING", {"default": "subject"}),
"device": (device_list,),
"max_megapixels": ("FLOAT", {"default": 2.0, "min": 1, "max": 999, "step": 0.1}),
"cache_model": ("BOOLEAN", {"default": False}),
},
"optional": {
}
@@ -45,35 +48,34 @@ def INPUT_TYPES(cls):
def segment_anything_ultra_v2(self, image, sam_model, grounding_dino_model, threshold,
detail_method, detail_erode, detail_dilate,
black_point, white_point, process_detail, prompt,
device, max_megapixels
device, max_megapixels, cache_model
):
global SAM_MODEL
global DINO_MODEL
global previous_sam_model
global previous_dino_model

if detail_method == 'VITMatte(local)':
local_files_only = True
else:
local_files_only = False

if previous_sam_model != sam_model:
SAM_MODEL = load_sam_model(sam_model)
previous_sam_model = sam_model
if previous_dino_model != grounding_dino_model:
DINO_MODEL = load_groundingdino_model(grounding_dino_model)
previous_dino_model = grounding_dino_model
if self.previous_sam_model != sam_model or self.SAM_MODEL is None:
self.SAM_MODEL = load_sam_model(sam_model)
self.previous_sam_model = sam_model
if self.previous_dino_model != grounding_dino_model or self.DINO_MODEL is None:
self.DINO_MODEL = load_groundingdino_model(grounding_dino_model)
self.previous_dino_model = grounding_dino_model

SAM_MODEL = load_sam_model(sam_model)
DINO_MODEL = load_groundingdino_model(grounding_dino_model)
ret_images = []
ret_masks = []

for i in image:
i = torch.unsqueeze(i, 0)
i = pil2tensor(tensor2pil(i).convert('RGB'))
_image = tensor2pil(i).convert('RGBA')
boxes = groundingdino_predict(DINO_MODEL, _image, prompt, threshold)
boxes = groundingdino_predict(self.DINO_MODEL, _image, prompt, threshold)
if boxes.shape[0] == 0:
break
(_, _mask) = sam_segment(SAM_MODEL, _image, boxes)
(_, _mask) = sam_segment(self.SAM_MODEL, _image, boxes)
_mask = _mask[0]
detail_range = detail_erode + detail_dilate
if process_detail:
@@ -97,6 +99,13 @@ def segment_anything_ultra_v2(self, image, sam_model, grounding_dino_model, thre
empty_mask = torch.zeros((1, height, width), dtype=torch.uint8, device="cpu")
return (empty_mask, empty_mask)

if not cache_model:
self.SAM_MODEL = None
self.DINO_MODEL = None
self.previous_sam_model = ""
self.previous_dino_model = ""
clear_memory()

log(f"{NODE_NAME} Processed {len(ret_masks)} image(s).", message_type='finish')
return (torch.cat(ret_images, dim=0), torch.cat(ret_masks, dim=0),)

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,7 +1,7 @@
[project]
name = "comfyui_layerstyle"
description = "A set of nodes for ComfyUI it generate image like Adobe Photoshop's Layer Style. the Drop Shadow is first completed node, and follow-up work is in progress."
version = "1.0.78"
version = "1.0.79"
license = "MIT"
dependencies = ["numpy", "pillow", "torch", "matplotlib", "Scipy", "scikit_image", "scikit_learn", "opencv-contrib-python", "pymatting", "segment_anything", "timm", "addict", "yapf", "colour-science", "wget", "mediapipe", "loguru", "typer_config", "fastapi", "rich", "google-generativeai", "diffusers", "omegaconf", "tqdm", "transformers", "kornia", "image-reward", "ultralytics", "blend_modes", "blind-watermark", "qrcode", "pyzbar", "transparent-background", "huggingface_hub", "accelerate", "bitsandbytes", "torchscale", "wandb", "hydra-core", "psd-tools", "inference-cli[yolo-world]", "inference-gpu[yolo-world]", "onnxruntime", "peft"]

