diff --git a/README.MD b/README.MD
index c20954f..3fa16a6 100644
--- a/README.MD
+++ b/README.MD
@@ -150,6 +150,9 @@ Please try downgrading the ```protobuf``` dependency package to 3.20.3, or set e
**If the dependency package error after updating, please double clicking ```repair_dependency.bat``` (for Official ComfyUI Protable) or ```repair_dependency_aki.bat``` (for ComfyUI-aki-v1.x) in the plugin folder to reinstall the dependency packages.
+* Commit [JoyCaption2Split](#JoyCaption2Split) and [LoadJoyCaption2Model](#LoadJoyCaption2Model) nodes. Sharing the model across multiple JoyCaption2 nodes improves efficiency.
+* [SegmentAnythingUltra](#SegmentAnythingUltra) and [SegmentAnythingUltraV2](#SegmentAnythingUltraV2) add the ```cache_model``` option, making it easier to manage VRAM usage flexibly.
+
* Due to the high version requirements of the [LlamaVision](#LlamaVision) node for ```transformers```, which affects the loading of some older third-party plugins, so the LayerStyle plugin has lowered the default requirement to 4.43.2. If you need to run LlamaVision, please upgrade to 4.45.0 or above on your own.
* Commit [JoyCaption2](#JoyCaption2) and [JoyCaption2ExtraOptions](#JoyCaption2ExtraOptions) nodes. New dependency packages need to be installed.
@@ -952,6 +955,34 @@ Node Options:
* temperature: The temperature parameter of LLM.
* cache_model: Whether to cache the model.
+### JoyCaption2Split
+This node separates JoyCaption2's model loading from inference. When multiple JoyCaption2 nodes are used, the model can be shared to improve efficiency.
+
+Node Options:
+![image](image/joycaption2_split_node.jpg)
+
+* image: Image input.
+* joy2_model: The JoyCaption model input.
+* extra_options: Input the extra_options.
+* caption_type: Caption type options, including: "Descriptive", "Descriptive (Informal)", "Training Prompt", "MidJourney", "Booru tag list", "Booru-like tag list", "Art Critic", "Product Listing", "Social Media Post".
+* caption_length: The length of the caption. Options are "any", "very short", "short", "medium-length", "long", "very long", or a numeric length from 20 to 260 in steps of 10.
+* user_prompt: User prompt words for the LLM. If there is content here, it will override all the caption_type and extra_options settings (see the sketch after this list).
+* max_new_tokens: The max_new_tokens parameter of LLM.
+* do_sample: The do_sample parameter of LLM.
+* top_p: The top_p parameter of LLM.
+* temperature: The temperature parameter of LLM.
+
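+The priority between ```user_prompt``` and the other caption settings can be sketched as follows. This is a hypothetical, simplified sketch: ```build_prompt``` does not exist in the plugin, and the real template logic lives inside the node's internal ```stream_chat``` call.
+
+```python
+# Hypothetical sketch (not actual plugin code) of how JoyCaption2Split prioritizes its inputs.
+def build_prompt(caption_type: str, caption_length: str,
+                 extra_options: list, user_prompt: str) -> str:
+    if user_prompt.strip():
+        # A non-empty user_prompt replaces everything built from caption_type and extra_options.
+        return user_prompt
+    prompt = f"Write a {caption_length} '{caption_type}' caption for this image."
+    for option in extra_options:  # extra sentences supplied by JoyCaption2ExtraOptions
+        prompt += " " + option
+    return prompt
+```
+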
+### LoadJoyCaption2Model
+JoyCaption2's model loading node, used in conjunction with JoyCaption2Split.
+
+Node Options:
+![image](image/load_joycaption2_model_node.jpg)
+
+* llm_model: There are two LLM models to choose from: Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2 and unsloth/Meta-Llama-3.1-8B-Instruct.
+* device: Model loading device. Currently, only CUDA is supported.
+* dtype: Model loading precision; the options are nf4 and bf16.
+* vlm_lora: Whether to load the text_model LoRA. Options are text_model and none.
+
+
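+The sketch below shows how the two nodes pass the model around when the classes are called directly from Python. It is only an illustration of the data flow under assumptions (the ```py.joycaption_alpha_2``` import path and the dummy image tensor are not part of the plugin's documented API); in normal use you simply wire one LoadJoyCaption2Model node into several JoyCaption2Split nodes in the ComfyUI graph.
+
+```python
+# Minimal sketch: load the JoyCaption2 model once, then reuse it for several caption runs.
+# Assumes the plugin's py/ folder is importable; a random tensor stands in for an IMAGE input.
+import torch
+from py.joycaption_alpha_2 import LS_LoadJoyCaption2Model, LS_JoyCaption2Split
+
+loader = LS_LoadJoyCaption2Model()
+# load_joycaption2_model returns ([[model, device]],); the inner [model, device] pair
+# is the JoyCaption2_Model value that JoyCaption2Split expects.
+joy2_model = loader.load_joycaption2_model(
+    llm_model="Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2",
+    device="cuda", dtype="nf4", vlm_lora="text_model")[0][0]
+
+captioner = LS_JoyCaption2Split()
+dummy_batch = torch.rand(1, 512, 512, 3)  # placeholder IMAGE tensor (B, H, W, C)
+for image_batch in (dummy_batch, dummy_batch):
+    (captions,) = captioner.joycaption2split(
+        image=image_batch, joy2_model=joy2_model,
+        caption_type="Descriptive", caption_length="any", user_prompt="",
+        max_new_tokens=300, top_p=0.9, temperature=0.6)
+    print(captions)  # one caption string per image in the batch
+```
+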
### JoyCaption2ExtraOptions
The extra_options parameter node of JoyCaption2.
@@ -2065,6 +2096,7 @@ Node options:
* white_point: Edge white sampling threshold.
* process_detail: Set to false here will skip edge processing to save runtime.
* prompt: Input for SAM's prompt.
+* cache_model: Set whether to cache the model.
### SegmentAnythingUltraV2
diff --git a/README_CN.MD b/README_CN.MD
index ab2bd2b..847a65a 100644
--- a/README_CN.MD
+++ b/README_CN.MD
@@ -127,6 +127,8 @@ If this call came from a _pb2.py file, your generated code is out of date and mu
## 更新说明
**如果本插件更新后出现依赖包错误,请双击运行插件目录下的```install_requirements.bat```(官方便携包),或 ```install_requirements_aki.bat```(秋叶整合包) 重新安装依赖包。
+* 添加 [JoyCaption2Split](#JoyCaption2Split) 和 [LoadJoyCaption2Model](#LoadJoyCaption2Model) 节点,使用多个JoyCaption2节点时可共用模型以提高效率。
+* [SegmentAnythingUltra](#SegmentAnythingUltra) 和 [SegmentAnythingUltraV2](#SegmentAnythingUltraV2) 增加 ```cache_model``` 参数,便于灵活管理显存。
* 鉴于[LlamaVision](#LlamaVision)节点对 ```transformers``` 的要求版本较高而影响某些旧版第三方插件的加载,LayerStyle 插件已将默认要求降低到4.43.2, 如有运行LlamaVision的需求请自行升级至4.45.0以上。
* 添加 [JoyCaption2](#JoyCaption2) 和 [JoyCaption2ExtraOptions](#JoyCaption2ExtraOptions) 节点,使用JoyCaption-alpha-two模型生成提示词。
请从 [百度网盘](https://pan.baidu.com/s/1dOjbUEacUOhzFitAQ3uIeQ?pwd=4ypv) 以及 [百度网盘](https://pan.baidu.com/s/1mH1SuW45Dy6Wga7aws5siQ?pwd=w6h5) ,
@@ -849,6 +851,35 @@ ImageScaleByAspectRatio的V2升级版
* temperature: LLM的temperature参数。
* cache_model: 是否缓存模型。
+### JoyCaption2Split
+JoyCaption2 的分离式节点,将模型加载与推理分离,使用多个JoyCaption2节点时可共用模型提高效率。
+
+节点选项说明:
+![image](image/joycaption2_split_node.jpg)
+
+* image: 图片输入。
+* joy2_model: JoyCaption模型输入。
+* extra_options: extra_options参数输入。
+* caption_type: caption类型选项, 包括"Descriptive"(正式语气描述), "Descriptive (Informal)"(非正式语气描述), "Training Prompt"(SD训练描述), "MidJourney"(MJ风格描述), "Booru tag list"(标签列表), "Booru-like tag list"(类标签列表), "Art Critic"(艺术评论), "Product Listing"(产品列表), "Social Media Post"(社交媒体风格)。
+* caption_length: 描述长度。
+* user_prompt: LLM模型的用户提示词。如果这里有内容将覆盖caption_type和extra_options的所有设置。
+* max_new_tokens: LLM的max_new_tokens参数。
+* do_sample: LLM的do_sample参数。
+* top-p: LLM的top_p参数。
+* temperature: LLM的temperature参数。
+
+### LoadJoyCaption2Model
+JoyCaption2 的模型加载节点,与JoyCaption2Split配合使用。
+
+节点选项说明:
+![image](image/load_joycaption2_model_node.jpg)
+
+* llm_model: 目前有 Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2 和 unsloth/Meta-Llama-3.1-8B-Instruct 两种LLM模型可选择。
+* device: 模型加载设备。目前仅支持cuda。
+* dtype: 模型加载精度,有nf4 和 bf16 两个选项。
+* vlm_lora: 是否加载text_model。
+
+
### JoyCaption2ExtraOptions
JoyCaption2的extra_options参数节点。
@@ -1843,6 +1874,7 @@ mask为可选输入项,如果这里输入遮罩,将作用于输出结果。
* white_point: 边缘白色采样阈值。
* process_detail: 此处设为False将跳过边缘处理以节省运行时间。
* prompt: SAM的prompt输入。
+* cache_model: 是否缓存模型。
### SegmentAnythingUltraV2
SegmentAnythingUltra的V2升级版,增加了VITMatte边缘处理方法。
diff --git a/image/joycaption2_split_node.jpg b/image/joycaption2_split_node.jpg
new file mode 100644
index 0000000..4eccf6f
Binary files /dev/null and b/image/joycaption2_split_node.jpg differ
diff --git a/image/load_joycaption2_model_node.jpg b/image/load_joycaption2_model_node.jpg
new file mode 100644
index 0000000..7b6ae05
Binary files /dev/null and b/image/load_joycaption2_model_node.jpg differ
diff --git a/image/segment_anything_ultra_node.jpg b/image/segment_anything_ultra_node.jpg
index a50adb1..9a9e7fc 100644
Binary files a/image/segment_anything_ultra_node.jpg and b/image/segment_anything_ultra_node.jpg differ
diff --git a/image/segment_anything_ultra_v2_node.jpg b/image/segment_anything_ultra_v2_node.jpg
index fa12110..ea31e86 100644
Binary files a/image/segment_anything_ultra_v2_node.jpg and b/image/segment_anything_ultra_v2_node.jpg differ
diff --git a/py/joycaption_alpha_2.py b/py/joycaption_alpha_2.py
index 9ca73aa..bfb60a3 100644
--- a/py/joycaption_alpha_2.py
+++ b/py/joycaption_alpha_2.py
@@ -524,13 +524,115 @@ def joycaption2(self, image, llm_model, device, dtype, vlm_lora, caption_type, c
return (ret_text,)
+class LS_LoadJoyCaption2Model:
+
+ CATEGORY = '😺dzNodes/LayerUtility'
+ FUNCTION = "load_joycaption2_model"
+ RETURN_TYPES = ("JoyCaption2_Model",)
+ RETURN_NAMES = ("joy2_model",)
+ OUTPUT_IS_LIST = (True,)
+
+ def __init__(self):
+ self.NODE_NAME = 'LoadJoyCaption2Model'
+
+ @classmethod
+ def INPUT_TYPES(self):
+ llm_model_list = ["Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2", "unsloth/Meta-Llama-3.1-8B-Instruct"]
+ device_list = ['cuda']
+ dtype_list = ['nf4','bf16']
+ vlm_lora_list = ['text_model', 'none']
+
+ return {
+ "required": {
+ "llm_model": (llm_model_list,),
+ "device": (device_list,),
+ "dtype": (dtype_list,),
+ "vlm_lora": (vlm_lora_list,),
+ },
+ "optional": {
+ }
+ }
+
+ def load_joycaption2_model(self, llm_model, device, dtype, vlm_lora):
+ llm_model_path = download_hg_model(llm_model, "LLM")
+ model = load_models(llm_model_path, dtype, vlm_lora, device)
+
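+        # Wrap the result so that, with OUTPUT_IS_LIST, the [model, device] pair is passed
+        # downstream as a single JoyCaption2_Model value.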
+ return ([[model,device]],)
+
+class LS_JoyCaption2Split:
+
+ CATEGORY = '😺dzNodes/LayerUtility'
+ FUNCTION = "joycaption2split"
+ RETURN_TYPES = ("STRING",)
+ RETURN_NAMES = ("text",)
+ OUTPUT_IS_LIST = (True,)
+
+ def __init__(self):
+        self.NODE_NAME = 'JoyCaption2Split'
+ self.previous_model = None
+
+ @classmethod
+ def INPUT_TYPES(self):
+ caption_type_list = ["Descriptive", "Descriptive (Informal)", "Training Prompt", "MidJourney",
+ "Booru tag list", "Booru-like tag list", "Art Critic", "Product Listing",
+ "Social Media Post"]
+ caption_length_list = ["any", "very short", "short", "medium-length", "long", "very long"] + [str(i) for i in range(20, 261, 10)]
+
+ return {
+ "required": {
+ "image": ("IMAGE",),
+ "joy2_model": ("JoyCaption2_Model",),
+ "caption_type": (caption_type_list,),
+ "caption_length": (caption_length_list,),
+ "user_prompt": ("STRING", {"default": "","multiline": False}),
+ "max_new_tokens": ("INT", {"default": 300, "min": 8, "max": 4096, "step": 1}),
+ "top_p": ("FLOAT", {"default": 0.9, "min": 0, "max": 1, "step": 0.01}),
+ "temperature": ("FLOAT", {"default": 0.6, "min": 0, "max": 1, "step": 0.01}),
+ },
+ "optional": {
+ "extra_options": ("JoyCaption2ExtraOption",),
+ }
+ }
+
+ def joycaption2split(self, image, joy2_model, caption_type, caption_length,
+ user_prompt, max_new_tokens, top_p, temperature,
+ extra_options=None):
+
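+        # joy2_model is produced by the LoadJoyCaption2Model node as a [model, device] pair.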
+ model, device = joy2_model
+ # device = "cuda"
+ ret_text = []
+ extra = []
+ character_name = ""
+ if extra_options is not None:
+ extra, character_name = extra_options
+
+ for img in image:
+ img = tensor2pil(img.unsqueeze(0)).convert('RGB')
+ # log(f"{self.NODE_NAME}: caption_type={caption_type}, caption_length={caption_length}, extra={extra}, character_name={character_name}, user_prompt={user_prompt}")
+ caption = stream_chat([img], caption_type, caption_length,
+ extra, character_name, user_prompt,
+ max_new_tokens, top_p, temperature, 1,
+ model, device)
+ log(f"{self.NODE_NAME}: caption={caption[0]}")
+ ret_text.append(caption[0])
+
+ del joy2_model
+ del model, device
+ clear_memory()
+
+ return (ret_text,)
+
NODE_CLASS_MAPPINGS = {
+ "LayerUtility: LoadJoyCaption2Model": LS_LoadJoyCaption2Model,
+ "LayerUtility: JoyCaption2Split": LS_JoyCaption2Split,
"LayerUtility: JoyCaption2": LS_JoyCaption2,
"LayerUtility: JoyCaption2ExtraOptions": LS_JoyCaptionExtraOptions
}
NODE_DISPLAY_NAME_MAPPINGS = {
+ "LayerUtility: LoadJoyCaption2Model": "LayerUtility: Load JoyCaption2 Model",
+ "LayerUtility: JoyCaption2Split": "LayerUtility: JoyCaption2 Split",
"LayerUtility: JoyCaption2": "LayerUtility: JoyCaption2",
"LayerUtility: JoyCaption2ExtraOptions": "LayerUtility: JoyCaption2 Extra Options"
}
\ No newline at end of file
diff --git a/py/segment_anything_ultra.py b/py/segment_anything_ultra.py
index e687c43..caf1e21 100644
--- a/py/segment_anything_ultra.py
+++ b/py/segment_anything_ultra.py
@@ -3,12 +3,12 @@
NODE_NAME = 'SegmentAnythingUltra'
-SAM_MODEL = None
-DINO_MODEL = None
-
class SegmentAnythingUltra:
def __init__(self):
- pass
+ self.SAM_MODEL = None
+ self.DINO_MODEL = None
+ self.previous_sam_model = ""
+ self.previous_dino_model = ""
@classmethod
def INPUT_TYPES(cls):
@@ -24,6 +24,7 @@ def INPUT_TYPES(cls):
"white_point": ("FLOAT", {"default": 0.99, "min": 0.02, "max": 0.99, "step": 0.01}),
"process_detail": ("BOOLEAN", {"default": True}),
"prompt": ("STRING", {"default": "subject"}),
+ "cache_model": ("BOOLEAN", {"default": False}),
},
"optional": {
}
@@ -36,11 +37,16 @@ def INPUT_TYPES(cls):
def segment_anything_ultra(self, image, sam_model, grounding_dino_model, threshold,
detail_range, black_point, white_point, process_detail,
- prompt, ):
- global SAM_MODEL
- global DINO_MODEL
- if SAM_MODEL is None: SAM_MODEL = load_sam_model(sam_model)
- if DINO_MODEL is None: DINO_MODEL = load_groundingdino_model(grounding_dino_model)
+ prompt, cache_model):
+
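+        # Reload the SAM / GroundingDINO models only if the selection changed or nothing is cached yet.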
+ if self.previous_sam_model != sam_model or self.SAM_MODEL is None:
+ self.SAM_MODEL = load_sam_model(sam_model)
+ self.previous_sam_model = sam_model
+ if self.previous_dino_model != grounding_dino_model or self.DINO_MODEL is None:
+ self.DINO_MODEL = load_groundingdino_model(grounding_dino_model)
+ self.previous_dino_model = grounding_dino_model
+
+
ret_images = []
ret_masks = []
@@ -48,10 +54,10 @@ def segment_anything_ultra(self, image, sam_model, grounding_dino_model, thresho
i = torch.unsqueeze(i, 0)
i = pil2tensor(tensor2pil(i).convert('RGB'))
item = tensor2pil(i).convert('RGBA')
- boxes = groundingdino_predict(DINO_MODEL, item, prompt, threshold)
+ boxes = groundingdino_predict(self.DINO_MODEL, item, prompt, threshold)
if boxes.shape[0] == 0:
break
- (_, _mask) = sam_segment(SAM_MODEL, item, boxes)
+ (_, _mask) = sam_segment(self.SAM_MODEL, item, boxes)
_mask = _mask[0]
if process_detail:
_mask = tensor2pil(mask_edge_detail(i, _mask, detail_range, black_point, white_point))
@@ -66,6 +72,13 @@ def segment_anything_ultra(self, image, sam_model, grounding_dino_model, thresho
empty_mask = torch.zeros((1, height, width), dtype=torch.uint8, device="cpu")
return (empty_mask, empty_mask)
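+        # With cache_model disabled, drop the cached models and free VRAM after this run.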
+ if not cache_model:
+ self.SAM_MODEL = None
+ self.DINO_MODEL = None
+ self.previous_sam_model = ""
+ self.previous_dino_model = ""
+ clear_memory()
+
log(f"{NODE_NAME} Processed {len(ret_masks)} image(s).", message_type='finish')
return (torch.cat(ret_images, dim=0), torch.cat(ret_masks, dim=0),)
diff --git a/py/segment_anything_ultra_v2.py b/py/segment_anything_ultra_v2.py
index 9b60f13..d60e7b8 100644
--- a/py/segment_anything_ultra_v2.py
+++ b/py/segment_anything_ultra_v2.py
@@ -3,13 +3,15 @@
NODE_NAME = 'SegmentAnythingUltra V2'
-SAM_MODEL = None
-DINO_MODEL = None
-previous_sam_model = ""
-previous_dino_model = ""
+
+
class SegmentAnythingUltraV2:
def __init__(self):
+ self.SAM_MODEL = None
+ self.DINO_MODEL = None
+ self.previous_sam_model = ""
+ self.previous_dino_model = ""
pass
@classmethod
@@ -32,6 +34,7 @@ def INPUT_TYPES(cls):
"prompt": ("STRING", {"default": "subject"}),
"device": (device_list,),
"max_megapixels": ("FLOAT", {"default": 2.0, "min": 1, "max": 999, "step": 0.1}),
+ "cache_model": ("BOOLEAN", {"default": False}),
},
"optional": {
}
@@ -45,24 +48,23 @@ def INPUT_TYPES(cls):
def segment_anything_ultra_v2(self, image, sam_model, grounding_dino_model, threshold,
detail_method, detail_erode, detail_dilate,
black_point, white_point, process_detail, prompt,
- device, max_megapixels
+ device, max_megapixels, cache_model
):
- global SAM_MODEL
- global DINO_MODEL
- global previous_sam_model
- global previous_dino_model
if detail_method == 'VITMatte(local)':
local_files_only = True
else:
local_files_only = False
- if previous_sam_model != sam_model:
- SAM_MODEL = load_sam_model(sam_model)
- previous_sam_model = sam_model
- if previous_dino_model != grounding_dino_model:
- DINO_MODEL = load_groundingdino_model(grounding_dino_model)
- previous_dino_model = grounding_dino_model
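+        # Reload models only if the selection changed or nothing is cached yet.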
+ if self.previous_sam_model != sam_model or self.SAM_MODEL is None:
+ self.SAM_MODEL = load_sam_model(sam_model)
+ self.previous_sam_model = sam_model
+ if self.previous_dino_model != grounding_dino_model or self.DINO_MODEL is None:
+ self.DINO_MODEL = load_groundingdino_model(grounding_dino_model)
+ self.previous_dino_model = grounding_dino_model
+
ret_images = []
ret_masks = []
@@ -70,10 +72,10 @@ def segment_anything_ultra_v2(self, image, sam_model, grounding_dino_model, thre
i = torch.unsqueeze(i, 0)
i = pil2tensor(tensor2pil(i).convert('RGB'))
_image = tensor2pil(i).convert('RGBA')
- boxes = groundingdino_predict(DINO_MODEL, _image, prompt, threshold)
+ boxes = groundingdino_predict(self.DINO_MODEL, _image, prompt, threshold)
if boxes.shape[0] == 0:
break
- (_, _mask) = sam_segment(SAM_MODEL, _image, boxes)
+ (_, _mask) = sam_segment(self.SAM_MODEL, _image, boxes)
_mask = _mask[0]
detail_range = detail_erode + detail_dilate
if process_detail:
@@ -97,6 +99,13 @@ def segment_anything_ultra_v2(self, image, sam_model, grounding_dino_model, thre
empty_mask = torch.zeros((1, height, width), dtype=torch.uint8, device="cpu")
return (empty_mask, empty_mask)
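+        # Release the cached models and free VRAM when cache_model is disabled.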
+ if not cache_model:
+ self.SAM_MODEL = None
+ self.DINO_MODEL = None
+ self.previous_sam_model = ""
+ self.previous_dino_model = ""
+ clear_memory()
+
log(f"{NODE_NAME} Processed {len(ret_masks)} image(s).", message_type='finish')
return (torch.cat(ret_images, dim=0), torch.cat(ret_masks, dim=0),)
diff --git a/pyproject.toml b/pyproject.toml
index 5b69cf5..cbf149a 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,7 +1,7 @@
[project]
name = "comfyui_layerstyle"
description = "A set of nodes for ComfyUI it generate image like Adobe Photoshop's Layer Style. the Drop Shadow is first completed node, and follow-up work is in progress."
-version = "1.0.78"
+version = "1.0.79"
license = "MIT"
dependencies = ["numpy", "pillow", "torch", "matplotlib", "Scipy", "scikit_image", "scikit_learn", "opencv-contrib-python", "pymatting", "segment_anything", "timm", "addict", "yapf", "colour-science", "wget", "mediapipe", "loguru", "typer_config", "fastapi", "rich", "google-generativeai", "diffusers", "omegaconf", "tqdm", "transformers", "kornia", "image-reward", "ultralytics", "blend_modes", "blind-watermark", "qrcode", "pyzbar", "transparent-background", "huggingface_hub", "accelerate", "bitsandbytes", "torchscale", "wandb", "hydra-core", "psd-tools", "inference-cli[yolo-world]", "inference-gpu[yolo-world]", "onnxruntime", "peft"]