Commit JoyCaption2Split, LoadJoyCaption2Model nodes, SegmentAnythingUltra nodes add cache_model option
chflame163 committed Oct 17, 2024
1 parent c7c61ff commit 999346d
Showing 10 changed files with 217 additions and 29 deletions.
32 changes: 32 additions & 0 deletions README.MD
@@ -150,6 +150,9 @@ Please try downgrading the ```protobuf``` dependency package to 3.20.3, or set e

<font size="4">**If you get dependency package errors after updating, please double-click ```repair_dependency.bat``` (for official ComfyUI Portable) or ```repair_dependency_aki.bat``` (for ComfyUI-aki-v1.x) in the plugin folder to reinstall the dependency packages.**</font><br />

* Commit [JoyCaption2Split](#JoyCaption2Split) and [LoadJoyCaption2Model](#LoadJoyCaption2Model) nodes. Sharing the model across multiple JoyCaption2 nodes improves efficiency.
* [SegmentAnythingUltra](#SegmentAnythingUltra) and [SegmentAnythingUltraV2](#SegmentAnythingUltraV2) add a ```cache_model``` option, making it easier to manage VRAM usage flexibly.

* Because the [LlamaVision](#LlamaVision) node requires a high ```transformers``` version, which affects the loading of some older third-party plugins, the LayerStyle plugin has lowered the default requirement to 4.43.2. If you need to run LlamaVision, please upgrade to 4.45.0 or above yourself.

* Commit [JoyCaption2](#JoyCaption2) and [JoyCaption2ExtraOptions](#JoyCaption2ExtraOptions) nodes. New dependency packages need to be installed.
@@ -952,6 +955,34 @@ Node Options:
* temperature: The temperature parameter of LLM.
* cache_model: Whether to cache the model.

### <a id="table1">JoyCaption2Split</a>
This node separates JoyCaption2 model loading from inference; when multiple JoyCaption2 nodes are used, the loaded model can be shared to improve efficiency.

Node Options:
![image](image/joycaption2_split_node.jpg)

* image: Image input.
* joy2_model: The JoyCaption model input.
* extra_options: Input the extra_options.
* caption_type: Caption type options, including: "Descriptive", "Descriptive (Informal)", "Training Prompt", "MidJourney", "Booru tag list", "Booru-like tag list", "Art Critic", "Product Listing", "Social Media Post".
* caption_length: The length of the caption.
* user_prompt: User prompt for the LLM. If there is content here, it will override all caption_type and extra_options settings.
* max_new_tokens: The max_new_tokens parameter of LLM.
* do_sample: The do_sample parameter of LLM.
* top_p: The top_p parameter of LLM.
* temperature: The temperature parameter of LLM.

### <a id="table1">LoadJoyCaption2Model</a>
JoyCaption2's model loading node, used in conjunction with JoyCaption2Split; a usage sketch follows the option list below.

Node Options:
![image](image/load_joycaption2_model_node.jpg)

* llm_model: There are two LLM models to choose from: Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2 and unsloth/Meta-Llama-3.1-8B-Instruct.
* device: Model loading device. Currently, only CUDA is supported.
* dtype: Model precision; options are nf4 and bf16.
* vlm_lora: Whether to load text_model.
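
The two nodes are meant to be wired together in the ComfyUI graph; purely for illustration, here is a rough Python sketch of the same flow using the classes this commit adds in ```py/joycaption_alpha_2.py```. The import path, the random image tensor, and the option values are assumptions, not part of the plugin's documented API:

```python
# Hypothetical sketch: load the JoyCaption2 model once, then reuse it for captioning.
import torch
from joycaption_alpha_2 import LS_LoadJoyCaption2Model, LS_JoyCaption2Split  # assumed import path

loader = LS_LoadJoyCaption2Model()
# load_joycaption2_model returns ([[model, device]],); the inner pair is the joy2_model input.
(joy2_models,) = loader.load_joycaption2_model(
    llm_model="Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2",
    device="cuda", dtype="nf4", vlm_lora="text_model")
joy2_model = joy2_models[0]

image = torch.rand(2, 512, 512, 3)  # placeholder batch in ComfyUI's (B, H, W, C) layout
splitter = LS_JoyCaption2Split()
(captions,) = splitter.joycaption2split(
    image=image, joy2_model=joy2_model,
    caption_type="Descriptive", caption_length="medium-length",
    user_prompt="", max_new_tokens=300, top_p=0.9, temperature=0.6)
for caption in captions:
    print(caption)
```

Because the model is passed around as an object instead of being loaded inside the caption node, any number of JoyCaption2Split nodes can reuse the same joy2_model without reloading the LLM.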

### <a id="table1">JoyCaption2ExtraOptions</a>
The extra_options parameter node of JoyCaption2.

@@ -2065,6 +2096,7 @@ Node options:
* white_point: Edge white sampling threshold.
* process_detail: Setting this to false will skip edge processing to save runtime.
* prompt: Input for SAM's prompt.
* cache_model: Whether to keep the model cached between runs; set it to False to release VRAM after each run (see the sketch below).
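
The caching behaviour mirrors what this commit adds to ```py/segment_anything_ultra.py```; the following simplified, self-contained sketch shows the pattern (```load_sam_model``` and ```clear_memory``` here are stand-ins for the plugin's real helpers):

```python
import gc
import torch

def load_sam_model(name):
    """Stand-in for the plugin's SAM loader."""
    return f"loaded:{name}"

def clear_memory():
    """Stand-in for the plugin's clear_memory(): release cached CUDA memory."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

class SegmentNodeSketch:
    def __init__(self):
        self.SAM_MODEL = None
        self.previous_sam_model = ""

    def run(self, sam_model: str, cache_model: bool):
        # Reload only when the requested checkpoint changed or nothing is cached yet.
        if self.previous_sam_model != sam_model or self.SAM_MODEL is None:
            self.SAM_MODEL = load_sam_model(sam_model)
            self.previous_sam_model = sam_model
        result = f"segmented with {self.SAM_MODEL}"  # segmentation would happen here
        if not cache_model:
            # cache_model=False: drop the references after the run so VRAM can be freed.
            self.SAM_MODEL, self.previous_sam_model = None, ""
            clear_memory()
        return result
```

With cache_model enabled the model stays resident between runs (faster, but VRAM stays occupied); with it disabled, the model is reloaded on every run and VRAM is released afterwards.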

### <a id="table1">SegmentAnythingUltraV2</a>

32 changes: 32 additions & 0 deletions README_CN.MD
@@ -127,6 +127,8 @@ If this call came from a _pb2.py file, your generated code is out of date and mu
## Update Notes
<font size="4">**If dependency package errors occur after updating this plugin, please double-click ```install_requirements.bat``` (official portable package) or ```install_requirements_aki.bat``` (ComfyUI-aki package) in the plugin directory to reinstall the dependency packages.

* Add [JoyCaption2Split](#JoyCaption2Split) and [LoadJoyCaption2Model](#LoadJoyCaption2Model) nodes; when multiple JoyCaption2 nodes are used, the model can be shared to improve efficiency.
* [SegmentAnythingUltra](#SegmentAnythingUltra) and [SegmentAnythingUltraV2](#SegmentAnythingUltraV2) add a ```cache_model``` option for flexible VRAM management.
* Because the [LlamaVision](#LlamaVision) node requires a high ```transformers``` version, which affects the loading of some older third-party plugins, the LayerStyle plugin has lowered the default requirement to 4.43.2. If you need to run LlamaVision, please upgrade to 4.45.0 or above yourself.
* Add [JoyCaption2](#JoyCaption2) and [JoyCaption2ExtraOptions](#JoyCaption2ExtraOptions) nodes, which use the JoyCaption-alpha-two model to generate prompts.
Please download from [Baidu Netdisk](https://pan.baidu.com/s/1dOjbUEacUOhzFitAQ3uIeQ?pwd=4ypv) and [Baidu Netdisk](https://pan.baidu.com/s/1mH1SuW45Dy6Wga7aws5siQ?pwd=w6h5)
@@ -849,6 +851,35 @@ The V2 upgraded version of ImageScaleByAspectRatio
* temperature: The temperature parameter of the LLM.
* cache_model: Whether to cache the model.

### <a id="table1">JoyCaption2Split</a>
The split version of the JoyCaption2 node, separating model loading from inference; when multiple JoyCaption2 nodes are used, the model can be shared to improve efficiency.

Node options:
![image](image/joycaption2_split_node.jpg)

* image: Image input.
* joy2_model: The JoyCaption model input.
* extra_options: Input for the extra_options parameter.
* caption_type: Caption type options, including "Descriptive" (formal-tone description), "Descriptive (Informal)" (informal-tone description), "Training Prompt" (SD training caption), "MidJourney" (MJ-style caption), "Booru tag list", "Booru-like tag list", "Art Critic" (art critique), "Product Listing", "Social Media Post".
* caption_length: The length of the caption.
* user_prompt: User prompt for the LLM. If there is content here, it will override all caption_type and extra_options settings.
* max_new_tokens: The max_new_tokens parameter of the LLM.
* do_sample: The do_sample parameter of the LLM.
* top_p: The top_p parameter of the LLM.
* temperature: The temperature parameter of the LLM.

### <a id="table1">LoadJoyCaption2Model</a>
The model loading node for JoyCaption2, used together with JoyCaption2Split.

Node options:
![image](image/load_joycaption2_model_node.jpg)

* llm_model: There are two LLM models to choose from: Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2 and unsloth/Meta-Llama-3.1-8B-Instruct.
* device: Model loading device. Currently only cuda is supported.
* dtype: Model loading precision; options are nf4 and bf16.
* vlm_lora: Whether to load text_model.


### <a id="table1">JoyCaption2ExtraOptions</a>
The extra_options parameter node of JoyCaption2.

@@ -1843,6 +1874,7 @@ mask is an optional input; if a mask is provided here, it will be applied to the output.
* white_point: Edge white sampling threshold.
* process_detail: Setting this to False will skip edge processing to save runtime.
* prompt: Input for SAM's prompt.
* cache_model: Whether to cache the model.

### <a id="table1">SegmentAnythingUltraV2</a>
The V2 upgraded version of SegmentAnythingUltra, adding the VITMatte edge processing method.
Binary file added image/joycaption2_split_node.jpg
Binary file added image/load_joycaption2_model_node.jpg
Binary file modified image/segment_anything_ultra_node.jpg
Binary file modified image/segment_anything_ultra_v2_node.jpg
102 changes: 102 additions & 0 deletions py/joycaption_alpha_2.py
@@ -524,13 +524,115 @@ def joycaption2(self, image, llm_model, device, dtype, vlm_lora, caption_type, c

return (ret_text,)

class LS_LoadJoyCaption2Model:

CATEGORY = '😺dzNodes/LayerUtility'
FUNCTION = "load_joycaption2_model"
RETURN_TYPES = ("JoyCaption2_Model",)
RETURN_NAMES = ("joy2_model",)
OUTPUT_IS_LIST = (True,)

def __init__(self):
self.NODE_NAME = 'LoadJoyCaption2Model'

@classmethod
def INPUT_TYPES(self):
llm_model_list = ["Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2", "unsloth/Meta-Llama-3.1-8B-Instruct"]
device_list = ['cuda']
dtype_list = ['nf4','bf16']
vlm_lora_list = ['text_model', 'none']

return {
"required": {
"llm_model": (llm_model_list,),
"device": (device_list,),
"dtype": (dtype_list,),
"vlm_lora": (vlm_lora_list,),
},
"optional": {
}
}

def load_joycaption2_model(self, llm_model, device, dtype, vlm_lora):
llm_model_path = download_hg_model(llm_model, "LLM")
model = load_models(llm_model_path, dtype, vlm_lora, device)

return ([[model,device]],)

class LS_JoyCaption2Split:

CATEGORY = '😺dzNodes/LayerUtility'
FUNCTION = "joycaption2split"
RETURN_TYPES = ("STRING",)
RETURN_NAMES = ("text",)
OUTPUT_IS_LIST = (True,)

def __init__(self):
self.NODE_NAME = 'JoyCaption2split'
self.previous_model = None

@classmethod
def INPUT_TYPES(self):
caption_type_list = ["Descriptive", "Descriptive (Informal)", "Training Prompt", "MidJourney",
"Booru tag list", "Booru-like tag list", "Art Critic", "Product Listing",
"Social Media Post"]
caption_length_list = ["any", "very short", "short", "medium-length", "long", "very long"] + [str(i) for i in range(20, 261, 10)]

return {
"required": {
"image": ("IMAGE",),
"joy2_model": ("JoyCaption2_Model",),
"caption_type": (caption_type_list,),
"caption_length": (caption_length_list,),
"user_prompt": ("STRING", {"default": "","multiline": False}),
"max_new_tokens": ("INT", {"default": 300, "min": 8, "max": 4096, "step": 1}),
"top_p": ("FLOAT", {"default": 0.9, "min": 0, "max": 1, "step": 0.01}),
"temperature": ("FLOAT", {"default": 0.6, "min": 0, "max": 1, "step": 0.01}),
},
"optional": {
"extra_options": ("JoyCaption2ExtraOption",),
}
}

def joycaption2split(self, image, joy2_model, caption_type, caption_length,
user_prompt, max_new_tokens, top_p, temperature,
extra_options=None):

model, device = joy2_model
# device = "cuda"
ret_text = []
extra = []
character_name = ""
if extra_options is not None:
extra, character_name = extra_options

for img in image:
img = tensor2pil(img.unsqueeze(0)).convert('RGB')
# log(f"{self.NODE_NAME}: caption_type={caption_type}, caption_length={caption_length}, extra={extra}, character_name={character_name}, user_prompt={user_prompt}")
caption = stream_chat([img], caption_type, caption_length,
extra, character_name, user_prompt,
max_new_tokens, top_p, temperature, 1,
model, device)
log(f"{self.NODE_NAME}: caption={caption[0]}")
ret_text.append(caption[0])

del joy2_model
del model, device
clear_memory()

return (ret_text,)


NODE_CLASS_MAPPINGS = {
"LayerUtility: LoadJoyCaption2Model": LS_LoadJoyCaption2Model,
"LayerUtility: JoyCaption2Split": LS_JoyCaption2Split,
"LayerUtility: JoyCaption2": LS_JoyCaption2,
"LayerUtility: JoyCaption2ExtraOptions": LS_JoyCaptionExtraOptions
}

NODE_DISPLAY_NAME_MAPPINGS = {
"LayerUtility: LoadJoyCaption2Model": "LayerUtility: Load JoyCaption2 Model",
"LayerUtility: JoyCaption2Split": "LayerUtility: JoyCaption2 Split",
"LayerUtility: JoyCaption2": "LayerUtility: JoyCaption2",
"LayerUtility: JoyCaption2ExtraOptions": "LayerUtility: JoyCaption2 Extra Options"
}
35 changes: 24 additions & 11 deletions py/segment_anything_ultra.py
@@ -3,12 +3,12 @@

NODE_NAME = 'SegmentAnythingUltra'

SAM_MODEL = None
DINO_MODEL = None

class SegmentAnythingUltra:
def __init__(self):
pass
self.SAM_MODEL = None
self.DINO_MODEL = None
self.previous_sam_model = ""
self.previous_dino_model = ""

@classmethod
def INPUT_TYPES(cls):
@@ -24,6 +24,7 @@ def INPUT_TYPES(cls):
"white_point": ("FLOAT", {"default": 0.99, "min": 0.02, "max": 0.99, "step": 0.01}),
"process_detail": ("BOOLEAN", {"default": True}),
"prompt": ("STRING", {"default": "subject"}),
"cache_model": ("BOOLEAN", {"default": False}),
},
"optional": {
}
@@ -36,22 +37,27 @@ def INPUT_TYPES(cls):

def segment_anything_ultra(self, image, sam_model, grounding_dino_model, threshold,
detail_range, black_point, white_point, process_detail,
prompt, ):
global SAM_MODEL
global DINO_MODEL
if SAM_MODEL is None: SAM_MODEL = load_sam_model(sam_model)
if DINO_MODEL is None: DINO_MODEL = load_groundingdino_model(grounding_dino_model)
prompt, cache_model):

if self.previous_sam_model != sam_model or self.SAM_MODEL is None:
self.SAM_MODEL = load_sam_model(sam_model)
self.previous_sam_model = sam_model
if self.previous_dino_model != grounding_dino_model or self.DINO_MODEL is None:
self.DINO_MODEL = load_groundingdino_model(grounding_dino_model)
self.previous_dino_model = grounding_dino_model


ret_images = []
ret_masks = []

for i in image:
i = torch.unsqueeze(i, 0)
i = pil2tensor(tensor2pil(i).convert('RGB'))
item = tensor2pil(i).convert('RGBA')
boxes = groundingdino_predict(DINO_MODEL, item, prompt, threshold)
boxes = groundingdino_predict(self.DINO_MODEL, item, prompt, threshold)
if boxes.shape[0] == 0:
break
(_, _mask) = sam_segment(SAM_MODEL, item, boxes)
(_, _mask) = sam_segment(self.SAM_MODEL, item, boxes)
_mask = _mask[0]
if process_detail:
_mask = tensor2pil(mask_edge_detail(i, _mask, detail_range, black_point, white_point))
@@ -66,6 +72,13 @@ def segment_anything_ultra(self, image, sam_model, grounding_dino_model, thresho
empty_mask = torch.zeros((1, height, width), dtype=torch.uint8, device="cpu")
return (empty_mask, empty_mask)

if not cache_model:
self.SAM_MODEL = None
self.DINO_MODEL = None
self.previous_sam_model = ""
self.previous_dino_model = ""
clear_memory()

log(f"{NODE_NAME} Processed {len(ret_masks)} image(s).", message_type='finish')
return (torch.cat(ret_images, dim=0), torch.cat(ret_masks, dim=0),)

43 changes: 26 additions & 17 deletions py/segment_anything_ultra_v2.py
@@ -3,13 +3,15 @@

NODE_NAME = 'SegmentAnythingUltra V2'

SAM_MODEL = None
DINO_MODEL = None
previous_sam_model = ""
previous_dino_model = ""



class SegmentAnythingUltraV2:
def __init__(self):
self.SAM_MODEL = None
self.DINO_MODEL = None
self.previous_sam_model = ""
self.previous_dino_model = ""
pass

@classmethod
@@ -32,6 +34,7 @@ def INPUT_TYPES(cls):
"prompt": ("STRING", {"default": "subject"}),
"device": (device_list,),
"max_megapixels": ("FLOAT", {"default": 2.0, "min": 1, "max": 999, "step": 0.1}),
"cache_model": ("BOOLEAN", {"default": False}),
},
"optional": {
}
@@ -45,35 +48,34 @@ def INPUT_TYPES(cls):
def segment_anything_ultra_v2(self, image, sam_model, grounding_dino_model, threshold,
detail_method, detail_erode, detail_dilate,
black_point, white_point, process_detail, prompt,
device, max_megapixels
device, max_megapixels, cache_model
):
global SAM_MODEL
global DINO_MODEL
global previous_sam_model
global previous_dino_model

if detail_method == 'VITMatte(local)':
local_files_only = True
else:
local_files_only = False

if previous_sam_model != sam_model:
SAM_MODEL = load_sam_model(sam_model)
previous_sam_model = sam_model
if previous_dino_model != grounding_dino_model:
DINO_MODEL = load_groundingdino_model(grounding_dino_model)
previous_dino_model = grounding_dino_model
if self.previous_sam_model != sam_model or self.SAM_MODEL is None:
self.SAM_MODEL = load_sam_model(sam_model)
self.previous_sam_model = sam_model
if self.previous_dino_model != grounding_dino_model or self.DINO_MODEL is None:
self.DINO_MODEL = load_groundingdino_model(grounding_dino_model)
self.previous_dino_model = grounding_dino_model

SAM_MODEL = load_sam_model(sam_model)
DINO_MODEL = load_groundingdino_model(grounding_dino_model)
ret_images = []
ret_masks = []

for i in image:
i = torch.unsqueeze(i, 0)
i = pil2tensor(tensor2pil(i).convert('RGB'))
_image = tensor2pil(i).convert('RGBA')
boxes = groundingdino_predict(DINO_MODEL, _image, prompt, threshold)
boxes = groundingdino_predict(self.DINO_MODEL, _image, prompt, threshold)
if boxes.shape[0] == 0:
break
(_, _mask) = sam_segment(SAM_MODEL, _image, boxes)
(_, _mask) = sam_segment(self.SAM_MODEL, _image, boxes)
_mask = _mask[0]
detail_range = detail_erode + detail_dilate
if process_detail:
@@ -97,6 +99,13 @@ def segment_anything_ultra_v2(self, image, sam_model, grounding_dino_model, thre
empty_mask = torch.zeros((1, height, width), dtype=torch.uint8, device="cpu")
return (empty_mask, empty_mask)

if not cache_model:
self.SAM_MODEL = None
self.DINO_MODEL = None
self.previous_sam_model = ""
self.previous_dino_model = ""
clear_memory()

log(f"{NODE_NAME} Processed {len(ret_masks)} image(s).", message_type='finish')
return (torch.cat(ret_images, dim=0), torch.cat(ret_masks, dim=0),)

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,7 +1,7 @@
[project]
name = "comfyui_layerstyle"
description = "A set of nodes for ComfyUI it generate image like Adobe Photoshop's Layer Style. the Drop Shadow is first completed node, and follow-up work is in progress."
version = "1.0.78"
version = "1.0.79"
license = "MIT"
dependencies = ["numpy", "pillow", "torch", "matplotlib", "Scipy", "scikit_image", "scikit_learn", "opencv-contrib-python", "pymatting", "segment_anything", "timm", "addict", "yapf", "colour-science", "wget", "mediapipe", "loguru", "typer_config", "fastapi", "rich", "google-generativeai", "diffusers", "omegaconf", "tqdm", "transformers", "kornia", "image-reward", "ultralytics", "blend_modes", "blind-watermark", "qrcode", "pyzbar", "transparent-background", "huggingface_hub", "accelerate", "bitsandbytes", "torchscale", "wandb", "hydra-core", "psd-tools", "inference-cli[yolo-world]", "inference-gpu[yolo-world]", "onnxruntime", "peft"]

