numpy._core._exceptions._ArrayMemoryError: Unable to allocate 11.9 MiB for an array with shape (6, 1080, 1920) and data type bool #336
Comments
Hi @Jarradmorden 👋 Thank you for the report. Indeed, the notebook is a bit behind, but more importantly, Autodistill hasn't been updated to handle large datasets. I'll have a look at both this evening. Most likely I can bring it up to speed. |
Thanks that would be great! |
The notebook now uses the latest supervision, and the dataset is loaded lazily. Depending on the part where you ran out of RAM previously, this time it could work. Notably, a high-ram operation is calling It might be enough, and Autodistill won't need any changes. |
Awesome thanks! Very swift and speedy I will let you know if it works before accepting answer =] |
To come back to which part was asking for an extreme amount of RAM: it was the distillation. It gets to 100%, the labels are created, and then the distillation takes forever, never finishes, and eventually crashes. Do you know which part of your code specifically fixes this? I have been trying to see the difference between the first version and the last one, but it says the file is too large. Was it many changes or a specific function? Thanks! It happens basically just after this part: base_model = GroundedSAM(ontology=ontology) FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers |
I fixed a later part. Looks like an autodistill issue; guess I should try fixing that too |
Thanks, I will keep a look out :) |
Hi @Jarradmorden 👋 I had commented on the wrong issue. There's a branch on autodistill I created yesterday, that introduces lazy dataset loading. Would you mind checking if it makes a difference? pip install git+https://github.com/autodistill/autodistill.git@feat/supervision-0.24.0-support |
thats perfect thanks I will get back to you today or tomorrow thank you so much! |
Hi, do you know which branch I should check? I am getting issues when loading my images with the path
Is this what I should be changing? I could not see the changes in the notebook. Thanks! |
The branch name of autodistill is I expect autodistill to work as before, and when taken together with my proposed changes to the Colab notebook, the memory usage should be much lower. The main change is this: previously, all images had to be loaded into memory for a dataset to be created. Now, internally autodistill only uses image paths and, if needed, loads one image into memory temporarily. However, this is only true if you don't call To avoid high RAM usage, use the autodistill branch provided, and in your code use the loop |
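A minimal sketch of the idea described above, written in plain Python rather than taken from the autodistill or supervision source. `LazyImageDataset` and `fake_load` are hypothetical names used only for illustration; the point is that the dataset keeps paths and loads one image per iteration step.

```python
from typing import Callable, Iterator, List, Tuple


class LazyImageDataset:
    """Hypothetical stand-in for a dataset that stores paths, not pixels."""

    def __init__(self, image_paths: List[str], loader: Callable[[str], bytes]):
        self.image_paths = list(image_paths)
        self._loader = loader

    def __len__(self) -> int:
        return len(self.image_paths)

    def __iter__(self) -> Iterator[Tuple[str, bytes]]:
        # Load one image per step; the previous one can be garbage collected.
        for path in self.image_paths:
            yield path, self._loader(path)


load_calls: List[str] = []


def fake_load(path: str) -> bytes:
    # Placeholder for a real loader such as cv2.imread(path).
    load_calls.append(path)
    return b"pixels of " + path.encode()


dataset = LazyImageDataset(["a.jpg", "b.jpg", "c.jpg"], fake_load)
loaded_at_creation = list(load_calls)  # still empty: creation loads nothing

for path, image in dataset:
    # Only the current image is held in memory inside the loop body.
    print(path, len(image))
```

Collecting every image into a list or dict up front would defeat the purpose, since that brings back the "everything in RAM" behaviour.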
Hey! I've been stuck on this for a few days, ever since I installed the pip support package. It no longer finds the paths of the files.
my code ###
import sys
import os
import shutil
import cv2
from PySide6 import QtWidgets, QtGui, QtCore
from autodistill_grounded_sam import GroundedSAM
from autodistill.detection import CaptionOntology
import supervision as sv
from autodistill_yolov8 import YOLOv8

# Paths for images, labels, and dataset
IMAGE_DIR_PATH = "data"
DATASET_DIR_PATH = "dataset"
TRAIN_IMAGES_DIRECTORY = os.path.join(DATASET_DIR_PATH, "train/images")
TRAIN_LABELS_DIRECTORY = os.path.join(DATASET_DIR_PATH, "train/labels")
VALID_IMAGES_DIRECTORY = os.path.join(DATASET_DIR_PATH, "valid/images")
VALID_LABELS_DIRECTORY = os.path.join(DATASET_DIR_PATH, "valid/labels")
DATA_YAML_PATH = os.path.join(DATASET_DIR_PATH, "data.yaml")

class ImageAnnotatorApp(QtWidgets.QWidget):
    def __init__(self):
        super().__init__()
        self.initUI()
        # Define ontology and base model
        self.ontology = CaptionOntology({
            "back of bananas": "back of bananas",
            "front of bananas": "front of bananas",
        })
        self.base_model = GroundedSAM(ontology=self.ontology)
        self.dataset = None
        self.current_page_index = 0
        self.image_status = {}
        # Clear any existing train and valid directories at the start to ensure a clean state
        self.clear_directory(TRAIN_IMAGES_DIRECTORY)
        self.clear_directory(TRAIN_LABELS_DIRECTORY)
        self.clear_directory(VALID_IMAGES_DIRECTORY)
        self.clear_directory(VALID_LABELS_DIRECTORY)

    def initUI(self):
        self.setWindowTitle('Image Annotator')
        self.setMinimumSize(1000, 800)
        self.resize(1200, 900)
        self.pageLabel = QtWidgets.QLabel(self)
        self.pageLabel.setAlignment(QtCore.Qt.AlignCenter)
        self.statusLabels = [QtWidgets.QLabel("Unreviewed", self) for _ in range(4)]
        for status_label in self.statusLabels:
            status_label.setAlignment(QtCore.Qt.AlignCenter)
        self.gridLayout = QtWidgets.QGridLayout()
        self.imageLabels = [QtWidgets.QLabel(self) for _ in range(4)]
        self.rejectButtons = [QtWidgets.QCheckBox("Reject", self) for _ in range(4)]
        for i, label in enumerate(self.imageLabels):
            label.setSizePolicy(QtWidgets.QSizePolicy.Expanding, QtWidgets.QSizePolicy.Expanding)
            label.setStyleSheet("border: 1px solid black;")
            label.setAlignment(QtCore.Qt.AlignCenter)
            row, col = divmod(i, 2)
            self.gridLayout.addWidget(label, row * 3, col)
            self.gridLayout.addWidget(self.rejectButtons[i], row * 3 + 1, col)
            self.gridLayout.addWidget(self.statusLabels[i], row * 3 + 2, col)
        control_layout = QtWidgets.QHBoxLayout()
        self.loadButton = QtWidgets.QPushButton('Load Images', self)
        self.loadButton.clicked.connect(self.load_images)
        self.prevButton = QtWidgets.QPushButton('Previous Page', self)
        self.prevButton.clicked.connect(self.prev_page)
        self.nextButton = QtWidgets.QPushButton('Next Page', self)
        self.nextButton.clicked.connect(self.next_page)
        self.acceptButton = QtWidgets.QPushButton('Accept Page', self)
        self.acceptButton.clicked.connect(self.accept_page)
        self.splitButton = QtWidgets.QPushButton('Create Validation Split', self)
        self.splitButton.clicked.connect(self.create_validation_split)
        self.trainButton = QtWidgets.QPushButton('Train Model', self)
        self.trainButton.clicked.connect(self.train_model)
        for button in [self.loadButton, self.prevButton, self.nextButton, self.acceptButton, self.splitButton, self.trainButton]:
            button.setFixedHeight(40)
            control_layout.addWidget(button)
        main_layout = QtWidgets.QVBoxLayout(self)
        main_layout.addWidget(self.pageLabel)
        main_layout.addLayout(self.gridLayout)
        main_layout.addLayout(control_layout)
        self.setLayout(main_layout)

    def clear_directory(self, directory_path):
        if os.path.exists(directory_path):
            shutil.rmtree(directory_path)
        os.makedirs(directory_path, exist_ok=True)

    def load_images(self):
        # Label the images and prepare the dataset
        self.dataset = self.base_model.label(
            input_folder=IMAGE_DIR_PATH,
            extension=".jpeg",
            output_folder=DATASET_DIR_PATH
        )
        # Set total pages based on the dataset length
        self.total_pages = (len(self.dataset) + 3) // 4
        self.current_page_index = 0  # Start at the first page
        self.annotate_images()

    def annotate_images(self):
        # Calculate the range of images to display on the current page
        start_index = self.current_page_index * 4
        end_index = start_index + 4
        self.pageLabel.setText(f"Page {self.current_page_index + 1} of {self.total_pages}")
        mask_annotator = sv.MaskAnnotator()
        box_annotator = sv.BoxAnnotator()
        label_annotator = sv.LabelAnnotator()
        # Iterate through the dataset and display only the current page's images
        for i, (image_name, annotations) in enumerate(self.dataset):
            if start_index <= i < end_index:
                image_path = os.path.join(IMAGE_DIR_PATH, image_name)
                image = cv2.imread(image_path)
                # Annotate image
                annotated_image = image.copy()
                annotated_image = mask_annotator.annotate(scene=annotated_image, detections=annotations)
                annotated_image = box_annotator.annotate(scene=annotated_image, detections=annotations)
                annotated_image = label_annotator.annotate(scene=annotated_image, detections=annotations)
                # Display image and update status labels
                self.display_image(annotated_image, i - start_index)
                status = self.image_status.get(image_name, "Unreviewed")
                self.statusLabels[i - start_index].setText(f"Status: {status}")
                self.rejectButtons[i - start_index].setChecked(status == "Rejected")
            elif i >= end_index:
                break  # Stop once we've loaded the images for the current page

    def display_image(self, annotated_image, label_index):
        if label_index < len(self.imageLabels):
            qt_image = QtGui.QImage(annotated_image.data, annotated_image.shape[1],
                                    annotated_image.shape[0], QtGui.QImage.Format_RGB888)
            pixmap = QtGui.QPixmap.fromImage(qt_image).scaled(400, 400, QtCore.Qt.KeepAspectRatio)
            self.imageLabels[label_index].setPixmap(pixmap)

    def accept_page(self):
        current_image_names = [
            image_name for i, (image_name, _) in enumerate(self.dataset)
            if self.current_page_index * 4 <= i < (self.current_page_index + 1) * 4
        ]
        for i, image_name in enumerate(current_image_names):
            base_name = os.path.splitext(image_name)[0]
            image_path = os.path.join(TRAIN_IMAGES_DIRECTORY, image_name)
            label_path = os.path.join(TRAIN_LABELS_DIRECTORY, f"{base_name}.txt")
            if self.rejectButtons[i].isChecked():
                self.image_status[image_name] = "Rejected"
                self.statusLabels[i].setText("Status: Rejected")
                if os.path.exists(image_path):
                    os.remove(image_path)
                if os.path.exists(label_path):
                    os.remove(label_path)
            else:
                self.image_status[image_name] = "Accepted"
                self.statusLabels[i].setText("Status: Accepted")
                original_image_path = os.path.join(IMAGE_DIR_PATH, image_name)
                if not os.path.exists(image_path):
                    shutil.copyfile(original_image_path, image_path)
                with open(label_path, "w") as label_file:
                    annotations = self.dataset.detections_map.get(image_name)
                    for class_id, x, y, w, h in annotations.xywh:
                        label_file.write(f"{class_id} {x} {y} {w} {h}\n")
        QtWidgets.QMessageBox.information(self, "Accept", "Processed selected images and labels.")

    def create_validation_split(self):
        train_images = [name for name, _ in self.dataset if self.image_status.get(name) == "Accepted"]
        split_index = int(0.2 * len(train_images))
        for image_name in train_images[:split_index]:
            shutil.move(os.path.join(TRAIN_IMAGES_DIRECTORY, image_name), os.path.join(VALID_IMAGES_DIRECTORY, image_name))
        QtWidgets.QMessageBox.information(self, "Split", "Validation split created from accepted training images.")

    def next_page(self):
        if self.current_page_index < self.total_pages - 1:
            self.current_page_index += 1
            self.clear_grid()
            self.annotate_images()

    def prev_page(self):
        if self.current_page_index > 0:
            self.current_page_index -= 1
            self.clear_grid()
            self.annotate_images()

    def clear_grid(self):
        for label in self.imageLabels:
            label.clear()
        for status_label in self.statusLabels:
            status_label.setText("Unreviewed")

    def train_model(self):
        if not os.listdir(TRAIN_IMAGES_DIRECTORY):
            QtWidgets.QMessageBox.warning(self, "No Images", "No images to train on. Accept some images first.")
            return
        self.target_model = YOLOv8("yolov8s-seg.pt")
        self.target_model.train(DATA_YAML_PATH, epochs=1)
        QtWidgets.QMessageBox.information(self, "Training", "Model training complete.")

if __name__ == '__main__':
    app = QtWidgets.QApplication(sys.argv)
    window = ImageAnnotatorApp()
    window.show()
    sys.exit(app.exec())
###ISSUE## |
I believe the important part of the error message is at the end, but got cut off. |
Apologies |
Apologies!! i know it's not a problem with my paths because it clearly tries to label them all |
Please let me know if you see anything that could be causing this :) |
I'm tempted to believe the error - something's off with I'll have a look, but don't have time right now |
Thanks so much again! No worries take your time =] |
If you'd like to speed things up, you could create a Colab with your usage, most likely without the QT UI - just something that shows the error happening. I reckon, memory wouldn't be an issue so we can use a small dataset too, something from Universe. In any case, I'll get round to solving this eventually, hopefully in the next few days. |
Okay I will be on it. |
Labeling images with GroundedSAM model...
|
Thank you! I'll have a look as soon as I can, but this will 100% make it much faster |
I did test with the normal pip installation so I am certain it is due to the updated supervision-support, and I agree with you I think it is the dataset_detection.py something is going wrong with the relative path / full path situation |
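One way a relative/full path mismatch like this can show up, sketched under the assumption (not confirmed in this thread) that the updated branch yields image paths where the older release yielded bare file names. `resolve_image_path` is a hypothetical helper, not part of autodistill or supervision.

```python
import os

IMAGE_DIR_PATH = "data"


def resolve_image_path(image_name: str, image_dir: str = IMAGE_DIR_PATH) -> str:
    """Return a path under image_dir, whether image_name is a bare
    file name or already carries a directory prefix."""
    return os.path.join(image_dir, os.path.basename(image_name))


old_style_key = "left_1.jpeg"                         # bare file name
new_style_key = os.path.join("data", "left_1.jpeg")   # assumed new behaviour

# Joining the directory onto a key that already contains it duplicates
# the prefix, so the resulting file does not exist on disk.
naive_path = os.path.join(IMAGE_DIR_PATH, new_style_key)

# Normalising through basename gives the same result for both key styles.
safe_old = resolve_image_path(old_style_key)
safe_new = resolve_image_path(new_style_key)
print(naive_path, safe_old, safe_new)
```

If the dataset keys are in fact full paths, using them directly (without joining them onto `IMAGE_DIR_PATH` again) would work as well.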
I believe I have it fixed, but you may not want the fix implemented this way, but if you like :) I only needed to modify detection_base_model.py and add a flag, like this:
import datetime
import enum
import glob
import os
from abc import abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Dict
import cv2
import numpy as np
import roboflow
import supervision as sv
from PIL import Image
from supervision.utils.file import save_text_file
from tqdm import tqdm
from autodistill.core import BaseModel
from autodistill.helpers import load_image, split_data
from .detection_ontology import DetectionOntology

class NmsSetting(str, enum.Enum):
    NONE = "no_nms"
    CLASS_SPECIFIC = "class_specific"
    CLASS_AGNOSTIC = "class_agnostic"

@dataclass
class DetectionBaseModel(BaseModel):
    ontology: DetectionOntology

    @abstractmethod
    def predict(self, input: str | np.ndarray | Image.Image) -> sv.Detections:
        pass

    def sahi_predict(self, input: str | np.ndarray | Image.Image) -> sv.Detections:
        slicer = sv.InferenceSlicer(callback=self.predict)
        return slicer(load_image(input, return_format="cv2"))

    def _record_confidence_in_files(
        self,
        annotations_directory_path: str,
        images: Dict[str, np.ndarray],
        annotations: Dict[str, sv.Detections],
    ) -> None:
        Path(annotations_directory_path).mkdir(parents=True, exist_ok=True)
        for image_name, _ in images.items():
            detections = annotations[image_name]
            yolo_annotations_name, _ = os.path.splitext(image_name)
            confidence_path = os.path.join(
                annotations_directory_path,
                "confidence-" + yolo_annotations_name + ".txt",
            )
            confidence_list = [str(x) for x in detections.confidence.tolist()]
            save_text_file(lines=confidence_list, file_path=confidence_path)
            print("Saved confidence file: " + confidence_path)

    def label(
        self,
        input_folder: str,
        extension: str = ".jpg",
        output_folder: str = None,
        human_in_the_loop: bool = False,
        roboflow_project: str = None,
        roboflow_tags: str = ["autodistill"],
        sahi: bool = False,
        record_confidence: bool = False,
        nms_settings: NmsSetting = NmsSetting.NONE,
        lazy_load: bool = False  # Enables lazy loading mode for incremental processing
    ) -> sv.DetectionDataset:
        """
        Label a dataset with the model, saving each image and its annotations as processed.
        """
        if output_folder is None:
            output_folder = input_folder + "_labeled"
        # Prepare directories for images and annotations
        os.makedirs(os.path.join(output_folder, "images"), exist_ok=True)
        os.makedirs(os.path.join(output_folder, "annotations"), exist_ok=True)
        # List of image files to process
        files = glob.glob(input_folder + "/*" + extension)
        images_map = {}
        detections_map = {}
        # Process each image in lazy load mode
        progress_bar = tqdm(files, desc="Labeling images (Lazy Load)" if lazy_load else "Labeling images")
        for f_path in progress_bar:
            f_path_short = os.path.basename(f_path)
            # Load and predict on the image
            image = cv2.imread(f_path)
            detections = self.predict(image) if not sahi else sv.InferenceSlicer(callback=self.predict)(image)
            # Apply NMS if needed
            if nms_settings == NmsSetting.CLASS_SPECIFIC:
                detections = detections.with_nms()
            elif nms_settings == NmsSetting.CLASS_AGNOSTIC:
                detections = detections.with_nms(class_agnostic=True)
            # Save each processed image and label immediately to disk
            cv2.imwrite(os.path.join(output_folder, "images", f"{f_path_short}"), image)
            label_path = os.path.join(output_folder, "annotations", f"{os.path.splitext(f_path_short)[0]}.txt")
            with open(label_path, "w") as label_file:
                for box, class_id in zip(detections.xyxy, detections.class_id):
                    x_min, y_min, x_max, y_max = box
                    x_center = (x_min + x_max) / 2
                    y_center = (y_min + y_max) / 2
                    width = x_max - x_min
                    height = y_max - y_min
                    label_file.write(f"{class_id} {x_center} {y_center} {width} {height}\n")
            # Optionally save confidence data for each detection
            if record_confidence:
                confidence_list = [str(x) for x in detections.confidence.tolist()]
                confidence_path = os.path.join(output_folder, "annotations", f"confidence-{f_path_short}.txt")
                save_text_file(lines=confidence_list, file_path=confidence_path)
            # Add processed image and detections to maps for dataset creation
            images_map[f_path_short] = image
            detections_map[f_path_short] = detections
        # Construct the dataset object
        dataset = sv.DetectionDataset(self.ontology.classes(), images_map, detections_map)
        return dataset
#########################FIX####################
self.dataset = self.base_model.label(
    input_folder=selected_dir,
    output_folder=BASE_DIR,  # Direct output to the labeling directory
    extension="*",
    lazy_load=True
) |
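A side note on the label files written in the loop above: the center and size values there are in pixels, while YOLO-style detection labels conventionally store them normalized by image width and height. A minimal sketch of that conversion follows; `xyxy_to_yolo` and `format_yolo_line` are hypothetical helpers, not autodistill or supervision functions.

```python
from typing import Sequence, Tuple


def xyxy_to_yolo(
    box: Sequence[float], image_wh: Tuple[int, int]
) -> Tuple[float, float, float, float]:
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box into
    normalized (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = box
    image_w, image_h = image_wh
    x_center = (x_min + x_max) / 2 / image_w
    y_center = (y_min + y_max) / 2 / image_h
    width = (x_max - x_min) / image_w
    height = (y_max - y_min) / image_h
    return x_center, y_center, width, height


def format_yolo_line(class_id: int, box: Sequence[float], image_wh: Tuple[int, int]) -> str:
    # One label line: class id followed by four normalized values.
    values = xyxy_to_yolo(box, image_wh)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# A box covering the central half of a 1920x1080 frame.
line = format_yolo_line(0, (480, 270, 1440, 810), (1920, 1080))
print(line)
```

In the loop above, the image size would come from `image.shape[:2]`, which is `(height, width)` for arrays returned by `cv2.imread`.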
Hi @Jarradmorden, If you're still struggling with the issue, the new autodistill version supports latest supervision, and lazy data loading. Let me know if you're still having issues. |
Search before asking
Notebook name
https://github.com/roboflow/notebooks/blob/main/notebooks/how-to-auto-train-yolov8-model-with-autodistill.ipynb
Bug
Hello,
I am following some of the tutorials that Roboflow offers. I am building a custom dataset with over 9000 pictures and I am using the ontology. It seems that when I reach the end of my training I run into memory issues: "numpy._core._exceptions._ArrayMemoryError: Unable to allocate 11.9 MiB for an array with shape (6, 1080, 1920) and data type bool". It only works if I train with a much smaller sample. I am guessing this is because the images are all being processed in one go. I followed the tutorial for making a custom dataset, but I think this would happen to anyone with a much larger dataset. How can I get around this issue?
I understand lowering the resolution would help, but with so many images I would still face the same issue. How would I get around this?
I put this as a bug because I am not sure if this notebook accounts for very large datasets, so it would be good to handle that if it is something that can happen to anyone. Thank you.
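For scale, the 11.9 MiB in the error message is just the size of one image's mask stack: NumPy stores a bool as one byte per element. The sketch below checks that arithmetic and then extrapolates to the whole dataset, assuming (hypothetically) that every one of the 12498 images in the log carries a similar 6-mask stack and that all stacks are kept in memory at once.

```python
def mask_stack_bytes(num_masks: int, height: int, width: int,
                     bytes_per_element: int = 1) -> int:
    """Bytes needed for a (num_masks, height, width) array; a NumPy bool
    takes one byte per element."""
    return num_masks * height * width * bytes_per_element


# Shape taken from the error message: (6, 1080, 1920), dtype bool.
per_image_bytes = mask_stack_bytes(6, 1080, 1920)
per_image_mib = per_image_bytes / 2**20

# Assumption for illustration only: every image has 6 masks and all
# stacks stay resident simultaneously.
num_images = 12498
total_gib = per_image_bytes * num_images / 2**30

print(f"{per_image_mib:.1f} MiB per image, about {total_gib:.0f} GiB in total")
```

So a failure to allocate a mere 11.9 MiB usually means the process had already exhausted memory on earlier images, not that this particular array is large.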
#####ISSUE########
left_9_2024-10-18 15-14-24.770364.png: 100%|██████████████████████████████████| 12498/12498 [6:07:58<00:00, 1.77s/it]
Passing a Dict[str, np.ndarray] into DetectionDataset is deprecated and will be removed in supervision-0.26.0. Use a list of paths List[str] instead.
Found dataset\train\images\left_9339_2024-10-18 15-23-39.366113.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_9339_2024-10-18 15-23-39.366113.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_4584_2024-10-18 15-18-59.357296.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_4584_2024-10-18 15-18-59.357296.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_4583_2024-10-18 15-18-59.323971.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_4583_2024-10-18 15-18-59.323971.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_3070_2024-10-18 15-17-30.481540.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_3070_2024-10-18 15-17-30.481540.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_9338_2024-10-18 15-23-39.299582.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_9338_2024-10-18 15-23-39.299582.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_9335_2024-10-18 15-23-39.131904.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_9335_2024-10-18 15-23-39.131904.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_9336_2024-10-18 15-23-39.198825.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_9336_2024-10-18 15-23-39.198825.txt as already present, not moving anything to dataset\train\labels
Found dataset\train\images\left_9337_2024-10-18 15-23-39.265686.jpg as already present, not moving anything to dataset\train\images
Found dataset\train\labels\left_9337_2024-10-18 15-23-39.265686.txt as already present, not moving anything to dataset\train\labels
Labeled dataset created - ready for distillation.
Traceback (most recent call last):
  File "c:\Users\xxxxxx\DATA\model.py", line 45, in <module>
    dataset = sv.DetectionDataset.from_yolo(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Usersxxxx\AppData\Local\anaconda3\envs\visper_environment\Lib\site-packages\supervision\dataset\core.py", line 497, in from_yolo
    classes, image_paths, annotations = load_yolo_annotations(
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "cxxxxxx\AppData\Local\anaconda3\envs\xxxx_environment\Lib\site-packages\supervision\dataset\formats\yolo.py", line 120, in yolo_annotations_to_detections
    mask = _polygons_to_masks(polygons=polygons, resolution_wh=resolution_wh)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\xxxxxx\AppData\Local\anaconda3\envs\xxx_environment\Lib\site-packages\supervision\dataset\formats\yolo.py", line 50, in _polygons_to_masks
    return np.array(
           ^^^^^^^^^
numpy._core._exceptions._ArrayMemoryError: Unable to allocate 11.9 MiB for an array with shape (6, 1080, 1920) and data type bool
####MY CODE#####
Windows 11
I have quite good specs too:
Python 3.11
and an NVIDIA RTX A5000 graphics card