ignore one class subfolder while using image_dataset_from_directory() function #1386
This is not directly supported by the convenience function. The simplest fix is probably to filter the dataset after the fact. First, get the labels sorted in the same order as keras:

```r
# sorted directories in main_directory
library(reticulate)
os <- reticulate::import("os")
sorted_labels <- os$walk(main_directory) |> iter_next() |> _[[2]]
labels <- seq(0, along = sorted_labels)
names(labels) <- sorted_labels
my_unwanted_labels <- labels %>% .[names(.) %in% c("class_c")] %>% unname()
```

Then filter with `dataset_map()`:

```r
library(keras)
library(tfdatasets)
ds <- image_dataset_from_directory(....) %>%
  dataset_map(\(images, labels) {
    keep <- my_unwanted_labels |>
      lapply(\(bad_label) labels != bad_label) |>
      purrr::reduce(`&`)
    tuple(images[keep], labels[keep])
  })
```

or using `dataset_filter()`:

```r
my_unwanted_labels %<>% as_tensor()
ds <- image_dataset_from_directory(....) %>%
  dataset_unbatch() %>%
  dataset_filter(\(image, label) !k_any(label == my_unwanted_labels)) %>%
  dataset_batch(batch_size = 32)
```

Alternatively, instead of fixing up the output of `image_dataset_from_directory()`, you can fix up the input by pointing it at a curated directory of symlinks:

```r
library(fs)
library(keras)
curated_dataset <- fs::path("curated_dataset") |> path_abs()
dir_create(curated_dataset)
class_dirs <- dir_ls(main_directory, recurse = FALSE) %>%
  .[!basename(.) %in% c("class_c")] %>%
  path_abs()
link_create(class_dirs,                                   # link target
            path(curated_dataset, basename(class_dirs)))  # link location
ds <- image_dataset_from_directory(curated_dataset, follow_links = TRUE)
```

(All the code snippets above are untested, but I trust you can figure out the rest.)
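The label bookkeeping above can be sketched in plain R, without reticulate or keras. This is a toy illustration of the convention that `image_dataset_from_directory()` assigns integer labels 0, 1, 2, ... to class subdirectories in sorted order; the folder names are hypothetical:

```r
# Hypothetical class subdirectory names, deliberately out of order
subdirs <- c("class_b", "class_c", "class_a")

# keras sorts the subdirectory names and labels them 0, 1, 2, ...
sorted_names <- sort(subdirs)
labels <- setNames(seq_along(sorted_names) - 1L, sorted_names)
labels
#> class_a class_b class_c
#>       0       1       2

# the label(s) to filter out
my_unwanted_labels <- unname(labels[names(labels) %in% "class_c"])
my_unwanted_labels
#> [1] 2
```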
---

Thanks for this! Very helpful. The fixing-up-the-input solution works (you do have a parenthesis missing, but that can be easily fixed).

However, I think the fixing-the-output solution is more desirable. For one, it does not create all these curated symlinks. However, I cannot see if the other arguments of

First, I wanted to note that I get:

But I also would like to pass some arguments to the

How do I do this? Thanks again for this wonderful resource that allows me to use R with keras!
---

To get rid of this warning:

```
In `[.tensorflow.tensor`(images, keep) :
  Incorrect number of dimensions supplied....
```

You can change the call (

The second note is issued when you are subsetting a tensor with another tensor. It's a one-time warning per R session, to help remind you that

I think that all the other arguments should still work. The one thing that might change is the exact output shape of the tfdataset that is returned, and you'd have to adjust the formals of the function passed to `dataset_map()`. When in doubt about what exact signature is needed, and to avoid a guessing game, you can quickly test by passing a function with a `...` signature:

```r
image_dataset_from_directory(<many args>) %>%
  dataset_map(function(...) {
    str(list(...))
    # You can also do "browser-driven development", and write the body of the
    # function with live references to the symbolic "graph-mode" tensors
    # available for interactive, line-by-line testing, by dropping into a
    # browser() context here:
    browser()
    # Just be sure to exit the browser() by "(c)ontinuing" and not by
    # "(q)uitting". If you quit, tensorflow keeps the tracing context open,
    # leaving the session in a broken state that requires an R session
    # restart to fix.
  })
```

Then, when you are done experimenting/writing, you can update the function signature for future readability:

```r
image_dataset_from_directory(...., validation_split = ....) %>%
  dataset_map(function(train, val) {
    names(train) <- names(val) <- c("images", "labels")
    for (nm in c("images", "labels")) {
      train[[nm]] %<>% .[keep, all_dims()]
      val[[nm]] %<>% .[keep, all_dims()]
    }
    tuple(lapply(list(train, val), unname))
  })
```
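As an aside, the `...`-probe pattern is plain R and can be tried without keras: whatever arguments the caller supplies show up in `list(...)`. Below, `fake_dataset_map` is a hypothetical stand-in for `dataset_map()` that invokes the supplied function on an image batch and a label batch:

```r
# Hypothetical stand-in for dataset_map(): calls the supplied function with
# whatever elements the "dataset" yields (here, an image batch and labels).
fake_dataset_map <- function(f) f(array(0, c(2, 28, 28, 3)), c(1L, 0L))

# Probe with a `...` signature to discover how many arguments arrive,
# and what their shapes are:
probed <- fake_dataset_map(function(...) list(...))
length(probed)      # 2: an image batch and a label batch
dim(probed[[1]])    # 2 28 28 3
```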
---

My apologies: I have been trying several things for a while, but I am still confused about this. Let us use the AFHQ dataset (from your co-authored book). I want to focus on only the cats and dogs, to sort of match what you are doing there in Chapter 8, but as a learning experience, I do not want to create a new folder of images as you have done there.

However, I get:

I feel like I am almost there; however, I am still stuck. Thanks again for all your help! And thanks also for the book, and the resource!
---

Here is a working example using the MNIST dataset (most convenient for me right now):

```r
library(purrr)
library(fs)
library(keras)
library(tfdatasets)

class_names <- xfun::n2w(0:9)   # "zero", "one", ..., "nine"
unwanted_class_names <- xfun::n2w(c(6, 9))

class_labels <- seq.int(from = 0, along.with = class_names)
names(class_labels) <- class_names

unwanted_labels <- local({
  class_labels %>% .[names(.) %in% unwanted_class_names]
})

# write MNIST out as jpegs, one subdirectory per class
dir <- tempfile("mnist-")
dir_create(dir, class_names)
mnist <- dataset_mnist()
walk(seq_len(nrow(mnist$train$x)), \(i) {
  img <- mnist$train$x[i, , ] / 255
  lbl <- mnist$train$y[i]
  jpeg::writeJPEG(image = img,
                  target = path(dir, xfun::n2w(lbl), i, ext = "jpeg"))
})

ds <- image_dataset_from_directory(dir, class_names = class_names)
ds <- ds %>%
  dataset_unbatch() %>%
  dataset_filter(\(img, lbl) k_all(lbl != unwanted_labels)) %>%
  dataset_batch(32)

# confirm the unwanted labels aren't there
seen_labels <- ds %>%
  dataset_take(10) %>%
  as_array_iterator() %>%
  reticulate::iterate(\(x) {
    c(images, labels) %<-% x
    unique(labels)
  }) %>%
  unlist() %>% unique() %>% sort()
# 0 1 2 3 4 5 7 8
stopifnot(!unwanted_labels %in% seen_labels)

# Note: in the upcoming keras 3 / keras_core, passing a subset of names
# to `class_names` will work directly:
ds <- image_dataset_from_directory(dir, class_names = class_names[1:3])
```
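For readers without `xfun` installed, the label setup above reduces to plain R: `xfun::n2w()` just spells integers out as words, and since `class_names` is passed explicitly, keras keeps that order when assigning labels. A base-R equivalent:

```r
# Base-R equivalent of the xfun::n2w() label bookkeeping above
class_names <- c("zero", "one", "two", "three", "four",
                 "five", "six", "seven", "eight", "nine")
class_labels <- setNames(seq_along(class_names) - 1L, class_names)

unwanted_class_names <- c("six", "nine")
unwanted_labels <- class_labels[names(class_labels) %in% unwanted_class_names]
unwanted_labels
#>  six nine
#>    6    9
```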
---

Thank you! I'm glad to hear you find it helpful.
---

Thank you! There is a typo there, for anyone looking at this for future reference. It is obvious, but

Btw, after the reduction,
---

Making `length(ds)` work requires asserting the dataset's cardinality, which is lost after `dataset_filter()`:

```r
n_images <- list.files(dir, full.names = TRUE) %>%
  .[!basename(.) %in% unwanted_class_names] %>%
  list.files(pattern = "\\.jpe?g$") %>%
  length()

ds <- image_dataset_from_directory(dir, class_names = class_names)
ds <- ds %>%
  dataset_unbatch() %>%
  dataset_filter(\(img, lbl) k_all(lbl != unwanted_labels)) %>%
  { .$apply(tf$data$experimental$assert_cardinality(n_images)) } %>%
  dataset_batch(32)

length(ds)  # 1505
```
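As a sanity check on the reported length: after `dataset_batch(32)`, `length()` is just the asserted cardinality divided into batches and rounded up. The image count below is hypothetical (MNIST's training set minus two classes is roughly this size), but the arithmetic matches the `1505` seen above:

```r
n_images   <- 48133   # hypothetical count after dropping two classes
batch_size <- 32
n_batches  <- ceiling(n_images / batch_size)
n_batches
#> [1] 1505
```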
---

Odd. I have a problem with the AFHQ dataset (sorry):

I get:

Then,

I don't quite understand what is going wrong here. Thanks!
---

I think that, rather than working around the current TF Dataset cardinality limitations, it's simpler to create temporary links:

```r
library(fs)
library(keras)

image_dataset_from_directory_subset <- function(directory, ..., class_names) {
  directory2 <- dir_create(path_temp(file_temp(), path_file(directory)))
  stopifnot(class_names %in% list.files(directory))
  link_create(path(directory, class_names),   # link target
              path(directory2, class_names))  # link location
  keras::image_dataset_from_directory(directory2, ..., class_names = class_names)
}

ds <- image_dataset_from_directory_subset(dir, class_names = class_names[1:5])
```
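The linking idea itself can be exercised with base R alone, independent of keras and `fs`. The class names below are hypothetical, and `file.symlink()` needs appropriate permissions on Windows:

```r
# Source tree with three class subdirectories
root <- tempfile("src-")
for (cl in c("cats", "dogs", "wild"))
  dir.create(file.path(root, cl), recursive = TRUE)

# Curated view containing links to only the classes we want
keep <- c("cats", "dogs")
curated <- tempfile("curated-")
dir.create(curated)
file.symlink(file.path(root, keep), file.path(curated, keep))

list.files(curated)
#> [1] "cats" "dogs"
```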
---

I am looking into the R package `keras` and the function `image_dataset_from_directory()`. According to the help page:

> Then calling `image_dataset_from_directory(main_directory, labels = 'inferred')` will return a `tf.data.Dataset` that yields batches of images from the subdirectories class_a and class_b, together with labels 0 and 1 (0 corresponding to class_a and 1 corresponding to class_b).

However, I have three folders:

I want to read only two of these classes (and ignore the third). Is there a way to do this using `image_dataset_from_directory()` or some other function?