FELES provides a set of ready-to-use datasets and models to bootstrap the implementation and comparison of FL algorithms.
The datasets and models are taken from well-known sources; most are provided through TensorFlow Datasets.
The available datasets are:
name | task | reference |
---|---|---|
mnist | image classification | MNIST |
fashion_mnist | image classification | Fashion MNIST |
cifar10 | image classification | CIFAR10 |
cifar100 | image classification | CIFAR100 |
imdb_reviews | text classification, sentiment | IMDB Reviews |
boston_housing | regression | Boston Housing |
emnist | image classification | EMNIST |
sentiment140 | text classification, sentiment | Sentiment140 |
shakespeare | text generation (char level) | Shakespeare |
wisdm | activity recognition | WISDM |
oxford_iiit_pet:3.*.* | image segmentation | Oxford Pets |
tff_cifar100 | image classification | TFF_CIFAR100 |
tff_emnist | image classification | TFF_EMNIST |
tff_shakespeare | text generation | TFF_SHAKESPEARE |
- name: mnist
- description: the MNIST dataset of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
- url: http://yann.lecun.com/exdb/mnist/
- source: TensorFlow Datasets
- IID: yes
- task: image classification
- visualization: Know Your Data
- model: neural network from TensorFlow
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 784) 0
_________________________________________________________________
dense (Dense) (None, 128) 100480
_________________________________________________________________
dropout (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
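The parameter counts in the summary above can be checked by hand: a fully connected layer with `n_in` inputs and `n_out` units holds `n_in * n_out` weights plus `n_out` biases, while Flatten and Dropout layers hold none. The sketch below is plain Python and not part of FELES; the helper name `dense_params` is made up for illustration.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Layer widths read from the MNIST summary: 784 -> 128 -> 10.
hidden = dense_params(784, 128)
output = dense_params(128, 10)
total = hidden + output

print(hidden, output, total)  # 100480 1290 101770
```

The same arithmetic applies to the other dense-only models listed on this page.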
- name: fashion_mnist
- description: Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
- url: https://github.com/zalandoresearch/fashion-mnist
- source: TensorFlow Datasets
- IID: yes
- task: image classification
- visualization: Know Your Data
- model: neural network from TensorFlow
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_1 (Flatten) (None, 784) 0
_________________________________________________________________
dense_2 (Dense) (None, 128) 100480
_________________________________________________________________
dense_3 (Dense) (None, 10) 1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
- name: cifar10
- description: the CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images
- url: https://www.cs.toronto.edu/%7Ekriz/cifar.html
- source: TensorFlow Datasets
- IID: yes
- task: image classification
- visualization: Know Your Data
- model: CNN from TensorFlow
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 30, 30, 32) 896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 13, 13, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 4, 4, 64) 36928
_________________________________________________________________
flatten_2 (Flatten) (None, 1024) 0
_________________________________________________________________
dense_4 (Dense) (None, 64) 65600
_________________________________________________________________
dense_5 (Dense) (None, 10) 650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
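The output shapes and parameter counts in the CNN summary above are consistent with 3x3 convolutions without padding and 2x2 max pooling on 32x32 RGB inputs. Those kernel and pooling sizes are an assumption inferred from the numbers, not something the summary states. A minimal plain-Python sketch of the arithmetic, with hypothetical helper names:

```python
def conv2d(size: int, c_in: int, c_out: int, k: int = 3):
    """Return (output size, params) for a k x k convolution without padding."""
    return size - k + 1, k * k * c_in * c_out + c_out


def dense(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


size, channels, total = 32, 3, 0
# (filters, whether a 2x2 max pooling follows) for the three conv layers.
for filters, pool_after in [(32, True), (64, True), (64, False)]:
    size, params = conv2d(size, channels, filters)
    channels = filters
    total += params
    if pool_after:
        size //= 2  # 2x2 pooling halves each spatial dimension (floor)

flat = size * size * channels
total += dense(flat, 64) + dense(64, 10)

print(flat, total)  # 1024 122570
```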
- name: cifar100
- description: the CIFAR-100 dataset consists of 50,000 32x32 color training images and 10,000 test images, labeled over 100 fine-grained classes that are grouped into 20 coarse-grained classes.
- url: https://www.cs.toronto.edu/%7Ekriz/cifar.html
- source: TensorFlow Datasets
- IID: yes
- task: image classification
- model: neural network from TensorFlow
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_6 (Conv2D) (None, 30, 30, 32) 896
max_pooling2d_4 (MaxPooling (None, 15, 15, 32) 0
2D)
conv2d_7 (Conv2D) (None, 13, 13, 64) 18496
max_pooling2d_5 (MaxPooling (None, 6, 6, 64) 0
2D)
conv2d_8 (Conv2D) (None, 4, 4, 64) 36928
flatten_5 (Flatten) (None, 1024) 0
dense_20 (Dense) (None, 64) 65600
dense_21 (Dense) (None, 100) 6500
=================================================================
Total params: 128,420
Trainable params: 128,420
Non-trainable params: 0
_________________________________________________________________
- name: imdb_reviews
- description: Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
- url: http://ai.stanford.edu/%7Eamaas/data/sentiment/
- source: TensorFlow Datasets
- IID: yes
- task: text classification, sentiment
- model: neural network from Builtin
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_6 (Dense) (None, 50) 500050
_________________________________________________________________
dropout_1 (Dropout) (None, 50) 0
_________________________________________________________________
dense_7 (Dense) (None, 50) 2550
_________________________________________________________________
dropout_2 (Dropout) (None, 50) 0
_________________________________________________________________
dense_8 (Dense) (None, 50) 2550
_________________________________________________________________
dense_9 (Dense) (None, 1) 51
=================================================================
Total params: 505,201
Trainable params: 505,201
Non-trainable params: 0
_________________________________________________________________
- name: boston_housing
- description: this dataset is taken from the StatLib library which is maintained at Carnegie Mellon University. Samples contain 13 attributes of houses at different locations around the Boston suburbs in the late 1970s. Targets are the median values of the houses at a location (in k$).
- url: http://lib.stat.cmu.edu/datasets/boston
- source: TensorFlow Datasets
- IID: yes
- task: regression
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_15 (Dense) (None, 64) 896
dense_16 (Dense) (None, 64) 4160
dense_17 (Dense) (None, 1) 65
=================================================================
Total params: 5,121
Trainable params: 5,121
Non-trainable params: 0
_________________________________________________________________
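The first layer's parameter count also reveals the input width the model expects. Solving `n_in * 64 + 64 = 896` gives 13, which agrees with the 13 attributes mentioned in the description. The helper below is a hypothetical illustration in plain Python, not a FELES function.

```python
def input_width(params: int, units: int) -> int:
    """Invert params = n_in * units + units for a dense layer."""
    n_in, remainder = divmod(params - units, units)
    if remainder:
        raise ValueError("parameter count does not match a dense layer")
    return n_in


print(input_width(896, 64))     # 13
print(input_width(500050, 50))  # 10000
```

The second call applies the same inversion to the first layer of the imdb_reviews model, which suggests a 10,000-dimensional input there.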
- name: emnist
- description: the EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 and converted to a 28x28 pixel image format and dataset structure that directly matches the MNIST dataset.
- url: https://www.nist.gov/itl/products-and-services/emnist-dataset
- source: TensorFlow Datasets
- IID: yes
- task: image classification
- model: neural network from TensorFlow
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_3 (Flatten) (None, 784) 0
dense_6 (Dense) (None, 128) 100480
dropout_3 (Dropout) (None, 128) 0
dense_7 (Dense) (None, 62) 7998
=================================================================
Total params: 108,478
Trainable params: 108,478
Non-trainable params: 0
_________________________________________________________________
- name: sentiment140
- description: Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter. The data is a CSV with emoticons removed. The data file format has 6 fields:
  - 0: the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
  - 1: the id of the tweet (2087)
  - 2: the date of the tweet (Sat May 16 23:58:44 UTC 2009)
  - 3: the query (lyx). If there is no query, then this value is NO_QUERY.
  - 4: the user that tweeted (robotickilldozr)
  - 5: the text of the tweet (Lyx is cool)
- url: http://help.sentiment140.com/home
- source: Stanford Datasets
- IID: yes
- task: text classification, sentiment
- model: neural network from Builtin
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_24 (Dense) (None, 50) 500050
dropout_4 (Dropout) (None, 50) 0
dense_25 (Dense) (None, 50) 2550
dropout_5 (Dropout) (None, 50) 0
dense_26 (Dense) (None, 50) 2550
dense_27 (Dense) (None, 1) 51
=================================================================
Total params: 505,201
Trainable params: 505,201
Non-trainable params: 0
_________________________________________________________________
- name: shakespeare
- description: 40,000 lines of Shakespeare from a variety of Shakespeare's plays. Featured in Andrej Karpathy's blog post 'The Unreasonable Effectiveness of Recurrent Neural Networks': http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- url: https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt
- source: TensorFlow Datasets
- IID: yes
- task: text generation (char level)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, None, 65)] 0
lstm (LSTM) [(None, None, 128), 99328
(None, 128),
(None, 128)]
lstm_1 (LSTM) [(None, None, 128), 131584
(None, 128),
(None, 128)]
dense (Dense) (None, None, 65) 8385
=================================================================
Total params: 239,297
Trainable params: 239,297
Non-trainable params: 0
_________________________________________________________________
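The LSTM counts above follow from the standard formula: each of the four gates has input weights, recurrent weights, and a bias vector. The input width of 65 is read from the `InputLayer` shape and presumably corresponds to the character vocabulary. A minimal plain-Python check, with hypothetical helper names:

```python
def lstm_params(n_in: int, units: int) -> int:
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * ((n_in + units) * units + units)


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


first = lstm_params(65, 128)    # 65 input features -> 128 units
second = lstm_params(128, 128)  # stacked on the first LSTM
head = dense_params(128, 65)    # projection back to 65 outputs

print(first, second, head, first + second + head)  # 99328 131584 8385 239297
```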
- name: wisdm
- description: the WISDM dataset contains accelerometer and gyroscope time-series sensor data collected from a smartphone and smartwatch as 51 test subjects perform 18 activities for 3 minutes each.
- url: https://www.cis.fordham.edu/wisdm/includes/datasets/
- source: Fordham University Dataset
- IID: yes
- task: activity recognition
- model: neural network from Github Repository
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_17 (Conv2D) (None, 79, 2, 16) 80
dropout_6 (Dropout) (None, 79, 2, 16) 0
conv2d_18 (Conv2D) (None, 78, 1, 32) 2080
dropout_7 (Dropout) (None, 78, 1, 32) 0
flatten_7 (Flatten) (None, 2496) 0
dense_28 (Dense) (None, 64) 159808
dropout_8 (Dropout) (None, 64) 0
dense_29 (Dense) (None, 6) 390
=================================================================
Total params: 162,358
Trainable params: 162,358
Non-trainable params: 0
_________________________________________________________________
- name: oxford_iiit_pet:3.*.*
- description: The Oxford-IIIT pet dataset is a 37 category pet image dataset with roughly 200 images for each class. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of breed.
- url: http://www.robots.ox.ac.uk/~vgg/data/pets/
- source: TensorFlow Datasets
- IID: yes
- task: image segmentation
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Functional) (None, 4, 4, 512) 14714688
up_sampling2d (UpSampling2D (None, 8, 8, 512) 0
)
conv2d_11 (Conv2D) (None, 8, 8, 256) 1179904
re_lu (ReLU) (None, 8, 8, 256) 0
up_sampling2d_1 (UpSampling (None, 16, 16, 256) 0
2D)
conv2d_12 (Conv2D) (None, 16, 16, 128) 295040
re_lu_1 (ReLU) (None, 16, 16, 128) 0
up_sampling2d_2 (UpSampling (None, 32, 32, 128) 0
2D)
conv2d_13 (Conv2D) (None, 32, 32, 64) 73792
re_lu_2 (ReLU) (None, 32, 32, 64) 0
up_sampling2d_3 (UpSampling (None, 64, 64, 64) 0
2D)
conv2d_14 (Conv2D) (None, 64, 64, 32) 18464
re_lu_3 (ReLU) (None, 64, 64, 32) 0
up_sampling2d_4 (UpSampling (None, 128, 128, 32) 0
2D)
conv2d_15 (Conv2D) (None, 128, 128, 16) 4624
re_lu_4 (ReLU) (None, 128, 128, 16) 0
conv2d_16 (Conv2D) (None, 128, 128, 21) 357
=================================================================
Total params: 16,286,869
Trainable params: 1,572,181
Non-trainable params: 14,714,688
_________________________________________________________________
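In the summary above, the non-trainable count equals the parameter count of the `vgg16` block, which suggests the backbone is frozen and only the decoder is trained. The decoder counts are consistent with 3x3 convolutions followed by a final 1x1 convolution; those kernel sizes are an assumption inferred from the numbers. A plain-Python sketch of the arithmetic:

```python
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


VGG16_PARAMS = 14_714_688  # non-trainable count taken from the summary

# (input channels, output channels, assumed kernel size) per decoder conv.
decoder = [
    (512, 256, 3),
    (256, 128, 3),
    (128, 64, 3),
    (64, 32, 3),
    (32, 16, 3),
    (16, 21, 1),
]
trainable = sum(conv2d_params(*layer) for layer in decoder)

print(trainable, trainable + VGG16_PARAMS)  # 1572181 16286869
```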
- name: tff_cifar100
- description: a federated version of the CIFAR-100 dataset. The training and testing examples are partitioned across 500 and 100 clients (respectively).
- url: https://www.cs.toronto.edu/%7Ekriz/cifar.html
- source: TensorFlow Dataset
- IID: no
- task: image classification
- model: neural network from TensorFlow
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_6 (Conv2D) (None, 30, 30, 32) 896
max_pooling2d_4 (MaxPooling (None, 15, 15, 32) 0
2D)
conv2d_7 (Conv2D) (None, 13, 13, 64) 18496
max_pooling2d_5 (MaxPooling (None, 6, 6, 64) 0
2D)
conv2d_8 (Conv2D) (None, 4, 4, 64) 36928
flatten_5 (Flatten) (None, 1024) 0
dense_20 (Dense) (None, 64) 65600
dense_21 (Dense) (None, 100) 6500
=================================================================
Total params: 128,420
Trainable params: 128,420
Non-trainable params: 0
_________________________________________________________________
- name: tff_emnist
- description: a federated version of the EMNIST dataset. The dataset contains 671,585 train examples and 77,483 test examples.
- url: https://github.com/TalwalkarLab/leaf
- source: TensorFlow Dataset
- IID: no
- task: image classification
- model: neural network from TensorFlow
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_3 (Flatten) (None, 784) 0
dense_6 (Dense) (None, 128) 100480
dropout_3 (Dropout) (None, 128) 0
dense_7 (Dense) (None, 62) 7998
=================================================================
Total params: 108,478
Trainable params: 108,478
Non-trainable params: 0
_________________________________________________________________
- name: tff_shakespeare
- description: a federated version of the Shakespeare dataset. The dataset consists of 715 users (characters of Shakespeare plays), where each example corresponds to a contiguous set of lines spoken by the character in a given play. The dataset is composed of 16,068 train examples and 2,356 test examples.
- url: https://github.com/TalwalkarLab/leaf
- source: TensorFlow Dataset
- IID: no
- task: text generation
- model: neural network from TensorFlow
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) multiple 22016
gru (GRU) multiple 394752
dense (Dense) multiple 22102
=================================================================
Total params: 438,870
Trainable params: 438,870
Non-trainable params: 0
_________________________________________________________________
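The summary above lists output shapes only as `multiple`, but the parameter counts are consistent with a vocabulary of 86 tokens, an embedding width of 256, and 256 GRU units. These three values are inferred from the counts and are assumptions, not documented settings. The GRU formula below assumes the Keras default `reset_after=True`, which keeps two bias vectors per gate.

```python
def embedding_params(vocab: int, dim: int) -> int:
    """One dim-sized vector per vocabulary entry."""
    return vocab * dim


def gru_params(n_in: int, units: int) -> int:
    """Three gates, each with input weights, recurrent weights, two biases."""
    return 3 * (n_in * units + units * units + 2 * units)


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


VOCAB, EMBED_DIM, UNITS = 86, 256, 256  # inferred, see the note above

counts = (
    embedding_params(VOCAB, EMBED_DIM),
    gru_params(EMBED_DIM, UNITS),
    dense_params(UNITS, VOCAB),
)
print(counts, sum(counts))  # (22016, 394752, 22102) 438870
```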