Additional languages
The Rust Iterator API is one of the first things a Rust novice should learn after familiarizing themselves with the basics of the language. However, this API and its documentation can be daunting for a beginner.
For example, have you ever looked at the map method documentation and wondered why it's so complicated?
fn map<B, F>(self, f: F) -> Map<Self, F>
where
Self: Sized,
F: FnMut(Self::Item) -> B,
What's B and F? Why does this method return a Map
type?
Another example is the collect
method that somehow allows us to convert an iterator into various types of collections:
let v: Vec<_> = (0..10).collect();
let s: HashSet<_> = (0..10).collect();
How can this be possible? Rust is a statically typed language, right? How can the same method return different types?
Lets learn how the Iterator API type system works by dumbing it down to a bare minimum implementation.
The Iterator trait is the first building block of the iterator API.
Let's define a simple MyIterator trait to demonstrate its structure:
pub trait MyIterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
}
This trait defines the basic mechanism for iteration. It has a single method next
which returns the next item in the iteration, or None
if the iteration is over. The Item
associated type defines the type of the items in the iteration.
Let's implement a simple iterator for a simple slice - [T]
. This is a
dumbing down of std::slice::Iter
The SliceIterator
struct holds a reference to the vector and the current position in the iteration.
/// The lifetime parameter `'a` is used to ensure that the iterator does not outlive the data it iterates over.
pub struct SliceIterator<'a, T> {
data: &'a [T],
pos: usize,
}
impl<'a, T> SliceIterator<'a, T> {
// pub(crate) is used to make this constructor visible only to this crate
pub(crate) fn new(data: &'a [T]) -> Self {
SliceIterator { data, pos: 0 }
}
}
impl<'a, T> MyIterator for SliceIterator<'a, T> {
/// The Item type of SliceIterator is a reference to the type of the slice.
/// The lifetime parameter `'a` ensures that the returned reference does not outlive the
/// data we are iterating over.
type Item = &'a T;
fn next(&mut self) -> Option<Self::Item>
{
if self.pos >= self.data.len() {
None
} else {
let result = Some(&self.data[self.pos]);
self.pos += 1;
result
}
}
}
Let's test our iterator:
#[test]
fn slice_iterator_next_returns_next_item() {
let mut iter = SliceIterator::new(&[1, 2, 3]);
assert_eq!(iter.next(), Some(&1));
assert_eq!(iter.next(), Some(&2));
assert_eq!(iter.next(), Some(&3));
assert_eq!(iter.next(), None);
}
Note the lifetime parameter 'a
which is used to ensure that the iterator does not outlive the slice it iterates over. This is important because the iterator holds a reference to the slice, and if the slice is dropped before the iterator, the iterator will hold a dangling reference. We don't want that.
Thanks to this lifetime parameter, the following code will not compile:
let data = vec![1, 2, 3, 4, 5]; // ──| data variable lifetime 'a starts
let mut iter = SliceIterator::new(&data); // | SliceIterator::new is bound to 'a lifetime
drop(data); // __| data is dropped - lifetime ends
iter.next(); // here we violate the 'a constraint defined in SliceIterator
The iterator trait has a single required method next
, but it also has many provided methods. For example, the previously mentioned map
method.
fn map<B, F>(self, f: F) -> Map<Self, F>
where
Self: Sized,
F: FnMut(Self::Item) -> B,
The map
method takes a closure and returns a Map
struct. The Map
struct is an intermediate utility struct which implements the Iterator
trait and holds any needed state and logic for this particular iterator method. This concept is often called an Adapter
Let's expand the MyIterator
trait to include map
and filter
methods:
pub trait MyIterator {
type Item;
// ...
fn map<B, F>(self, map_fn: F) -> MyMap<Self, F>
where
Self: Sized,
F: FnMut(Self::Item) -> B,
{
MyMap::new(self, map_fn)
}
fn filter<P>(self, filter_fn: P) -> MyFilter<Self, P>
where
Self: Sized,
P: FnMut(&Self::Item) -> bool,
{
MyFilter::new(self, filter_fn)
}
}
This is a simplification of std::iter::Map
/// I is the iterator type the map method is applied on.
/// F is the type of the closure that is used to map the elements. It is constrained to be of type
/// `FnMut(Self::Item) -> B` in the MyIterator implementation
pub struct MyMap<I, F>
where
I: MyIterator,
{
iter: I,
map_fn: F,
}
impl<I, F> MyMap<I, F>
where
I: MyIterator,
{
pub(crate) fn new(iter: I, map_fn: F) -> Self {
MyMap { iter, map_fn }
}
}
impl<B, I, F> MyIterator for MyMap<I, F>
where
I: MyIterator,
F: FnMut(I::Item) -> B,
{
/// The Item type of MyMap is the type returned by the map closure
type Item = B;
fn next(&mut self) -> Option<B> {
if let Some(x) = self.iter.next() {
Some((self.map_fn)(x))
} else {
None
}
}
}
Lets examine the various generic variables we used here:
I
- The MyMap struct owns the instance of the iterator we're mapping over. I
is the type of this iterator. For example, if we are mapping over a SliceIterator
, I
will be SliceIterator
.
F
- The MyMap struct also owns the closure we use to map the items. F
is the type of this closure. For example, if the closure is |x| x * 2
, F
will be FnMut(&i32) -> i32
.
B
- The map function return type needs to be defined as well. The return value of the map function is the item type of the MyMap iterator. For example, if the closure is |x| x * 2
, B
will be i32
, and the MyMap iterator next
method have Option<i32>
as its return value.
Dumbing down of std::iter::Filter
/// I is the iterator type the filter method is applied on.
/// P is the type of the predicate function we invoke on each item
pub struct MyFilter<I, P>
where
I: MyIterator,
{
iter: I,
filter_fn: P,
}
impl<I, P> MyFilter<I, P>
where
I: MyIterator,
{
pub(crate) fn new(iter: I, filter_fn: P) -> Self {
MyFilter { iter, filter_fn }
}
}
impl<I, P> MyIterator for MyFilter<I, P>
where
I: MyIterator,
P: FnMut(&I::Item) -> bool,
{
/// The Item type of MyFilter is the same as the Item type of the inner iterator
type Item = I::Item;
fn next(&mut self) -> Option<Self::Item> {
// we iterate over the inner iterator until we find an item that passes the filter
while let Some(x) = self.iter.next() {
if (self.filter_fn)(&x) {
return Some(x);
}
}
None
}
}
Adapters like Map
and Filter
are internal to the iterator implementation, so you will rarely use them directly, nonetheless, let's see a usage example:
#[test]
fn my_map_filter_next_returns_next_item() {
let iter = SliceIterator::new(&[1, 2, 3]);
// note that the predicate closure is a reference to a reference. Check out the code to figure out why!
let filter = MyFilter::new(iter, |x: &&i32| **x % 2 == 0);
let mut map = MyMap::new(filter, |x| x * 2);
assert_eq!(map.next(), Some(4));
assert_eq!(map.next(), None);
}
Most iterator usages will end with a call to the collect method in order to collect the iterated items into a collection of a concrete type. Let's get under the hood of this one as well.
First, we expand the MyIterator
trait to include the collect
method:
pub trait MyIterator {
// ...
fn collect<B>(self) -> B
where
B: MyFromIterator<Self::Item>,
Self: Sized,
{
B::my_from_iter(self)
}
}
The collect
method defines a return position generic type B
which must implement the MyFromIterator
trait.
std::iter::FromIterator is a trait that defines how to create a collection from an iterator. Previously, we defined the SliceIterator
struct which converts a vector into an iterator. Now we need to define the opposite - a way to convert an iterator into a collection.
Let's define the MyFromIterator
trait:
pub trait MyFromIterator<T> {
fn my_from_iter<I>(iter: I) -> Self
where
I: MyIterator<Item = T>;
}
my_from_iter
allows us to convert anything that implements MyIterator
into the trait implementing collection. We could use this to convert SliceIterator
, MyMap
, MyFilter
, and any other MyIterator
into a collection.
Note that for simplicity, we are not using the IntoIterator
trait here, as used in the actual std::iter::FromIterator
trait.
For example, let's implement MyFromIterator
for Vec<T>
and HashSet<T>
. Both examples iterate over the target iterator and push each item into the constructed collection.
impl<T> MyFromIterator<T> for Vec<T> {
fn my_from_iter<I>(mut iter: I) -> Self
where
I: MyIterator<Item = T>
{
let mut vec = Vec::new();
while let Some(x) = iter.next() {
vec.push(x);
}
vec
}
}
impl<T> MyFromIterator<T> for HashSet<T>
where
// we need to add these constraints because HashSet::new() requires them
T: Eq + Hash,
{
fn my_from_iter<I>(mut iter: I) -> Self
where
I: MyIterator<Item = T>,
{
let mut set = HashSet::new();
while let Some(x) = iter.next() {
set.insert(x);
}
set
}
}
Let's look at the collect
usage example again:
let iter = SliceIterator::new(&[1, 2, 3]);
let v: Vec<_> = iter.collect();
With our simplified implementation, it actually expands to
fn collect(self) -> Vec<u64>
{
Vec::<u64>::my_from_iter(self)
}
and more specifically
Vec::<u64>::my_from_iter(iter);
Any other collection type that implements MyFromIterator
can be used in the same way:
let s: HashSet<_> = iter.collect();
let b: BTreeSet<_> = iter.collect();
let l: LinkedList<_> = iter.collect();
Each one of these collect
calls actually expands to a different my_from_iter
implementation.
let s = HashSet::<_>::my_from_iter(iter);
let b = BTreeSet::<_>::my_from_iter(iter);
let l = LinkedList::<_>::my_from_iter(iter);
Now we know that collect
is a thin sugar coating over FromIterator
and from_iter
, and in fact, does not violate Rust's strict type system.
Be sure to go over the std::iter documentation to see all the other iterator methods and adapters.
All the source code for this article can be found in the src directory.