Update gathering-names.md
update text
aiwang24 authored May 27, 2024
1 parent 5932228 commit 499c58b
Showing 1 changed file with 9 additions and 8 deletions.
17 changes: 9 additions & 8 deletions src/gathering-names.md
@@ -20,16 +20,17 @@ import { return_resized_img } from "./components/img_utils.js";

# Gathering the Names

-Over 300,000 names are etched in the wafers representing students, faculty, staff, and alumni from 1861 up to the fall of 2023.
-The Alumni Association and Institutional Research keep the lists of students and alumni, as well as lists of faculty and staff since 1991 onwards.
-The records of faculty and staff before 1991 existed only in paper form.
+Nearly 340,000 names are etched in the most recent wafer design, representing students, faculty, staff, and alumni from 1861 up to the fall of 2023.
+The Alumni Association and Institutional Research maintain electronic databases of current students and alumni, as well as lists of faculty and staff since 1991 onwards.
+The records of faculty and staff before 1991 existed only in paper form.

-With the help of MIT Libraries and Institute Archives, over 6,000 pages of old paper directories were scanned and examined with optical character recognition (OCR).
-Then, a combination of custom-built programs extracted the text from the digitized data, reformatted the names, and removed duplicates.
-This was followed by hours of manual editing to catch OCR problems and other errors.
-Even so, as our historians and archivists remind us, no record is perfect!
+With the help of historian Nora Murphy at Institute Archives and Jenn Morris at the MIT Libraries, over 6,000 pages of old paper directories were located, scanned, and converted into electronic text with optical character recognition (OCR). This process alone took more than half a year. Then with a combination of custom-built programs, we extracted the text from the digitized data and spent hours manually editing to catch OCR problems and other errors.

-(Image: Example paper record used in optical character recognition.)
+Collating the names -- to construct a dataset spanning more than 160 years -- presented several challenges, given the variety of input sources and the need to format names in a consistent manner. Many names were listed in multiple places, sometimes spelled or formatted differently across sources. And some groups are simply hard to find in the records.
+
+Over the course of the project, we grew to appreciate Murphy's perspective, shared early on: that even with seemingly endless data nowadays, "there are no *perfect* records..." With every iteration, community input aids our endeavor to collect as extensive a collection of names as we can to form the foundation of One.MIT.
+
+*(Image: Example paper record used in optical character recognition.)*

:::
