In the summer of 2010, the Smithsonian Institution Libraries, with a grant from the Atherton Seidell Endowment Fund, developed a process to scan folio volumes, large fold-outs, and other materials not suited to our existing digitization workflow. As part of this process, the Macaw tool was developed to collect page-level metadata and manage the scanned pages. The result is a complete digital version of the item, ready to be shared with external systems such as the Biodiversity Heritage Library and the Internet Archive.
Macaw performs three major tasks in the scanning process:
- Import and management of the images from the scanner or camera.
- Collection of the page-level metadata that describes the physical aspects of the page.
- Post-processing and export or upload of the digital book to other systems.
Because Macaw is only one step in a larger process, it is built in a modular manner so that it can be customized to a system’s unique needs through two sets of required PHP objects. The first ingests metadata about new items from an external system. The second consists of one or more export modules that share the completed item with other systems. Macaw also has a number of configuration settings for integrating it into an existing server setup.
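The modular design described above can be illustrated with a small sketch. Macaw's actual modules are PHP objects, and their real interfaces are not described here; the Python below is only a conceptual illustration, and every name in it (`MetadataImporter`, `Exporter`, `DictImporter`, `ListingExporter`, `process`) is hypothetical. It assumes one import module that supplies metadata for a new item and one or more export modules that each receive the completed item.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Item:
    """A scanned item: identifier, item-level metadata, and page image names."""
    identifier: str
    metadata: dict = field(default_factory=dict)
    pages: list = field(default_factory=list)


class MetadataImporter(ABC):
    """Hypothetical import module: ingests metadata from an external system."""

    @abstractmethod
    def fetch(self, identifier: str) -> dict:
        ...


class Exporter(ABC):
    """Hypothetical export module: shares a completed item with another system."""

    @abstractmethod
    def export(self, item: Item) -> str:
        ...


class DictImporter(MetadataImporter):
    """Stand-in for a catalog lookup, backed by an in-memory dict."""

    def __init__(self, catalog: dict):
        self.catalog = catalog

    def fetch(self, identifier: str) -> dict:
        return dict(self.catalog[identifier])


class ListingExporter(Exporter):
    """Stand-in for an upload: returns a one-line summary instead of sending data."""

    def __init__(self, name: str):
        self.name = name

    def export(self, item: Item) -> str:
        return f"{self.name}: {item.identifier} ({len(item.pages)} pages)"


def process(identifier, pages, importer, exporters):
    """Build an item from imported metadata, then hand it to every exporter."""
    item = Item(identifier, importer.fetch(identifier), list(pages))
    return [exporter.export(item) for exporter in exporters]
```

In this sketch, supporting a new destination means writing one more `Exporter` subclass and adding it to the list passed to `process`; the import step and the page handling stay unchanged.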
Macaw provides separate user accounts, a few administrative tools to assist in management, extensive logging for both analysis and forensics, and support for a “quality assurance” user who reviews the work of others before an item is approved for sharing with other systems.
If you are interested in running a local instance of Macaw, or would like to contribute to its development, the code is available on GitHub.