Digital Library FAQ
You can search the Digital Library using the general website search box in the upper right-hand corner of the page. Search results include both Digital Library content and all other website content. You can use the facets (checkboxes) on the left side of the search results page to refine your results, limiting them to specific content types (such as book or image) as well as to specific subjects, dates, or authors. If you're familiar with advanced search techniques in Google, those tricks will work on our site as well.
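For example, two Google operators you may already use are wrapping an exact phrase in quotation marks and putting a minus sign before a word to exclude it. The small Python helper below only illustrates how such a query string is put together. The function name is hypothetical, and it assumes, rather than confirms, that our search box handles both operators the way Google does.

```python
def build_query(terms=(), phrase=None, exclude=()):
    """Compose a Google-style query: plain terms, an optional exact phrase
    in quotes, and excluded words prefixed with a minus sign."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')  # exact-phrase match
    parts.extend(f"-{word}" for word in exclude)  # drop results containing this word
    return " ".join(parts)


print(build_query(["locomotive"], phrase="trade catalog", exclude=["model"]))
# locomotive "trade catalog" -model
```

The printed string is exactly what you would type into the search box by hand. The facets on the results page can then narrow it further.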
The ability to refine the search results using those checkboxes (facets) is new. We'd love to hear any and all opinions on how it's working for you - just Contact Us.
Please see our Copyright and Permissions page for more information about copyright terms used on the site, take-down requests, and obtaining very high resolution images for reuse.
The Smithsonian Libraries has been scanning books and making them available online since 1997, but it wasn't until 2007, when we began digitizing for the Biodiversity Heritage Library (BHL), that we transitioned to "mass" scanning. Since then, we have digitized over 25,000 items containing nearly 10 million pages for the BHL from our collections in zoology, botany, agriculture, paleozoology, and paleobotany, along with some history, anthropology, horticulture, and geology titles.
In 2010 we began to systematically digitize titles in our Art, History, and Technology collections. These titles can be found in the Digital Library Books Online section of this website. As of 2017, more than 8,000 items with over 2 million pages are available on our website, including rare titles in the history of science, African exploration, automotive and railway history, and art and design. Books can be browsed by subject and author, and have also been grouped into topic collections.
All of the books in the Digital Library are available for download as PDFs and EPUBs, which can be read on many dedicated ebook readers. You can also download individual page images and plain text (uncorrected OCR). DAISY files are available from the Internet Archive, but because they are generated automatically, they may not always be the best option for blind readers.
Follow the links on each book's page (under the book viewer) to download the book as a PDF, an EPUB, or a zipped file containing all the page images in JPEG2000 format.
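Because the files behind these links are stored on the Internet Archive, they can also be located programmatically. The Python sketch below is an illustration under assumptions, not a description of how our site builds its links. It uses the Internet Archive's public metadata endpoint (https://archive.org/metadata/ followed by an item identifier). It assumes the identifier is the one in the book's "Find in: Internet Archive" link, and that the format labels in WANTED_FORMATS match those on the item, which you should check. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Internet Archive's public metadata and download endpoints.
METADATA_URL = "https://archive.org/metadata/{identifier}"
DOWNLOAD_URL = "https://archive.org/download/{identifier}/{name}"

# Assumed IA "format" labels for the PDF, the EPUB, and the page-image zip.
WANTED_FORMATS = {"Text PDF", "EPUB", "Single Page Processed JP2 ZIP"}


def download_links(identifier, metadata):
    """Map each wanted format to a download URL, using the item's
    parsed metadata JSON (a dict with a "files" list)."""
    links = {}
    for entry in metadata.get("files", []):
        fmt = entry.get("format")
        if fmt in WANTED_FORMATS and fmt not in links:
            name = urllib.parse.quote(entry["name"])  # escape spaces etc.
            links[fmt] = DOWNLOAD_URL.format(identifier=identifier, name=name)
    return links


def fetch_metadata(identifier):
    """Download and parse the item's metadata JSON (needs network access)."""
    url = METADATA_URL.format(identifier=urllib.parse.quote(identifier))
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

For an item with the hypothetical identifier exampleitem00smit, download_links("exampleitem00smit", fetch_metadata("exampleitem00smit")) would return the matching download URLs keyed by format label. Keeping the parsing separate from the network call makes it easy to check against a saved copy of the metadata.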
Downloading One Page from a Book
If you only need a small image from a book suitable for use online, right-click (Control + click on a Mac) on the page you want and choose "Save As" or "Open image in new window." Either option gives you a low-resolution JPG of the page.
If you need a higher resolution version of the page image, you can download a full resolution JPEG2000 (JP2) image by following these steps:
1) Click on the link for "Find in: Internet Archive" located below the book.
2) Determine the filename for the page you want by saving the JPG image (see above) and noting the file name.
3) In the Item Details view (not the book viewer), find the box labeled "Download Options" on the lower right part of the page, and click the "Show All" link.
4) One link, for the jp2.zip file, has the text "(View Contents)" next to it. Click that "View Contents" link.
5) You will now see a list of all the page images, in both JP2 (large) and JPG (small) sizes. Click the JPG first to make sure it's the image you want, then right-click and save the JP2 version to get the high-resolution file.
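If you need pages from the same book repeatedly, the link you end up with after the View Contents step can also be built directly. This Python sketch relies on assumptions we have not verified for every item. It assumes the zip is named <identifier>_jp2.zip, that it contains a folder <identifier>_jp2/, and that each JP2 shares its base name with the JPG whose file name you noted. The function name and the example identifier are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def jp2_page_url(identifier, jpg_filename):
    """Build a direct URL to one page's JP2 inside the item's jp2.zip.

    Assumes the zip is <identifier>_jp2.zip, that it holds a folder
    <identifier>_jp2/, and that the JP2 has the same base name as the
    JPG you saved (e.g. "<identifier>_0005.jpg").
    """
    stem = PurePosixPath(jpg_filename).stem  # drop the ".jpg" extension
    inner_path = f"{identifier}_jp2/{stem}.jp2"
    # Assumed URL form: the path inside the zip is one segment, so its "/" is encoded.
    return (f"https://archive.org/download/{identifier}/"
            f"{identifier}_jp2.zip/{quote(inner_path, safe='')}")


print(jp2_page_url("exampleitem00smit", "exampleitem00smit_0005.jpg"))
# https://archive.org/download/exampleitem00smit/exampleitem00smit_jp2.zip/exampleitem00smit_jp2%2Fexampleitem00smit_0005.jp2
```

If a generated link returns an error, fall back to the View Contents list, which always shows the actual file names.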
Note: not all image viewing or editing programs support JPEG2000 images. This matrix on Wikipedia details which applications can work with JP2s.
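Since support varies, it can help to confirm that a downloaded file really is JPEG2000 before troubleshooting a viewer. The Python sketch below only compares a file's first bytes against the signatures defined in the JPEG 2000 standard (ISO/IEC 15444-1). It does not decode the image, and the function names are our own.

```python
# JP2 signature box (ISO/IEC 15444-1): box length 12, box type "jP  ",
# then the fixed content 0x0D 0x0A 0x87 0x0A.
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"
# A bare JPEG 2000 codestream starts with the SOC (FF4F) and SIZ (FF51) markers.
J2K_CODESTREAM_START = b"\xff\x4f\xff\x51"


def jpeg2000_kind(head):
    """Classify the first bytes of a file: "jp2", "j2k", or None."""
    if head.startswith(JP2_SIGNATURE):
        return "jp2"
    if head.startswith(J2K_CODESTREAM_START):
        return "j2k"
    return None


def jpeg2000_kind_of_file(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return jpeg2000_kind(f.read(len(JP2_SIGNATURE)))
```

A result of None means the file is not JPEG2000 at all (for example, the low-resolution JPG), so a different viewer will not help.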
Technical Specs, a.k.a. FAQ for Library Colleagues
If your questions are not answered here, please send us an email through our Contact Us form.
What platform is the Digital Library running on?
What metadata standards do you use?
What are your image digitization standards?
What type of scanners do you use?
Do you digitize for preservation, and if so, what standards and processes are you following?
The books portion of the Digital Library is built in Drupal, as is all website content created after 2012. Digital Library book image files are stored on and served from the Internet Archive. Book metadata is harvested and stored in Drupal, and the page images are served up via a custom Drupal module using the Internet Archive page turner. Other parts of the Digital Library, particularly older virtual exhibitions, bibliographies, and special collections inventories, were all developed using ColdFusion over MSSQL databases.
Digitized book metadata follows the model set up by the Internet Archive. It includes descriptive records in MARCXML, with item- and page-level data in separate XML files. The MARCXML is generated from our catalog MARC records at the time of scanning. In the Digital Library, we make the item-level data available as RIS (for citations) and as Linked Open Data in RDFa. The page-level data (enumeration, page type, etc.) is created either by the Internet Archive using their Biblio software, or by us at the time of scanning using MACAW (Metadata Collection And Workflow), a tool developed in-house.
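RIS is a plain-text citation format. Each line is a two-letter tag, two spaces, a hyphen, a space, and a value, and each record runs from a TY line to an ER line. The Python sketch below shows that shape for a book record. It is an illustration, not the code our site uses, and the input field names and sample values are hypothetical.

```python
# RIS tags for single-valued fields, mapped to hypothetical input keys.
SIMPLE_TAGS = (
    ("TI", "title"),
    ("PY", "year"),
    ("CY", "place"),
    ("PB", "publisher"),
    ("UR", "url"),
)


def to_ris(item):
    """Serialize a dict of item-level fields as one RIS record."""
    lines = ["TY  - BOOK"]  # reference type must come first
    for author in item.get("authors", []):
        lines.append(f"AU  - {author}")  # one AU line per author
    for tag, key in SIMPLE_TAGS:
        if item.get(key):  # skip fields that are missing or empty
            lines.append(f"{tag}  - {item[key]}")
    lines.append("ER  - ")  # end of record
    return "\n".join(lines) + "\n"


print(to_ris({"authors": ["Doe, Jane"], "title": "Example title", "year": "1890"}))
```

Citation managers such as Zotero or EndNote can import records in this shape, so a downloaded RIS file can go straight into a reference library.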
Many (but not all) images in the Image Gallery have embedded metadata following the Smithsonian standard for embedded metadata in images. Vocabularies used in descriptions include Dublin Core, LCSH, AAT, SKOS, bibo, and Romaine subject headings for our Trade Literature.
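As a generic illustration of the first vocabulary named above, the sketch below writes a few descriptive fields as Dublin Core elements in XML using Python's standard library. It does not reproduce the Smithsonian standard for embedded metadata, or show how metadata is written into image files. The wrapper element name and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)  # serialize with the familiar "dc:" prefix


def dc_record(fields):
    """Build a <metadata> element with one dc:<name> child per value.

    fields maps Dublin Core element names (title, creator, subject, ...)
    to a string or a list of strings.
    """
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dc_record({"title": "Example title", "subject": ["Railroads", "Locomotives"]}))
```

Repeating an element, as with the two subjects here, is how Dublin Core expresses multiple values.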
We follow the FADGI guidelines for the creation of still images. Because many of our books are digitized by the Internet Archive, which does not follow those guidelines, we don't guarantee that every page image will meet all our internal standards; however, all images in non-folio-size books, excluding "foldouts," should be at least 300ppi.
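Resolution in ppi is pixels per inch of the physical page, so a page's minimum pixel dimensions follow directly from its size. A quick check in Python, where the 6 by 9 inch page is only an example and the function names are our own:

```python
import math


def min_pixels(width_in, height_in, ppi=300):
    """Smallest pixel dimensions giving `ppi` across a page of the given size (inches)."""
    # Round first so floating-point noise in width_in * ppi can't add a pixel.
    return (math.ceil(round(width_in * ppi, 6)),
            math.ceil(round(height_in * ppi, 6)))


def meets_ppi(width_px, height_px, width_in, height_in, ppi=300):
    """True if an image of width_px by height_px reaches `ppi` for that page size."""
    need_w, need_h = min_pixels(width_in, height_in, ppi)
    return width_px >= need_w and height_px >= need_h


print(min_pixels(6, 9))             # (1800, 2700)
print(meets_ppi(1200, 1800, 6, 9))  # False: only 200 ppi
```

Measuring the page is the hard part; once you have its size in inches, the pixel count of the image tells you the effective resolution.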
Digitization done in-house creates 24-bit color TIFFs as master images and then uncompressed and compressed JPEG2000s as derivatives. The Internet Archive uses lossy, compressed JPEG2000s as master images.
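24-bit color means 8 bits for each of the red, green, and blue channels, or 3 bytes per pixel, so the pixel data of an uncompressed 24-bit master can be estimated from its dimensions alone. The sketch below assumes no TIFF compression and ignores the comparatively small header and tag overhead. The 1800 by 2700 pixel size (a 6 by 9 inch page at 300ppi) is only an example.

```python
def uncompressed_image_bytes(width_px, height_px, bits_per_pixel=24):
    """Bytes of raw pixel data for an uncompressed image (headers not included)."""
    return width_px * height_px * bits_per_pixel // 8


size = uncompressed_image_bytes(1800, 2700)
print(size)                          # 14580000
print(f"{size / 1_000_000:.1f} MB")  # 14.6 MB
```

Multiplying by the page count of a book gives a rough idea of the storage a set of uncompressed masters needs.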
More information about our equipment can be found on the Digital Library department's page.
We typically do not digitize for preservation, only for access.