Tag Archives: The Internet Archive

New Zealand Web Harvest 2010

In January 2010, the National Library of New Zealand published a paper discussing concerns raised during its first web harvest in October 2008, specifically the notification period, the robots policy, and the location of the harvester. The Library worked with the Internet Archive to carry out its web crawls and harvesting.

The paper is available at http://www.natlib.govt.nz/about-us/catalogues/library-documents/harvest-options-paper

Current updates on the web harvest project are available at http://www.natlib.govt.nz/about-us/current-initiatives/web-harvest-2010

Filed under Web Site Archiving

Heritrix: The Internet Archive’s web crawler project

Heritrix is the Internet Archive’s open-source, extensible, web-scale, archival-quality web crawler project. It is used and supported by institutions such as the Library of Congress, the National and University Library of Iceland, and the National Library of Norway. It could be a helpful tool for universities that are starting out in web archiving.

http://crawler.archive.org/index.html

Filed under Web Site Archiving