Crawling and Connecting: UTSA’s Web Presence Curated in University Archives

June 23, 2014

When you think of the records that Special Collections maintains to document UTSA’s history, you might think of boxes of old papers sitting on a shelf in our stacks. If you’re reading this blog, it might occur to you that this text is itself a type of record produced by the University: our department within UTSA Libraries created this blog as a means of sharing what we are and what we do with members of the University and the general public. Similarly, much of the content on our website is updated over time, and many of the documents we create and post online may never be printed on paper. Increasingly, departments and groups at UTSA are embracing the ubiquity of the web, relying on HTML pages and social media accounts to spread information and content about themselves, often rapidly updating and deleting the “old” content without much thought about preservation.

The earliest capture of utsa.edu in the general Internet Archive web collection, which quickly captures sites all over the web. This capture is missing important elements that change the way the page is displayed.** http://web.archive.org/web/19970131040814/http://www.utsa.edu/

Recognizing that the University has been actively creating and publishing content online, Special Collections partnered with the Internet Archive back in 2009 to collect and preserve web content produced by UTSA as a new method of archiving material. We use the tools available through the Internet Archive’s Archive-It system to capture relevant web content, which is preserved and made publicly accessible through our Archive-It partner page.

For our University Archives collections on the web, we maintain three collections produced by or about UTSA, which we’ve broken into the following groups: Academic Departments, University Administration, and Student Organizations. These are made up of official (utsa.edu) websites, as well as a large portion of social media sites that departments, faculty, staff and UTSA organizations have adopted to post and share information about their activities.*

While this information is invaluable for documenting the goings-on of the University, capturing it involves a lot of work. It begins with our team (in this case, the University Archivist) searching for and maintaining lists of websites that document UTSA. We load these into our Archive-It collections and administer web crawls (using a tool that ‘crawls’ through the links of pages, much as Google crawls the web), taking care to monitor our results and change crawl settings as needed to capture the sites to the fullest extent possible. We also provide metadata (information about each webpage) that enables users to find and access our web collections.
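For readers curious about what "crawling through the links of pages" looks like in practice, here is a minimal sketch in Python. It is not Archive-It's crawler and makes no claim about how that tool works internally. The in-memory `PAGES` site, the `example.edu` domain, and the `crawl` and `in_scope` names are all hypothetical, chosen only to illustrate the idea: start from a seed page, collect its links, and follow only the ones that stay within scope.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, in_scope, max_pages=100):
    """Breadth-first crawl from `seed`; returns the URLs visited, in order.

    `fetch` returns a page's HTML (or None if unavailable), and `in_scope`
    decides whether a discovered link should be followed.
    """
    queue = deque([seed])
    seen = {seed}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen and in_scope(absolute):
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical three-page site held in memory, so no network access is needed.
PAGES = {
    "http://example.edu/": '<a href="/about">About</a> <a href="http://other.org/">Other</a>',
    "http://example.edu/about": '<a href="/">Home</a> <a href="staff.html">Staff</a>',
    "http://example.edu/staff.html": "<p>No links here.</p>",
}


def in_scope(url):
    """Follow only links that stay on the (hypothetical) example.edu host."""
    return urlparse(url).hostname == "example.edu"


result = crawl("http://example.edu/", PAGES.get, in_scope)
print(result)
```

The scope check is the part that corresponds most closely to the crawl settings mentioned above: loosening or tightening it changes how much of the surrounding web a crawl pulls in.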

We invest a good amount of time in carefully crawling for material, but there is still much web content that we cannot collect. This includes content that typically needs user input to behave in a certain way. Examples include pages with streaming video, JavaScript (think of sites with drop-down menus that you must mouse over to display), password-protected sites, and websites that require a user to type and send text to display information (any sort of database-driven site, such as UTSA’s Bluebook). While we do our best to capture sites such as these, we often run into pages that display incorrectly, or we have to get creative about capturing them (and invest even more time in doing so), which we do on an as-we-can basis.

An example of a carefully crawled and curated website, with capture information at the top and page metadata to make the site more discoverable to users. https://wayback.archive-it.org/1688/20101130163255/http://art.utsa.edu/

You can view past captures of UTSA’s websites, along with other web collections the Special Collections team has curated, on our UTSA Archive-It partner page. Click on a collection to see a list of our archived pages, click on the page URL, and you can see past captures via the Wayback Machine (Archive-It’s tool for displaying web pages from the past). As you travel through the internet timeline, consider just how fragile and ephemeral this type of record can be. We hope you find these web archives useful.
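The capture links shown in this post carry their capture date in the URL itself, as a 14-digit run of the form YYYYMMDDhhmmss followed by the original address. As a small illustration, the Python sketch below pulls those two pieces apart. The `parse_capture` function and its pattern are our own hypothetical helper, not part of any Archive-It or Internet Archive tool, and the sketch assumes the URL layout seen in the two examples above.

```python
import re
from datetime import datetime

# Assumption: a capture URL contains "/<14 digits>/<original URL>".
CAPTURE_PATTERN = re.compile(r"/(\d{14})/(.+)$")


def parse_capture(url):
    """Return (capture time, original URL) for a Wayback-style capture URL."""
    match = CAPTURE_PATTERN.search(url)
    if match is None:
        raise ValueError(f"no 14-digit capture timestamp found in {url!r}")
    timestamp, original = match.groups()
    # The datetime is left timezone-naive; no timezone is assumed here.
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original


captures = [
    "http://web.archive.org/web/19970131040814/http://www.utsa.edu/",
    "https://wayback.archive-it.org/1688/20101130163255/http://art.utsa.edu/",
]
for capture_url in captures:
    when, original = parse_capture(capture_url)
    print(when.date().isoformat(), original)
```

Reading the dates this way makes it easy to line up captures of the same page on a timeline.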

 

*The University Archivist is busily preparing for the spring crawls of the websites in these collections; new crawls will be completed soon after this post is published. If you have suggestions for UTSA websites that we might not be aware of, please let us know. Many of our social media websites come from the UTSA University Communications & Marketing Social Media Directory.

**For a printed image of the utsa.edu website and a news release about its 1996 launch, see page 10 of the UTSA magazine Sombrilla, Fall 1996, Vol. 2 No. 1.
