January | 2009

This is a short note I wrote for myself about the ITS Orientation on Jan 19, 2009. Some of the notes come from the feedback I sent to the organizer, Molly Kline, who I think takes feedback seriously and keeps improving the series.

The orientation consisted of several talks from different units (HR, AIS, CSS, TNS, SOS, TLT, RCC, etc.) under ITS, as well as visits to various hardware facilities (Digital Library, Campus Networking, Video Networking). Getting to know the different departments and units within ITS is extremely important and helpful. It gave me a much clearer idea of how what I do fits into the whole picture (of the university, or even the field of higher education in general).

However, I think the way each unit and its mission are presented matters a lot. For example, a presentation before the tour on what services the Digital Library provides (or even the challenges they face) would probably be much more useful than merely a tour of the hardware facility in the library basement. I think one of the goals of introducing the different ITS units to employees is to encourage collaboration and communication, so that resources and knowledge can be shared. A more interactive format would help reach that goal.

The talk during the lunch hour by Kevin, the VP, and Jeff, the AVP, was extremely helpful for those who have been on the job for a while and are ready to plan long-term growth at PSU. They talked about the importance of continuing to learn. I couldn’t agree more: in this ever-changing field, what’s really valuable is not a single skill, but the ability to keep picking up new ones.

It was a very meaningful orientation, and I am very glad that I went.

Title: ITS Forum: Storage/Archival Solutions
Time: 1/13/2009 1:00PM – 2:00PM in 141 Computer Bldg
Speaker: Mark Saussure ( mcs4 )

Personal notes from the training:

exabyte (5 EB = all words ever spoken by human beings)
200 TB now at PSU

Repository software: Fedora, DSpace, wikis, blogs (software, etc.)
Databases

XAM (eXtensible Access Method)
PSU driving industry to adapt
XAMfs with meta data (e.g. PDF version 2)
allows search
SNIA.org
HP grid storage service: disk nodes (each node ≈ 2 TB) to help distribute searching
PSU spent 6 months moving many TB of data to a new system; this grid system allows
adding new drives and retiring old drives live

research grants only granted if the data is made available (searchable)

Personal thoughts after the training:

The training emphasized an ongoing XAM initiative, which basically attaches an extensible meta data mechanism to stored data, so the data can be easily converted later on (to a new platform, to a different format) and is more searchable (meta data can support meaningful and sophisticated searches).

While I can’t comment too much on the storage management side, I appreciate the idea of having meta data on the file system level, or at least on a level that’s higher than the raw data (or application level). Three benefits come to mind immediately:

Searchability: Through the file system, users can search for files on the same topic (or taxonomy), regardless of the raw file type (picture, audio, text, PDF, etc.). One can search for books, pictures, audio recordings (speeches), video recordings (news), etc. that are related to the 2008 presidential election and were recorded during 2008 (as opposed to a historian’s research done in 2020). The meta data provides the actual information about the contents (the dates of the events), rather than about the storage media (the file creation date).
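To make the election example concrete, here is a minimal Python sketch. It is not how XAM works internally; the `StoredObject` record, its field names, and the sample catalog are all made up for illustration. The point is only that the search runs on content-level meta data (`topic`, `recorded`) and never looks at the file type or the file creation date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StoredObject:
    """Hypothetical record: one stored file plus its content-level meta data."""
    path: str           # where the raw bytes live
    media_type: str     # picture, audio, text, ... (ignored by the search)
    topic: str          # what the content is about
    recorded: date      # when the content itself was recorded or produced
    file_created: date  # when this copy of the file was written to storage


def search(objects, topic, year):
    """Return objects about `topic` whose content was recorded in `year`."""
    return [o for o in objects if o.topic == topic and o.recorded.year == year]


# Made-up sample data: three file types, one topic.
catalog = [
    StoredObject("speech.mp3", "audio", "2008 Presidential Election",
                 date(2008, 11, 4), date(2009, 1, 5)),
    StoredObject("rally.jpg", "picture", "2008 Presidential Election",
                 date(2008, 10, 12), date(2008, 10, 12)),
    StoredObject("retrospective.pdf", "text", "2008 Presidential Election",
                 date(2020, 3, 1), date(2020, 3, 1)),
]

hits = search(catalog, "2008 Presidential Election", 2008)
print([o.path for o in hits])  # ['speech.mp3', 'rally.jpg']
```

Note that `speech.mp3` is found even though that copy of the file was written in 2009, and the 2020 retrospective is left out even though it is about the same topic.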

Standardized meta format: Continuing with the same example, we can avoid the (almost impossible) coordination among the book/picture/audio/video/etc. communities to reach a consensus on the data format, since the meta data is external and file format is a non-issue here.

Extensibility: The whole trend of using XML is motivated by the need for a data structure that can represent portable and extensible information. Choosing XML (or a similar technology) as the meta data format gives the same benefits.

Speaking of extensibility, here is a side thought that is relevant to most of us who are not ready to adopt XAM yet:

I’ve always wondered why the meta data formats for audio files (ID3), video, pictures, etc. haven’t adopted XML. Compared to the average media size, the meta data, even in XML, is relatively small. It would also make it much easier for meta data editors to manipulate the data.
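As a rough illustration of what I mean, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The `<meta>` element and the field names are my own invention, not part of ID3 or any existing standard.

```python
import xml.etree.ElementTree as ET


def build_meta(fields):
    """Serialize a dict of meta data fields into a small XML document."""
    root = ET.Element("meta")  # hypothetical root element name
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def read_meta(xml_text):
    """Parse the XML back into a dict; works for any set of field names."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


xml_text = build_meta({
    "title": "ITS Forum: Storage/Archival Solutions",
    "speaker": "Mark Saussure",
    "recorded": "2009-01-13",
})

# An editor can add a new field without touching the existing ones.
root = ET.fromstring(xml_text)
ET.SubElement(root, "topic").text = "storage"
extended = ET.tostring(root, encoding="unicode")

print(read_meta(extended))
print(len(extended.encode("utf-8")), "bytes of meta data")
```

A reader that only cares about `title` keeps working after `topic` is added, which is the extensibility argument in a nutshell, and the whole thing is tiny next to the media it describes.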

Pushing this further, perhaps we could have a universal container format that, by default, contains a section of meta data and a section of raw data. This idea, compared to storing meta data on the file system level, may show its benefits in streaming: whatever is streamed would carry a universally readable meta section.

What would that mean?

* A search engine won’t have to parse/download the whole file before indexing
* When a job seeker submits a resume as a file, it’s automatically searchable by the employer.
* While tagging is embraced by social network users, why can’t we have the same benefits right at the file level? If I put an audio recording of a talk online, its meta data makes it accurately searchable by all communities. Instead of struggling to make a long string of tags readable, easily typeable, and meaningful all at once, the meta data section would solve the problem.
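Here is a rough Python sketch of the container idea, under my own assumptions: the `UCF1` magic bytes, the 4-byte length prefix, and the layout (meta section first, raw data after) are all invented for illustration and do not correspond to any existing format. It shows the first point in the list: an indexer can read the meta section and stop, without touching the raw data.

```python
import io
import struct
import xml.etree.ElementTree as ET

MAGIC = b"UCF1"  # hypothetical marker for this made-up container format


def write_container(stream, meta_xml, raw):
    """Layout: magic (4 bytes) | meta length (4 bytes, big-endian) | meta | raw data."""
    meta_bytes = meta_xml.encode("utf-8")
    stream.write(MAGIC)
    stream.write(struct.pack(">I", len(meta_bytes)))
    stream.write(meta_bytes)
    stream.write(raw)


def read_meta_only(stream):
    """Read just the meta section, leaving the raw data unread."""
    if stream.read(4) != MAGIC:
        raise ValueError("not a container file")
    (length,) = struct.unpack(">I", stream.read(4))
    root = ET.fromstring(stream.read(length).decode("utf-8"))
    return {child.tag: child.text for child in root}


# Simulate a stream: a small meta section followed by 1 MB of raw data.
buf = io.BytesIO()
write_container(
    buf,
    "<meta><title>ITS Forum talk</title><type>audio</type></meta>",
    b"\x00" * 1_000_000,
)
total_size = buf.tell()

buf.seek(0)
meta = read_meta_only(buf)
bytes_read = buf.tell()

print(meta)
print(f"read {bytes_read} of {total_size} bytes")
```

Putting the meta section at the front is a deliberate choice here: with a stream, the reader gets the searchable part first and can decide whether the rest is worth downloading.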

How about that?
