04.21.06
Virtualisation and IRs – Part 1
When the RUBRIC Project began, one of our aims was to provide a test bed for our partners to trial instances of the three IR solutions that we are evaluating (each of which has come out of one of the FRODO projects). Under the more traditional approach, this could have involved purchasing multiple servers (we bought three), and running instances side-by-side. Or, we could have made all the partners share, by having just one of each solution available, and everyone having to play nicely together. Either way, this would have meant lots of work, trying to get all the instances to coexist. Then, there were licensing issues to think about too – if a solution is licensed per CPU, there goes the option of having that bit of extra grunt in a machine gained by having multiple CPUs.
The ideal solution was to provide a platform for each partner that they could access and use individually, and have ownership over as their own test instance (that we would set up for them). Of course, while this was achievable using traditional methods, it was looking messy.
Then we started talking about virtualization. Although this concept has been around for quite a few years already (I remember playing with an early version of VMware about 6 years ago, before the product really reached maturity) it seems that it’s only really taking on momentum now. However, for what we were wanting to achieve, it seemed perfect. This could provide a way in which we would be able to give each partner their own instance of a solution, to trial and break as much as they please, without interference from other partners or applications.
A brief note about the road we’ve been on over the last few months. Our first venture into using virtualization software was on our own desktop machines (currently Dell Latitude D810s). By using VMware Workstation, each member of the Technical Team is able to have an instance (or many) of each solution running on their own machine. Thanks to the snapshot functionality in Workstation, we’re able to take a snapshot of a given machine at any stage, and return the machine to that snapshot if a change we have made goes toes up. Cloning functionality allows us to make an exact copy of a machine within a few minutes, giving us a way to diverge in development within a very short time.
Along with getting to know Workstation, we also decided upon the use of VMware GSX Server, installed on a Dell 2850. The logic behind this was the prospect of being able to have multiple, stand-alone machines running concurrently on the one server. This meant that along with our project systems (including the project management software that we use called Trac), we could run a machine for each instance of an IR that we might need. There would be no need to run multiple instances of the same system on a single server.
About two months ago, after using GSX for a while, we evaluated options and decided to upgrade our infrastructure to VMware ESX Server. Because converting a guest machine running Linux from GSX to ESX is not a simple matter, we decided to pretty much start our ESX machines from scratch. But again, with the ability of ESX to clone machines within a matter of minutes, we were able to create a master Red Hat install, and from there, clone that machine into the machines that we needed. For each of the three IR solutions we’re evaluating, a master instance has been created and this can then be cloned as required. Each clone can then have any necessary customizations done (GUI, input screens, etc). On top of cloning, another benefit of ESX is the ability to specify the number of CPUs that a given Virtual Machine can use, which allows us to adhere to licensing rules.
This has turned out to be a beautiful solution to what we’re trying to achieve. Not only do we have the flexibility to offer as many instances as we need, but we’re now also able to create these new instances in just a few minutes. If a request for an instance comes in during the morning, we’re able to (usually) have a running, functioning instance by the afternoon. This just would not have been possible if we were relying on physical infrastructure. Stay tuned for more on Virtualisation and IRs.
04.18.06
Institutional Repositories and others
Many people are already familiar with the concept of an Institutional Repository. For those new to this idea, it’s a centralized storage mechanism for the research output of an institution or organization. It’s searchable, browseable, and indexable, meaning that if it’s compliant with the OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), its metadata can be harvested and indexed, making it searchable (and more importantly, find-able) on search engines like Google, Yahoo, MSN, AltaVista, etc. It’s an important forum for authors associated with an institution to gain exposure for their research.
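For the curious, a harvest request under OAI-PMH is just an HTTP GET with a couple of query parameters, and the answer comes back as XML. Here’s a minimal sketch in Python. The `ListRecords` verb and the `oai_dc` metadata prefix come from the protocol itself; the repository address, the sample response and the function names are all made up for illustration, so treat this as a rough picture rather than how any particular IR does it.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical base URL; a real repository publishes its own.
BASE_URL = "http://repository.example.edu/oai"

# Namespace used by Dublin Core elements inside an oai_dc record.
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would fetch to list a repository's records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hand-written, trimmed sample of what a harvester might get back.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
          <dc:creator>Smith, J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def titles(xml_text):
    """Pull every Dublin Core title out of a harvested response."""
    root = ET.fromstring(xml_text.strip())
    return [element.text for element in root.iter(DC + "title")]


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(titles(SAMPLE_RESPONSE))
```

A harvester collects records like this on a schedule and feeds the titles, authors and so on into its own index, which is where the find-ability comes from.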
But what about the number of times that we re-invent the wheel? For each of these research publications, there has been a project of some sort, whether small or large. For each project, there are (optimistically) various stages – planning, design, marketing, policy creation, and implementation. And many people feel like they’re on their own in finding out how to start (which is the hardest part sometimes). What sort of planning do I need to do? How do I design this project? What background do I need? How in depth do I need to go? And so begins the process yet again. Why do we do this? Isn’t this phase just as important as the output? If these phases aren’t carried through, there _is_ no output.
Especially for people who are doing research in similar fields, the information that is gathered and produced in the earlier stages of a project could prove to be invaluable. But what normally happens to it? It gets tucked away in someone’s filing cabinet. Or, if we’re lucky, it might find its way onto a departmental drive, where other people can find it, but only if they a) know it exists, and b) know where it might (emphasise – might) live.
How about we gather all this information together? How about we put it into a system with a database backend, and even a web frontend? Let’s make that system searchable. Let’s make it browseable. And by golly, suddenly you’ve got another one of those repository things.
Not a repository to replace the Institutional Repository, but rather a supporting repository. As far as supporting research projects goes, it can be a place where supporting documentation is held – project plans, marketing plans, design documents, marketing presentations, electronic versions of marketing materials, policies, meeting minutes. For want of a better, less political phrase, let’s call it an Organisational Repository (OR for short). If you were outside the research realm, an OR could be used for the above mentioned uses and more.
So you don’t want everything viewed by the big wide world? Ok, so let’s talk XACML. Users are placed into groups. With the help of Shibboleth, these groups can extend beyond just one institution or organization. Groups (or users) are given a role, and that role has defined access controls. To use an example brought up in a discussion here yesterday, librarians from University of Someplace can view our records of type ‘X’, but not type ‘Y’ (perhaps type ‘Y’ contains sensitive information). Because those librarians authenticate using Shibboleth, even though they’re not in our organization, our repository knows who they are and what they can access. An anonymous viewer can also have set roles.
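To make the role idea a bit more concrete, here’s a very simplified sketch in Python. It is not XACML (real XACML policies are XML documents evaluated by a policy engine, and Shibboleth would be the thing supplying the user’s attributes); it only shows the shape of the decision. The role names, record types and the `can_view` function are all invented for the example.

```python
# Simplified stand-in for a role-based access decision.
# NOT real XACML: the roles and record types below are hypothetical.

ROLE_PERMISSIONS = {
    # Librarians from a partner institution: type X only.
    "partner_librarian": {"X"},
    # Our own staff: both types.
    "local_staff": {"X", "Y"},
    # Anonymous viewers get a role too; here it can see nothing restricted.
    "anonymous": set(),
}


def can_view(role, record_type):
    """Return True if the given role may view records of the given type."""
    # Unknown roles fall back to an empty permission set (deny by default).
    return record_type in ROLE_PERMISSIONS.get(role, set())


if __name__ == "__main__":
    for role in ("partner_librarian", "local_staff", "anonymous"):
        for record_type in ("X", "Y"):
            print(role, record_type, can_view(role, record_type))
```

Denying by default for a role the table doesn’t know about is a deliberate choice in this sketch; it means a typo in a role name locks someone out rather than letting them in.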
But back to my initial rant. In speaking of reinvention of the wheel – here’s where the neat part comes in. To create an OR, we don’t _have_ to reinvent the wheel, just alter it a little. By using an already-existing piece of repository software (such as those we are currently investigating on the RUBRIC Project), some modifications would create fields specific to an OR instead of an IR. Of course, generic fields would still exist – author, date, title, etc. But the types of items may include Project Plans, Design Documents, Policies, Marketing Plans, Marketing Media, and so on. Searches could be conducted not only on authors, but on the type of item, and also on the specific project.
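As a rough illustration of what those extra fields buy you, here’s a small Python sketch of an OR item and a search over it. None of this comes from any of the repository packages we’re evaluating; the `Item` class, the `search` function and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Generic fields an IR would already have.
    title: str
    author: str
    date: str
    # OR-specific fields (hypothetical names).
    item_type: str
    project: str


def search(items, **criteria):
    """Return the items whose fields match every given criterion exactly."""
    return [
        item
        for item in items
        if all(getattr(item, field) == value for field, value in criteria.items())
    ]


# Made-up sample entries.
ITEMS = [
    Item("Test Bed Project Plan", "A. Author", "2006-01-10", "Project Plan", "Project A"),
    Item("Test Bed Design", "B. Writer", "2006-02-01", "Design Document", "Project A"),
    Item("Outreach Plan", "A. Author", "2006-03-15", "Marketing Plan", "Project B"),
]

if __name__ == "__main__":
    # Everything held for one project.
    print([item.title for item in search(ITEMS, project="Project A")])
    # One type of item by one author.
    print([item.title for item in search(ITEMS, item_type="Marketing Plan", author="A. Author")])
```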
Think of all the poor underlings whose sanity would be saved. They’d have a place to be able to go and get ideas on how to start their project. So they copied the timeframes we had in our project? Who cares! Good on them – save them some work. It’s hardly information that most of us need to keep held close to our chests. I’m sure we’ve all got much better things to do with our time.
A Beginning
Ok, so a first blog post, I’m supposed to outline all the things I’m going to write about in this blog. Well, to be quite honest, I don’t know. I know some of the things that I’ll write about, so I may as well start with those.
- Institutional Repositories. Be warned – if you hold an abhorrence of acronyms (let’s call you AOA), leave now. The realm of IRs is full of them. I work on a project which, instead of having a name that we can remember, has an acronym (RUBRIC). We work with other projects which are all called by acronym names too. It’s a fascinating realm though. Between the ASRC-RFCD and the OAI-PMH, occasionally there’s some XSLT transformed from DC to METS. Confused? Me too.
- Computers. I work in IT, therefore I must write about it. I don’t think I’d term myself an IT geek, but others may beg to differ. My excuse is that I haven’t spent my entire teenage and adult years with my head in a box, working out why a motherboard has died, or sitting in a cave drinking gallons of joe (sorry, litres). My background is actually Arts. I still don’t get why IRQ assignments are so important (I’m assured they are, but why can’t they all just get along nicely?), or why an smp kernel throws a fit when running under ESX (which it does, and is now fixed by running a non-smp kernel). I’m a big Linux fan, but don’t have a passionate dislike of MS like some others do – MS does its own thing, ‘nix does its own thing, and we all know who does it better.
- Travel. I love to travel. I wish I could do it all the time. Ever since my first major trip I have learnt that travel does wonders for people. It makes them realize that there is more to the world than their own backyard, and that others can actually make important contributions to the world, despite our Aussie-centric view that we’re ‘it’. Admittedly, we live in the nicest country I’ve ever seen, but hey. I love learning about how other people live, and learning why they see the world the way they do.
- Well, there’s no defined number 4, but I’m sure something will emerge as I start to write… stay tuned.