April 30, 2007

Working Group on Computational Science and Engineering Undergraduate Education

As we all know, science is becoming more computational every day. Computers are being used to simulate and solve an increasing number of scientific problems. SIAM’s Working Group on Computational Science and Engineering (CSE) Undergraduate Education has released a very interesting report on how schools can improve computational science training for the next generation of mathematicians, scientists and engineers.

It's a fascinating report, with topics ranging from why to have a CSE program, to how to design one, to case studies, the value of internships and career possibilities. All that's missing is an extensive CSE bibliography!

One interesting thing that they note in the introduction is that CSE programs tend to be more diverse than other math or computing programs:

It is widely documented that the number and proportion of female undergraduates in computing fields has been declining in recent years. CSE, and especially CSE applied to the biological sciences, typically attract a much higher proportion of females. It is not uncommon for undergraduate applied mathematics programs to have a majority of female students, and it is very common for biology, for example. CSE therefore represents a good opportunity to attract a more diverse student body into computing.

Via Math News You Can Use!

April 27, 2007

Carnivals

The last couple of weeks:

CS Stuff

A quick roundup of some interesting CS related stuff from the last few weeks:

April 26, 2007

Dregni, Eric & Jonathan. Follies of science: 20th century visions of our fantastic future. Denver: Speck Press, 2006. 127pp.

I don't have much of a deep or profound nature to say about this book. It's one of those "Whatever happened to the future with all the flying cars and robot butlers you science guys promised me!" books. And, as such, it's very successful in that it doesn't take itself too seriously. This book is just plain fun.

The chapters basically run the gamut of all the promises that futurists have made over the decades. Chapter One is about transportation, covering the dreams of jet packs and flying cars, zeppelins and atomic airplanes. Chapter Two is about our great friends, the computers, and their great friends, the robots. Utopian and dystopian, it's interesting to see that our love-hate relationship with computers goes way back. Are they our friends or will they take over the world? Chapter Three is about how we will progress as a species beyond the need for war. An interesting idea, this, one that seems tragically flawed. Can machines replace us in the trenches? Will amazing super-weapons make war obsolete?

Chapter Four is on the cities of the future: gleaming, perfect and full of labour-saving devices, perfectly planned, domed or doomed? Zeppelin highways, controlled weather, hanging gardens. Chapter Five is full of medical marvels: the end of pain using radioactive skin creme, atomic farming, an extremely bizarre section on "marital aids", drinking your pee, living forever and human experimentation. Chapter Six is about that most perfect of fantasies of the future, space colonies! Rockets powered by steam, mad scientists, faster-than-light travel, finding buxom alien babes on Mars! Finally, Chapter Seven is a series of predictions for our own future. Taken from Hank Lederer, we see that by 2015 we'll have foldout computer screens; 2030, smart paper; 2035, implantable organs and limbs; 2040, immortality; 2050, a food creator; 2060, poverty eliminated; and 2100, space colonies!

I have to say that you really don't want this book for the explanatory text anyways. You really want it for the fantastic illustrations and the lively presentation. The book makes heavy use of old sf pulp covers and old catalogue illustrations; the captions are often quite funny as well. One complaint is that the illustrations aren't very well credited. There are a few sf art books in the bibliography (a treasure trove for amusing popular science collection development, by the way) and a mention of the use of old catalogues, but I wish the crediting were better. At the very least, naming the artists for a lot of the science fiction illustrations would have been nice.

In the end, I can't really recommend this book for academic collections; there's just not enough substance. However, public libraries would find a ready audience for a colourful book like this, as would most school libraries. It would also make a great holiday or birthday present for the science fiction-loving techy gal/guy in your family.

(Book supplied by publisher.)

Wiley and fair use

The latest issue tearing across the science blogosphere is journal publisher John Wiley & Sons' harassment of ScienceBlogger Shelley Batts.

Batts' original post used a figure from a journal article in the Journal of the Science of Food and Agriculture entitled "Natural volatile treatments increase free-radical scavenging capacity of strawberries and blackberries" by Chanjirakul et al.

This is what Wiley sent her:

Re: Antioxidants in Berries Increased by Ethanol (but Are Daiquiris Healthy?) by Shelly Bats

http://scienceblogs.com/retrospectacle/2007/04/antioxidants_in_berries_increa.php

The above article contains copyrighted material in the form of a table and graphs taken from a recently published paper in the Journal of the Science of Food and Agriculture. If these figures are not removed immediately, lawyers from John Wiley & Sons will contact you with further action.

Regards,

[contact details removed]



This is a shocking incident of a publisher trying to bully a writer/journalist/blogger into not writing critically or skeptically about a piece of research -- and really, read Batts' original article to see how stupid this whole thing is.

Today is piling-it-on day against the publisher, and we should all take part. Comment on one of Batts' posts, write your own blog post, email or call the publisher to express your disgust at these types of tactics. Remember, as librarians we have a lot of influence on publishers, more than we realize. For most journals, institutional subscriptions make up a huge part of the revenue stream and, after all, who makes those decisions? We made a difference with the SAE's crazy DRM scheme and we can make a difference here too.

If you want more information, as usual coturnix has helpfully brought together all the various posts into one big linkfest. Also typically, Janet Stemwedel has one of the most on-target analyses of the situation.

Update 12:44PM: The situation has been resolved. Apparently it was all due to an overzealous junior employee, whose contact info I've removed above at Batts' request.
Dear Dr Batts

I'd like to introduce myself as the Director of Publications at the SCI.

There has been a general misunderstanding with this issue. Our official response is below, which we are happy for you to publish:

"We apologise for any misunderstanding. In this situation the publisher would typically grant permission on request in order to ensure that figures and extracts are properly credited. We do not think there is any need to pursue this matter further."

As this is a misunderstanding inadvertently caused by a junior member of staff, I would be grateful if you would remove Ms Richards contact details from your blog. She has been most distressed by some abusive emails that she has received on this matter.

Yours sincerely,

Sarah

Sarah Cooney
Director of Publications
Society of Chemical Industry
International Headquarters
14/15 Belgrave Square, London SW1X 8PS, UK

April 24, 2007

Interview with Michael Morgan of Morgan & Claypool

It's time for another in my occasional series of scitech publishing/blogger/scientist interviews. This time around I have a few questions for Michael Morgan, formerly of Morgan Kaufmann and now with tech publishing newcomer Morgan & Claypool. I first met Mike at SLA in New York City a few years ago at a party, still well before the launch of the new product, and his ideas for what became Synthesis struck me as terrific, in many ways a possible template for the future of "book" publishing in computer science and engineering. I've been happy to support it from the beginning, as I think good work deserves our support, and I'm even happier to give Mike an opportunity to talk a bit about himself and his company's new product. Thanks, Mike!



Q0. Please tell us a little about your education & career path to this point and a bit about the thought processes that led to the founding of Morgan & Claypool.

I've been in publishing my entire career. I graduated from Connecticut College, one of the great small American liberal arts colleges. I started my publishing career at Addison-Wesley, first as a college traveller (sales representative) and then as a computer science editor. In 1984 I was invited by William Kaufmann (former president of Freeman) and Nils Nilsson (a Stanford computer scientist) to join them in founding Morgan Kaufmann Publishers. We built Morgan Kaufmann as an independent company for 14 years and then merged with Academic Press, a subsidiary of Harcourt. At Academic Press, I became VP of book publishing and also remained as president of Morgan Kaufmann. After three years, Reed Elsevier acquired Harcourt and therefore Academic Press and Morgan Kaufmann. At that point, I had been at Morgan Kaufmann for 17 years and it seemed like a natural point to consider doing something else, so I left and took some time off. After a few months, Joel Claypool, who was engineering publisher at Academic Press, suggested the key idea behind Synthesis and we started Morgan & Claypool to develop it. Both Joel and I are book publishers. We had observed the transition of journal publishing from print to electronic and saw that there was the opportunity to pursue some interesting publishing ideas with the technology and business models that had been created.


Q1. Could you tell us a little about what Synthesis is?

Synthesis is a large (and growing) collection of original, innovative content in engineering and computer science. We publish across about 30 areas now (for example, bioengineering, computer graphics, signal processing and artificial intelligence) and are adding new areas on an ongoing basis. The documents in Synthesis are called "lectures" and are essentially 50-150 page peer reviewed book-like presentations of key topics in research and development written by active researchers. They are shorter and more targeted than typical books but broader and provide more of a "synthesis" than a journal article. Also, since they are created and delivered electronically they can be revised frequently. They can also include multimedia elements such as animations, code, video, audio, etc., although we haven't done much of that yet. The concept of a short targeted presentation that can be updated frequently turns out to be very powerful. It enables presentation of cutting edge, active research topics that are moving too fast for books but for which there is a need for a tutorial overview. Our target audience is researchers who need to come up to speed in an area outside of their own, graduate students and advanced undergraduates, and engineers who are looking into new ideas for application. Another great application of this model is short pedagogically oriented treatments of more mature subjects that can be used for courses or professional development. Since our license encourages unlimited classroom use of Synthesis, faculty can assign a lecture to supplement traditional textbooks at no additional cost to the student.


Q2. In reference to Synthesis, who's harder to convince that the model is a good one, faculty or librarians? Have you had a lot of feedback from teaching faculty and students so far, or are you not getting much from them yet?

Although we have had very gratifying support from our library community, the most active initial excitement came from faculty. I have personally discussed this idea with hundreds of faculty in computer science and have never in my career heard such enthusiasm for a publishing idea. The Synthesis lecture fills the need for a vehicle to present a first synthesis of a new field for students and researchers in other areas. As science and engineering expand and become more interdisciplinary, there is a growing need to understand new areas. Most journal articles are not very useful for this since their purpose is to record new research and not to summarize and synthesize the state of the art. On the other hand, the business model for traditional books makes them equally unsuitable for the presentation of material that will need updating within a year. Faculty are very aware of this gap since they live with it every day. A strong indicator of their enthusiasm is the number of prominent researchers who have volunteered to author, edit and referee lectures. These are typically people who would not take time from their research to write books but who have seen what a strong contribution a lecture can make to their field. Since most of our content has been published for only a few months we've not yet had much feedback from users on the published lectures other than from usage statistics.

Usage has been growing substantially. For some lectures, we are approaching over 1000 downloads within a few months after publication, which is much higher than one would see for a journal article.


Q3. What have been some of the challenges so far, for example, keeping the lectures short, getting good metadata, recruiting authors?

Well, on the content side, our greatest challenge is getting authors to finish. We and our editors have been very picky about choosing authors and all of our lectures are written by invitation. The positive result is that we have been able to recruit some of the most prominent researchers as authors. The negative result is that these are busy people with the most demands on their time. We act as advocates for their future audience and give them every encouragement (translation: nag, plead, beg) to get the lecture to the top of their stack. Then, once they finish a first draft, the manuscript is reviewed by their peers. Our task is then to get them to put in the additional time to revise.

On the library side, I guess our biggest challenge, which is now beginning to diminish, has been gaining credibility. Librarians haven't seen too many new companies start in the last 10 years and they haven't seen many new original electronic content products; most have been digitizations of existing print works or aggregations of the same. Also, they have only seen a few undertakings that were really serious about high quality content. Although Joel and I are well known in engineering and computer science, as professional book publishers we weren't known by many librarians. So, in the beginning, we needed to overcome some skepticism. We had strong early support from a group of visionary libraries and librarians who are actively involved in the engineering library community, to whom we owe a great deal.

Now that Synthesis has been licensed by many if not most of the top engineering schools and is beginning to be licensed more broadly, this is less of an issue for us, at least in North America.


Q4. What's the future of print books in the computing field?

I think that this is very much dependent on what is available in terms of reading devices, electronic paper and personal printers. The main current advantages of econtent are in distribution, availability, search, linking and multimedia features. However, it seems that many people prefer reading in print, especially longer documents such as books. Once we have higher resolution screens, ergonomically enjoyable portable reading devices or even personal printers that can economically print and bind at the desktop, the preference for buying print books should decline. It's likely that this will happen first in engineering and computer science.


Q5. Who do you see as your main competition at this point? Other ebook providers or free stuff on the web? Wikipedia?

There are three potential fronts for competition for Synthesis: for authors, for library funds and for interest from readers. We don't feel much competition from other publishers for authors and content. Most of our authors are interested in a Synthesis lecture because their topics are moving too quickly or are too narrow for books. Also, since we give our authors the right to reuse their lecture material later in writing a longer book for any publisher, they don't have to choose. Our biggest competition for authors is for their time. For library funds, our competition is increasingly going to be other ebook providers as publishers make more of their lists available. Our challenge will be to convince librarians that the content in Synthesis is unique and valuable and that it is not just another ebook collection comprising digitised traditional print books. In terms of attention from readers, most of our content is unique but they are increasingly overwhelmed by the amount of content available. We will need to work hard at marketing, creating awareness and enabling discovery to compete against an increasing amount of noise. Although I am a big fan of Wikipedia, we don't see much competition from it at the advanced level of our content.


Q6. I had to get one Morgan Kaufmann question in -- in all the time you were at Morgan Kaufmann, what's the one thing you're most proud of? Do you have any regrets?

I am most proud of the community of authors and list of great books that we built. I think we were successful in creating a culture of collaboration and respect for authors that produced some great work. I think it's fair to say that several MK books made substantial contributions to computer science and that most faculty in such areas as computer architecture, databases, computer human interaction, graphics, networking and AI would agree.

My one regret is that we didn't keep MK independent. We merged the company with Academic Press to provide an exit strategy for our investors which was only fair to those who had made MK possible in the first place. Many of the original MK staff, especially in editorial, are still there and continuing a tradition of great publishing. It would be very interesting to be developing Synthesis in that context. On the other hand, if we hadn't merged with AP, Joel Claypool and I might not have developed the working relationship that led to the development of Synthesis and Morgan & Claypool. Ultimately, I think that Synthesis and Morgan & Claypool have the potential to make a much more significant and unique impact for our disciplines.


Q7. Finally, what's the best and worst things about your job these days?

The best thing is to be working closely with authors and librarians to do something so worthwhile. I've always worked closely with authors but it's been very rewarding to discover this new collaborative community of librarians. As a professional book publisher you don't have a community of (non-reader) customers that is so engaged and knowledgeable. For example, I am writing this from a UK library conference where I have spent pretty much every waking moment of the last three days in conversation with librarians, including on the disco floor until 2:30am this morning.

Frankly, there is not much that is bad. If I had to pick something, it would be the sense of feeling stretched too thin. In the traditional book world, most of the innovation is limited to content and everything else is pretty well established. With Synthesis we think about innovation in content, delivery, user experience, discovery, business models, digital archiving, and the list goes on.

Computers in Libraries: Day 3 afternoon sessions and closing thoughts

Searching, Finding and the Information Professional by Marydee Ojala, Online Magazine.

A good, almost basic presentation. Ojala notes that info professionals love searching, understand structure and like to communicate about what makes searching fun. On the other hand, IPs can sometimes forget when to stop searching, information overload can lead to missed items, and sometimes searching can take longer because of the proliferation of sources. Clients (and patrons?), however, don't care about searching; they want to find. They don't care about sources; they want a quick answer. Findability is the flip side of finding, encompassing search engine optimization and how a site is architected and optimized. IPs are also interested in premium content findability: how to surface the good stuff, reputation monitoring, and the fact that premium content vendors need new models. Web findability: search is pervasive but inconsistent and unstable, it's possible to game the system, and there are quality and educational issues. Traditional search techniques include boolean, pearl growing and building blocks; web searching is squishy boolean, with algorithms taking over, not quite pure boolean.

The web is personalization (relevant to who you are, great for individuals, not so great for IPs who need general results), optimization (SEO, white hats & black hats), semantic clusters (contextual searches, related words, Clusty), automatic indexing (automates the process, identifies standard info, poor on context, works best with human oversight), metadata (doesn't work to drive web traffic, useful in closed environments), different databases (each search engine has its own DB with quirks, so you need to use multiple engines) and the invisible web (not as much hidden as before; you can now retrieve multiple formats & older stuff).

Non-traditional, non-textual stuff: finding it is imprecise, to say the least: audio, video, images, blogs, groups, SL. Precision & recall are dubious. Display is also difficult: tag clouds.

Worst case for the future: a controlled info environment where you only see what search engines want you to, ad controlled, high price doesn't guarantee quality, industry consolidation diminishes available information, consumerism and entertainment trump research. Best case: intuitive interfaces, info is accessible & available, producers are profitable, searchers are satisfied, searching and finding coalesce.


Innovative Tools for Reference Service by Tomalee Doan and Hal Kirkwood, Management & Economics Library, Purdue University.

Cool presentation about making a difference to students, helping them in virtual spaces and not just in physical spaces. Some innovative ideas here. Create a set of tools to provide better service, go from linear to non-linear provision of service.

First tool: a library toolbar that has a bunch of tools and interactive features and links to subscription databases. Next, BizMap, a conceptual mapping of all the various business subjects and how they interrelate. Then BizFaq, an online FAQ.

These tools break away from the reliance on the library web page. MyMEL toolbar created with google search box, drop down has catalogue, news, stocks, federated search, sfx journal finder, worldcat, list of premium dbs, also access to other tools like citation styles, writing lab, university tech lab, career services, rss feeds. Uses 3rd party software: Conduit. With minimal marketing the response has been fantastic, on all dept computers, can get usage stats. Message to students: take us with you.

Also use Footprints IT helpdesk software for tracking online reference transactions.


====

Overall I have to say that this was one terrific conference, with lots of sessions that were interesting conceptually and theoretically mixed in with sessions with the cool-I-gotta-try-that factor. A good balance. I also found people to be friendly and outgoing, easy to talk to. As usual, it was also nice to connect with some of the vendor reps there, particularly the IEEE, Bowker and Safari Books (hopefully some cool stuff to come out of those interactions here). In particular, I'd never touched base directly with anyone from Safari so it was great to be able to give some one-on-one detailed feedback.

Low lights? The weather: cold and windy. There was one session where the presenter got mixed up between WWI and WWII and got the dates wrong anyway (which war was from 1937-43 again?) -- the funny kind of thing that happens in the heat of a presentation. Another one, annoying rather than amusing, where the librarian presenting made the kind of casual bragging-about-my-innumeracy statement that just drives me crazy and then took an unnecessary cheap shot at supposedly humour-challenged nerds. Imagine if a techie bragged about his/her illiteracy in front of a large audience! There were also a couple of times when library outsiders made supposedly deep criticisms of library systems while apparently never having visited one. And then there's the overcrowding on the main function level, with some sessions being vastly over attended (or under spaced, if that's a phrase). I hope that next year they're able to spread the function rooms around a little more, even if it means using another level. But these are minor quibbles that in no way detracted from the overall good vibe and incredible conference experience. I really hope to make it again next year. Even the wait in the airport on the way home was fun, having a nice long chat with fellow Torontonian Amanda.

April 23, 2007

Computers in Libraries: Day 3 morning sessions

The New Library Automation Landscape by Marshall Breeding.

A depressing session, in a way. Breeding talked about the prospects for dramatic innovation among ILS companies in the near future. They don't look good as the vendors don't seem to have much cash to invest. They are caught in the middle, squeezed on both sides unable to satisfy both masters, owners that want higher profits and customers that want transformational innovation.

Breeding first notes that there are not a lot of people studying this issue. Some of the overall business trends in the lib automation industry are: increasingly consolidated, venture capital & private equity investors playing a larger role, decreasingly differentiated systems, narrowing product offerings, open source opportunities on the rise. Other business factors include a level of innovation that's falling below expectations; companies struggle to keep up with ILS enhancement & library desire for innovation; pressure within companies to reduce costs, increase revenues; pressure from libraries to give more, cheaper, faster. The industry has consolidated a lot in the last 30 years, lots of M&A.

Libraries have also consolidated on the demand side: consortia share automation cost to reduce overhead, the need to focus technical talent, pooled resources for processing, single ILS installations becoming less defensible, libraries need to leverage resources like companies.

Why should we worry about who owns the industry? Because important decisions are made in the boardroom, probably not with clients at heart. The VC & PE companies are making decisions to maximize their profit, so market success and tech innovation don't always drive business decisions.

The business cycle for library tech companies is from founder start up to VC support to board level representation to private equity support to strategic control to IPO and mature company. Each stage represents less input by founder and their vision. Breeding gave SirsiDynix as an example.

What's the impact of the different ownership styles? Long term vs. short term focus; who makes the decisions, ability to understand libraries as businesses/non-profits, balance needs of profit with needs of public sector organizations. What are the revenue sources for ILSs: new ILS sales, support, non-ILS systems, library services. The need to balance these sources to have the capital to invest in innovation. Need to expand activities to support R&D on things like RFID and federated search.

What are the open source alternatives? Are they viable? There's been an explosive interest in o/s alternatives the last year or so; they're emerging as practical alternatives as often the total cost of ownership isn't all that different. It's still risky, but Breeding holds up Georgia PINES as an example. Using o/s is not cheaper (you need more programmers internally, for example) but you do have more control. Libraries are looking for some way of decoupling the opac from the ILS but also making the various components that the libraries offer more integrated (i.e., bib databases). It's a vicious cycle: vendors need cash for R&D but libraries don't want to pay more for maintenance or for new "features" that used to be included in the system.

Catalogs/Opacs for the Future. This was a two part session.

Up first, Tim Spalding of Library Thing on The Fun OPAC.

Opacs are broken in 3 fundamental ways: findability, usability and searchability. How about fixing them by mixing a little serendipity with a little funability! Let's think about all companies being toy companies; a library system is the "most fun you can have with your pants on!"

Focus on the catalogue, make it front and centre; we currently act like we're ashamed of it, it doesn't have dynamism. Allow inbound links, the catalogue needs persistent links, also link outwards -- the more you link the more people come to you, don't hide the exits. Also, link around, be generous with linking, make everything clickable, names, tags, a page for everything. Dress up the opac with book covers, link to amazon & wikipedia because patrons are going there anyways. Get data out there, people will do stuff with it, remix, mashup, analyse, rss feeds. People don't want our content, they want to create their own: create opac blog widgets, get tags, make them available to others, don't accept limitations of library systems.

Next up, Roy Tennant, California Digital Library on Catalogs for the Future.

Tennant began by exhorting us to never use the "O" word again! (uncomfortable laughter). Future? What future? Catalogs ain't got no stinking future! (uncomfortable laughter)

The demise of the local catalogue is inevitable, discovery happens at the network level, even at the local level few want to limit search to books, new finding tools will make catalogs obsolete, but libraries need their back office ILSs.

The classic ILS opac is a deeply integrated silo. The new world order is discovery decoupled from the ILS: Google/Open WorldCat/Primo, even WorldCat Local. Google to Open WC to WC Local, which has circ status. This makes sense because users want to find everything they can and they prefer to search in one spot, and most ILSs lack cool new features. Some interesting features of Open WC: faceted browsing, relevance, articles integrated, WC Identities, fiction finder, tags like UPenn library.

The next generation ILS will be refocused on getting work done, more interoperability, able to work well with other systems, expose APIs to the network. Next Gen finding tools: integrate access to a wide variety of finding tools, use info from other sources using APIs, sophisticated features like relevance ranking, will not simply be a library catalogue.

Computers in Libraries: Day 3 keynote

World Digital Library Initiative by John Van Oudenaren, Library of Congress.

A real vision for the future of the web: create a digital library of significant original cultural materials from around the world. And make it accessible to everyone in all the official languages of the UN (plus Portuguese -- Brazil is a partner). Daring and ambitious, a truly worthy project, this is one worth watching over the next number of years to see how it fares, if it gets properly funded and if they can make a real stab at the logistical nightmare of gathering, translating and making accessible all that material.

Objectives are to promote intercultural awareness & understanding, create resources for educators & the general public for a global, wireless world and to acquire rare and unique materials. The LOC's partners are UNESCO, bilateral agreements with Russia, Brazil, Egypt, and the tech community -- Google, Yahoo, Apple, Stanford. They hope to launch Sept 2008.

And they're hoping not to be just a big website. Their three pillars are content acquisition, especially in developing countries where there are no digitization projects and in languages other than English; creating a sustainable network for production & distribution; and, of course, the web site itself: www.worlddigitallibrary.org.

Content acquisition: a key objective is to work with partners to digitize material at four centers in Cairo, Rio, Moscow and St. Petersburg and one mobile scanner at Novosibirsk. Bring light to hidden treasures, establish editorial scanning operations, pursue other methods to acquire materials including repurposing stuff already scanned.

Construction of a sustainable network: need to get buy-in from partners and others, distributed centres are most economical, combine machine translation with human translation, each network node can create, catalogue, translate & develop editorial and educational content.

The web site: the key objective is to present cultural content in a way that appeals to young users, and to have good search and discovery tools. A prototype is under development, multilingual: English, Arabic, Chinese, French, Russian, Spanish (the UN languages) and Portuguese. A high quality user experience, fast & seamless, able to search & browse a large volume of material. It will be multiformat: manuscripts, maps, audio, video, with special features with subject experts and educational content for teachers. There will be social networking features like tagging, comments and a "myWDL", but they will keep strict control on the content itself; it will all be curated. It will also be designed to accommodate developing-world conditions like low bandwidth.

The presentation ended with a really nice video showing the vision and some features -- this is an incredibly ambitious project. Take a look at the video, it's amazing.

Computers in Libraries: Day 2 afternoon sessions

Innovative Libraries: Best Practices and Tales from the Stacks by Jill Hurst-Wahl, Hurst Associates and Christina K. Pikas, Johns Hopkins Applied Physics Laboratory.

It's always nice to see a blog buddy do good work and win a measure of fame and (ok, not) fortune as a result. This was a truly great presentation, something Christina and Jill can be truly proud of, particularly for bringing to light the kinds of less-heralded but no less remarkable innovations and strategies that don't normally get a lot of buzz at conferences.

They started off the presentation noting that they had used web 2.0 tools to work on the project and this presentation: IM, Google Docs, Skype and Zoho (and I hear that Google Docs is coming out with presentation software soon too...).

The background is that, for libraries, failure to innovate is not an option: they must adapt to the new research culture, especially that of millennials. In 2007, what can we learn from library leaders? Get some concrete examples of how to innovate. Why is this research important: to learn to manage innovation, to cope with the pressure to innovate, and to retain & motivate staff & secure funding. They used a qualitative methodology for the study, to find ways to learn from people who don't self-promote. The limitations were mostly of time; also worth noting is that the interviews were not recorded, only notes were taken. They interviewed the managers of 1 special, 2 academic, 2 school and 3 public libraries, trying to get diversity within the groups.

They noted the manager personality -- persistent in dealing with non-goal oriented, non-team, non-leadership staff members, proud of their staff and their accomplishments, somewhat considered renegades, they also considered themselves lucky, not really recognizing the degree to which they created their own luck by hiring good people and learning to work around the uncooperative. There was also a diversity of funding: some had good cash flows some not so good. In both cases, the situation actually motivated and pushed forward innovation. They also used a variety of formal & informal ways to push their plans forward, brainstorming for success. Formally, they all had the idea that everyone has an expectation to serve and they always recognized good work and innovative ideas.

Informally, they always gave their staff the freedom to play, to try things out without worrying about failure, to look to the business world/literature for new models, to look at other departments, to learn from customers, to go to non-library conferences. And to take this innovative spirit to all parts of the organization, even to the shelving people, to find a way to make their jobs less physically taxing. These quiet innovators live the innovative life, rarely reporting at conferences or in the literature; they don't see themselves as innovative, only as doing their best in a complex environment. They play an entrepreneurial role, trying things as pilot projects; they want the library to be seen as a source for information, as experts in research.

The attitude is that there are no failures; sometimes you try things too early in their life cycle, you have unexpected consequences, you fail to get key buy-in. You try things knowing that some of them will break. The staff structure you need to do this is very self-motivated, the kind of people who let you do more with less. Mentoring and coaching are also important, as is giving something akin to Google's 20% time to work on new projects.

Conclusions? Motivation is important, finding the staff who can and will do the job. So is organizational atmosphere, everybody looking for new ideas. Emphasize training and other prof development venues. Advice? Embrace tech, have courage going forward, encourage everyone to go to training and conferences, reward staff for professional development efforts, focus on user needs.


The Social Web: On the Importance of Happy Robots by Jesse Andrews, CommerceNet & Book Burro.

A weird little presentation, but kinda fun nevertheless. Certainly the only one with massive amounts of code flashing across the screen. Not a problem for me, but I did see a few people scurrying towards the door when the javascript started to fly.

So, how to get all the social software tools on the web to interoperate, to work together and pass information? You have delicious for bookmarks, flickr for photos, librarything for books, wikipedia for information, twitter & jabber for messaging, wordpress for blogging -- how to mashup these things and get them to be part of your online life.

Problem: I'm looking for a book but I don't want to visit a bunch of different sites, both bookstores and libraries. Use robots! Software robots, that is. Commercial interfaces, like Amazon, have good robot interfaces (APIs). Local bookstores, WorldCat and many libraries have bad ones. For example, how to build an IM bot for Amazon? Use twitter/Jabber to ask the library if a book is available and get the answer back. Andrews showed many examples of how to build chains of social tools, mostly using twitter as the glue, to do interesting, real world things. Check him out at Book Burro and Overstimulate.
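Andrews' own code isn't reproduced here, but the glue pattern he described is easy to sketch. Below is a minimal, hypothetical Python illustration, not his implementation: pull an ISBN out of an incoming chat message, ask a catalogue web API whether the item is available, and hand back a one-line reply that a Jabber or Twitter bot could post. The endpoint URL and the JSON fields are invented for the example; a real library's interface would look different.

```python
import json
import re
import urllib.request

# Hypothetical catalogue endpoint -- stands in for whatever API a real library exposes.
CATALOGUE_API = "https://catalogue.example.edu/api/availability?isbn={isbn}"


def check_availability(isbn):
    """Ask the (hypothetical) catalogue API whether an item is on the shelf."""
    with urllib.request.urlopen(CATALOGUE_API.format(isbn=isbn)) as resp:
        record = json.load(resp)
    # Assumed response shape: {"title": "...", "available": true}
    return record.get("title", "Unknown title"), record.get("available", False)


def handle_message(text):
    """Turn one incoming chat message into the reply the bot should send back."""
    match = re.search(r"\b(\d{9}[\dXx]|\d{13})\b", text)  # crude ISBN-10/13 match
    if not match:
        return "Send me an ISBN and I'll check the shelves."
    title, available = check_availability(match.group(1))
    status = "is on the shelf" if available else "is checked out or not held"
    return f"'{title}' {status}."


# A Jabber or Twitter client would call handle_message() on each incoming
# message and post the returned string as its reply.
```

The point of the pattern is that each service only needs to accept plain text and return plain text, which is what lets messaging tools act as the glue between bookstores, catalogues and readers.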

April 21, 2007

Computers in Libraries: Day 2 morning sessions

Using a CMS to Build Community: Rhumba with Joomla by Tao Gao and Catherine Morgan, South Carolina State Library.

This was a great session with lots of great ideas on how to use a CMS to build community among patrons. The idea is that the interactivity and customizability that you can build into your system with a CMS will greatly increase user buy-in to your site and get them visiting and contributing. The CMS the South Carolina State Library used was Joomla, an open source product. The advantages to Joomla are ease of setup, use and management; separation of content and form; and that it's extendable and open source with a strong support community. It's also a very popular CMS package these days.

There were several reasons why the SCSL wanted to redesign their website, including: static html, table-based layout, only some basic perl, outdated content, incoherent navigation and others. They wanted standards-compliant design, intuitive navigation, separation of form and content, staff collaboration, site search, an online job board, community orientation and rss feeds. The first phase was planning. They conducted a review of current content and performed an online survey, using SurveyMonkey. They got people to review subject pages and got together a qualified and dedicated team. In the design phase they did an agency rebranding and an interface design and review. They learned that you have to get a graphic designer who understands the core mission of the project and that you need a good project manager.

The development phase explored CMS options and dealt with the Joomla learning curve, they also did a content audit, review and migration. They identified and incorporated desired functionality. The deployment and evolution phase included going live. They were able to evaluate the project, plan for growth and refinement and discovered that few staff reviewed the new site before it went online. The reception was great.

The demo of the web site was very good; we got to see a lot of the great new features and see how good a tool Joomla is for building community systems. Active vs. static content, adding content with a simple wysiwyg editor, listening to user input and training users were all part of the process. Joomla was a great tool that allowed a lot of social software features such as member profiles, discussion forums, calendars, rss, blogs, groups, photo galleries, book reviews, tag clouds and others.


Project Planning the 2.0 Way by Nicole Engard, Jenkins Law Library.

This was another inspirational presentation, showing how social software tools can be used to plan and manage internal projects, the example being a law library. Old school project planning meant lots of meetings, task lists, tons of documents flying around via email, phone tag, and filled-up email inboxes. With email lists being a bit inconsistent at times, it's always hard to know who's in the loop at any given time. You end up wasting time constantly worried about document versions and getting everyone up to speed. Ultimately, all the project status, details and history are buried in long forgotten emails and word docs no one can find anymore.

So, try a blog. At the library in question, each project has a blog, every staff member can see everything and contribute and there are many fewer emails going around. Staff feel included and the project blogs are the favourite part of the intranet. Email clutter has been cut down and everything for a project is in one place. When a project is completed, the blog is put in an accessible archive part of the intranet.

Project documents are part of a wiki, which is great because it's full-text searchable, has history and is visible to staff. Wikis + Blogs = staff engagement.

Even the IT department is able to document their systems using a wiki. The intranet uses blogs and wikis to have to do lists, projects, calendars, wish lists. All parts of the intranet have comment buttons where users can send messages to the web team about problems or issues. People can put email watches on posts or pages so that they are notified of changes for projects that are important to them. With everyone able to contribute, there is a high level of trust about projects.

April 20, 2007

Computers in Libraries: Day 2 Keynote

Using Social Media for Community Engagement by Andy Carvin, National Public Radio.

As a last-minute replacement for Elizabeth Lane Lawley, Andy Carvin did a pretty good job of putting together a presentation. And a timely one it was. His theme was how we use social media to engage the public and get them to join in and contribute to our networks, using a recent redesign of the National Public Radio site as an example. In the traditional media, you had to be part of the media establishment to be able to contribute to it; there were high barriers to entry. This was also the case for the beginnings of the web: it was more democratic, but you still needed fairly high levels of technical competence to author anything. In the Web 2.0 world, there are tools that simplify content creation, that actively encourage people to share, where youtube videos can actually have an impact on US political culture.

And there's been a dramatic shift in the demographics of content creation; it's not just the kids doing it anymore. Blogs are the most prominent example, having sparked a kind of war of attrition between the mainstream media and bloggers. The MSM hates the bloggers (because you can't trust citizen journalists) and the bloggers hate the MSM (for pandering to the LCD), whereas in reality there's a kind of detente, a truce, a collaborative spirit that recognizes that both are needed.

But why are the media embracing web 2.0? It improves transparency and creates a public dialogue. Carvin gives examples of NPR shows that have used a kind of open piloting to get public feedback in early stages; he also mentions Radio Open Source, a blog with a radio show. Other examples include the BBC's Have Your Say, CNN's iReport, the USA Today site, OhMyNews.com from South Korea which uses blogging stringers to get coverage from around the world, Global Voices from Harvard and others. He notes that no single entity any longer has a monopoly on knowledge, that the democratization has only just begun.

Computers in Libraries: Day 1 Afternoon part 2

Building an Online Virtual Community by Mark Puterbaugh, Eastern University.

This was a pretty cool presentation about EU's attempts to create a virtual world using the ActiveWorlds system, a virtual world where the library could build its own virtual pathfinder, a virtual information commons, into the world of information. What was interesting was that they allowed some students (not just anyone, I think, as that might make the space too chaotic) to build their own buildings and create their own pathways to information. One that they demoed was on education and one was on ancient Rome. The virtual spaces were interesting in that they combined elements of social interaction, learning and a kind of playful weirdness with blimps and birds and even what seemed to be a student being burned at the stake.

Now how did they get the students to build their parts of the world? Well, they seemed to have relied on faculty to help with that as well as some judicious begging (and perhaps bribery?). It will be very interesting to see if they will be able to attract students to using this virtual world for educational uses, to go to the appropriate rooms and link to databases etc., or if they'll ignore it or mostly see it as an amusing goof, or somewhere in between -- either way, it's an important experiment.

Social Bookmarking and Folksonomies. This was a two-presenter session.

First up was The Hive Mind: Folksonomies and User-Based Tagging by Ellyssa Kroski, a very good general introduction to tagging. Nothing new here for me (or, I assume, a good chunk of the audience) but it was very well done; sometimes it's nice to get a good overview of even a familiar topic, it seems to just solidify stuff in the mind. I particularly liked her presentation of the advantages and disadvantages of tagging, although understandably she seemed to want to turn disadvantages into back-handed advantages and might have been a little more balanced in presenting some downsides of the advantages too. The link above is basically the complete presentation, so I won't summarize more.

Next up was Rob Cagna of the University of Pennsylvania to present on their new PennTags (http://tags.library.upenn.edu) system where users can tag catalogue records. It was developed to overcome deficiencies in browser bookmarks, such as non-portability. It lets users bookmark and annotate catalogue records and even articles in databases, creating a process and community of bookmarkers. Librarians/faculty can use PennTags as a new book list, subject pathfinders, reading lists. Project pages can be used for departmental home pages. Each user has their own PennTags page where all their content is aggregated. This was a really exciting presentation, and a great inspiration for the kind of things we can do on top of the products our vendors give us.

Friday Post

No fun today. As many of you know, I have a lot of connections to the science fiction community and I thought I would remember Jamie Bishop, the son of author Michael Bishop (wikipedia) who died tragically at Virginia Tech this week. In retrospect, it's a bit surreal that I was in Virginia as it was happening; conferences can be oddly cocooning experiences.

I would like to extend my condolences to the Bishop family and friends as well as to all those who lost someone. I've never met Michael or Jamie, although I did exchange a few postcards with Michael about 15 years ago when I bought a few signed books directly from him. Michael's long been one of my favourite authors, with Brittle Innings one of my very favourite novels. It's perhaps the greatest Frankenstein's-monster baseball novel of all time.

Computers in Libraries: Day 1 Afternoon part 1

Me, MySpace & Eye: Privacy, Security, Social Networking and Libraries by Alane Wilson, OCLC.

This was an interesting session where OCLC's Wilson presented some statistical data from an ongoing project to study the privacy implications of social software, comparing in many cases the reactions of librarians to questions with those of young people. She noted that the statistics show that the two groups have very different definitions of privacy and that privacy in the new age often means anonymity. One interesting paradox of privacy that she noted was that we often think other people should reveal more than we do ourselves, so that we can know about them in a way that makes us feel better without having to divulge similar information about ourselves. I look forward to when the full report is published so we can see the statistical analysis in full and not have to squint at teeny tiny charts on ppt slides.

Millennials and the Library by Marshall Breeding, Director for Innovative Technologies and Research for the Jean and Alexander Heard Library at Vanderbilt University.

A very interesting and important presentation, highlighting the supposed radical generational change presented by millennials and the challenges they present to libraries and librarians in devising systems, collections and services. A lot could be debated here, a lot of assumptions that may or may not be valid, a lot of sweeping statements.

Breeding began by checking to see what cohort was the most numerous in the audience, and Gen X & Boomers were the most numerous, yet the majority of the academic library population is millennials. Millennials can be characterized by an almost innate ability for technology, multitasking, a comfort with diverse digital media and a love of interactivity. Breeding did caution us not to overgeneralize generational differences (although he proceeded to overgeneralize generational differences non-stop for the rest of the presentation), especially in light of the growing tech sophistication of older generations.

He cited the Forrester study noting that millennials are creative, impatient, skeptical, not impressed with status or authority, like to process information immediately and visually and like group work; they like to construct knowledge from experience, so the old "sage on the stage" method of teaching is no longer as valid for them. Their approach to studying and learning is multitasking, doing many things while studying; they like to be able to access stuff anytime, anywhere.

Now, the needs of the millennials do not conflict with older generations and are in tune with the strategic direction of most libraries anyways. The future of libraries is at stake and doing nothing is not an option.

Our collections must diversify in media; print is still good, but graphics are better, and they love to use A/V materials and remix them. Our collections must embrace ejournals, ebooks, podcasts, video, news archives, datasets. These collections are our best opportunity to have an impact, but we must promote & provide access in an immediate, collaborative and intuitive way. Commercial sites like Google have heightened user expectations for a generation with good web skills, low tolerance for clunkiness, confident in their ability and reluctant to ask for help because they can always find something.

The status quo is not an option. The look and feel of our systems do not meet expectations, require the use of myriad interfaces, are overly complex, unintuitive and have different kinds of things in different places. The current tools we use, such as the opac, aggregators, openurl are loosely coupled. The distributed query model of meta/federated search is problematic. But change is underway: there are lots of people thinking about it, dissatisfied with current opacs, that want to break from the current mold, decouple the opac from the backoffice. The next generation of systems will have a more comprehensive discovery interface, better delivery tools, more powerful search and more elegant presentation. A comprehensive search service, get rid of product/discipline silos, cooperate better.

Web 2.0 is a good start: social and collaborative, blogs, wikis, tagging, bookmarking, ranking, web services, xml, apis, ajax, microformats, opensearch. We need new opac tools with decoupled interfaces, more expected of catalogue data, alternative search engines, expanded discovery tools with comprehensive interfaces. Redefine the catalogue, challenge traditional notions, digital resources cannot be an afterthought, don't force myriad interfaces. The web is the standard interface: rapid response, rich visuals, drill-down results, faceted browsing, bread crumbs, ratings & rankings. Discovery is important; we need new models, to take advantage of non-library search tools like google, google scholar, wikipedia and others. Global discovery of local resources. We need to welcome the millennial generation.

April 17, 2007

Computers in Libraries: Day 1 Morning sessions

Library 2.0: Building Communities, Connections and Strategies by Ken Roberts, Hamilton Public Library.

It was a little odd to come so far to hear such an inspirational story from my own back yard. Hamilton is just a little west of Toronto. When the city was amalgamated by the provincial government a few years ago, it presented problems and opportunities for the library. The highlighted project, a community portal for the city, had many community partners, including other libraries in the area and, mostly, community organizations. The goal was to make the library the destination for community information, to generate traffic and interest for those organizations. Kind of a Google for Hamilton.

The idea was to integrate with municipal services, using a CMS, rss and an events database. There were risks, for example getting uptake from community organizations to list their events. It is a success -- 70-80% of all library visits are online. The pathfinders and other subject guides are also online, and there are also online bookclubs. One of the most important partners was the CFL's Hamilton Tiger-Cats, who gave away 22K football tickets; they also used billboard advertisements.

The most important aspect is that this project represented partnership at its best; if your library is a good partner in these things, its reputation will precede it. You have to recognize different partner organizational cultures, to commit to common goals; you also have to find a sustainable model for these partnerships, a way to go forward on tight resources and differing levels of commitment. There are also challenges: some partners have more clout than others, like the city government; a changing, evolving environment; sustaining what's built and finding a way to move forward.

What you need to succeed: build trust and respect, have shared values, engage the right people, start small and think big, integrate with your core mission, test test test, celebrate success and excuse mistakes. And what's next: get into Second Life, wifi, changing roles for librarians, IM reference.


Social Search Engines had two speakers. Gary Price presented on Issues to Consider in Social Search. One important issue is that when you build social search systems you have to make sure that a significant number of people participate, to create the social connections worth searching. They have to contribute consistently over a long period of time. The dark side of social search is the danger of spam, commercial organizations gaming the system and the quality of tags that users create. And how new is the idea of social search anyway? For example, librarians have long used tools like the Librarians' Index to the Internet. Other examples of social search engines are custom search engines, question answering services and services like Intute and Globaledge. The rest of Price's presentation was mostly a laundry list of various social search tools, which are easy to check from the link above. A good presentation, covering a lot of ground in a short period of time.

Next up was Steve Mansfield on Humanizing Search: Internet Searching...Evolved. Mansfield is the CEO of social search company Prefound, so you have to take his comments in that context.

So, what is social search: it's answer-bound, social bookmarking, people search, directory search, personal page search. A key point is that from PreFound's perspective, the definition of social search is still open because there is no dominant, big player to get into the public consciousness first, to define social search the way Google defined web search. He hopes that social search will be more than just social bookmarking sites, that it will encompass ranking of links, relating links together in types of bundles, that it will reduce search chaff by introducing an element of human judgment. But how do you get that human element into the search process? For example, humans can relate text, image, audio and video together in a way automated crawlers cannot.

Real social search is ranked, linking to a group of human-created related links. Issues include spamming & gaming of tags, and we have to develop technology to overcome this. The demo of the PreFound product was pretty interesting, with a lot of potential both as an individual product and as a representative of a product class. The main challenge, to me, seems to be finding a way to get enough people to convert their own personal leisure and procrastination time to building the human connections that make social search work, or finding a business model that will allow them to pay people to do it.

Computers in Libraries: Day 1 Opening Keynote

Opening keynote: Web 2.0 & the Internet World by Lee Rainie.

Rainie is the director of the Pew Internet & American Life project; he talked about the information they have been gathering about the use of social software amongst internet users, both young and not-so-young.

An interesting point that he made is that data is the new "intel inside" of the internet, in other words, the thing that supercharges applications and services. The web is a platform: harnessing collective intelligence, software above the level of the individual device, rich and free user experiences. This was a very interesting way to start off the conference, getting us to think about how the millennial generation and others are using the social web, how it's evolving and changing, how we're changing it and it's changing us.

The 6 hallmarks of Web 2.0 for libraries:


  1. The internet has become the computer: rise of broadband and wireless which makes the internet a richer, more social destination
  2. Millions of people creating and sharing content: the social web as switchboard for teen and other social life
  3. More people are accessing content created by others
  4. People sharing what they know and feel: sites like Ratemyprofessors and Amazon book reviews
  5. People contributing knowledge and processing power to the world: peer to peer, open source, projects like SETI@Home
  6. People customizing content: customizing news sources with RSS (a small sketch follows this list).
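
Point 6 is the easiest to make concrete. Here's a tiny sketch of customizing news sources with RSS: pull a few feeds and merge the newest items into one personal page. It assumes the third-party feedparser library is installed and uses placeholder feed URLs.

```python
# A small sketch of "customizing news sources with RSS": pull a few feeds
# and merge the newest items into one personal news list. Assumes the
# third-party feedparser library; the feed URLs are placeholders.
import feedparser

feeds = [
    "http://example.org/library-news/rss",
    "http://example.org/science-blogs/rss",
]

items = []
for url in feeds:
    parsed = feedparser.parse(url)
    for entry in parsed.entries[:5]:          # take the five newest from each feed
        items.append((entry.get("title", "untitled"), entry.get("link", "")))

for title, link in items:
    print(f"{title} -- {link}")
```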

The 5 issues libraries and all other participants in the social web must address:

  1. Navigation must go from linear to non-linear, need features like breadcrumbs
  2. Context, learning to see connections
  3. Focus: find a way to continue practicing reflection & deep thinking in a multitasking world, find a way to manage continuous partial attention and still be creative and profound
  4. Skepticism: learning to evaluate information
  5. Ethical behaviour: understanding the evolving rules of cyberspace, privacy vs. disclosure, private needs colonizing the public sphere and public spaces

April 16, 2007

Computers in Libraries: first impressions

Frankly, it feels a bit weird to be in Virginia today, and I'm not really in the mood for an extensive report on day one. Probably tomorrow. Luckily my hotel has free internet access in the rooms. I've never been one for liveblogging myself, so I'll just be doing daily summary posts.

My first impression is that this is a really good conference, with a lot of great sessions. All the programming is on one level of the conference hotel, which is great. However, given that there are over 2000 attendees, it's just plain too crowded. When a session ends and everyone is switching rooms, there is literally human gridlock in the hallways. The exhibition hall was also quite crowded this evening at the opening reception, but that was probably just due to the fact that there was free food and drink available. It'll probably be a lot better later on. I've already touched base with the fine folk at the IEEE (Hi Mike! Nice to see you again.) and Safari, two of my favourite products.

I must say that I also appreciate the 45-minute sessions. I often think that 75- or 90-minute sessions are too long, both for the presenter and attendees, and that there's always a bit of down time for both in the middle. Forty-five seems like a good balance--enough time to really get an idea across but not so much as to be taxing. It also means that you can fit 5 or 6 good sessions (plus a keynote) into each day. Given how hard it is to retain a lot of detail on a bunch of sessions squeezed into a few short days, getting a somewhat shallower view of more interesting ideas is probably a better approach.

Finally, I'd like to note that it's really cool to hear the tap tap tap of so many laptops at the back of the room during the sessions, not to mention the funny little Windows startup song. Bloggers are here in force, as can be expected, and I'm looking forward to catching up on the sessions I couldn't attend. (Hi Christina!) Cellphones ringing? Not so cool, although there have been very few so far.

Check out the rest of the coverage here.

April 13, 2007

Vise, David A. and Mark Malseed. The Google story. New York: Delta, 2006. Updated Edition. 326pp.

Ah, Google. The 800 pound gorilla. The elephant in the room. The bull in the china shop. Really, the kings of the online world. And to think, just a few short years ago, nobody had ever heard of them. Myself, I remember starting to use Google in 1999 or so, when the buzz around library school was this cool new search engine that had way better relevance ranking and a sparse, clean design. By 2000 or 2001 I remember thinking to myself that their product was so impressive that they must have been a huge, thousand employee megacorporation. Little did I know that for most of those early years, Google was still a small, intimate, human-scaled company that very much reflected its founders, Stanford PhD students Sergey Brin and Larry Page.

Google has been around so long, at least in Internet time, and has been so prominent and omnipresent in the media with so much detailed reporting on blogs and in newspapers and magazines, that I tend to think I know the story. But do I? Are there things that I don't know about the giant? As it turns out, yes, there were a lot of things I didn't know about Google and Vise & Malseed's book does a great job of filling in the blanks. And a lot of aspects of Google's story, both bits I knew and bits I didn't, have significant lessons for the library world.

Of course, this is really a business book, not a tech book or a history of science book or even a library science book, so how did I end up with it on my sabbatical science book reading list? I remember when the book came out in hardcover there was quite a bit of press and I thought I would probably want to read it eventually, along with The search: the inside story of how Google and its rivals changed everything by John Battelle. Although the business library ordered it, I never got around to checking it out, figuring I would just buy the paperback when it came out. So, it came out in paperback last fall, but since I never check the business section of the book store I never noticed. A few weeks ago, while we were at the airport waiting for our flight to New York for our March Break trip, I was browsing at the airport bookstore. Now, airport bookstores are pretty small; they also cater to business travelers more than regular folk, so the pb version of The Google Story was fairly prominently displayed at the front of the store. And I bought it and read most of it during the trip. Which makes me wonder, doesn't classification sometimes make it harder rather than easier to find something? And sometimes, a small, focused collection can lead to more serendipity than a big, huge, comprehensive collection. See, even how I found the book and ended up reading it holds a lesson.

But, enough of the chatter. How's the book itself? Is it worth reading? Like I said, I would like to concentrate on the parts of the Google story that were interesting or new to me and how I think those apply to the Library world.

The first thing that really struck me (in chapter 3) was that in the early years a number of companies had a chance to license Google technology and passed it up. Altavista wanted a home-grown search solution while Yahoo! wanted people to stay on their own site rather than searching and leaving. They didn't realize how important search was, so they missed a golden opportunity. Chapter 8 goes into that idea in more detail, how most people in the business world really discounted the importance of search, thinking it was a nice add-on to other core products and services. It was the Google guys who really saw the truth here and stuck it out. What things are libraries missing out on due to shortsightedness?

Another thing that really struck me in chapter 8 was the internal struggle as Google experienced explosive growth, to keep the edge and innovative spirit while somehow learning to run the company professionally and keep an eye on organizational issues. It was this 2000-2001 time frame that I mention above.

The next thing to really strike me was the AskJeeves story in chapter 11. Google licensed its ad relevancy software to AskJeeves, to the immense benefit of both companies. I thought this was interesting because two companies that you would think were rivals competing for the same search eyeballs somehow found a way to collaborate and make each of their slices of the pie bigger -- including making a bigger pie. A lesson here for us all -- cooperate and grow or compete and die? Who do we think are our enemies that should be our friends? Google?

Chapter 12 was a big one for me -- where the authors talk about Google's 20% rule. Every employee gets to work one day a week on blue sky projects, things outside the box, the stuff we see in Google Labs. Ultimately, people with ideas have to find others to work on them and make a case for using more resources than just the 20% time to get the product out the door. But still, the culture of innovation this kind of idea fosters is amazing. Lessons? You bet. Top to bottom, if we want to succeed everyone has to think about innovation and get heard by administrators. Chapter 18 talks about the idea of having a corporate executive chef to make everyone's meals for them. Just creating an environment that's conducive to innovation, no matter what it takes.

On the other hand, a little misinformation is never a bad thing in a business book, especially one on a company with such overpowering ideals. On page 134, talking about the Google News service, the authors quote an engineer who mentions that before Google News, journalists had no way of searching other news sources for information. Of course, we librarians know this is hogwash. Lexis Nexis and its kind aren't free like Google, but any journalist working for even a decent-sized paper would have had access to it. Sometimes Google would like you to think that only it can provide good information, but sometimes they "ignore" inconvenient truths. Other imperfections that do get some coverage include privacy concerns with advertising in Gmail; censorship of information flowing into China and Google's role in that; some bumps in the road when they went public, perhaps betraying an unhealthy arrogance on the part of Brin and Page; the cutthroat nature of the battle with Microsoft; and another whiff of arrogance when they talk about Google's role in getting genomic data made freely available.

Overall, though, I have to say that this is quite a good book, written in a breezy, journalistic style. Google's story is intimately connected to the story of the early part of the 21st century and we ignore its lessons at our peril: everything is driven by a crazy, intense level of nonstop innovation; search is king; connections between data points can be as important as the data itself, if not more important; share the wealth. Google is a reality; we have to deal with its implications for our work and personal lives. Its impact is vastly for the better, but that doesn't mean we shouldn't keep an eye on them. Their pride and arrogance can lead to a fall -- putting all our eggs in one basket could be risky.

Friday Fun: Computers in Libraries Edition

In honour of my attendance at next week's Computers in Libraries conference, I thought a post making fun of computers would be appropriate. While this post isn't actually making fun of computers, it does make fun of our imaginary friends in Hollywood and what they wish computers could do, or at least don't bother to check first.

Via BoingBoing, the Programming Blog has a post Things Computers Can Do in Movies.

A couple for your reading pleasure:

7. Note: Command line interfaces will give you access to any information you want by simply typing, “ACCESS THE SECRET FILES” on any near-by keyboard.

8. You can also infect a computer with a destructive virus by simply typing “UPLOAD VIRUS”. (See “Fortress”.)

9. All computers are connected. You can access the information on the villain’s desktop computer even if it’s turned off.

And speaking of CiL, I'll be there next week (my first one! YAY!). I'm not organized enough to predict the sessions that will catch my interest, but I can't imagine the conference is so big that if anyone wants to track me down, they won't be able to. Email me (I'm leaving Sunday aft, hopefully to the hotel by 6 or 7.) or just hunt me down. Let's have breakfast/lunch/dinner/drinks/coffee. I'm staying at the Radisson Reagan National Hotel.

I will be posting session summaries, but whether or not I do so from the conference depends mostly on how keen I'm feeling in the evenings to hunt down free wireless access (my hotel doesn't mention it) or go down to the hotel business center.

The two weeks in carnivals

A busy two weeks:

April 12, 2007

Is there a future for bibliographic databases?

As promised, I'm reposting here the full text of the guest post I did on Michael Cairns' PersonaNonData blog. I'd like to thank Michael again for the opportunity, one that I think turned out pretty well for both of us.

The post itself turned out to be fairly popular (certainly one of the most widely linked posts I've ever done) and was linked to from several places:



Here goes:


A week or so ago, Michael asked me to do a guest post here on Persona Non Data about bibliographic databases, based on some of the speculations I've made on my own blog, Confessions of a Science Librarian, about the future of Abstracting and Indexing databases.

Here's how he put it in his email:
I have read your posts on the future of information databases and bibliographies etc. over the past several months and I was wondering whether you had a specific opinion of the future of bibliographic databases such as worldcat and booksinprint? ... [O]n my blog I have skirted around the idea that the basic logic of these types of databases is beginning to erode as base level metadata is more readily available and of sufficient quality to reduce the need for these types of bibliographic databases. Assuming that is increasingly the case then these providers need to determine new value propositions for their customers. So what are they?


How could I resist? I'm not sure if I exactly answered his questions or even talked about what he'd hoped I'd talk about, but at least I've probably provoked a few more questions.

In my blog post on the future of A&I databases, I basically came to the conclusion that in the face of competition from Google Scholar and its ilk, the traditional Abstracting & Indexing databases would be increasingly hard-pressed to make a case for their usefulness to academic institutions. Students want ease of use, they concentrate on what's "good enough" not what's perfect. Over time, academic libraries will find it harder and harder to justify spending loads of money on search and discovery tools when plenty of free alternatives exist. Unless, of course, the vendors can find some way to add enough value to the data to make themselves indispensable. I used SciFinder Scholar as an example of a tool that adds a lot of value to data. I think we'll definitely start to see this transition from fee to free in the next 10 years, with considerable acceleration after that.

Now, I didn't really talk about bibliographic/collections tools like Books in Print (BiP), WorldCat (WC), Ulrich's or the Serials Directory (SD). Why not? I think it's because those tools are aimed at experts, not end users. Professionals, not civilians. Sure, a freshman may only want a couple of quick articles to quote for a paper due in a couple of hours, but we librarians and publishing professionals are looking for good, solid, quality information and we're willing to pay for it. This distinction would seem to me to be quite important, leading to quite a different kind of analysis, one I wasn't really aiming at originally. So, I didn't really think about it at the time.

So, now it's time to put the thinking cap back on and see what my crystal ball tells me.

In my professional work as a collections librarian, I am a frequent user of all the tools I mention above. I think that BiP is the one I use the most. Over the last 5 or 6 years I've built up a specialized engineering collection mostly from scratch so I've needed a lot of help and BiP has been an enormously useful tool. I use keyword searches. I also use the subject links on the item records a lot to take me to lists of similar books.

WC I use less frequently, mostly only when I want to look beyond books that are in print and want to identify older and rarer items that I'll end up having to get on the used book market. I've used this to build up various aspects of our Science and Technology Studies collection on topics like women in science. On the other hand, WC seems to have already found a big part of its value proposition with non-experts. Look at its partnership with Google Book Search. Also look at the really innovative things it's doing with products like WorldCat Identities. It's not perfect by any means but you can see the innovative spirit working.

Ulrich's and SD I mostly use to identify pricing issues for journals I might want to subscribe to, so I don't use them that often. With the ease of finding journal homepages, this function is probably fading fast in its usefulness. As for identifying the journals in a particular subject area, that's still a useful function, but I wonder what the future is if that's all they offer.

For our purposes here, I'll concentrate on the one I use most: BiP. I presume a lot of what I have to say will also more or less apply to the other specialized tools aimed at pros.

So, I definitely need quality information on books to do my job, now and in the future. But if I need quality information, what will the source be? Although of course I use BiP, I also use Amazon quite a lot to find information on books I want to order; the features that they have that I like best and use most come out of the kind of data mining they can do with their ordering and access logs. When I'm looking at an interesting item, Amazon can quickly tell me what other books are similar, what other books people that have purchased the one I'm looking at have also purchased. I find this to be an extremely important tool for finding books, a great time saver and an incredibly accurate way of finding relevant items. Also, when I search Amazon, I'm actually searching the full text of a lot of books in their database. This feature gets me inside books and unleashes their contents in a way that can't be duplicated by being able to view or even search tables of contents. I also very much like the user-generated lists and reviews. On more than one occasion I've appreciated multiple user reviews of highly technical books, especially when there are negative reviews to warn me away from bad ones. The "Listmania" and "So you'd like to.." lists are great sources of recommendations. On the other hand, it has some significant problems that keep me from going to it exclusively. For example, most any search returns reams of irrelevant hits. The subject classifications that Amazon displays at the bottom of the page I also find next to useless as they are often far too broad.
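
That "customers who bought this item also bought" feature comes out of exactly this kind of log mining. Here's a toy sketch of the basic idea, counting co-purchases across invented orders; whatever Amazon actually runs is, of course, far more sophisticated.

```python
# A toy sketch of "people who bought X also bought Y": count how often
# pairs of items appear in the same order. The order data is invented;
# real recommender systems do much more than raw co-purchase counts.
from collections import defaultdict
from itertools import combinations

orders = [
    ["google-story", "the-search", "long-tail"],
    ["google-story", "long-tail"],
    ["the-search", "wisdom-of-crowds"],
]

co_counts = defaultdict(lambda: defaultdict(int))
for order in orders:
    for a, b in combinations(set(order), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def also_bought(item, n=3):
    """Items most often purchased together with `item`, best first."""
    ranked = sorted(co_counts[item].items(), key=lambda kv: kv[1], reverse=True)
    return [other for other, _ in ranked[:n]]

print(also_bought("google-story"))
```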

For BiP, the features I appreciate the most, the ones that draw me back from Amazon, include very good linkable subject classification and good coverage of non-US imprints. When I do keyword searches, the results seem more focused and less cluttered with irrelevant items. I also like that it gives me very complete bibliographic information, including at least part of a call number. While Amazon isn't geared to let you mark and then print out a bunch of items (why would they want you to be able to do this?), I appreciate being able to generate lists and print them out using BiP. On the other hand, BiP has been slow to make its interface as quick and easy to use as Google or Amazon, to make use of the tons of data it has, to mine it to find connections, to harness user input and reviews in a massive way to compete with the Amazon juggernaut. When for-fee is competing with for-free, the one that costs money has to be very clearly the best.

Another threat to BiP is Google Book Search. As I've recounted in a story on my blog, Google Book Search is an incredible tool for research, reference and even collections. Once again, the ability to search the entire text of books is an incredible tool for revealing what they're really about, to surface them and make me want to buy them. As Cory Doctorow has said, the greatest enemy of authors (and publishers) is not piracy, it's obscurity. Google Book Search is an amazing tool for a book to get known and, ultimately, to get bought. As more and more publishers realize this (and even book publishers are smart enough to realize this eventually), they'll make darn sure all their new books are full text searchable by Google (and, presumably, Amazon and others). How can BiP compete with that?

I think it's safe to say, it wouldn't take much for me to completely abandon the use of BiP and only use free tools such as Amazon and Google. What could BiP do to keep in the game? What is their value proposition for me? What is the value proposition for all bibliographic tools hoping to market themselves to library professionals now and in the future?

Some issues I've been thinking about.

  • The changing nature of publishing. What's a book? What's a journal? What does "in print" mean? Print journals vs. online? Ebooks vs. paper books? Fee vs. Free. Open Access publishing. Wikis. Blogs. To say that bibliographic databases have to be ahead of the curve on all the revolutionary changes going on today in publishing is an understatement. Look at all the trouble newspapers are in, the trouble they're having adjusting to a new business model. Well, the book world is changing as well, especially for academic customers. The needs of academic users are quite different from those of regular users. They don't necessarily need to read an entire book, just key sections. Search and discovery are incredibly important to these users, almost more important than the content. They also really don't care about the source of their content; what they really care about is having as few barriers between the content and themselves. How will BiP and other bibliographic databases help professionals like me navigate this mess? Easy. By continuing to provide one-stop-shopping, only for a much wider range of items. Paper books from traditional publishers, for sure, but how about all those Print on Demand publishers? Sifting through the chaff to get the rare kernel of wheat is an important task, one I know that they're already doing to some degree. But how about digital document publishers like Morgan & Claypool? O'Reilly's Digital PDFs? White papers and other documents from all kinds of publishers? How about the incredible number of free ebooks out there? And other useful digital documents and document collections, both free and for sale (The Einstein Archives is an example)? And breaking down the digital availability of the component parts of collections like Knovel, Safari, Books 24x7 and all the others. Any tool that could help me evaluate the pros and cons of those repositories would be greatly appreciated. The landscape out there for useful information is clearly far larger than it used to be.

  • Changing nature of metadata. Never underestimate the value of good metadata; never underestimate the value of the people who produce that metadata. It seems to me that one of the core issues is who should create metadata for books and other documents and how that metadata should be distributed to the people who want it, be it commercial search engines or library/bookstore catalogues. It would be great if all content publishers created their own metadata and if it were of the highest quality and free to everyone. There's a role for bibliographic databases to collect and distribute that metadata, maybe even to create it. The library world has a good history of sharing that kind of data, but I'm not sure how that model scales to a bigger world. It seems to me that there's an opportunity here (a toy sketch of what such a shared record might look like follows this list).

  • Changing nature of customers. I've publicly predicted that I will hardly be buying any more print books for my library in 10 years. Libraries are changing, bookstores are changing. Our patrons and customers are the ones driving this change. As my patrons want more digital content, as they use print collections less, as they rely on free search and discovery tools rather than expensive specialized tools, I must change too. As my patrons' needs and habits change, the nature of the collections I will acquire for them will follow those changes -- or I will find myself in big trouble. Anybody who can make my life easier is certainly going to be welcome. And that will be the challenge for the various bibliographic tools -- making it easier for me to respond to the changes sweeping my world. A good bibliographic service should be able to help me populate the catalogue with the stuff I want and my patrons need. I think a lot of progress has been made on this front in products like WC, but I think to stay in the game the progress will have to be transformative. There's lots of opportunity here.

  • What's worth paying for. In other words, BiP, WC and their ilk have to be better than the free alternatives. And not just a little better. And not just better in an abstruse, theoretical way; if it takes you 20 minutes to explain why you're better, the margin may be too slim. Better as in way better on 80% of my usage rather than just somewhat better on 20%. Better as in saving time, saving effort, saving more money than they cost, making my life easier.
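
As a thought experiment on the metadata point above, here's what a minimal shared book record might look like if publishers or bibliographic databases distributed metadata in a plain, open format. The fields and the placeholder ISBN are purely illustrative; this isn't ONIX or MARC or anything BiP actually does.

```python
# A minimal sketch of a shared book metadata record, serialized as JSON so
# any catalogue, bookstore, or search engine could consume it. Fields are
# illustrative only, not a real metadata standard.
import json

record = {
    "isbn": "978-0-000-00000-0",   # placeholder ISBN, not the book's real one
    "title": "The Google story",
    "authors": ["David A. Vise", "Mark Malseed"],
    "publisher": "Delta",
    "year": 2006,
    "subjects": ["Internet industry", "Search engines", "Corporate history"],
    "formats": ["paperback"],
}

print(json.dumps(record, indent=2))
```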


To conclude, I can only say one thing. In times of rapid change and uncertainty, evolutionary pressure is extremely intense. Only those products and services that can find an ecological niche, a way to satisfy enough customers, will survive. To thrive is another story. To thrive requires a redefinition of products and services, a way to jump ahead of competitors and to win new markets with something new and exciting. It's hard to tell where bibliographic databases will find their place: will they be dodo birds, or will they find a way to survive or even thrive in the coming decade? There's certainly a window to change. Nobody is going to cancel any of these core tools any time soon. But the window will close sooner rather than later.

Meebo me, friend me

As those of you actually visiting this blog in the virtual flesh will notice, I've added a Meebo widget to my sidebar. I promise to try and spend at least a couple of hours a day logged in to Meebo, so if you do see me online when you visit, say Hi and maybe we can discuss a recent post. I've never been much of an IMer, so we'll see how it goes, but I'm certainly willing to give it a try. I'm also considering adding Meebo to some work-related projects I'm thinking about, so I'm anxious to see how it goes.

As well, I've also activated a long-dormant FaceBook account (I joined a year or so ago but never did anything). My public profile is here and the internal one is here. No friends yet, so if you're on FaceBook, share a little friendship love and give me a poke or nudge or whatever. (It's going to take a little while for an old fogey like me to get a handle on the lingo and rather arcane and convoluted architecture of the site). I'm also thinking in terms of work-related applications here, so I'm hoping to become pretty comfortable by the time I'm back to the real work world in August. Also, if I'm violating some incredible FaceBook etiquette thingy on my profile, please let me know.

April 11, 2007

The State of the Live Web, April 2007

The latest of Technorati CEO Dave Sifry's updates on the state of the blogosphere. This time he's expanding his coverage and calling it a State of the Live Web.

Some highlights:


  • 70 million weblogs
  • About 120,000 new weblogs each day, or...
  • 1.4 new blogs every second
  • 3000-7000 new splogs (fake, or spam blogs) created every day
  • Peak of 11,000 splogs per day last December
  • 1.5 million posts per day, or...
  • 17 posts per second (see the quick check after the list)
  • Growing from 35 to 75 million blogs took 320 days
  • 22 blogs among the top 100 sources linked to in Q4 2006 - up from 12 in the prior quarter
  • Japanese the #1 blogging language at 37%
  • English second at 33%
  • Chinese third at 8%
  • Italian fourth at 3%
  • Farsi a newcomer in the top 10 at 1%
  • English the most even in postings around-the-clock
  • Tracking 230 million posts with tags or categories
  • 35% of all February 2007 posts used tags
  • 2.5 million blogs posted at least one tagged post in February
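
The per-second figures above are just the per-day figures divided by the 86,400 seconds in a day; a quick check:

```python
# Quick arithmetic check that the per-second figures follow from the
# per-day figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

new_blogs_per_day = 120_000
posts_per_day = 1_500_000

print(new_blogs_per_day / SECONDS_PER_DAY)   # ~1.4 new blogs per second
print(posts_per_day / SECONDS_PER_DAY)       # ~17 posts per second
```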

It's definitely worth reading the whole report. This humble blog is now hovering at around 48,000th place among those 70 million blogs, which is pretty good. Unfortunately, a good bit of that is due to a recent rash of linking from rather vile splogs. My true ranking is probably around 70-80K, where it was a few weeks ago.