July 6, 2007

Summer blogging break

It's that time again. I'll see you all back here sometime around the first week of August. Have a fun, safe and restful summer.

I expect that I will be checking my email and the blog comment moderation queue about once per week.

I've fallen behind a bit on my book reviewing, with two finished books to write up while I'm off: The Trouble with Physics by Lee Smolin and the just finished Trials Of The Monkey: An Accidental Memoir by Matthew Chapman. I haven't bothered with a poll to help choose my summer reading this year, but I thought I would share the titles I've chosen to get me started:

  • The Chinatown Death Cloud Peril by Paul Malmont
  • Days Of Infamy by Harry Turtledove
  • Stolen by Kelley Armstrong
  • Clear and Present Danger by Tom Clancy
  • The Map that Changed the World: William Smith and the Birth of Modern Geology by Simon Winchester
  • Ambient Findability by Peter Morville
  • Dreaming in Code: Two Dozen Programmers, Three Years, 4,732 Bugs, and One Quest for Transcendent Software by Scott Rosenberg

You'll note that there are some science-y books among the bunch, a bit of a break with tradition for me. We'll see how that goes. In any case, I do expect to report on my summer reading, both here for the science books and the other blog for the novels. The reviews should make up the first bunch of posts when I get back.

A year of stats

I've been using Google Analytics to track my blog traffic for a little over a year now and so I've finally come to a point where I have 12 months of pretty good data – July 2006 to June 2007. This seems like a good moment to take a look back and see what's happened over the last year, especially since it more or less covers my sabbatical leave that started last August.

Before June 2006 I only used the extreme tracking service to monitor traffic. It seems to count things a bit differently from Google so I hesitate to do direct comparisons. However, during the year July 2005 to June 2006 I got a total of 8,727 hits, for a monthly average of 729. This past year, Google gives me a total of 26,928 page views for a monthly average of 2,244. As I'm writing this, I've already surpassed last year's total for July.

That's quite a dramatic increase; most of it is due to an increased posting frequency (from one or two posts per week to four or five) as well as a concerted effort to post better. I really made an effort to do more than just quick, newsy posts and to concentrate on offering real commentary and analysis of important scitech library issues. As well, the My Job in 10 Years series proved to be quite popular, as has the occasional interview series, raising the profile of the blog quite a bit. Needless to say, I'm very pleased with the increase. (And I would like to thank all my visitors over the years for their time and attention.)

Blogging better has meant that I was mentioned more often in other blogs, which in turn meant more traffic. Interestingly, a majority of links from other blogs seems to have come from science blogs rather than liblogs. My niche, partaking of both the liblog world and the science blog world, is a small one but one that I'm quite happy with. Some of the major supporters of the blog out there include Walt Crawford, Coturnix, PersonaNonData, Jane, CuriousCat, TWiL and, of course, all the scitech library bloggers (whom I'm not going to attempt to enumerate, but you know who you are).

Another thing that made me want to do this post is Walt Crawford's post from a few weeks ago, Getting your fifteen minutes, where he talks about his own traffic (about 85K for May) and Steven M. Cohen's for the same period: about 1 million. Mine was about 3,500 for the same period. Reflecting on the differences, I note that I'm not bothered by them; I certainly don't begrudge either of them their success. I do what I do, hoping to please myself and generate some traffic as a side effect. On the other hand, it did get me thinking about taking a closer look at my stats. An interesting note is that while my hits are quite a bit lower than Walt's, our current Technorati rankings are in the same ballpark: he's around 44K and I'm around 50K. The highest ranking liblogs (such as Librarian.net or Information Wants to be Free), Technorati-wise, are under 1K. Whatever that means, and I'm not sure Technorati tells us much that is useful. For what it's worth, Scintilla has me listed as the 123rd most popular science blog.

Below is a chart of the last twelve months of pageviews and unique visits. As you can see, there was quite a dramatic increase starting with January this year, right after the post on A&I Databases. It's more or less plateaued the last couple of months. June would normally have been a slower-than-average month, what with all the conferences, but two 10 Years Series posts boosted the numbers.

So, here are the top 10 lists from July 1, 2006 to June 30, 2007, with a bit of commentary on some of the interesting ones.

Top 10 Posts

  1. My Job in 10 Years: Collections: Further Thoughts on Abstracting & Indexing Databases. My most popular post ever by a significant margin. It's very gratifying as it's also one of the ones I'm most pleased with and that I worked hardest on.
  2. Best and worst science books. A post mostly pointing to several of John Horgan's science book lists, with some of my own commentary and lists. An odd post to make number two, but a lot of people seem to want to know about good and bad science books.
  3. Giving good presentations using PowerPoint. Another mixture of links to other blogs and my own commentary. A popular topic.
  4. The life of a CS grad student.
  5. My Job in 10 Years: Conclusion.
  6. Facebook is public not private. Most of the hits are from people trying to figure out how to view private details on Facebook. A bit creepy.
  7. My Job in 10 Years: Physical and Virtual Spaces.
  8. Friday Fun: Build your own Sherman tank. A hoot. A lot of people seem to want to build their own tank and my post linking to some instructional videos has struck a chord.
  9. Interview with Jane of See Jane Compute. I'm happy that this interview was so popular. Lots of the links were from either Jane's blog or Scientiae.
  10. An Interview with Alison Farmer. A mystery. Lots of people seem to search on the name Alison Farmer. I'm not sure if they're looking for the one I mentioned or some other Alison Farmer.

A couple of honourable mentions: the tags for the 10 Years Series and my Computers in Libraries session summaries both got enough hits to make the top 10 but I decided to bump them in favour of real posts.

Top 10 Referrers

  1. Bloglines.
  2. LisNews. Including TWiL.
  3. Computational Complexity. Mostly trackback links from posts I've linked to. Lance Fortnow's final post is a huge referrer for me. The Web is a strange place.
  4. Scienceblogs.
  5. Google. I think this mostly refers to various Google services like Reader & Gmail.
  6. Technorati. Other people checking up on who's linking to them.
  7. Libdex Library Weblogs.
  8. The Official Google Blog. Mostly links from trackbacks.
  9. See Jane Compute. A good number are from Jane's link to the interview.
  10. Curious Cat Science and Engineering Blog.

Interesting mix of referrers, especially the balance between library and scitech sources.

Top 10 Keywords

  1. Science librarian. I'm the number one hit on Google for this search! Unfortunately, at only 165 hits for the year, there aren't that many people doing the search...
  2. Confessions of a Science Librarian
  3. Mamdouh Shoukri. The new president at York University. I did a post welcoming him when it was announced and it's attracted a fair number of hits. Now that Dr. Shoukri has actually started, it might generate a few more hits. (Oh, by the way, if you're reading this Dr. Shoukri, welcome to York and good luck with your new job. I hope to show you around the Library in the fall!)
  4. Best science books.
  5. Alison Farmer.
  6. John Dupuis. People looking for me! Or one of the other John Dupuis's out there. I find these searches a little creepy. For what it's worth, I'm also the number one Google hit for my own name.
  7. Best science books 2006.
  8. Nerac. I did an interview with Mike Mahoney of Nerac and I think that attracts some hits.
  9. Librarian science. A variation on the theme.
  10. Confessions. The people that find me using this search, I always figure they're quite disappointed when they see the actual content ;-)

Some of these have various permutations and combinations (e.g., Librarian sciences, confessions science librarian) ranked lower in the list. I haven't bothered combining any of them here, as I feel the raw list gives a good feel for what's going on. One day I may do a post on the strangest keywords.

Top 5 Book Reviews

I'm only going to do the top 5 here, as I haven't reviewed enough books over the last year to make a list of 10 meaningful. Note that the list is a strange amalgam of stats from this blog and the other blog, so take it with an even larger grain of salt than usual.

  1. Three Science writing anthologies. Reviews of the latest editions of the Year's Best American Science Writing, Year's Best American Science and Nature Writing and the first science blogging anthology, The Open Laboratory.
  2. David Suzuki: The Autobiography.
  3. Balanced Libraries: Thoughts on Continuity and Change by Walt Crawford
  4. King of Infinite Space by Siobhan Roberts.
  5. Republican War on Science by Chris Mooney.

July 3, 2007

Interview with Timo Hannay, Head of Web Publishing, Nature Publishing Group

Welcome to the most recent installment in my occasional series of interviews with people in the scitech world. This time around the subject is Timo Hannay, Head of Web Publishing at Nature Publishing Group, publishers of Nature and other associated journals as well as web products such as Connotea, Nature Reports, Nature Network, Scintilla, PostGenomic, Nature Precedings and others. Way back in May I was contacted by Natasha Ighodaro of Nature to see if I would be interested in interviewing someone to talk about some of their new web products. Eventually, she put me in touch with Timo. As it happened, Nature was in the middle of rolling out a bunch of web products, so it took a while to actually get the interview down on pixels. In any case, I'm very happy with the results and very grateful to Timo for submitting to such a long interview and for responding with such candor. Enjoy!

Q0. Timo, please tell us a little about yourself, your background and how you ended up as Head of Web Publishing at Nature.

It’s quite a long story, so here’s a slightly abridged version: I’m a scientist with an undergraduate degree in biochemistry from Imperial College, London and a doctorate in neurophysiology from the University of Oxford. (My specialty was synaptic plasticity.) I finished my doctorate in 1994, followed by a year of postdoctoral research at Waseda University in Tokyo in 1994-5. Back in those days I was also a freelancer for The Economist, and through a colleague at their Tokyo office I got to know the people at Nature Japan too.

I’d always been a big fan of Nature — my dad bought me a subscription when I was about 18 (which I’ll admit is pretty geeky) and when I was at Oxford my first paper was published in the journal. When I met the people at Nature they were just launching Nature Medicine, and in my spare time I started covering medical research stories for them from Japan. I then lost touch for a bit when I went back to London to join McKinsey & Co.

I worked as a consultant in the UK and Japan for about three years, which was an intense and brilliant introduction to the world of business. But too many of the companies we were serving were in sectors that didn’t especially interest me. So, through a series of happy accidents, I ended up joining Nature’s Tokyo office, working full-time on business development. I had been into computers since I was a kid, and by then I was especially interested in the web. It also happened to be the case that doing more stuff online, and doing it better, was the biggest business development opportunity for Nature in the Asia-Pacific region at that time. So that’s what I focused on: developing their Japanese website, and adding Chinese and Korean sites. In late 2000 I moved to Nature’s London office to work on the main site, Nature.com. As part of that move, Howard Ratner, Nature’s CTO, put me in charge of a new team (of about 3 people) called New Technology.

We experimented internally with things like RSS, SVG, RDF and other three-letter acronyms, but as a technical team the scope for us to turn these into new services or businesses was somewhat limited. Sometime around late 2004 or early 2005 Annette Thomas, Nature’s managing director and now my boss, decided to create a Web Publishing department with a remit to experiment with the web in a much more user-facing way. Since then the team has grown to something over 20 people. I love what I do because it’s at the intersection of my main interests: science, technology and business. I’m only sad that I don’t have much opportunity to use my Japanese any more. ;-)

Q1. Some of Nature's recent journal publishing decisions have been quite controversial among librarians. Nature Physics is a good example. Do we really need another Physics journal?

I have very little to do directly with our journals because my focus is explicitly on non-traditional online products and services. So I can only give my personal opinion, which is that if there wasn’t a need for any of our new journals then people wouldn’t submit their papers or subscribe to them. I honestly believe that we do a much better job than most other scientific publishers, and that we create better products. That’s why they’re successful. The Nature Reviews series is a great example. Until they came out, the typical editorial and production standards for review journals were, in my opinion, very low. Nature Reviews set a new standard. Our other titles do the same in their respective fields, and considering how heavily read and impactful they are, they’re also extremely good value. Cynics may think that I’m only saying this because I work at Nature and they pay my salary, but in truth it’s the other way round: I choose to work here because I believe that Nature does great things (and I certainly didn’t move from management consulting to scholarly publishing in order to improve my bank balance ;-).

Q2. First Connotea, Nature Network, the Nature Blog, Second Nature (Nature in Second Life): you seem to be getting into Web 2.0/social software in a big way. What's Nature's strategy for these types of initiatives in the longer term?

I think it’s important to realise that we don’t just work on participative "Web 2.0"-type services. We also do a lot in the area of scientific databases (see http://www.nature.com/databases) and podcasts (http://www.nature.com/podcast). But to concentrate on the Web 2.0 stuff: we’re basically trying to identify ways in which scientists can use the web as a collaborative environment. The web isn’t just a broadcast channel or a convenient way to ship PDFs around; it’s a completely new kind of medium in which our "readers" can connect with each other.

Since our job is facilitating scientific communication, if we can’t help scientists to make the most of the web — the most powerful communication medium that humans have ever known — then we’re not merely missing opportunities, we’re simply not doing our job. So at Nature we’re trying a bunch of different things, often inspired by interesting ideas we see outside science (Connotea is clearly based on del.icio.us, and Nature Network on things like LinkedIn and Facebook), but always tailored to what we think will be of most use to professional researchers and clinicians. We’re trying to test the boundaries of what we can do, and we’re not afraid to fail, though of course we always do our best to succeed.

In line with many web-based companies, but in contrast to the scholarly journals business, our services typically launch in a fairly basic form, then we develop them in response to usage patterns and feedback. Now that we have quite a few different services, you can also expect to see them start connecting together more.

Second Nature is a bit different. We’ve been following Second Life for two or three years now, and I think it has the same kind of disruptive potential that the web had in the mid-90s. (Whether it realizes that potential depends on a lot of things, so it’s far from certain.) It could become a profoundly important medium for scientific communication and education, and we want to be there early, understanding its strengths and limitations, and working with early adopters among researchers and educators to find out how we can add value. So far it’s been a positive and eye-opening experience; I’m optimistic about the long-term prospects.

Q3. How has the uptake been for Connotea, Nature Network and Second Nature? Is there a critical mass yet to make these social environments compelling to scientists and others? How will these social networks tie into the core journal publishing business?

Connotea has a user base somewhere in the tens of thousands (the exact number depends on how active a person has to be to qualify as a "user"). Nature Network is much newer so is still in the thousands. I don’t know the visitor numbers for Second Nature, but in terms of active contributors I guess we have a couple of dozen people engaged in creating things on our virtual land, which now extends over three islands. Connotea has enough usage to create interesting second-order effects. For example, it can do quite a good job at recommending things to you based on what you’ve bookmarked. Nature Network activity has grown extremely rapidly in the 4 or 5 months since launch and is approaching a level at which we would expect to see that virtuous circle in which usage (e.g., in the form of forum posts) drives yet more usage (e.g., other people coming in to read the posts). There are numerous ways in which these could tie into our journals -- Connotea-generated lists of recommended reading, links to articles authored by people in your personal network, etc. -- but we’re much more focused on making these services useful in their own right.

If we achieve that then we should also be able to turn them into successful standalone businesses, even though that usually won’t be through the traditional route of selling subscriptions. We’re also trying to open up these services for others to use. For example, Connotea has an API (application programming interface) that we’ve used to create tagging and "related article" functionality for the institutional repository software, EPrints. Other people have used it to do similar things with their own web and desktop applications. The Connotea code is also open-source, so there are quite a few private instances, for example behind institutional firewalls. Some of those people have also contributed code back to the open-source code, which is great because we can’t possibly develop all the requested features on our own.

Q4. And speaking of Web 2.0, peer review is a core value in science. There's a lot of experimentation going on out there with alternatives to peer review; even Nature has stuck its toe into the water. Where do you think this is headed -- no big deal or long-awaited revolution?

My personal view is that peer review is headed for a revolution at some point, but the timing is extremely difficult to predict because it depends mainly not on technology but on various interdependent and imponderable social factors. It could be in a year or in twenty years. Having said that, there are many people at Nature who are much more knowledgeable than me about these things and who think we’re going to keep more or less the current model of peer review for the foreseeable future.

The reason I think they may be wrong is that I basically buy the "wisdom of crowds" argument: there are plenty of examples of the web causing new, open and collaborative approaches to replace traditional, closed and proprietary ones -- from open-source software to Wikipedia. You don’t always get a better result to begin with, which is why skeptics find it easy to be dismissive (as they were with both open-source software and Wikipedia in the early days). But as they evolve, and particularly as more people join in, they get better until the results match or even exceed the traditional approaches, often at much lower cost. (Anyone who’s read Clay Christensen's work will recognize this as an important part of his "innovator’s dilemma" argument).

I also believe that the web is particularly well suited to a "publish then filter" approach rather than the traditional "filter then publish" approach that was required when publishing was necessarily a physical-world process. As you can tell, my belief is based on rather abstract reasoning, and by looking for analogies outside science, so even I’m not completely convinced by it. But I’m convinced enough to know that we ought to be pushing the boundaries, because peer review is completely central to what we do, and if there’s a better way to do it then we ought to be the ones to find it. But at least in science, no one has found it yet.

Q5. Tell us a little about your new product Nature Reports, how it was developed and what need you see it filling in the scientific information marketplace. Who do you see as its main audience?

I can’t take any personal credit for the Nature Reports series, but I can tell you a bit about it. It consists of three sites — on Avian Flu, Climate Change and Stem Cells — that aim to serve a couple of purposes. First, they provide sources of scientific information on topics that affect us all, and that are all too often the subjects of spin or misinformation. We want to provide a place for non-experts to go where they know that the information is scientifically up-to-date and unbiased. They go into more depth than the mainstream media, but not so much that any interested and intelligent person can’t follow.

Secondly, we want the Nature Reports sites to become places where scientists, policy-makers, business people, and other interested parties can come together to learn from each other. Particularly in the three areas currently covered by Nature Reports, science does not and cannot operate in a vacuum. It must be willing to give and receive information in a way that will help us all -- together -- to make wise decisions on questions that could affect our world for generations to come. For example, there’s no way that scientists can decide on their own what we should and shouldn’t be doing with stem cells, because those decisions are ultimately social and moral ones, but science needs to inform, and be informed by, the debate.

Q6. What do you think the future of print journal publishing is in 5 years? 10 years?

The vast majority of journals are already accessed mainly online. Many forward-thinking organizations are morphing their libraries into places to work and meet, not primarily places where documents are stored. Some libraries are even becoming entirely virtual. And that’s even before you consider the rise of scientific databases, which are just as important as journals in many fields and are all online. So I think we’re already in a world where scientific information is primarily digital. Within 10 years, I think most journals won’t any longer exist in paper form because there won’t be any point. (Though people will continue to print individual items for reading.) Nature and one or two other journals will be exceptions because they are effectively also magazines that contain news and commentary as well as research. Many people (including me) still prefer to read the "front half" of these publications in print, but eventually they too will migrate to e-readers of some sort. However, predicting the timing of that development has already caught out a lot of people who are much cleverer than me, so I won’t try here. ;-)

Q7. How about journal publishing itself? In 5 or 10 years will we be able to recognize whatever it is that journals have evolved into? Is the very nature of scientific publishing headed for some sort of transformation?

I think the concept of the scientific "paper" will remain intact (even if that name will seem increasingly anachronistic). There’s real value in this unit of publication, which tells a story by explaining how something previously unknown has become known through a particular set of experiments. But beyond that, there’s a lot of potential for change. Smaller units of discovery will be published -- whether through blogs or databases or whatever -- because the barriers to publishing them are now so low. This, in turn, will create the need for new services to find and collate this information, preferably in a personalized way, and new measures of scientific impact that take into account such contributions, which will be much smaller and more numerous than published papers.

Journals will become better linked, easier to search, and more dynamic. Many databases will take more seriously the need for curation, peer review, citability and archiving. In this way, journals and databases will be harder and harder to tell apart, and I think the distinction between them will ultimately become meaningless. In cases where journals don’t add much editorial value -- whether through filtering or otherwise improving the content -- the concept of the journal itself may start to erode as readers become ever more concerned with the paper they are reading rather than where it came from.

Q8. Can you tell us a little about Science Foo? It looks like a lot of fun -- not something we normally associate with science publishing.

Science Foo Camp is certainly one of the most fun and cool things I’ve ever done at work. It’s based on a meeting format invented about 5 years ago by O’Reilly Media, the influential technical book publisher run by Tim O’Reilly. They run an annual event for techno-geeks called Foo Camp. (“Foo” is a word computer programmers use to denote some arbitrary value or name -- like “x” in algebra -- but in this case also stands for "Friends of O’Reilly".) Basically, Tim and his colleagues invite 200-300 interesting people to their HQ in Sebastopol, CA for a weekend of self-organised demos, presentations, brainstorming, contraption-building and musical jamming (basically whatever people find interesting).

The great thing about it is the quality and variety of people there: software billionaires, technically precocious teenagers, engineers, scientists, writers -- you name it. The only criteria are that O’Reilly consider them to be doing interesting stuff, and they want to introduce them to others. So it’s a bit like a giant, manic, weekend-long dinner party for geeks. They’ve become something of a legend in techno-land. Anyway, I attend a lot of O’Reilly conferences (it’s where I steal most of my best ideas ;-) and have known Tim for several years. Last spring, at his Emerging Technology Conference in San Diego, following a conversation he had had with Linda Stone (a brilliant ex-Apple and ex-Microsoft person with a keen interest in science and medicine), Tim suggested to me that we organize a Science Foo Camp. I thought it was a great idea.

We then spent a few weeks looking for a suitable venue, during which time Tim asked Eric Schmidt at Google, who loved the idea too. That was in late May or early June last year, and we decided to hold the event in August, so we had only two months to get lots of interesting scientific people to the Googleplex. We were really worried that it would be too short notice to get the kind of people we were seeking. We were also worried that scientists from diverse fields might not have as much to discuss with each other as people from the technical realm. We needn’t have been concerned: it was a great success. Attendees raved about it and several went away with not just new ideas but new collaborations. One thing that worked really well — aside from the great venue and format — is that we included some non-scientists in the mix. These ranged from technology people with a strong interest in science to sci-fi writers and others with cultural links to science but from outside research. I think those people helped to foster a truly interdisciplinary mindset. We’re doing the same again this year, though this time we have had a bit more time to plan it. I really hope that we’ll be able to do this every year from now on.

Q9. Who do you think your biggest competitor is? Open Access journals, other society or commercial publishers or even just the notion that everything is available for free on the web?

None of the above. ;-) To be honest, I don’t spend much time thinking about any of those. Open access will come about mainly through funder-mandated self-archiving, not author- or sponsor-funded journals. Of course we compete with other established publishers too, but they are a relatively known quantity. Your point about everything being free is related to an issue that I think is critical for publishers of all stripes: how to create viable business models that don’t involve charging for content (whether to readers or authors). That’s not because I believe it’s necessarily going to become impossible to charge readers, but it won’t always be the optimal (or even a viable) business model, especially for collaborative online services, so we need other options. In short, we need to get much better at monetizing traffic.

But to answer your question, I think our biggest competitor is the unknown grad student in his (or her) dorm room hatching a plan to turn scientific communication upside down in the same way that Napster, Google and Wikipedia disrupted other industries. Such people are a threat precisely because of their obscurity and lack of any historical baggage. You no longer need a lot of money, or even necessarily a strong brand, to succeed online. Good ideas and implementation are much more important. That drives almost everything we do. I hope that we’ll come up with the best ideas and implementations first, not mainly because of a commercial desire to out-compete others, but because that’s how we can best support scientific discovery.

I see myself less as a scientific publisher and more as a scientist who happens to work in publishing, helping information about ideas and discoveries make its way as quickly and efficiently as possible from the originators to those who can put it to use. If I ever thought I wasn’t being effective in that role, I’d find some other way to spend my time, probably outside publishing, but almost certainly connected with science.

Q10. Nature Precedings almost seems like the boldest of Nature's recent web offerings, nudging the larger scientific community in the same direction as, say, the physicists. What was the rationale behind introducing the service, and what do you see as its place in the Nature suite of web products?

The basic rationale is that it’s in the interests of science for researchers to share their findings with each other as early and openly as possible. As you say, this already happens in physics through arXiv.org (and Paul Ginsparg, who runs that service, has very kindly offered his advice as we’ve been setting up Nature Precedings).

There are all sorts of theories about why it doesn’t happen so much in biology and other fields, but we thought the time was right to try and kick-start it. For one thing, there seems to be an increasing acceptance and understanding of the power and value of the web in enabling open collaboration, whether through domain-specific scientific databases or much more general services like Wikipedia. We were also able to get public support from some outstanding partners: the British Library, the European Bioinformatics Institute, Science Commons, and the Wellcome Trust (with more to come, I anticipate). This is key because the barriers to adoption are much more social than technical, and no one organisation has the right mix of skills and influence to pull this off on its own.

For the same reason, we’re also reaching out to other publishers. I expect a few of them will be cautious at first, but many of them clearly appreciate what we’re doing, which is about complementing the journal system, not competing with it, and about building an open federated system, not a closed proprietary one. For our own part, Nature Precedings helps us to engage with scientists at an earlier stage of the research process, which supports our traditional journal activities.

Also, by moving early we hope to be among the first to work out how best to make this kind of service economically self-sustaining. We’ve already made clear that that won’t involve charging for access -- and we’re working with some of our partners to set up open mirror sites to guarantee that.

Q11. Scintilla, PostGenomic, Nature Reports, even Connotea, all seem closely related to me, all about organizing information and bringing it all together. Are all these services coming together eventually or are they going to get more differentiated?

To be completely honest, that’s not yet certain because it depends on how people use those services, and what they tell us about their needs. But my expectation, and our current intention, is to steadily integrate them in a way that will allow information from one application to be used within another, and for users to hop between them seamlessly. Ultimately the distinction between these different services should therefore become less and less pronounced.

I think that’s a good thing because people just want help with their scientific information needs, they don’t want to have to work out whether Connotea or Scintilla (or whatever) is the answer, and they certainly don’t want to have to visit several different sites to conduct a single task. That doesn’t mean they will all turn into one monolithic application, but it should become easier for (say) Connotea users to access Scintilla functionality, and vice versa. We’ve certainly put a lot of thought into making that kind of integration possible, but to what extent we pursue it ultimately depends not on us but our users.

July 2, 2007

Weinberger, David. Everything is miscellaneous: The power of the new digital disorder. New York: Times Books, 2007. 277pp.

David Weinberger's Everything is Miscellaneous is one of 2007's big buzz books. You know, the book all the big pundits read and obsess over. Slightly older examples include books like Wikinomics or Everything Bad Is Good for You. People read them and mostly write glowing, fairly uncritical reviews. Like I said, Weinberger's book is the latest incarnation of the buzz book in the libraryish world. So, is the book as praiseworthy as the buzz would indicate or is it overrated? Well, both, actually. This is really and truly a thought-provoking book, one that bursts with ideas, a book I found myself engaging and arguing with constantly, often many times on a single page. In that sense, it is a completely, wildly successful book: it got me thinking, and thinking deeply, about many of the most important issues in the profession. On the other hand, there were times when it seemed a bit misguided and superficial in its coverage of the library world, almost gloatingly dismissive in a way.

So, I think I'll take a bit of a grumpy, devil's advocate point of view in this review. I am usually not shy about pointing out flaws in the books I review, but this will probably be the first time I'm really giving what may seem to be a very negative review.

Before I get going, I should talk a little about what the book is actually about. Weinberger's main idea is that the new digital world has revolutionized the way that we are able to organize our stuff. In the physical world, physical stuff needs to be organized in an orderly, concrete way. Each thing in its one, singular place. Now, however, digital stuff can be ordered in a million different ways. Each person can order their digital stuff any way they want, and stuff can be placed in infinitely many different locations as needed. This paradigm shift is, according to Weinberger, a great thing because it's so much easier to find what we need if we're not limited in how we organize, as we are in the physical world. In other words, our shelves are infinite and changeable rather than limited and static. Think del.icio.us rather than books on a bookstore shelf.

Weinberger is sort of the anti-Michael Gorman (or perhaps Gorman is the anti-Weinberger?) in that Weinberger sees all change brought about by the "new digital disorder" as almost by definition a good thing, whereas Gorman sees any challenge to older notions of publishing, authority and scholarship as heresy, with the heretics to be quickly burnt at the stake. Now, I'm not that fond of either extreme but I am generally much more sympathetic to Weinberger's position: the idea that we need to adjust to and take advantage of the change that is happening, to resist trying to bend it to our old-fogey conceptions and to go with the flow.

So, what are my complaints? I think I'm more or less going to take the book as it unfolds, make the internal debates I had with Weinberger external, and see where that takes us. Hopefully, they're not all just the grumblings of a cranky old guy pining for the good old days, and we can all learn something from talking about some of the spots where I felt he could have used better explanations or substituted real comparisons for the setting up and demolishing of straw men.

The first thing that bothers me is when he compares bookstores to the Web/Amazon (starting p. 8). His argument is that bookstores are cripplingly limited because books can only be on one shelf at a time, while Amazon can assign as many subjects as it needs, plus it has amazing data mining algorithms that drive its recommendation engines, feeding you stuff you might want to read based on what you've bought in the past and/or are looking at now. First of all, most bookstores these days have tables with selected books (based on subject, award winning, whatever) scattered all over the place, highlighting books that they think deserve (or publishers pay) to be singled out. On the other hand, who hasn't clicked on one of Amazon's subject links only to be overwhelmed by zillions of irrelevant items? It works both ways -- physical and miscellaneous are different; both have advantages and disadvantages. After all, the online booksellers only get about 20% of the total business, so people must find that there's a compelling reason to go to physical bookstores.

Starting on page 16, he begins a comparison of the Dewey decimal system libraries use to physically order their books with the subject approach Amazon and other online systems use. I find this comparison more than a bit misleading, almost to the point where I think Weinberger is setting up a straw man to be knocked down. Now, I'm not even a cataloguer and I know that Dewey is a classification system, a way to order books physically on shelves. It has abundant limitations (which Weinberger is more than happy to point out ad nauseam) but it mostly satisfies basic needs. One weakness is, of course, that it uses a hopelessly out of date subject classification system as a basis for ordering. Comparing it to the ability to tag and search in a system like Amazon or del.icio.us is, however, comparing apples to oranges. Those systems aren't really classification systems but subject analysis systems. The real comparison, to be fair, to compare apples to apples, should have been Amazon to the Library of Congress Subject Headings. While LCSH and the way it is implemented are far from perfect, I think that if you compare the use of subject headings in most OPACs to Amazon, you will definitely find that libraries don't fare as poorly as they do when you compare Amazon to Dewey and card catalogues. And page 16 isn't the only place he gets the Dewey/card catalogue out for a tussle. He goes after Dewey again starting on page 47; on 55-56 he talks as if the card catalogue is the ultimate in library systems; on 57 he refers to Dewey as a "law of physical geography;" on page 58 he again compares a classification system to subject analysis. And on page 60 he doesn't even seem to understand that even card catalogues are able to have subject catalogues.
The constant apples/oranges comparison continued for a number of pages, with another outbreak on pages 61-62, as he once again complains that Dewey can only represent an item in one place while digital can represent it in many places. Really, the fact that Weinberger doesn't realize that libraries use subject headings as well as classification, and that an item can have more than one subject heading, well, I find that a bit embarrassing for him, especially given the length at which he goes on about it. Really, David, we get it. Digital good, physical bad. Tagging good, Dewey bad. Amazon good, libraries & bookstores bad.

It was at this point that I thought to myself that in reality, even Amazon has a classification system like Dewey; in fact they probably have a lot of them. For example, the hard drives on their servers have file allocation tables which point to the physical location of their data files. At a higher level, their relational databases have primary keys which point to various data records. Even their warehouses have classification systems, as their databases must be able to locate items on physical shelves. Compare using a subject card catalogue to find books on WWII with being dropped in the middle of an Amazon warehouse! He sets up the card catalogue as a straw man and he just keeps knocking it down, and it gets tiresome the way he just keeps on taking easy shots.

Weinberger also misunderstands the way people use cookbooks (p. 44). Sure, if people only used cookbooks as a way of slavishly copying recipes for making dinner, then, yeah, the web would put them out of business. But people use cookbooks for a lot of reasons: to learn techniques, to get insight into a culture and way of life, to get a quick overview of a cuisine or a style of cooking, as a source of basic information for improvising, to read for fun, to get an insight into the personality and style of a chef, to get an insight into another historical period. The richness of a good cookbook isn't limited to just recipes.

I have to admit that at this point I was tempted to abandon the book altogether, to brand it as all hype and no real substance, a hoax of a popular business book perpetrated on an unsuspecting librarian audience. Fortunately, I didn't. There were more annoyances, but the book got a lot stronger as it went along, more insightful and more penetrating in its analysis. However, I think I'll stay grumpy. (hehe.)

One of the more annoying arguments (p. 144) that I often encounter in techy sources is that the nature of learning and the evaluation of learning has changed so radically that we will no longer want to bother evaluating students on what they actually know and can do themselves, but rather will only test them on what they can do in teams or can use the web to find out. In other words, not testing without cell phones and the Internet at the ready. Now, I'm not one to say that we should only test students on memorized facts and regurgitated application of rote formulas; and I think you'd be hard-pressed to find many schools that only do that. From my experience, collaboration and group work, research and consultation are all encouraged at all levels of schooling and make up a significant part of most students' evaluation. Students have plenty of opportunity to prove they can work in teams and can find the information they need in a networked environment. But, I still think that it's important for students to actually know something themselves, without consultation, and to be able to perform important tasks themselves, without collaboration. Certainly, the level of knowledge and tasks will vary with the age/grade of the students and the course of study they are pursuing. If someone is to contribute to the social construction of knowledge they, well, need to already have something to contribute. In fact, if everyone always only relied on someone else to know something, then the pool of knowledge would dry up. The book asks some important questions: what is the nature of expertise, what is an expert, how do you become an expert, are these terms defined socially or individually, how is expert knowledge advanced, how is expert knowledge communicated? A scientist who pushes the frontiers of knowledge must actually know where they are to begin with. At some level, an engineer must be able to do engineering, not just facilitate team building exercises.

And little bits of innumeracy bug me too. On page 217 he's trying to make the point that the online arXiv has way more readers than the print Nature. ArXiv has "40,000 new papers every year read by 35,000 people" and "Nature has a circulation of 67,500 and claims 660,000 readers -- about 19 days of arxiv's readers." Comparing these two sets of numbers is a totally false comparison. What you really need to do is compare the total download figures for arXiv to the total download figures for Nature PLUS an estimate for the total paper readership. For arXiv, does he think all 40K papers are read by each of the 35K readers, for a potential 1.4 billion article reads? The true article readership is probably much, much smaller than that. As for the print, the most recent Nature (v447i7148) has 14 articles and letters; for a guesstimate of a whole year of print, multiply by 52 weeks and 660,000 readers for a potential 480 million article reads. Probably not everyone reads each article, but most probably at least glance at each one. And that's for the print only. He doesn't even seem to realize that Nature, like virtually every scientific journal, has an online version with a potentially huge readership, which Weinberger in no way takes into account. It's clear to me that, at least based on the numbers he gives, what I can actually say about the comparison between the readerships of Nature and arXiv is limited, except that they may not be too dissimilar. Not the point he wants to make, though. Again, the real numbers he should have dug up, but did not seem to want to use, were the total article downloads for each source.
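
For anyone who wants to check my back-of-the-envelope arithmetic, here is a minimal Python sketch. It uses only the figures quoted in this review; the constant and function names are my own inventions, and the assumptions (one print issue per week, every reader reads every item) are deliberately crude, so both results are ceilings and not estimates of actual readership.

```python
# Upper bounds on "potential article reads", using only figures quoted in the review.
# All names here are illustrative; nothing below comes from arXiv or Nature data feeds.

ARXIV_PAPERS_PER_YEAR = 40_000   # quoted from the book
ARXIV_READERS = 35_000           # quoted from the book
NATURE_ITEMS_PER_ISSUE = 14      # articles and letters counted in one print issue
ISSUES_PER_YEAR = 52             # assumption: one print issue per week
NATURE_PRINT_READERS = 660_000   # claimed print readership quoted from the book


def potential_reads(items: int, readers: int) -> int:
    """Ceiling on reads if every reader looked at every item."""
    return items * readers


arxiv_ceiling = potential_reads(ARXIV_PAPERS_PER_YEAR, ARXIV_READERS)
nature_print_ceiling = potential_reads(
    NATURE_ITEMS_PER_ISSUE * ISSUES_PER_YEAR, NATURE_PRINT_READERS
)

print(f"arXiv ceiling:        {arxiv_ceiling:,}")         # 1,400,000,000
print(f"Nature print ceiling: {nature_print_ceiling:,}")  # 480,480,000
```

Neither ceiling tells you how many articles anyone actually read, which is the whole problem: without real download counts for both sources, the two sets of numbers can't settle the question either way.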

Now, I'm not implying that print is a better format for science communication than online -- I've predicted in my My Job in 10 Years series that print will more or less disappear within the next 10 years -- but please, know what you're talking about when you explore these issues. Know the landscape, compare apples to apples.

I find it frustrating that in a book Weinberger dedicates "To the Librarians" he doesn't take a bit more time to find out what librarians actually do, how libraries work in 2007 rather than 1950. (See p. 132 for some cheap shots.) But in the end, I have to say it was worth reading. If I disagreed violently with something on virtually every page, well, at least it got me thinking; I also found many brilliant insights and much solid analysis. A good book demands a dialogue of its readers, and this one certainly demanded that I sit up and pay attention and think deeply about my own ideas. This is an interesting, engaging, important book that explores some extremely timely information trends and ideas, one that I'm sure that I haven't done justice to in my grumpiness, one that at times I find myself willfully misunderstanding and misrepresenting (misunderestimating?). I fault myself for being unable to get past its shortcomings in this review; I also fault myself for being unable to see the forest for the trees, for being overly annoyed at what are probably trivial straw men. Read this book for yourself.

(And apologies for what must be my longest, ramblingest, most disorganized, crankiest, least objective review. I'm sure there's an alternate aspect of the quantum multiverse where I've written a completely different review.)