Tuesday, 18 December 2007

You pay for what you get

Civil servants are reeling in the wake of the horrific news that CDs containing the records of Her Majesty's Revenue and Customs (HMRC) database have been lost, and the further news of DVLA data being lost. The full cost to taxpaying members of the public may not be realised for years to come.


This debacle is not only an example of incredibly poor information management, but also a sign of a wider problem in the UK, that you get what you pay for. Or in this case you don't get what you pay for. 


Information management is, or rather was, at the heart of British life. Travel to former colonies like India or Australia and they'll gladly inform you of the regimented behaviour towards information that led to government structures that have served the sub-continent and prison colony well to date. Yet, those standards have dropped.


As we debated the issue, an IWR reporter asked how information of this value could be so easy to simply download and burn to a CD. Technology to prevent such blunders is not new and is a basic function of many information management systems.


Revelations of the missing information came a day after a report on the BBC's Today programme that employees of the Driving Standards Agency and the vehicle licensing body, the DVLA, take on average three weeks' sick leave a year. Missing information and low staff morale are signs of a civil service that is poorly funded and poorly managed.

It is too easy to wag the finger of blame at civil servants, when in truth a much wider debate needs to take place. As taxpayers and child benefit recipients we are angry and worried; as information professionals we are dumbfounded that such lapses could have occurred. What of our role as citizens? Since the 1980s we've wanted a John Lewis service, but only paid Tesco value brand prices. If you want John Lewis quality, you pay John Lewis prices. On the high street this modus operandi fits well with the public, as they choose when they want quality and when they want to increase their spending. So why is it that we expect our state services to manage high level information on a low level budget?

This needs to be a debate about our society and its values, literally, as well as an improvement in information management.

How will a slowdown affect ECM decision making?

With 2007 all but put to bed, a lot of people are looking forward to 2008 with not a little trepidation. And the cause of that fear is the prospect of a decline in the macro economy.


I'm no economist and have a healthy scepticism for many self-proclaimed experts, but there is a lot of consistency in terms of the impact on IT buying. Down markets can be very healthy for technology buyers and very disruptive for the incumbents. In the early 1990s, for example, companies like Microsoft, Dell and Oracle were able to win chunks of market share as firms looked to take advantage of the move to client/server architectures, and shift away from mainframes.


In the wake of the dot-com collapse, software-as-a-service companies such as Salesforce.com took advantage of the squeeze on capex spending by offering subscription-based pricing that let firms get projects up and running very quickly.


In ECM, it's pretty obvious that firms offering low-cost, fast-deployment options stand to prosper if the economy struggles. That could be good news for SharePoint, Alfresco and companies offering freebie tools such as IBM Yahoo OmniFind and Microsoft's Search Server.

Friday, 14 December 2007

pdf forgeries

It's not every day an email starts like this:

PDF files have essentially become the standard within the business community because of the need for a protected file. However, the software is useless if users need to edit the document in any way.

Oh ho ho (he says, seasonally), does this mean the sender has a way to edit pdfs undetectably? This I must see. The software is called deskUNPDF Professional and it comes from Docudesk. It promises to:

convert PDFs into Word documents, view data in XML-format, or convert the files into HTML for a web presentation

It actually throws the result out in a huge variety of standard formats including images, csv and Sony Reader's lrf.


The danger appears to be that you could 'round trip' a pdf by exporting to Word or an image file, fiddling around a bit and then printing to pdf using one of the many pdf writers around.


Fortunately for people trying to protect their pdfs, the exercise proved less than satisfactory. Forgeries are evident. So the real value of the software is that you can move a pdf into another format.


Below, I've round-tripped an image page and a text page from the World Wildlife Fund's "Sustainability at the speed of light". It is utterly evident that I've been up to no good. And that, I believe, should be proclaimed as a valuable feature of the software.


Here's the original and the pdf output going via Word:


[Image: Compare1]


It's shrunk and there's a bit of textual overlap. The Word export has strange column breaks and the different text blocks appear to be in the wrong sequence when editing.


In honour of the Kit Kat tv commercial where the pandas roller-skate when the cameraman's not looking, I thought I'd do a bit of panda substitution using the GIMP.


[Image: Compare2]


I didn't add the speed streaks (and I cannot repeat the phenomenon) but they do look rather nice. However, even had my panda been better executed, this is clearly no way to forge a pdf.


Bear in mind that a lot of pdfs are protected by copyright and you need to be sure you're not going to land yourself in hot water by republishing. (Hopefully my snippets aren't going to get me into trouble.)

Tuesday, 11 December 2007

Information professionals guiding you to the best bits of the blogosphere

Ben Toth reveals how he keeps his information intake healthy and why blogging can be more valuable than social networks such as Facebook.

Q Who are you?
A Ben Toth, 48, domiciled on a farm in Herefordshire. I trained as a librarian at University College London about 15 years ago. I used to be the director of the NHS National Knowledge Service when it was part of Connecting for Health. The best known service it runs is the National Library for Health (www.library.nhs.uk). Currently, I’m designing the enterprise architecture for the National Institute for Health Research (www.nihr.ac.uk). I’m also writing a book on Health 2.0, which will be published in parts later this year.

Q Where is your blog?
A You’ll find it at http://nelh.blogspot.com

Q Describe your blog and the categories on it
A It’s just a public notebook really. Its content tends to reflect what I’m working on, but it’s mostly about libraries, health and the web. I could use Microsoft Word to keep my notes. I could use del.icio.us. But a blog is more visible and more in the flow of the things I’m reading, which are almost invariably on the web. A lot of the entries I make are just notings – highlight, right-click and send to Blogger. I use tags but I’m not very strict about categorising things.

Q How long have you been blogging?
A Since about 2001. Eighteen months ago I lost all my entries and had to start again.

Q What started you blogging?
A I was helping my daughter set up a website as part of a Brownie project she was doing. I couldn’t use the National Electronic Library for Health servers and I didn’t want to manage Apache or pay someone to, so we used Tripod. Which worked, but it was difficult to use. And then I read about Evan Williams’ little project, which became Blogger, had a go with it, and haven’t looked back. It’s become a habit, and I haven’t got tired of it yet.

Q What bloggers do you watch and link to, and why?
A These days I follow things through RSS if I can, so my blog-watching is mostly via a feed reader. The only blog I regularly visit is Dave Winer’s (www.scripting.com) because he’s taken blog writing to a level where the argument is developed through the day and so needs to be read on the page. I look at Techmeme (www.techmeme.com), but that’s not really a blog. I used to maintain a list of blogs that I linked to through blogrolling, but I can’t see the point of doing that any more. The social web takes care of that sort of affiliation-showing much better.

Q Do you comment on other blogs?
A I don’t comment much. Sometimes I carp from the sidelines on e-healthinsider (www.e-health-insider.com), but I don’t think there’s much value in commenting or reading comments. That’s not to say that discussion isn’t valuable, but I’d rather read views as blog entries rather than comments on someone else’s blog.

Q How does your organisation benefit from your blog presence?
A It’s the best way of keeping in touch with what’s going on, and keeping a blog maintains some visibility to people.

Q How does blogging benefit your career?
A Blogging and RSS are really important for me professionally. They keep me up to date in a way that nothing else can.

Q What good things have happened to you solely because you blog?
A Making professional contacts that I otherwise wouldn’t have and maintaining ones that might otherwise have fallen off. In some ways blogging is more useful than LinkedIn and Facebook as a social networking tool. But it’s really only a matter of time until traditional blogging gets divided up between Facebook, Vlogging and Twitter.

Q Setting work aside, which blogs do you read just for fun?
A The Fake Steve Jobs blog was great (http://fakesteve.blogspot.com). And when I need a chuckle, I check out the Dilbert RSS feed (http://dwlt.net/tapestry/dilbert.rdf).

Q What are the blogs in your sector that you trust?
A The reliably interesting starting points on library matters for me are:
www.earlham.edu/~peters/fos/fosblog.html
http://orweblog.oclc.org
www.philbradley.typepad.com
http://tomroper.typepad.com
And Jon Udell is a first-class technologist who happens to like libraries (http://blog.jonudell.net)

Time for ECM to stop selling fear

Over at CMS Watch, Tony Byrne mentions that he has heard the term “risk of incarceration” being used by reps as a spin on the older and somewhat more traditional meaning of ROI, “return on investment”. That shouldn’t surprise you too much on at least two counts.


First, enterprise content management (ECM) companies have become accustomed to pitching content management as a tool to “keep the CEO out of jail” by providing an audit trail of programs, files, messages and their associated creation, viewing, editing and other interactions. The selling spiel says: “Remember Enron? You don’t want to end up like that so you’d better have a good document management and retention strategy. Oh and if you were scared by Sarbanes-Oxley, there’s a ton more of that stuff coming down the line.”


Second, the company Byrne says he heard using the term was IBM, the company that was the originator, of course, of selling “fear, uncertainty and doubt”.


As I said at the top, I’m not at all surprised, but if sarcasm is the lowest form of wit, then this is certainly among the lower echelons of effective sales and marketing. It gets an instant reaction but you might struggle to sell it twice, as many firms are finding out the hard way.


For the last five years, ECM vendors have been touting regulatory compliance and reputation risk as threats to business. This was a crude but effective weapon in the down market after the dot-com collapse but, having made fortunes from scaring the bejeezus out of firms, the sales guys could really do with a hose down and a fresh approach.

Friday, 7 December 2007

Icelandic data refuge

We're used to offshoring work to other countries to achieve cost reductions or follow-the-sun working, or both. We're also used to having some of our computing activities and services hosted remotely through web hosts, Salesforce.com, Google, Facebook et al.


Some of these companies run huge data centres and they are concerned about continuity of energy supplies. Some are siting themselves near renewable energy sources, others moving to where they can get the stuff cheap or to locations where they can avoid declaring their energy consumption. (True, but you can do your own research on that one.)


Anyone who's a serious consumer of power is trying to find ways to get the consumption down. Hardware and software suppliers are having a great time selling virtualisation software, efficient new kit and clever new cooling systems. And, when customers have all done this and got themselves sorted out, they'll still find that the need for computing resources will grow and they'll have to find new ways to cope.


Well, with a hat tip to an announcement by Data Íslandia and Hitachi Data Systems, another possibility has surfaced. Why not move all the hosts to safe countries where natural energy abounds? The 'safe' is probably the main challenge. If you think 'solar' then some of the hottest countries also happen to be the least desirable from this perspective.


Iceland, on the other hand, has a rather unusual combination of plenty of renewable geothermal and hydro-electric energy coupled with a cool climate. It has a technically literate population and it is relatively secure. No-one seems to want to invade it, for example.


Data Íslandia specialises in providing disk- and tape-based long-term data archiving services. Yesterday's Hitachi deal is based on its data management services which will enable multinational organisations to address the management, compliance and environmental burden of exploding data volumes. Data Íslandia director, Sol Squires, says "virtualising six-month old information, which is effectively digital toxic waste, is a very poor use of resources." Customers will be able to offload stale data while still having real-time access to it.


No doubt there are a million political reasons why I'm wrong, but Iceland strikes me as an environmentally agreeable and secure place to house our national digitised libraries. Maybe our tax records too.

Thursday, 6 December 2007

IWR Information Professional of the Year Award

The IWR American Psychological Association Information Professional of the Year award has been announced and went, deservedly, to Brian Kelly, UK Web Focus for UKOLN.


The award is judged by a panel of previous winners and the IWR editorial team. As editor of IWR, when I judge the award I look for an individual who is pushing the limits of information and technology, taking the role of the information professional as far as possible and making it an exciting one. When looking through the final results I could see that the other judges felt the same way and that Brian was an excellent choice.


Brian's role is that of national web co-ordinator, an advisory post funded by the educational body JISC and the Museums, Libraries and Archives Council (MLA).


In this role Brian treats the web as a central resource for learning and research in higher education and is looking at ways to make it a successful one. That is a challenge, because the web is still very young and constantly changing, as the recent changes dubbed Web 2.0 show. Brian is going to be pretty busy for some time to come.


Brian is based at the University of Bath, and I know from information professionals I have dealt with in the academic sector that he is very well respected and that his thoughts are often the basis for great debate within the industry. Linked to this is his blog, which is one of the most popular in the sector.


I hope all IWR readers will join me in congratulating Brian on a very well deserved award.

Tuesday, 4 December 2007

Jimmy Wales on the role of Wikipedia in society

Jimmy Wales, chairman of Wikipedia, gave the keynote speech at Online Information 2007, with a presentation entitled Web 2.0 in action: Free culture & community on the move.


He starts with Britannica editor Charles Van Doren, who said in 1962 that the encyclopaedia should be radical; Wales claims encyclopaedias have been anything but.


Small showing of hands for those that have edited, although Wales believes it’s a good showing, "but not as many as college kids".


"I consider us to be the Red Cross of information," he says as he describes its charitable status. Wikipedia has 10 full-time staff and will spend about $2 million to $3 million this year, which is tiny compared with the major publishers. The vast majority of the money comes from small donations, which he likes because it's grass roots and not dependent on advertisers.


Wales talks about the desire to extend the languages that are in use on Wikipedia, including Hindi and Afrikaans.


Wiki is free in the sense of GNU: it's free to copy, modify and distribute.


He shows a video of his travels to India and how he learnt that the local communities want to use the English version, as the English language is a route out of poverty. His organisation has been out to South Africa teaching students how to edit Wikipedia. "One of the things we have learnt is that if you can get five to 10 editors working together, it can make a great difference." These groups make progress and then they look towards outreach and who they can include. Hence the organisation has set up an academy to find the founding editors. It has begun in India, with 10,000 to 20,000 articles a month being put together by academy organisations.


Wikia is his next subject, a separate organisation with 66 languages, including a 67th, Klingon. Wales goes on to demonstrate, using Google search results for Muppets, how the top result is the official site but the rest are from web-based conversation, ie Wikipedia pages, forums and fan sites. He shows an article on the Ford Motor Company, then an article on the Muppet Wiki site about Muppet Ford ads, to demonstrate a level of information that would never have been available before.


The search engine is a political statement, in a small P sense, Wales says. The proprietary software of the main players is a mystery in that people have no control of the accountability. The Wikia search will publish its algorithm.


Wales believes that the trust of social networks and setting up trusted networks can be utilised in search.


On the role of collaboration, he asks the audience to imagine that they are designing a restaurant. The idea is that we trust the people around us: we don't put people in cages in restaurants because they will be using knives. The wiki philosophy is to allow people to do good.

ECM needs to get usability - fast

New research from Oracle and IDG suggests that firms are failing to capitalise on unstructured content. Well, with Stellent now added to its acquisitions mountain, Oracle would say that, wouldn’t it? But the data is interesting nonetheless.


According to the report, two-thirds of “senior IT decision makers” in Western Europe think they have the unstructured data issue managed, or are on the right track to cracking the problem. The flip side of that is that 60 per cent say they can’t make business decisions based on unstructured data because it is either too hard to find or because it is sitting among other, irrelevant data.


The average organisation surveyed had 4.28 ECMs in place (!) with many, unsurprisingly, seeking to consolidate. Oracle suggests that this “raises the question as to whether European organisations actually understand that unstructured content is an enterprise-wide issue that requires a strategic enterprise-wide solution”.


That’s a dodgy conclusion. The proliferation of ECMs (and ERPs, databases, BI systems etc) might be better accounted for by the crazy growth patterns and the pace of change in modern technology-driven business. When Oracle itself came along with client/server databases, few smart companies said “sorry, we’ve already standardised on DB2 on the mainframe”.


One other data point is worth examination: 63 per cent of European enterprises “consider email as the primary source for managing unstructured content, with 86 per cent admitting that email is used as the primary source for sharing content”.


That’s refreshingly open but it’s not as “surprising” as Oracle suggests that email is often a vehicle for decision making. The fact that many of us use email as our primary means not only of communications but also for knowledge management, contact information and much else is as much an indictment of ECM usability as anything else.


This research is clearly Oracle positioning itself as the company capable of making ECM palatable for mainstream businesses who are dissatisfied by the big incumbents. Fair enough, the more ECM matters get an airing the better.


But it also suggests to me that ECM is still in its infancy. Alfresco’s John Newton is fitting ECM with social networking integrations to reflect his belief that ECM users will move from being 10 per cent of the organisation to over half of users. This Oracle data backs up the hunch that ECM might have to change fast to fit in with the way users want to work, rather than asking users to adapt to what software designers say is right.

Thursday, 29 November 2007

Alfresco securely binds Facebook to ECM

You hear about organisations such as BT and the BBC adopting Facebook as the place to hang out and connect. A friend in the BBC told me they were all "addicted" to Facebook. Perhaps she should be told not to use that particular word at her next performance appraisal.


JP Rangaswami, MD of BT Design, is hugely in favour of Facebook because it creates a formality and permanence around conversations which were once the province of the water cooler. He can see what's really going on rather than have to believe what the org chart tells him. He also likes the idea that the infrastructure to run Facebook is external to BT and therefore someone else's problem.


As you know, many other companies are terrified of letting their staff loose on such social networks and actually ban them, despite the fact that many staff are familiar with them and use them elsewhere in their lives.


Software vendors either want to create their own equivalent in order to keep control or they reluctantly allow Facebook into a side panel of their main applications. They probably don't want to give too much functionality away in case it undermines their business model. I wouldn't like to hazard a guess at what Microsoft is up to with its shareholding in Facebook.


But then along comes enterprise content management company Alfresco with the idea that it will not only accept Facebook, it will cheerfully integrate it into the company's repertoire. It allows registered users to publish and share documents and other information in a controlled, secure and auditable environment.


In case you've not heard of Alfresco, it started a few years ago with the intention of being a free Documentum but five times faster and ten times cheaper. The apparent conflict between 'free' and 'cheaper' is that anyone can download and run the software for free, but if they want support they can jolly well pay for it.


Since then, it has fully embraced the social networking world, blurring the boundaries between front office and back and putting content production and consumption into the hands of the many rather than the specialised few.


Company founder John Newton is a panellist at the Online Information conference next Thursday at 11:30. The debate will be about the death of proprietary content management. Wishful thinking? Or are these guys onto something?

Our tax levels cause disasters like HMRC

I was meant to be going to the House of Lords tonight. No, I haven't spent the missing IWR marketing budget on a Labour party donation and the offer of a peerage from Tony Blair. Tonight's rare opportunity to enter the hallowed chambers of the Lords was for the launch of Information Matters, a guide to good information management practice.


Obviously this has become a bit of a hot potato for the powers of Whitehall and I was not totally surprised to hear that the event has been "postponed". I am disappointed, though. Now I really will have to donate money to some political party that will change its policies from day to day to suit its sponsors!


But cynical disbelief in political parties aside, the debacle at HMRC is not an opportunity to clobber the current Labour government; they can do that on their own. This now needs to be a debate about the quality of service we desire. The mistakes that took place at HMRC happened because of poor policy and, in all likelihood, a demotivated, underappreciated and underpaid staff. These factors in any organisation will lead to a disaster.


Sadly, as a nation we are demanding a John Lewis service, yet are only prepared to pay a Tesco budget brand price for it. Our government and political parties fear spending public money, or worse, the public and the Daily Mail discovering that public money has been spent. Yet cuts in budgets and overstretched departments have led to this scenario and could lead to more.


It is ridiculous that a country as rich as the UK, which is experiencing unparalleled levels of growth, is trying to run its infrastructure, which after all is what our civil service is, on a shoestring. We have politicians tempting us with tax cuts, yet clearly they cannot balance the books with the revenue they have. How will public information be well managed and secured in a state that has even less revenue coming in?


The awful mess at HMRC needs to spark a debate about how we want our nation to operate. Groups and parts of the media are quick to call for changes to immigration levels, but let's have a debate about the quality of our services, all of them, whether it's schools and hospitals or departments looking after taxation or defence. We cannot lower taxes when our troops are being put at risk in ill-equipped vehicles in Iraq to secure oil, and our civil service is making basic mistakes with valuable data.


It may not be a popular move, but as a European nation that expects its authorities to provide child benefit, shouldn't we at least pay a proper level of taxation to meet those expectations?

Wednesday, 28 November 2007

Exploitation 2.0

I got a smashing email the other day from a fellow Flickr user. Apparently, they'd shortlisted a picture of mine. How exciting.


Well, turns out, not that exciting. The Schmap shortlist was for a so-so picture taken in Brighton, to be published in their online guide to Brighton. So far, so good. Unfortunately, I wouldn't be paid, and Schmap and, I'm presuming, its owners get perpetual worldwide rights to the image. Free. If, like me, you love free stuff, that's great news. Except when you're giving stuff - free - to a company that will make money from it. Because although Schmaps are free at the point of consumption, the company makes money by selling advertising off the back of them.


Of course, to look at the web, this is great; Schmaps has clearly got its messaging spot on, and there are tons of Flickr users who think that being published - albeit without being paid for their work - is about as exciting as it gets. Some professional photographers are particularly excited, of course.



Tuesday, 27 November 2007

Diluting commitment to Open Access (OA)

At the beginning of the month the World Health Organisation's (WHO) Intergovernmental Working Group on Public Health, Innovation and Intellectual Property (IGWG2) convened to discuss the issues of Open Access (OA) and reach a global consensus for developing a strategy. What emerged was a weakening of the language requiring scientific publishers to comply with an OA model.


Manon Anne Ress on the Knowledge Ecology International blog details how the draft document on global strategy, penned at the end of summer, originally set out to “promote public access to the results of government funded research, through requirements that all investigators funded by governments submit to an open access database an electronic version of their final, peer-reviewed manuscripts”.


By the time the IGWG2 met again in November the word “requirements” had been replaced in the document with “strongly encouraging”.


OA champions and commentators questioned how this would affect OA and whether “encouragement”, even if it was “strong”, would actually achieve anything from publishers. Significantly, they were questioning why the change had happened and, in particular, who had made the call.


Peter Suber, over at Open Access News, asked his readers “which national delegations inserted the strong language in the first draft and which wanted to weaken it in the new draft?”


I haven’t seen any follow-ups that specifically name names, but as Ress reports in her post: “there was some opposition to the “requirement” language by some European countries”. It’s not unfair to say that, with the exception of CAS and Wiley, the scientific publishing world is heavily dominated by European firms.


Most of these companies are attending our sister show Online Information next week and I’ll be sure to ask that question when I catch up with them then. I hope someone is happy to elaborate on this. After all, is there a reason to be anything but open about this?

Monday, 26 November 2007

SaaS might not fit enterprise search

The rise and rise of software as a service has been such a mantra in the IT media over the last few years that it comes as something of a shock to see the SaaS model actually on the wane in enterprise search. Nevertheless, a recent report by CMS Watch says it plainly: “[the SaaS] model has been a hot topic recently [but] the SaaS model for enterprise search is on the decline”. So, what’s going on here?


CMS Watch itself lists three possible reasons: the preponderance of web-only search in SaaS offerings; the popularity and ease-of-use afforded by appliances; and the competition-squishing presence of Google in the sector.


Let me suggest two more: the fact that free is a compelling price, and the notion that SaaS might not be all things to all men.


Companies looking for a search service today will inevitably be attracted to freebie tasters, especially when the companies offering them (Microsoft and IBM) are as big as they come. As discussed earlier, these are highly attractive inducements that offer familiar environments to try out, and a solid upgrade path for those who want to carry on afterwards.


Second, it’s time to admit that SaaS has no Midas effect, except perhaps on marketers. The on-demand model has had a revolutionary effect on customer relationship management and sales force automation, and it is changing the way human resources operates, for example in measuring employee performance. But there are many, many other areas where it has had little or no effect. Even in the much-hyped area of productivity applications where Google and various startups have generated scads of coverage, there has been close to no impact on the hegemony of Microsoft Office, for example.


SaaS is a hugely important trend but privacy concerns, the need to delve into far-flung corners of the enterprise and ancient applications, and sundry other factors mean that search is unlikely to be a happy hunting ground for the model in the immediate future at least.

Thursday, 22 November 2007

The Jimmy Wales keynote

Jimmy "Jimbo" Wales is the keynote speaker on the first day of the Online Information conference.


He comes across as a genuine and thoughtful person with a huge commitment to open source, transparency and community involvement in projects. Wikipedia was the first result of his enthusiasms.


His keynote is entitled "Web 2.0 in action: free culture and community on the move". This suggests that he will be forward-looking and concentrate on his current long-term Wikia or Wikia Search projects (blogged here in January) rather than backward-looking and talking about Wikipedia.


If you are used to the traditional approach to computing, you'd expect a destination to be defined and the route to that destination mapped out with checkpoints along the way. Be prepared for a shock.


The approach that Wikia Search takes is to define a set of principles and the general structure of the project. This then acts as a magnet to the sort of people who are interested. They form communities depending on their specialisations and, at some point downstream, the great mass of the general public get involved, using the tools developed by the initial community.


Where Wikipedia involves the world in contributing original material, the Wikia project is concerned with clothing existing information with value. The theory is that this will help refine search results and, partly through complete transparency and partly through community influence, be very difficult to game.


Just because Jimmy Wales made a great name for himself with Wikipedia doesn't mean that he can succeed again with search. But he's taking a pragmatic approach by doing the spidering and indexing just like the other engines but then using humans to refine the results by thumbs ups, thumbs downs and other more sophisticated assessments.
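To make the idea concrete, here is a minimal sketch of how human thumbs ups and thumbs downs might nudge an engine's own ranking. Everything in it is an assumption for illustration: the `Result` fields, the `weight` parameter and the scoring formula are hypothetical and are not taken from Wikia Search.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    engine_score: float  # relevance from the spidering and indexing stage
    thumbs_up: int = 0
    thumbs_down: int = 0


def vote_adjustment(result, weight=0.1):
    """Net votes as a share of all votes, scaled by weight (0 if nobody voted)."""
    total = result.thumbs_up + result.thumbs_down
    if total == 0:
        return 0.0
    return weight * (result.thumbs_up - result.thumbs_down) / total


def rerank(results, weight=0.1):
    """Sort results by engine score plus the human vote adjustment, best first."""
    return sorted(
        results,
        key=lambda r: r.engine_score + vote_adjustment(r, weight),
        reverse=True,
    )


# Hypothetical results: the engine prefers a.example, the voters prefer b.example.
results = [
    Result("a.example", 0.80, thumbs_up=1, thumbs_down=9),
    Result("b.example", 0.75, thumbs_up=9, thumbs_down=1),
    Result("c.example", 0.70),
]
ranked_urls = [r.url for r in rerank(results)]
print(ranked_urls)
```

Capping the human influence with a small weight is one possible design choice: it lets votes reorder close calls without letting a handful of clicks bury a strongly relevant page.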


The project will take time to evolve and it's possible it will challenge the existing search giants in the same way that Wikipedia has become a port of call for millions more users than traditional encyclopaedias. Who knows? Even Wales doesn't. He certainly never talks that way. But his thoughtful exploration of the issues around community development and participation should make for interesting and challenging listening.


See you there?

A sorry state of affairs

This country has seen better weeks than the last seven days. I’m not a massive football fan, but it was still disappointing to see none of the teams from the UK go through to the European Championships last night. Oil, meanwhile, is nearing $100 a barrel, if it hasn’t already broken it. And then there has been this week’s incredible mistake at HM Revenue and Customs (HMRC), with your valuable information languishing somewhere (with someone?).


The blame has been shared between a junior member of staff, who apparently sent the disks, and HMRC boss Paul Gray, who quit last week when the scale of the problem became evident. Chancellors previous (PM Gordon Brown) and current (Alistair Darling) are pointing out that there is no evidence the information has ended up in criminal hands, yet.


According to the BBC, the official line is that the information is "likely to still be on government property".


Reassuring.


People are quite rightly perplexed and angry as to how this could have happened. As information professionals, I imagine you are more so than most. Understanding the complexities and processes involved, and the safeguards that should be in place for information of this kind, falls within your sphere of expertise, as, in some cases, do organisations like HMRC. Seeing the incidents around this sorry tale unfold must leave you shaking your heads in despair.


Anyone feel the information profession side has been let down, never mind the population?

Tuesday, 20 November 2007

Bodleian Repository Plans scuppered

Not great news this week for the Bodleian Library which has had its plans to build a new book depository rejected.


Back in September I originally blogged on how the eminent, ancient and utterly congested library got the thumbs up to build a new state of the art site near the city centre. It is intended to house 8 million books and will relieve pressure on the library, which is currently operating at 130% capacity and adding 5,000 books per week.


The local council gave two reasons for the rejection: the first was concern about how the new repository would affect the city’s attractive skyline, the second that the site was on a flood plain.


Dr Sarah Thomas, the Bodleian’s chief, said: “We worked with the Environment Agency, English Heritage and the city, and ultimately a solution will present itself.” Local paper the Oxford Mail also reported that Thomas said it was too early to consider an appeal or alternative locations for the new building.


One thing that is not going to go away is the urgent need for a safe, secure, space. How long will it take until the library’s seams start to burst?

Monday, 19 November 2007

Facebook: the new go-to platform for ECM?

There's a famous Bob Dylan press conference of the 1960s where a hapless journalist asks the young singer to define his music "for people like me who are well over 40". Dylan, for it is he, answers that it can be defined as music "for people who are well under 40". As a member of the fifth-decade club myself, I can empathise with the old hack -- my ancient critical faculties don't stretch to understanding the Facebook phenomenon.


What I can understand is that Facebook and other popular social networks have tremendous reach, and, therefore, offer tremendous opportunities and, by logical extension, carry an equal degree of risk. Enter Alfresco and its announcement that it is integrating with Facebook. I'm not too sure that this wasn't in part a clever PR stunt that exploits the fascination with the website du jour but it makes complete sense for ECM firms to be applying their management tools to content that is exposed on sites like Facebook.


As even the most conservative companies recognise that exposing content to blogs, wikis, podcasts, social networks and other formats will be a necessary part of their futures, that content will need to be managed. The ECM system should become the preferred alternative to piecemeal approaches to managing content, and companies that neglect to protect that content will be in trouble.


Having said that, I'm not convinced that Forrester's Kyle McNabb is right to suggest that this is the end of ECM as we know it. Large companies will be among the last to adopt the latest social media, but that is no excuse for ECM firms not to build in necessary controls ahead of demand.

Thursday, 15 November 2007

Cut the spin, save the world

I wonder if our lords and masters (our servants really, although that's difficult to believe) ever consider the environmental consequences of their decisions? Take the national identity card. Will the storage drives need to rotate perpetually in case anyone decides to check us out? Or could the forces of law and order be happy to wait while a drive is fired up and rummaged? If the drives are running continuously, has anyone worked out how much energy would be needed to run them, the computers that access them and the systems needed to cool them?


My guess is that the people who conceive these surveillance projects do not bother themselves with such matters. Yet, even if we're not yet running out of energy, it gets more expensive by the day and, of course, most of it contributes to the carbonisation of the planet. Surely any politician worth their salt would be careful before burdening the atmosphere with more CO2?


Which brings me to the British Library and its Microsoft-sponsored digitisation project. It's been worrying the hell out of me. I've been thinking of all those computers and disk drives sustaining substantial quantities of material that's going to be looked at only rarely. It makes no sense. But then offline tape storage doesn't make sense either.


Fortunately, a company called Nexsan has provided the Library with an answer. It's invented a MAID, a Massive Array of Idle Disks. They sit around quietly stationary until they're woken up by a request for information. This approach, according to Nexsan, cuts energy use by 96 percent. It gives an example of a conventional fibre channel storage device which consumes 187kW of power per petabyte, whether it's being accessed or not. Its own Nexsan Assureon system in Level 3 AutoMAID idle mode consumes just 6kW.
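As a quick sanity check on Nexsan's figures, the arithmetic can be sketched in a few lines. The two power figures are the ones quoted above; the annual saving is my own extrapolation and assumes, unrealistically, that the disks stay idle all year round.

```python
# Figures quoted by Nexsan, both per petabyte of storage.
conventional_kw = 187   # conventional fibre channel device, always spinning
automaid_idle_kw = 6    # Assureon in Level 3 AutoMAID idle mode

# Fractional saving while the array sits idle.
saving_percent = round((1 - automaid_idle_kw / conventional_kw) * 100, 1)

# Assumption: a best case where the disks are idle for the entire year.
HOURS_PER_YEAR = 24 * 365
kwh_saved_per_year = (conventional_kw - automaid_idle_kw) * HOURS_PER_YEAR

print(f"Idle-mode saving: {saving_percent}%")
print(f"Best-case annual saving: {kwh_saved_per_year:,} kWh per petabyte")
```

The idle-mode saving comes out a shade under 97 percent, so the 96 percent claim holds up, at least for storage that really is left alone most of the time.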


It makes you think, doesn't it? Especially if your organisation has massive amounts of 'Just in case' storage.

Information professionals guiding you to the best bits of the blogosphere - Lorcan Dempsey

Lorcan Dempsey has worked for JISC and libraries on both sides of the Irish Sea and the Atlantic. As a member of the National Information Standards Organisation, his blog on networked information and digital libraries is well followed.


Q Who are you?
A I work in Dublin, Ohio, was born in Dublin, Ireland, and spent a long time in between in the UK. I am lucky to have what I believe to be one of the most interesting jobs in the library world. I am responsible for the programmes and research area within OCLC (Online Computer Library Center). I also help shape OCLC strategic direction.

Q Where can we find your blog?
A http://orweblog.oclc.org

Q Describe your blog?
A I say that it is about “libraries, networks and services”. I suppose that over time it has become more general. At first it had more of a technical slant; now it ranges more widely. I tend to talk about how networks are reconfiguring library services and I have some recurrent threads. These include:

Making data work harder.
We invest a lot in bibliographic data and need to use it more imaginatively in our systems and services.
Moving to the network level.
No single website is the sole focus of a user’s attention. The network is the focus of attention. And a major part of our network use revolves around significant network-level services: Amazon, Google, IMDB, and so on. These match supply and demand in efficient ways. The real message of Web 2.0 is the emergence of this pattern of service: data hubs with strong gravitational pull generated through network effects.
Being in the flow.
The focus of attention has shifted from website to workflow. The network is not so much about finding things as getting things done, and we have increasingly rich support for “networkflow”. We may construct our personal digital identities around services in the browser or on the network (RSS aggregators, social networking sites, bookmarks, etc), and we use prefabricated workflows (course management system, customer relationship management system, and so on).

Q How long have you been blogging?
A Almost four years.

Q What started you blogging?
A After I arrived in OCLC I tended to send out a lot of emails. A colleague suggested that a blog might be a better model.

Q Do you comment on other blogs and what is the value of it?
A The comments on some blogs seem more important than on others.

Q What are the blogs in your sector that you trust?
A I keep a wide range of feeds in my aggregator and will focus on different ones from time to time. Again, I tend to be more interested in “voice” or those from whom I can learn something. From a library point of view, I look at Caveat Lector (http://cavlec.yarinareth.net) and ACRLog (www.acrlblog.org).

Alma Swan’s new blog, OptimalScholarship (http://optimalscholarship.blogspot.com) and eFoundations (http://efoundations.typepad.com) from Andy Powell and Pete Johnston, are informative and provocative. I find PlanetCode4Lib (http://planet.code4lib.org) an efficient and useful way of keeping up with a range of stuff.

Q What good things have happened to you that could only have happened because of your blogging?
A I have always contributed to the professional literature. But I find that blogging is quite liberating: it is much easier to write blog entries than longer pieces. It has made me write more quickly and to think about short communications.

Q Which blogs do you read just for fun?
A I look at John Naughton’s Memex 1.1 (http://memex.naughtons.org) and William Gibson’s blog (www.williamgibsonbooks.com/blog/blog.asp), and the pictures in YarnStorm (http://yarnstorm.blogs.com) make me smile.

Tuesday, 13 November 2007

BL backed with Gov bucks

It has been a pretty good week for the British Library (BL), which has just won a renewed level of funding from the Department for Culture, Media and Sport. The BL will receive a rise of 2.7%, keeping it in line with inflation. There were fears originally that the Government’s Public Sector Spending Review might entail a reduction of funds to the library. Cuts would have seriously affected the level of service and provision the library would be able to offer. It’s good to hear that access to the reading rooms remains free, along with a variety of other services.


I imagine there has been a collective sigh of relief over at the nation’s world class library, not least from BL Chief Executive Lynne Brindley. The library’s CEO has previously said cuts would mean she would end up running “a second rate organisation”. Yet there is still the issue of ensuring the BL’s annual capital settlement remains adequate to continue with the BL’s massive national newspaper digitisation project.


Brindley also appears in an interview in this month’s Harvard Business Review. The esteemed title asks what lessons she has learnt during her tenure at the library so far. “I learned the importance of fitting communications to what the organisation wants and needs – otherwise you don’t get the buy-in,” she says.


So far it’s a good thing she got the purse-string holders to buy into her vision. Let’s see if Brindley and the library’s supporters can get the continued funding they need.

Monday, 12 November 2007

EMC's lateral thinking pays off

The rise of enterprise content management over the last five years has seen the entry of giants into a segment once characterised by names only a specialist would have recognised.


Microsoft has made its play with an in-house approach that has delivered the hugely successful SharePoint. IBM has also done a lot of work behind the scenes with content management services, but admitted the need for more when it announced the acquisition of FileNet in 2006. Similarly, Oracle got a fair way down the line, tying in services with its database, but then acquired Stellent last year for its customers and deeper domain knowledge.


These were decent strategies that were characteristic of the seasoned companies that delivered them but perhaps the least convincing strategy came from EMC. The company had made its name as the Switzerland of storage, being an independent company that was not tied to servers in the way rivals IBM, HP and Sun were. EMC was pretty much a pure hardware company until it acquired Documentum in 2003, although it had signalled its intent by agreeing to buy storage software giant Legato Systems just months earlier.


EMC justified itself by saying storage needed intelligent software if companies were to automate the protection of files. Then, in a move that again puzzled many onlookers, EMC acquired VMware and said storage and server virtualisation needed to converge. Plenty of people scratched chins and wondered if EMC was imagining synergies that were invisible to the rest of us.


Today, with unstructured data continuing to grow at a bewildering speed, with compliance mandates showing no sign of letting up, with ECM and storage infrastructure walking in lockstep, and with VMware shaping up as the biggest hypergrowth company in technology since Google, nobody is criticising EMC’s strategy. Proof, if ever it were needed, that lateral thinking can work wonders.

Friday, 9 November 2007

Problems and Solutions? IRFS Day 2

After a storming start to yesterday’s Information Retrieval Facility Symposium (IRFS), enthusiasm was still running high this morning. The opening keynote for day two was hosted by Henk Tomas from IP Search Services; he did a great job outlining pretty much all the main issues facing information specialists and patent workers. His fellow speakers were hard-pressed not to duplicate what he had already covered. It is also a nice opportunity to give you an overview of the main challenges raised at the symposium.


Up for consideration was a thorough examination of why patent information is so important to small and big business alike. Tomas explained that patent information can be used as a means of keeping tabs on competitors, suppliers and emerging technological developments. It is also a way of stopping others from using a technology to gain an advantage, and it can avoid a duplication of efforts or ‘reinventing the wheel’; furthermore, patents are part of a globally accepted legal system.


Tomas identified many of the big issues to address and conquer. My top three of those he mentioned are issues I have seen raised here more than once.


• A massive rise in patent and non-patent literature in the last 30 years. Much of this is Asian in origin; the language differences and therefore difficulties are obvious. There is also a risk of drowning in information.


• A lack of standards in the patent world, particularly in terms of the point of information entry. A common database structure would also help for search purposes.


• Errors and inconsistencies in content sometimes made deliberately for competitive advantage.


Follow-up speaker was Minoo Philipp; she is the Patent Information Manager at chemical manufacturer Henkel and President of the Patent Documentation Group. She asked the audience, “Do we have a problem with patent searching? No, it’s finding the right information.” “The problem is the structure available and also the errors,” she added.


Philipp called for a global standardisation of how patent applications are made. It wasn’t something all the delegates agreed with, believing a technical solution was required instead. Philipp asked ‘wasn’t that treating the symptom rather than the disease?’


Considering that standardisation could necessitate a change to each nation’s patent laws, that one solution may be a while coming.

The dark side of social networking

We spend a lot of time talking about the benefits of different social networking systems. We talk about the conversations that are enabled and the shortcuts to meaningful relationships, whether business or social. What we rarely, if ever, talk about is the dark side. The big brotherish side that can, if it wants to, track our activities in minute detail.


If big business is involved, and it is, you can be certain that this information is like gold. Of course it wants to track you. It pays very good money for the privilege of learning as much about you as possible. And a terrific way to do this is to know who you are then watch your behaviour: what websites you visit, how much time you spend on various activities, where you're connecting from, who you communicate with, whether you're a man or a woman and so on.


The instant you log in to a service - Facebook, Yahoo!, Microsoft, whatever - you're no longer anonymous. At a recent session with a company in this space, the first couple of hours were dedicated to how users could be exploited rather than served. I won't name names because, whatever the public face of these companies, the conversations behind closed doors are likely to be very similar.


But we're willing participants. These organisations provide a platform for communities to form and, because of our desire to connect, we share all manner of personal information. Not with the host, but with our online chums. Sadly, every time we contribute something or click the mouse, we freely, and unwittingly, enable the host to refine our profiles and to deliver the advertisements most likely to appeal.


If we want to participate in online, public, social networking communities then it's best to assume we're regarded by many as victims rather than beneficiaries.

Thursday, 8 November 2007

Mind the Language Gap

One of this morning's IRFS Language Gap sessions asks why we need a cross lingual patent retrieval system in an Asian language. Well, for a start, consider that three of the top five patent filing countries hail from the region: Japan is first, followed by China in third and the Republic of Korea in fourth. The US and Europe take second and fifth places respectively.


If you operate in the patent world then sooner or later you will probably have to work with filings from Asian origins. The question is how, seeing as there are some fundamental structural differences between Western Latin-based languages and their Asian counterparts. Among numerous examples, there are six varieties of expression for the colour red in Korean, and how a word is spaced in Chinese can heavily affect its meaning and its translation to English. When applying this to the ambiguous nature of legally constructed patent documents, the challenge is considerable.


Relying on just human translation is not an option, in part due to the sheer volume of documents constantly being filed as well as existing material.


Minah Kim from the Korean Institute of Patent Information has been explaining how their cross lingual retrieval system copes with the issues but also what still needs to be done.


She called for efforts to improve quality, such as a semantically based query expansion, whereby each word in an original search is expanded to a related search term such as boat to ship to vessel to water. Time spent on a query is also an issue that needs improvement, with the average time spent on one document retrieval being 10 seconds; that can cause problems with a 200 page document, never mind the rest.
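For anyone who wants to picture the query expansion idea, a minimal sketch follows. It is not the Korean Institute of Patent Information's system: the `RELATED_TERMS` table is a hypothetical, hand-made stand-in for a real semantic dictionary, and the `max_hops` limit is my own assumption to stop the chain of related terms running on for ever.

```python
# Hypothetical hand-made table of related terms; a real system would draw on
# a proper semantic dictionary or thesaurus.
RELATED_TERMS = {
    "boat": ["ship"],
    "ship": ["vessel"],
    "vessel": ["water"],
}


def expand_term(term, related, max_hops=3):
    """Follow related-term links outwards from term, at most max_hops steps."""
    expanded = [term]
    seen = {term}
    frontier = [term]
    for _ in range(max_hops):
        next_frontier = []
        for current in frontier:
            for neighbour in related.get(current, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    expanded.append(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return expanded


def expand_query(query, related, max_hops=3):
    """Expand every word of the query into a list of alternatives."""
    return [expand_term(word, related, max_hops) for word in query.lower().split()]


expanded = expand_query("boat engine", RELATED_TERMS)
print(expanded)
```

Each word becomes a list of alternatives that a retrieval system could match against any one of. Words with no entry in the table, like "engine" here, are simply passed through unchanged.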


For the short term, the plan is to consistently upgrade the system's dictionary and establish the query expansion by the end of this year. In the longer term, Kim says there must be the development of a cross lingual retrieval system for Chinese, Japanese and Korean patent documents. Addressing the language translation issues with similar models is something that Western organisations can also only benefit from.

Information Retrieval Symposium opens

For the next two days, I will be covering the latest developments at Vienna’s Information Retrieval Facility Symposium (IRFS). It is a meeting of patent experts and business leaders and is designed to address the challenges that face those who operate in the emerging industry of patent information retrieval.


The overarching theme for the convention is one of patent experts meeting information scientists and opening up a meaningful dialogue. Hopefully there will be a merging of minds, finding common ground; you get the idea.


According to joint IRF Chairman Francisco Webber, there are many complex issues in transferring knowledge between science and business. What the IRFS hopes to identify are the shortcomings in the Intellectual Property (IP) world as well as the methods and potential solutions from the Information Retrieval (IR) sector.


Keith Van Rijsbergen, Webber’s co-Chairman and professor of Computing Science at the University of Glasgow, also talked about how tools for IR specialists are somewhat lacking. There are also issues that include how users of the future will utilise the next generation of patent searching technology. This could apply to organisations gaining a competitive advantage by monitoring others’ patent search audit trails. In parallel, the industry will see a rise of ‘naïve’ rather than ‘expert’ searchers over the next decade.


Among the many issues still to be put to the floor is how government and business alike are not utilising available patent information to assist with their commercial interests. Open Access is also considered to be a central aspect of the innovation cycle of IP, IR and IRF.


Among the other big topics to come will be the difficulties of overcoming the language gap around the globe, as well as patent search technology. More to follow…


A full analysis on the future of patent search technology will feature in December’s issue of IWR.

Tuesday, 6 November 2007

Microsoft drops the search bomb

Some football fans criticise David Beckham because "all he can do" is cross the ball. All Microsoft can do, in some pundits' books, is take market sectors that were once the domain of specialists charging pricey tariffs, and make them available to all. Today, Microsoft began to do that to enterprise search.


This might be looked back on as a rather momentous day in search as Microsoft's Search Server announcement is likely to change the rules of engagement in the field. The most notable points of the announcement that make it such a disruptive move are:


One, neither the paid-for product nor the free product, Search Server Express, has a ceiling on number of searchable documents.


Two, neither the paid-for nor the freebie SKU requires a dedicated server.


Three, there are out-of-the-box connectors to FileNet, Documentum and Lotus Notes.


Four, this is a pure search move that does not lean on SharePoint, Microsoft's smash-hit entry into ECM.


Five, and this might well be the biggest factor, this is Microsoft, so the brand and ability to integrate with other key infrastructure will be huge plus factors for many buyers.


The net effect of the move will be to put pressure on Google's search appliances and IBM's OmniFind. Microsoft will not stop there, though: it is already talking about Office '14', the next major release, packing in high-end features. Autonomy made a very smart move in buying Zantaz, investing in video search and making other moves that have taken it away from basic enterprise search, which is showing every sign of being commoditised.


Google has done a sound job and IBM made a bold move in introducing its free version of OmniFind last year but the Microsoft manoeuvre is a major land grab. We still need to see the product and pricing details but if rivals are feeling afraid, we can understand their fear.

Monday, 5 November 2007

IWR at the Information Retrieval Facility Symposium 2007

Towards the end of the week I will be blogging live from the Information Retrieval Facility Symposium (IRFS). This year the meeting will be held in what I am told is the very fair city of Vienna.


The event is billed as a convention where “Science meets Business”, and the experts in attendance will attempt to hammer out how best to handle the vital retrieval of digital patent information as well as its storage.


The organisers say that as the field of Information Retrieval is at such an experimental phase, this will be the driving force behind the conference sessions.


The two-day event will be split into five sessions and a series of working groups. Kicking off on opening day will be the Data Quality session with speakers from GlaxoSmithKline and the Royal School of Library and Information Science.


Session 2 will focus on Language gaps in the Information Retrieval world, particularly in relation to original Chinese and Korean documents.


Session 3 is entitled Corpus Enrichment, or rather how technology can be utilised to recognise and extract the implicit information in a document. This can include the difference between preamble, a detailed description, claims made in a patent document or alternate examples. 

Sessions 4 and 5 will examine the related tools available for information professionals and management and research respectively.


I will be covering the significant areas, developments and intellectual jousting from the moment the doors open on Thursday morning until they close on Friday evening.


* A full analysis on the future of patent search technology will feature in December’s issue of IWR.

Friday, 2 November 2007

Free concept search from Yahoo!

In the relentless game of public search engine leapfrog, Yahoo! may have just leapt into the lead. It has added some fresh intelligence to help the hesitant user.


No doubt other search companies will already be unpicking the Yahoo! offering to see whether they can improve on it. And it's highly likely they can. But, for the moment, Yahoo! is the benchmark with its 'Explore concepts' extension.


Yahoo! senses when a user hesitates while typing a search query and pops up auto-complete suggestions. This, of course, is not a new idea. The new bit comes when you get to the results page. If the results aren't what you expected, you can click on a little arrow to drop down a panel containing the autocomplete suggestions and an 'Explore concepts' section to the right:


[Screenshot: Y1]


In the above example 'information world' produced, not surprisingly, over a billion results. The autocomplete would have done the job for you and me, but the concepts on the right are designed to help the user who's thrashing around a bit.


As you click on each concept, new search results appear along with the concepts which relate to the new search expression.


I tried 'electron spin resonance', about which I know little. After clicking through 'free radicals' (thought I might hit a political pressure group), I was intrigued by 'magnetic moments'. Here's the first result.


[Screenshot: Y2a]



Bear in mind that this functionality is not part of some hugely expensive enterprise search system; it's freely available to the general public.


If you're like me, you've regarded Yahoo! largely as an organisation that wants to push ads at you and keep you within its semi-walled garden. A bit of a turn off, especially in Europe apparently. But developments like this and, in a totally different context, Pipes suggest that Yahoo! has realised that life is not all about take, it's about give as well.

Thursday, 1 November 2007

Wiley goes on Safari

Global publisher John Wiley & Sons is not afraid of new technology and ventures, as I recently discovered in a meeting with them at the Frankfurt Book Fair. The Bookseller reports today that Wiley has now inked a deal with Safari Books Online, an increasingly important on demand reference platform.


Wiley will add its business and technology reference books to the Safari platform and see its content aligned with other leaders like Pearson, O'Reilly and the publishing arm of software giant Microsoft. Wiley will add its For Dummies books, which it acquired from web and magazine publisher IDG, and the Bible range of computer books.


This is an important deal. Reference books are still an amazing resource for users, and a method of information delivery and publishing that still has plenty of legs in it. Like all information resources though, it is a sector that has been threatened by amateur services like Wikipedia. Reference is clearly an information set very well suited to the web. Safari is a platform that offers a genuine alternative to Wikipedia. Because the content on Safari is from credible publishing companies that check the veracity of information, use knowledgeable experts and put a great deal of effort into the writing, editing and presentation of the information, it is more credible than Wikipedia. Wiley has increased the desirability of Safari and improved reference information on the web.

Monday, 29 October 2007

Facebook again

Last week's round of investment in Facebook is enough to make any Dot Com veteran quake in their boots. It's not that Facebook isn't good - it's stunningly good. It's not that this particular dance involving Microsoft and a fresh young company hasn't been danced before, either - just take a look at High stakes, no prisoners for a good idea of some of the behind-the-scenes shenanigans that undoubtedly went on.


In short, Microsoft invested $240m, and two unnamed hedge funds stumped up another $500m, valuing Facebook at $15bn. It's a stunning number, something we haven't seen in a long time. Rightly, it scares the living daylights out of many people, myself included. It brings back memories of years past, when things were valued on the basis of hype, on the basis of an ever-lifting market, on the basis of a new technology trouncing the status quo and ushering in a new dawn of techno-wondrous utopia. But enough about the telegraph.


However, there's another way of looking at the investment, particularly from Microsoft's angle, that makes a little more sense. Remember Microsoft's recent moves into advertising? Chucking $6bn at aQuantive is a far bigger investment than a piddling $240m at Facebook. Any lay person might assume that Microsoft valued aQuantive more highly in terms of business use than Facebook; after all, it's punting around 25 times as much money on aQuantive as it is on Facebook. Of course, it's not that simple, but it's an interesting thought.


Techdirt has itself a very juicy idea of what Microsoft is up to - and it's all about aQuantive and Facebook's advertising plans. In short, the investment in Facebook is a place holder, something that makes a marketing statement about Microsoft's intentions in online advertising. I can't wait to see all of this play out.

Who do they think they are?

Today’s Guardian reports that family historians (both professional and amateur) have raised concerns over the lack of access being given to paper-based archives. The report says, “There will never again be public access to the paper records.”


The problem is that the schedule for putting the paper material under lock and key and the go-live date for a replacement online version are, surprise, surprise, out of sync. It seems there is a gap between the timetable for digitising the original material and the point at which members of the public get access to legible resources.


As the paper records are no longer there, anyone conducting genealogical research will instead have to make do with microfiche until the online system is completed.


According to the Guardian report, the researchers have concerns about both the legibility of material held on microfiche and the daunting task of having to search through it. They resent being forced to use old and clunky technology until the new digitised system is up and running. “Not even God himself is going to be able to find most of this stuff,” said one amateur researcher.


Sarah Williams, editor of the BBC TV show spin-off magazine Who Do You Think You Are?, said: “The sweetener was that the paper records would be replaced by a superior digital version. But to lose one before the other is ready is a highly questionable decision.”


A spokesperson for the Office for National Statistics, which is responsible for the General Register Office, says, “When our project to create a massive online index of 250 million births, marriages and deaths is complete, it will dramatically improve public access to information of interest to family historians.” They went on to say, “The present target is to have the online index available by mid-2009.”


Until that present target is either met (or moves), some researchers may want to polish up their microfiche skills; they are going to need them for a while yet.

Cosying up to Microsoft in a crowded bed

Anybody who has been watching the TV adaptation of Fanny Hill recently will recognise that hopping between beds can be the fastest way to win friends and influence people. In enterprise software, that truth has long been recognised, hence Open Text's announcement that it is cosying up to Microsoft by opening up a development office in Redmond, home, of course, to the world's largest software company.


In a statement, Open Text said: "Our relationship with Microsoft is founded on customers' need for complementary ECM solutions that blend the strengths of Microsoft and Open Text, bringing the power of Microsoft's productivity tools and ubiquitous presence on the desktop, together with our ECM solutions and vertical-market expertise."


Quite so, but the problem for Open Text is that everybody has the same idea and Microsoft's bed is very crowded these days. Everybody wants to gain a lever from Microsoft's ubiquity by integrating its software with key programs and by copying its look and feel. This has been Software Marketing & Development 101 ever since companies such as Corel and Micrografx saw there was business to be had in building applications for Windows.


A secondary driver is the fact that Microsoft is eating the lunch of ECM companies, thanks to the remarkable success of SharePoint. Firms like Documentum are reduced to hoping that customers use SharePoint at the front end and their "grown-up" products at the back end. This, they hope, is the new realpolitik, although even this compromise might be delusional.


In ECM, you can't spit without hitting a company that claims to be in cahoots with Steve Ballmer's men. Many claim to have "special" relationships, for example in developing for certain vertical industries. It's no secret that these companies care about Microsoft more than Microsoft cares about them. The only time that will change will be the day Microsoft decides it needs to buy one of these companies. Then, at last, there really will be a special bedfellow.

Thursday, 25 October 2007

A chance to help Mariella

Dear Mariella,


Enjoyed the repeat of your Open Book programme today. I'd sneaked away from the computer for a bit and up you popped on the radio. It was interesting to get your take on the world of social computing. Like many people who aren't involved, your incomprehension was quite a treat. Afterwards I wondered if you were doing it on purpose to wind up two of your guests, Victoria Barnsley, boss of Harper Collins, and David Freeman, founder of Meet The Author.


Since I write for information professionals who are interested in both books and technology, I thought it might be interesting to get a conversation going on the value, or otherwise, of the internet to book authors.


Of course, this blog post could just languish, like most do, or it might trigger some interesting feedback. That's the nature of the web. People take a look at stuff and, in moments, decide whether to linger or move on.


The note below is to put my readers in the picture. Feel free to join in.


All the best,


David


The programme involved two websites: Authonomy, from Harper Collins, which will give authors a place to upload 10,000 of their words so that visitors can decide whether it's any good; while Meet The Author plays recordings of authors talking about their work.


Mariella suggested to Victoria that Authonomy was "just a cynical way to get the general public to do the work for you" and "ultimately it's a way of you getting your paws onto new work and creating a degree of ownership  over it before you've had to commit to it financially in any way." Ouch.


The answer is, of course, that people have a chance to make an impression and get picked up for consideration by Harper Collins or, indeed, any other agent or publisher who happens by.


Mariella found it hard to believe that Harper Collins would not be "upset" if another publisher snitched talent from the Authonomy site. Victoria suggested that this would prove that the site was a huge success. Mariella retorted with, "but isn't it just like a talent show for authors? Like something you'd expect to see on ITV?" She threw the same accusation at David Freeman.


Not surprisingly, both speakers more or less agreed with her. Victoria noted that tens of thousands of authors might get read who otherwise would have been ignored. David suggested that if publishers and agents liked the author's pitch, they might ask to see their work.


In the end, good writing is essential to being published. But these two sites offer much needed visibility and promotion for unknown authors, a way to emerge from the fog that surrounds agents and commissioning editors.


But in publishing, as in the rest of life, the democracy inherent in the internet is a bit hard to get to grips with. It may be a little threatening to people in conventional positions of power.


Would anyone care to comment?


PS I just checked out the 'Meet The Author' site and it operates on vanity publishing lines rather than YouTube. Authors pay for the privilege. I suspect this will not be the case with Authonomy.

Tuesday, 23 October 2007

Bibliographic benchmark for digital works is implemented by gang of four

The British Library, Library of Congress, National Library of Australia and Library and Archives Canada will synchronise their practices, in applying the new Resource Description and Access (RDA) classification system. Designed specifically for organising and retrieving library materials in a digitised age, the RDA system replaces the Anglo-American Cataloguing Rules.


It is expected that the transition will be put into practice at the close of 2009. During the run-up to this period, any training, documentation or national application decisions made will be shared with the other three partner institutions.


The new joint initiative is good news for librarians, cataloguers and researchers: the system’s key feature is a big increase in flexibility when cataloguing new media. Because RDA is an online resource, cataloguers can be far more flexible in their description of digital content, while staying compatible with existing online catalogue material.


Under the RDA framework, information is recorded according to its own set of rules, independent of how that data is displayed elsewhere. This ensures that there is flexibility in presenting records on a variety of online platforms. Ultimately, this should mean data entered into the system will be more adaptable to up-and-coming database technologies as well as to the viewing platform.

Monday, 22 October 2007

Nuxeo - ECM's best kept secret?

With all the excitement over mega-mergers and corporate governance mandates, some of the smaller companies in enterprise content management probably haven't had their fair share of publicity in the past few years. That's certainly true of Nuxeo, which remains one of the best-kept secrets in ECM.


Why so quiet? Well, Nuxeo has its roots in France and, in part, you can blame the media's (OK, people like me) obsession with North American companies and the ups and downs of firms with huge revenue streams. In part, it's also probably because Nuxeo's platform is built on open source and open-source outfits tend to do things by stealth. Anyhow, whether it likes it or not, Nuxeo is due some attention.


The company has just released an update to its core Java-based ECM, adding a couple of features such as new search capabilities and stronger data import/export, but the real story about Nuxeo is its customer base, which reads like a Who's Who of French business and government. The company has a London office and is striving to go beyond its local market -- it's worth getting to know.


Friday, 19 October 2007

Facebook the facilitator

Jackie Cooper PR runs an 'anonymous' (not any more) blog called The Pirate Geek. A couple of days ago, it posted a paean to personal contact, honesty and the demise of the cult.


I arrived there from Edelman Analyst Relations man Johnny Bentwood's Technobabble 2.0 personal blog. Cheers Johnny. (Although, for the record, I should mention that Edelman owns JCPR.)


Anyway, this all happened shortly after BIMA's 'Great Facebook Debate' which I tipped you off about earlier this year (more on that in November's column) in which a bunch of knowledgeable people on stage, and an even bigger bunch of knowledgeable people in the audience, debated the merits and otherwise of Facebook.


Facebook, for those unfamiliar with it, is a place to hang out, link up with friends, see what they're up to, find and join groups of like-minded souls, do silly things like 'poke' and 'super-poke'. Shame about the translation - think 'nudge' and you'll be close. Groups can be closed or open, public or not. And a ton of plug-ins allow you to do other things. Think 'long tail'. The majority are pointless to the majority of users. Vampire bite anyone? Gift of a toilet roll? Some are useful too.


You may be astonished to learn that businesses are taking to Facebook in droves. Whether they're enlightened or mad remains to be seen. But the BBC (is that a business?) and BT (that definitely is) have thousands of users.


Because Facebook was honed in the hothouse of the university (it started in Harvard) its focus was on facilitating relationships and friendships. It's mind-numbingly easy to use, compared with, say, a wiki. It is intensely social. But it's not just about online communication. In the end, it's a facilitation mechanism for personal contact with the people you really want to be with, whether they're at work or out there in the real world.


In the workplace, especially, the consummation of stimulating online hookups has to be physical meetings because that's where the real relationships form and the real work gets done.

Thursday, 18 October 2007

Forecast the library of the future

On my travels this week I had the pleasure of visiting Sheila Webber, Department of Information Studies at the University of Sheffield. Apart from discussing her two passions of information literacy and learning in Second Life (of which there will be more to come), I had the opportunity for a whistle-stop tour of their new Information Commons building. It’s impressive: light, airy and spacious, even though it’s packed full with learning resources. Significantly, a large amount of room is given over to hardware and apparently there is even a shower for scholars should brains start to overheat. The students, quite rightly, look like they are in their element.


All this ties in nicely with a recently launched initiative called The Really Modern Library. IWR columnist David Tebbutt touched on this briefly last month. In case you haven’t heard about it, the joint project is being developed by the Institute for the Future of the Book (if:book) and the Digital Library Federation. A series of meetings is being held throughout October, in LA, London and New York. The team behind this are currently asking for ideas and comments on their blog.


Initially, the scheme hopes to open up the debate on mass digitisation of analogue works. This, the project leaders believe, means addressing the tricky task of preserving analogue material while also respectfully utilising its potential in a digital universe.


By considering the prospective challenges and opportunities for imagining the library of the future, the project organisers want to nurture innovation and creativity. Up for consideration will be ideas on new interfaces, designs and models of library delivery and management of information. Such ideas could range from how networked collections will be accessed and used, to developing new tools and ways of approaching analogue work in a digitised world.


Ultimately how that will benefit libraries, publishers, academia and the arts is central to the debate.


The if:book blog goes into much greater detail. They want ideas from a broad range of professions adding good suggestions to the mix. As librarians, information professionals and the like, I’m sure you will have some pretty good thoughts of your own that you know should be heard. The blog is accessible here.

Wednesday, 17 October 2007

Cause and effect

This post comes courtesy of Metafilter. I could bash on about how, if I'm looking for something  interesting or  controversial when aimlessly browsing I go to Metafilter and not a search engine. I could draw a comparison between the effectiveness of Metafilter as a search engine for really cool stuff and the primacy of a certain search engine. Or how Metafilter does the job right first time most of the time, while the likes of BoingBoing et al show merely occasional flashes of brilliance when compared to the massively parallel user model of Mefi. But I won't, because they're all a bit tenuous, to be honest.
Instead, I'd like to point you toward a posting on Metafilter; if Google were optimised for Google. Click through the page, and it's possible to see how search engines have changed the physical appearance of the web. We're all aware to a certain extent of how external influences change the design and layout of sites, but I was stunned to see the sheer volume of cruft, crap and extra verbiage added to the page in the name of SEO.

Tuesday, 16 October 2007

Blackwell's boss resigns

René Olivieri, chief operating officer at Wiley-Blackwell, the academic book and journals publisher, has resigned, reports The Bookseller.


Olivieri was ceo of Blackwell when the company merged with Wiley in a surprise move last November. Since the merger Olivieri has been heading up the transition team as chief operating officer, a role he has held since May.


He has had a long and illustrious career at the Oxford-based publisher, starting out as a publisher in the 1980s, before becoming an editorial director, deputy md, and managing director. The Bookseller reports he became ceo of Blackwell Science in 2000 and stepped into the role of Blackwell Publishing ceo a year later.

Monday, 15 October 2007

Time for Oracle to show its hand in ECM

Oracle's enterprise content management has been a long time a-coming but, with the Stellent acquisition done and a new release of Universal Records Management under the famous red logo, the pieces are finally falling into place.


I'm still a little puzzled over where Oracle's organic efforts have ended up. The company was leaking plenty of information about a move into ECM well before the Stellent announcement, with the project originally dubbed Tsunami, but never made a big splash in the sector. The acquisition of Stellent was something of a surprise given how much work Oracle had done internally, and also because of the relatively small scale of the purchase.


Now, it's time for Oracle to front up in ECM and do a better job than it has done so far in explaining where it sits against the likes of EMC-Documentum, IBM-FileNet and Open Text. So far, the talk has been of baking ECM into Oracle's Fusion middleware. Sage heads will doubtless nod along but to me this is as clear as gravy. Sure, it makes sense that Oracle's ECM tools hook up with other programs from the vendor, but information managers need a better perspective on how Oracle will support the product on platforms that are rivals to Oracle in other sectors.


They also need to hear about product development plans, service and support, Oracle's view on emerging standards, and all the other components that make the ECM world go round.


For some time now, Oracle  has spent more time acquiring than explaining. Its desire for scale is understandable at a time when supplier rationalisation is on many IT departments' agendas but the remarkable merger-and-acquisition rip the firm has pursued needs to be backed up with a little more beef.


This is particularly the case with ECM and not just because this is virgin turf for Larry Ellison's company. Some watchers will have you believe that the sector is just another bunch of code to be folded into the enterprise software broth. It's not. Those who work most closely with ECM tools are often not techies but people with long experience of archiving and librarian skills. These are people who value a close relationship with the supplier more highly than users of most other elements.


They don't need hand-holding or puppy love as some vendors seem to think, but they do want to feel they have the attention of their supplier. Oracle needs to recognise that if it is seriously seeking to conquer another enterprise kingdom.

Friday, 12 October 2007

Touchgraph link visualisation

Jon Collins, a Freeform Dynamics colleague, kindly introduced me to Touchgraph, a beautiful and practical way to discover the patterns hidden inside data sets. The data can be in local or remote databases, spreadsheets, XML files and other formats.


To give a taste of the functionality, Touchgraph has provided a number of useful and free services. You can rummage Amazon books, music and movies or present the results of a Google search. To illustrate the principles, I asked it to find the links between my top 50 Facebook friends:


[Image: Touchgraph visualisation of links between Facebook friends]


The size, shape and colour provide an at-a-glance interpretation of clusters, relationships and relative importance. The detail appears in a table on the left of the screen (not shown here) and you are provided with a number of selection, filtering and editing tools.


In the picture above, I selected Gapingvoid's Hugh Macleod to see what mutual 'friends' we have. My own set of friends is quite limited because I reject most requests for 'friendship'. Anyway, back to the picture. The green London and San Francisco blobs are networks. You just click on one to see who belongs to it.


A Google search is probably more interesting, not to mention less narcissistic. After deleting the obvious rubbish hits, you can see the patterns hidden in your search results. It's like clustering on steroids.


Here are the results for a Google search for '"online information conference" 2007'. (Plug. Plug.)


[Image: Touchgraph visualisation of Google search results]


The isolated pink cluster is 'marketing opportunities' and the isolated mauve cluster is 'committee members'. I threw out a couple of irrelevant clusters - you're bound to get them with Google.


Enjoy. See you at the conference?

Thursday, 11 October 2007

Specialist publishers ride high at Frankfurt Book Fair

At a major international publishing event like the Frankfurt Book Fair the bright lights of trade publishing and all its household star names could easily drown out the academic and scientific publishers. But this has not been the case.


Talk at the event, in all circles, is about books and technology, in particular search and eBook readers. On both subjects the specialist publishers are leading the way and the trade publishers salute them.


Amazon and Sony were expected to steal the show with their eBook readers. They are instead conspicuous by their absence, but that has not stopped publishers and technology providers from talking about the devices and their potential.



I was particularly interested in a conversation I had with scientific, technical and medical publisher Wiley, where they hinted that they and other specialists may get involved in driving the adoption of eBook readers. Could we see the eBook reader adopt a similar model to the mobile phone, where users sign up to a subscription service, content of a particular kind in this case, and in return they get a sleek and sexy device? It's certainly worked for the mobile industry, which now resembles the car world with its emphasis on styling and marketing.



But such a move could also be a blind alley. As one expert said to me, these devices don't support the interlinking and interactivity that content users are currently enjoying with the web.


During the fair Google, Ingram Digital Group and Amazon have all used the scientific and academic publishers as case study beacons for just what can be done with books on the web.


Geographically, the Far East is the leading adopter as its markets develop radically, according to Mark Carden, Ingram senior vp.


Perhaps Amazon spread rumours of a possible launch to see if there was real interest. Well, if the level of conversation we've heard is anything to go by, the eBook reader is in demand.