Information World Review: November 2007

Thursday, 29 November 2007

Alfresco securely binds Facebook to ECM

You hear about organisations such as BT and the BBC adopting Facebook as the place to hang out and connect. A friend in the BBC told me they were all "addicted" to Facebook. Perhaps she should be told not to use that particular word at her next performance appraisal.

JP Rangaswami, MD of BT Design, is hugely in favour of Facebook because it creates a formality and permanence around conversations which were once the province of the water cooler. He can see what's really going on rather than have to believe what the org chart tells him. He also likes the idea that the infrastructure to run Facebook is external to BT and therefore someone else's problem.

As you know, many other companies are terrified of letting their staff loose on such social networks and actually ban them, despite the fact that many staff are familiar with them and use them elsewhere in their lives.

Software vendors either want to create their own equivalent in order to keep control or they reluctantly allow Facebook into a side panel of their main applications. They probably don't want to give too much functionality away in case it undermines their business model. I wouldn't like to hazard a guess at what Microsoft is up to with its shareholding in Facebook.

But then along comes enterprise content management company Alfresco with the idea that it will not only accept Facebook, it will cheerfully integrate it into the company's repertoire. It allows registered users to publish and share documents and other information in a controlled, secure and auditable environment.

In case you've not heard of Alfresco, it started a few years ago with the intention of being a free Documentum but five times faster and ten times cheaper. The apparent conflict between 'free' and 'cheaper' is easily resolved: anyone can download and run the software for free, but if they want support they can jolly well pay for it.

Since then, it has fully embraced the social networking world, blurring the boundaries between front office and back and putting content production and consumption into the hands of the many rather than the specialised few.

Company founder John Newton is a panellist at the Online Information conference next Thursday at 11:30. The debate will be about the death of proprietary content management. Wishful thinking? Or are these guys onto something?

Our tax levels cause disasters like HMRC

I was meant to be going to the House of Lords tonight. No, I haven't spent the missing IWR marketing budget on a Labour party donation and the offer of a peerage from Tony Blair. Tonight's rare opportunity to enter the hallowed chambers of the Lords was for the launch of Information Matters, a guide to good information management practice.

Obviously this has become a bit of a hot potato for the powers of Whitehall, and I was not totally surprised to hear that the event has been "postponed". I am disappointed, though. Now I really will have to donate money to some political party that will change its policies from day to day to suit its sponsors!

But cynical disbelief in political parties aside, the debacle at HMRC is not an opportunity to clobber the current Labour government; they can do that on their own. This now needs to be a debate about the quality of service we desire. The mistakes that took place at HMRC happened because of poor policy and, in all likelihood, a demotivated, underappreciated and underpaid staff. These factors will lead to disaster in any organisation.

Sadly, as a nation we are demanding a John Lewis service, yet are only prepared to pay a Tesco budget brand price for it. Our government and political parties fear spending public money, or worse, the public and the Daily Mail discovering that public money has been spent. Yet cuts in budgets and overstretched departments have led to this scenario and could lead to more.

It is ridiculous that a country as rich as the UK, which is experiencing unparalleled levels of growth, is trying to run its infrastructure, which after all is what our civil service is, on a shoestring. We have politicians tempting us with tax cuts, yet clearly they cannot balance the books with the revenue they have. How will public information be well managed and secured in a state that has even less revenue coming in?

The awful mess at HMRC needs to spark a debate about how we want our nation to operate. Groups and parts of the media are quick to call for changes to immigration levels, but let's have a debate about the quality of our services, all of them, whether it's schools and hospitals or departments looking after taxation or defence. We cannot lower taxes when our troops are being put at risk in ill-equipped vehicles in Iraq to secure oil and our civil service is making basic mistakes with valuable data.

It may not be a popular move, but as a European nation that expects its authorities to provide child benefit, shouldn't we at least pay a proper level of taxation to meet those expectations?

Wednesday, 28 November 2007

Exploitation 2.0

I got a smashing email the other day from a fellow Flickr user. Apparently, they'd shortlisted a picture of mine. How exciting.

Well, turns out, not that exciting. The Schmap shortlist was for a so-so picture taken in Brighton to be published in their online guide to Brighton. So far, so good. Unfortunately, I wouldn't be paid, and Schmap, and I'm presuming its owners, get perpetual worldwide rights to the image. Free. If, like me, you love free stuff, that's great news. Except when you're giving stuff - free - to a company that will make money from it. Because although Schmaps are free at the point of consumption, the company makes money by selling advertising off the back of them.

Of course, to look at the web, this is great; Schmaps has clearly got its messaging spot on, and there are tons of Flickr users who think that being published - albeit without being paid for their work - is about as exciting as it gets. Some professional photographers are particularly excited, of course.

Diluting commitment to Open Access (OA)

At the beginning of the month the World Health Organisation's (WHO) Intergovernmental Working Group on Public Health, Innovation and Intellectual Property (IGWG2) convened to discuss the issues of Open Access (OA) and reach a global consensus for developing a strategy. What emerged was a weakening of the language requiring scientific publishers to comply with an OA model.

Manon Anne Ress on the Knowledge Ecology International blog details how the draft document on global strategy, penned at the end of summer, originally proposed to “promote public access to the results of government funded research, through requirements that all investigators funded by governments submit to an open access database an electronic version of their final, peer-reviewed manuscripts”.

By the time the IGWG2 met again in November the word “requirements” had been replaced in the document with “strongly encouraging”.

OA champions and commentators questioned how this would affect OA, and whether “encouragement”, even if it was “strong”, would actually achieve anything from publishers. Significantly, they also questioned why the change had happened and, in particular, who had made the call.

Peter Suber, over at Open Access News, asked his readers: “which national delegations inserted the strong language in the first draft and which wanted to weaken it in the new draft?”

I haven’t seen any follow-ups that specifically name names there, but as Ress reports in her post: “there was some opposition to the “requirement” language by some European countries”. It’s not unfair to say that, with the exception of CAS and Wiley, the scientific publishing world is heavily dominated by European firms.

Most of these companies are attending our sister show Online Information next week and I’ll be sure to ask that question when I catch up with them then. I hope someone is happy to elaborate on this. After all, is there a reason to be anything but open about this?

Monday, 26 November 2007

SaaS might not fit enterprise search

The rise and rise of software as a service has been such a mantra in the IT media over the last few years that it comes as something of a shock to see the SaaS model actually on the wane in enterprise search. Nevertheless, a recent report by CMS Watch says it plainly: “[the SaaS] model has been a hot topic recently [but] the SaaS model for enterprise search is on the decline”. So, what’s going on here?

CMS Watch itself lists three possible reasons: the preponderance of web-only search in SaaS offerings; the popularity and ease-of-use afforded by appliances; and the competition-squishing presence of Google in the sector.

Let me suggest two more: the fact that free is a compelling price, and the notion that SaaS might not be all things to all men.

Companies looking for a search service today will inevitably be attracted to freebie tasters, especially when the companies offering them -- Microsoft and IBM -- are as big as they come. As discussed earlier, these are highly attractive inducements that offer familiar environments to try out, and a solid upgrade path for those who want to carry on afterwards.

Second, it’s time to admit that SaaS has no Midas effect, except perhaps on marketers. The on-demand model has had a revolutionary effect on customer relationship management and sales force automation, and it is changing the way human resources operates, for example in measuring employee performance. But there are many, many other areas where it has had little or no effect. Even in the much-hyped area of productivity applications where Google and various startups have generated scads of coverage, there has been close to no impact on the hegemony of Microsoft Office, for example.

SaaS is a hugely important trend but privacy concerns, the need to delve into far-flung corners of the enterprise and ancient applications, and sundry other factors mean that search is unlikely to be a happy hunting ground for the model in the immediate future at least.

Thursday, 22 November 2007

The Jimmy Wales keynote

Jimmy "Jimbo" Wales is the keynote speaker on the first day of the Online Information conference.

He comes across as a genuine and thoughtful person with a huge commitment to open source, transparency and community involvement in projects. Wikipedia was the first result of his enthusiasms.

His keynote is entitled "Web 2.0 in action: free culture and community on the move". This suggests that he will be forward-looking and concentrate on his current long-term Wikia or Wikia Search projects (blogged here in January) rather than backward-looking and talking about Wikipedia.

If you are used to the traditional approach to computing, you'd expect a destination to be defined and the route to that destination mapped out with checkpoints along the way. Be prepared for a shock.

The approach that Wikia Search takes is to define a set of principles and the general structure of the project. This then acts as a magnet to the sort of people who are interested. They form communities depending on their specialisations and, at some point downstream, the great mass of the general public get involved, using the tools developed by the initial community.

Where Wikipedia involves the world in contributing original material, the Wikia project is concerned with clothing existing information with value. The theory is that this will help refine search results and, partly through complete transparency and partly through community influence, be very difficult to game.

Just because Jimmy Wales made a great name for himself with Wikipedia doesn't mean that he can succeed again with search. But he's taking a pragmatic approach by doing the spidering and indexing just like the other engines but then using humans to refine the results by thumbs ups, thumbs downs and other more sophisticated assessments.

The project will take time to evolve and it's possible it will challenge the existing search giants in the same way that Wikipedia has become a port of call for millions more users than traditional encyclopaedias. Who knows? Even Wales doesn't. He certainly never talks that way. But his thoughtful exploration of the issues around community development and participation should make for interesting and challenging listening.

See you there?

A sorry state of affairs

This country has seen better weeks than the last seven days. I’m not a massive football fan, but it was still disappointing to see none of the teams from the UK go through to the European Championships last night; oil meanwhile is nearing, if it hasn’t already broken, $100 a barrel; and then there has been this week’s incredible mistake at HM Revenue and Customs (HMRC), with your valuable information languishing somewhere (with someone?).

The blame has been shared between a junior member of staff, who apparently sent the disks, and HMRC boss Paul Gray, who quit last week when the scale of the problem became evident. Chancellors previous (PM Gordon Brown) and current (Alistair Darling) are pointing out there is no evidence that it has ended up in criminal hands, yet.

According to the BBC, the official line is that the information is "likely to still be on government property".

Reassuring.

People are quite rightly perplexed and angry as to how this could have happened. As information professionals, I imagine you are probably more so than most. Understanding the complexities and processes that are involved and the safeguards that should be in place with information of this kind falls within your sphere of expertise. In some cases, so do organisations like HMRC. Seeing the incidents around this sorry tale unfold must leave you shaking your heads in despair.

Anyone feel the information profession side has been let down, never mind the population?

Tuesday, 20 November 2007

Bodleian Repository Plans scuppered

Not great news this week for the Bodleian Library which has had its plans to build a new book depository rejected.

Back in September I blogged on how the eminent, ancient and utterly congested library got the thumbs up to build a new state of the art site near the city centre. It was intended to house 8 million books and to relieve pressure on the library, which is currently operating at 130% capacity and adding 5,000 books per week.

The local council gave two reasons for the rejection: the first was concern over how the new repository would affect the city’s attractive skyline, the second that the site was on a flood plain.

Dr Sarah Thomas, Bodleian chief, said: “We worked with the Environment Agency, English Heritage and the city, and ultimately a solution will present itself”. Local paper the Oxford Mail also reported that Thomas said it was too early to consider an appeal or alternative locations for the new building.

One thing that is not going to go away is the urgent need for a safe, secure space. How long will it take until the library’s seams start to burst?

Monday, 19 November 2007

Facebook: the new go-to platform for ECM?

There's a famous Bob Dylan press conference of the 1960s where a hapless journalist asks the young singer to define his music "for people like me who are well over 40". Dylan, for it is he, answers that it can be defined as music "for people who are well under 40". As a member of the fifth-decade club myself, I can empathise with the old hack -- my ancient critical faculties don't stretch to understanding the Facebook phenomenon.

What I can understand is that Facebook and other popular social networks have tremendous reach, and, therefore, offer tremendous opportunities and, by logical extension, carry an equal degree of risk. Enter Alfresco and its announcement that it is integrating with Facebook. I'm not too sure that this wasn't in part a clever PR stunt that exploits the fascination with the website du jour but it makes complete sense for ECM firms to be applying their management tools to content that is exposed on sites like Facebook.

As even the most conservative companies recognise that exposing content to blogs, wikis, podcasts, social networks and other formats will be a necessary part of their futures, that content will need to be managed. The ECM system should become the preferred alternative to piecemeal approaches to managing content, and companies that neglect to protect that content will be in trouble.

Having said that, I'm not convinced that Forrester's Kyle McNabb is right to suggest that this is the end of ECM as we know it. Large companies will be among the last to adopt the latest social media, but that is no excuse for ECM firms not to build in necessary controls ahead of demand.

Thursday, 15 November 2007

Cut the spin, save the world

I wonder if our lords and masters (our servants really, although that's difficult to believe) ever consider the environmental consequences of their decisions? Take the national identity card. Will the storage drives need to rotate perpetually in case anyone decides to check us out? Or could the forces of law and order be happy to wait while a drive is fired up and rummaged? If the drives are running continuously, has anyone worked out how much energy would be needed to run them, the computers that access them and the systems needed to cool them?

My guess is that the people who conceive these surveillance projects do not bother themselves with such matters. Yet, even if we're not yet running out of energy, it gets more expensive by the day and, of course, most of it contributes to the carbonisation of the planet. Surely any politician worth their salt would be careful before burdening the atmosphere with more CO2?

Which brings me to the British Library and its Microsoft-sponsored digitisation project. It's been worrying the hell out of me. I've been thinking of all those computers and disk drives sustaining substantial quantities of material that's going to be looked at only rarely. It makes no sense. But then offline tape storage doesn't make sense either.

Fortunately, a company called Nexsan has provided the Library with an answer. It's invented a MAID, a Massive Array of Idle Disks. They sit around quietly stationary until they're woken up by a request for information. This approach, according to Nexsan, cuts energy use by 96 percent. It gives an example of a conventional fibre channel storage device which consumes 187kW of power per petabyte, whether it's being accessed or not. Its own Nexsan Assureon system in Level 3 AutoMAID idle mode consumes just 6kW.

It makes you think, doesn't it? Especially if your organisation has massive amounts of 'Just in case' storage.
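For anyone who wants to check Nexsan's arithmetic, here is a rough sketch in Python. The only inputs taken from the post are the 187kW and 6kW per petabyte figures; the function names are made up for illustration, and the yearly totals assume, optimistically, that the disks stay idle all year round, which no real archive would manage.

```python
# Figures quoted by Nexsan in the post, read as kilowatts per petabyte.
CONVENTIONAL_KW_PER_PB = 187.0   # conventional fibre channel storage
AUTOMAID_IDLE_KW_PER_PB = 6.0    # Assureon in Level 3 AutoMAID idle mode

HOURS_PER_YEAR = 24 * 365        # ignores leap years


def saving_fraction(baseline_kw, idle_kw):
    """Fraction of the baseline power that is no longer drawn."""
    return (baseline_kw - idle_kw) / baseline_kw


def annual_kwh(power_kw, hours=HOURS_PER_YEAR):
    """Energy used in a year at a constant power draw (assumption: constant)."""
    return power_kw * hours


if __name__ == "__main__":
    saving = saving_fraction(CONVENTIONAL_KW_PER_PB, AUTOMAID_IDLE_KW_PER_PB)
    print(f"Power saving while idle: {saving:.1%}")
    print(f"Conventional, per PB per year: {annual_kwh(CONVENTIONAL_KW_PER_PB):,.0f} kWh")
    print(f"Idle MAID, per PB per year:    {annual_kwh(AUTOMAID_IDLE_KW_PER_PB):,.0f} kWh")
```

On these numbers the idle saving comes out a shade under 97 percent, which squares with the 96 percent claim. Any time the disks spend spinning to serve requests would eat into that.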

Information professionals guiding you to the best bits of the blogosphere - Lorcan Dempsey

Lorcan Dempsey has worked for JISC and libraries on both sides of the Irish Sea and the Atlantic. As a member of the National Information Standards Organisation, his blog on networked information and digital libraries is well followed.

Q Who are you?
A I work in Dublin, Ohio, was born in Dublin, Ireland, and spent a long time in between in the UK. I am lucky to have what I believe to be one of the most interesting jobs in the library world. I am responsible for the programmes and research area within OCLC (Online Computer Library Center). I also help shape OCLC strategic direction.

Q Where can we find your blog?
A http://orweblog.oclc.org

Q Describe your blog?
A I say that it is about “libraries, networks and services”. I suppose that over time it has become more general. At first it had more of a technical slant; now it ranges more widely. I tend to talk about how networks are reconfiguring library services and I have some recurrent threads. These include:

• Making data work harder. We invest a lot in bibliographic data and need to use it more imaginatively in our systems and services.

• Moving to the network level. No single website is the sole focus of a user’s attention. The network is the focus of attention. And a major part of our network use revolves around significant network-level services: Amazon, Google, IMDB, and so on. These match supply and demand in efficient ways. The real message of Web 2.0 is the emergence of this pattern of service: data hubs with strong gravitational pull generated through network effects.

• Being in the flow. The focus of attention has shifted from website to workflow. The network is not so much about finding things as getting things done, and we have increasingly rich support for “networkflow”. We may construct our personal digital identities around services in the browser or on the network (RSS aggregators, social networking sites, bookmarks, etc), and we use prefabricated workflows (course management system, customer relationship management system, and so on).

Q How long have you been blogging?
A Almost four years.

Q What started you blogging?
A After I arrived in OCLC I tended to send out a lot of emails. A colleague suggested that a blog might be a better model.

Q Do you comment on other blogs and what is the value of it?
A The comments on some blogs seem more important than on others.

Q What are the blogs in your sector that you trust?
A I keep a wide range of feeds in my aggregator and will focus on different ones from time to time. Again, I tend to be more interested in “voice” or those from whom I can learn something. From a library point of view, I look at Caveat Lector (http://cavlec.yarinareth.net) and ACRLog (www.acrlblog.org).

Alma Swan’s new blog, OptimalScholarship (http://optimalscholarship.blogspot.com) and eFoundations (http://efoundations.typepad.com) from Andy Powell and Pete Johnston, are informative and provocative. I find PlanetCode4Lib (http://planet.code4lib.org) an efficient and useful way of keeping up with a range of stuff.

Q What good things have happened to you that could only have happened because of your blogging?
A I have always contributed to the professional literature. But I find that blogging is quite liberating: it is much easier to write blog entries than longer pieces. It has made me write more quickly and to think about short communications.

Q Which blogs do you read just for fun?
A I look at John Naughton’s Memex 1.1 (http://memex.naughtons.org) and William Gibson’s blog (www.williamgibsonbooks.com/blog/blog.asp), and the pictures in YarnStorm (http://yarnstorm.blogs.com) make me smile.

Tuesday, 13 November 2007

BL backed with Gov bucks

It has been a pretty good week for the British Library (BL) having just won a renewed level of funding from the Department for Culture Media and Sport. The BL will receive a rise of 2.7% keeping it in line with inflation. There were fears originally that the Government’s Public Sector Spending Review may entail a reduction of funds to the library. Cuts would have seriously affected the level of service and provision the library would be able to offer. It’s good to hear that access to the reading rooms remains free, along with a variety of other services.

I imagine there has been a collective sigh of relief over at the nation’s world class library, not least from BL Chief Executive Lynne Brindley. The library’s CEO has previously said cuts would mean she would end up running “a second rate organisation”. Yet there is still the issue of ensuring the BL’s annual capital settlement remains adequate to continue with the BL’s massive national newspaper digitisation project.

Brindley also appears in an interview in this month’s Harvard Business Review. The esteemed title asks what lessons she has learnt during her tenure at the library so far. “I learned the importance of fitting communications to what the organisation wants and needs – otherwise you don’t get the buy-in,” she says.

So far it’s a good thing she got the purse-string holders to buy into her vision; let’s see if Brindley and the library’s supporters can get the continued funding they need.

Monday, 12 November 2007

EMC's lateral thinking pays off

The rise of enterprise content management over the last five years has seen the entry of giants into a segment once characterised by names only a specialist would have recognised.

Microsoft has made its play with an in-house approach that has delivered the hugely successful SharePoint. IBM has also done a lot of work behind the scenes with content management services, but admitted the need for more when it announced the acquisition of FileNet in 2006. Similarly, Oracle got a fair way down the line, tying in services with its database, but then acquired Stellent last year for its customers and deeper domain knowledge.

These were decent strategies that were characteristic of the seasoned companies that delivered them but perhaps the least convincing strategy came from EMC. The company had made its name as the Switzerland of storage, being an independent company that was not tied to servers in the way rivals IBM, HP and Sun were. EMC was pretty much a pure hardware company until it acquired Documentum in 2003, although it had signalled its intent by agreeing to buy storage software giant Legato Systems just months earlier.

EMC justified itself by saying storage needed intelligent software if companies were to automate the protection of files. Then, in a move that again puzzled many onlookers, EMC acquired VMware and said storage and server virtualisation needed to converge. Plenty of people scratched chins and wondered if EMC was imagining synergies that were invisible to the rest of us.

Today, with unstructured data continuing to grow at a bewildering speed, with compliance mandates showing no sign of letting up, with ECM and storage infrastructure walking in lockstep, and with VMware shaping up as the biggest hypergrowth company in technology since Google, nobody is criticising EMC’s strategy. Proof, if ever it were needed, that lateral thinking can work wonders.

Friday, 9 November 2007

Problems and Solutions? IRFS Day 2

After a storming start to yesterday’s Information Retrieval Facility Symposium (IRFS), enthusiasm was still running high this morning. The opening keynote for day two was given by Henk Tomas from IP Search Services; he did a great job outlining pretty much all the main issues facing information specialists and patent workers. His fellow speakers were hard-pressed not to duplicate what he had already covered. His talk is also a nice opportunity to give you an overview of the main challenges raised at the symposium.

Up for consideration was a thorough examination of why patent information is so important to small and big business alike. Tomas explained that patent information can be used as a means of keeping tabs on competitors, suppliers and emerging technological developments. It is also a way of stopping others from using a technology to gain an advantage, and it can avoid a duplication of efforts or ‘reinventing the wheel’; furthermore, patents are part of a globally accepted legal system.

Tomas identified many of the big issues to address and conquer. My top three of those he mentioned are issues I have seen raised here more than once.

• A massive rise in patent and non-patent literature in the last 30 years. Much of this is Asian in origin; the language differences and therefore difficulties are obvious. There is also a risk of drowning in information.

• A lack of standards in the patent world, particularly in terms of the point of information entry. A common database structure would also help for search purposes.

• Errors and inconsistencies in content sometimes made deliberately for competitive advantage.

Follow-up speaker was Minoo Philipp; she is the Patent Information Manager at chemical manufacturer Henkel and President of the Patent Documentation Group. She asked the audience: “Do we have a problem with patent searching? No, it’s finding the right information.” “The problem is the structure available and also the errors,” she added.

Philipp called for a global standardisation of how patent applications are made. Not all the delegates agreed, some believing a technical solution was required instead. Philipp asked whether that wasn’t treating the symptom rather than the disease.

Considering that standardisation could necessitate a change to each nation’s patent laws, that solution may be a while coming.

The dark side of social networking

We spend a lot of time talking about the benefits of different social networking systems. We talk about the conversations that are enabled and the shortcuts to meaningful relationships, whether business or social. What we rarely, if ever, talk about is the dark side. The big brotherish side that can, if it wants to, track our activities in minute detail.

If big business is involved, and it is, you can be certain that this information is like gold. Of course it wants to track you. It pays very good money for the privilege of learning as much about you as possible. And a terrific way to do this is to know who you are then watch your behaviour: what websites you visit, how much time you spend on various activities, where you're connecting from, who you communicate with, whether you're a man or a woman and so on.

The instant you log in to a service - Facebook, Yahoo!, Microsoft, whatever - you're no longer anonymous. At a recent session with a company in this space, the first couple of hours were dedicated to how users could be exploited rather than served. I won't name names because, whatever the public face of these companies, the conversations behind closed doors are likely to be very similar.

But we're willing participants. These organisations provide a platform for communities to form and, because of our desire to connect, we share all manner of personal information. Not with the host, but with our online chums. Sadly, every time we contribute something or click the mouse, we freely and unwittingly enable the host to refine our profiles and to deliver the advertisements most likely to appeal.

If we want to participate in online, public, social networking communities then it's best to assume we're regarded by many as victims rather than beneficiaries.

Thursday, 8 November 2007

Mind the Language Gap

One of this morning's IRFS Language Gap sessions asks the question of why we need a cross lingual patent retrieval system in an Asian language. Well for a start, consider that three of the top five patent filing countries hail from the region; Japan is first, followed by China in third and the Republic of Korea in fourth. The US and Europe take second and fifth places respectively.

If you operate in the patent world then sooner or later you will probably have to work with filings from Asian origins. The question is how, seeing as there are some fundamental structural differences between Western Latin-based languages and their Asian counterparts. Among numerous examples, there are six varieties of expression for the colour red in Korean, and how a word is spaced in Chinese can heavily affect its meaning and its translation to English. When applying this to the ambiguous nature of legally constructed patent documents, the challenge is considerable.

Relying on just human translation is not an option, in part due to the sheer volume of documents constantly being filed as well as existing material.

Minah Kim from the Korean Institute of Patent Information has been explaining how their cross lingual retrieval system copes with the issues but also what still needs to be done.

She called for efforts to improve quality, such as a semantically based query expansion, whereby each word in an original search is expanded to a related search term such as boat to ship to vessel to water. Time spent on a query is also an issue that needs improvement, with the average amount spent on one document retrieval being 10 seconds; that can cause problems with a 200 page document, never mind the rest.
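To make the idea of query expansion a little more concrete, here is a minimal sketch in Python. It is not a description of the Korean system: the thesaurus is a tiny hand-made table built only from the boat, ship, vessel, water example above, and the function names and the `max_hops` limit are assumptions for illustration. A real semantic expansion would draw on a proper multilingual dictionary.

```python
from collections import deque

# Hypothetical thesaurus: each term points to its closest related terms.
RELATED_TERMS = {
    "boat": ["ship"],
    "ship": ["vessel"],
    "vessel": ["water"],
}


def expand_term(term, thesaurus, max_hops=3):
    """Follow related-term links breadth-first, up to max_hops steps away."""
    found = [term]
    queue = deque([(term, 0)])
    while queue:
        current, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not wander any further from the original word
        for related in thesaurus.get(current, []):
            if related not in found:
                found.append(related)
                queue.append((related, hops + 1))
    return found


def expand_query(query, thesaurus, max_hops=3):
    """Expand every word of a search query into its list of related terms."""
    return {word: expand_term(word, thesaurus, max_hops)
            for word in query.lower().split()}


if __name__ == "__main__":
    print(expand_query("boat engine", RELATED_TERMS))
```

The hop limit matters: each extra step widens the net but also drifts further from what the searcher meant, and "water" is already a long way from "boat".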

For the short term, the plan is to consistently upgrade the system's dictionary and establish the query expansion by the end of this year. In the longer term, Kim says there must be the development of a cross lingual retrieval system for Chinese, Japanese and Korean patent documents. Addressing the language translation issues with similar models is something that Western organisations can also only benefit from.

Information Retrieval Symposium opens

For the next two days, I will be covering the latest developments at Vienna’s Information Retrieval Facility Symposium (IRFS). It is a meeting of patent experts and business leaders and is designed to address the challenges that face those who operate in the emerging industry of patent information retrieval.

The overarching theme for the convention is one of patent experts meeting information scientists and opening up a meaningful dialogue. Hopefully there will be a merging of minds, finding common ground; you get the idea.

According to joint IRF Chairman Francisco Webber, there are many complex issues in transferring knowledge between science and business. What the IRFS hopes to identify are the shortcomings in the Intellectual Property (IP) world as well as the methods and potential solutions from the Information Retrieval (IR) sector.

Keith Van Rijsbergen, Webber’s co-Chairman and professor of Computing Science at the University of Glasgow, also talked about how tools for IR specialists are somewhat lacking. Other issues include how users of the future will utilise the next generation of patent searching technology. This could apply to organisations gaining a competitive advantage by monitoring others’ patent search audit trails. In parallel, the industry will see a rise of ‘naïve’ rather than ‘expert’ searchers over the next decade.

Among the many issues still to be put to the floor, there was discussion of how government and business alike are not using available patent information to assist their commercial interests. Open Access is also considered to be a central aspect of the innovation cycle of IP, IR and IRF.

Among the other big topics to come will be the difficulties of overcoming the language gap around the globe, as well as patent search technology. More to follow…

A full analysis of the future of patent search technology will feature in December’s issue of IWR.

Tuesday, 6 November 2007

Microsoft drops the search bomb

Some football fans criticise David Beckham because "all he can do" is cross the ball. All Microsoft can do, in some pundits' books, is take market sectors that were once the domain of specialists charging pricey tariffs, and make them available to all. Today, Microsoft began to do that to enterprise search.

This might be looked back on as a rather momentous day in search as Microsoft's Search Server announcement is likely to change the rules of engagement in the field. The most notable points of the announcement that make it such a disruptive move are:

One, neither the paid-for product nor the free product, Search Server Express, has a ceiling on the number of searchable documents.

Two, neither the paid-for nor the freebie SKU requires a dedicated server.

Three, there are out-of-the-box connectors to FileNet, Documentum and Lotus Notes.

Four, this is a pure search move that does not lean on SharePoint, Microsoft's smash-hit entry into ECM.

Five, and this might well be the biggest factor, this is Microsoft, so the brand and ability to integrate with other key infrastructure will be huge plus factors for many buyers.

The net effect of the move will be to put pressure on Google's search appliances and IBM's OmniFind, but Microsoft will not stop there: it is already talking about Office '14', the next major release, packing in high-end features. Autonomy made a very smart move in buying Zantaz, investing in video search and in other moves that have taken it away from basic enterprise search, which is showing every sign of being commoditised.

Google has done a sound job and IBM made a bold move in introducing its free version of OmniFind last year but the Microsoft manoeuvre is a major land grab. We still need to see the product and pricing details but if rivals are feeling afraid, we can understand their fear.

Monday, 5 November 2007

IWR at the Information Retrieval Facility Symposium 2007

Towards the end of the week I will be blogging live from the Information Retrieval Facility Symposium (IRFS). This year the meeting will be held in what (I am told) is the very fair city of Vienna.

At a convention billed as the place where “Science meets Business”, the experts in attendance will attempt to hammer out how best to handle the vital retrieval of digital patent information as well as its storage.

The organisers say that as the field of Information Retrieval is at such an experimental phase, this will be the driving force behind the conference sessions.

The two-day event will be split into five sessions and a series of working groups. Kicking off on opening day will be the Data Quality session with speakers from GlaxoSmithKline and the Royal School of Library and Information Science.

Session 2 will focus on language gaps in the Information Retrieval world, particularly in relation to original Chinese and Korean documents.

Session 3 is entitled Corpus Enrichment, or rather how technology can be utilised to recognise and extract the implicit information in a document. This can include the difference between preamble, a detailed description, claims made in a patent document or alternate examples.

Sessions 4 and 5 will examine the related tools available for information professionals, and management and research, respectively.

I will be covering the significant areas, developments and intellectual jousting from the moment the doors open on Thursday morning until they close on Friday evening.

* A full analysis on the future of patent search technology will feature in December’s issue of IWR.

Friday, 2 November 2007

Free concept search from Yahoo!

In the relentless game of public search engine leapfrog, Yahoo! may have just leapt into the lead. It has added some fresh intelligence to help the hesitant user.

No doubt other search companies will already be unpicking the Yahoo! offering to see whether they can improve on it. And it's highly likely they can. But, for the moment, Yahoo! is the benchmark with its 'Explore concepts' extension.

Yahoo! senses when a user hesitates while typing a search query and pops up auto-complete suggestions. This, of course, is not a new idea. The new bit comes when you get to the results page. If the results aren't what you expected, you can click on a little arrow to drop down a panel containing the autocomplete suggestions and an 'Explore concepts' section to the right:

In the above example 'information world' produced, not surprisingly, over a billion results. The autocomplete would have done the job for you and me, but the concepts on the right are designed to help the user who's thrashing around a bit.

As you click on each concept, new search results appear along with the concepts which relate to the new search expression.

I tried 'electron spin resonance', about which I know little. After clicking through 'free radicals' (thought I might hit a political pressure group), I was intrigued by 'magnetic moments'. Here's the first result.

Bear in mind that this functionality is not part of some hugely expensive enterprise search system; it's freely available to the general public.

If you're like me, you've regarded Yahoo! largely as an organisation that wants to push ads at you and keep you within its semi-walled garden. A bit of a turn-off, especially in Europe apparently. But developments like this and, in a totally different context, Pipes suggest that Yahoo! has realised that life is not all about take, it's about give as well.

Thursday, 1 November 2007

Wiley goes on Safari

Global publisher John Wiley & Sons is not afraid of new technology and ventures, as I recently discovered in a meeting with them at the Frankfurt Book Fair. The Bookseller reports today that Wiley has now inked a deal with Safari Books Online, an increasingly important on-demand reference platform.

Wiley will add its business and technology reference books to the Safari platform and see its content aligned with other leaders like Pearson, O'Reilly and the publishing arm of software giants Microsoft. Wiley will add its For Dummies books, which it acquired from web and magazine publishers IDG, and the Bible range of computer books.

This is an important deal. Reference books are still an amazing resource for users, and a method of information delivery and publishing that still has plenty of legs in it. Like all information resources though, it is a sector that has been threatened by amateur services like Wikipedia. Reference is clearly an information set very well suited to the web. Safari is a platform that offers a genuine alternative to Wikipedia. Because the content on Safari is from credible publishing companies that check the veracity of information, use knowledgeable experts and put a great deal of effort into the writing, editing and presentation of the information, it is more credible than Wikipedia. Wiley has increased the desirability of Safari and improved reference information on the web.