Thursday 8 November 2007

Mind the Language Gap

One of this morning's IRFS Language Gap sessions asks why we need a cross-lingual patent retrieval system for an Asian language. Well, for a start, consider that three of the top five patent filing countries are in the region: Japan is first, China third and the Republic of Korea fourth. The US and Europe take second and fifth places respectively.


If you operate in the patent world then sooner or later you will probably have to work with filings of Asian origin. The question is how, given the fundamental structural differences between Western Latin-based languages and their Asian counterparts. Among numerous examples, Korean has six varieties of expression for the colour red, and how a word is spaced in Chinese can heavily affect its meaning and its translation into English. Add the ambiguous nature of legally constructed patent documents, and the challenge is considerable.


Relying on human translation alone is not an option, in part because of the sheer volume of documents: new filings arrive constantly, on top of the existing material.


Minah Kim from the Korean Institute of Patent Information has been explaining how its cross-lingual retrieval system copes with these issues, and what still needs to be done.


She called for efforts to improve quality, such as semantically based query expansion, whereby each word in an original search is expanded to related search terms, for example boat to ship to vessel to water. Query time also needs improvement: a single document retrieval currently takes 10 seconds on average, which can cause problems with a 200-page document, never mind the rest.
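To make the query expansion idea concrete, here is a minimal sketch in Python. It is not the institute's system: the toy thesaurus, the function names and the `max_hops` limit are all assumptions made up for illustration, and a real system would draw its related terms from a proper semantic resource rather than a hand-written table.

```python
from collections import deque

# Hypothetical toy thesaurus: each word maps to directly related terms.
# The boat -> ship -> vessel -> water chain mirrors the example in the talk.
RELATED_TERMS = {
    "boat": ["ship"],
    "ship": ["vessel"],
    "vessel": ["water"],
}


def expand_term(term, thesaurus, max_hops=3):
    """Return the term followed by related terms reachable within max_hops links."""
    seen = [term]
    queue = deque([(term, 0)])
    while queue:
        current, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for related in thesaurus.get(current, []):
            if related not in seen:
                seen.append(related)
                queue.append((related, hops + 1))
    return seen


def expand_query(query, thesaurus, max_hops=3):
    """Expand every word of a whitespace-separated query independently."""
    return {
        word: expand_term(word, thesaurus, max_hops)
        for word in query.lower().split()
    }


if __name__ == "__main__":
    print(expand_query("boat engine", RELATED_TERMS))
```

The hop limit is there because each extra link drifts further from the original meaning (water is a long way from boat), so a real system would probably want to cap or weight the expansion rather than follow every chain to the end.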


For the short term, the plan is to keep upgrading the system's dictionary and to establish the query expansion by the end of this year. In the longer term, Kim says a cross-lingual retrieval system must be developed for Chinese, Japanese and Korean patent documents. Addressing the language translation issues with similar models is something that Western organisations can only benefit from.
