Hitachi, Ltd. today announced that it has developed a basic artificial intelligence (AI) technology that analyzes huge volumes of Japanese text data on issues that are subject to debate, and presents in Japanese both affirmative and negative opinions on those issues together with reasons and grounds.
In this research, Hitachi applied deep learning (*) to the process of distinguishing sentences representing reasons and grounds for opinions, eliminating the need for a dedicated program to be prepared for each language and thus enabling the creation of a general-purpose system analyzing text data in any language. Previously, Hitachi developed a basic AI technology which analyzed huge volumes of English text data and presented opinions in English.(**) This time, Hitachi incorporated this technology into a new AI technology for the Japanese language to meet the needs of Japanese enterprises.
Today, the social landscape changes rapidly and customer needs are becoming increasingly diversified. Companies are expected to continuously create new services and values. Further, driven by recent advancements in information & telecommunication and analytics technologies, interest is growing in technology that can extract valuable insight from big data which is generated on a daily basis.
Hitachi has been developing a basic AI technology that analyzes huge volumes of English text data and presents opinions in English to help enterprises make business decisions. The original technology required grammar rules specific to the English language to be programmed in order to extract sentences representing reasons and grounds for opinions. This process was a hurdle in applying the system to Japanese or any other language, as it required dedicated programs matched to the linguistic rules of the target language.
By applying deep learning, Hitachi eliminated this issue, enabling the new technology to recognize sentences that have a high probability of being reasons and grounds without relying on linguistic rules. More specifically, the AI system is presented with sentences which represent reasons and grounds extracted from thousands of articles. Learning from the rules and patterns in these examples, the system becomes able to discriminate sentences which represent reasons and grounds in new articles. Hitachi added an “attention mechanism” which supports deep learning in estimating which words and phrases are worthy of attention in texts like news articles and research reports. The “attention mechanism” helps the system to grasp the points that require attention, including words and phrases related to topics and values. This method enables the system to distinguish sentences which have a high probability of being reasons and grounds from text data in any language.
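The release does not describe the network itself, so the following is only a generic sketch of how attention weighting works in principle, not Hitachi's model. The function names, the word vectors and the relevance scores are all hypothetical; in a trained system the scores would be learned rather than supplied by hand.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(word_vectors, relevance_scores):
    """Combine word vectors into one sentence vector, weighted by attention.

    Assumption: one relevance score per word; higher means the word
    deserves more attention (for example, a topic or value word).
    """
    weights = softmax(relevance_scores)
    dim = len(word_vectors[0])
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, word_vectors))
        for i in range(dim)
    ]
    return weights, pooled

# Hypothetical two-dimensional vectors for the words "noise", "is", "harmful".
VECTORS = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
SCORES = [2.0, 0.0, 2.0]  # "is" receives the least attention
WEIGHTS, SENTENCE_VECTOR = attention_pool(VECTORS, SCORES)
```

Words with higher scores contribute more to the pooled sentence vector, which a classifier could then use to decide whether the sentence is likely to be a reason or ground.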
The technology developed will be core technology in achieving a multi-lingual AI system capable of offering opinion. Hitachi will pursue further research to realize AI systems supporting business decision making by enterprises worldwide.
Details of the technology developed are as given below.
(1) Created “Value Dictionary” as a standard for identifying reasons and grounds for opinions
When giving reasons or grounds for opinions on a question that is subject to debate, it is assumed that people use their own respective viewpoints. Hitachi focused on values such as health, economics and public safety, which are considered important to people and communities, and created a “Value Dictionary” that systematically organizes those values based on a database*2 – a database that records affirmative and negative opinions regarding a large number of discussion topics.
Specifically, Hitachi prepared a list of values that serve as a basis of decision making by people or communities, and the system extracted words demonstrating a strong relationship to those values based on their frequency of use in the database, designating each word as either “positive” or “negative” in relation to the value. Furthermore, the values and relevant words were systematically arranged by assigning a score according to “importance” based on the frequency of use. For example, in the case of the value “Health,” the relations with words such as “exercise,” which is positive, and “disease” and “obesity,” which are negative, were systematically arranged.
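As a rough illustration of the dictionary described in (1), the sketch below counts how often each word is used in relation to a value, assigns a polarity by majority, and derives an importance score from relative frequency. The input format, the function name and the scoring formula are assumptions made for this example; the release does not specify them.

```python
from collections import Counter, defaultdict

def build_value_dictionary(mentions):
    """Build {value: {word: {"polarity", "importance"}}} from word mentions.

    Assumption: each mention is (value, word, polarity) with polarity
    +1 (positive) or -1 (negative), e.g. taken from a debate database.
    """
    counts = defaultdict(Counter)
    polarity_sum = defaultdict(Counter)
    for value, word, polarity in mentions:
        counts[value][word] += 1
        polarity_sum[value][word] += polarity

    dictionary = {}
    for value, word_counts in counts.items():
        most_frequent = max(word_counts.values())
        dictionary[value] = {
            word: {
                "polarity": "positive" if polarity_sum[value][word] >= 0 else "negative",
                # Hypothetical importance: frequency relative to the top word.
                "importance": n / most_frequent,
            }
            for word, n in word_counts.items()
        }
    return dictionary

# Toy data mirroring the "Health" example in the text.
MENTIONS = [
    ("Health", "exercise", +1), ("Health", "exercise", +1),
    ("Health", "disease", -1), ("Health", "disease", -1),
    ("Health", "disease", -1), ("Health", "disease", -1),
    ("Health", "obesity", -1),
]
VALUE_DICTIONARY = build_value_dictionary(MENTIONS)
```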
(2) Created metadata*3 by identifying correlations between issues and their values from huge volumes of text data
The system identifies the types of values encompassed in recorded issues, from among the various sentences used in large volumes of news articles, and creates a database expressing whether those issues have positive or negative effects on those values. For example, from an article stating that “Noise is harmful to health,” it is determined that the issue of “noise” has the negative effect of suppressing the value “Health,” and this information is managed in the database. Using this method, the system created approximately 250 million metadata entries (issue – value correlation data) from around 9.7 million news articles.
The system uses this huge volume of metadata as well as the Value Dictionary outlined in (1) above to select multiple values with strong correlations with a given topic from among the many news articles. By searching for sentences in all of the news articles that contain one of these values, the system extracts sentences that could potentially serve as reasons or grounds for agreement or disagreement with the topic in question.
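The steps in (2) can be pictured with a small sketch: each sentence is turned into an (issue, value, effect) record, and the values most often linked to a topic are then selected. The cue phrases, the value word list and the function names are hypothetical; the release does not say how the actual system detects effects.

```python
from collections import Counter

# Hypothetical cue phrases and value words, for illustration only.
EFFECT_CUES = {"is harmful to": "negative", "is good for": "positive"}
VALUE_WORDS = {
    "health": "Health",
    "the economy": "Economics",
    "public safety": "Public safety",
}

def extract_metadata(sentence):
    """Return (issue, value, effect) for a sentence, or None if no cue matches."""
    text = sentence.lower().rstrip(".")
    for cue, effect in EFFECT_CUES.items():
        if cue in text:
            issue, _, target = text.partition(cue)
            value = VALUE_WORDS.get(target.strip())
            if value:
                return (issue.strip(), value, effect)
    return None

def top_values(metadata, issue, k=2):
    """Select the k values most frequently correlated with an issue."""
    counts = Counter(value for i, value, _ in metadata if i == issue)
    return [value for value, _ in counts.most_common(k)]

SENTENCES = [
    "Noise is harmful to health.",
    "Noise is harmful to health.",  # the same claim found in a second article
    "Noise is harmful to the economy.",
    "Exercise is good for health.",
    "The weather was fine.",
]
METADATA = [m for s in SENTENCES if (m := extract_metadata(s)) is not None]
NOISE_VALUES = top_values(METADATA, "noise")
```

Once the strongest values for a topic are known, sentences mentioning those values become the candidates for reasons or grounds.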
(3) Calculated reliability of the extracted sentences
The sentences extracted using the Value Dictionary (1) and the Metadata (2) are scored based on the source of the quote, the numerical evidence and the rhetorical expressions in order to estimate whether the sentences have a strong correlation with the specified topic and value. By processing all of the sentences that could potentially serve as reasons or grounds for opinions, and evaluating scores, it is possible to select and present reliable grounds.
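A minimal sketch of this kind of scoring is shown below, assuming simple pattern checks for the three signals named above (a quoted source, numerical evidence, rhetorical expressions). The patterns and weights are hypothetical placeholders, not the features used in the actual system.

```python
import re

# Hypothetical weights and patterns for the three signals named in the text.
WEIGHTS = {"source": 1.0, "numeric": 1.0, "rhetoric": 0.5}
SOURCE_PATTERN = re.compile(r"\baccording to\b|\bsaid\b|\breported\b", re.IGNORECASE)
NUMERIC_PATTERN = re.compile(r"\d")
RHETORIC_PATTERN = re.compile(r"\btherefore\b|\bbecause\b|\bas a result\b", re.IGNORECASE)

def score_sentence(sentence):
    """Add up the weight of each signal found in the sentence."""
    score = 0.0
    if SOURCE_PATTERN.search(sentence):
        score += WEIGHTS["source"]
    if NUMERIC_PATTERN.search(sentence):
        score += WEIGHTS["numeric"]
    if RHETORIC_PATTERN.search(sentence):
        score += WEIGHTS["rhetoric"]
    return score

def rank_sentences(sentences, top_n=1):
    """Return the top_n sentences, highest score first."""
    return sorted(sentences, key=score_sentence, reverse=True)[:top_n]

CANDIDATES = [
    "Noise is annoying.",
    "Experts said noise is a problem.",
    "According to a 2014 survey, noise raised stress levels by 30 percent "
    "because sleep was disrupted.",
]
BEST = rank_sentences(CANDIDATES, top_n=1)
```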
(4) Constructed architecture*4 to realize asynchronous distributed processing of multiple algorithms
In order to increase processing speed and present responses within a designated time period, Hitachi constructed an architecture to realize asynchronous distributed processing of multiple algorithms in the various processes, from the analysis of the main topic to the selection of values, the article search and the presentation of reasons and grounds for opinions. This architecture executes parallel distributed processing of algorithms while at the same time executing asynchronous processing to the next process, in order to extract the desired grounds within the specified period of time.
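The idea of running searches concurrently and returning whatever is ready within a time limit can be sketched with Python's asyncio, as below. This is a single-process illustration only: the release describes distributed processing, and the function names, delays and time limit here are hypothetical.

```python
import asyncio

async def search_value(value, delay):
    """Stand-in for a slow article search for one value."""
    await asyncio.sleep(delay)
    return f"grounds for {value}"

async def collect_grounds(searches, time_limit):
    """Run all searches concurrently; keep only results ready in time."""
    tasks = [asyncio.create_task(search_value(v, d)) for v, d in searches]
    done, pending = await asyncio.wait(tasks, timeout=time_limit)
    for task in pending:
        task.cancel()  # drop searches that missed the deadline
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)
    return sorted(task.result() for task in done)

def run_demo():
    # Hypothetical delays: the third search is too slow for the deadline.
    searches = [("Health", 0.01), ("Economics", 0.02), ("Public safety", 5.0)]
    return asyncio.run(collect_grounds(searches, time_limit=0.5))
```

Calling `run_demo()` returns the grounds for the two fast searches and discards the slow one, which mirrors the goal of presenting a response within the designated time.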
This research achievement will be presented at the 30th National Convention of the Japanese Society for Artificial Intelligence (JSAI 2016) to be held in Kitakyushu-shi, Japan from 6-9 June 2016.
*1Hitachi, Ltd. news release: “Hitachi India Developed a Technology to Extract Precisely Designated Information from Electronic Medical Records”; Published September 17, 2014
*2Use of “Debatabase”, a huge database that records affirmative and negative opinions regarding topics offered by International Debate Education Association.
*3Database that is arranged as “metadata” created by identifying correlations between issues and their values
*4Basic and conceptual design of the structure of the information system
(*) Deep Learning: A neural network machine learning model based on the mechanism of nerve cells. The structure of a neural network is comprised of 3 layers: an input layer, an intermediate layer and an output layer. In Deep Learning, the intermediate layer is increased to enable the expression of even more complex models than previously possible, achieving higher recognition rates in the field of voice and image recognition.
(**) Hitachi News Release on July 22 2015: “Hitachi Developed Basic Artificial Intelligence Technology That Enables Logical Dialogue”