Intelligent Translation of the Variable Parts in Medicinal Chemistry Patents: Lexical Analysis and Categorization / An Intelligent Interpretation of the Text of Chemical

2019-03-02 05:30:49

paper processing discussed tokens GSCCT



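This paper and the one that follows propose a semiautomatic, intelligent method for translating the parts of pharmaceutical patent abstracts that describe generic structures into the GSCCT formal language. Natural language processing (NLP) techniques studied and applied in a prototype system are discussed. This paper deals mainly with isolating lexical tokens from the textual description of generic structures in pharmaceutical patents and assigning those tokens to categories. Templates for processing variable expressions and multiplier expressions are proposed; they form the basis for further processing. The rules for token isolation are discussed and illustrated with diagrams. Some token categories are identified by morphological analysis, while others are identified by dictionary lookup. The output is a sequence of tokens carrying specific semantic features, whose use is discussed in the following paper.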

English abstract

A semiautomatic, intelligent method for converting to GSCCT those parts of pharmaceutical patent abstracts which specify generic structures is reported in this paper and the one that follows. Techniques of natural language processing (NLP) applied in a prototype system are discussed. This paper deals with the lexical isolation and categorization of tokens from the textual description of the generic structure. Templates for processing both the variable and the multiplier expressions, which predominate, have been identified; they provide the basis for further analysis. Rules for the isolation of tokens are discussed and illustrated. Some categories of tokens are identified by morphological analysis, while others are dealt with by dictionary lookup. The output is a list of tokens, each with a number of associated semantic features that help at the processing stage discussed in the following paper.
Key words: Natural Language Processing, Lexical Analysis, Lexical Categorization, GSCCT notations
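
The two-stage idea summarized in the abstract (isolate tokens, then categorize them partly by morphology and partly by dictionary lookup) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the category names, the dictionary entries, the suffix rules, and the sample sentence are hypothetical and are not taken from the paper, and the output is not GSCCT notation.

```python
import re

# Hypothetical dictionary: closed-class words, punctuation, and irregular
# chemical terms that no suffix rule would catch.
DICTIONARY = {
    "is": "ASSIGN",
    "are": "ASSIGN",
    "or": "OR",
    "and": "AND",
    ",": "SEPARATOR",
    ";": "TERMINATOR",
    "hydrogen": "SUBSTITUENT",
    "halogen": "SUBSTITUENT",
}

# Hypothetical morphological rules, tried in order; each must match the
# whole token.
MORPHOLOGY = [
    (re.compile(r"(?:[A-Z][a-z]?\d*|[a-z])"), "VARIABLE"),   # R1, X, Ar, n
    (re.compile(r"\d+(?:-\d+)?"), "NUMBER"),                  # 2, 0-3
    (re.compile(r"[a-z]+(?:yl|oxy|amino)"), "SUBSTITUENT"),   # methyl, methoxy
]

# Token isolation: words with optional trailing digits, numbers or numeric
# ranges, and the two punctuation marks used in this sketch.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+\d*|\d+(?:-\d+)?|[,;]")


def isolate_tokens(text):
    """Split a textual description into raw tokens."""
    return TOKEN_PATTERN.findall(text)


def categorize_token(token):
    """Assign a category by dictionary lookup first, then by morphology."""
    category = DICTIONARY.get(token.lower())
    if category is not None:
        return category
    for pattern, name in MORPHOLOGY:
        if pattern.fullmatch(token):
            return name
    return "UNKNOWN"


def analyze(text):
    """Return a list of (token, category) pairs for the given text."""
    return [(tok, categorize_token(tok)) for tok in isolate_tokens(text)]


if __name__ == "__main__":
    for pair in analyze("R1 is hydrogen, methyl or phenyl; n is 0-3"):
        print(pair)
```

In this sketch the dictionary is consulted before the morphological rules, so irregular terms such as "hydrogen" are resolved without a suffix pattern; a real system might order the stages differently and would attach richer semantic features than a single category label.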