A semiautomatic, intelligent method for converting into GSCCT those parts of pharmaceutical patent abstracts that specify generic structures is reported in this paper and the one that follows. Techniques of natural language processing (NLP) applied in a prototype system are discussed. This paper deals with the lexical isolation and categorization of tokens from the textual description of the generic structure. Templates for processing both variable and multiplier expressions, which predominate, have been identified; they provide the basis for further analysis. Rules for the isolation of tokens are discussed and illustrated. Some categories of tokens are identified by morphological analysis, while others are handled by dictionary lookup. The output is a list of tokens, each with a number of associated semantic features that assist the processing stage discussed in the following paper.
Key words: Natural Language Processing, Lexical Analysis, Lexical Categorization, GSCCT notations