This article presents our approach to developing a system for processing unstructured Romanian text data. The project aims to develop the SoFTcrates tool, a software system that processes unstructured text data in order to produce structured output usable as computational linguistics resources. We described some mathematical aspects of text representation and presented several stages of unstructured text data processing. The interface of the application was also illustrated. In future work we will try to implement mechanisms for diversifying the found words by means of derivation and the WordNet semantic network. Moreover, we will optimize the interface so that the user can search not only by a single word, but also by several words that the user considers more relevant to the text.

