Text documents often embed data that is structured in nature, and we can expose this structured data using information extraction (IE) technology. By processing a text database with information extraction systems, we can materialize a variety of structured “relations,” over which we can then issue regular SQL queries. A key challenge in processing SQL queries in this text-based scenario is efficiency: information extraction is time-consuming, so query processing strategies should minimize the number of documents that they process.

Real-world applications frequently rely on the information in large collections of text documents such as news articles, reports, and email messages. Text often embeds valuable structured data, such as who recommends selling which stocks, who has been hired by which corporation, or the number of people affected by a disease outbreak. In this paper, we consider the problem of effectively answering SQL queries over a collection of text documents.

In the rest of the paper, we elaborate on our solution to process SQL queries over text databases, called SQOUT (for “SQL queries over unstructured text databases”). Specifically, our contributions are as follows:
• We establish that it is feasible to execute SQL queries over text on-the-fly, with IE systems (Section III).
• We show how IE systems, document retrieval strategies, and data cleaning operators, components often studied in isolation in the past, can be seamlessly integrated to form a space of execution plans for SQL queries over text databases (Section III).
• We develop a cost model that exposes the tradeoff between efficiency and result quality, and enables users to flexibly adjust their preferences.
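To make the idea of "materializing a relation with an IE system, then querying it with SQL" concrete, here is a minimal sketch under loudly stated assumptions. It is not SQOUT's implementation: the corpus `DOCS`, the regular expression `HIRED_BY`, the functions `extract_hired_by` and `run_plan`, and the relation `hired_by(person, company)` are all hypothetical, and a single regex stands in for a real (and far more expensive) IE system. The sketch compares two execution plans for the same SQL query: a full scan that runs the extractor on every document, and a plan that first applies a simple keyword filter as its document retrieval strategy, so fewer documents reach the extraction step.

```python
import re
import sqlite3

# Hypothetical toy corpus; none of these documents come from the paper.
DOCS = [
    "Alice Smith was hired by Acme Corp last week.",
    "Analysts discussed the weather in Boston.",
    "Acme Corp hired Bob Jones as CFO.",
    "Carol White was hired by Globex in March.",
]

# Hypothetical stand-in for an IE system: one pattern for
# "<Person> was hired by <Company>". Real IE systems are much costlier.
HIRED_BY = re.compile(r"([A-Z]\w* [A-Z]\w*) was hired by ([A-Z]\w*(?: [A-Z]\w*)*)")


def extract_hired_by(doc):
    """Return the (person, company) tuples the toy extractor finds in one document."""
    return HIRED_BY.findall(doc)


def run_plan(docs, sql, params=(), keyword=None):
    """Materialize hired_by(person, company) by running the extractor over the
    documents (only those containing `keyword`, if given), then evaluate `sql`
    over the materialized relation. Returns (rows, documents processed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hired_by (person TEXT, company TEXT)")
    processed = 0
    for doc in docs:
        # Keyword filtering: a cheap retrieval step that skips documents
        # before the expensive extraction step ever sees them.
        if keyword is not None and keyword not in doc:
            continue
        processed += 1
        conn.executemany("INSERT INTO hired_by VALUES (?, ?)", extract_hired_by(doc))
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows, processed


QUERY = "SELECT person FROM hired_by WHERE company = ?"

# Plan 1: scan every document. Plan 2: push the query constant into a keyword filter.
scan_rows, scan_docs = run_plan(DOCS, QUERY, ("Acme Corp",))
filt_rows, filt_docs = run_plan(DOCS, QUERY, ("Acme Corp",), keyword="Acme")
print(scan_rows, scan_docs)  # [('Alice Smith',)] 4
print(filt_rows, filt_docs)  # [('Alice Smith',)] 2
```

In this toy run both plans return the same answer, but the filtered plan invokes the extractor on half as many documents. In general the two axes of the tradeoff can diverge: a keyword filter may skip documents that would have yielded tuples, and the extractor itself misses facts phrased differently (here, "Acme Corp hired Bob Jones"), which is why result quality, and not only efficiency, has to be part of any plan-selection cost model.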
