How Often to Update the Content Summaries of Databases

by Jennifer R
The article discusses how not all databases are indexed by search engines, which can increase the time it takes to complete a query, since the search engine would have to scan every individual piece of data instead of running through a shorter list of indices. Metasearchers are a way of retrieving information by querying multiple databases at once. When selecting databases, a metasearcher looks at content summaries, which contain statistics that describe what is in each database. The authors point out that “database selection research has largely assumed that databases are static” even though real-life databases change frequently, which raises the question of how to keep content summaries appropriately refreshed. They conducted a study of 152 databases, looking at how the content changed over 52 weeks. They also examined the content summaries for changes. To model the changes, the authors turned to a field of statistics called survival analysis where, “using the Cox proportional hazards regression model [Cox 1972], [they showed] that database characteristics can be used to predict the pattern of change of the summaries”. The results indicate that “the quality of the content summaries deteriorates over time as the underlying databases change” and that summaries of large databases tend to change faster than those of smaller databases.
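To make the idea of a content summary more concrete, here is a minimal Python sketch of how a metasearcher might use one to pick a database. This is not the method from the article. The function names (`build_summary`, `score`, `select_databases`), the sample data, and the scoring rule (which assumes query words occur independently of each other) are all assumptions made for illustration.

```python
from collections import Counter


def build_summary(documents):
    """Build a simple content summary: the number of documents and,
    for each word, the number of documents that contain it."""
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    return {"num_docs": len(documents), "doc_freq": doc_freq}


def score(summary, query):
    """Estimate the fraction of documents matching every query word,
    assuming words occur independently (a simplifying assumption)."""
    n = summary["num_docs"]
    if n == 0:
        return 0.0
    result = 1.0
    for word in query.lower().split():
        result *= summary["doc_freq"][word] / n
    return result


def select_databases(summaries, query, k=1):
    """Return the names of the k databases with the highest score."""
    ranked = sorted(
        summaries,
        key=lambda name: score(summaries[name], query),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical databases, used only for illustration.
databases = {
    "health": ["heart disease treatment", "diabetes treatment options"],
    "sports": ["football scores", "heart of the team"],
}
summaries = {name: build_summary(docs) for name, docs in databases.items()}
best = select_databases(summaries, "heart treatment")
```

In this sketch the metasearcher never looks at the documents themselves when answering a query, only at the summaries. That is why a stale summary is a problem: if new documents are added to a database and its summary is not rebuilt, the scores no longer reflect what the database actually contains.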

I found this interesting because we talked about single queries to a single database in class, but not about queries to multiple databases at a time. I wondered how that worked and what criteria metasearchers use. We did briefly discuss how often databases are updated and where frequent updates would be necessary. I can see the problem that poses for summarizing a database. I would like to know whether current databases still have this issue and what has been done to address it.

 

Source: Ipeirotis, P. G., Ntoulas, A., Cho, J. & Gravano, L. (2007). Modeling and Managing Changes in Text Databases. ACM Transactions on Database Systems, 32(3). Retrieved May 27, 2012 from http://www.cs.columbia.edu/~gravano/Papers/2007/tods07a.pdf