This project is proposed by Prof. Alan Bundy and Dr. John Levine from Edinburgh University as a PhD project for Miss Hazel Duncan.
The intention of the project is to data-mine proof corpuses using Variable Length Markov Models in order to identify patterns of tactics within the proof scripts. These patterns will be used by Genetic Programming techniques to form new tactics and hence guide proof search.
Proposal is available online. (also
The project is currently in the early stages, work is being performed with the theorem prover Isabelle to extract and format the proof scripts.
The theorem prover Mizar has a huge library of mathematical theorems, this will be used in the same way as Isabelle to allow a comparison between different provers.
Two programs have been tested as a basis for pattern discovery:Sparse Markov Transducers from Eleazar Eskin at the University of Columbia and the Teiresias algorithm from the IBM research group.
Work to date has invovled extracting the proof scripts from
Isabelle and use to find patterns using the above programs. A brief comparison between the two programs along
with some results is available.
One of the responses to the grant proposal mentione the possibility of assigning tactic steps to classes and looking for patterns of these classes. This work is in progress and is available here or via the 'Classes' button.
A postscript response to some queries given as a result of the grant proposal is available here or via the 'Queries' button.
An article was written in Mizar during a visit to Poland. A link to it will be included here when it becomes available.
The implementation for Isabelle (including the logics, their theories and the proof scripts), along with all the documentation can be found via the Isabelle home page.
The code for the Sparse Markov Transducers can be found via the SMT home page.
The Teiresias Text-word based Pattern Discovery tool along with all
downloads and documentation can be found on the Teiresias home page.
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length. D. Ron, Y. Singer, N. Tishby, Machine Learning, 25, 1996.
Inducing Features of Random Fields, S. Della Pietra, V. Della Pietra, and J. Lafferty, IEEE Transactions on Pattern Analysis, 19(4), April 1997, pp. 380-393.
A Gentle Introduction to Iterative Scaling, Adam Berger.
Protein Family Classification Using Sparse Markov Transducers, Eskin, Eleazar, William Noble Grundy and Yoram Singer. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology. August 20-23, 2000.
Pattern Discovery In Biological Sequences: The TEIRESIAS
Algorithm. Rigoutsos, I. and A. Floratos, Bioinformatics,
Vol.14, num 1. 1998.