Gerard A. "Gerry" Salton (8 March 1927 – 28 August 1995) was a professor of Computer Science at Cornell University. Salton was perhaps the leading computer scientist working in the field of information retrieval during his time, and has been called "the father of Information Retrieval".[2] His group at Cornell developed the SMART Information Retrieval System, which he had initiated while at Harvard. It was the first system to use the now-popular vector space model for information retrieval.
Gerard Salton | |
---|---|
Born | Gerhard Anton Sahlmann; March 8, 1927; Nuremberg, Germany |
Died | August 28, 1995 (aged 68); Ithaca, New York, US |
Education | Brooklyn College; Harvard University |
Known for | "The father of Information Retrieval"[2]; Gerard Salton Award |
Scientific career | |
Fields | Information retrieval |
Institutions | Cornell University |
Thesis | An automatic data processing system for public utility revenue accounting (1958) |
Doctoral advisor | Howard Aiken |
Doctoral students | |
Education and career
Salton was born Gerhard Anton Sahlmann on 8 March 1927 in Nuremberg, Germany. He came to the United States in 1947 and was naturalized in 1952. He received a bachelor's degree (1950) and a master's degree (1952) in mathematics from Brooklyn College, and a Ph.D. in applied mathematics from Harvard in 1958 as the last of Howard Aiken's doctoral students. He taught at Harvard until 1965, when he joined Cornell University and co-founded its Department of Computer Science.
Salton was perhaps best known for developing the now widely used vector space model for information retrieval.[3] In this model, both documents and queries are represented as vectors of term counts, and the similarity between a document and a query is given by the cosine of the angle between the query vector and the document vector. In this paper, he also introduced TF-IDF, or term frequency–inverse document frequency, a weighting scheme in which the score of a term in a document grows with the number of times the term occurs in that document and shrinks with the number of documents in which the term occurs. (The concept of inverse document frequency, a measure of specificity, had been introduced in 1972 by Karen Spärck Jones.[4]) Later in life, he became interested in automatic text summarization and analysis,[5] as well as automatic hypertext generation.[6] He published over 150 research articles and 5 books during his life.
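The weighting and similarity measure described above can be illustrated with a minimal Python sketch. It is not code from SMART or from Salton's papers: the function names `tf_idf_vectors` and `cosine` are hypothetical, and the logarithmic form of the inverse document frequency is one common variant, assumed here for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: weight = term count * log(N / document frequency),
    a common TF-IDF variant chosen only for illustration.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_u * norm_v)


# Toy collection and query, invented for this example.
docs = [
    ["vector", "space", "model"],
    ["text", "retrieval", "system"],
    ["vector", "retrieval"],
]
doc_vectors = tf_idf_vectors(docs)
query = Counter(["vector", "space"])  # query as a plain term-count vector

# Rank document indices by decreasing similarity to the query.
ranking = sorted(range(len(docs)),
                 key=lambda i: cosine(query, doc_vectors[i]),
                 reverse=True)
```

In this toy collection the first document shares both query terms and ranks first, while the second document shares none and receives a similarity of zero.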
Honors and awards
Salton was editor-in-chief of the Communications of the ACM and the Journal of the ACM, and chaired the Special Interest Group on Information Retrieval (SIGIR). He was an associate editor of the ACM Transactions on Information Systems. He was an ACM Fellow (elected 1995),[7] received the Award of Merit from the American Society for Information Science (1989), and was the first recipient of the SIGIR Award for outstanding contributions to the study of Information Retrieval (1983), now called the Gerard Salton Award.
Bibliography
- Gerard Salton, Automatic Information Organization and Retrieval, 1968.
- Gerard Salton (1975). A Theory of Indexing. Society for Industrial and Applied Mathematics. p. 56. ISBN 9780898710151.
- Gerard Salton and Michael J. McGill, Introduction to Modern Information Retrieval, 1983. ISBN 0-07-054484-0
- Gerard Salton (1989). Automatic Text Processing. Addison-Wesley Publishing Company. p. 530. ISBN 978-0-201-12227-5.
- Gerard Salton at DBLP Bibliography Server
- G. Salton, A. Wong, and C. S. Yang (1975), "A Vector Space Model for Automatic Indexing," Communications of the ACM, vol. 18, no. 11, pages 613–620. (Article in which a vector space model was presented)
- G. Salton (1980). "Toward a dynamic library." In F. Wilfrid Lancaster, ed. The Role of the Library in an Electronic Society: Clinic on Library Applications of Data Processing. Urbana-Champaign: University of Illinois Graduate School of Library Science.
References
- [1] Allan, James. Automatic Hypertext Construction. Cornell University. Retrieved 3 December 2023.
- [2] "The father of Information Retrieval" (PDF). cs.cornell.edu. Retrieved 10 March 2015. Quote: "a founding member of the department and the father of Information Retrieval."
- [3] Salton, G.; Wong, A.; Yang, C. S. (1975). "A vector space model for automatic indexing". Communications of the ACM. 18 (11): 613. doi:10.1145/361219.361220. hdl:1813/6057. S2CID 6473756.
- [4] Spärck Jones, K. (1972). "A Statistical Interpretation of Term Specificity and Its Application in Retrieval". Journal of Documentation. 28: 11–21. CiteSeerX 10.1.1.115.8343. doi:10.1108/eb026526. S2CID 2996187.
- [5] Salton, G.; Allan, J.; Buckley, C.; Singhal, A. (1994). "Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts". Science. 264 (5164): 1421–1426. Bibcode:1994Sci...264.1421S. doi:10.1126/science.264.5164.1421. PMID 17838425. S2CID 32296317.
- [6] "Gerard Salton". Cs.cornell.edu. Retrieved 2013-09-14.
- [7] "Gerard Salton ACM Fellows 1995". acm.org. Retrieved 10 March 2015. Quote: "contributions over 30 years to information organization and retrieval"
External links
- In Memoriam
- Fractals of Change: Search Down Memory Lane
- David Dubin, "The Most Influential Paper Gerard Salton Never Wrote", Library Trends, 2004; 52(4): 748–764. A historical review of how the term discrimination value (TDV) model evolved into the vector space model (VSM) as an information retrieval model. The paper calls into question what the information retrieval research community believed Salton's vector space model was originally intended to model: what much later became an information retrieval model was originally a data-centric mathematical and computational model used as an explanatory device. Dubin also points out that an often-cited 1975 Salton paper does not exist, and that the citation is probably a combination of two other papers, neither of which actually refers to the VSM as an information retrieval model.