Information Retrieval

Course Title: COGS 4962/6964 CSCI 4969 – Information Retrieval (Fall semester)
Location and Time: M, R 12:00–1:50pm @ TBD
Instructor: Prof. Tomek Strzalkowski
Contact Information: tomek@rpi.edu

This course will discuss the theory and practice of searching and retrieving text and bibliographic information. Topics covered include automated indexing; statistical and linguistic models; text classification; Boolean, vector space, and probabilistic approaches to indexing; language models and dense continuous vector space models; query formulation and output ranking; information routing and filtering; topic detection and tracking; and measures of retrieval effectiveness, including relevance, utility, and miss/false-alarm rates. Techniques for enhancing retrieval effectiveness are also covered, including relevance feedback, query reformulation, thesauri, concept extraction, and automated summarization. Experimental retrieval approaches from the Text Retrieval Conferences (TREC) and modern Internet search engines (Google, Yahoo!, etc.), as well as recent advances in automated question-answering methods, will also be discussed.

PREREQUISITES
There are no specific course prerequisites for taking this course; however, students are expected to have a solid background in the following:

  • Data structures and algorithms (this is optional for non-CS students)
  • Elementary linear algebra
  • Basic statistics and probability
  • Elements of logic and set theory

LEARNING OBJECTIVES
The course is aimed at graduate and advanced undergraduate students in Computer Science, Information Science, Business, and related disciplines. It is intended to prepare CS students to design and evaluate information retrieval systems. It also aims to give all students a broad understanding of the inner workings of automated information retrieval systems, and of how such systems interact with users and affect their productivity. The learning objectives include:

  • Understand how information is represented in a compact form that allows rapid and accurate retrieval by content.
  • Understand how natural language and other free-form documents can be efficiently converted into searchable form with minimal loss of content.
  • Understand various computational models that approximate the behavior of information in unstructured form.
  • Understand the basic concepts of retrieval accuracy and how to measure it. Understand the principles of experimental research.
  • Understand how users interact with information retrieval systems and how to maximize their satisfaction.
  • Learn the architecture of a generic information retrieval system and how to build one from scratch.
  • Learn about many applications, including ad-hoc search, Internet search, cross-language search, filtering and classification, question answering, and summarization.
  • Gain hands-on experience in developing a working IR system.