Poliqarp is an open-source search engine designed for querying text corpora, including the National Corpus of Polish, which was created at the Institute of Computer Science, Polish Academy of Sciences.[1][2]

Features

  • Custom query language[3]
  • Two-level regular expressions:
    • operating at the level of characters in words
    • operating at the level of words in statements/paragraphs
  • Good performance
  • Compact corpus representation (compared to similar projects)
  • Portability across operating systems: Linux/BSD/Win32
  • No portability across byte orders (the current release works only on little-endian machines)
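
The idea of two-level regular expressions can be illustrated with a minimal Python sketch. It does not use Poliqarp's query syntax or reflect its implementation; the word classes, the single-letter labels, and the helper names `encode` and `search` are assumptions made for illustration. Character-level patterns first classify each word, and a second pattern is then matched against the resulting sequence of word labels.

```python
import re

# Level 1: character-level patterns, each defining a class of words.
# The single-letter labels are arbitrary and purely illustrative.
WORD_CLASSES = {
    "A": re.compile(r"[a-z]+ing"),  # words ending in "ing"
    "D": re.compile(r"the|a|an"),   # determiners
}
OTHER = "x"  # label for tokens that match no class


def encode(tokens):
    """Map each token to the label of the first class it fully matches."""
    labels = []
    for tok in tokens:
        for label, pattern in WORD_CLASSES.items():
            if pattern.fullmatch(tok.lower()):
                labels.append(label)
                break
        else:
            labels.append(OTHER)
    return "".join(labels)


def search(tokens, word_level_pattern):
    """Level 2: match a regex over the label string and return token spans."""
    encoded = encode(tokens)
    return [
        tokens[m.start():m.end()]
        for m in re.finditer(word_level_pattern, encoded)
    ]


tokens = "She was reading the morning paper while singing".split()

# An optional determiner followed by one or more "-ing" words.
matches = search(tokens, r"D?A+")
print(matches)
```

Because each token is reduced to exactly one label character, positions in the label string correspond directly to token indices, so word-level matches can be mapped back to the original words without extra bookkeeping.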

References

  1. ^ "Poliqarp search engine for NKJP data". nkjp.pl. Retrieved 1 December 2020.
  2. ^ "Poliqarp 1.1". nlp.ipipan.waw.pl. Retrieved 1 December 2020.
  3. ^ Janus, Daniel; Przepiórkowski, Adam (25 June 2007). "Poliqarp: an open source corpus indexer and search engine with syntactic extensions". Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions. Association for Computational Linguistics: 85–88. Retrieved 1 December 2020.