Roman Kagan

26 posts
Roman started working as a programmer as a teenager when he was hired to hack Prolog at a Minsk artificial intelligence lab. Roman was one of the first developers using Java to create web applications. Since 1991, Roman has been consulting for companies including Hewlett-Packard, EDS, GM, Ford, Chrysler, Fanuc Robotics, Comerica and Polk.

What is BM25f

BM25f is an extension of the BM25 scoring function, which is a part of the family of ranking functions used in information retrieval. BM25 itself is a modern alternative to the classic TF-IDF scheme, designed to rank documents based on their relevance to a given query. Here’s a breakdown of […]

What is TF-IDF

TF-IDF stands for Term Frequency-Inverse Document Frequency. It’s a numerical statistic used to indicate the importance of a word in a document relative to a collection of documents, often called a corpus. TF-IDF is commonly used in the field of information retrieval and text mining. Here’s a breakdown: Why is […]

Google Cloud Functions

Google Cloud Functions are serverless code functions that run without you having to manage or scale the underlying infrastructure. This makes building them really easy. So let’s build an example. Here’s a normal Node.js function with two parameters – request and response. The incoming request is automatically parsed for JSON body […]

About Scalding

Scalding is a Scala library that makes it easy to work with and reason about data in distributed systems like Hadoop. It presents the data as a collection and lets you perform computations on it in a manner similar to the Scala API, so it appears to the […]

Code Musing

“I am sorry I have had to write you such a long letter, but I did not have time to write you a short one” Pascal, Blaise (1623 – 1662) – French philosopher and mathematician. At the age of 18 he invented the first calculating machine. So I wonder why […]