Environment setting

10 posts

What is BM25f

BM25f extends the BM25 scoring function, one of a family of ranking functions used in information retrieval, to documents made up of several weighted fields, such as a title and a body. BM25 itself is a modern alternative to the classic TF-IDF scheme, designed to rank documents by their relevance to a given query. Here’s a breakdown of […]
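The sketch below shows one common BM25f formulation, assuming the per-field weighted pseudo-frequency variant. Per-field term frequencies are length-normalized and weighted, summed into a single pseudo-frequency, and then saturated once with `k1`. The field names, field weights, per-field `b` values, `k1 = 1.2` and the IDF formula are all illustrative assumptions and are not taken from the post.

```python
import math

# Hypothetical document shape: field name (e.g. "title", "body") -> list of tokens.
Doc = dict[str, list[str]]

def bm25f_score(query: list[str], doc: Doc, corpus: list[Doc],
                field_weights: dict[str, float], field_b: dict[str, float],
                k1: float = 1.2) -> float:
    """Score one document against a query with a simple BM25F variant (a sketch)."""
    n_docs = len(corpus)
    fields = list(field_weights)
    # Average length of each field across the corpus, used for length normalization.
    avg_len = {f: sum(len(d.get(f, [])) for d in corpus) / n_docs for f in fields}
    score = 0.0
    for term in set(query):  # assumption: repeated query terms count once
        # Document frequency: number of documents containing the term in any field.
        df = sum(1 for d in corpus if any(term in d.get(f, []) for f in fields))
        if df == 0:
            continue
        # Assumed IDF variant: ln((N - df + 0.5) / (df + 0.5) + 1), always positive.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Merge per-field frequencies into one weighted, length-normalized pseudo-frequency.
        pseudo_tf = 0.0
        for f in fields:
            tokens = doc.get(f, [])
            tf = tokens.count(term)
            if tf == 0:
                continue
            norm = 1.0 - field_b[f] + field_b[f] * len(tokens) / avg_len[f]
            pseudo_tf += field_weights[f] * tf / norm
        # Saturate once, after merging fields, instead of summing per-field BM25 scores.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score

# Hypothetical two-document corpus; "title" matches are weighted twice as heavily as "body".
corpus = [
    {"title": "bm25 ranking".split(), "body": "a ranking function for search".split()},
    {"title": "cooking pasta".split(), "body": "boil water and add ranking salt".split()},
]
weights = {"title": 2.0, "body": 1.0}
b = {"title": 0.75, "body": 0.75}
for d in corpus:
    print(bm25f_score(["ranking"], d, corpus, weights, b))
```

Both documents contain "ranking", but the first one also has it in the heavier-weighted title, so it should score higher. Saturating once after merging the fields stops a term that repeats across fields from being over-rewarded.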

What is TF-IDF

TF-IDF stands for Term Frequency-Inverse Document Frequency. It’s a numerical statistic used to indicate the importance of a word in a document relative to a collection of documents, often called a corpus. TF-IDF is commonly used in the field of information retrieval and text mining. Here’s a breakdown: Why is […]
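As a rough illustration of the statistic, the sketch below computes TF-IDF weights for a small tokenized corpus. It assumes one common variant, with term frequency defined as count divided by document length and IDF as `ln(N / df)`. Other weightings exist, and the post's own definition may differ. The `tf_idf` function and the example corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(corpus: list[list[str]]) -> list[dict[str, float]]:
    """Return a {term: weight} dict for every tokenized document in the corpus.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(corpus)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        if not doc:
            weights.append({})  # an empty document has no terms to weight
            continue
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
for w in tf_idf(corpus):
    # Show the two highest-weighted terms of each document.
    print(sorted(w.items(), key=lambda kv: -kv[1])[:2])
```

Terms that occur in many documents, such as "the", receive a low IDF, while terms specific to one document, such as "cat", are weighted up. A term present in every document gets an IDF of zero and therefore a weight of zero.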

About Scalding

Scalding is a Scala library that makes it easy to work with, and reason about, data in distributed systems like Hadoop. It presents the data as a collection and lets you perform computations on it in a manner similar to the Scala collections API, so it appears to the […]

Code Musing

“I am sorry I have had to write you such a long letter, but I did not have time to write you a short one.” Blaise Pascal (1623–1662), French philosopher and mathematician. At the age of 18 he invented one of the first mechanical calculating machines. So I wonder why […]