Tokens and lemmas are classical concepts in NLP. Tokens are the smallest units of a text and are usually identified with words. So, if we say I write code, we have three tokens: I, write, and code. Depending on the granularity level, things can get more complex, because tokens can also be identified with single characters; here we consider only words, not individual letters. Note that even at the word level, tokenization can be tricky. For example, in the sentence I'm writing code, we can count either three or four tokens: in the first case, I'm is treated as a single token, while in the second case, I'm is split into two words, giving two separate tokens.
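To make the two counts concrete, here is a minimal sketch in Python. The regular expression is only an illustrative assumption, not a standard tokenizer: a plain whitespace split keeps I'm as one token (three tokens in total), while the finer pattern separates the clitic, giving four tokens.

```python
import re

sentence = "I'm writing code"

# Coarse tokenization: split on whitespace, so "I'm" stays a single token.
coarse = sentence.split()
print(coarse)  # ["I'm", 'writing', 'code'] -> 3 tokens

# Finer tokenization: an illustrative regex that separates the clitic,
# so "I'm" is counted as two tokens ("I" and "'m").
fine = re.findall(r"\w+|'\w+|[^\w\s]", sentence)
print(fine)    # ['I', "'m", 'writing', 'code'] -> 4 tokens

# Character-level granularity (not used here) would instead treat
# every letter as its own token.
```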