Analysis of Thai dictionaries
In this post, we look at the size of various Thai dictionaries and consider where they overlap and where they differ.
We processed a September 2025 dump of Thai Wikipedia to produce a frequency list from a relatively neutral corpus. Throughout this blog, we refer to the resulting frequency list as the 'thwiki' list. The dump covers 500,000 articles and more than 150 million words (tokens). We processed it so you don't have to.
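The post does not describe the processing pipeline, so the following is only a minimal sketch of how a frequency list like 'thwiki' could be built once article text has been extracted from the dump. It assumes each article is already available as a plain string. Thai does not put spaces between words, so real use would need a word segmenter (for example PyThaiNLP's `word_tokenize`) passed in as `tokenize`. The default `str.split` is a placeholder that only works on pre-segmented text. The function name `build_frequency_list` is hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def build_frequency_list(
    documents: Iterable[str],
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[Tuple[str, int]]:
    """Count token occurrences across all documents (hypothetical helper).

    Assumption: `tokenize` turns one article into a list of word tokens.
    The default str.split only works on text that is already space-segmented.
    """
    counts = Counter()
    for doc in documents:
        # Ignore empty or whitespace-only tokens a segmenter might emit
        counts.update(tok for tok in tokenize(doc) if tok.strip())
    # Most frequent first; ties broken by code-point order for a stable list
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Usage on two tiny, pre-segmented "articles"
docs = ["แมว กิน ปลา", "แมว นอน"]
print(build_frequency_list(docs))
```

Sorting ties by code point keeps the output deterministic across runs, which helps when diffing frequency lists built from different dumps.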
This strategy-game-style hexagon map highlights the space the frequency list occupies within the overall dictionary space.
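The post does not say how the overlap behind the map was measured. One plausible way to quantify it is shown in this sketch, which compares a frequency list with a single dictionary's headword set. It reports shared and unique word types, the Jaccard index of the two vocabularies, and token coverage: the share of corpus tokens whose word appears in the dictionary. The function name `overlap_stats` and these particular metrics are assumptions for illustration.

```python
from typing import Dict, Set


def overlap_stats(freq: Dict[str, int], dictionary: Set[str]) -> Dict[str, float]:
    """Compare a frequency list with a dictionary (hypothetical helper).

    freq: word -> corpus count (e.g. from a frequency list like 'thwiki')
    dictionary: set of headwords from one dictionary
    """
    vocab = set(freq)
    shared = vocab & dictionary
    union = vocab | dictionary
    total_tokens = sum(freq.values())
    # Tokens in the corpus whose word type is also a dictionary headword
    covered_tokens = sum(freq[w] for w in shared)
    return {
        "shared_types": len(shared),
        "only_in_freq_list": len(vocab - dictionary),
        "only_in_dictionary": len(dictionary - vocab),
        # Type-level overlap of the two vocabularies
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Token-level coverage, weighted by corpus frequency
        "token_coverage": covered_tokens / total_tokens if total_tokens else 0.0,
    }


# Usage with toy data
freq = {"แมว": 3, "ปลา": 1, "xyz": 1}
dictionary = {"แมว", "ปลา", "หมา"}
print(overlap_stats(freq, dictionary))
```

Type-level overlap treats rare and common words equally. Token coverage instead shows how much running text a dictionary actually accounts for, so the two numbers can tell quite different stories about the same pair of word lists.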