Corpora

Thai Wikipedia

We processed a September 2025 dump of Thai Wikipedia to produce a frequency list based on a relatively neutral corpus. Throughout this blog, the resulting frequency list is referred to as the 'thwiki' list. The dump covers 500,000 articles and more than 150 million words/tokens. We processed it so you don't have to.
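For readers curious what "producing a frequency list" involves, here is a minimal sketch in Python. It is not the pipeline we actually ran: the function name `build_frequency_list` and the `tokenize` parameter are made up for illustration. Thai is written without spaces between words, so a real run needs a proper word segmenter passed in as `tokenize`; the demo below cheats by using text that is already split on spaces.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def build_frequency_list(
    texts: Iterable[str],
    tokenize: Callable[[str], List[str]],
) -> List[Tuple[str, int]]:
    """Count tokens across all texts and return (token, count) pairs,
    most frequent first.

    `tokenize` is a hypothetical hook: any function that turns one
    article's text into a list of word tokens.
    """
    counts: Counter = Counter()
    for text in texts:
        # Skip empty or whitespace-only tokens that a segmenter may emit.
        counts.update(tok for tok in tokenize(text) if tok.strip())
    return counts.most_common()


# Stand-in data: pre-segmented text, so str.split works as the tokenizer.
# Real Thai article text would need an actual word segmenter here.
articles = ["แมว กิน ปลา", "แมว นอน"]
freq = build_frequency_list(articles, str.split)
print(freq[0])  # ('แมว', 2)
```

Keeping the tokenizer as a parameter means the counting logic stays the same whichever segmenter you prefer, which matters because different segmenters will give different token counts for the same dump.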