This decommissioned ERA site remains active temporarily to support our final migration steps to https://ualberta.scholaris.ca, ERA's new home. All new collections and items, including Spring 2025 theses, are at that site. For assistance, please contact erahelp@ualberta.ca.
Spring 2024
Bengali and Hindi are two widely spoken yet low-resource languages. The state of the art in modeling such languages uses BERT and the WordPiece tokenizer. We observed that the WordPiece tokenizer often breaks words into meaningless tokens, failing to separate roots from affixes. Moreover,...