a-vs-an

Find the english language indeterminate article ("a" or "an") for a word. Based on real usage patterns extracted from the wikipedia text dump; can therefore even deal with tricky edge cases such as acronyms (FIAT vs. FAA, NASA vs. NSA) and odd symbols.

The implementations (C# and Javascript) in this project determine whether "a" or "an" should precede a word. They are efficient and accurate (using the method described in this stackoverflow response).

You can try the javascript implementation of this library online: A-vs-An.

The dataset used is based on the wikipedia-article-text dump of july 2014. Some additional preprocessing was done to remove as much wiki-markup as possible and extract only things vaguely resembling sentences using regular expressions. If the word following 'a' or 'an' started with a quote or parenthesis, the initial quote or parenthesis was ignored. The resulting prefix-list with the code to query it is less than 10KB in size; excluding the actual counts would reduce the size still further.

The implementations are efficient: on a single thread of a 4.1GHz i7-4770k a benchmark classifying all words of an english dictionary (archived local copy: 354984si.ngl) achieves about 17 million words a second; that's just 60ns per word. The javascript implementations were benchmarked on chrome 84 (80ns per lookup), firefox 32.0a1 (2014-05-22), IE 11, and opera (12 and 21), and are all about 7-10 times slower, at approximately 4-5 million classifications per second.

Contributing

Contributions welcome! Feel free to make a suggestion, create a pull request with improvements. Contributed code should be apache 2 licensed, as a-vs-an is.

Thanks in particular to @lukespice for adding .net core support!

Name		Name	Last commit message	Last commit date
Latest commit History 699 Commits
.github		.github
A-vs-An-DotNet		A-vs-An-DotNet
AvsAn-Python		AvsAn-Python
AvsAn-Test		AvsAn-Test
AvsAnDemo		AvsAnDemo
PerfTest		PerfTest
WikipediaAvsAnTrieExtractor		WikipediaAvsAnTrieExtractor
WikipediaAvsAnTrieExtractorTest		WikipediaAvsAnTrieExtractorTest
.editorconfig		.editorconfig
.gitignore		.gitignore
A-vs-An.sln		A-vs-An.sln
A-vs-An.sln.DotSettings		A-vs-An.sln.DotSettings
CheckJsSize.linq		CheckJsSize.linq
CompareNewOutputToOld.linq		CompareNewOutputToOld.linq
LICENSE		LICENSE
README.md		README.md
Stats.linq		Stats.linq
_config.yml		_config.yml

License

EamonNerbonne/a-vs-an

Folders and files

Latest commit

History

Repository files navigation

a-vs-an

Contributing

About

Resources

License

Stars

Watchers

Forks

Languages