This guide to the academic word profiler covers each area in turn: selecting options, understanding frequency data, understanding word form data (including range and ratio), finding academic synonyms, and identifying word list information, including academic collocations.
There is also a description of the BNC Baby corpus, which is used to derive frequency and other data.
The first thing you will see is the search bar, with options (below). There are quite a few options, and deselecting ones you are not interested in will make the page less cluttered and therefore make it easier for you to focus on the information you want. The default setting is all options selected.
The first section shows the frequencies of the word in each of the four sub-corpora of the BNC Baby, i.e. academic, fiction, spoken and news. Below is an example for the word academic. In this case, the word is much more frequent in the academic corpus than in the other corpora, indicating that this is a useful word for academic study.
Hovering over any bar in the chart will show the actual value. In this case, the word academic occurs in the academic corpus with a frequency of 144.14 words per million, meaning it is quite a common word.
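As a rough illustration of what a per-million figure means, the sketch below normalises a raw count by corpus size. The function name and the numbers are hypothetical and are not taken from the profiler itself; the only detail borrowed from this guide is that each BNC Baby sub-corpus contains approximately one million words.

```python
def freq_per_million(count: int, corpus_size: int) -> float:
    """Return occurrences per million words for a raw count."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply first so that whole-number results stay exact.
    return count * 1_000_000 / corpus_size


# Hypothetical figures: 150 occurrences in a sub-corpus of 1,050,000 words.
print(round(freq_per_million(150, 1_050_000), 2))
```

Normalising in this way is what allows frequencies to be compared across sub-corpora that are only approximately equal in size.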
The second section shows information about different forms of the word (if any) as well as more detailed information about each. Below is the graph for the two forms of academic. It is clear that the adjective form is used much more often than the noun form.
Also in this section you can find more detailed information about each form of the word. In addition to part of speech (pos) and frequency per million words (freq pmw), you can see information on range and ratio. These are explained in more detail below.
Range indicates how many texts (out of 30) in the academic corpus contain the word. A word which appears in a wide range of texts is more likely to be a commonly used word, and therefore worth studying. A word which occurs in only one or a few texts may be less useful, although it may still have quite high frequency (if it is used many times in those texts). In the case of academic, the adjective form is used in over half of the texts (17 out of 30), while the noun is used in only four, meaning the noun form is not only less frequent but also less widely used.
Ratio refers to how frequently the word occurs in the academic corpus compared to the fiction corpus. For example, a word with a ratio of 1.5 occurs 50% more often in the academic corpus than in the fiction corpus. In the case of academic, the adjective form occurs over nine times as often in academic texts (9.13), while the noun appears more than twice as often (2.5), suggesting that both forms are academic words.
Summing up, the adjective form of academic is high frequency, is used in a wide range of academic texts (in the BNC Baby), and occurs much more frequently in academic than fiction texts. In contrast, the noun form, although used more frequently in academic texts than fiction, is not widely used, and is therefore less useful to study.
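The range and ratio measures described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the profiler's own code: the function names and the tiny sample texts are hypothetical, word forms (parts of speech) are not distinguished, and the handling of a word that never occurs in fiction is an assumption.

```python
def text_range(word: str, texts: list[list[str]]) -> int:
    """Count how many texts contain the word at least once."""
    return sum(1 for tokens in texts if word in tokens)


def ratio(academic_pmw: float, fiction_pmw: float) -> float:
    """Academic frequency divided by fiction frequency (both per million words)."""
    if fiction_pmw == 0:
        # Assumption: a word absent from fiction gets an infinite ratio.
        return float("inf")
    return academic_pmw / fiction_pmw


# Three tiny hypothetical texts, already split into lower-case tokens.
texts = [
    ["academic", "study", "shows", "academic", "results"],
    ["the", "weather", "was", "fine"],
    ["an", "academic", "journal"],
]
print(text_range("academic", texts))  # 2: the word is in the first and third texts
print(ratio(30.0, 20.0))              # 1.5, i.e. 50% more frequent in academic texts
```

Note that the first text uses the word twice but still counts only once towards range, which is why a word can have high frequency and low range at the same time.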
The academic synonym section is a unique feature, and draws together data from different sources. One of these is the open source Wordnet database, used on the dictionary page, which is used to identify synonyms of the word. This information is then combined with data from the BNC to determine whether the synonyms are academic, based on the following criteria:
The figure of 1.5 for ratio was used as this is the same measure as Gardner and Davies used when developing the Academic Vocabulary List (AVL).
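The ratio part of this filtering could be sketched as follows. Only the 1.5 threshold comes from this guide; the function name, the placeholder synonyms and their ratios are hypothetical, the inclusive comparison (`>=`) is an assumption, and any other criteria the profiler applies are not modelled here.

```python
RATIO_THRESHOLD = 1.5  # figure given in this guide; the inclusive test is an assumption


def academic_candidates(ratios: dict[str, float],
                        threshold: float = RATIO_THRESHOLD) -> list[str]:
    """Return, in alphabetical order, the synonyms whose ratio meets the threshold."""
    return sorted(word for word, r in ratios.items() if r >= threshold)


# Placeholder synonyms with made-up academic-to-fiction ratios.
ratios = {"synonym_a": 2.4, "synonym_b": 0.8, "synonym_c": 1.5}
print(academic_candidates(ratios))  # ['synonym_a', 'synonym_c']
```

A filter like this only narrows the list; choosing among the remaining synonyms still depends on meaning and context.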
To make sure you are selecting a synonym appropriately, the definition is also given, along with any example sentences in the database which use the word with that meaning. In example sentences, the original word is highlighted in pale red, while synonyms are highlighted in green.
Below is an example for the word study. This word has nine synonyms which meet the criteria: six for the noun form (survey, work, report, discipline, subject, field), and three for the verb form (analyse, examine, consider). All of these could be used as synonyms, depending on the context. When choosing synonyms, judgement needs to be exercised in terms of selecting an appropriate word based on meaning.
This section is especially useful if you search for a word which is not academic and want to find an academic alternative. Below is the output for good, a word which is used far less often in the academic sub-corpus than in the other three sub-corpora. The words beneficial and effective would be suitable synonyms for the meanings given; the word serious, although a synonym, may not be common in academic writing with the same meaning.
The next section shows information on which word lists the word appears in. This includes a range of general lists (e.g. the GSL), academic lists (including the AWL), and technical word lists. The example below is for the word study, which appears in all three general lists, as well as the AVL, AKL, and the SVL. If you are not sure what these lists are, you can click on the hyperlinks on the profiler page to find out more.
There is also information on which list in the BNC/COCA lists the word appears in. This can be helpful in making judgements about whether the word is useful for study, based on whether it is high-frequency, mid-frequency or low-frequency. The example below shows that study is a high-frequency word.
Although the academic word profiler is used for single words, it is always important to study word combinations. The final section identifies any collocations from the Academic Collocation List (ACL) in which the word appears. Below are some examples for the word study.
The frequency, range and ratio data are derived from the BNC Baby corpus. The BNC Baby is a four million word sample taken from the 100 million word BNC (British National Corpus), and is divided into four parts: academic writing, imaginative writing (i.e. fiction), spoken conversation and newspaper texts. The sample texts were chosen so that each sub-corpus was approximately equal in size, i.e. each contains approximately one million words.
The academic corpus consists of 30 texts, chosen randomly from different subject areas, and comprises journal articles as well as material from books.
The fiction corpus comprises 25 texts taken from books published between 1985 and 1994, written for an adult audience.
The spoken corpus contains 30 spoken texts, with a range of speakers in different situations. Just over half of the texts are for speakers aged 25-44, with the younger age range 0-24 comprising just under 20%, and the older age range of 45 and over comprising just under 30%. 59% of the speakers are female, 41% male.
The news section of the BNC Baby comprises a mix of national newspapers (60%) and local newspapers (40%), covering a wide range of topics, as well as a range of dates to maximise the variation of topics. The shorter nature of news articles compared to other types of text means that the number of texts is significantly larger (97 texts in total).
Author: Sheldon Smith ‖ Last modified: 10 October 2022.
Sheldon Smith is the founder and editor of EAPFoundation.com. He has been teaching English for Academic Purposes since 2004. Find out more about him in the about section and connect with him on Twitter, Facebook and LinkedIn.