The Foulest Mouth of Them All

Posted on Oct 21, 2024

Your Task
Profanity Insanity
The Foulest Mouth of Them All…
What Have We Learnt Today?

💡 Warning: This post contains strong language. Reader discretion is advised!

Your Task

Find the foulest mouthed contestant in UK Taskmaster to date. Bonus points for finding the foulest mouth in each series.

A Side Amble in the Preamble

In the spirit of good coding practice, namely avoiding duplicated code and centralising common code in a single location where possible, here is a set of preamble scripts that I will be sourcing at the beginning of each post.

These R scripts:

Configure some global output settings to make these posts and graphics aesthetically pleasing.
Establish the connection to the TdlM database file.

Contents of this preamble code can be found in this git location.

library(here)
here("static")
## [1] "C:/Users/cfhna/Documents/Git_Projects/RStudio/themedianduck/static"

preamble_dir <- here("static", "code", "R", "preamble")
preamble_file  <- "post_preamble.R"

source(file.path(preamble_dir, preamble_file))
source(file.path(preamble_dir, "database_preamble.R"))
## [1] "Database Connection, tm_db, is now ready to use."
source(file.path(preamble_dir, "graphics_preamble.R"))

Some Motivation To This Preamble

Prior to this preamble, I copied roughly 40 lines of setup code from post to post and executed them. This is okay for quick, one-off purposes, but it is not good practice in the longer term. For example, if I wanted to change this setup code, I would have to change every instance of it (that is, the markdown file behind each post).

Going forward, the preamble files will be sourced in about 5 lines of code. If I want to change the preamble code, I only have to make the change in one location, and it will naturally propagate to wherever the code is used.

Profanity Insanity

To answer our questions of interest, we likely need to access the following tables:

profanity: Table detailing the profanity observed in an episode.
people: High level information on contestants, Greg Davies and Alex Horne (gender, DOB, dominant hand).
series: Snapshot of overall series information.

The Rate of Profanity Caveat

It is worth reminding ourselves that the number of episodes per series has varied over Taskmaster’s broadcast run. Consequently, to compare profanity use between series, we cannot simply compare the total profanity used by a contestant (or speaker in general) in a series, as contestants from longer series will likely appear more foul mouthed.

In order to allow for valid comparisons between series, we introduce a new metric, namely the profanity rate which normalises the profanity total by the number of episodes in the series.

\[\texttt{Profanity Rate for Contestant C in series S} = \frac{\sum{\texttt{Profanity by contestant C in series S}}}{\texttt{Number of episodes in series S}}\]

Profanity Rate can be thought of as the average number of times contestant C will swear in an episode (of series S).
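As a quick illustration of why the rate matters, here is a minimal sketch with made-up totals. The contestant names and counts are hypothetical and are not taken from TdlM.

```r
# Hypothetical totals for two contestants in series of different lengths.
toy <- data.frame(
  contestant    = c("Contestant A", "Contestant B"),
  total_swears  = c(30, 24),
  series_length = c(10, 6)
)

# Profanity rate = total profanity in the series / number of episodes in the series.
toy$profanity_rate <- toy$total_swears / toy$series_length

toy
```

Contestant A has the larger total, but Contestant B has the larger rate, which is exactly the distortion the normalisation is meant to remove.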

The Next Level of Profanity

To assist our foul mouthed quest, it would be useful to create new temporary subtables which combine, transform and/or aggregate data from the various database tables we outlined above.

For example, we might want to create:

an enhanced version of the profanity table which contains interpretable information on the speaker and series rather than numerical ids (they are people with names, not numbers).
an aggregate of this enhanced profanity table (which is at a series, speaker, task level granularity), such that we have profanity at a series level for a contestant.

These transformations can be done in SQL or R. Based on personal preference, the former will be used for joins and aggregations, and the latter for more technical transformations (for example calculation of new statistics).

Enhanced Profanity

The following query combines the data in the profanity, people and series tables. The data remains at a low level of granularity, namely the utterance of profanity by a particular contestant in a task.

The output of this query is stored directly as an R object named profanity_enh:

--  Stored as an R dataframe profanity_enh

SELECT
    pf.series,
    pf.episode,
    pf.task,
    pf.speaker as speaker_id,
    pp.name as speaker_name,
    pf.roots,
    pf.quote,
    pf.studio,
    pp.gender,
    pp.hand,
    pp.champion,
    pp.tmi as speaker_tmi,
    sp.name as series_name,
    sp.episodes as num_episodes_in_series,
    sp.champion as series_champion_id,
    sp.special
FROM profanity pf
-- Attach speaker details. Note that (A AND B) OR A is logically the same as A,
-- so this condition matches on pf.speaker = pp.id alone.
LEFT JOIN people pp
    ON (pf.speaker = pp.id AND pf.series = pp.series)
    OR (pf.speaker = pp.id)
-- Attach series details.
LEFT JOIN series sp
    ON pf.series = sp.id
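To see how a LEFT JOIN like this behaves when a speaker has no matching row in people, here is a minimal sketch against an in-memory SQLite database. It assumes the DBI and RSQLite packages are installed; the toy tables, the `con` connection and the `toy_enh` name are all hypothetical stand-ins, and `dbGetQuery` is just one way of landing a query result in an R data frame, not necessarily how this post does it.

```r
library(DBI)

# In-memory SQLite database with tiny, hypothetical stand-ins for the real tables.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "profanity", data.frame(
  series  = c(1, 1, 2),
  speaker = c(10, 11, 99),   # speaker 99 has no row in the people table
  roots   = c('["a"]', '["a", "b"]', '["c"]')
))
dbWriteTable(con, "people", data.frame(
  id   = c(10, 11),
  name = c("Person A", "Person B")
))

# LEFT JOIN keeps every profanity row; unmatched speakers get a NULL name.
toy_enh <- dbGetQuery(con, "
  SELECT pf.series, pf.speaker AS speaker_id, pp.name AS speaker_name, pf.roots
  FROM profanity pf
  LEFT JOIN people pp
      ON pf.speaker = pp.id
  ORDER BY pf.speaker
")

dbDisconnect(con)

toy_enh
```

All three profanity rows survive the join, and the unmatched speaker comes back with an NA name in R. This is one way an NA speaker can end up in the downstream aggregates.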

Series Profanity

The following code takes the recently created profanity_enh dataframe and performs a number of operations which result in a new dataset, series_profanity, at a series, contestant level. Operations include:

Counting the number of profanities uttered in a given quote.
Aggregating data to a series and speaker level.
1. Sum of the profanities uttered.¹
2. Number of distinct episodes that the profanities are uttered over.
3. Number of episodes in the series.
Adding a new column which calculates the profanity rate.

library(reticulate)
library(dplyr)

series_profanity <- profanity_enh %>% 
    # Count the number of profanities uttered in a quote:
    # roots holds a Python-style list, so its length is the count.
    rowwise() %>%
    mutate(num_profanity = length(reticulate::py_eval(roots))) %>%
    # Aggregate and summarise the data at a series, speaker level.
    group_by(series, series_name, special, speaker_id, speaker_name, speaker_tmi, gender, hand) %>%
    summarise(
        speaker_episode_count = dplyr::n_distinct(episode),
        sum_profanity_series = sum(num_profanity), 
        no_episodes_in_series = max(num_episodes_in_series)
    ) %>%
    # Profanity rate: total profanity divided by the number of episodes in the series.
    mutate(profanity_per_episode = sum_profanity_series/no_episodes_in_series)
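For readers without a Python installation, the same counting and aggregation steps can be sketched in R alone. This is an alternative to the reticulate approach, not the method used in this post. It assumes every item in roots is wrapped in double quotes with no escaped quotes inside. The `count_roots` helper and the toy rows are hypothetical.

```r
library(dplyr)

# Hypothetical rows in the shape of profanity_enh (names and values are made up).
toy_enh <- data.frame(
  series                 = c(1, 1, 1, 1),
  speaker_name           = c("Person A", "Person A", "Person B", "Person B"),
  episode                = c(1, 2, 1, 1),
  roots                  = c('["a", "b"]', '["a"]', '["c"]', '[]'),
  num_episodes_in_series = c(6, 6, 6, 6)
)

# Count the double-quoted items in each roots string (base R, no Python needed).
count_roots <- function(x) lengths(regmatches(x, gregexpr('"[^"]*"', x)))

toy_series <- toy_enh %>%
  mutate(num_profanity = count_roots(roots)) %>%
  group_by(series, speaker_name) %>%
  summarise(
    speaker_episode_count = n_distinct(episode),
    sum_profanity_series  = sum(num_profanity),
    no_episodes_in_series = max(num_episodes_in_series),
    .groups = "drop"   # return an ungrouped data frame
  ) %>%
  mutate(profanity_per_episode = sum_profanity_series / no_episodes_in_series)

toy_series
```

Because the helper is vectorised, no rowwise() step is needed, and `.groups = "drop"` leaves the result ungrouped so that later ranking steps work across all rows.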

The Foulest Mouth of Them All…

We are nearly ready to answer our first foul mouthed question! There are a few more considerations that will form the basis of our logic:

we will be considering only standard series of UK Taskmaster and not specials (no New Years Treats and Champion of Champions).
we will only consider contestants and not Greg Davies or Alex Horne.
the foulest mouth contestant will have the largest profanity rate.

And with that, our foul mouthed winner is…

Table 1: The Foulest Mouthed Contestants in UK Taskmaster
series_name	speaker_name	gender	hand	profanity_per_episode
Series 1	Romesh Ranganathan	M	R	7.667

And there we have it: Romesh Ranganathan² from Series 1 is the foulest mouthed contestant on UK Taskmaster, with a profanity rate of 7.667. In other words, Romesh is expected to swear about 7.667 times in an episode.

Based on my recollection of Series 1 and Romesh’s angry persona, it is not entirely surprising that he is the most foul mouthed contestant!

A Close Finish?

Some readers may be interested in knowing whether it was a close finish in the profanity rate race.

We can quickly determine this by changing the top_n function to select, say, the top 5 rather than the top 1 by profanity rate.
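The selection code itself is not shown in this post, so the following is only a sketch of how such a top_n call might look. The toy data frame holds just the five rows that appear in the top 5 table, and the `toy_rates`, `top_1`, `top_4` and `top_4_new` names are hypothetical.

```r
library(dplyr)

# The five rows that appear in the top 5 table of this post.
toy_rates <- data.frame(
  series_name  = c("Series 1", "Series 6", "Series 6", "Series 2", "Series 3"),
  speaker_name = c("Romesh Ranganathan", "Asim Chaudhry", "Russell Howard",
                   "Doc Brown", "Rob Beckett"),
  profanity_per_episode = c(7.667, 7.300, 5.800, 4.200, 4.200)
)

# ungroup() matters on real data: top_n() works within groups on grouped input.
top_1 <- toy_rates %>% ungroup() %>% top_n(1, profanity_per_episode)

# Asking for 4 returns 5 rows, because top_n() keeps ties.
top_4 <- toy_rates %>% ungroup() %>% top_n(4, profanity_per_episode)

# slice_max() is the newer dplyr verb for the same job; it also keeps ties by default.
top_4_new <- toy_rates %>% slice_max(profanity_per_episode, n = 4)
```

Two things to watch: a data frame that is still grouped after summarise() will give a per-group result rather than an overall one, and ties can return more rows than requested.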

Table 3: Top 5 Foulest Mouthed Contestants in UK Taskmaster
series_name	speaker_name	gender	hand	profanity_per_episode
Series 1	Romesh Ranganathan	M	R	7.667
Series 6	Asim Chaudhry	M	R	7.300
Series 6	Russell Howard	M	R	5.800
Series 2	Doc Brown	M	R	4.200
Series 3	Rob Beckett	M	R	4.200

The profanity race (Table 3) shows the following positions behind Romesh:

2nd place: Asim Chaudhry from Series 6 with a profanity rate of 7.3. A somewhat close second to Romesh.
3rd place: Series 6’s Russell Howard with a 5.8 profanity rate, a noticeable drop from 1st and 2nd place.
Tied 4th place: Doc Brown and Rob Beckett, from Series 2 and 3 respectively, with 4.2 swear words expected to be uttered in an episode. Again a drop from the prior positions.

There are no big surprises in these finishing positions, although Asim Chaudhry being a close second is somewhat surprising since I don’t remember him being particularly angry, or any foul mouthed incident involving him³. He’s a relatively mild mannered comedy actor who just wants everyone to know he is a vegan.

Another observation is that the top 5 are all male and right handed. Make of that what you will…

Bonus Task: Foulest Mouth in Each Series

To find the foulest mouth in each Taskmaster series, we can continue to use the existing data and logic, but introduce an additional line of logic to rank within a series; the group_by function is our friend here. The rank function provides a ranking with respect to profanity rate; for a descending order ranking, a minus sign is placed in front of the variable we want to rank by.

within_series_profanity <- series_profanity %>%
    filter(special == 0 & !(speaker_name %in% c("Greg Davies", "Alex Horne"))) %>%
    arrange(series, -profanity_per_episode, -speaker_episode_count) %>%
    group_by(series, series_name) %>%
    mutate(
        profanity_rank = rank(-profanity_per_episode, ties.method = "first"),
        name_prof_rate = sprintf("%s (%#.3f)", speaker_name, profanity_per_episode)
    )

Filtering on profanity_rank == 1 will provide the foulest contestant by series:
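As a small sketch of the rank-then-filter step, the snippet below reuses the five rows from the top 5 table as toy data, so it only covers four series. The `toy_rates` and `foulest_by_series` names are hypothetical.

```r
library(dplyr)

# The five rows from the top 5 table; Series 6 is the only series with two entries.
toy_rates <- data.frame(
  series_name  = c("Series 1", "Series 6", "Series 6", "Series 2", "Series 3"),
  speaker_name = c("Romesh Ranganathan", "Asim Chaudhry", "Russell Howard",
                   "Doc Brown", "Rob Beckett"),
  profanity_per_episode = c(7.667, 7.300, 5.800, 4.200, 4.200)
)

foulest_by_series <- toy_rates %>%
  group_by(series_name) %>%
  # Rank within each series; the minus sign gives rank 1 to the highest rate.
  mutate(profanity_rank = rank(-profanity_per_episode, ties.method = "first")) %>%
  # Note the double equals sign: filter() needs a comparison, not an assignment.
  filter(profanity_rank == 1) %>%
  ungroup()

foulest_by_series
```

Russell Howard drops out because Asim Chaudhry outranks him within Series 6, even though his rate is higher than the Series 2 and Series 3 winners.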

Figure 1: Foulest Contestant by Series

Some insights from Figure 1:

Intuitive foul mouthedness:
- Daisy May Cooper (Series 10, hippogate is fresh in my mind),
- Jamali Maddix (Series 11, general angry persona).
Surprising foul mouthedness:
- Joe Thomas (Series 8, a contender for awkwardness for sure but not foul mouthedness)
- Chris Ramsey (Series 13, I would have suspected Judi Love),
- David Baddiel (Series 9, Ed Gamble would be my bet),
- Jenny Eclair (Series 15, Frankie Boyle’s angry persona would have been my bet).
Romesh and Asim really were leading the profanity based pack by quite a margin, with rates greater than 7.
- All other series have profanity rates no greater than 4.5.

Within Series Foul Mouthed Races

Similar to the overall foul mouthed analysis, we might be interested in seeing how close the profanity race was in each series. Assessing each race may also make our surprising insights less surprising.

Some observations from Figure 2:

Series 2, Series 5, Series 8, Series 9, Series 13 and Series 16 were all relatively close races.
- This explains some of our surprising foul mouthed contestants we previously uncovered.
Russell Howard (Series 6) is a pretty foul mouthed contestant overall compared to all other contestants.
- Had he been in almost any other series (Series 1 and Romesh aside), he likely would have been the most foul mouthed of that series.
The data sources used also include additional and substitute contestants that were not permanent contestants for that series. For example:
- Josh Widdicombe appearing in Series 2 is due to his involvement in team tasks,
- Katherine Ryan appearing in Series 9 is due to her substituting for Katy Wix for the studio portion of one episode.
- The NA contestant of Series 9 is likely Kerry Godliman’s contribution. A data join discrepancy is likely occurring here as well.
The least foul mouthed permanent contestant of a series is Mike Wozniak with a profanity rate of 0.1. This isn’t a big surprise based on the Geography Teacher persona he provides; we’ve got to set a good example to the kids!
- However there are a handful of other contestants who are potentially as wholesome. For example Victoria Coren Mitchell, Julian Clary, Lolly Adefope.
Series 16 has a low range of profanity rates, between 0.2 and 2.5, which I found surprising. It could be that the contestants truly were not foul mouthed, or there is a data issue (not all profanity has been recorded in TdlM).

Figure 2: All Series Contestants Profanity Rates

What Have We Learnt Today?

💡 There was no box mate... The Tree Wizard is the foulest mouthed contestant! He is expected to drop 7.67 swear words in an episode!

💡 Tick-tock it's hemorrhoid o'clock! The Geography teacher is the least foul mouthed contestant; he is only expected to swear 0.1 times an episode! Good thing he is the assistant on Taskmaster Junior!

¹ To count the number of profanities, we rely on the roots column, and the reticulate library has been employed. An example value in the roots column might be ["little", "alex", "horne"], which is in the form of a Python list object. reticulate and its py_eval function allow us to interpret this as a list object from within R, convert it to its R equivalent (an R vector), and manipulate it as an R object (namely taking its length to count the number of profanity occurrences). It’s a convenient way for me to deal with data types which may not be natural in R, but are natural in another language.
² aka tree wizard
³ I’m looking at you Ed Gamble and Daisy May Cooper for these sorts of outbursts.