“My name is Bond… James Bond.”
James Bond is a fictional British secret agent created by Ian Fleming in 1953. Originally the subject of novels and short stories, James Bond has featured in a hugely successful franchise of films spanning decades. I grew up on a regular diet of James Bond films and fondly remember the excitement of the customary TV showing on Boxing Day.
The film titles are quite distinctive, and something about them makes them “feel” like a James Bond film title:
- Dr No (1962)
- From Russia with Love (1963)
- Goldfinger (1964)
- Thunderball (1965)
- You Only Live Twice (1967)
- On Her Majesty’s Secret Service (1969)
- Diamonds Are Forever (1971)
- Live and Let Die (1973)
- The Man with the Golden Gun (1974)
- The Spy Who Loved Me (1977)
- Moonraker (1979)
- For Your Eyes Only (1981)
- Octopussy (1983)
- Never Say Never Again (1983)
- A View to a Kill (1985)
- The Living Daylights (1987)
- Licence to Kill (1989)
- GoldenEye (1995)
- Tomorrow Never Dies (1997)
- The World is Not Enough (1999)
- Die Another Day (2002)
- Casino Royale (2006)
- Quantum of Solace (2008)
- Skyfall (2012)
- Spectre (2015)
- No Time to Die (2021)
I started to think about what made these film titles distinctive and whether, if I knew the answer, I could generate random film titles befitting a James Bond film. Here’s a short journey to creating that generator. I’ve avoided delving into any code so that non-programmers can follow along too; however, I’ve linked to relevant lines in a separate gist of easy-to-follow Clojure code along the way. Source code for the final generator is shared at the end.
Rather than diving in head first with Natural Language Processing and Machine Learning techniques, let’s start with some simple analysis. One intuition is that the titles employ several words repeatedly, such as “Live”, “Die”, “Never”, etc. Let’s explore that…
The first step is to split the titles into their constituent words[gist], a process also known as tokenising. Some titles are compound words, where two words have been joined together to form a single word, for example, “Skyfall”. For our purposes these can be split into their component words, in this case “Sky” and “Fall”. Ordinarily you might automate the searching and splitting of compound words, but with only 26 short titles to play with, we can do this by hand.
Here’s our tokenised, lower-cased list of words:
dr no from russia with love gold finger thunder ball you only live twice on her majesty’s secret service diamonds are forever live and let die the man with the golden gun the spy who loved me moon raker for your eyes only octo pussy never say never again a view to a kill the living day lights licence to kill golden eye tomorrow never dies the world is not enough die another day casino royale quantum of solace sky fall spectre no time to die
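The article's own code is a Clojure gist that isn't reproduced here. As a rough equivalent, here is a minimal Python sketch of the tokenising step, assuming the compound splits are supplied by hand in a lookup table (the names `COMPOUNDS`, `TITLES` and `tokenise` are illustrative, not taken from the gist):

```python
# Hand-made compound splits, matching those in the tokenised list above.
COMPOUNDS = {
    "goldfinger": ["gold", "finger"],
    "thunderball": ["thunder", "ball"],
    "moonraker": ["moon", "raker"],
    "octopussy": ["octo", "pussy"],
    "daylights": ["day", "lights"],
    "goldeneye": ["golden", "eye"],
    "skyfall": ["sky", "fall"],
}

TITLES = [
    "Dr No", "From Russia with Love", "Goldfinger", "Thunderball",
    "You Only Live Twice", "On Her Majesty's Secret Service",
    "Diamonds Are Forever", "Live and Let Die", "The Man with the Golden Gun",
    "The Spy Who Loved Me", "Moonraker", "For Your Eyes Only", "Octopussy",
    "Never Say Never Again", "A View to a Kill", "The Living Daylights",
    "Licence to Kill", "GoldenEye", "Tomorrow Never Dies",
    "The World Is Not Enough", "Die Another Day", "Casino Royale",
    "Quantum of Solace", "Skyfall", "Spectre", "No Time to Die",
]


def tokenise(title):
    """Lower-case a title, split it on whitespace, then split known compounds."""
    tokens = []
    for word in title.lower().split():
        tokens.extend(COMPOUNDS.get(word, [word]))
    return tokens


tokens = [token for title in TITLES for token in tokenise(title)]
print(len(tokens))          # 86
print(tokenise("Skyfall"))  # ['sky', 'fall']
```

Keeping the splits in a table mirrors doing them by hand; a larger corpus would call for an automated compound splitter instead.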
There are 86[gist] words in total, of which 68[gist] are unique. Incidentally, the frequencies of words appearing twice or more in our film titles are as follows[gist]:
the=5, to=3, never=3, die=3, live=2, golden=2, only=2, kill=2, day=2, with=2, no=2, a=2
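Counting the totals and the repeated words takes only a few lines with Python's `collections.Counter`. This sketch hard-codes the tokenised words listed above rather than re-running a tokeniser:

```python
from collections import Counter

# The tokenised, lower-cased words from the list above (straight apostrophe).
WORDS = (
    "dr no from russia with love gold finger thunder ball you only live twice "
    "on her majesty's secret service diamonds are forever live and let die "
    "the man with the golden gun the spy who loved me moon raker for your "
    "eyes only octo pussy never say never again a view to a kill the living "
    "day lights licence to kill golden eye tomorrow never dies the world is "
    "not enough die another day casino royale quantum of solace sky fall "
    "spectre no time to die"
).split()

counts = Counter(WORDS)
print(len(WORDS))   # 86 words in total
print(len(counts))  # 68 unique words

# Words appearing at least twice, most frequent first.
repeated = {word: n for word, n in counts.most_common() if n >= 2}
print(repeated)
```

Ties in `most_common` keep first-seen order, so the printed order can differ from the list above, though the counts are the same.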
Some of these words are stopwords, common words which don’t add much meaning to a title, such as “the” and “a”. If we ignore them, we’re left with[gist]:
never=3, die=3, live=2, golden=2, only=2, kill=2, day=2, no=2
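Removing stopwords is a simple membership check. The stopword list below is a hypothetical, hand-picked one chosen to reproduce the result above; the list used in the original gist may differ:

```python
# Frequencies of words appearing twice or more, as listed above.
repeated = {"the": 5, "to": 3, "never": 3, "die": 3, "live": 2, "golden": 2,
            "only": 2, "kill": 2, "day": 2, "with": 2, "no": 2, "a": 2}

# Hypothetical, deliberately tiny stopword list (an assumption, not the gist's).
STOPWORDS = {"the", "to", "with", "a"}

distinctive = {word: n for word, n in repeated.items() if word not in STOPWORDS}
print(distinctive)
# {'never': 3, 'die': 3, 'live': 2, 'golden': 2, 'only': 2, 'kill': 2, 'day': 2, 'no': 2}
```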
So we have eight “distinctive” words[gist] out of 68 unique words. These words recur across several titles, amounting to 18 occurrences[gist] out of a total of 86 words. That might not seem like a large number, but 13[gist] titles contain at least one distinctive word. That’s half of all our titles, so our intuition was pretty good:
- Dr No
- You Only Live Twice
- Live and Let Die
- The Man with the Golden Gun
- For Your Eyes Only
- Never Say Never Again
- A View to a Kill
- The Living Daylights
- Licence to Kill
- GoldenEye
- Tomorrow Never Dies
- Die Another Day
- No Time to Die
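Both figures, the 18 occurrences and the 13 titles, can be checked with a short Python sketch working from each title's tokenised words (compounds already split, as above); `TOKENISED` and `DISTINCTIVE` are illustrative names:

```python
# The eight distinctive words found above.
DISTINCTIVE = {"never", "die", "live", "golden", "only", "kill", "day", "no"}

# Each title mapped to its tokenised, lower-cased words (compounds split).
TOKENISED = {
    "Dr No": "dr no",
    "From Russia with Love": "from russia with love",
    "Goldfinger": "gold finger",
    "Thunderball": "thunder ball",
    "You Only Live Twice": "you only live twice",
    "On Her Majesty's Secret Service": "on her majesty's secret service",
    "Diamonds Are Forever": "diamonds are forever",
    "Live and Let Die": "live and let die",
    "The Man with the Golden Gun": "the man with the golden gun",
    "The Spy Who Loved Me": "the spy who loved me",
    "Moonraker": "moon raker",
    "For Your Eyes Only": "for your eyes only",
    "Octopussy": "octo pussy",
    "Never Say Never Again": "never say never again",
    "A View to a Kill": "a view to a kill",
    "The Living Daylights": "the living day lights",
    "Licence to Kill": "licence to kill",
    "GoldenEye": "golden eye",
    "Tomorrow Never Dies": "tomorrow never dies",
    "The World Is Not Enough": "the world is not enough",
    "Die Another Day": "die another day",
    "Casino Royale": "casino royale",
    "Quantum of Solace": "quantum of solace",
    "Skyfall": "sky fall",
    "Spectre": "spectre",
    "No Time to Die": "no time to die",
}

# Every occurrence of a distinctive word, counted across all titles.
occurrences = sum(
    word in DISTINCTIVE for words in TOKENISED.values() for word in words.split()
)
# Titles sharing at least one word with the distinctive set.
matching = [title for title, words in TOKENISED.items()
            if DISTINCTIVE & set(words.split())]

print(occurrences)    # 18
print(len(matching))  # 13
```

Note that “dies” is a separate token from “die”, so Tomorrow Never Dies qualifies only through “never”.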
We have our tokenised words, and our intuition about the importance of some of these words. Now it’s time to generate some titles…
The naive approach would be to take our set of words and throw them together randomly. The shortest titles are one word long and the longest are six[gist], so let’s create random titles of between one and six words inclusive[gist]:
- The Tomorrow Dies Are Die Finger
- On Live No The
- With Secret Again The Her Dies
- Love Me Diamonds The A Golden
- Enough For Never Only
- Dr Ball
- Gun World Dr
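Here is a Python sketch of this naive generator, assuming the vocabulary is simply the set of unique tokenised words. The random generator is seeded only so that runs are repeatable; the titles above came from the author's own code, so this sketch will not reproduce them:

```python
import random

# Unique tokenised words from the titles (straight apostrophe in "majesty's").
VOCAB = sorted(set((
    "dr no from russia with love gold finger thunder ball you only live twice "
    "on her majesty's secret service diamonds are forever live and let die "
    "the man with the golden gun the spy who loved me moon raker for your "
    "eyes only octo pussy never say never again a view to a kill the living "
    "day lights licence to kill golden eye tomorrow never dies the world is "
    "not enough die another day casino royale quantum of solace sky fall "
    "spectre no time to die"
).split()))


def naive_title(rng):
    """Join one to six random words, ignoring grammar entirely."""
    length = rng.randint(1, 6)  # randint includes both end points
    return " ".join(rng.choice(VOCAB).capitalize() for _ in range(length))


rng = random.Random(42)  # seeded only so that runs are repeatable
for _ in range(5):
    print(naive_title(rng))
```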
Clearly this is nonsensical, though we may get lucky, such as with Dr Ball! The problem is that we’re not taking into account grammatical rules and sentence structure. We could dive into analysing the sentence structure of our titles, but again, let’s start with something much simpler. We already have perfectly good existing titles with template sentence structures that we’re happy with, so let’s try substituting words between two titles. This will preserve sentence structure.
For example, starting with these two titles:
- Diamonds Are Forever
- For Your Eyes Only
We then do the following…
- Take “Diamonds Are Forever” as our template title.
- Pick a random word from this title to substitute, for example, “Diamonds”.
- Then pick a random word from “For Your Eyes Only”, for example, “For”.
- Finally, substitute the word “Diamonds” with “For” to give the new title: For Are Forever.
Unfortunately this title doesn’t seem quite right! Let’s try another example…
- Take “For Your Eyes Only” as our template title.
- Pick a random word from this title to substitute, for example, “Eyes”.
- Then pick a random word from “Diamonds Are Forever”, for example, “Diamonds”.
- Finally, substitute the word “Eyes” with “Diamonds” to give the new title: For Your Diamonds Only.
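The substitution itself, still blind to grammar, can be sketched in Python as follows. `substitute` is a hypothetical helper; it produces good and bad titles alike, just like the two examples above:

```python
import random


def substitute(template, donor, rng):
    """Replace one random word of `template` with one random word of `donor`.

    No grammatical checks are made, so the result may be nonsense.
    """
    words = template.split()
    position = rng.randrange(len(words))         # the word to overwrite
    words[position] = rng.choice(donor.split())  # any donor word at all
    return " ".join(words)


rng = random.Random(1)  # seeded only so that runs are repeatable
print(substitute("Diamonds Are Forever", "For Your Eyes Only", rng))
print(substitute("For Your Eyes Only", "Diamonds Are Forever", rng))
```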
Much better! But why did this work better than the first example? The answer is “Part of Speech”.
Part of Speech
Each part of speech describes a category of words which have the same grammatical properties. Most of us will have learnt about the core ones in school, for example:
- Noun: a word describing a physical object or an abstract concept, for example, frog or laughter.
- Verb: a word signifying action or doing, for example, eat or think.
- Adjective: a word that describes a noun, for example, blue or tired.
- Article: a word that introduces a noun, for example, the or a.
Words from these parts of speech are combined to form full sentences, such as: The (article) blue (adjective) frog (noun) ate (verb) the (article) fly (noun).
In school we’re usually taught that there are about eight parts of speech; for more detailed analysis, however, some classification schemes distinguish 70 or 80. For our purposes, let’s use the 36 part-of-speech tags of the Penn Treebank, a large body of text annotated with parts of speech and used for linguistic research. In the Penn Treebank, each part of speech is given a code: for example, NNS represents a plural noun, and RB represents an adverb. Annotating text in this way is usually called “tagging”, and tagged text often looks like this:
Diamonds/NNS/ Are/VBP/ Forever/RB/ For/IN/ Your/PRP$/ Eyes/NNS/ Only/RB/
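If annotations are stored in this `Word/TAG/` form, a tiny parser can turn them into (word, tag) pairs for the later steps. This is an illustrative Python sketch (`parse_tagged` is a hypothetical name); in it, Diamonds and Eyes carry the Penn Treebank tag for plural common nouns, NNS:

```python
def parse_tagged(text):
    """Turn 'Word/TAG/' annotations into a list of (word, tag) pairs."""
    pairs = []
    for item in text.split():
        # Drop the trailing slash, then split on the last remaining one.
        word, tag = item.rstrip("/").rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


print(parse_tagged("Diamonds/NNS/ Are/VBP/ Forever/RB/"))
# [('Diamonds', 'NNS'), ('Are', 'VBP'), ('Forever', 'RB')]
```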
Back to our title generation: when we substitute words between titles, all we need to do is ensure that each word is replaced only by a word with the same part of speech. In our example, that would give:
- Eyes Are Forever
- Diamonds Are Only
- For Your Diamonds Only
- For Your Eyes Forever
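Enumerating every same-tag substitution between the two tagged titles reproduces exactly the four titles above. In this Python sketch each title is a list of (word, tag) pairs; `pos_substitutions` is an illustrative name, not the gist's:

```python
def pos_substitutions(template, donor):
    """All titles made by swapping one word of `template` for a word of
    `donor` that carries the same part-of-speech tag."""
    results = []
    for i, (word, tag) in enumerate(template):
        for donor_word, donor_tag in donor:
            if donor_tag == tag and donor_word != word:
                new_words = [w for w, _ in template]
                new_words[i] = donor_word
                results.append(" ".join(new_words))
    return results


DAF = [("Diamonds", "NNS"), ("Are", "VBP"), ("Forever", "RB")]
FYEO = [("For", "IN"), ("Your", "PRP$"), ("Eyes", "NNS"), ("Only", "RB")]

print(pos_substitutions(DAF, FYEO))  # ['Eyes Are Forever', 'Diamonds Are Only']
print(pos_substitutions(FYEO, DAF))  # ['For Your Diamonds Only', 'For Your Eyes Forever']
```

Words with no same-tag partner in the other title (Are, For, Your) are simply left alone.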
Not perfect, but much better! So now we have everything we need to generalise this to a fully fledged title generator.
The Full Generator
First, build a tokenised set of words from the existing titles, grouped by part of speech, with some poetic licence:
- JJ: “royale” “gold” “octo” “enough” “another” “thunder” “secret” “living” “golden”
- PRP$: “your” “her”
- VB: “die” “kill” “live”
- VBZ: “dies”
- WDT: “who”
- VPB: “let” “say”
- VBD: “loved”
- RB: “twice” “not” “forever” “again” “never” “only”
- PRP: “you” “me”
- VBP: “are” “is” “fall” “live”
- DT: “a” “no” “the”
- NN: “quantum” “finger” “raker” “gun” “russia” “spectre” “tomorrow” “majesty” “casino” “man” “moon” “eye” “never” “love” “time” “licence” “daylights” “solace” “pussy” “no” “ball” “service” “day” “kill” “view” “spy” “world” “sky”
- IN: “of” “for” “on” “and” “from” “with” “to”
- NNS: “diamonds” “eyes”
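Grouping words by tag amounts to a dictionary of sets. This Python sketch builds such a lexicon automatically from tagged titles, using only the two titles tagged earlier in the article; the lexicon above was assembled by hand with some poetic licence, so an automatic build over all titles would not match it exactly:

```python
from collections import defaultdict


def build_lexicon(tagged_titles):
    """Group the lower-cased words of tagged titles by part-of-speech tag."""
    lexicon = defaultdict(set)
    for title in tagged_titles:
        for word, tag in title:
            lexicon[tag].add(word.lower())
    # Sorted lists give stable output and are easy to pick from at random.
    return {tag: sorted(words) for tag, words in lexicon.items()}


DAF = [("Diamonds", "NNS"), ("Are", "VBP"), ("Forever", "RB")]
FYEO = [("For", "IN"), ("Your", "PRP$"), ("Eyes", "NNS"), ("Only", "RB")]

print(build_lexicon([DAF, FYEO]))
# {'NNS': ['diamonds', 'eyes'], 'VBP': ['are'], 'RB': ['forever', 'only'], 'IN': ['for'], 'PRP$': ['your']}
```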
Select one of the “template” film titles at random. For example: Never/RB/ Say/VPB/ Never/NN/ Again/RB/.
Select one of the words in the template title at random, and note its part of speech. In our example, let’s choose the second Never, which is tagged NN, a singular noun.
Finally, replace Never with another word tagged NN, selected at random from the set built in the first step, for example, Love. This gives us a generated title of: Never Say Love Again. Not bad!
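Putting the steps together, here is a minimal Python sketch of the generator. The lexicon is copied from the list above, with tags exactly as given there; the template set is an assumption, containing only the three titles whose tags appear in this article, whereas the full generator draws on all the film titles:

```python
import random

# Part-of-speech lexicon copied from the list above (tags as given there).
LEXICON = {
    "JJ": ["royale", "gold", "octo", "enough", "another", "thunder",
           "secret", "living", "golden"],
    "PRP$": ["your", "her"],
    "VB": ["die", "kill", "live"],
    "VBZ": ["dies"],
    "WDT": ["who"],
    "VPB": ["let", "say"],
    "VBD": ["loved"],
    "RB": ["twice", "not", "forever", "again", "never", "only"],
    "PRP": ["you", "me"],
    "VBP": ["are", "is", "fall", "live"],
    "DT": ["a", "no", "the"],
    "NN": ["quantum", "finger", "raker", "gun", "russia", "spectre",
           "tomorrow", "majesty", "casino", "man", "moon", "eye", "never",
           "love", "time", "licence", "daylights", "solace", "pussy", "no",
           "ball", "service", "day", "kill", "view", "spy", "world", "sky"],
    "IN": ["of", "for", "on", "and", "from", "with", "to"],
    "NNS": ["diamonds", "eyes"],
}

# Hypothetical template set: only the titles tagged in this article.
TEMPLATES = [
    [("Diamonds", "NNS"), ("Are", "VBP"), ("Forever", "RB")],
    [("For", "IN"), ("Your", "PRP$"), ("Eyes", "NNS"), ("Only", "RB")],
    [("Never", "RB"), ("Say", "VPB"), ("Never", "NN"), ("Again", "RB")],
]


def generate(rng):
    template = rng.choice(TEMPLATES)    # pick a template title at random
    i = rng.randrange(len(template))    # pick one of its words...
    word, tag = template[i]             # ...and note its part of speech
    # Candidates share the tag but differ from the word being replaced.
    candidates = [w for w in LEXICON.get(tag, []) if w != word.lower()]
    words = [w for w, _ in template]
    if candidates:
        words[i] = rng.choice(candidates).capitalize()
    return " ".join(words)


rng = random.Random(7)  # seeded only so that runs are repeatable
for _ in range(5):
    print(generate(rng))
```

Excluding the original word guarantees that every generated title differs from its template; that is a choice made for this sketch, not something the article specifies. Capitalising with `str.capitalize` is also naive, so small words such as “of” come out as “Of”.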
Some other examples:
- Die Another Sky
- Casino Thunder
- A Time to Die
- The Secret Daylights
This method of generation is a good start, but the following improvements could be considered:
- Not all of the generated titles seem to work, so perhaps a deeper analysis of sentence structure would help?
- For each substituted word, we could look up synonyms as a replacement. For example, introduce the word “Plenty” in place of “Enough”.
- Finally, we could substitute more than one word. Or even every word!
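The last idea, substituting several words, is a small generalisation of single-word substitution: pick several distinct positions instead of one. A hypothetical Python sketch, using a cut-down lexicon taken from the list above:

```python
import random

# Cut-down lexicon (a subset of the list above), for illustration only.
LEXICON = {"RB": ["never", "forever", "only", "again", "twice"],
           "VPB": ["let", "say"],
           "NN": ["love", "time", "day", "sky", "never"]}
TEMPLATE = [("Never", "RB"), ("Say", "VPB"), ("Never", "NN"), ("Again", "RB")]


def generate_multi(template, lexicon, swaps, rng):
    """Substitute `swaps` distinct positions, each with a same-tag word."""
    words = [w for w, _ in template]
    # Sample distinct positions so no word is replaced twice.
    for i in rng.sample(range(len(template)), k=min(swaps, len(template))):
        word, tag = template[i]
        candidates = [w for w in lexicon.get(tag, []) if w != word.lower()]
        if candidates:
            words[i] = rng.choice(candidates).capitalize()
    return " ".join(words)


rng = random.Random(3)  # seeded only so that runs are repeatable
print(generate_multi(TEMPLATE, LEXICON, 2, rng))
print(generate_multi(TEMPLATE, LEXICON, 4, rng))  # every word replaced
```

With `swaps=0` the template comes back unchanged, and with `swaps` equal to the title length every word is replaced, at which point only the tag sequence of the original survives.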
The full source can be found here. As a bonus, the same generation technique has been applied to Star Wars film titles, Harry Potter film/book titles, Power Rangers Dino Charge series episode titles, and famous English proverbs and sayings.