An 11 Letter Word That Means the Art or Science of

Photograph by Claud Richmond on Unsplash

The game Wordle has won the heart of social media in the past few weeks. Wordle is basically a word game in which the player tries to guess a 5-letter word in six guesses (tries), progressively receiving more information about the target word. The game was created by Josh Wardle, an artist and engineer. Wordle starts when the player submits their first 5-letter word. Every time a word is submitted, feedback is provided on each letter of the submitted word, indicating whether the letter exists in the target word and whether its spot matches that in the target word. Below is a screenshot of the instructions.

Rules of Wordle (screenshot by author)

A Good Strategy

Is there a good strategy to play the game? Obviously, prior to entering the first word, the player has no information about the target word, and it could be one of approximately 15,000 5-letter English words. However, once the first word is submitted, the player will gain more information on the letters involved in the target word, depending on the entered word. Is there a good strategy once the player starts receiving feedback? Perhaps there is one. After feedback on the first word is provided, success would depend on many factors, including the player's vocabulary and how they can narrow down their next guess based on the feedback. However, the choice of the first word is independent of the player's vocabulary or language skills. That is why we can perhaps talk about a strategy that would provide the best feedback (one with as much information as possible) after the first word is submitted. Basically, a good strategy for the first entered word would be one that tries to eliminate as many remaining letters as possible. Better yet, a good strategy for the first entered word would be one that can determine as many letters of the target word as possible, with as many correct placements of those letters as possible. In this analysis, I am trying to find a strategy, or rather a word, that can serve this purpose.

A Closer Look at the Words

Based on this article on Wikipedia, the Webster's Third New International Dictionary of the English Language contains 470,000 entries. However, a portion of these words are obsolete or may not fall into the category of valid single words that contain only letters (no numbers or symbols). I found a dataset of such words at this repository on GitHub. The file contains 370,103 English words that are single and contain only letters. After extracting only the 5-letter words from this list, I was left with a list of 15,918 words. I will explore this list to hopefully gain more insight into a good strategy for the first word entered into Wordle. Perhaps unrelated to this little project, but I was curious to find the distribution of word frequency by number of letters, and the following was the result. Obviously, the frequency is unimodal with a peak at words with nine letters. The 5-letter words constitute only approximately 4.3% of all words in this list.

Frequency of words (image by author)
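The extraction step described above can be sketched in a few lines of Python. This is only a minimal sketch, not the code used for the article: the function names are made up for illustration, and the file name in the commented usage is an assumption about how the word list might be stored.

```python
from collections import Counter


def alphabetic_words(lines):
    """Lower-case each entry and keep only those made up of letters a-z."""
    words = (line.strip().lower() for line in lines)
    return [w for w in words if w.isascii() and w.isalpha()]


def length_distribution(words):
    """Number of words for each word length (the data behind the histogram)."""
    return Counter(len(w) for w in words)


def five_letter_words(words):
    """Keep only the 5-letter words."""
    return [w for w in words if len(w) == 5]


# Hypothetical usage with a local copy of the word list
# ("words_alpha.txt" is an assumed file name):
# with open("words_alpha.txt") as fh:
#     words = alphabetic_words(fh)

# Tiny stand-in list so the sketch runs on its own.
sample = ["Aries", "serai", "crwth", "caravanserai", "co-op", "inn", ""]
words = alphabetic_words(sample)
five = five_letter_words(words)
share = len(five) / len(words)  # fraction of 5-letter words in the list
print(five, length_distribution(words), share)
```

On the full list, the same `share` calculation would give 15,918 / 370,103, which is the roughly 4.3% quoted above.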

Next, I will review two different strategies, the Vowels Strategy and the Frequency Strategy. I will show that the Frequency Strategy is the better strategy, and we will pick the best word based on the Frequency Strategy.

The Vowels Strategy

Vowels play an important role when trying to come up with a strategy to eliminate large numbers of words each round. This is because at least one vowel exists in each syllable of a word. There are 5 vowels: A, E, I, O and U. Even though the letter Y can act as a vowel in some words, I did not consider it a vowel here. Starting the search with vowels may be a good idea because every single word in English must have at least one vowel (well, this is not 100% true: as we will see a bit later, there are eight 5-letter words without any vowels, although this does not bring the merit of this strategy into question).

I started my search through my list of 5-letter words by finding the number of words with 1, 2, 3, 4 and 5 unique vowels. For instance, the word asana has only one unique vowel and the word alibi has two. It turns out there are 6223, 8568, 1055, 18 and 0 words with 1, 2, 3, 4 and 5 unique vowels, respectively. For example, the words adieu, auloi (plural of Aulos, an ancient Greek wind instrument), Aequi (an ancient Italian tribe) and uraei (plural of Uraeus, the upright form of an Egyptian cobra) all have 4 unique vowels. Needless to say, there were no 5-letter words that consisted of only vowels.

There were also 46 5-letter words where the letter Y acted as a vowel, e.g., the words ghyll (a ravine or narrow valley in the North of England) and Scyld (a legendary Danish king). There were also 8 words without any vowels, such as crwth, which is a type of stringed instrument.
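Counting unique vowels per word, as described above, could look like the following sketch. The helper names are hypothetical. Because Y is not treated as a vowel, words such as ghyll and crwth both land in the zero-vowel bucket, so separating the 46 Y-words from the 8 vowelless words would need an extra check for the letter Y.

```python
from collections import Counter

VOWELS = set("aeiou")  # Y is deliberately left out, as in the text


def unique_vowels(word):
    """Number of distinct vowels (A, E, I, O, U) in a word."""
    return len(set(word.lower()) & VOWELS)


def vowel_histogram(words):
    """How many words have 0, 1, 2, ... unique vowels."""
    return Counter(unique_vowels(w) for w in words)


examples = ["asana", "alibi", "adieu", "crwth", "ghyll"]
counts = {w: unique_vowels(w) for w in examples}
print(counts)
print(vowel_histogram(examples))

# Zero-vowel words that still contain a Y (ghyll here, but not crwth).
y_as_vowel = [w for w in examples if unique_vowels(w) == 0 and "y" in w]
print(y_as_vowel)
```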

Considering how important vowels are in the English language, a strategy based on vowels would be to use first words that contain as many unique vowels as possible. This will help us determine the existence or absence of as many vowels as possible in the target word. As mentioned above, there are no 5-letter words that consist of only vowels. However, there are 18 words that contain 4 unique vowels. These words are: adieu, aequi, aoife, audio, aueto, auloi, aurei, avoue, heiau, kioea, louie, miaou, ouabe, ouija, oukia, ourie, ousia and uraei.

One may argue that any of these 18 words would make a good first try at Wordle. Yet, let's see if any of the 5 vowels is more or less frequent than the others in 5-letter words. The following shows the frequency of appearance of each of the 5 vowels in 5-letter words (each word is counted once per vowel regardless of repeats, i.e., for the letter A, the word asana counts as 1).

Frequency of vowels (image by author)

The graph above shows that the vowel U is the least frequent of the 5 vowels. After filtering the words that contain U out of the list of 5-letter words with 4 unique vowels, we are left with a list of just two words, Aoife (an Irish feminine given name) and Kioea (a Hawaiian bird that became extinct in the 19th century). A quick search through the list shows that the consonant K appeared in 1663 5-letter words, whereas the consonant F appeared in 1115. Therefore, this strategy would suggest the word Kioea. It is important to mention that this strategy completely ignores the placement of vowels in the word and only determines their existence or absence in the target word. We will see in the next section how the Frequency Strategy outperforms the Vowels Strategy.
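The selection just described can be reproduced directly from the numbers in the text. The sketch below uses the 18 four-vowel words listed earlier and the two consonant counts quoted above. The variable and helper names are illustrative only.

```python
# The 18 five-letter words with four unique vowels, as listed in the text.
FOUR_VOWEL_WORDS = [
    "adieu", "aequi", "aoife", "audio", "aueto", "auloi",
    "aurei", "avoue", "heiau", "kioea", "louie", "miaou",
    "ouabe", "ouija", "oukia", "ourie", "ousia", "uraei",
]

# Drop every word that contains the least frequent vowel, U.
without_u = [w for w in FOUR_VOWEL_WORDS if "u" not in w]

# Number of 5-letter words containing each consonant, as quoted in the text.
consonant_word_counts = {"k": 1663, "f": 1115}


def consonant(word):
    """Return the first non-vowel letter of the word."""
    return next(ch for ch in word if ch not in "aeiou")


# Prefer the word whose consonant appears in more 5-letter words.
best = max(without_u, key=lambda w: consonant_word_counts[consonant(w)])
print(without_u, best)
```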

The Frequency Strategy

The previous strategy only focused on the vowels. This strategy, however, will focus on all of the letters. We will evaluate the most frequently used letters in the alphabet and will also determine the most frequent placement of the most frequently used letters in 5-letter words. Based on those, we will determine the best words to be entered first into the game.

I found the frequency of occurrence of each letter of the alphabet in the 5-letter words in the dataset and sorted the letters from most to least frequent. The following graph shows the frequencies.

Frequency of letters (image by author)

In the above graph, each occurrence of a letter in a word was counted as 1. So I decided to look at the average frequency of letters per word to see if it was any different from the above. Looking at the average frequency of letters in 5-letter words, I did not see any difference in the order of the letters, sorted from most commonly appearing to least commonly appearing (see below).

Average frequency of letters (image by author)

This means the most commonly used letters in 5-letter words (in terms of total frequency as well as average frequency) were the letters A, E, S, O, R, I, L, T, etc. I decided to focus on the top six letters since the average frequency dropped significantly after the sixth letter. There are 96 words that are made up of only these letters (repetition allowed). However, if we agree that the purpose of the first word is to eliminate as many remaining letters (or determine as many letters in the target word) as possible, perhaps we should restrict repetition of letters. If we don't allow for repetition, the list reduces to just 12 words. These words are: aesir, aries, arise, arose, ireos, oreas, orias, osier, raise, seora, serai and serio. Which one of these 12 words would be the best first word in Wordle?
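As a rough sketch of this step, the snippet below counts letter occurrences and keeps the words built only from a given set of letters, with or without repetition. The function names and the small candidate list are made up for illustration. The six letters are the ones named in the text.

```python
from collections import Counter


def letter_totals(words):
    """Count every occurrence of every letter across all words."""
    return Counter("".join(words))


def built_from(words, letters, allow_repeats=False):
    """Keep words that use only `letters`, optionally rejecting repeated letters."""
    allowed = set(letters)
    kept = []
    for w in words:
        uses_only_allowed = set(w) <= allowed
        no_repeats = len(set(w)) == len(w)
        if uses_only_allowed and (allow_repeats or no_repeats):
            kept.append(w)
    return kept


TOP_SIX = "aesori"  # A, E, S, O, R, I, as reported in the text

# Small illustrative list, not the full dataset.
candidates = ["arise", "raise", "eases", "roses", "kioea", "serai"]
with_repeats = built_from(candidates, TOP_SIX, allow_repeats=True)
no_repeats = built_from(candidates, TOP_SIX)
print(letter_totals(candidates).most_common(3))
print(with_repeats)
print(no_repeats)
```

Applied to the full list of 5-letter words, the two calls would correspond to the 96-word and 12-word lists mentioned above.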

To answer this question, I decided to look at the frequency of appearance of each of the top six letters in each spot of the 5-letter words (first letter, second letter, etc.). The outcome is shown below.

Frequency of letters in each spot (image by author)

I also calculated the average frequency of the top 6 letters in 5-letter words to see if it showed any significant difference from the absolute frequencies, but it did not turn out to be different. The average frequencies are calculated by dividing the absolute frequencies by the number of 5-letter words, in which that particular letter appears in that particular spot. The average frequency plot is presented below.

Average frequency of letters in each spot (image by author)

This shows, for example, that the letter S frequently appears in 5-letter words as the fifth letter, but it almost never appears as the third letter. Based on this, I used a simple scoring system to assign a score to each word, which basically consists of the sum of the average frequencies of its letters based on the above results. This scoring system assumes that the 6 letters are all valued equally and only focuses on frequencies per spot. For example, the score for the word aesir is calculated as approximately 0.1619 + 0.2928 + 0.1162 + 0.2771 + 0.1840 = 1.032, since the average frequency of the letter A in the first spot is 0.1619, the average frequency of the letter E in the second spot is 0.2928, and so on. The table and figure below show the calculated score for all 12 words in the list.

Score of top words (image by author)
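The scoring idea can be sketched as below. Note the assumption: `positional_share` normalises each letter's per-spot count by that letter's total number of occurrences, which is only one possible reading of "average frequency" and may differ from the normalisation used for the figures above. The worked example uses only the five values quoted in the text for aesir.

```python
def positional_share(words, letters):
    """share[letter][spot]: fraction of a letter's occurrences found at each spot.

    Assumes every word has exactly 5 letters. The normalisation is an
    assumption made for this sketch, not necessarily the article's.
    """
    counts = {letter: [0] * 5 for letter in letters}
    for w in words:
        for spot, ch in enumerate(w):
            if ch in counts:
                counts[ch][spot] += 1
    share = {}
    for letter, per_spot in counts.items():
        total = sum(per_spot)
        share[letter] = [c / total if total else 0.0 for c in per_spot]
    return share


def score(word, share):
    """Sum of the per-spot values of the word's letters (all letters weighted equally)."""
    return sum(share[ch][spot] for spot, ch in enumerate(word))


# Worked example with the values quoted in the text (spots are 0-indexed here).
quoted = {
    "a": {0: 0.1619},
    "e": {1: 0.2928},
    "s": {2: 0.1162},
    "i": {3: 0.2771},
    "r": {4: 0.1840},
}
aesir_score = score("aesir", quoted)
print(round(aesir_score, 3))

# Tiny demonstration of positional_share on two words.
demo = positional_share(["arise", "raise"], "a")
print(demo["a"])
```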

Based on this analysis, the word Aries (Latin word for ram) has the highest calculated score. This suggests that, if used as the first word entered into Wordle, the word Aries can on average determine the largest number of letters in the target word.

Aries is the Latin word for ram. Photo by Livin4wheel on Unsplash

Testing

To test the effectiveness of Aries in identifying letters in the target word, I used a random selection of 5000 words from the list of 5-letter words and calculated how many letters, on average, would be indicated when the word Aries is used as the first word on Wordle. I replicated this process 10 times. The following shows that the average number of letters (per word) whose existence in the target word was identified after Aries was used as the first word was between 2.055 and 2.1. Please note that the following result does not separate the letters whose spot was correctly identified from those whose spot was not. It simply includes all the letters that were identified in the target word, in other words, all the letters that turn Gold or Green after the word is entered.

Result of simulation for average number of letters identified when Aries was used as the first word
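A simulation along these lines might be set up as in the sketch below. It is not the code behind the results above. The function names, the fixed seed, and sampling without replacement are all assumptions. `letters_identified` also assumes the guess has no repeated letters (true for Aries and Kioea), so that counting shared distinct letters matches the number of Gold and Green tiles.

```python
import random
from statistics import mean


def letters_identified(guess, target):
    """Distinct letters of the guess that occur anywhere in the target (Gold or Green)."""
    return len(set(guess) & set(target))


def simulate(guess, words, metric, sample_size=5000, replications=10, seed=0):
    """Average `metric(guess, target)` over random target samples, once per replication."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    k = min(sample_size, len(words))
    results = []
    for _ in range(replications):
        targets = rng.sample(words, k)  # sampling without replacement (assumed)
        results.append(mean(metric(guess, t) for t in targets))
    return results


# Tiny stand-in word list. The real run would use the full list of 5-letter words.
toy_words = ["serai", "crwth"]
toy_result = simulate("aries", toy_words, letters_identified,
                      sample_size=2, replications=3)
print(toy_result)
```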

I conducted the same analysis for the word Kioea (which was suggested by our Vowels Strategy), and the result was an average of only 1.79 letters identified. This is an indication that the Frequency Strategy was superior to the Vowels Strategy in indicating letters in the target word.

Next, I calculated the average number of letters (per word) whose actual spot in the target word was correctly identified by the word Aries. This means that not only is the letter identified, but its spot in the target word is also correctly identified. In other words, this is the average number of letters that turn Green after the word is entered. For the simulation I again used 10 replications and 5000 randomly selected words in each replication. The following shows the results for Aries.

Result of simulation for average number of actual spots of letters correctly identified when Aries was used as the first word
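Counting Green tiles only requires comparing the guess and the target spot by spot. The sketch below uses hypothetical function names and a three-word stand-in list. A full run would draw repeated random samples from the 5-letter word list instead.

```python
from statistics import mean


def spots_identified(guess, target):
    """Number of positions where guess and target have the same letter (Green tiles)."""
    return sum(g == t for g, t in zip(guess, target))


def average_spots(guess, targets):
    """Average number of Green tiles for one guess over a collection of targets."""
    return mean(spots_identified(guess, t) for t in targets)


# Small illustrative target list, not the real sample of 5000 words.
targets = ["arise", "serai", "aries"]
avg = average_spots("aries", targets)
print(spots_identified("aries", "arise"), spots_identified("aries", "serai"), avg)
```

The same function can be applied to each of the 12 candidate words in turn to compare them on the same targets.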

I ran the same analysis for all 12 words in the list of top words to see if any of them could beat Aries. As expected, the word Aries demonstrated the highest value for the average number of letters (per target word) whose spots were correctly identified. For this analysis, too, I used 10 replications and 5000 randomly selected words in each replication and reported the average across all 10 replications.

Result of simulation for average number of actual spots of letters correctly identified for all the words in the top words list

Average number of letter locations correctly identified (image by author)

Based on the results of this study, if used as the first word, the word Aries can correctly identify the existence of approximately 2.07 letters on average and the correct spot of approximately 0.6 letters on average.

Conclusion and Note

A caravanserai. Photograph by mostafa meraji on Unsplash

I realized afterwards that, unfortunately, Aries is not a word on Wordle's list of accepted words, and neither are the next best words on the list, Orias and Serio (based on the word scores identified above). The next best word on the list was serai, which is another word for caravanserai or inn and is indeed on Wordle's list of accepted words. The origin of the name is Farsi and Turkish, with slightly different pronunciations (saray or sarāī, also see caravanserai). In our testing model, serai and Aries have the same average number of letters in the target word correctly identified (approximately 2.07 letters on average). However, the word serai has a slightly lower average number of letter spots correctly identified (approximately 0.47 compared to 0.58 for Aries). Below, you see serai used as the first word on the Wordle of January 16, identifying the existence of 3 letters, with the spot of two of them correctly identified.

serai used as first word on Wordle on Jan 16 (image by author)

In conclusion, I am not sure if the selection of words for Wordle is a completely random process. You may argue that some words may have had some reference to daily global events (see here for a list of past Wordle words in 2022). And after all, it may not be too much fun playing based on an analysis or strategy.

Happy Wordling everybody (although Wordling is probably not on Wordle's list of accepted words)!

Source: https://towardsdatascience.com/a-frequency-analysis-on-wordle-9c5778283363
