Interpretation of all presidents’ words since Truman

Refer to the State of the Union addresses made by US presidents since WWII. To simplify the task, only look at the first address of each president. (You can find the archive here: https://www.presidency.ucsb.edu/documents/presidential-documents-archive-guidebook/annual-messages-congress-the-state-the-union)

  1. Look at the frequency of words
  2. Look at topic modeling (what do they talk about)
# We start as always with importing the necessary libraries
import requests # lets us download texts from the web (e.g. request Moby Dick's online book without saving the file)
import re
from requests import get
from bs4 import BeautifulSoup #Submodule of bs4 (I don't need the entire package). 
import nltk #natural language processing (NLP)
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from collections import Counter 
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go
import plotly.offline as py
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation as LDA
import itertools
import matplotlib.pyplot as plt
from gensim import corpora, similarities
from gensim.models import TfidfModel
from scipy.cluster import hierarchy
from nltk.stem import PorterStemmer
import pickle
import warnings
import wordninja
from selenium import webdriver
from selenium.webdriver.common.by import By
from autocorrect import Speller
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import contractions
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.tokenize import word_tokenize
from scipy.cluster.vq import kmeans, vq
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.svm import LinearSVC

# Download the NLTK data files needed for the code to work
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
nltk.download('vader_lexicon')
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/loizoskon/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/loizoskon/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/loizoskon/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
True
  1. Refer to the State of the Union addresses made by US presidents since WWII. To simplify the task, only look at the first address of each president. (You can find the archive here: https://www.presidency.ucsb.edu/documents/presidential-documents-archive-guidebook/annual-messages-congress-the-state-the-union)

We ask BeautifulSoup to locate a table on the page, scan through the rows of this table, and then get the first link in each row.

url = "https://www.presidency.ucsb.edu/documents/presidential-documents-archive-guidebook/annual-messages-congress-the-state-the-union"

#https://www.presidency.ucsb.edu/documents/app-categories/spoken-addresses-and-remarks/presidential/state-the-union-addresses
#
# Find the first instance of a table on the page (this will simplify work for us)
response = get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')

# Create an empty list to keep all the links
links = []

# Look through each row of the table; if a row contains a link,
# grab the first one, and if it doesn't, just skip that row
trs = table.findAll('tr')
for tr in trs:
    try:
        link = tr.find('a')['href'] # Finds the first link in a row
        links.append(link) # Appends that link to the links list.
    except TypeError: # rows without links make tr.find('a') return None
        pass
# We can also find all instances of the links on a page and that's what the code below does
# You can use the code below but I want to solve a more general version of the homework, 
# so I will use this code here:
############
trs = table.findAll('tr')
for tr in trs:
    try:
        aas = tr.find_all('a') # Finds ALL links in a row
        for a in aas:
            link = a['href']
            links.append(link) # Appends each link to the links list.
    except KeyError: # skip anchors that have no href attribute
        pass
# #
# #links
# #############
# # Note that we had to use the try: and except:,
# # because otherwise the code would result in an error as some rows don't have any links    
# # Uncomment the code below to see this for yourself.

# # Select the links that are needed
links_needed = links[0:22]

# for tr in trs:
#     link = tr.find('a')['href'] # Finds the first link in a row
#     links.append(link) # Appends that link to the links list.
# Links now has all the links we needed. I can save them to an excel file 
# just in case I need them

pd.DataFrame(links).to_excel("links.xlsx")
# We can now go through individual links and extract the information we want.
# If we examine the pages of each speech, we will notice each speech is stored inside
# the following tag "<div class="field-docs-content"> </div>". 
# That's what we can grab.

# We can also grab the date stored inside the following tag:
#  <div class="field-docs-start-date-time"></div>

# Finally, to make our life even easier, let's grab the name of the President
# It's in the tag "<div class="field-title"></div>"

# I will also grab the title of the speech just in case I need to use it since the titles differed somewhat
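As a minimal sketch of the extraction described above: the div class names are the ones quoted from the page source, and the miniature HTML string below is a stand-in for a real speech page (fetching a live page works the same way once `response.text` is passed to BeautifulSoup).

```python
from bs4 import BeautifulSoup

# Miniature stand-in for a speech page; real pages use these same class names.
sample_html = """
<div class="field-title">Harry S. Truman</div>
<div class="field-docs-start-date-time">January 05, 1949</div>
<div class="field-ds-doc-title">Annual Message to the Congress on the State of the Union</div>
<div class="field-docs-content">Mr. President, Mr. Speaker, Members of the Congress...</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
name = soup.find("div", class_="field-title").get_text(strip=True)
date = soup.find("div", class_="field-docs-start-date-time").get_text(strip=True)
title = soup.find("div", class_="field-ds-doc-title").get_text(strip=True)
speech = soup.find("div", class_="field-docs-content").get_text(strip=True)

print(name)   # Harry S. Truman
print(date)   # January 05, 1949
```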


# We start with creating empty containers

names = []
dates = []
speeches = []
titles = [] 

# The loop below was used to scrape each speech. It is kept commented out because
# Nixon's 1973 speech is done in an annoying way (it is actually a footnote),
# so the code without the try/except would result in an error.
# for link in links:
#     try:
#         response = get(link)
#     except:
#         pass
#     soup = BeautifulSoup(response.text, 'html.parser')
#     name = soup.find("div", class_="field-title").get_text(strip=True)
#     date = soup.find("div", class_="field-docs-start-date-time").get_text(strip=True)
#     title = soup.find("div", class_="field-ds-doc-title").get_text(strip=True)
#     speech = soup.find("div", class_="field-docs-content").get_text(strip=True)
#     names.append(name)
#     dates.append(date)
#     titles.append(title)
#     speeches.append(speech)
#     print(names)
df = pd.DataFrame({'name': names, 'date': dates, 'title': titles, "speech": speeches})
display(df)
name date title speech
#df.to_excel("speeches.xlsx")
#print(date)
# #create a list of president names
PresidentName = ['Joseph R. Biden', 'Donald J. Trump', 'Barack Obama', 'George W. Bush',
                  'William J. Clinton', 'George Bush', 'Ronald Reagan', 'Jimmy Carter',
                  'Gerald R. Ford','Richard Nixon', 'Lyndon B. Johnson','Dwight D. Eisenhower',
                  'Harry S. Truman', 'Franklin D. Roosevelt']

# #manually extract links of the first speech of each president
# (the corrected index list below supersedes an earlier attempt)
link_index = [1,2,7,14,22,30,34,42,49,52,59,65,68,79]
newlink=[links[i] for i in link_index]

print(PresidentName)
print(newlink)
['Joseph R. Biden', 'Donald J. Trump', 'Barack Obama', 'George W. Bush', 'William J. Clinton', 'George Bush', 'Ronald Reagan', 'Jimmy Carter', 'Gerald R. Ford', 'Richard Nixon', 'Lyndon B. Johnson', 'Dwight D. Eisenhower', 'Harry S. Truman', 'Franklin D. Roosevelt']
['https://www.presidency.ucsb.edu/ws/index.php?pid=123408', 'https://www.presidency.ucsb.edu/ws/index.php?pid=102826', 'https://www.presidency.ucsb.edu/ws/index.php?pid=47232', 'https://www.presidency.ucsb.edu/ws/index.php?pid=2921', 'https://www.presidency.ucsb.edu/ws/index.php?pid=16603', 'https://www.presidency.ucsb.edu/ws/index.php?pid=29558', 'https://www.presidency.ucsb.edu/ws/index.php?pid=29542', 'https://www.presidency.ucsb.edu/documents/first-annual-message-11', 'https://www.presidency.ucsb.edu/ws/index.php?pid=29486', 'https://www.presidency.ucsb.edu/ws/index.php?pid=29475', 'https://www.presidency.ucsb.edu/ws/index.php?pid=29447', 'https://www.presidency.ucsb.edu/ws/index.php?pid=3996', 'https://www.presidency.ucsb.edu/ws/index.php?pid=4121', 'https://www.presidency.ucsb.edu/documents/address-before-joint-session-the-congress-the-state-the-union-25']
# #create a dataframe
df = pd.DataFrame()
df['PresidentName'] = PresidentName
df['Links'] = newlink
display(df.Links.values)
print(df)
array(['https://www.presidency.ucsb.edu/ws/index.php?pid=123408',
       'https://www.presidency.ucsb.edu/ws/index.php?pid=102826',
       'https://www.presidency.ucsb.edu/ws/index.php?pid=47232',
       'https://www.presidency.ucsb.edu/ws/index.php?pid=2921',
       'https://www.presidency.ucsb.edu/ws/index.php?pid=16603',
       'https://www.presidency.ucsb.edu/ws/index.php?pid=29558',
       'https://www.presidency.ucsb.edu/ws/index.php?pid=29542',
       'https://www.presidency.ucsb.edu/documents/first-annual-message-11',
       'https://www.presidency.ucsb.edu/ws/index.php?pid=29486',
       'https://www.presidency.ucsb.edu/ws/index.php?pid=29475',
       'https://www.presidency.ucsb.edu/ws/index.php?pid=29447',
       'https://www.presidency.ucsb.edu/ws/index.php?pid=3996',
       'https://www.presidency.ucsb.edu/ws/index.php?pid=4121',
       'https://www.presidency.ucsb.edu/documents/address-before-joint-session-the-congress-the-state-the-union-25'],
      dtype=object)
            PresidentName                                              Links
0         Joseph R. Biden  https://www.presidency.ucsb.edu/ws/index.php?p...
1         Donald J. Trump  https://www.presidency.ucsb.edu/ws/index.php?p...
2            Barack Obama  https://www.presidency.ucsb.edu/ws/index.php?p...
3          George W. Bush  https://www.presidency.ucsb.edu/ws/index.php?p...
4      William J. Clinton  https://www.presidency.ucsb.edu/ws/index.php?p...
5             George Bush  https://www.presidency.ucsb.edu/ws/index.php?p...
6           Ronald Reagan  https://www.presidency.ucsb.edu/ws/index.php?p...
7            Jimmy Carter  https://www.presidency.ucsb.edu/documents/firs...
8          Gerald R. Ford  https://www.presidency.ucsb.edu/ws/index.php?p...
9           Richard Nixon  https://www.presidency.ucsb.edu/ws/index.php?p...
10      Lyndon B. Johnson  https://www.presidency.ucsb.edu/ws/index.php?p...
11   Dwight D. Eisenhower  https://www.presidency.ucsb.edu/ws/index.php?p...
12        Harry S. Truman  https://www.presidency.ucsb.edu/ws/index.php?p...
13  Franklin D. Roosevelt  https://www.presidency.ucsb.edu/documents/addr...
outcome = []
for lnk in df.Links.values:
     r = requests.get(lnk)
     r.encoding = 'utf-8'
     html = r.text
     soup = BeautifulSoup(html,'html.parser')
     text = soup.get_text()
     outcome.append(text)
    
# print(pd.DataFrame(outcome))#all the speeches in tabular format
outcome1 = [x.replace("\n", "") for x in outcome] # strip the newline characters ("\n")
df2 = pd.DataFrame(outcome1)#I create a new dataframe

df2[[0]]
0
0 Address Before a Joint Session of the Congress...
1 Address Before a Joint Session of Congress on ...
2 Address Before a Joint Session of Congress on ...
3 Annual Message to the Congress on the State of...
4 Radio Address Summarizing the State of the Uni...
5 Fifth Annual Message | The American Presidency...
6 First Annual Message | The American Presidency...
7 First Annual Message | The American Presidency...
8 First Annual Message | The American Presidency...
9 Fifth Annual Message | The American Presidency...
10 Fifth Annual Message | The American Presidency...
11 State of the Union Message to the Congress: Ov...
12 State of the Union Message to the Congress on ...
13 Address Before a Joint Session of the Congress...
  1. What are the 10 most common “meaningful” words used by each president since Harry Truman? What does this say about the shift in priorities in American politics?
#Previous speech links were not scraped so we import the excel that includes them here for the analysis
speeches = pd.read_excel('state_of_the_union_speeches.xlsx')

#I use a function that contains the code I want to apply for every president.
def my_function(x):
    r = requests.get(x['html'])
    r.encoding = 'utf-8' 
    html = r.text 
    #print(html[0:2000]) 

    soup = BeautifulSoup(html, "html.parser")
    president_text = soup.get_text()
    #print(president_text[0:2000])

    #THE DOCUMENT IS TOKENIZED -- SPLITTING TEXT INTO INDIVIDUAL WORDS
    tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+')
    tokens = tokenizer.tokenize(president_text) 
    #print(tokens[0:10])

    #Defining stopwords and adding more to the list. The extended list is the same across all 14 presidents' speeches. 
    sw = nltk.corpus.stopwords.words('english') 
    newsw = ['annual', 'number', 'help', 'thank', 'get',  'going', 'think', 'look', 'said', 
             'create', 'citizens', 'citizen', 'across', 'since', 'go', 'believe', 'say', 
             'long', 'better', 'plan', 'national', 'ask', '10', 'much', 'good', 'great', 
             'best', 'cannot', 'still', 'know', 'years', '1', 'major', 'want', 'able', 'put', 
             'capacity', 'programs', 'per', 'percent', 'million', 'act', 'provide', 'afford', 
             'needed', 'may', 'possible', 'full', '2', 'effort', 'meeting', 'address', 'ever', 
             'measures', 'ago', 'delivered', '5', 'program', 'past', 'future', 'need', 'needs', 
             'house', 'also', 'tonight', 'propose', 'toward', 'continue', 'society','country', 
             'seek', 'period', 'year', 'man', 'men', 'one', 'areas', 'begin', 'live', 'make', 
             'let', 'upon', 'well', 'office', 'meet', 'make', 'citizens', 'human', 'self', 'among', 
             'peoples', 'affairs', 'would', 'field', 'first', 'interest', 'today', 'recommendations', 
             'recomenndation', 'within', 'shall', 'administration', 'nation', 'nations', 'us', 'we', 
             'policy', 'legislation', 'time', 'new', 'many', 'several', 'few', 'government', 'world', 
             'people', 'united', 'states', 'system', 'every', 'people', 'must', '626','give', 
             'categories', '226762', '17608', '24532', '430', '38','statistics', 'analyses', 
             'miscellaneous', 'congressional', 'skip', 'content', 'documents', 'attributes', 'media', 'message', 
             'congress', 'state', 'union', 'america', 'american', 'americans', 'presidency', 'president', 
             'project', 'search', 'toggle', 'navigation', 'search', 'guidebook', 'archive', 'category', 
             'main', 'take','like','yet','j','000', 'ask', '1974', 'federal', 'http', 'www', 'usb',
             'edu', 'the', 'the', 'before', 'joint', 'session', 'the','american', 'america',
            'year','congress','let','time', 'nation', 'new', 'people']
    sw.extend(newsw)
    #print(sw[0:30])

    president_words = [token.lower() for token in tokens] 
    words_ns = [word for word in president_words if word not in sw] 
    #print(words_ns[:20])
    president_ns = " ".join(words_ns)

    #Determining the most common words 
    count = Counter(words_ns)
    top_ten_president = count.most_common(10)
    top_10_string = ','.join([str(x) for x in top_ten_president])
    print_list = [top_10_string]
    return print_list

top_list = []
for index, row in speeches.iterrows():
    top = my_function(row)
    top_list.append(top)

speeches['top ten'] = top_list

display(speeches)
print(speeches)
president year party html top ten
0 Hoover 1929 republican https://www.presidency.ucsb.edu/documents/annu... [('public', 33),('law', 25),('service', 23),('...
1 FD_Roosvelt 1934 democrat https://www.presidency.ucsb.edu/documents/annu... [('industrial', 9),('work', 8),('recovery', 7)...
2 Truman 1949 democrat https://www.presidency.ucsb.edu/documents/annu... [('prosperity', 12),('production', 12),('power...
3 Eisenhower 1957 republican https://www.presidency.ucsb.edu/documents/annu... [('free', 16),('security', 16),('economy', 12)...
4 Kennedy 1961 democrat https://www.presidency.ucsb.edu/documents/annu... [('economic', 16),('development', 10),('peace'...
5 Lyndon 1965 democrat https://www.presidency.ucsb.edu/documents/annu... [('freedom', 12),('life', 9),('progress', 8),(...
6 Nixon 1974 republican https://www.presidency.ucsb.edu/documents/addr... [('peace', 27),('energy', 17),('war', 8),('pro...
7 Ford 1975 republican https://www.presidency.ucsb.edu/documents/addr... [('energy', 25),('oil', 20),('tax', 17),('econ...
8 Carter 1978 democrat https://www.presidency.ucsb.edu/documents/the-... [('inflation', 17),('economic', 14),('tax', 13...
9 Reagan 1985 republican https://www.presidency.ucsb.edu/documents/addr... [('freedom', 20),('tax', 16),('growth', 14),('...
10 Bush 1989 republican https://www.presidency.ucsb.edu/documents/addr... [('budget', 17),('work', 12),('hope', 10),('dr...
11 Clinton 1997 democrat https://www.presidency.ucsb.edu/documents/addr... [('children', 24),('work', 21),('budget', 17),...
12 Bush 2005 republican https://www.presidency.ucsb.edu/documents/addr... [('security', 29),('freedom', 20),('social', 1...
13 Obama 2013 democrat https://www.presidency.ucsb.edu/documents/addr... [('jobs', 32),('work', 20),('energy', 18),('fa...
14 Trump 2018 republican https://www.presidency.ucsb.edu/documents/addr... [('tax', 15),('last', 13),('together', 13),('w...
15 Joe 2022 democrat https://www.presidency.ucsb.edu/documents/addr... [('folks', 19),('see', 15),('families', 15),('...

American priorities shifted over time. From Hoover (1929) through Nixon (1974), words related to “freedom” and “peace” were emphasized. This makes sense, since WWII and the Vietnam War were fought during that period.

“Economy” seems to be a popular topic in almost every presidency.

During Ford’s and Reagan’s presidencies, “tax” and “growth” became hot buzzwords. The Reagan administration in particular introduced big tax reforms that significantly reduced taxes for businesses.

“Inflation” and “energy” were also prominent under Nixon, Ford, and Reagan. This is important because, especially under Nixon, the US economy entered a stagflationary state after 14 years of economic development; the oil and gas crisis of the 1970s was also part of that.

During George W. Bush’s presidency (2005), “security” was the most frequently used word, because after 9/11 security became the main focus of his presidency.

Healthcare gained importance during the Clinton administration, and two administrations later, the Obama administration expanded Medicaid. With the COVID-19 pandemic, healthcare again dominated the policy priorities in Biden’s 2022 address.

All the presidents, irrespective of political affiliation (Democrat or Republican), mentioned strengthening or growing the economy. Only presidents affiliated with the Democratic Party seemed to emphasize “healthcare”, whereas a common theme among the Republican presidents’ addresses was “war / military spending / terrorism”.
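The per-party comparison above can be sketched by summing each speech's word counts into one `Counter` per party. The per-speech counts below are made-up placeholders for illustration; the real ones come from the `top ten` column built above.

```python
from collections import Counter

# Hypothetical per-speech word counts (stand-ins for the real Counter objects
# produced inside my_function above), tagged with the speaker's party.
per_speech_counts = [
    ("democrat",   Counter({"healthcare": 5, "economy": 4, "jobs": 3})),
    ("democrat",   Counter({"families": 4, "economy": 2, "healthcare": 2})),
    ("republican", Counter({"tax": 6, "security": 5, "economy": 3})),
    ("republican", Counter({"freedom": 4, "economy": 3, "war": 2})),
]

# Aggregate: Counter addition sums counts word by word within each party
party_counts = {"democrat": Counter(), "republican": Counter()}
for party, counts in per_speech_counts:
    party_counts[party] += counts

print(party_counts["democrat"].most_common(3))
print(party_counts["republican"].most_common(3))
```

With the real data, the same two lines of aggregation would replace the separate `democrat_speeches` / `republican_speeches` spreadsheets.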

Top ten words of all Democratic Presidents

#Previous speech links were not scraped so we import the excel that includes them here for the analysis
democrat_speeches = pd.read_excel('democrat_speeches.xlsx')

# We reuse my_function defined above (same tokenization and stopword list),
# this time applied to each Democratic president's speech.
top_list = []
for index, row in democrat_speeches.iterrows():
    top = my_function(row)
    top_list.append(top)

democrat_speeches['top ten'] = top_list

display(democrat_speeches)
president year party html top ten
0 FD_Roosvelt 1934 democrat https://www.presidency.ucsb.edu/documents/annu... [('industrial', 9),('work', 8),('recovery', 7)...
1 Truman 1949 democrat https://www.presidency.ucsb.edu/documents/annu... [('prosperity', 12),('production', 12),('power...
2 Kennedy 1961 democrat https://www.presidency.ucsb.edu/documents/annu... [('economic', 16),('development', 10),('peace'...
3 Lyndon 1965 democrat https://www.presidency.ucsb.edu/documents/annu... [('freedom', 12),('life', 9),('progress', 8),(...
4 Carter 1978 democrat https://www.presidency.ucsb.edu/documents/the-... [('inflation', 17),('economic', 14),('tax', 13...
5 Clinton 1997 democrat https://www.presidency.ucsb.edu/documents/addr... [('children', 24),('work', 21),('budget', 17),...
6 Obama 2013 democrat https://www.presidency.ucsb.edu/documents/addr... [('jobs', 32),('work', 20),('energy', 18),('fa...
7 Joe 2022 democrat https://www.presidency.ucsb.edu/documents/addr... [('folks', 19),('see', 15),('families', 15),('...

Top ten words of all Republican Presidents

#Previous speech links were not scraped so we import the excel that includes them here for the analysis
republican_speeches = pd.read_excel('republican_speeches.xlsx')

# Again, my_function defined above is reused, this time applied to each
# Republican president's speech.
top_list = []
for index, row in republican_speeches.iterrows():
    top = my_function(row)
    top_list.append(top)

republican_speeches['top ten'] = top_list

display(republican_speeches)
president year party html top ten
0 Hoover 1929 republican https://www.presidency.ucsb.edu/documents/annu... [('public', 33),('law', 25),('service', 23),('...
1 Eisenhower 1957 republican https://www.presidency.ucsb.edu/documents/annu... [('free', 16),('security', 16),('economy', 12)...
2 Nixon 1974 republican https://www.presidency.ucsb.edu/documents/addr... [('peace', 27),('energy', 17),('war', 8),('pro...
3 Ford 1975 republican https://www.presidency.ucsb.edu/documents/addr... [('energy', 25),('oil', 20),('tax', 17),('econ...
4 Reagan 1985 republican https://www.presidency.ucsb.edu/documents/addr... [('freedom', 20),('tax', 16),('growth', 14),('...
5 Bush 1989 republican https://www.presidency.ucsb.edu/documents/addr... [('budget', 17),('work', 12),('hope', 10),('dr...
6 Bush 2005 republican https://www.presidency.ucsb.edu/documents/addr... [('security', 29),('freedom', 20),('social', 1...
7 Trump 2018 republican https://www.presidency.ucsb.edu/documents/addr... [('tax', 15),('last', 13),('together', 13),('w...
  1. Can you conduct topic analysis using LDA on the speeches to determine what things presidents talk about in State of the Union speeches?

Latent Dirichlet Allocation (LDA) is a generative probabilistic topic model: it represents each document as a mixture of topics, and each topic as a distribution over words. (Not to be confused with Linear Discriminant Analysis, which shares the acronym but is a supervised dimensionality-reduction technique.)

type(df2.iloc[:,0])
pandas.core.series.Series
dataset = pd.read_excel('speeches53463.xlsx')

dataset['date'] = pd.to_datetime(dataset['date'])

#print(dataset.head())
grouped = dataset.groupby('name')

dataset2 = dataset.loc[dataset.groupby('name').date.idxmin()]
#.filter(lambda x: x['date'] == min(x['date']))
import re
import numpy as np
    
# Print the titles of the first rows 
print(df2[[0]].head())

# Remove punctuation
dataset['title_processed'] = dataset['speech'].map(lambda x: re.sub(r'[,\.!?]', '', x))

# Convert the titles to lowercase
dataset['title_processed'] = dataset['title_processed'].map(lambda x: x.lower())

# Print the processed titles of the first rows 
dataset['title_processed'].head()
                                                   0
0  Address Before a Joint Session of the Congress...
1  Address Before a Joint Session of Congress on ...
2  Address Before a Joint Session of Congress on ...
3  Annual Message to the Congress on the State of...
4  Radio Address Summarizing the State of the Uni...
0    the presidentthank you thank you thank you goo...
1    the presidentthank you all very very much than...
2    thank you very much mr speaker mr vice preside...
3    the presidentmr speaker mr vice president memb...
4    the presidentmadam speaker mr vice president m...
Name: title_processed, dtype: object
# Load the library with the CountVectorizer method
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np

# Helper function
def plot_10_most_common_words(count_data, count_vectorizer):
    import matplotlib.pyplot as plt
    words = count_vectorizer.get_feature_names_out()  # get_feature_names() was removed in newer sklearn
    total_counts = np.zeros(len(words))
    for t in count_data:
        total_counts+=t.toarray()[0]
    
    count_dict = (zip(words, total_counts))
    count_dict = sorted(count_dict, key=lambda x:x[1], reverse=True)[0:10]
    words = [w[0] for w in count_dict]
    counts = [w[1] for w in count_dict]
    x_pos = np.arange(len(words)) 

    plt.bar(x_pos, counts,align='center')
    plt.xticks(x_pos, words, rotation=90) 
    plt.xlabel('words')
    plt.ylabel('counts')
    plt.title('10 most common words')
    plt.show()

# Initialise the count vectorizer with the English stop words
count_vectorizer = CountVectorizer(stop_words='english')

# Fit and transform the processed titles
count_data = count_vectorizer.fit_transform(dataset['title_processed'])

# Visualise the 10 most common words
plot_10_most_common_words(count_data, count_vectorizer)

Interpretation

While the raw most common words do not reveal anything in particular, once those generic words are cleaned out, as we did above, each president's distinctive topics and buzzwords emerge.

import warnings
warnings.simplefilter("ignore", DeprecationWarning)

# Load the LDA model from sk-learn
from sklearn.decomposition import LatentDirichletAllocation as LDA
 
# Helper function
def print_topics(model, count_vectorizer, n_top_words):
    words = count_vectorizer.get_feature_names_out()  # get_feature_names() was removed in newer sklearn
    for topic_idx, topic in enumerate(model.components_):
        print("\nTopic #%d:" % topic_idx)
        print(" ".join([words[i]
                        for i in topic.argsort()[:-n_top_words - 1:-1]]))
        
# Tweak the two parameters below (use int values below 15)
number_topics = 5
number_words = 25

# Create and fit the LDA model
lda = LDA(n_components=number_topics)
lda.fit(count_data)

# Print the topics found by the LDA model
print("Topics found via LDA:")
print_topics(lda, count_vectorizer, number_words)
Topics found via LDA:

Topic #0:
world war nations peace people united nation free economic great shall congress year military security men national forces new freedom defense time power american government

Topic #1:
states government united congress country public great citizens people year time state war treaty foreign shall present subject american general power peace mexico law relations

Topic #2:
government congress law public federal business national great people country present labor power legislation action conditions shall men year world necessary work nation make time

Topic #3:
new america people year years congress american world government make americans help work time federal nation tax economy jobs let programs budget need health program

Topic #4:
states united government great congress treaty act year commerce spain public session present state nations france citizens war duties vessels relations minister effect subject commercial

Interpretation

Topic 0: centers on war, peace, and national security (world, nations, military, forces, defense, freedom).

Topic 1: reflects 19th-century governance and foreign relations (treaty, foreign, mexico, relations, law).

Topic 2: relates to domestic legislation, business, and labor (law, federal, business, labor, legislation).

Topic 3: captures the modern domestic agenda of jobs, taxes, the budget, and health (america, jobs, tax, economy, budget, health).

Topic 4: concerns early foreign commerce and treaty relations (treaty, commerce, spain, france, vessels, duties).

  1. Can you determine the sentiment of each state of the union using nltk’s Vader module?
analyzer=SentimentIntensityAnalyzer()
def polarity_score(text):
    if len(text)>0:
        score=analyzer.polarity_scores(text)['compound']
        return score
    else:
        return 0
dataset['polarityscore'] = dataset['speech'].apply(lambda text : polarity_score(text))
dataset['polarityscore']
0      0.9999
1      0.9999
2      1.0000
3      0.9999
4      0.9999
        ...  
254    0.9998
255    0.9994
256    0.9995
257    0.9991
258   -0.9997
Name: polarityscore, Length: 259, dtype: float64
def sentiment_barplot(df):
    polarity_scale = [0.9991,0.9992,0.9993,0.9994,0.9995,0.9996,0.9997,0.9998,0.9999,1]
    # 'polarityscore' holds the compound sentiment score computed for each whole speech
    df3 = df[df['polarityscore'] > 0]
    out = pd.cut(df3['polarityscore'], polarity_scale)
    ax = out.value_counts(sort=False).plot.bar(rot=0, color="b", figsize=(12,8))
    for p in ax.patches:
        ax.annotate(str(p.get_height()), (p.get_x() * 1.040, p.get_height() * 1.015))
    plt.show()
sentiment_barplot(dataset)

Interpretation

In State of the Union speeches, presidents talk about important issues facing Americans and offer their ideas for solving the nation’s problems, including suggestions for new laws and policies. As displayed in the plot, the polarity scores for 13 speeches (barring W. Bush) are positive. This is understandable, as State of the Union speeches are a PR vehicle, leveraged to display the president’s power and positive influence.
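VADER's compound score is a normalized sum over all sentiment-bearing tokens, so for texts as long as these speeches it saturates near ±1; that is why the bar plot needs bin edges packed into [0.9991, 1]. A minimal sketch of how `pd.cut` performs that binning (the scores and bin edges here are made up for illustration):

```python
import pandas as pd

# Hypothetical near-saturated compound scores
scores = pd.Series([0.99915, 0.99955, 0.99985, 0.99995, -0.9997])

# Coarser bin edges than the plot above, for readability
bins = [0.9991, 0.9994, 0.9997, 1.0]

# Keep positive scores only, then count how many fall in each bin
counts = pd.cut(scores[scores > 0], bins).value_counts(sort=False)
# counts per bin: 1, 1, 2
```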

  1. Do speeches of different presidents cluster in any way that can allow you to determine their political party? How different are Biden and Trump according to this clustering?
def remove_noise(text, stop_words = nltk.corpus.stopwords.words('english')):
    newsw = ['annual', 'number', 'help', 'thank', 'get', 'going', 'think', 'look', 'said', 'create',
             'citizens', 'citizen', 'across', 'since', 'go', 'believe', 'say', 'long', 'better', 
             'plan', 'national', 'ask', '10', 'much', 'good', 'great', 'best', 'cannot', 'still', 
             'know', 'years', '1', 'major', 'want', 'able', 'put', 'capacity', 'programs', 'per', 
             'percent', 'million', 'act', 'provide', 'afford', 'needed', 'may', 'possible', 'full',
             '2', 'effort', 'meeting', 'address', 'ever', 'measures', 'ago', 'delivered', '5', 
             'program', 'past', 'future', 'need', 'needs', 'house', 'also', 'tonight', 'propose', 
             'toward', 'continue', 'society','country', 'seek', 'period', 'year', 'man', 'men', 
             'one', 'areas', 'begin', 'live', 'make', 'let', 'upon', 'well', 'office', 'meet', 
             'make', 'citizens', 'human', 'self', 'among', 'peoples', 'affairs', 'would', 'field', 
             'first', 'interest', 'today', 'recommendations', 'recommendation', 'within', 'shall', 
             'administration', 'nation', 'nations', 'us', 'we', 'policy', 'legislation', 'time', 
             'new', 'many', 'several', 'few', 'government', 'world', 'people', 'united', 'states', 
             'system', 'every', 'people', 'must', '626','give', 'categories', '226762', '17608', 
             '24532', '430', '38','statistics', 'analyses', 'miscellaneous', 'congressional', 
             'skip', 'content', 'documents', 'attributes', 'media', 'message', 'congress', 'state',
             'union', 'america', 'american', 'americans', 'presidency', 'president', 'project', 
             'search', 'toggle', 'navigation', 'search', 'guidebook', 'archive', 'category', 'main', 
             'take','like','yet','j','000']
    stop_words = stop_words + newsw
    tokens = word_tokenize(text)
    cleaned_tokens = []
    for token in tokens:
        token = re.sub('[^A-Za-z0-9]+', '', token)
        if len(token) > 1 and token.lower() not in stop_words:
            # Get lowercase
            cleaned_tokens.append(token.lower())
    return cleaned_tokens
# Initialize TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(max_df = 0.8,
                                   max_features = 50,
                                   min_df = 0.1,
                                   tokenizer = remove_noise)

# Use the .fit_transform() method on the list plots
tfidf_matrix = tfidf_vectorizer.fit_transform(dataset['speech'].values)
num_clusters = 2

# Generate cluster centers through the kmeans function
cluster_centers, distortion = kmeans(np.asarray(tfidf_matrix.todense()), num_clusters)  # kmeans expects a plain ndarray
# display(cluster_centers)
# Generate terms from the tfidf_vectorizer object
terms = tfidf_vectorizer.get_feature_names_out()  # get_feature_names() was removed in newer sklearn

for i in range(num_clusters):
    print('Cluster: {}'.format(i+1))
    # Sort the terms and print top 3 terms
    center_terms = dict(zip(terms, list(cluster_centers[i])))
    sorted_terms = sorted(center_terms, key=center_terms.get, reverse=True)
    print(sorted_terms[:15])
Cluster: 1
['federal', 'economic', 'budget', 'work', 'tax', 'economy', 'security', 'jobs', 'nt', 'freedom', 'health', 'free', 'life', 'together', 'defense']
Cluster: 2
['treaty', 'subject', 'commerce', 'treasury', 'general', 'relations', 'powers', 'laws', 'duty', 'session', 'interests', 'rights', 'however', 'service', 'trade']

Interpretation

On clustering the most salient words, Cluster 1 (federal, budget, tax, jobs) is dominated by modern economic vocabulary, while Cluster 2 (treaty, commerce, treasury, session) reflects 19th-century diplomatic and administrative language. So while Cluster 1 may loosely align with speeches by Republican presidents and Cluster 2 with those of Democratic presidents, the split appears to track era at least as much as party.
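That alignment claim can be checked directly: `vq` (imported above alongside `kmeans` but never used) assigns each row to its nearest cluster center, and the resulting labels can be compared against party labels. A self-contained toy sketch of the kmeans + vq round trip, with synthetic points standing in for the tf-idf rows:

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Two well-separated synthetic groups of 3-dimensional points
rng = np.random.default_rng(0)
group_a = rng.normal(0, 0.1, (10, 3)) + np.array([1.0, 0.0, 0.0])
group_b = rng.normal(0, 0.1, (10, 3)) + np.array([0.0, 1.0, 0.0])
X = np.vstack([group_a, group_b])

# kmeans finds the centers; vq assigns every row to its nearest center
centers, distortion = kmeans(X, 2)
cluster_labels, _ = vq(X, centers)
```

Applied to the real data, `vq(np.asarray(tfidf_matrix.todense()), cluster_centers)` would give one label per speech, which could then be cross-tabulated against a party column (where one exists) to quantify the alignment.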

# Convert the dataframe columns into lists for further analysis
speech_list = []
for i in range(len(dataset2)):
    speech_list.append(dataset2.iloc[[i]]['speech'].item())
    
titles = []
for i in range(len(dataset2)):
    titles.append(dataset2.iloc[[i]]['name'].item())
    
texts = [txt.split() for txt in speech_list]

# Create an instance of a PorterStemmer object
porter = PorterStemmer()

# For each token of each text, generate its stem
texts_stem = [[porter.stem(token) for token in text] for text in texts]

# Create a dictionary from the stemmed tokens
dictionary = corpora.Dictionary(texts_stem)

# Create a bag-of-words model for each speech, using the previously generated dictionary
bows = [dictionary.doc2bow(text) for text in texts_stem]

# Generate the tf-idf model
model = TfidfModel(bows)

# Compute the similarity matrix (pairwise distance between all speeches)
sims = similarities.MatrixSimilarity(model[bows])

# Transform the resulting list into a DataFrame
sim_df = pd.DataFrame(list(sims))

# Add the name of the presidents as columns and index of the DataFrame
sim_df.columns = titles
sim_df.index = titles

# Print the resulting matrix
sim_df

Abraham Lincoln Andrew Jackson Andrew Johnson Barack Obama Benjamin Harrison Calvin Coolidge Chester A. Arthur Donald J. Trump Dwight D. Eisenhower Franklin D. Roosevelt ... Rutherford B. Hayes Theodore Roosevelt Thomas Jefferson Ulysses S. Grant Warren G. Harding William Howard Taft William J. Clinton William McKinley Woodrow Wilson Zachary Taylor
Abraham Lincoln 1.000000 0.121246 0.151205 0.047722 0.110114 0.118578 0.114825 0.047754 0.075706 0.047841 ... 0.092012 0.056717 0.110742 0.124756 0.082261 0.092794 0.046491 0.113427 0.077418 0.145153
Andrew Jackson 0.121246 1.000000 0.121070 0.053532 0.131924 0.086656 0.106385 0.054705 0.077823 0.047385 ... 0.112656 0.070145 0.117074 0.146622 0.092593 0.113945 0.050358 0.102445 0.086139 0.155541
Andrew Johnson 0.151205 0.121070 1.000000 0.047996 0.097058 0.094947 0.079968 0.049593 0.073742 0.053389 ... 0.082619 0.077785 0.093668 0.127001 0.090346 0.077516 0.049155 0.080710 0.082443 0.092401
Barack Obama 0.047722 0.053532 0.047996 1.000000 0.046357 0.069860 0.037358 0.213163 0.103428 0.096561 ... 0.035344 0.069914 0.054143 0.047495 0.073191 0.049368 0.317297 0.050867 0.064416 0.039508
Benjamin Harrison 0.110114 0.131924 0.097058 0.046357 1.000000 0.095429 0.198889 0.043727 0.082837 0.068579 ... 0.269502 0.057292 0.080109 0.146124 0.080189 0.125491 0.049069 0.142834 0.076850 0.134368
Calvin Coolidge 0.118578 0.086656 0.094947 0.069860 0.095429 1.000000 0.085849 0.072170 0.118543 0.078246 ... 0.081089 0.087503 0.077970 0.098388 0.128878 0.092294 0.076209 0.074285 0.086982 0.092193
Chester A. Arthur 0.114825 0.106385 0.079968 0.037358 0.198889 0.085849 1.000000 0.042525 0.058818 0.042672 ... 0.144648 0.045146 0.063310 0.179516 0.066345 0.137828 0.045936 0.123470 0.059776 0.132799
Donald J. Trump 0.047754 0.054705 0.049593 0.213163 0.043727 0.072170 0.042525 1.000000 0.077175 0.062748 ... 0.040511 0.077699 0.046509 0.056097 0.065570 0.040417 0.183823 0.042885 0.056082 0.040505
Dwight D. Eisenhower 0.075706 0.077823 0.073742 0.103428 0.082837 0.118543 0.058818 0.077175 1.000000 0.108428 ... 0.059357 0.074973 0.060950 0.085503 0.107023 0.072974 0.128444 0.065945 0.104088 0.069061
Franklin D. Roosevelt 0.047841 0.047385 0.053389 0.096561 0.068579 0.078246 0.042672 0.062748 0.108428 1.000000 ... 0.064508 0.072999 0.052945 0.059127 0.094722 0.054787 0.086663 0.063699 0.077462 0.050414
Franklin Pierce 0.138249 0.149229 0.113608 0.039742 0.135522 0.096017 0.129588 0.034980 0.083958 0.055784 ... 0.096431 0.059912 0.091416 0.149410 0.087211 0.114362 0.038283 0.098263 0.081036 0.181408
George Bush 0.039750 0.110718 0.045218 0.223350 0.039387 0.067593 0.030361 0.166730 0.101130 0.063581 ... 0.029770 0.064276 0.043393 0.048644 0.064347 0.045680 0.236761 0.042725 0.064541 0.038623
George W. Bush 0.045682 0.045953 0.042901 0.244457 0.043902 0.076660 0.044497 0.170425 0.109930 0.069602 ... 0.038201 0.047932 0.043533 0.048213 0.069961 0.033635 0.258270 0.042687 0.054859 0.033777
George Washington 0.075427 0.080150 0.054498 0.022271 0.046189 0.048199 0.043466 0.021537 0.033813 0.023598 ... 0.056167 0.028022 0.075892 0.047378 0.038203 0.039933 0.022466 0.030561 0.040550 0.058378
Gerald R. Ford 0.040651 0.041807 0.043266 0.133494 0.045676 0.079921 0.053701 0.082173 0.135036 0.062518 ... 0.035063 0.051462 0.039757 0.051069 0.071961 0.044571 0.183497 0.041057 0.048935 0.039580
Grover Cleveland 0.117943 0.128614 0.102102 0.036857 0.129553 0.088140 0.141185 0.043138 0.070560 0.049183 ... 0.098994 0.069918 0.072758 0.129015 0.083239 0.154388 0.035827 0.105645 0.084410 0.169106
Harry S. Truman 0.060982 0.071213 0.065671 0.087225 0.070839 0.099180 0.058541 0.080372 0.165834 0.092160 ... 0.055451 0.064122 0.054537 0.079809 0.106270 0.081560 0.105554 0.059129 0.081950 0.079267
Herbert Hoover 0.103231 0.079289 0.081658 0.070459 0.111400 0.127897 0.128176 0.049934 0.146132 0.097870 ... 0.078349 0.062296 0.072092 0.110752 0.135878 0.098500 0.090548 0.076114 0.096796 0.091948
James Buchanan 0.074219 0.117172 0.085650 0.053138 0.213879 0.067012 0.108787 0.033511 0.050480 0.062365 ... 0.181783 0.050247 0.071056 0.138804 0.063212 0.083642 0.036889 0.143667 0.058985 0.117347
James K. Polk 0.108668 0.127383 0.080034 0.034267 0.083551 0.063491 0.086467 0.037409 0.050591 0.038777 ... 0.081692 0.041648 0.059918 0.124555 0.054302 0.098106 0.031113 0.069200 0.073058 0.204857
James Madison 0.071391 0.094473 0.062079 0.024049 0.066594 0.058868 0.067699 0.027369 0.047359 0.028658 ... 0.071821 0.037817 0.098455 0.088595 0.049361 0.056839 0.027100 0.063285 0.035300 0.113097
James Monroe 0.121900 0.130480 0.097276 0.034281 0.096656 0.073366 0.089977 0.034260 0.062603 0.045482 ... 0.076209 0.049221 0.112049 0.125313 0.073308 0.077521 0.033153 0.085300 0.060619 0.129747
Jimmy Carter 0.038933 0.046552 0.046946 0.240644 0.048041 0.088707 0.047153 0.162571 0.159608 0.088317 ... 0.040429 0.069212 0.045362 0.048941 0.091093 0.049687 0.229270 0.049615 0.075476 0.042085
John Adams 0.085961 0.099759 0.060527 0.025965 0.059714 0.045513 0.077210 0.026649 0.044619 0.024816 ... 0.053008 0.042880 0.083986 0.104180 0.036770 0.063388 0.029237 0.067327 0.038350 0.110925
John F. Kennedy 0.054274 0.064872 0.067020 0.145161 0.064716 0.087427 0.052211 0.091397 0.166903 0.082637 ... 0.060584 0.066535 0.050081 0.067712 0.091555 0.065329 0.145564 0.086820 0.069619 0.051925
John Quincy Adams 0.134975 0.163353 0.091024 0.045042 0.101502 0.086493 0.104866 0.043737 0.062438 0.040328 ... 0.084253 0.063036 0.126550 0.123556 0.071098 0.098951 0.040523 0.082057 0.072336 0.149257
John Tyler 0.131702 0.152333 0.117307 0.045652 0.123190 0.103976 0.117884 0.047375 0.062916 0.067584 ... 0.141787 0.081356 0.119774 0.153659 0.101667 0.120820 0.052103 0.113313 0.072481 0.170983
Joseph R. Biden 0.030122 0.035307 0.032910 0.345184 0.028887 0.054890 0.031472 0.220390 0.068947 0.053099 ... 0.021369 0.050805 0.034194 0.039431 0.052046 0.028445 0.306361 0.027510 0.054481 0.033634
Lyndon B. Johnson 0.045479 0.043621 0.046318 0.128715 0.035904 0.065005 0.038097 0.106125 0.100885 0.056544 ... 0.033196 0.058649 0.040729 0.053333 0.065493 0.042870 0.168042 0.054452 0.047279 0.036023
Martin van Buren 0.123016 0.189609 0.095427 0.050744 0.148065 0.078894 0.110916 0.030465 0.075507 0.051770 ... 0.114139 0.069005 0.121863 0.127558 0.084042 0.114678 0.043701 0.112850 0.086940 0.182542
Millard Fillmore 0.128602 0.168584 0.111184 0.040514 0.116195 0.083215 0.112446 0.046339 0.078221 0.067761 ... 0.102227 0.080306 0.119655 0.143147 0.097297 0.102903 0.044584 0.100449 0.100466 0.200081
Richard Nixon 0.039255 0.049510 0.062639 0.139949 0.042007 0.068414 0.038798 0.105487 0.136241 0.066019 ... 0.036937 0.061997 0.043634 0.054097 0.077172 0.050246 0.174663 0.044486 0.073935 0.037279
Ronald Reagan 0.042896 0.039901 0.044701 0.231009 0.048165 0.076235 0.040150 0.140081 0.140061 0.081909 ... 0.034992 0.052896 0.045076 0.049203 0.080074 0.039373 0.300796 0.046391 0.057221 0.038241
Rutherford B. Hayes 0.092012 0.112656 0.082619 0.035344 0.269502 0.081089 0.144648 0.040511 0.059357 0.064508 ... 1.000000 0.058833 0.074037 0.131257 0.078165 0.077687 0.046635 0.139910 0.066988 0.097658
Theodore Roosevelt 0.056717 0.070145 0.077785 0.069914 0.057292 0.087503 0.045146 0.077699 0.074973 0.072999 ... 0.058833 1.000000 0.050832 0.071917 0.103435 0.060783 0.066718 0.069393 0.067994 0.059469
Thomas Jefferson 0.110742 0.117074 0.093668 0.054143 0.080109 0.077970 0.063310 0.046509 0.060950 0.052945 ... 0.074037 0.050832 1.000000 0.093839 0.069619 0.051927 0.047307 0.067414 0.075135 0.109134
Ulysses S. Grant 0.124756 0.146622 0.127001 0.047495 0.146124 0.098388 0.179516 0.056097 0.085503 0.059127 ... 0.131257 0.071917 0.093839 1.000000 0.091914 0.121181 0.060122 0.167308 0.080233 0.137702
Warren G. Harding 0.082261 0.092593 0.090346 0.073191 0.080189 0.128878 0.066345 0.065570 0.107023 0.094722 ... 0.078165 0.103435 0.069619 0.091914 1.000000 0.081010 0.092189 0.096931 0.096031 0.070389
William Howard Taft 0.092794 0.113945 0.077516 0.049368 0.125491 0.092294 0.137828 0.040417 0.072974 0.054787 ... 0.077687 0.060783 0.051927 0.121181 0.081010 1.000000 0.045340 0.082594 0.087100 0.124097
William J. Clinton 0.046491 0.050358 0.049155 0.317297 0.049069 0.076209 0.045936 0.183823 0.128444 0.086663 ... 0.046635 0.066718 0.047307 0.060122 0.092189 0.045340 1.000000 0.051195 0.075808 0.044786
William McKinley 0.113427 0.102445 0.080710 0.050867 0.142834 0.074285 0.123470 0.042885 0.065945 0.063699 ... 0.139910 0.069393 0.067414 0.167308 0.096931 0.082594 0.051195 1.000000 0.067011 0.103954
Woodrow Wilson 0.077418 0.086139 0.082443 0.064416 0.076850 0.086982 0.059776 0.056082 0.104088 0.077462 ... 0.066988 0.067994 0.075135 0.080233 0.096031 0.087100 0.075808 0.067011 1.000000 0.081701
Zachary Taylor 0.145153 0.155541 0.092401 0.039508 0.134368 0.092193 0.132799 0.040505 0.069061 0.050414 ... 0.097658 0.059469 0.109134 0.137702 0.070389 0.124097 0.044786 0.103954 0.081701 1.000000

43 rows × 43 columns

Here we can see the degree of similarity between each pair of presidents’ speeches. Excluding each speech’s similarity with itself, no pair exceeds 40% similarity.
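The 40% observation can be verified programmatically by masking the diagonal (self-similarity is always 1.0) and taking the maximum of what remains. A sketch with a hypothetical 3×3 similarity matrix in the same shape as `sim_df`:

```python
import numpy as np
import pandas as pd

# Hypothetical similarity matrix: symmetric, with 1.0 on the diagonal
sim = pd.DataFrame([[1.00, 0.34, 0.03],
                    [0.34, 1.00, 0.22],
                    [0.03, 0.22, 1.00]],
                   index=["A", "B", "C"], columns=["A", "B", "C"])

# Hide the diagonal, then find the largest remaining entry and its pair
off_diagonal = sim.mask(np.eye(len(sim), dtype=bool))
max_val = off_diagonal.stack().max()
max_pair = off_diagonal.stack().idxmax()
# max_val -> 0.34, for the pair ("A", "B")
```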

# Compute the clusters from the similarity matrix,
# using the Ward variance minimization algorithm
Z = hierarchy.linkage(sim_df, 'ward')
plt.rcParams['figure.figsize'] = [10,20]
# Display this result as a horizontal dendrogram
a = hierarchy.dendrogram(Z,  leaf_font_size=20, labels=sim_df.index,  orientation="left")

Interpretation

In the dendrogram above, the presidents seem to cluster by the era in which they served. Two big clusters are distinct: the green one, which includes mostly recent presidents of the 20th and 21st centuries, and the orange one, which includes presidents of the 18th and 19th centuries.
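The two-cluster split read off the dendrogram can also be extracted programmatically with `hierarchy.fcluster`. A self-contained toy sketch, with six synthetic one-dimensional "speeches" forming two obvious groups:

```python
import numpy as np
from scipy.cluster import hierarchy

# Two tight groups of three points each
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])

# Same Ward linkage as above
Z = hierarchy.linkage(X, 'ward')

# Cut the tree into exactly two flat clusters
labels = hierarchy.fcluster(Z, t=2, criterion='maxclust')
```

With the real `Z`, `hierarchy.fcluster(Z, 2, criterion='maxclust')` would yield one cluster label per president, which could be compared against era or party.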

  1. Who was the president whose speech was the most similar to the speech of Biden in 2022?
v = sim_df[['Joseph R. Biden']]
v
Joseph R. Biden
Abraham Lincoln 0.030122
Andrew Jackson 0.035307
Andrew Johnson 0.032910
Barack Obama 0.345184
Benjamin Harrison 0.028887
Calvin Coolidge 0.054890
Chester A. Arthur 0.031472
Donald J. Trump 0.220390
Dwight D. Eisenhower 0.068947
Franklin D. Roosevelt 0.053099
Franklin Pierce 0.030974
George Bush 0.227037
George W. Bush 0.205001
George Washington 0.012446
Gerald R. Ford 0.094785
Grover Cleveland 0.025062
Harry S. Truman 0.052615
Herbert Hoover 0.045206
James Buchanan 0.024993
James K. Polk 0.038190
James Madison 0.016895
James Monroe 0.019880
Jimmy Carter 0.198534
John Adams 0.016608
John F. Kennedy 0.094466
John Quincy Adams 0.024704
John Tyler 0.035437
Joseph R. Biden 1.000000
Lyndon B. Johnson 0.107517
Martin van Buren 0.030007
Millard Fillmore 0.024063
Richard Nixon 0.113265
Ronald Reagan 0.223184
Rutherford B. Hayes 0.021369
Theodore Roosevelt 0.050805
Thomas Jefferson 0.034194
Ulysses S. Grant 0.039431
Warren G. Harding 0.052046
William Howard Taft 0.028445
William J. Clinton 0.306361
William McKinley 0.027510
Woodrow Wilson 0.054481
Zachary Taylor 0.033634
# This is needed to display plots in a notebook
%matplotlib inline
plt.rcParams['figure.figsize'] = [10,10]

# Select the column corresponding to Biden's address and 
v = sim_df['Joseph R. Biden']

# Sort by ascending scores
v_sorted = v.sort_values(ascending=True)

# Plot this data as a horizontal bar plot
v_sorted.plot.barh(rot=0)

# Modify the axes labels and plot title for better readability
plt.xlabel("Cosine similarity")
plt.ylabel("")
plt.title("Most similar to Biden's")
Text(0.5, 1.0, "Most similar to Biden's")

Interpretation

Biden’s address is most similar to Obama’s, followed by Clinton’s and George Bush’s. As we can see, George Washington, John Adams, and James Madison are the least similar to Biden. This can be explained by era: speeches of the 18th century differ greatly from those of today. Biden’s similarity to Obama and Clinton also makes sense because all three are recently elected Democrats.
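Rather than reading the answer off the bar plot, the closest speech can be picked out directly with `idxmax` after dropping Biden's self-similarity. A sketch using a few values from the similarity column above:

```python
import pandas as pd

# A few entries from the Biden similarity column
v = pd.Series({"Joseph R. Biden": 1.000000,
               "Barack Obama": 0.345184,
               "William J. Clinton": 0.306361,
               "George Washington": 0.012446})

# Drop Biden's self-similarity (always 1.0), then take the highest score
closest = v.drop("Joseph R. Biden").idxmax()
# closest -> "Barack Obama"
```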

  1. Bonus points (5 points): Develop an algorithm that can determine whether a speech was given by a Democrat or a Republican.

PS2: I will go over this homework on Thursday to help you think through how to solve it. You will be able to recycle a lot of code discussed.

speeches
president year party html top ten
0 Hoover 1929 republican https://www.presidency.ucsb.edu/documents/annu... [('public', 33),('law', 25),('service', 23),('...
1 FD_Roosvelt 1934 democrat https://www.presidency.ucsb.edu/documents/annu... [('industrial', 9),('work', 8),('recovery', 7)...
2 Truman 1949 democrat https://www.presidency.ucsb.edu/documents/annu... [('prosperity', 12),('production', 12),('power...
3 Eisenhower 1957 republican https://www.presidency.ucsb.edu/documents/annu... [('free', 16),('security', 16),('economy', 12)...
4 Kennedy 1961 democrat https://www.presidency.ucsb.edu/documents/annu... [('economic', 16),('development', 10),('peace'...
5 Lyndon 1965 democrat https://www.presidency.ucsb.edu/documents/annu... [('freedom', 12),('life', 9),('progress', 8),(...
6 Nixon 1974 republican https://www.presidency.ucsb.edu/documents/addr... [('peace', 27),('energy', 17),('war', 8),('pro...
7 Ford 1975 republican https://www.presidency.ucsb.edu/documents/addr... [('energy', 25),('oil', 20),('tax', 17),('econ...
8 Carter 1978 democrat https://www.presidency.ucsb.edu/documents/the-... [('inflation', 17),('economic', 14),('tax', 13...
9 Reagan 1985 republican https://www.presidency.ucsb.edu/documents/addr... [('freedom', 20),('tax', 16),('growth', 14),('...
10 Bush 1989 republican https://www.presidency.ucsb.edu/documents/addr... [('budget', 17),('work', 12),('hope', 10),('dr...
11 Clinton 1997 democrat https://www.presidency.ucsb.edu/documents/addr... [('children', 24),('work', 21),('budget', 17),...
12 Bush 2005 republican https://www.presidency.ucsb.edu/documents/addr... [('security', 29),('freedom', 20),('social', 1...
13 Obama 2013 democrat https://www.presidency.ucsb.edu/documents/addr... [('jobs', 32),('work', 20),('energy', 18),('fa...
14 Trump 2018 republican https://www.presidency.ucsb.edu/documents/addr... [('tax', 15),('last', 13),('together', 13),('w...
15 Joe 2022 democrat https://www.presidency.ucsb.edu/documents/addr... [('folks', 19),('see', 15),('families', 15),('...
df2[[0]]
0
0 Address Before a Joint Session of the Congress...
1 Address Before a Joint Session of Congress on ...
2 Address Before a Joint Session of Congress on ...
3 Annual Message to the Congress on the State of...
4 Radio Address Summarizing the State of the Uni...
5 Fifth Annual Message | The American Presidency...
6 First Annual Message | The American Presidency...
7 First Annual Message | The American Presidency...
8 First Annual Message | The American Presidency...
9 Fifth Annual Message | The American Presidency...
10 Fifth Annual Message | The American Presidency...
11 State of the Union Message to the Congress: Ov...
12 State of the Union Message to the Congress on ...
13 Address Before a Joint Session of the Congress...
state_speeches = pd.read_excel('state_speeches.xlsx')
state_speeches
president year party html address
0 Hoover 1929 republican https://www.presidency.ucsb.edu/documents/annu... The Constitution requires that the President "...
1 FD_Roosvelt 1934 democrat https://www.presidency.ucsb.edu/documents/annu... I COME before you at the opening of the Regula...
2 Truman 1949 democrat https://www.presidency.ucsb.edu/documents/annu... I am happy to report to this 81st Congress tha...
3 Eisenhower 1957 republican https://www.presidency.ucsb.edu/documents/annu... I appear before the Congress today to report o...
4 Kennedy 1961 democrat https://www.presidency.ucsb.edu/documents/annu... It is a pleasure to return from whence I came....
5 Lyndon 1965 democrat https://www.presidency.ucsb.edu/documents/annu... On this Hill which was my home, I am stirred b...
6 Nixon 1974 republican https://www.presidency.ucsb.edu/documents/addr... We meet here tonight at a time of great challe...
7 Ford 1975 republican https://www.presidency.ucsb.edu/documents/addr... Twenty-six years ago, a freshman Congressman, ...
8 Carter 1978 democrat https://www.presidency.ucsb.edu/documents/the-... Two years ago today we had the first caucus in...
9 Reagan 1985 republican https://www.presidency.ucsb.edu/documents/addr... I come before you to report on the state of ou...
10 Bush 1989 republican https://www.presidency.ucsb.edu/documents/addr... Mr. Speaker, Mr. President, and distinguished ...
11 Clinton 1997 democrat https://www.presidency.ucsb.edu/documents/addr... Mr. Speaker, Mr. Vice President, Members of th...
12 Bush 2005 republican https://www.presidency.ucsb.edu/documents/addr... As a new Congress gathers, all of us in the el...
13 Obama 2013 democrat https://www.presidency.ucsb.edu/documents/addr... Please, everybody, have a seat. Mr. Speaker, M...
14 Trump 2018 republican https://www.presidency.ucsb.edu/documents/addr... The President. Mr. Speaker, Mr. Vice President...
15 Joe 2022 democrat https://www.presidency.ucsb.edu/documents/addr... The President. Thank you all very, very much. ...

# Split training and testing data
X_train, X_test, y_train, y_test = train_test_split(state_speeches['address'], state_speeches['party'], test_size=0.3, 
                 random_state=53)

# Initialize count vectorizer
count_vectorizer = CountVectorizer(stop_words='english', 
                                   min_df=0.05, max_df=0.9)

# Create count train and test variables
count_train = count_vectorizer.fit_transform(X_train)
count_test = count_vectorizer.transform(X_test)

# Initialize tfidf vectorizer
tfidf_vectorizer = TfidfVectorizer(stop_words='english', 
                                   min_df=0.05, max_df=0.9)


# Create tfidf train and test variables
tfidf_train = tfidf_vectorizer.fit_transform(X_train)
tfidf_test = tfidf_vectorizer.transform(X_test)

tfidf_nb = MultinomialNB()
tfidf_nb.fit(tfidf_train, y_train)
tfidf_nb_pred = tfidf_nb.predict(tfidf_test)
tfidf_nb_score = metrics.accuracy_score(y_test, tfidf_nb_pred)

count_nb = MultinomialNB()
count_nb.fit(count_train, y_train)
count_nb_pred = count_nb.predict(count_test)
count_nb_score = metrics.accuracy_score(y_test, count_nb_pred)

print('NaiveBayes Tfidf Score: ', tfidf_nb_score)
print('NaiveBayes Count Score: ', count_nb_score)
NaiveBayes Tfidf Score:  0.2
NaiveBayes Count Score:  0.2
%matplotlib inline
from sklearn.metrics import ConfusionMatrixDisplay  # replaces the deprecated plot_confusion_matrix

tfidf_nb_cm = metrics.confusion_matrix(y_test, tfidf_nb_pred, labels=['republican', 'democrat'])
count_nb_cm = metrics.confusion_matrix(y_test, count_nb_pred, labels=['republican', 'democrat'])

# Plot the two confusion matrices
ConfusionMatrixDisplay(tfidf_nb_cm, display_labels=['republican', 'democrat']).plot()
plt.title("TF-IDF NB Confusion Matrix")

ConfusionMatrixDisplay(count_nb_cm, display_labels=['republican', 'democrat']).plot()
plt.title("Count NB Confusion Matrix")

Interpretation

It looks like the algorithm identifies whether a speech comes from a Democrat or a Republican with only 20% accuracy on this test split, which is worse than chance. This is unsurprising: with only 16 speeches, the 70/30 split leaves just 5 test examples. For the algorithm to get stronger, more speeches are needed from both parties, and perhaps more text cleaning.
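One way to squeeze more out of 16 speeches is leave-one-out cross-validation, which trains on 15 speeches and tests on the held-out one, 16 times. A self-contained sketch on a trivially separable toy corpus; on the real data you would substitute `state_speeches['address']` and `state_speeches['party']`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in corpus: 16 "speeches" with disjoint vocabularies per party
texts = ["tax cuts growth business freedom"] * 8 + \
        ["jobs families healthcare workers unions"] * 8
labels = ["republican"] * 8 + ["democrat"] * 8

# Vectorizing inside the pipeline lets each fold fit its own vocabulary
pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())
scores = cross_val_score(pipe, texts, labels, cv=LeaveOneOut())
# scores holds one accuracy (0 or 1) per held-out speech
```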

Requirement already satisfied: argon2-cffi in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from notebook>=5.6.0->jupyterthemes) (20.1.0)
Requirement already satisfied: jupyter-client>=5.3.4 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from notebook>=5.6.0->jupyterthemes) (6.1.12)
Requirement already satisfied: Send2Trash>=1.5.0 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from notebook>=5.6.0->jupyterthemes) (1.8.0)
Requirement already satisfied: prometheus-client in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from notebook>=5.6.0->jupyterthemes) (0.11.0)
Requirement already satisfied: nbformat in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from notebook>=5.6.0->jupyterthemes) (5.1.3)
Requirement already satisfied: ipykernel in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from notebook>=5.6.0->jupyterthemes) (6.4.1)
Requirement already satisfied: ipython-genutils in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from notebook>=5.6.0->jupyterthemes) (0.2.0)
Requirement already satisfied: tornado>=6.1 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from notebook>=5.6.0->jupyterthemes) (6.1)
Requirement already satisfied: terminado>=0.8.3 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from notebook>=5.6.0->jupyterthemes) (0.9.4)
Requirement already satisfied: nbconvert in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from notebook>=5.6.0->jupyterthemes) (6.1.0)
Requirement already satisfied: ptyprocess>=0.5 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from pexpect>4.3->ipython>=5.4.1->jupyterthemes) (0.7.0)
Requirement already satisfied: wcwidth in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=5.4.1->jupyterthemes) (0.2.5)
Requirement already satisfied: cffi>=1.0.0 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from argon2-cffi->notebook>=5.6.0->jupyterthemes) (1.14.6)
Requirement already satisfied: pycparser in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from cffi>=1.0.0->argon2-cffi->notebook>=5.6.0->jupyterthemes) (2.20)
Requirement already satisfied: debugpy<2.0,>=1.0.0 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from ipykernel->notebook>=5.6.0->jupyterthemes) (1.4.1)
Requirement already satisfied: MarkupSafe>=0.23 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from jinja2->notebook>=5.6.0->jupyterthemes) (1.1.1)
Requirement already satisfied: entrypoints>=0.2.2 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from nbconvert->notebook>=5.6.0->jupyterthemes) (0.3)
Requirement already satisfied: jupyterlab-pygments in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from nbconvert->notebook>=5.6.0->jupyterthemes) (0.1.2)
Requirement already satisfied: pandocfilters>=1.4.1 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from nbconvert->notebook>=5.6.0->jupyterthemes) (1.4.3)
Requirement already satisfied: nbclient<0.6.0,>=0.5.0 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from nbconvert->notebook>=5.6.0->jupyterthemes) (0.5.3)
Requirement already satisfied: bleach in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from nbconvert->notebook>=5.6.0->jupyterthemes) (4.0.0)
Requirement already satisfied: testpath in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from nbconvert->notebook>=5.6.0->jupyterthemes) (0.5.0)
Requirement already satisfied: defusedxml in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from nbconvert->notebook>=5.6.0->jupyterthemes) (0.7.1)
Requirement already satisfied: mistune<2,>=0.8.1 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from nbconvert->notebook>=5.6.0->jupyterthemes) (0.8.4)
Requirement already satisfied: async-generator in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert->notebook>=5.6.0->jupyterthemes) (1.10)
Requirement already satisfied: nest-asyncio in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from nbclient<0.6.0,>=0.5.0->nbconvert->notebook>=5.6.0->jupyterthemes) (1.5.1)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from nbformat->notebook>=5.6.0->jupyterthemes) (3.2.0)
Requirement already satisfied: attrs>=17.4.0 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat->notebook>=5.6.0->jupyterthemes) (21.2.0)
Requirement already satisfied: pyrsistent>=0.14.0 in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat->notebook>=5.6.0->jupyterthemes) (0.18.0)
Requirement already satisfied: packaging in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from bleach->nbconvert->notebook>=5.6.0->jupyterthemes) (21.0)
Requirement already satisfied: webencodings in /Users/loizoskon/opt/anaconda3/lib/python3.9/site-packages (from bleach->nbconvert->notebook>=5.6.0->jupyterthemes) (0.5.1)
Note: you may need to restart the kernel to use updated packages.
!jt -l  # list the themes that ship with jupyterthemes
Available Themes: 
   chesterish
   grade3
   gruvboxd
   gruvboxl
   monokai
   oceans16
   onedork
   solarizedd
   solarizedl
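Any of the themes listed above can be applied with the `jt` command-line tool from the same jupyterthemes package. A minimal sketch (the theme name `monokai` is just one pick from the list; the notebook page must be reloaded for the change to take effect):

```shell
# Apply the 'monokai' theme to all notebooks
jt -t monokai

# Revert to the default Jupyter appearance at any time
jt -r
```

Inside a notebook, prefix the commands with `!` (e.g. `!jt -t monokai`), as done with `!jt -l` above.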