As part of a project, I needed to write a class for LDA with Gensim, and I thought I would share it here. It’s based on another tutorial I found online, but I’ve modified it to make it a bit more reusable. The README is located in the repository here.

LDA Example

This example uses a sample of the COVID-19 news dataset linked below: the first 300 news articles.

https://aylien.com/blog/free-coronavirus-news-dataset

Import Data

In [1]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
pd.set_option('display.max_colwidth', None)

covid_df = pd.read_csv('data/covid_df_tester.csv', sep='\t')
In [2]:
covid_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300 entries, 0 to 299
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   source_id      300 non-null    int64 
 1   source_domain  300 non-null    object
 2   author_id      300 non-null    int64 
 3   author_name    252 non-null    object
 4   title          300 non-null    object
 5   body           300 non-null    object
dtypes: int64(2), object(4)
memory usage: 14.2+ KB
In [3]:
covid_df.head()
Out[3]:
source_id source_domain author_id author_name title body
0 1737 Complex 973106 Gavin Evans British Prime Minister Boris Johnson Hospitalized 10 Days After COVID-19 Diagnosis On Sunday, British Prime Minister Boris Johnson was hospitalized "for tests" because of "persistent" COVID-19 symptoms 10 days after he tested positive, CNN reports. \nJohnson reportedly went to the unspecified London hospital after his doctor advised him to do so. A press release from his office called the move "precautionary." \nOn March 26, Johnson revealed he had tested positive and that he had been dealing with symptoms since that date. Britain had gone into lockdown two days earlier.\nSince the 26th, Johnson has been quarantined at his Downing Street residence. He is the first known world leader to have contracted the virus. \nRoughly a month ago, right around the time the U.K. started dealing with an outbreak, Johnson garnered media coverage for saying he'd shook hands with coronavirus patients during a hospital visit.  \n"I shook hands with everybody, you will be pleased to know, and I continue to shake hands," Johnson said during a press conference that took place on March 3. His positive test was registered 23 days later. \nOn Saturday, Johnson's fiancée, Carrie Symonds, tweeted out that she'd spent a week in bed with coronavirus symptoms. She had not officially been tested for the disease, but said she felt "stronger" and "on the mend" following the week of rest:
1 433 SBS 940858 Australia NSW coronavirus death toll hits 18 as cases rise to 2637 NSW has now recorded 18 COVID-19 deaths as the state's total number of cases rises to 2637.\n\nNSW Health said on Monday the state had recorded 57 new cases - a drop on the previous day which was partly explained by fewer tests being done over the weekend.\n\nThe death toll rose to 18 after the deaths of an 86-year-old man and an 85-year-old man on Sunday.\n\nIt comes after NSW Police Commissioner Mick Fuller on Sunday announced an investigation into the circumstances surrounding the docking and disembarkation of passengers from the ill-fated Ruby Princess cruise ship.\n\nThe investigation - led by the NSW police homicide squad - aims to identify how passengers were allowed to disembark from the ship in Sydney, which is linked to 622 COVID-19 cases and at least 11 deaths across the country.\n\n"The only way I can get to the bottom of whether our national biosecurity laws and our state laws were broken is through a criminal investigation," Mr Fuller said.\n\nMr Fuller told reporters transparency regarding patient health on board the cruise ship was a key question for the investigation.\n\nThe ship will dock in Port Kembla, near Wollongong on Monday.\n\nIt's expected to spend up to 10 days docked for medical assessments, treatment or emergency extractions of the crew, NSW Police say.\n\nThe investigation will cover the actions of the port authority, ambulance, police, the NSW Health department and Carnival Australia.\n\nThe NSW government on Sunday urged young people to take the COVID-19 pandemic seriously, revealing more than a quarter of the state's current coronavirus cases are in people aged under 29.
2 460 Hindustan Times 941178 Ht Correspondent Industry in Chandigarh will need major impetus by government post lockdown, say businessmen ChandigarhWith shops and manufacturing units closed due to the curfew imposed to stop the spread of coronavirus disease (Covid-19), traders fear economic recovery will be difficult.It is for the first time that all business activity, trading and manufacturing, has been shut down in the city.“There is great uncertainty among businessmen as to what the future holds. People have even stopped planning how to manage the after-effects of the shutdown. We also don’t know for how long businesses will remain disturbed because of the Covid-19 pandemic,” said Neeraj Bajaj of the Chandigarh Business Council.In the city’s industrial area, more than 30,000 are employed in manufacturing and service units.“Beyond the short-term struggle, the industry will need major impetus from the government in the short and long term. There should be moratorium period or extension of six months for payment of liabilities including utility bills, taxes and duties to the government,” said Pankaj Khanna, president, industries association, Chandigarh.Industry is also seeking for loans accounts, which become non-performing assets (NPA) during lockdown, to not be considered wilful defaulters.“Stimulus package for MSMEs (micro, small and medium enterprises) should be considered for the financial year 2020-2021. 
Financial support for unorganized sectors will also be required,” Khanna said.Chandigarh Beopar Mandal wrote to Prime Minister Narendra Modi and Punjab governor and UT administrator VP Singh Badnore on Friday, seeking help, as they struggle to deal with the acute economic crisis caused by the pandemic.“We request the government to allow us to pay our employees ₹4,000 to ₹5,000 per month as ration cost till the lockdown continues,” the traders’ body stated in its communiqué.“Most of our traders have taken overdraft/CC limits or term loans in order to run their businesses. When there isno business activity, paying these interests is also a big challenge,” said Charanjiv Singh, chairman of the Chandigarh Beopar Mandal.FACTORY WORKERS ISSUED CURFEW PASSES TO FACILITATE WAGE DISBURSEMENT TO LABOURTo resolve the issue of payment of wages to more than 25,000 factory workers in the city, the UT administration has started issuing curfew passes to factory officials to allow them to disburse wages.“So far, we have issued 120 curfew passes for factory officials. In case there are any problems regarding this, we are taking them up on priority basis,” said Harjit Singh Sandhu, director, industries.Due to restrictions on movement imposed because of the curfew, both factory owners and workers have been facing problems with the disbursement of wages. Various industry associations had taken up the issue with the administration.The UT labour department, too, has contacted industrialists and factory owners regarding payment of wages to the labourers. “Around 10,000 labourers have already received their salaries. Efforts are being made to ensure that the remaining labour is paid wages at the earliest,” said a senior UT official, wishing not to be named.Meanwhile, household workers like maids, gardeners, etc are finding it difficult to collect their monthly salaries because of curfew restrictions. 
“The administration should devise a plan that allows household workers to collect their salaries from houses of their employers. They are not able to visit their workplaces and they don’t have curfew passes either. This is causing them a great deal of hardship,” said RK Garg, a city based social activist.
3 460 Hindustan Times 1588290 Amanjeet Singh Salyal Coronavirus in Chandigarh: Follow advisories, one cannot be too careful, says 23-year-old discharged patient Chandigarh The 23-year-old man, discharged from the isolation ward of Government Medical College and Hospital (GMCH), Sector 32, after two weeks of treatment, is elated to be home.“I am feeling perfectly fine,” he said, as he arrived at his house in Sector 19 around 6pm. Son of a senior UT official, the youth had tested positive for Covid-19 on March 22 after he came in contact with the brother of Chandigarh’s first coronavirus patient, a 23-year-old woman from Sector 21.“One cannot be too careful. Though I did not have severe symptoms, all of us, even the young, are not immune to it. Advisories for social distancing and hand washing should be followed religiously,” he said.“During the course of my stay at the hospital, I learnt that there is nothing to be afraid of, but we need to be careful. My morale was high and I was confident of quick recovery,” he said, while adding that those affected by the disease should not lose hope as recovery was possible.The youth reported to the hospital after his fever rose, alerting him that he could be infected. “In isolation, even though I was not allowed to move out, I spent my days reading and talking to my friends. The doctors, nurses and the supporting staff were immensely helpful,” he said.The 25-year-old brother of the city’s first positive patient was also discharged from GMCH on Saturday. He had tested positive on March 20 after coming in contact with his sister, who flew back from the United Kingdom on March 15 and tested positive on March 18. Their mother, who had also contracted the infection, was discharged on Saturday.On his recovery, the youth said following the instructions of the doctors was vital to get cured at the earliest. “Different bodies react to the virus differently. I did not have many symptoms, while other positive patients had cough and fever. 
So, precautions are a must to contain the spread of the virus,” he said.
4 460 Hindustan Times 941178 Ht Correspondent Crackers sound jarring note as Chandigarh tricity lights up on PM Modi’s solemn plea CHANDIGARH The stillness which had become so much a part of the tricity over the last two weeks was shattered by exploding firecrackers on Monday night as people in their enthusiasm to follow Prime Minister Narendra Modi’s solemn call for a candlelight vigil to unite to fight the coronavirus epidemic exceeded their brief to keep things low-key.Even as residents, following PM Modi’s Friday call to show solidarity and battle the pandemic, switched on their mobile phone lights and lit candles and diyas on Sunday at 9 pm, the sound of crackers sounded a jarring note. “This is the height of insanity and insensitivity to celebrate death and disease with so much pomp, show and bursting of crackers. This is a real wake of call for people who love India and the Indian ethos of compassion and humanism above all,” said, Pramod Sharma, who heads Yuvsatta, an NGO. “It was very peaceful initially. People had lit candles in their balconies, many were playing bhajans on their music systems, when suddenly crackers started popping. This was a solemn occasion, not something to celebrate,” said Nayna, a resident of Sector 43.Dr Ravindra Khaiwal , additional professor, environment health, The School of Public Health, Post Graduate Institute of Medical Education and Research, said, “The purpose (of lighting diyas) was for showing solidarity with the citizens who are at the forefront of dealing with the coronavirus outbreak, but a few people resorted to bursting of crackers. That’s an undesired step,” he said. “We are in the midst of a pandemic and people are celebrating as if it’s Diwali,” quipped Mohali mayor Kulwant Singh. Former railways minister and Congressman Pawan Kumar Bansal felt this showed “the over enthusiasm and hype built up over the PM’s announcement. 
What was supposed to be voluntary action has now become a must-do thing.”Crackers were burst in most parts of the tricity. In some localities, enthusiastic Bharatiya Janata Party workers resorted to sloganeering in favour of their party and PM Modi. “It appears that people, by bursting crackers, did not understand the sensitivity involved in the issue. The call was not for this,” said Dr Pramod Kumar, director, Institute for Development and Communication (IDC), Chandigarh.

Split Data and Create LDA Object

In [4]:
from LDA import LDAClassification
from sklearn.model_selection import train_test_split

# Grab body text and convert to list
covid_df_list = covid_df['body'].values.tolist()

# Train Test Split
covid_df_train, covid_df_test = train_test_split(covid_df_list, test_size=0.3, random_state=42, shuffle=True)

# Build model
# Exclude common words to produce some variety in topics
lda_c = LDAClassification(stop_words_extend=['coronavirus', 'covid 19', 'covid-19', 'covid'])
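
How `stop_words_extend` is applied depends on the internals of `LDAClassification`, which are not shown here. As a rough illustration only, the sketch below assumes a simple regex tokenizer and a tiny made-up base stop-word list; the function name `tokenize_without_stop_words` is hypothetical and is not part of the class.

```python
import re

# Tiny illustrative base list (an assumption); a real pipeline would use a full stop-word list.
BASE_STOP_WORDS = {"the", "a", "of", "and", "to", "in"}


def tokenize_without_stop_words(text, extra_stop_words=()):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    stop_words = BASE_STOP_WORDS | {word.lower() for word in extra_stop_words}
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stop_words]


tokens = tokenize_without_stop_words(
    "The spread of Covid-19 and the coronavirus lockdown",
    extra_stop_words=["coronavirus", "covid 19", "covid-19", "covid"],
)
print(tokens)
```

With a tokenizer like this one, "Covid-19" is split into the single token `covid` before filtering, so the `'covid'` entry does the work and the `'covid 19'` and `'covid-19'` entries never match a token. Whether that also holds for the real class depends on how it tokenizes.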

Find Optimal Topics

In [5]:
import matplotlib.pyplot as plt

# Candidate topic counts: 2, 4, ..., 18 (the x-axis below uses range(), which excludes limit)
start, limit, step = 2, 20, 2

# Can take a long time to run.
model_list, perplexity_values, coherence_values = lda_c.train_find_optimal_topics(text_list=covid_df_train, start=start, limit=limit, step=step)

# Show graph
x = range(start, limit, step)

# create figure and axis objects with subplots()
fig,ax = plt.subplots()
# make a plot
ax.plot(x, coherence_values, color="blue", marker="o")
# set x-axis label
ax.set_xlabel("Num Topics",fontsize=14)
# set y-axis label
ax.set_ylabel("Coherence Score",color="blue",fontsize=14)

# twin axis so the two metrics can use different y-axes on the same plot
ax2=ax.twinx()

# make a plot with different y-axis using second axis object
ax2.plot(x, perplexity_values,color="red",marker="o")
ax2.set_ylabel("Perplexity",color="red",fontsize=14)
plt.show()
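
Reading the best topic count off the graph can also be done programmatically. The sketch below is one possible way to do it, assuming the coherence scores come back in the same order as `range(start, limit, step)`. The helper `best_num_topics` is hypothetical and the scores are made-up placeholder values, not results from this dataset.

```python
def best_num_topics(topic_counts, coherence_values):
    """Return the topic count whose coherence score is highest."""
    if len(topic_counts) != len(coherence_values):
        raise ValueError("topic_counts and coherence_values must have the same length")
    best_index = max(range(len(coherence_values)), key=coherence_values.__getitem__)
    return topic_counts[best_index]


start, limit, step = 2, 20, 2
topic_counts = list(range(start, limit, step))  # 2, 4, ..., 18

# Placeholder scores for illustration only (NOT the values produced by the run above).
example_coherence = [0.38, 0.41, 0.44, 0.47, 0.45, 0.43, 0.42, 0.40, 0.39]

chosen = best_num_topics(topic_counts, example_coherence)
print(chosen)
```

Picking the maximum is only a starting point. Coherence curves are often noisy, so it is still worth looking at the plot and at the topic keywords before settling on a number.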

Predict

In [6]:
# Train with optimal topics
lda_c.train(covid_df_train, num_topics=8)

# Predict with FULL set
df_dominant_topic, df_topic_distributions = lda_c.predict(covid_df_list)
Perplexity:  -7.85632162344386
Coherence Score:  0.4718676259818084
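
A note on the negative "Perplexity" value: it suggests the number printed is Gensim's per-word likelihood bound (what `LdaModel.log_perplexity` returns) rather than a perplexity. That is an assumption about the class internals. Gensim's documentation gives the relationship as perplexity = 2^(-bound), which can be sketched as follows (`bound_to_perplexity` is a hypothetical helper name):

```python
def bound_to_perplexity(per_word_bound):
    """Convert a per-word likelihood bound (log base 2) into a perplexity."""
    return 2 ** (-per_word_bound)


# Value printed by the training cell above, assumed to be a per-word bound.
perplexity = bound_to_perplexity(-7.85632162344386)
print(round(perplexity, 1))
```

Under that assumption the perplexity is in the low hundreds. Either way, a bound closer to zero (and so a lower perplexity) means a better fit.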
In [7]:
df_dominant_topic
Out[7]:
Document_No Dominant_Topic Topic_Perc_Contrib Keywords Text
0 0 3 0.9946 say, people, indigenous, go, case, health, death, also, take, many [british_prime, hospitalize, test, persistent_symptom, day, tested_positive, report, reportedly, go, unspecified, hospital, doctor, advise, press, release, office, call, move, precautionary, reveal, tested_positive, deal, symptom, go, lockdown, day, early, quarantine, residence, first, know, world, leader, contract, virus, roughly, month, ago, right, time, start, deal, outbreak, garner, medium, coverage, say, shake, hand, patient, hospital, visit, shake, hand, please, know, continue, shake, hand, say, press, conference, take, place, positive, test, register, day, later, fiancee, carrie_symond, tweet, spend, week, bed, symptom, officially, test, disease, say, feel, strong, mend, follow, week, rest]
1 1 3 0.9069 say, people, indigenous, go, case, health, death, also, take, many [record, death, state, total, number, case, rise, health, say, state, record, new, case, drop, previous, day, partly, explain, few, test, do, weekend, death_toll, rise, death, man, man, come, police, announce, investigation, circumstance, surround, docking, disembarkation, passenger, ill, fate, ruby_princess, cruise_ship, investigation, lead, aim, identify, passenger, allow, disembark, ship, link, case, least, death, country, way, get, bottom, national, biosecurity, law, state, law, break, criminal, investigation, say, full, told_reporter, transparency, regard, patient, health, board, cruise_ship, key, question, investigation, ship, dock, port, kembla, expect, spend, day, dock, medical, assessment, treatment, emergency, extraction, crew, police, say, investigation, cover, action, port, authority, ambulance, urge, young, ...]
2 2 3 0.5258 say, people, indigenous, go, case, health, death, also, take, many [chandigarhwith, shop, manufacturing, unit, close, due, curfew, impose, stop, spread, disease, trader, fear, economic, recovery, difficult, first, time, business, activity, trading, manufacturing, shut, city, great, uncertainty, businessman, future, hold, people, even, stop, plan, manage, effect, shutdown, also, know, long, business, remain, disturbed, pandemic, say, industrial, area, employ, manufacturing, service, unit, short, term, struggle, industry, need, major, impetus, government, short, moratorium, period, extension, month, payment, liability, include, utility, bill, taxis, duty, government, say, also, seek, loan, account, become, non, perform, asset, npa, lockdown, consider, wilful, defaulter, stimulus, package, msme, micro, small, medium, enterprise, consider, financial, year, financial, support, unorganized, sector, also, ...]
3 3 3 0.8975 say, people, indigenous, go, case, health, death, also, take, many [chandigarh, man, discharge, isolation, ward, government, medical, college, hospital, gmch, sector, two_week, treatment, elate, home, feel, perfectly, fine, said, arrive, house, sector, pm, son, senior, official, youth, tested_positive, march, come, contact, brother, first, patient, woman, sector, one, careful, severe, symptom, even, young, immune, advisory, social_distance, hand, washing, follow, religiously, say, course, stay, hospital, learn, afraid, need, careful, morale, high, confident, quick, recovery, say, add, affected, disease, lose, hope, recovery, possible, youth, report, hospital, fever, rise, alert, infect, isolation, allow, move, spend, day, read, talk, friend, doctor, nurse, support, staff, immensely, helpful, say, brother, city, first, positive, patient, also, discharge, come, ...]
4 4 3 0.9385 say, people, indigenous, go, case, health, death, also, take, many [chandigarh, stillness, become, much, part, tricity, last, two_week, shatter, explode, firecracker, night, people, enthusiasm, follow, prime_minister, solemn, call, candlelight, vigil, unite, fight, epidemic, exceed, brief, keep, thing, low, key, even, resident, follow, pm, show, solidarity, battle, pandemic, switch, mobile, phone, light, light, candle, diyas, pm, sound, cracker, sound, jarring, note, height, insanity, insensitivity, celebrate, death, disease, much, pomp, show, burst, cracker, real, wake, call, people, love, humanism, say, head, peaceful, initially, people, light, candle, balcony, many, play, bhajan, music, system, suddenly, cracker, start, pop, solemn, occasion, celebrate, say, resident, sector, additional, professor, environment, health, school, public, health, post, medical, research, ...]
... ... ... ... ... ...
295 295 3 0.6989 say, people, indigenous, go, case, health, death, also, take, many [early, day, administration, impassioned, group, mental, health, professional, warn, public, president, cramp, disordered, mind, darken, attic, flutter, bat, assessment, controversial, ethic, expressly, forbid, member, diagnose, public, figure, afar, enough, enough, argue, person, analysis, donald_trump, reveal, hide, depth, internal, sonar, barely, fathom, bottom, sink, exceptional, urgent, time, back, conservative, lawyer, wife, write, long, devastating, note, trump, hallmark, narcissistic, personality, disorder, disorder, dangerous, enough, time, prosperity, jeopardize, moral, institutional, foundation, country, global, president, pathology, endanger, institution, life, let, start, basic, first, narcissistic_personalitie, harbor, skyscraping, delusion, capability, exaggerate, accomplishment, focus, obsessively, project, power, wish, desperately, win, mean, trump, say, get, plenty, test, available, ...]
296 296 3 0.5398 say, people, indigenous, go, case, health, death, also, take, many [spend, large, part, year, marriage, live, different, continent, man, endure, year, separation, mark, work, presenter, film, isolate, together, home, speak, chat, former, say, year, long, distance, relationship, lock, girl, dream, use, spend, time, call, talk, hour, phone, different, time, zone, together, minute, day, nice, sit, together, evening, front, tv, worry, rush, time, important, couple, together, cook, together, lot, crack, chef, make, lot, delicious, banana, bread, let, sous, chef, help, really, think, period, go, make, grateful, amazing, friend, family, mark, connect, friend, live, share, little, much, recent, chat, pop, star, turn, camera, sit, bed, dress, gown, former, corrie, actress, quickly, shout, ...]
297 297 3 0.8062 say, people, indigenous, go, case, health, death, also, take, many [turn, light, nationwide, candle, light, vigil, heed, call, unity, country, battle, ask, citizen, observe, minute, electricity, local, time, urge, challenge, darkness, lighting, candle, lamp, million, respond, lighting, night, sky, show, unity, salute, light, lamp, bring, auspiciousness, health, prosperity, destroy, negative, feeling, tweet, time, vigil, critic, dismiss, event, stunt, argue, distract, health, economic, crisis, cause, pandemic, impose, nationwide, lockdown, announce, little, warning, leave, million, strand, food, million, observe, minute, vigil, call, confirm, infection, death, late, figure, johns_hopkin, university, say, true, figure, however, think, far, high, low, testing, rate, world, effort, way, capacity, lockdown, fear, major, outbreak, country, world, densely, populated, result, ...]
298 298 3 0.8999 say, people, indigenous, go, case, health, death, also, take, many [worry, go, preach, month, later, dead, love, laugh, love, play, guitar, play, guitar, even, suppose, say, good, man, world, day, wife, child, hope, hold, large, celebratory, memorial, make, funeral, guest, include, blue, guitarist, play, graveside, little, month, ago, drive, wife, view, festivity, opportunity, music, save, soul, hundred, thousand, people, attend, join, daughter, come, go, pub, club, bar, play, blue, connect, musician, tell, love, say, time, new_york, new, year, sea, people, drink, party, say, loud, laugh, element, recent_year, pastor_spradlin, realise, dream, use, preach, hone, church, state, take, street, medium, love, playing, instrument, age, induct, blue, hall, fame, feel, save, alcoholism, drug, ...]
299 299 3 0.9530 say, people, indigenous, go, case, health, death, also, take, many [indigenous, community, amazon, elsewhere, danger, wipe, health, expert, respiratory, illness, develop, influenza, virus, already, main, death, native, community, report, confirm, case, death, infection, initially, concentrate, industrialize, state, however, spread, country, include, indigenous, territory, amazon, combine, first, case, indigenous, people, record, state, indigenous, group, make, population, incredible, risk, virus, spread, native, community, wipe, say, researcher, lead, health, project, indigenous, people, amazon, rainforest, fear, similar, impact, previous, major, outbreak, highly, contagious, respiratory, disease, measle, measle, outbreak, member, yanomami, community, live, border, kill, infected, get, sick, lose, old, people, wisdom, social, organization, say, chaos, response, pandemic, add, community, plan, split, small, group, seek, ...]

300 rows × 5 columns

In [8]:
df_topic_distributions
Out[8]:
Topic Keywords Num_Documents Perc_Documents
0 0.0 drone, mask, perform, inmate, federal_government, zoom, ventilator, meeting, federal, prison 6.0 0.0200
1 2.0 overdraft, interest, bank, rate, pay, law, image_caption, say, image_copyright, new NaN NaN
2 3.0 say, people, indigenous, go, case, health, death, also, take, many 9.0 0.0300
3 4.0 say, home, family, delivery, get, go, other, well, stay, year 239.0 0.7967
4 5.0 customer, box, basic, vulnerable, supermarket, priority, help, include, essential, need 25.0 0.0833
5 6.0 church, funeral, county, palm_sunday, cat, order, tiger, moment, zoo, notification 5.0 0.0167
6 7.0 hedge_fund, stock, market, position, also, fund, group, bullish, end, investor 1.0 0.0033
7 NaN NaN 15.0 0.0500
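
The NaN entries in Out[8] look like the topic ids and the document counts ended up on different rows, though the cause is not confirmed here. As a cross-check, the per-topic counts can be recomputed directly from the dominant topic of each document, keyed by topic id. This is a minimal sketch with made-up input. The helper `topic_distribution` is hypothetical and is not part of `LDAClassification`.

```python
from collections import Counter


def topic_distribution(dominant_topics, num_topics):
    """Return (topic_id, num_documents, share_of_documents) for every topic id."""
    counts = Counter(dominant_topics)
    total = len(dominant_topics)
    rows = []
    for topic_id in range(num_topics):
        num_documents = counts.get(topic_id, 0)
        share = round(num_documents / total, 4) if total else 0.0
        rows.append((topic_id, num_documents, share))
    return rows


# Made-up dominant topics for six documents (illustration only).
example_dominant_topics = [3, 3, 4, 4, 4, 0]
rows = topic_distribution(example_dominant_topics, num_topics=8)
for row in rows:
    print(row)
```

With the real output, the input would be the `Dominant_Topic` column, for example `df_dominant_topic['Dominant_Topic'].astype(int).tolist()`. Topics that no document is assigned to then show up with a count of 0 instead of NaN.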