tl;dr: I created Engaged Citizens to make video of public meetings from my local government searchable and discoverable. Anyone can now search for a word or phrase and jump right to the point in a meeting video where the relevant discussion occurs. To date this tool has indexed 722 meetings and made more than 1,300 hours of video easily accessible. The code behind the project is open-sourced on GitHub.
Background
I live in the quaint bedroom community of Superior, Colorado. Superior covers approximately four square miles and is home to around 13,000 residents. The governing town board is made up of six trustees and a mayor, all elected by popular vote, who serve the interests of the local citizenry. The town board also organizes citizen-led advisory groups that meet regularly to explore specific municipal issues such as open space, parks, sustainability, culture, and local history.
While Superior does a laudable job of advertising upcoming public meetings, it can be quite challenging for a resident to actively keep up with the discussions and decisions made by the town government. In the past, local newspapers provided regular independent recaps of public meetings. Over the last decade, though, as newsrooms across the country have shrunk, traditional media coverage of local government proceedings has significantly diminished.
Problem Exploration
As a proud resident of Superior I am keenly interested in the topics discussed and decided upon during local government meetings, but attending these meetings in person requires a significant investment of time. I travel frequently for my job, which makes it logistically impossible to be at the local Town Hall for most meetings. During my research for this project I discovered that Superior currently convenes around 300 hours of public meetings each year; following all of them would be an incredible commitment of time for even the most dedicated civic-minded resident.
Like many towns, Superior contracts a company to host raw video footage from public meetings online. While this is a start down the road toward transparency and accessibility, the leading solutions that governments leverage to provide this service are very outdated. These platforms have user interfaces that are light on features and are not easily navigable from a mobile phone or tablet. They are antiquated one-way panes of glass that offer little real interactivity. Even the assumption that citizens will watch video of public meetings online may be a bit of an anachronism at this point.
We live in a golden age of media content. Netflix, HBO, Amazon Prime, and others are collectively spending billions of dollars annually to produce engaging movies and original series; at the end of a long day, Americans want to experience this extraordinary entertainment content. Watching video of a recent four-hour town board meeting is discernibly not an appealing alternative to binge-watching Silicon Valley, Stranger Things, or Jack Ryan.
New solutions for providing insight into the proceedings of local government need to focus on allowing citizens to invest a minimal amount of time into getting effectively informed and engaged. Traditional government engagement models that simply “push” raw video content out to the public need to be replaced with solutions that allow the public to selectively “pull” only the most salient and actionable information from the entire corpus of public records.
The Prototype
It was against this backdrop that I set out in early 2018 to make the proceedings of local government more searchable and discoverable. As a software engineer at Google, I focus on how governments can use Google Cloud products to make operations more effective and efficient. I reached into the robust software toolset native to Google Cloud to prototype a way for citizens to engage with their government in a more modern self-service way.
The cornerstone of this engineering effort was the Google Speech-to-Text API (Speech API), which uses cutting-edge machine learning algorithms to transform spoken words into written text. My assumption was that with text transcripts from each meeting I could create novel data visualization and search tools. The machine learning models currently available for transcription are not perfect, but they achieve 90% accuracy or better for my purposes and will continue to improve over time.
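For readers who want a concrete sense of what a transcription request looks like, here is a minimal sketch using the google-cloud-speech Python client. The bucket path, audio settings, and timeout are placeholders rather than the project's exact values; long-running recognition is the asynchronous mode intended for audio files of meeting length.

```python
from google.cloud import speech

client = speech.SpeechClient()

# The gs:// path is a placeholder; the audio would be staged in Cloud Storage first.
audio = speech.RecognitionAudio(uri="gs://example-bucket/meetings/town-board.flac")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_word_time_offsets=True,  # per-word timestamps, useful for linking text back to video
)

# Long-running recognition handles lengthy audio asynchronously.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=7200)

for result in response.results:
    print(result.alternatives[0].transcript)
```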
As of the date this blog post was published, the system I built has transcribed and made searchable more than 1,300 hours of video from 722 local government meetings, bringing accessibility to meeting videos dating back more than 10 years. I call this proof-of-concept project Engaged Citizens, and it is publicly accessible on the internet at engagedcitizens.us.
Technical Methodology
I stood up a data pipeline to perform batch ingestion of raw meeting video, transcode the video to audio, submit the audio files to the Speech API, and then process the transcription results. I use Google Cloud Storage as an object store for housing the video content, audio files, and the JSON API responses. I employ Google Cloud SQL to create a relational repository for the metadata about each meeting (meeting type, meeting date, meeting length, etc.).
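As an illustration of the transcode-and-stage step, here is a rough sketch of how a pipeline worker could extract audio from a meeting video with ffmpeg and upload it to Cloud Storage. The function name, bucket name, and audio settings are assumptions for illustration; the open-source repo holds the actual implementation.

```python
import subprocess
from google.cloud import storage

def transcode_and_stage(video_path: str, bucket_name: str, audio_blob_name: str) -> str:
    """Extract mono FLAC audio from a meeting video and stage it in Cloud Storage."""
    audio_path = video_path.rsplit(".", 1)[0] + ".flac"

    # ffmpeg: drop the video stream and downmix to one channel at 16 kHz,
    # a format the Speech API accepts for long-running recognition.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", audio_path],
        check=True,
    )

    # Upload the audio file so the Speech API can read it via a gs:// URI.
    bucket = storage.Client().bucket(bucket_name)
    blob = bucket.blob(audio_blob_name)
    blob.upload_from_filename(audio_path)
    return f"gs://{bucket_name}/{audio_blob_name}"
```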
The data pipeline uses a series of containers to perform distinct processing functions, and leverages Google Cloud Pub/Sub as the real-time messaging service that communicates between the different containerized steps. The diagram below illustrates each step in the pipeline.
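To make the hand-off between containerized steps concrete, the following is a minimal Pub/Sub sketch: one step publishes a small JSON message describing the work, and the next step subscribes and processes it. The project, topic, and subscription names are placeholders.

```python
import json
from google.cloud import pubsub_v1

PROJECT_ID = "example-project"            # placeholder
TOPIC_ID = "transcription-requests"       # hypothetical topic name
SUBSCRIPTION_ID = "transcription-worker"  # hypothetical subscription name

# Producer side: announce that a meeting's audio is ready for the next stage.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
payload = {"meeting_id": 42, "audio_uri": "gs://example-bucket/meetings/42.flac"}
publisher.publish(topic_path, json.dumps(payload).encode("utf-8")).result()

# Consumer side: a containerized worker pulls messages and does its one job.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    work = json.loads(message.data.decode("utf-8"))
    # ...submit work["audio_uri"] to the Speech API here...
    message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull.result(timeout=60)  # block while messages arrive; a real worker runs indefinitely
```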
The outputs from the data pipeline are a word cloud, records in a master search index, and a human-readable PDF transcript.
The word clouds use fairly simple styling and are produced as PNG images. I leverage a Python wordcloud library for creating these visualizations.
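A word cloud can be produced from a transcript with just a few lines of the wordcloud package. The stop-word set shown here is only a stand-in for the much larger list described in the Data Science Geekery section below.

```python
from wordcloud import WordCloud, STOPWORDS

# Stand-in for the ~900 high-frequency procedural terms described later in the post.
meeting_stop_words = STOPWORDS.union({"motion", "second", "agenda", "trustee"})

with open("meeting_transcript.txt") as f:
    transcript_text = f.read()

cloud = WordCloud(
    width=1200,
    height=800,
    background_color="white",
    stopwords=meeting_stop_words,
).generate(transcript_text)

cloud.to_file("meeting_wordcloud.png")  # simple PNG output
```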
Elasticsearch provides outstanding searchability. Transcripts of each meeting are broken into small text excerpts, paired with their associated timestamps, and stored in an Elasticsearch index. This makes it possible to search for a word or phrase and get back a list of points in a video where those topics are mentioned. Elasticsearch offers a highlighting feature that returns snippets of text from each excerpt, and I use these snippets to give users a glimpse of where their search query matches appear in the context of the surrounding words.
With a single click or tap, a user can jump to the exact spot in a meeting where a word or phrase is used. A robust, fully populated Elasticsearch index also enables longitudinal searches, so users can seek out the occurrence of a word or phrase over time and across all meetings.
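Here is a rough sketch of the indexing and search pattern described above, using the elasticsearch Python client (8.x-style API). The index name, field names, and sample excerpt are illustrative placeholders, not the project's actual schema.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# One document per short transcript excerpt, with the offset (in seconds)
# where that excerpt begins in the meeting video.
es.index(index="meeting-excerpts", document={
    "meeting_id": 42,
    "meeting_type": "Town Board",
    "meeting_date": "2018-06-11",
    "start_seconds": 3720,
    "text": "...placeholder excerpt text about an open space discussion...",
})

# Phrase search with highlighting, which returns the snippets shown to users.
results = es.search(
    index="meeting-excerpts",
    query={"match_phrase": {"text": "open space"}},
    highlight={"fields": {"text": {}}},
)

for hit in results["hits"]["hits"]:
    src = hit["_source"]
    print(src["meeting_date"], src["start_seconds"], hit["highlight"]["text"][0])
```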
For longitudinal searches I use the Google Charts API to give a quick visual representation of search query frequency over time, and to show a high-level view of the types of meetings where a search query was found (planning commission meetings, town board meetings, etc.).
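The data behind those charts could come from an aggregation like the one sketched below: a date histogram of matching excerpts per year, broken down by meeting type. Field names continue the illustrative schema from the previous sketch.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

response = es.search(
    index="meeting-excerpts",
    size=0,  # only aggregation buckets are needed, not individual hits
    query={"match_phrase": {"text": "open space"}},
    aggs={
        "per_year": {
            "date_histogram": {"field": "meeting_date", "calendar_interval": "year"},
            "aggs": {"by_type": {"terms": {"field": "meeting_type.keyword"}}},
        }
    },
)

for bucket in response["aggregations"]["per_year"]["buckets"]:
    by_type = {b["key"]: b["doc_count"] for b in bucket["by_type"]["buckets"]}
    print(bucket["key_as_string"], bucket["doc_count"], by_type)
```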
The interface for all of this is a very bare-bones implementation of the Bootstrap Material Design UI kit. This provides responsiveness and makes the content accessible from any phone, tablet, or computer. While functional, the basic user experience I put together for this proof of concept certainly has room for improvement. My goal from the outset was only to demonstrate the art of the possible.
To facilitate video playback, I use the YouTube Data API to programmatically upload videos to YouTube. I then leverage the YouTube Player API to embed the YouTube video player in the Engaged Citizens interface with JavaScript.
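As a sketch of the upload half of that workflow, the snippet below uses the google-api-python-client and an OAuth flow to push one video to YouTube. The file names, title, and privacy status are placeholders; the embedding half happens client-side with the Player API and is not shown here.

```python
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# OAuth flow for the upload scope; client_secret.json is a placeholder path.
flow = InstalledAppFlow.from_client_secrets_file(
    "client_secret.json",
    scopes=["https://www.googleapis.com/auth/youtube.upload"],
)
credentials = flow.run_local_server()

youtube = build("youtube", "v3", credentials=credentials)

request = youtube.videos().insert(
    part="snippet,status",
    body={
        "snippet": {
            "title": "Town Board Meeting (placeholder title)",
            "description": "Public meeting video uploaded for Engaged Citizens.",
        },
        "status": {"privacyStatus": "unlisted"},  # assumption, not necessarily the project's setting
    },
    media_body=MediaFileUpload("meeting.mp4", resumable=True),
)
response = request.execute()
print("Uploaded video id:", response["id"])
```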
More than 500 hours of full-motion video are uploaded to YouTube every 60 seconds, and the YouTube playback experience has set the standard for watching video online. Since internet users are extremely familiar with the mechanics of watching videos on YouTube, I can easily leverage that consumer familiarity by embedding videos from YouTube directly on the Engaged Citizens website. I also don’t have to worry about transcoding the video into different formats to support playback on different devices; YouTube handles that too.
The Engaged Citizens website is hosted on Google App Engine, which makes management and scaling a breeze.
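For completeness, a minimal App Engine entry point could look like the Flask sketch below, paired with an app.yaml that declares the Python runtime. This is only an assumption about the shape of the web layer, not the project's actual code.

```python
# main.py - the kind of minimal entry point App Engine's Python standard
# environment runs (paired with an app.yaml that declares the runtime).
from flask import Flask, request

app = Flask(__name__)

@app.route("/search")
def search():
    query = request.args.get("q", "")
    # A route like this could hand the query to the Elasticsearch index
    # and render the matching excerpts; here it simply echoes the query.
    return f"Search results for: {query}"

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8080, debug=True)  # local development only
```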
Data Science Geekery
Creating word clouds is straightforward, but I noticed early on that each individual word cloud contained a preponderance of common terms and procedural meeting jargon. The first iteration of these word clouds didn’t provide much distinct insight into the content of any particular meeting. I hypothesized that if I could find the most common words across the entire corpus of meetings, I could mark these as stop words and drop them out of the word cloud creation process.
To identify stop words I stood up a Google Dataflow processing pipeline to take each Speech API result and tokenize every word in the transcript. I then loaded these individual words, all 8 million of them, as rows in a Google BigQuery table. Now with a simple SQL query I can quickly identify the top words used throughout the entire library of meeting transcripts. I added 900 of these most frequently occurring words to my list of stop words and regenerated all 700+ word clouds. The result was word clouds that were each semantically relevant and relatively useful for discerning the substance of a meeting.
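The "simple SQL query" in question would look roughly like this, run here through the BigQuery Python client; the project and table names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table with one row per transcribed word across all meetings.
query = """
    SELECT word, COUNT(*) AS occurrences
    FROM `example-project.transcripts.words`
    GROUP BY word
    ORDER BY occurrences DESC
    LIMIT 900
"""

for row in client.query(query).result():
    print(row.word, row.occurrences)
```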
Another nuance of the transcription process is that commonly used local words and phrases sometimes get mis-transcribed by the Speech API. For instance, there is a town near Superior called Louisville, Colorado. Coloradans don’t pronounce this town name like the Kentucky municipality of the same name. The local Colorado parlance sounds like “Lewisville,” and that is exactly the spelling the Speech API returns in each transcript.
To overcome this challenge, I provide the Speech API with speechContexts hints that the system uses to aid in processing the audio. I also employ a few post-processing routines to normalize acronyms and abbreviations in each transcript.
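In the Python client those hints are passed as SpeechContext phrases on the recognition config. The sketch below shows the shape of that configuration with an illustrative phrase list; the real hint list is an assumption on my part.

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code="en-US",
    # Phrase hints bias the recognizer toward local names that would otherwise
    # be transcribed phonetically ("Lewisville" instead of "Louisville").
    speech_contexts=[
        speech.SpeechContext(phrases=["Louisville", "Superior", "Town of Superior"])
    ],
)

audio = speech.RecognitionAudio(uri="gs://example-bucket/meetings/42.flac")
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=7200)
```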
Next Steps
I am extremely appreciative that Google allowed me to invest resources and energy into building this project in my 20% time. The company also went above and beyond by allowing me to open source all of the intellectual property created during the development of this solution. Now anyone can pick up where I left off. Admittedly, the open-source repo on GitHub needs some more supporting documentation, but more than 12,500 lines of code are there and available for anyone to grab and build upon.
My work with Google has taken me in a different direction recently, so I don’t have much time to further build out this solution. I do, however, often find myself daydreaming about developing new features. My hypothetical feature backlog looks something like this:
- Produce line charts to visually map and display sentiment through the course of each meeting, and generate further visualizations to provide insight into how these sentiment metrics align with the topics being discussed.
- Perform topic modeling for each meeting and visually display topics in bubble charts.
- Generate email alerts so concerned citizens can subscribe to receive notifications when a topic is mentioned during a meeting.
- Develop a dashboard to provide anonymized and aggregated insights about the topics citizens are searching for and the relevant themes arising from the sections of meeting videos that are being viewed.
- Leverage the speaker diarization features of the Speech API to create an empirical record of which voices speak the most about which topics.
- Use the Google Translation API to translate English language meeting proceedings into a slew of other languages, making civic engagement possible for non-English speaking populations.