PowerPoint Readability App

Following the previous post, The Science Behind Readability, I received a question from a management consultant about how to extract text from a PowerPoint deck and assess its readability. For this management consultant, the top line of the slide, or tagline, is crucial in conveying the message of the slide. He described the tagline as the ‘So What’ of the slide.

To help improve taglines, I compiled a standalone Windows executable that reads text from PowerPoint and then scores the readability of the selected text. After introducing the app, I will share how I extract text from PowerPoint in Python.

1. A Windows App for Tagline Readability

The attached ZIP file contains a Windows command-line program called ‘Tagline.exe’. It is a self-contained Windows program that does not need Python to be installed. The app takes a PowerPoint deck and assesses the readability of your taglines. I have also included a PowerPoint file called ‘Test.pptx’ that you can use to test the app. Note that the PowerPoint file you are assessing must be in the same directory as Tagline.exe.

You can download the ZIP file here: PPT Readability App

ZIP File Contents

To use the app, first type in the file name, including the file extension, e.g. Test.pptx. The file name is case sensitive, so make sure you use capital letters where required. Next, enter the font size to extract from the PPT file; if you press return without typing anything, the default is 18 pt. Then enter the minimum number of characters to pull. This filter removes page numbers and title-page text that are not full sentences; if you press return without typing anything, the default is 30 characters.

Enter Power Point File
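
As a rough illustration of the prompts described above, here is a minimal Python 3 sketch of how "press return for the default" can be handled. It is not the code inside Tagline.exe; the function names `int_or_default` and `ask_settings` and the prompt wording are hypothetical, and only the defaults (18 pt and 30 characters) come from the description above.

```python
def int_or_default(raw, default):
    """Return int(raw), or `default` when the user just pressed return."""
    raw = raw.strip()
    return int(raw) if raw else default


def ask_settings(ask=input):
    """Ask for the file name, font size and minimum length.

    `ask` is any function that takes a prompt and returns the reply;
    it defaults to the built-in input() but can be swapped for testing.
    """
    filename = ask("Enter PowerPoint file (e.g. Test.pptx): ").strip()
    size = int_or_default(ask("Font size in points [18]: "), 18)
    min_chars = int_or_default(ask("Minimum characters [30]: "), 30)
    return filename, size, min_chars
```

Calling `ask_settings()` with no arguments prompts at the console. Passing the reply function as a parameter is just a design choice that makes the default handling easy to check without typing anything.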

Lastly, the app extracts the sentences from your PowerPoint deck into a file called output.txt and gives you the Flesch-Kincaid Grade Level and the Flesch Reading Ease score.
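
To give a feel for what these two scores measure, here is a minimal Python 3 sketch that uses the standard published Flesch formulas. It is not necessarily how Tagline.exe computes them. The syllable counter is a crude vowel-group heuristic and the function names are hypothetical, so the numbers may differ slightly from what the app reports.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption of this sketch): count runs of
    consecutive vowels, with a minimum of one syllable per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Standard Flesch Reading Ease formula (higher = easier to read).
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Standard Flesch-Kincaid Grade Level formula (approximate US grade).
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return ease, grade
```

Both formulas depend on the same two ratios, so a tagline scores as easier to read when it uses fewer words per sentence and fewer syllables per word.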

2. Use Python to Extract PowerPoint Text

The code I used to extract text from PPT uses a Python library called ‘python-pptx’. This library can be used not only to extract text but also to automate slide generation for reports, i.e. it can both read and write PowerPoint slides. You can read more on how to use this library at Python-PPTX.

I wrote the code in Python 2.7. The numbered sections of the code are explained below:

In #01, I used os.getcwd() to get the name of the current directory and os.listdir() to return the files in that directory.
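
A minimal sketch of this step is shown here, written for Python 3 rather than 2.7. The helper name `list_pptx_files` is hypothetical, and filtering the listing down to .pptx files is an assumption of this sketch; the step described above simply returns the files in the directory.

```python
import os


def list_pptx_files(directory=None):
    """Return the .pptx files in `directory`, sorted by name.

    Falls back to the current working directory, as in step #01.
    """
    directory = directory or os.getcwd()
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(".pptx")  # assumption: only list decks
    )
```

Printing this list before asking for a file name lets the user see exactly how each file is spelled and capitalised.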

Next in #02, I define a list to hold the extracted lines and prompt for the name of the file to read.

Section #03 is where most of the work occurs. I first grab two more variables: the font size and the minimum number of characters for a sentence. #03.2 is a ‘for’ loop that goes through all the shapes and paragraphs within the PowerPoint deck. The ‘if’ statement then checks the font size using ‘if font.size == Pt(size):’. If the font size matches, the text is added to the list ‘text_runs’. Lastly, in #03.3, we write the list to a file called ‘output.txt’. This is where we apply the minimum number of characters to ensure we keep proper sentences and not short titles or page numbers. You will notice there is a try and except clause here. This is necessary because PPT sometimes uses characters that are difficult to write out as text, e.g. a long dash (‘–’) or a misspelled word that has a red highlight. When Python does not know what the character is, it will skip that part of the sentence and move to the next.
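
The steps in #03 can be sketched roughly as follows. This is an illustrative Python 3 version, not the original 2.7 code: the function names `extract_runs` and `write_sentences` are hypothetical, reading the minimum as "at least min_chars characters" is an assumption, and the python-pptx calls (`Presentation`, `has_text_frame`, `text_frame.paragraphs`, `runs`, `font.size`, `Pt`) are the ones the library documents.

```python
def extract_runs(pptx_file, size_pt=18):
    """Collect the text of every run whose font size equals size_pt."""
    # Imported here so write_sentences() below can be used on its own.
    from pptx import Presentation
    from pptx.util import Pt

    prs = Presentation(pptx_file)
    text_runs = []
    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue  # pictures, charts, etc. carry no text frame
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    # font.size is None when the size is inherited from
                    # the layout or theme, so those runs never match.
                    if run.font.size == Pt(size_pt):
                        text_runs.append(run.text)
    return text_runs


def write_sentences(text_runs, out_path="output.txt", min_chars=30):
    """Write runs of at least min_chars characters, one per line.

    Returns the number of lines written.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for line in text_runs:
            if len(line) < min_chars:
                continue  # drop page numbers and short titles
            try:
                out.write(line + "\n")
                written += 1
            except UnicodeEncodeError:
                continue  # skip text that cannot be encoded
    return written
```

A typical call would be `write_sentences(extract_runs("Test.pptx", 18), "output.txt", 30)`. In Python 3, opening the file as UTF-8 should make the except branch a rare fallback, since long dashes and similar characters encode without trouble. Note also that a tagline whose size is set only in the slide master will be missed, because its runs report no explicit font size.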

That’s it for the PowerPoint tagline readability post. Let me know if you have any comments by writing to me at 15minanalytics@gmail.com

