Programming Assignment - CIS122: Flesch Readability Index

Specification:

In this assignment you will write a program that gauges the readability of a document without any complex linguistic analysis. The rules for this program are based on the Flesch Readability Index.


The Flesch Readability Index is a value that is no greater than 100. Its values correspond to reading levels (based on educational levels). The levels are as follows:


91 - 100 5th grade level

81 - 90 6th grade level

71 - 80 7th grade level

66 - 70 8th grade level

61 - 65 9th grade level

51 - 60 High school

31 - 50 College

0 - 30 College Graduate

Less than 0 Law School Graduate
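
The table above can be turned into a small lookup function. The following is only a sketch: the function name readingLevel is hypothetical, and it assumes that a fractional index falling between two listed ranges (for example 90.5) belongs to the higher band, which the table itself does not specify.

```cpp
#include <string>

// Hypothetical helper (not part of the assignment): map a Flesch index
// to the reading level from the table above.
// Assumption: values between two listed ranges (e.g. 90.5) go to the
// higher band, so each test is "strictly greater than the band below".
std::string readingLevel(double index) {
    if (index > 90) return "5th grade level";
    if (index > 80) return "6th grade level";
    if (index > 70) return "7th grade level";
    if (index > 65) return "8th grade level";
    if (index > 60) return "9th grade level";
    if (index > 50) return "High school";
    if (index > 30) return "College";
    if (index >= 0) return "College Graduate";
    return "Law School Graduate";
}
```

With this sketch, an index of 59.8592 maps to "High school" and an index of -6 maps to "Law School Graduate".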


The program should prompt the user for a file name. If the file can be opened, the program should process it; if it cannot, the program should issue a warning. Your programs should always check for this and other kinds of errors. Remember that users are not always the sharpest knives in the drawer.


Computing this index requires these four components:

  1. Compute the number of words in the file. A word is a sequence of characters delimited by one or more spaces. This is not a spell checker, so a word does not have to be an actual word. (Hint: read the file one string at a time to cut down on the parsing needed internally.)

  2. Count all of the syllables in each word. This can be difficult, but for this program we are going to follow these simple rules:

     1. A syllable is a group of adjacent vowels (a, e, i, o, u) found in a word. For example, the words 'coin' and 'fiat' each have 1 syllable, since each contains one group of vowels ('oi' and 'ia'). However, the word 'scapegoat' has 3 groups of vowels ('a', 'e', and 'oa'), and so it has 3 syllables.

     2. If the word ends in an 'e', then that 'e' does not count as a syllable.

     3. Every word has at least one syllable.

  3. Count all of the sentences in the document. Sentences can be ended with a period, colon, semicolon, question mark, or exclamation mark.

  4. The index is then computed by the following formula:

Index = 206.835 - 84.6 * (# of syllables/# of words) - 1.015*(# of words/# of sentences)


You are also required to keep track of how many times a word occurs. Case does not matter, and so the words "The" and "the" should be counted as the same word occurring twice. The only punctuation that is allowed in a word is the hyphen ('-'). See below for how this information is laid out in the output.


Some interesting indexes for different reading material are:

Comics 95

Consumer Ads 82

Sports Illustrated 65

Time 57

New York Times 39

Insurance Policy 10

IRS rules -6


Example Output:


Please enter the name of a file (enter quit to exit): test3.dat

The # of words 170

List of words and occurrences:

Word #

a 8

boy 2

girl 3

the 7

...etc.

The # of syllables 252

The # of sentences 8

The Flesch index for test3.dat : 59.8592

Please enter the name of a file (enter quit to exit): test8.dat

The file, test8.dat could not be opened

Please enter the name of a file (enter quit to exit): quit


Getting Started:

  1. Create the subdirectory for this program

% cd

% cd cis122

% mkdir prog1

% cd prog1

  2. Create a makefile. Your source file's name should be flesch.cc

Put the following statements in a file called `makefile':


all: flesch


flesch: flesch.cc

[tab] g++ flesch.cc -o flesch


The [tab] marker just means you should press the tab key there.

  3. Design your program in a modular way using the guidelines we have given you.

  4. Write, compile, and test your functions.

  5. Run various test cases. You may use articles from the paper, the web, your own text, etc. as test cases.



Submitting the assignment:

The assignment is due on Thursday, Feb 3rd, by 8 AM. Use the turnin program:

% cd

% cd cis122/prog1

% turnin cis122a prog1

NOTE: remember to be in your prog1 directory when you run the turnin program.