What can AI (Artificial Intelligence) do for you? Apparently, it can produce a decent description of charts, graphs and infographics. I’m generally fairly skeptical about AI applications, but some tools, such as the ASU Image Accessibility Creator tool, can truly save time for someone needing a description.
But as with all technology, you need to understand the limitations. Here are some of my observations about AI image description that could be useful.
OCR (Optical Character Recognition) Success!!
Believe it or not, one of the most tedious chores of image description is transcribing text. Sighted people can read text embedded in a photograph or graphic, but screen readers cannot.
Optical Character Recognition (OCR) tools have been developed to extract text, but traditional tools still have drawbacks including interpreting reading order of complex layouts such as in infographics, org charts or bar charts.
AI Results
Many of the new AI tools do a fantastic job just with transcribing text. In comparison with more traditional OCR tools, they seem to do a better job of interpreting text, even in very small fonts, and understanding where the pieces go.
For English text, I haven’t found any major issues, even with less than stellar image quality. But I haven’t tested any phonetics or non-English yet.
See an example below of an infographic with a tricky layout about grand juries vs. trial juries.
Note: I asked the AI to not only transcribe the image, but to provide some of the output in an accessible HTML file. Although I had to do some tweaks, the process took about five minutes to complete. The original AI description included icon descriptions, but editors can determine if they are decorative (and can be omitted) or not.
Infographic: Grand Jury v. Trial Jury in the Federal Court System.

View Description with Table
Description: A comparison chart of Grand Jury vs Trial Jury in the Federal Court System, including jury size, question addressed, what happens, who presents evidence, and duration of service, with icons and descriptions for each aspect. See details below.
Caption: This is a teaching tool with information available at uscourts.gov Educational Resources. Source: <uscourts.gov> Educational Resources.
The image is an infographic titled “Grand Jury v. Trial Jury in the Federal Court System Simplified.” It is divided into two vertical sections: “Grand Jury” on the left and “Trial Jury” on the right. A series of horizontal comparisons are made between the two types of juries, highlighting differences in jury size, the questions they address, the process of presenting evidence, the parties involved, and the duration of service.
- The top section includes the title, with an illustration of people and the word “Simplified.”
- Below the title, a brief text indicates that the infographic is a teaching tool with more information available at “uscourts.gov” in the Educational Resources section.
- The headings “Grand Jury” and “Trial Jury” have arrow-like borders pointing inward to the vertical line dividing the sections.
- “Grand Jury” points include 16-23 jurors, deciding if there’s enough evidence to bring charges, government presenting evidence in closed hearings, U.S. attorneys (prosecutors) presenting evidence, and the duration of service being one day a month for several months.
- “Trial Jury” points include 6-12 jurors, finding guilt or liability in criminal or civil cases, evidence presented in open court by both sides (prosecution and defense), attorneys from both sides presenting evidence, and the duration of service being one trial.
- The visual line connecting these points includes icons like a question mark, stack of papers, and microphone, and ends with a calendar icon.
- At the bottom, the source is mentioned as “<uscourts.gov> Educational Resources.”
Alt-Text:
Infographic comparing Grand Jury and Trial Jury in the federal court system, covering jury size, roles, evidence presentation, involved parties, and service duration.
Editor’s Note: A side by side comparison chart has been converted to the table below.
Aspect | Grand Jury | Trial Jury |
---|---|---|
Jury Size | 16-23 Jurors | 6-12 Jurors |
Question | Decide if the Prosecutor Has Enough Evidence to Bring Charges | Find "Guilty/Not Guilty" (Criminal Cases) or Find "Liable/Not Liable" (Civil Cases) |
What Happens | The Government Presents Evidence in Closed Hearing | Prosecution and Defense Present Evidence in Open Court |
Who Presents Evidence | U.S. Attorneys (Prosecutors) | Attorneys for Both Sides: Prosecution and Defense |
Duration of Service | One Day a Month for Several Months | One Trial |
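For anyone who wants to produce the accessible HTML version of a comparison table by hand, here is a minimal Python sketch. The function name `accessible_table` and the shortened caption are my own illustrative choices, not output from the AI tool; the idea is simply to mark column and row headers with `scope` so screen readers can announce them with each cell.

```python
from html import escape


def accessible_table(caption, headers, rows):
    """Build an HTML table with a caption, column headers, and row headers.

    Hypothetical helper for illustration; the first cell of each row is
    treated as that row's header.
    """
    lines = ["<table>", f"  <caption>{escape(caption)}</caption>", "  <thead>", "    <tr>"]
    for header in headers:
        lines.append(f'      <th scope="col">{escape(header)}</th>')
    lines += ["    </tr>", "  </thead>", "  <tbody>"]
    for first, *rest in rows:
        lines.append("    <tr>")
        lines.append(f'      <th scope="row">{escape(first)}</th>')
        for cell in rest:
            lines.append(f"      <td>{escape(cell)}</td>")
        lines.append("    </tr>")
    lines += ["  </tbody>", "</table>"]
    return "\n".join(lines)


# Two rows taken from the comparison table above.
headers = ["Aspect", "Grand Jury", "Trial Jury"]
rows = [
    ("Jury Size", "16-23 Jurors", "6-12 Jurors"),
    ("Duration of Service", "One Day a Month for Several Months", "One Trial"),
]
html_table = accessible_table("Grand Jury v. Trial Jury", headers, rows)
print(html_table)
```

The remaining rows can be added to `rows` in the same way; `escape` takes care of the quotation marks in cells such as "Guilty/Not Guilty".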
Technical Diagrams – Pretty Darn Good
Another common task in academic image description is describing technical diagrams such as bar charts, line charts and maps. So long as your charts aren’t too exotic, AI does a good job providing a description, including transcribing the text of titles, keys, legends, axis labels and other snippets. AI will also describe graphic elements, icons and colors, all of which are helpful when interpreting data.
Missing Data Points
One gap I have noticed is that AI doesn’t always estimate data values. One description of a Harrisburg climate chart mentioned a temperature peak in July, but not an approximate value. Data estimation based on visuals alone can be tricky though. It’s always good to have the data sourced as a table or some sort of spreadsheet/data file.
Harrisburg Climate Chart

AI Description
Image Description: The image is a combined bar and line graph representing the Monthly Climate Normals for the Middletown-Harrisburg Area, PA, from 1991 to 2020. The vertical axis on the left is labeled "Precipitation (inches)" and scales from 0 to 6 in increments of 0.6. The vertical axis on the right is labeled "Temperature (°F)" and scales from 15 to 90 in increments of 15. The horizontal axis lists months from January to December.
Green vertical bars illustrate total precipitation normal for each month, showing that the precipitation peaks around July [at about 4.5 inches]. There are three lines indicating different temperature measures: the blue line represents the mean minimum temperature normal, the brown line the mean average temperature normal, and the red line the mean maximum temperature normal. All temperature lines show a bell-shaped curve, with the higher temperatures in the middle of the year and lower temperatures at the beginning and end of the year. The min and max temps are listed below:
The lines display a clear seasonal pattern, with the mean max temperature (red) peaking in the middle of the year during the summer months, the mean avg temperature (brown) following closely, and the mean min temperature (blue) being consistently below the other two lines. The graph also contains a note "Click and drag to zoom to a shorter time interval," suggesting interactive functionality.
Alt-Text: "Graph of Monthly Climate Normals for Middletown-Harrisburg Area, PA, displaying precipitation as green bars and temperature with a blue, brown, and red line for mean min, avg, and max temperature normals respectively, showing seasonal variations."
Text in the image:
- Monthly Climate Normals (1991-2020) – Middletown-Harrisburg Area, PA (ThreadEx)
- Click and drag to zoom to a shorter time interval
- Total Precipitation Normal
- Mean Max Temperature Normal
- Mean Min Temperature Normal
- Mean Avg Temperature Normal
- Powered by ACIS
With Added Data Points
View Data Table
Measure | Jan | April | July | Oct |
---|---|---|---|---|
Total Precipitation (Inches) | 3.0 | < 3.6 | < 4.8 | > 3.8 |
Mean Max Temperature (°F) | 43 | 66 | 86 | 66 |
Mean Avg Temperature (°F) | 36 | 56 | 78 | 54 |
Mean Min Temperature (°F) | 18 | 42 | 66 | 45 |
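When the underlying data is available, the approximate values that AI left out can be added to a description with very little effort. Below is a rough Python sketch, assuming the values have already been typed in from the table above; the function name `summarize` and the sentence wording are hypothetical choices of mine.

```python
def summarize(label, unit, values):
    """Return one sentence naming the highest and lowest months with values."""
    peak = max(values, key=values.get)  # month with the largest value
    low = min(values, key=values.get)   # month with the smallest value
    return (
        f"{label} peaks in {peak} at about {values[peak]} {unit} "
        f"and is lowest in {low} at about {values[low]} {unit}."
    )


# Mean max temperature row from the data table above.
mean_max = {"Jan": 43, "April": 66, "July": 86, "Oct": 66}
sentence = summarize("Mean max temperature", "°F", mean_max)
print(sentence)
```

A sentence like this can be pasted into the AI description where it only says "peaks in the middle of the year." An editor should still check it against the chart.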
SME Matters
I would recommend having an SME (subject matter expert) review anything very unusual. I fed AI a syntax chart from a paper of mine, and the results were good, but what took AI about 3-4 paragraphs to describe could be condensed by the SME to 1-2 sentences depending on the context. Below is another syntax tree which AI described well, but I did tweak the descriptions.

AI and SME Description
Open AI Description
Image Description: The image depicts a syntactic tree diagram used in linguistics to illustrate the structure of a sentence. The tree is composed of green text and lines, representing different components of a sentence. At the top of the tree is the letter “S,” indicating the sentence level. Below “S,” the tree branches into “NP” (Noun Phrase) and “VP” (Verb Phrase). Further, the “VP” splits into “V” (Verb) and another “NP,” which is marked as containing a “Det” (Determiner) and “N” (Noun).
Alt-text: Syntactic tree diagram of a sentence [John hit the ball].
Text in Image:
- S
- NP
- VP
- V
- NP
- Det
- N
Open SME Description
A syntax tree for the sentence (S) “John hit the ball.” The top S node branches into a noun phrase (NP) ‘John’ and verb phrase (VP) ‘hit the ball’. The VP (corresponding to a traditional predicate) branches further into V ‘hit’ and NP ‘the ball.’ The last NP ‘the ball’ branches into a determiner (D) ‘the’ and noun (N) ‘ball.’
Non-Visual Representations
In an ideal world, graphics could be used everywhere, but even in mainstream academia, there are technology limits to universal use of graphics. Fortunately, there are often text-based alternatives that can be used. Here’s a linearized representation of the tree above that could be used in some cases.
Linearized Tree
Sentence: John hit the ball.
[S [NP John] [VP hit [NP [D the ] [N ball.]]]]
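Bracket notation like this can also be generated from a tree stored as data. The following is a small Python sketch, assuming the tree is written as nested tuples whose first item is the node label; that tuple layout and the function name `linearize` are my own assumptions for illustration, not a standard format.

```python
def linearize(node):
    """Turn a nested-tuple tree into labeled bracket notation."""
    if isinstance(node, str):
        return node  # a leaf is just the word itself
    label, *children = node
    parts = [label] + [linearize(child) for child in children]
    return "[" + " ".join(parts) + "]"


# Same structure as the linearized tree above (the verb sits directly under VP).
tree = ("S", ("NP", "John"), ("VP", "hit", ("NP", ("D", "the"), ("N", "ball"))))
bracketed = linearize(tree)
print(bracketed)  # [S [NP John] [VP hit [NP [D the] [N ball]]]]
```

To show the V node that appears in the diagram, replace `"hit"` with `("V", "hit")` in the tuple.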
Repeated Graphics
Editing descriptions is also very important for repeated graphics, where only one element in a particular graphic may really matter because it represents a change or a point of focus. Once a person has become familiar with the format, it’s more efficient to focus on these elements first and skip what is already known.
Many people not directly involved in accessibility wonder why SME input is important at the beginning. Reviewing image descriptions in an academic context from both perspectives has made me realize that bridging the gap early makes accessibility more efficient for everyone.
Weaknesses
Facial Recognition
Do you recognize at least one woman in the photo below? It’s a photo of First Lady Barbara Bush and Princess Diana taken in the White House. Many people are able to recognize celebrities, but the AI tools I have used are not able to do so. Instead they describe women sitting on a sofa and their outfits.

AI Caption: Two women seated on a yellow sofa, one with short white hair in a blue outfit, and the other with short blonde hair in a pink suit, looking at a book.
Google Reverse Image Search
If you are in a situation where you cannot identify people or objects, I do recommend Google Reverse Image search or something similar to find other "matching" photos.
Math Equations
Although the ASU tool can theoretically generate MathML, its OCR engine has not produced optimal results. I would still recommend Equatio or MathPix for equation OCR.
Note: I want to add that this functionality is slightly beyond what is intended. Image description can mean many things depending on the context.

Original Equation (MathML)
According to
ASU LaTeX Raw (Spaces Removed)
\sqrt[4]{\sqrt[5]{\sqrt[3]{\sqrt[17]{240}}}}=
ASU MathML
Note: In my experience, more advanced math students, even those using screen readers, usually avoid "spoken" math. It’s easier to process math symbols, LaTeX or MathML/equation editor outputs.
Art Objects
This was an interesting area. For some fine arts, AI was able to make useful descriptions of color and composition, but often missed details that an art historian would probably comment on.
For example, in a piece of embroidery, AI might be able to identify that there was embroidery, but miss pretty key points like the presence of cross stitch elements or which animals and flowers were being depicted. Both of these are important for identifying artistic genres, periods or locations of art.
Closeups
Adding a closeup of an embroidered piece provides a different perspective for AI. In the closeup, AI did pick up the cross stitch, but had problems identifying the Runes as Runes – it did much better with the full picture of all the runes. Context does matter a lot.

AI: Image Description: The image shows a piece of white fabric with a cross-stitch design. The design features two rows of thick, intertwined red and blue lines at the top and bottom. In between these rows are two sets of letters. The upper set consists of large, blue letters spelled out in an unusual, varied typeface. Below these, there is a series of smaller, red letters in a simpler typeface. The fabric has a visible weave, typical of cross-stitch materials.
Text: Upper set: “MMROAN” (actually Runic Letters) Lower set: “EMLNGOD”
Final Warning: Proof Your Output
As wonderful as the AI tools are, you need to proof the output.
Hallucinations
AI can "hallucinate," meaning it may make an incorrect assumption, and using that assumption, make further wildly incorrect statements.
I asked it to describe a modern Runic embroidered sampler (below) and it identified one motif as a "Viking ship" and mentioned it had four rows of oars. It also said the piece was a bookmark (it’s really six inches wide).

In reality, the Viking ship was a modified version of an abstract animal with a head similar to a ship’s head motif, but with actual legs and hind quarters. Not a bad miss in terms of shape, but very inaccurate conceptually.
As IBM notes in the link, hallucinations may be the result of incomplete exposure to training data in a given subject domain. Because academic research may be filling in current knowledge gaps, there may always be a slight lag in what AI can do with academic content.
Missed Details
A description could be very close to correct, but miss a crucial detail. In one case of a PowerPoint screenshot with a utility window open, the main screen was described, but not the window on the right. As with other OCR tools, you could upload just a cropped section and redo the scan to receive a transcript. Or you could tweak the prompt.

The AI tool described the slide content, but not the screen order elements on the right. Reloading just the missing element will usually result in a usable description.
AI Description
Image Description:
The image presents a question on whether the flag of Pennsylvania should be redesigned, showing three different flag designs.
- To the left is the "Official (1799)" flag, featuring a dark blue background with a detailed crest in the center that includes a bald eagle, horses, and a shield with a sailing ship, plow, and three sheaves of wheat.
- In the middle is a modern alternative design titled "Keystone + State," which displays a simplified solid blue field with a central orange keystone shape containing three white stars above and four below.
- To the right is another alternative titled "Keystone Stripes," depicting a horizontal tri-color flag with blue, white, and gold stripes, and a centered keystone emblem.
- The image source credits are at the bottom with URLs and the names "Jeffcook2756" and "Marmoceet". Images from vexillology.fandom.com (Pennsylvania) Licensed by Creative Commons.
Summary
Image description is one case where AI is coming into its own. But as with automatic speech recognition and other applications, AI output needs some auditing to be 100% accurate.