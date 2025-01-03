Grok, the AI chatbot that’s built into X.com, has quietly added the ability to analyze images. I’ve been testing it, and it does seem to do a pretty good job, until you reach the usage limit on a free account, which is set to a pretty low three uploads at the time of writing.

To use Grok’s new image analysis features on mobile, load up the X app, tap the Grok tab at the bottom of the screen (a square with a line through it), then tap the + button to upload an image. In a browser, go to X.com and click on Grok in the left-hand menu, then use the paperclip button to attach an image. Once it’s uploaded, you can ask Grok questions about it.

Analyzing images

To start things off, I uploaded a cartoon drawing of Odysseus, a king from Greek mythology who featured in Homer’s Odyssey (I’d just watched The Return, so bear with me), to see if Grok could recognize him. Grok did a very good job of recognizing that it was a historical figure from the style of the cartoon, and I could even get it to generate more images of a similar nature by just typing prompts like “redo the image but make it of a cartoon woman instead”.

Being able to analyze the content of an image so that it can reproduce it with changes is a useful ability, but it’s something that rivals like ChatGPT can do equally well. But what about understanding text in images?

Grok can generate images as well as analyzing them (Image credit: X)

Analyzing text in images

I uploaded an image of a flyer for a local fitness class and asked Grok to tell me what text it had found in the image. It extracted all the text perfectly, and even provided clickable links to the web addresses it found. It didn’t provide a link for an Instagram account name, though ChatGPT didn’t do that either when I tested it.

Being able to extract text from an image is one thing, but Grok needs to be able to analyze that text too. To test this, I uploaded a timetable for my local martial arts gym and asked whether there was a BJJ class on Thursdays I could go to. It replied with the perfect answer: “Yes, there is a BJJ class on Thursday at 7:00 AM (BJJ Gi for Adults & Teens) and at 8:00 PM (BJJ No Gi for Adults & Teens).” A feature like that could be genuinely useful for people who have trouble processing visual information.

To take Grok’s image analysis even further, I tried to upload an academic text as a PDF to see what it made of that, but it turns out that PDF upload isn’t available on Grok unless you upgrade to Premium. Unperturbed, I took a screenshot of the first page of the document and asked Grok to summarize the text. Again it did an exemplary job, breaking its answer down into subheadings like “Research findings”, “Scholarly contribution” and “Historical context”, whereas ChatGPT simply produced a couple of paragraphs of summary. It seems that Grok has the edge over ChatGPT here.

Grok vs ChatGPT

The biggest issue with Grok currently is that you very quickly hit the free usage limit for uploading images, although, to be fair, you hit it fairly quickly on the free tier of ChatGPT too. Three uploads a day isn’t much to go on. Aside from that, Grok is impressively good at analyzing images, even beating ChatGPT in some areas, and well worth investigating if the feature sounds like it would be useful to you.