The Uncanny Valley of Factual Correctness: How I Wrote an Article With ChatGPT

28 December, 2022
by Doris Day

We already compared several AI copywriting tools in October this year. However, a month later, another AI algorithm drove the internet insane. We’re talking about ChatGPT — the recent release from OpenAI, the creators of the (in)famous DALL·E. Why do people on Twitter ask an AI chatbot to write parody songs? Will this algorithm replace professional copywriters? Let’s find out. We tried writing an article for our blog with ChatGPT — here’s what we ended up with.

How do neural networks learn?

Imagine three composers — Kate, Pete, and Jane. 

Kate learned the basics of music theory at school, read a lot of books about jazz harmony, and got a college degree in music composition. 

Pete has never had any formal training — he just listened to a lot of music in different genres and played his favorite pieces by ear. 

Jane had some formal training in childhood. But she dropped out of college and made friends with a professional composer who became her mentor. Jane regularly brought her songs to him for feedback and in three years, she was able to write an entire concept album.

All three of them, however, ended up becoming critically acclaimed musicians despite their different learning methods. Kate, Pete, and Jane illustrate three different types of machine learning:

  • Supervised learning — a human AI specialist “explains” the basic data patterns to an algorithm in the form of a labeled dataset. Think of Kate who learned music theory from books and teachers.
  • Unsupervised learning — a human AI specialist gives raw data to an algorithm; the algorithm “explores” this data, finds patterns in it, and “learns” them for its own output. Think of Pete who learned the most common chord progressions in jazz just by listening to hours of music.
  • Reinforcement learning — a human AI specialist starts by labeling data and feeding it to the algorithm. Then, after the algorithm becomes “smart” enough to produce output, human “coaches” give it feedback, and the output quality improves over time. Think of Jane who learned how to write original music with a mentor. 

In our AI copywriting tools review we mentioned the Generative Pre-trained Transformer 3 (GPT-3). It’s the most popular unsupervised algorithm that generates text using the knowledge it “gained” from the internet. GPT-3 is used in Copy AI, Jasper, and other AI-powered tools for content marketing. But our hero of the day is built differently.

What is ChatGPT?

In November 2022, OpenAI released their new algorithm called ChatGPT — a neural network that interacts in a conversational manner. According to Harvard Business Review, this algorithm is a “tipping point” for AI because:

  • It opens opportunities for human-machine cooperation since it’s feedback-based.
  • Hiring AI for creative jobs like writing marketing copy is less dangerous than using it for driving cars because the stakes are lower.
  • It generates text much faster than a human and can write solid pieces of code.

What does this new toy do? ChatGPT can write code for automated QA and product reviews, talk to you about astrophysics, and even compose music — here’s a test I did for funsies:

Prompt: Write a jazzy chord progression in B minor.
Output: i – bVII – bVI – IV – i – V – ii 

(Bm – Am7b5 – Gm7 – Em7 – Bm – C#°7 – Dm)

As a person with formal training in music, I can say it’s not a good chord progression. Jazz is not as chaotic as people think — it has complex rules. However, ChatGPT thinks that “jazzy” means “random minor chords with no rhyme or reason”. 

But its knowledge of harmony is not what I’m interested in. The algorithm was designed primarily for conversation — and it’s not the unsupervised GPT-3 we mentioned earlier. With ChatGPT, the learning process consisted of three stages:

  • Supervised learning — human AI specialists give prompts with samples of desirable human-written outputs that look like a written conversation between two users.
  • Training a reward model — human AI specialists give prompts, receive several outputs, rank them from best to worst, and feed the ranking data back to the algorithm.
  • Optimization — the neural network receives a new prompt that wasn’t in the data set before, then the reward model predicts how the output will be rated by users, and updates the generative algorithm to do better next time.
ChatGPT learning stages
Source: OpenAI
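The reward-model stage in the middle turns human rankings into training examples. As a toy illustration (my own sketch, not OpenAI’s actual pipeline), one ranking of k outputs can be unrolled into k·(k−1)/2 pairwise preferences, which is the form a reward model typically consumes:

```python
from itertools import combinations

def ranking_to_pairs(outputs_ranked_best_to_worst):
    """Turn one human ranking into (preferred, rejected) training pairs.

    combinations() preserves input order, so in every emitted pair
    the first element was ranked higher by the labeler.
    """
    return [(better, worse)
            for better, worse in combinations(outputs_ranked_best_to_worst, 2)]

pairs = ranking_to_pairs(["output A", "output B", "output C"])
# A ranking of 3 outputs produces 3 preference pairs.
```

This is why ranking is an efficient labeling interface: one pass over k outputs yields quadratically many comparisons for the reward model to learn from.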

ChatGPT is basically GPT-3 enhanced with human feedback. This learning method is supposed to improve the output compared to GPT-3-based AI tools designed specifically for copywriting. Does it mean ChatGPT should do better? Let’s find out.

The method

At the preparation stage, my main goal was to address three major issues:

  • Choosing the right topic — one that is easy to evaluate yet complicated enough to test the algorithm at its full power.
  • Adapting the writing process to the limitations of ChatGPT’s conversational mechanics — since it’s a chatbot, I should “hold a conversation” with it.
  • Coming up with evaluation criteria — preferably quantitative ones, so I could objectively estimate the algorithm’s “copywriting skills”.

Let’s take a closer look at how I solved these problems.

Choosing the topic

Choosing the topic for the article was probably the toughest part of the process. I started by browsing our list of SEO briefs and discussing several variants. I had the following choice criteria:

  • Broadness — the topic shouldn’t be too specific because ChatGPT can’t describe pictures, find email examples, or refer to other articles.
  • Simplicity — the topic should be easy enough to fact check without experts.
  • Complexity — although it sounds contradictory to the point above, the topic should be difficult enough to test the power of the algorithm.

We ended up choosing the SEO brief about B2C email marketing.

Developing the writing method

ChatGPT has two important limitations: its output can’t be longer than 1,000 characters at a time and it’s a conversational model. To me, this means two things:

  • We can’t write a full article in one go.
  • We need to adapt the workflow to the conversation mechanics.

Our SEO briefs include the baseline article structure — here’s what it looks like for our chosen topic:

SEO brief article structure example

The writing method follows this structure, giving prompts for each H2 and H3. I used three types of prompts:

  • Questions — for example, “What is B2C email marketing?” Specification: field (B2C email marketing) for ambiguous terms.
  • Writing instructions — for example, “Write about the differences between B2B and B2C marketing.” Specification: volume (three paragraphs).
  • Coherence prompts — for example, “Write an introduction for a paragraph about B2C email marketing types.” Specification: volume (no more than three sentences).

I added “coherence prompts” to keep the flow of the article at least somewhat natural since we’re writing it bit by bit. I also applied a range of specifications to these prompts — like volume and field. I needed the “field” one because reminder emails have applications beyond marketing.

Some specifications I considered implementing didn’t work though — for example, those concerning the writing style somehow made the output worse.

No specification:

The first difference between B2B and B2C email marketing is the target audience. B2B email marketing is focused on other businesses, as opposed to individual consumers. These emails typically contain detailed information about products or services that would be of benefit to a business. This type of email also emphasizes informing its potential customers with topics such as case studies, industry news and current trends – all of which help businesses make business decisions.

The second difference between B2B and B2C email marketing is the content focus. B2B emails focus more on product benefits than on short-term sales, while B2C emails are more likely to speak directly towards discounts and sales opportunities. Along with this, the call-to-actions in each type of email can vary drastically. For example, a B2B email might use a “learn more” call-to-action, while a B2C may use “purchase now” instead.

In the style of a corporate blog:

Email marketing is a valuable way for businesses to reach new markets and cultivate relationships with customers old and new. When it comes to B2B (business-to-business) and B2C (business-to-consumer) email marketing, however, there are several differences in terms of objectives, strategies, and content.

The objective of B2B email marketing is usually to nurture sales leads and establish credibility with potential customers. Companies are apt to send emails that provide useful information, like case studies or research articles which can help decision makers within the organization make an informed purchase decision. In contrast, B2C emails typically have a shorter turnaround goal in mind; retailers often aim to immediately encourage purchasing behavior at the lowest possible cost. As such, emails may contain promotional offers such as discounts or free shipping codes.

The prompt was “Write three paragraphs about the difference between B2B and B2C email marketing”. The no-specification output was surprisingly more on topic and concise — that’s why I ditched the style specification during the initial testing stage.
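Since every prompt is derived mechanically from the brief’s headings, the whole prompt sequence can be scripted. Here’s a quick sketch of that idea; the outline, wording, and function name are my own inventions for illustration:

```python
def brief_to_prompts(outline, field="B2C email marketing"):
    """Expand an H2/H3 outline into the prompt types described above:
    a coherence prompt plus a writing instruction per heading
    (questions are reserved for ambiguous terms)."""
    prompts = []
    for heading in outline:
        # Coherence prompt with a volume specification
        prompts.append(
            f"Write an introduction for a paragraph about {heading} "
            f"in {field}. No more than three sentences."
        )
        # Writing instruction with a volume specification
        prompts.append(f"Write three paragraphs about {heading} in {field}.")
    return prompts

prompts = brief_to_prompts(["email types", "list building"])
# Yields two prompts per heading, fed to the chat one by one.
```

Each generated prompt would then be pasted into the chat in order, mirroring the bit-by-bit workflow described above.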

Developing evaluation criteria

To make this research at least resemble an actual scientific study, I needed to keep the output evaluation criteria as measurable as possible. Here’s what I came up with:

  • Style: the Flesch index readability checker on Copywritely and the Hemingway editor for more nuanced results.
  • Uniqueness: using the Copywritely plagiarism checker.
  • Grammar & Spelling: the Copywritely and Google Docs spell checkers.
  • Coherence: there’s no way to measure this characteristic in numbers, so the commentary on it will be mostly subjective. 
  • Factuality & Topicality: manual fact checking and proofreading.

I failed to exclude subjectivity and qualitative characteristics completely. But the system is still good enough to evaluate AI’s writing skills in a sufficiently tangible way.
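For the curious: the Flesch reading ease score that checkers like Copywritely report comes from a simple formula over sentence and word lengths. Here is a rough, self-contained sketch; the syllable counter is a crude vowel-group heuristic, so its scores will only approximate a real checker’s:

```python
import re

def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch formula: higher scores mean easier text.
    A score around 30 reads as "very hard", 60-70 as plain English."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

score = flesch_reading_ease("The cat sat on the mat. It was happy.")
# Short sentences and short words push the score well above 90.
```

Long sentences stuffed with multisyllabic words, exactly what ChatGPT produced, drag this score down fast, which matches the 30/100 result below.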

The results

I generated an article using the aforementioned writing method — and ended up with a long read of 4,454 words. Here’s how ChatGPT did the assignment according to the evaluation criteria.

Style

Even during the writing process, the first thing I noticed was ChatGPT’s unnecessary verbosity. Take a look at this example:

Bad AI writing example

Not only is this a monster of a sentence — the entire paragraph is full of excessive repetitions. A human copywriter would probably write it like this:

An email that displays correctly on all devices will be seen by a wider audience. That’s why responsive email design is important — it gives you a wider outreach. Responsive emails also save time and money because you don’t have to design separate versions for desktop and mobile.

The AI paragraph had 116 words — I compressed it into 47 while conveying the same idea. Automated checkers confirmed my suspicions. Copywritely calculated the Flesch readability index as 30/100, which corresponds to “Very hard to read”. Hemingway has a different grading scale — its final verdict was grade 12, which is “OK. Aim for 9”. Here’s what the article looked like in this editor:

AI text checked in the Hemingway app

On the bright side, ChatGPT knows that too much passive voice is bad. The result is still poor though — more than half of the sentences in the article fall into the “hard to read” or “very hard to read” categories. 

The only Hemingway metric that is not 100% trustworthy is phrases with simpler alternatives (highlighted in purple). After looking through the entire article, I found that Hemingway was mostly right but still yielded a bunch of false positives. For example, the app had a problem with the word “purchase” and suggested replacing it with “buy” or “sale” in a context where “purchase” was clearly a noun. But even taking false positives into account, ChatGPT uses a lot of complex words — like a school student who writes an essay and desperately tries to sound smarter than they are.

The final rating: ⭐⭐

The final commentary: Even if we wanted to publish this article, we wouldn’t publish raw AI output — it requires deep editing.

Uniqueness

I added this criterion for two reasons. Firstly, a blog filled with cheap copypasted articles doesn’t fulfill its purpose. Companies need blogs not just for brand reputation but also for SEO. And search engines don’t like plagiarized content — you won’t even get to the bottom of the first page on Google. That’s why plagiarism checks apply to human copywriters too.

The second reason is a bit more interesting. Sometimes algorithms become a little too well-taught — it’s called overfitting. Overfitting is when a neural network memorizes the training data instead of generalizing it and searching for patterns. If it happens to generative neural networks like ChatGPT, they end up copypasting output from training sets instead of generating original content. If we put it into perspective, the overfitting problem might be one of the biggest obstacles when it comes to replacing human writers with AI.
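As a side note, plagiarism checks are conceptually simple: checkers essentially measure how many of a text’s word sequences appear in a reference corpus. Here’s a toy originality estimate based on trigram overlap; it’s a heavy simplification of what Copywritely actually does, and the sample strings are made up:

```python
def ngrams(text, n=3):
    """Set of word n-grams (as tuples) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def originality(text, reference, n=3):
    """Share of the text's word trigrams NOT found in the reference."""
    own, seen = ngrams(text, n), ngrams(reference, n)
    if not own:
        return 1.0
    return 1 - len(own & seen) / len(own)

score = originality("email marketing is an effective way to reach customers",
                    "email marketing is an effective way to grow your sales")
# The shared opening phrase drags the originality score down.
```

An overfitted generative model would score near zero against its own training set, because it reproduces memorized sequences instead of recombining patterns.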

Copywritely’s originality index for our AI article was 76%, which corresponds to “normal”. However, our blog style guide requires 80% and higher. So, it might be “normal” for Copywritely, but it doesn’t meet our requirements for human writers. 

The final rating: ⭐⭐⭐

The final commentary: It doesn’t look like overfitting but if we wanted to publish that article, it would need some rewriting.

Grammar & Spelling

We already mentioned false positives in Hemingway — but the situation is way worse when it comes to automated grammar checks. Copywritely pointed out 44 grammar mistakes, which corresponds to “Bad”. However, according to the detailed review, it’s not as awful as it looks:

  • Dialect errors: for some reason, Copywritely didn’t like the word “Engagement” and suggested replacing it with “Appointment” several times, which is wrong.
  • Possible typo: basically, it was a long list of marketing terms like “CTAs”, “FOMO”, and “eCommerce”. There was one actual mistake though — “signup” instead of “sign-up”. 
  • Grammar: Copywritely highlighted two false positives — “a thank you email” and “the thank you email”. However, the latter counts as a mistake because of the incorrect article use. The Google Docs grammar checker also found two instances of missing prepositions like ”refer customers” instead of “refer to customers”. 
  • Punctuation: three instances of two consecutive dots and a typography recommendation to use em-dashes. 
  • Miscellaneous: one instance of excessive word repetition.
  • Inconsistency: during the manual proofreading, I found a lot of alternation between the “B2C” and “b2c” variants. It’s not a grave mistake but still not a desirable output.

Although ChatGPT made weird typography choices, in general, the result is not that bad.

The final rating: ⭐⭐⭐⭐

The final commentary: The article might need quick proofreading before publication — but ChatGPT didn’t make more mistakes than an average human writer with mild attention issues.

Coherence

From the very beginning of this “study”, I knew that coherence would be the biggest issue. The conversational mechanics of ChatGPT are the major obstacle — the algorithm writes each paragraph with no prior context. 

An introduction to the article written by ChatGPT

The main problem with this introduction is that, since it’s written out of context, it doesn’t really describe the article’s contents — which is exactly what an introduction is supposed to do. For example, there’s nothing about automation tools in the article — the brief doesn’t imply a detailed guide on how to use them. 

Another issue is that ChatGPT doesn’t really understand the concept of an introduction to a paragraph and treats them like introductions to whole articles. Here’s an example:

An introduction to a paragraph written by ChatGPT

The whole “In this paragraph, we will…” bit is excessive — no one writes like that. Here’s how I would edit AI’s output to make it sound more natural:

B2C email marketing is very versatile and includes many types of content. Let’s explore some of the most popular emails brands use today.

Now, let’s take a look at another introduction written by ChatGPT.

An introduction to a paragraph written by ChatGPT

This might be the worst one in the entire article. ChatGPT constantly repeats vague and obvious claims like “Email marketing is an effective way to reach potential customers” — and the algorithm did it here again. But the saddest part is that this introduction doesn’t lead to anything. Even the “In this paragraph” bits were better — they were unnatural, for sure, but at least those sentences created some kind of coherence. 

And this is what a conclusion should look like, according to AI:

A conclusion to the article written by ChatGPT

The problem is the same as with the introduction — it doesn’t have a lot to do with the article’s contents, and it repeats the same vague statements.

The final rating: ⭐ (because giving a zero is impossible)

The final commentary: It’s awful but it’s not the fault of the algorithm itself — it’s just the bot’s mechanics being unfit to perform tasks like this.

Factuality & Topicality

Honestly, AI making weird mistakes has become my favorite kind of humor this year. This is DALL·E’s version of the Wendy’s logo from Janelle Shane’s hilarious Twitter thread.

Wendy’s logo generated by Dalle
Source: Twitter

She commented on DALL·E’s output like this: “What I find interesting is how vaguely, recognizably correct they manage to be, while also being utterly wrong”. Which is funny, because ChatGPT’s attempts to educate us on B2C email marketing give the same impression. For example, let’s get back to the article’s conclusion I mentioned earlier.

In conclusion, B2C email marketing has proven to be an effective tool for businesses looking to reach their target audience. <…> Additionally, it provides marketers with powerful analytics about customer segmentation, subscriber behavior, and other key insights.

Does it though? Are analytics exclusive to email marketing? Do customer segmentation and subscriber behavior really count as insights? This sentence looks like a set of buzzwords with no substance behind it — and that was the main issue I recognized during fact checking. Take a look at another example — here’s how ChatGPT thinks you should set goals for a B2C email campaign:

  1. Define Your Target Audience: <…>
  2. Set S.M.A.R.T. Goals: <…>
  3. Increase Traffic to Your Website: <…>
  4. Boost Sales Conversions & Re-engagement: <…>
  5. Increase Brand Awareness: <…>

It’s not that increasing traffic, boosting sales, or raising brand awareness is incorrect or off-topic. The problem is, the paragraph is about how to identify campaign goals, not about possible variants of campaign goals. Meanwhile, ChatGPT treats these goal types like steps in a tutorial, which is obviously wrong.

Let’s take a look at the next paragraph, which is about list building.

  1. Create a B2C Sign Up Form: <…>
  2. Incorporate Routinely <…>
  3. Leverage Your Social Media Pages <…>
  4. Run Contests & Giveaways <…>
  5. Network Online <…>
  6. Segment Your List <…>

The H3 of this paragraph was “Build and segment your email list”. List building and segmentation are different entities that need separate explanations. What ChatGPT did was literally answer “Segment your list” to “How to build and segment an email list?”. Even worse, this paragraph contains an actual mistake: networking with people in your industry is a B2B tactic and has nothing to do with building lists for B2C campaigns.

Here’s one more example of poor AI instructions — this one is about launching an email campaign:

  1. Create Your Messaging <…>
  2. Identify Your Target Audience <…>
  3. Design Your Email Template <…>
  4. Put Together Your Contact List <…>
  5. Set Up A/B Testing Methods: <…>
  6. Send Out & Monitor Performance: <…>

Structure-wise, list building, email design, and writing were already covered in previous paragraphs. But the “best” part is ChatGPT’s definition of A/B testing:

A/B testing makes it easy to measure which of two versions is better at achieving the goal of any digital marketing activity such as an email campaign – visits, leads, registrations or purchases etc., by allowing users to control different elements in each version and compare results afterwards

It’s not really about facilitating measurement — A/B testing literally is the comparison of two emails. The “makes it easy to measure” bit would be correct if ChatGPT were talking about data analysis methods like Pearson’s chi-squared test. And the “users” bit is confusing to the point that I thought about emails that subscribers themselves can customize.
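For the record, the comparison ChatGPT fumbled here is usually a Pearson chi-squared test on the two variants’ conversion counts. A minimal sketch for a 2×2 table, using the standard closed-form statistic and a normal-CDF approximation for the p-value; the campaign numbers below are invented:

```python
from math import sqrt, erf

def chi_square_2x2(conv_a, total_a, conv_b, total_b):
    """Pearson chi-squared statistic for two conversion rates (2x2 table)."""
    a, b = conv_a, total_a - conv_a          # variant A: converted / not
    c, d = conv_b, total_b - conv_b          # variant B: converted / not
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) equals a two-sided normal tail:
    p_value = 2 * (1 - 0.5 * (1 + erf(sqrt(chi2) / sqrt(2))))
    return chi2, p_value

# 120/1000 opens for variant A vs 150/1000 for variant B: is the lift real?
chi2, p = chi_square_2x2(120, 1000, 150, 1000)
```

A p-value below the usual 0.05 threshold suggests the difference between the two variants is unlikely to be random noise, which is the actual “measurement” part of A/B testing.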

Here’s the final example — this one is about the sense of urgency:

Using a sense of urgency in B2C email marketing campaigns helps to motivate consumers to take action faster. It encourages them to purchase goods before the offer expires, thus moving them closer to becoming customers and increasing sales.

It’s not that this is blatantly incorrect. The problem is that none of it should have been said in the first place — it’s too obvious. Basically, the whole article consists of claims like this — vaguely correct, the right words in an unhinged order and context, overexplaining, repeating the same ideas, and so on. 

The final rating: ⭐⭐⭐

The final commentary: We’ve reached the uncanny valley of factual correctness.

If it’s that bad, why use ChatGPT for content marketing?

Yes, it’s bad — and the “bad” characteristic is not just about low ratings according to my evaluation system. The problem is, even if we just rewrite ChatGPT’s monstrosity with shorter sentences and correct its wrong statements, we won’t get a good article. Corporate blogs don’t exist for the sake of SEO alone — they exist for readers. An AI can’t add jokes, use metaphors to explain complex concepts, find pictures to illustrate articles, or provide interesting statistics to prove certain points. If we only edit the style and blatant mistakes, we’ll be left with a sterile text that is not informative enough and not interesting enough to read. Even if you use ChatGPT as a writing aid, it will be faster to write everything from scratch. So, yeah, I’ll still keep my job for now — until a much better AI algorithm comes out.

ChatGPT is not good for long-form content — but what if it’s good for something else? Despite the disastrous article, I don’t condemn ChatGPT completely. Here’s how you can actually use AI for writing:

  • Getting a general idea of a new topic. Imagine that you need to write an extensive guide about email subject lines from scratch — you don’t even have a brief and you’re completely new to email marketing. You open ChatGPT and ask it how to create a good subject line. You end up with a list like this:
  1. Keep it short and to the point 
  2. Focus on the benefit to your reader 
  3. Incorporate a deadline or sense of urgency 
  4. Personalize it with the recipient’s name 
  5. Mention a keyword related to the topic 
  6. Ask a question that sparks curiosity 
  7. Make a bold statement 
  8. Leverage phrases like “Warning” or “Attention” 
  9. Stress the exclusivity of the offer 
  10. Use emojis to add personality and visual appeal

Maybe not all points from this list are actually useful or correct — and you obviously won’t use all of them in the article. But ChatGPT gave you a general idea of an unfamiliar topic, so now you know what to look for and what exactly to cover in your copy. Thus, it’ll be much easier for you to start writing right away.

  • Brainstorming ideas for creative copywriting. Your brain is a biological neural network — which means that you too create stuff based on your previous experiences and the content you consumed. For many beginner content marketers, it might be hard to just come up with a catchphrase or an email subject. You can give yourself some input with ChatGPT. The algorithm will probably give you a bunch of bad content ideas — but it will act as a warm-up for your creative mind. And, after looking at 10 bad AI-written email subjects, you might come up with a good one.
  • Writing aid for short-form content. We already mentioned that ChatGPT’s conversational mechanics don’t allow it to create coherent long articles. So why not use it as a writing aid for social media posts? However, be cautious. AI lacks creativity in its output, which is important for SMM. And all the generated texts will require extra fact checking and editing. 

Interestingly, the algorithm seems to agree with me on the matter.

Prompt: What do you think, will AI ever fully replace human copywriters?
Output: No, AI will never fully replace human copywriters. While AI can certainly help with certain aspects of writing such as grammar, spelling and data entry, the creative process of copywriting requires a human touch. Ultimately, human copywriters are best equipped to craft compelling stories that capture the attention of readers and create an emotional response.
Article by
Doris Day
An experienced writer and editor with a degree in theoretical linguistics and a specialization in B2B/IT/SaaS marketing copy. I see my mission as an educator who explains complex phenomena using simple terms. My favorite show is "What We Do in the Shadows" and I usually spend my weekends somewhere in nature.