High-Quality Summaries with ChatGPT: Chain of Density (Prompt Included)
This is a groundbreaking method to generate increasingly dense summaries.
In today's information-saturated online world, condensing long passages of text into short, clear summaries matters more than ever. This is where ChatGPT (GPT-4) comes in. Among the many things it can do with a simple prompt, producing quick summaries is one of its strongest skills, helping users grasp the main ideas of long articles, reports, or data quickly and easily.
The Challenge of Summarization
Crafting the perfect summary is no easy feat. It involves selecting just the right amount of information—enough to be detailed and entity-centric, but not so much that it becomes dense and hard to follow. This delicate balance is what researchers from Salesforce AI, MIT, and Columbia University aimed to strike with their innovative "Chain of Density" (CoD) prompt.
Introducing the Chain of Density (CoD) Prompting
The CoD approach generates increasingly dense GPT-4 summaries. The process begins with a very simple, entity-sparse summary. From there, GPT-4 iteratively incorporates missing important information, all without increasing the summary's length.
How Chain of Density Works
The CoD method is iterative. It starts with a summary that covers just 1-3 initial key pieces of information (entities). At each subsequent step, 1-3 missing entities are identified in the source text and fused into the summary while the original length is maintained. This is achieved through a combination of abstraction, compression, and fusion: the aim is to convey more information within a fixed token budget while keeping the summary legible and accurate. There are five iterations in total, and all of them can be completed within a single prompt.
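The iterative loop described above can be sketched in Python. This is a minimal sketch, not the paper's reference implementation: `generate` stands in for any LLM call (e.g., a GPT-4 API request), and the per-step JSON interface shown in the docstring is an assumed simplification of the full CoD prompt.

```python
import json

def chain_of_density(article, generate, iterations=5):
    """Run the Chain of Density loop: each step asks the model to name
    1-3 missing entities and fuse them into a same-length summary.

    `generate` is any callable that takes a prompt string and returns a
    JSON string like {"Missing_Entities": "...", "Denser_Summary": "..."}
    (a hypothetical interface -- adapt it to your LLM client).
    """
    summary = ""
    steps = []
    for _ in range(iterations):
        prompt = (
            f"Article: {article}\n"
            f"Previous summary: {summary or '(none)'}\n"
            "Step 1. Identify 1-3 informative entities missing from the summary.\n"
            "Step 2. Rewrite the summary at identical length, fusing them in.\n"
            'Answer in JSON with keys "Missing_Entities" and "Denser_Summary".'
        )
        result = json.loads(generate(prompt))
        summary = result["Denser_Summary"]  # feed the denser summary forward
        steps.append(result)
    return steps
```

In practice, the paper's prompt performs all five iterations in a single model call; the explicit loop here simply makes the feed-forward structure of the method visible.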
Comparing CoD Summaries with Traditional GPT-4 Summaries
Summaries produced using the CoD method have distinct advantages. They are more abstractive, exhibit greater fusion, and have less lead bias than those generated by a vanilla GPT-4 prompt. A human preference study, conducted on 100 CNN/DailyMail articles, revealed that humans favoured GPT-4 summaries produced using the CoD method. These summaries were almost as dense as human-written ones and made more sense than those generated by a vanilla prompt.
Practical Application: The CoD Prompt in Action
The Chain of Density prompt is a marvel of innovation. Here’s the exact prompt:
Article: {{ARTICLE}}
You will generate increasingly concise, entity-dense summaries of the above Article.
Repeat the following 2 steps 5 times.
Step 1. Identify 1-3 informative Entities (";" delimited) from the Article which are missing from the previously generated summary.
Step 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the Missing Entities.
A Missing Entity is:
- Relevant: to the main story.
- Specific: descriptive yet concise (5 words or fewer).
- Novel: not in the previous summary.
- Faithful: present in the Article.
- Anywhere: located anywhere in the Article.
Guidelines:
- The first summary should be long (4-5 sentences, ~80 words) yet highly non-specific, containing little information beyond the entities marked as missing. Use overly verbose language and fillers (e.g., "this article discusses") to reach ~80 words.
- Make every word count: re-write the previous summary to improve flow and make space for additional entities.
- Make space with fusion, compression, and removal of uninformative phrases like "the article discusses".
- The summaries should become highly dense and concise yet self-contained, e.g., easily understood without the Article.
- Missing entities can appear anywhere in the new summary.
- Never drop entities from the previous summary. If space cannot be made, add fewer new entities. Remember, use the exact same number of words for each summary.
Answer in JSON. The JSON should be a list (length 5) of dictionaries whose keys are "Missing_Entities" and "Denser_Summary".
Tip: include the article in XML tags such as <Article></Article> for good results.
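Putting the tip into practice, a minimal way to assemble the prompt and validate the model's reply might look like the sketch below. The function names and the single-string layout are my own, and the CoD instructions are abbreviated with an ellipsis; substitute the full prompt text from above.

```python
import json

COD_PROMPT_TEMPLATE = """<Article>{article}</Article>

You will generate increasingly concise, entity-dense summaries of the above Article.
Repeat the following 2 steps 5 times. ...
Answer in JSON. The JSON should be a list (length 5) of dictionaries whose keys
are "Missing_Entities" and "Denser_Summary"."""

def build_cod_prompt(article: str) -> str:
    """Wrap the article in XML tags and splice it into the CoD prompt."""
    return COD_PROMPT_TEMPLATE.format(article=article)

def parse_cod_response(raw: str) -> list:
    """Validate the model reply: a JSON list of 5 dicts with both expected keys."""
    steps = json.loads(raw)
    assert isinstance(steps, list) and len(steps) == 5
    for step in steps:
        assert {"Missing_Entities", "Denser_Summary"} <= step.keys()
    return steps
```

Validating the response shape before using it is worthwhile because models occasionally return malformed JSON or fewer than five iterations.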
When applied to an article titled "iPhone 15 hands-on: pre-orders, release date, price", the prompt produced five increasingly dense summaries; the third one seemed to be the best.
Evaluating the Quality of CoD Summaries
The quality of CoD summaries was put to the test through both human and GPT-4 evaluations. Human annotators were shown randomly shuffled CoD summaries from 100 CNN/DailyMail articles. For 3 out of 4 annotators, the first CoD iteration received the most first-place votes across the 100 examples; in aggregate, however, 61% of top-ranked summaries had undergone at least three iterations, suggesting that a significant share of annotators preferred summaries that had been further densified. Entity density, measured as the number of entities per token, reached approximately 0.15 after three iterations, closely matching human-written summaries (0.151) and exceeding that of summaries from a vanilla GPT-4 prompt.
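The entity-density metric above is straightforward to approximate. The paper's evaluation relies on NER-based entity extraction; the sketch below instead takes the entity list as an argument and uses whitespace tokenization, so the resulting numbers are illustrative rather than comparable to the published 0.15 figure.

```python
def entity_density(summary: str, entities: list) -> float:
    """Ratio of entity mentions to tokens (whitespace tokenization).

    `entities` is a list of entity strings to count; a real evaluation
    would extract these with an NER model rather than pass them in.
    """
    tokens = summary.split()
    if not tokens:
        return 0.0
    mentions = sum(summary.count(entity) for entity in entities)
    return mentions / len(tokens)
```

For example, `entity_density("Apple unveiled the iPhone 15 at Apple Park.", ["Apple", "iPhone 15"])` counts three mentions across eight tokens, giving 0.375; a denser CoD iteration of the same length would push this ratio higher.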
GPT-4 itself was also used to evaluate summary quality. When prompted for ratings, it preferred the middle iterations, with scores of 4.78, 4.77, and 4.76, while the first and last iterations scored lower. This dual approach, combining human judgment with automated ratings, offers a more comprehensive picture of summary quality.
Limitations and Future Directions
While the CoD method has shown promise, it's worth noting that its analysis has been limited to news summarization. Additionally, CoD has been exclusively tested on GPT-4, a closed-source model, and hasn't been evaluated on other large language models (LLMs). There's potential for its application across various domains, and the quest continues to determine the optimal density for summaries. The broader applicability of CoD on different LLMs remains an area ripe for exploration.
Conclusion
The Chain of Density prompting method offers a fresh perspective on automated text summarization. It's an invitation for readers and researchers alike to experiment with CoD and GPT-4, pushing the boundaries of what's possible in the realm of concise, high-quality summaries.