Python difflib: Comparing Text and Sequences

The difflib module compares sequences and generates human-readable diffs. Whether you're building a code review tool or comparing configuration files, difflib has you covered.

Basic Comparison

Measure how similar two strings are:

import difflib
 
text1 = "Hello world"
text2 = "Hello there world"
 
matcher = difflib.SequenceMatcher(None, text1, text2)
print(matcher.ratio())  # 0.7857... - similarity ratio (1.0 means identical)
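
SequenceMatcher accepts any sequences of hashable items, not only strings. As a minimal sketch, the same ratio can be computed word by word; the sample sentences and variable names below are made up for illustration:

```python
import difflib

# Illustrative inputs (not from the example above): compare word by word
words_a = "the quick brown fox".split()
words_b = "the quick red fox".split()

word_matcher = difflib.SequenceMatcher(None, words_a, words_b)

# ratio() is 2 * M / T, where M is the number of matched items and T is the
# total number of items in both sequences: 2 * 3 / 8 here
word_ratio = word_matcher.ratio()
print(word_ratio)  # 0.75
```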

Unified Diff

Generate diffs like git diff:

import difflib
 
original = """line 1
line 2
line 3
line 4""".splitlines(keepends=True)
 
modified = """line 1
line 2 modified
line 3
new line
line 4""".splitlines(keepends=True)
 
diff = difflib.unified_diff(
    original, modified,
    fromfile='original.txt',
    tofile='modified.txt'
)
 
print(''.join(diff))

Output:

--- original.txt
+++ modified.txt
@@ -1,4 +1,5 @@
 line 1
-line 2
+line 2 modified
 line 3
+new line
 line 4
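
unified_diff shows three lines of context by default. The sketch below, which assumes the same sample text, uses the n parameter to drop context entirely and lineterm='' to suit lines that carry no trailing newline; the names before, after and compact are illustrative:

```python
import difflib

# splitlines() without keepends drops the newlines, so pass lineterm=''
# to stop the header lines from adding their own
before = "line 1\nline 2\nline 3\nline 4".splitlines()
after = "line 1\nline 2 modified\nline 3\nnew line\nline 4".splitlines()

compact = list(difflib.unified_diff(
    before, after,
    fromfile='original.txt',
    tofile='modified.txt',
    n=0,         # no context lines around each change
    lineterm=''
))

print('\n'.join(compact))
```

With n=0 each hunk contains only the removed and added lines, which can be handy for logs but makes the diff harder to apply as a patch.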

Context Diff

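The older context format shows the before and after versions of each hunk as separate blocks and marks changed lines with !. Like unified_diff, it defaults to three lines of context: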

import difflib
 
# Reuses original and modified from the Unified Diff example
diff = difflib.context_diff(
    original, modified,
    fromfile='original.txt',
    tofile='modified.txt'
)
 
print(''.join(diff))
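
To make the format concrete, here is a self-contained sketch with the sample lines written out as lists (before, after and context_lines are illustrative names). Changed lines should carry a "! " prefix in both blocks and pure insertions a "+ " prefix:

```python
import difflib

before = ['line 1\n', 'line 2\n', 'line 3\n', 'line 4\n']
after = ['line 1\n', 'line 2 modified\n', 'line 3\n', 'new line\n', 'line 4\n']

context_lines = list(difflib.context_diff(
    before, after,
    fromfile='original.txt',
    tofile='modified.txt'
))

# Each line already ends with a newline, so suppress print's own
print(''.join(context_lines), end='')
```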

HTML Diff

Generate visual HTML diffs:

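import difflib
 
# Reuses original and modified from the Unified Diff example
differ = difflib.HtmlDiff()
html = differ.make_file(
    original, modified,
    fromdesc='Original',
    todesc='Modified'
)
 
# make_file declares UTF-8 in the page by default, so write the file as UTF-8
with open('diff.html', 'w', encoding='utf-8') as f:
    f.write(html)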
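
When the diff needs to be embedded in an existing page rather than saved as a full document, HtmlDiff.make_table returns only the table markup. A rough sketch, with illustrative inputs and an arbitrary wrapcolumn value:

```python
import difflib

before = ['line 1', 'line 2', 'line 3', 'line 4']
after = ['line 1', 'line 2 modified', 'line 3', 'new line', 'line 4']

table_html = difflib.HtmlDiff(wrapcolumn=60).make_table(
    before, after,
    fromdesc='Original',
    todesc='Modified',
    context=True,  # show only the changed regions
    numlines=1     # with one line of context around each
)

print(table_html[:200])
```

The table relies on CSS classes that make_file would normally include, so the host page has to supply its own styles.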

Finding Close Matches

Find similar strings in a list:

import difflib
 
words = ['apple', 'application', 'apply', 'banana', 'band']
 
matches = difflib.get_close_matches('app', words)
print(matches)  # ['apply', 'apple'] - 'application' falls below the default cutoff of 0.6
 
# Lower the cutoff to admit weaker matches; n caps the number of results
matches = difflib.get_close_matches('app', words, n=3, cutoff=0.4)
print(matches)  # ['apply', 'apple', 'application']
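
get_close_matches ranks candidates by SequenceMatcher.ratio() and drops anything below the cutoff (0.6 by default). To see why a candidate is kept or rejected, the scores can be computed directly; this is a small sketch and the names candidates and scores are illustrative:

```python
import difflib

candidates = ['apple', 'application', 'apply', 'banana', 'band']

# Score every candidate against the query, as get_close_matches does internally
scores = {
    word: difflib.SequenceMatcher(None, word, 'app').ratio()
    for word in candidates
}

for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:12} {score:.2f}")
```

Here 'apple' and 'apply' both score 0.75, while 'application' scores about 0.43 because the ratio penalizes the large difference in length.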

Spell Checking Pattern

import difflib
 
def suggest_corrections(word, dictionary):
    """Suggest spelling corrections."""
    matches = difflib.get_close_matches(word, dictionary, n=3, cutoff=0.6)
    if matches:
        return f"Did you mean: {', '.join(matches)}?"
    return "No suggestions found"
 
dictionary = ['python', 'programming', 'program', 'project', 'process']
print(suggest_corrections('progrm', dictionary))
# Did you mean: program, programming?
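
Matching is case-sensitive, so 'PYTHN' would score poorly against 'python'. One possible workaround, sketched here with a hypothetical helper named suggest_case_insensitive, is to compare lowercased forms and map the hits back to their original spelling:

```python
import difflib

def suggest_case_insensitive(word, candidates, n=3, cutoff=0.6):
    """Return close matches while ignoring case (hypothetical helper)."""
    # Map each lowercased candidate back to its original spelling
    lowered = {candidate.lower(): candidate for candidate in candidates}
    hits = difflib.get_close_matches(word.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[hit] for hit in hits]

suggestions = suggest_case_insensitive('PYTHN', ['Python', 'Programming', 'Project'])
print(suggestions)  # ['Python']
```

If two candidates differ only by case, the dictionary keeps the last one, which is acceptable for a sketch but worth handling in real code.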

Line-by-Line Comparison

The Differ class shows detailed changes:

import difflib
 
text1 = ['one\n', 'two\n', 'three\n']
text2 = ['one\n', 'tree\n', 'three\n']
 
d = difflib.Differ()
diff = d.compare(text1, text2)
print(''.join(diff))

Output:

  one
- two
+ tree
  three

Understanding Diff Markers

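Each line of Differ output starts with a two-character code:

"  " (two spaces) - line is the same in both sequences
"- " (minus)      - line appears only in the first sequence
"+ " (plus)       - line appears only in the second sequence
"? " (question)   - guide line, present in neither input, marking which characters changed within a line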
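
The "?" lines only appear when two lines are similar enough for Differ to treat them as an edited pair rather than an unrelated delete and insert. A small sketch with illustrative input that should trigger one:

```python
import difflib

old_lines = ['Hello world\n']
new_lines = ['Hello there world\n']

hinted = list(difflib.Differ().compare(old_lines, new_lines))

# The "? " line sits under the line it annotates and uses +, - and ^
# to point at inserted, deleted and replaced characters
print(''.join(hinted), end='')
```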

SequenceMatcher Details

Get detailed matching information:

import difflib
 
s = difflib.SequenceMatcher(None, "abcd", "bcde")
 
# Matching blocks
for block in s.get_matching_blocks():
    print(block)
# Match(a=1, b=0, size=3)  - "bcd" matches
# Match(a=4, b=4, size=0)  - end marker
 
# Operations to transform a to b
for op, i1, i2, j1, j2 in s.get_opcodes():
    print(f"{op:7} a[{i1}:{i2}] --> b[{j1}:{j2}]")
# delete  a[0:1] --> b[0:0]  - remove 'a'
# equal   a[1:4] --> b[0:3]  - keep 'bcd'
# insert  a[4:4] --> b[3:4]  - add 'e'
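
Opcodes are enough to build a custom renderer. As a minimal sketch, the hypothetical render_inline function below walks the opcodes and wraps deletions and insertions in markers; the marker syntax is an arbitrary choice for illustration:

```python
import difflib

def render_inline(a, b):
    """Render the edits from a to b as one annotated string (hypothetical helper)."""
    parts = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == 'equal':
            parts.append(a[i1:i2])
        if tag in ('delete', 'replace'):
            parts.append(f"[-{a[i1:i2]}-]")   # text removed from a
        if tag in ('insert', 'replace'):
            parts.append(f"{{+{b[j1:j2]}+}}")  # text added in b
    return ''.join(parts)

inline = render_inline("abcd", "bcde")
print(inline)  # [-a-]bcd{+e+}
```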

Ignoring Junk

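Tell the matcher which elements are junk. Junk elements in the second sequence are never used to start a matching block, which steers the alignment away from noise such as whitespace. Junk is not removed from the inputs, so it does not raise the ratio:

import difflib
 
# Treat spaces as junk
s = difflib.SequenceMatcher(
    lambda x: x == " ",  # junk function
    "hello world",
    "helloworld"
)
print(s.ratio())  # 0.952... - same as without a junk function; the space still counts as a difference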
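
A junk function does not delete characters from the inputs, so if the goal is to compare text with whitespace ignored entirely, one option is to normalize both strings first. A minimal sketch, where strip_spaces is a hypothetical helper:

```python
import difflib

def strip_spaces(text):
    """Remove spaces before comparing (hypothetical helper)."""
    return text.replace(" ", "")

first, second = "hello world", "helloworld"

with_junk = difflib.SequenceMatcher(lambda x: x == " ", first, second).ratio()
plain = difflib.SequenceMatcher(None, first, second).ratio()
normalized = difflib.SequenceMatcher(None, strip_spaces(first), strip_spaces(second)).ratio()

print(with_junk == plain)  # True: the junk function changes nothing for this pair
print(normalized)          # 1.0
```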

File Comparison

Compare two files:

import difflib
 
def compare_files(file1, file2):
    with open(file1) as f1, open(file2) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    
    diff = difflib.unified_diff(
        lines1, lines2,
        fromfile=file1,
        tofile=file2
    )
    return ''.join(diff)
 
print(compare_files('old_config.txt', 'new_config.txt'))
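
For a version that can be run without existing files, the sketch below writes two temporary files and diffs them with an explicit encoding. The helper name diff_paths and the file contents are assumptions made for the example:

```python
import difflib
import tempfile
from pathlib import Path

def diff_paths(old_path, new_path, encoding="utf-8"):
    """Return unified diff lines for two text files (hypothetical helper)."""
    old_lines = Path(old_path).read_text(encoding=encoding).splitlines(keepends=True)
    new_lines = Path(new_path).read_text(encoding=encoding).splitlines(keepends=True)
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile=str(old_path),
        tofile=str(new_path)
    ))

with tempfile.TemporaryDirectory() as tmp:
    old_path = Path(tmp) / "old_config.txt"
    new_path = Path(tmp) / "new_config.txt"
    old_path.write_text("debug=false\nport=8080\n", encoding="utf-8")
    new_path.write_text("debug=true\nport=8080\n", encoding="utf-8")
    file_diff = diff_paths(old_path, new_path)

print(''.join(file_diff), end='')
```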

Config Change Detection

import difflib
 
def detect_config_changes(old_config, new_config):
    """Detect and report configuration changes."""
    old_lines = old_config.strip().split('\n')
    new_lines = new_config.strip().split('\n')
    
    d = difflib.Differ()
    changes = list(d.compare(old_lines, new_lines))
    
    added = [line[2:] for line in changes if line.startswith('+ ')]
    removed = [line[2:] for line in changes if line.startswith('- ')]
    
    return {'added': added, 'removed': removed}
 
old = "debug=false\nport=8080"
new = "debug=true\nport=8080\nhost=localhost"
 
changes = detect_config_changes(old, new)
print(f"Added: {changes['added']}")    # ['debug=true', 'host=localhost']
print(f"Removed: {changes['removed']}")  # ['debug=false']

Performance Note

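SequenceMatcher takes quadratic time in the worst case, so very long sequences can be slow. To speed things up it applies an automatic junk heuristic by default: in a sequence of at least 200 items, any item whose duplicates account for more than 1% of the sequence is treated as junk. On long, repetitive input this can produce surprisingly low ratios. Pass autojunk=False to turn the heuristic off, at some cost in speed: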

s = difflib.SequenceMatcher(None, long_text1, long_text2, autojunk=False)
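
The effect of the heuristic is easiest to see on a deliberately repetitive input. In this sketch the strings are contrived for illustration: the second one is 200 copies of a single character, so that character is treated as junk by default:

```python
import difflib

# Illustrative inputs: one stray character followed by a long repetitive run
text_a = 'x' + 'y' * 200
text_b = 'y' * 200

ratio_default = difflib.SequenceMatcher(None, text_a, text_b).ratio()
ratio_exact = difflib.SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()

print(ratio_default)  # far lower than expected for two nearly identical strings
print(ratio_exact)    # close to 1.0
```

Real text rarely looks like this, but source code and logs with many repeated lines can hit the same behavior when compared line by line.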

When to Use difflib

Use difflib when:

Comparing configuration files
Building diff viewers
Spell checking / fuzzy matching
Finding similar strings
Generating patch files

For binary file comparison or very large files, consider specialized tools. But for text comparison in Python, difflib is the standard library solution.
