The difflib module in Python's standard library compares sequences and generates human-readable diffs. It covers tasks ranging from building a code review tool to comparing configuration files.

Basic Comparison

Measure how similar two strings are:

import difflib
 
text1 = "Hello world"
text2 = "Hello there world"
 
matcher = difflib.SequenceMatcher(None, text1, text2)
print(matcher.ratio())  # 0.7857... - similarity ratio between 0 and 1
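
To see where that number comes from, here is a small sketch that recomputes it by hand. The difflib documentation defines ratio() as 2*M/T, where M is the number of matched elements and T is the combined length of both sequences. The names matched, total and manual_ratio are only illustrative.

```python
import difflib

text1 = "Hello world"
text2 = "Hello there world"

matcher = difflib.SequenceMatcher(None, text1, text2)

# M: total size of all matching blocks ("Hello " and "world")
matched = sum(block.size for block in matcher.get_matching_blocks())
# T: combined length of both strings
total = len(text1) + len(text2)

manual_ratio = 2 * matched / total
print(matched, total)          # 11 28
print(round(manual_ratio, 4))  # 0.7857
```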

Unified Diff

Generate diffs like git diff:

import difflib
 
original = """line 1
line 2
line 3
line 4""".splitlines(keepends=True)
 
modified = """line 1
line 2 modified
line 3
new line
line 4""".splitlines(keepends=True)
 
diff = difflib.unified_diff(
    original, modified,
    fromfile='original.txt',
    tofile='modified.txt'
)
 
print(''.join(diff))

Output:

--- original.txt
+++ modified.txt
@@ -1,4 +1,5 @@
 line 1
-line 2
+line 2 modified
 line 3
+new line
 line 4
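
If your lines have no trailing newlines (for example, after a plain splitlines()), a variation like the following sketch may be handy. It uses two documented unified_diff parameters: lineterm='' so the header lines do not get a newline of their own, and n to control how many context lines surround each change. The sample lists are made up for illustration.

```python
import difflib

# Lines without trailing newlines, e.g. from str.splitlines()
original = ["line 1", "line 2", "line 3", "line 4"]
modified = ["line 1", "line 2 modified", "line 3", "new line", "line 4"]

diff = difflib.unified_diff(
    original, modified,
    fromfile='original.txt',
    tofile='modified.txt',
    n=0,          # no context lines around each change
    lineterm='',  # inputs carry no newlines, so headers should not either
)

result = list(diff)
print('\n'.join(result))
```

With n=0 only the changed lines appear, which keeps the output short when you care about what changed rather than where.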

Context Diff

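Show the same changes in the older context diff format, which lists the "before" and "after" versions of each hunk separately (both formats default to three lines of context):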

import difflib
 
diff = difflib.context_diff(
    original, modified,
    fromfile='original.txt',
    tofile='modified.txt'
)
 
print(''.join(diff))
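
To get a feel for how context diffs mark lines, this self-contained sketch filters the output down to the marked lines. The sample lists mirror the earlier example but end every line with a newline; context_lines is just an illustrative name.

```python
import difflib

original = ["line 1\n", "line 2\n", "line 3\n", "line 4\n"]
modified = ["line 1\n", "line 2 modified\n", "line 3\n", "new line\n", "line 4\n"]

context_lines = list(difflib.context_diff(
    original, modified,
    fromfile='original.txt',
    tofile='modified.txt',
))

for line in context_lines:
    # '! ' marks a changed line, '+ ' an added line, '- ' a removed line
    if line.startswith(('! ', '+ ', '- ')):
        print(line, end='')
# ! line 2
# ! line 2 modified
# + new line
```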

HTML Diff

Generate visual HTML diffs:

import difflib
 
differ = difflib.HtmlDiff()
html = differ.make_file(
    original, modified,
    fromdesc='Original',
    todesc='Modified'
)
 
with open('diff.html', 'w', encoding='utf-8') as f:  # make_file declares utf-8 by default
    f.write(html)
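
If you want to embed the diff in an existing page rather than write a whole file, HtmlDiff also has make_table. The sketch below assumes you supply your own styling, since make_table returns only the table markup; the sample lines and the table_html name are illustrative.

```python
import difflib

original = ["line 1\n", "line 2\n", "line 3\n"]
modified = ["line 1\n", "line 2 modified\n", "line 3\n"]

differ = difflib.HtmlDiff(wrapcolumn=60)  # wrap long lines at 60 columns

# make_table returns only the <table> markup, not a full HTML document
table_html = differ.make_table(
    original, modified,
    fromdesc='Original',
    todesc='Modified',
    context=True,  # show only changed regions plus surrounding lines
    numlines=1,    # number of context lines on each side
)

print(len(table_html) > 0)
```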

Finding Close Matches

Find similar strings in a list:

import difflib
 
words = ['apple', 'application', 'apply', 'banana', 'band']
 
matches = difflib.get_close_matches('app', words)
print(matches)  # ['apply', 'apple'] ('application' scores about 0.43, below the default cutoff of 0.6)
 
# Control cutoff and number of results
matches = difflib.get_close_matches('app', words, n=3, cutoff=0.4)
print(matches)  # ['apply', 'apple', 'application']
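
To understand which candidates survive a given cutoff, it can help to look at the scores directly. get_close_matches ranks candidates by SequenceMatcher.ratio(), so this sketch computes the same score for each word; the scores dictionary is only for illustration.

```python
import difflib

words = ['apple', 'application', 'apply', 'banana', 'band']

# Score every candidate against 'app' the same way get_close_matches does
scores = {
    w: round(difflib.SequenceMatcher(None, w, 'app').ratio(), 2)
    for w in words
}
print(scores)
# {'apple': 0.75, 'application': 0.43, 'apply': 0.75, 'banana': 0.22, 'band': 0.29}
```

Only 'apple' and 'apply' clear the default cutoff of 0.6. Long words are penalized because the ratio is relative to the combined length of both strings.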

Spell Checking Pattern

import difflib
 
def suggest_corrections(word, dictionary):
    """Suggest spelling corrections."""
    matches = difflib.get_close_matches(word, dictionary, n=3, cutoff=0.6)
    if matches:
        return f"Did you mean: {', '.join(matches)}?"
    return "No suggestions found"
 
dictionary = ['python', 'programming', 'program', 'project', 'process']
print(suggest_corrections('progrm', dictionary))
# Did you mean: program, programming?

Line-by-Line Comparison

The Differ class shows detailed changes:

import difflib
 
text1 = ['one\n', 'two\n', 'three\n']
text2 = ['one\n', 'tree\n', 'three\n']
 
d = difflib.Differ()
diff = d.compare(text1, text2)
print(''.join(diff))

Output:

  one
- two
+ tree
  three

Understanding Diff Markers

  (space) - line is same in both
- (minus) - line only in first sequence
+ (plus)  - line only in second sequence
? (question) - guide line marking where two similar lines differ; not present in either sequence
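
The "?" lines only show up when two lines are similar enough for Differ to treat them as an edit of each other, which is why the example above has none. Here is a small sketch with made-up sentences that should trigger them.

```python
import difflib

before = ['the quick brown fox\n']
after = ['the quick brown dog\n']

lines = list(difflib.Differ().compare(before, after))
print(''.join(lines), end='')

# Guide lines start with '? ' and appear in neither input
guide_lines = [line for line in lines if line.startswith('? ')]
print(len(guide_lines) > 0)
```

The carets in the guide lines point at the characters that changed within the line.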

SequenceMatcher Details

Get detailed matching information:

import difflib
 
s = difflib.SequenceMatcher(None, "abcd", "bcde")
 
# Matching blocks
for block in s.get_matching_blocks():
    print(block)
# Match(a=1, b=0, size=3)  - "bcd" matches
# Match(a=4, b=4, size=0)  - end marker
 
# Operations to transform a to b
for op, i1, i2, j1, j2 in s.get_opcodes():
    print(f"{op:7} a[{i1}:{i2}] --> b[{j1}:{j2}]")
# delete  a[0:1] --> b[0:0]  - remove 'a'
# equal   a[1:4] --> b[0:3]  - keep 'bcd'
# insert  a[4:4] --> b[3:4]  - add 'e'
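
One way to convince yourself that the opcodes fully describe the change is to replay them. This sketch rebuilds the second string from the first; pieces and rebuilt are illustrative names.

```python
import difflib

a = "abcd"
b = "bcde"
s = difflib.SequenceMatcher(None, a, b)

pieces = []
for op, i1, i2, j1, j2 in s.get_opcodes():
    if op == 'equal':
        pieces.append(a[i1:i2])  # keep the shared part
    elif op in ('insert', 'replace'):
        pieces.append(b[j1:j2])  # take new content from b
    # 'delete' contributes nothing to the result

rebuilt = ''.join(pieces)
print(rebuilt)  # bcde
```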

Ignoring Junk

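Pass a junk function to mark elements that should not be used to anchor matches. Junk is only looked up in the second sequence, and it is not removed from the similarity calculation:

import difflib
 
# Treat spaces as junk
s = difflib.SequenceMatcher(
    lambda x: x == " ",  # junk function
    "hello world",
    "helloworld"
)
print(s.ratio())  # 0.952... - same as without the junk function; the space still counts toward the length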
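
If the goal is a score that truly disregards spaces, one option is to strip them before comparing. This sketch compares the three approaches side by side; the variable names are illustrative.

```python
import difflib

a = "hello world"
b = "helloworld"

plain = difflib.SequenceMatcher(None, a, b).ratio()
with_junk = difflib.SequenceMatcher(lambda ch: ch == " ", a, b).ratio()
# Remove spaces up front so they cannot affect the score
stripped = difflib.SequenceMatcher(
    None, a.replace(" ", ""), b.replace(" ", "")
).ratio()

print(round(plain, 3))      # 0.952
print(round(with_junk, 3))  # 0.952
print(stripped)             # 1.0
```

The junk function matters more on longer inputs, where it stops the matcher from lining up text on runs of whitespace or other filler.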

File Comparison

Compare two files:

import difflib
 
def compare_files(file1, file2):
    with open(file1) as f1, open(file2) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    
    diff = difflib.unified_diff(
        lines1, lines2,
        fromfile=file1,
        tofile=file2
    )
    return ''.join(diff)
 
print(compare_files('old_config.txt', 'new_config.txt'))
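
To try this without existing files on disk, here is a self-contained sketch that writes two small files to a temporary directory and diffs them. The file contents are made up, and the explicit utf-8 encoding is an assumption about your files.

```python
import difflib
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    old_path = Path(tmp) / 'old_config.txt'
    new_path = Path(tmp) / 'new_config.txt'
    old_path.write_text("debug=false\nport=8080\n", encoding='utf-8')
    new_path.write_text("debug=true\nport=8080\n", encoding='utf-8')

    # Keep line endings so the diff lines join cleanly
    old_lines = old_path.read_text(encoding='utf-8').splitlines(keepends=True)
    new_lines = new_path.read_text(encoding='utf-8').splitlines(keepends=True)

file_diff = ''.join(difflib.unified_diff(
    old_lines, new_lines,
    fromfile='old_config.txt',
    tofile='new_config.txt',
))
print(file_diff)
```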

Config Change Detection

import difflib
 
def detect_config_changes(old_config, new_config):
    """Detect and report configuration changes."""
    old_lines = old_config.strip().split('\n')
    new_lines = new_config.strip().split('\n')
    
    d = difflib.Differ()
    changes = list(d.compare(old_lines, new_lines))
    
    added = [line[2:] for line in changes if line.startswith('+ ')]
    removed = [line[2:] for line in changes if line.startswith('- ')]
    
    return {'added': added, 'removed': removed}
 
old = "debug=false\nport=8080"
new = "debug=true\nport=8080\nhost=localhost"
 
changes = detect_config_changes(old, new)
print(f"Added: {changes['added']}")    # ['debug=true', 'host=localhost']
print(f"Removed: {changes['removed']}")  # ['debug=false']

Performance Note

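SequenceMatcher takes quadratic time in the worst case, so very long sequences can be slow. To speed things up, it applies an automatic junk heuristic by default: when the second sequence has at least 200 items, any item that accounts for more than about 1% of it is treated as junk. If that gives unexpected results, turn the heuristic off with autojunk=False, at some cost in speed: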

s = difflib.SequenceMatcher(None, long_text1, long_text2, autojunk=False)
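
Here is a deliberately extreme sketch of how the automatic junk heuristic can surprise you. The inputs are artificial: two 300-character strings made of only two distinct characters, so every character is frequent enough to be treated as junk.

```python
import difflib

# 300-character strings built from only two distinct characters,
# so every character counts as "popular" under the heuristic
a = "ab" * 150
b = "ba" * 150

with_heuristic = difflib.SequenceMatcher(None, a, b).ratio()
without_heuristic = difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()

print(with_heuristic)               # 0.0
print(round(without_heuristic, 3))  # 0.997
```

The two strings differ only by a one-character shift, yet the default settings report no similarity at all. Real text rarely behaves this badly, but inputs with few distinct elements can.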

When to Use difflib

Use difflib when:

  • Comparing configuration files
  • Building diff viewers
  • Spell checking / fuzzy matching
  • Finding similar strings
  • Generating patch files

For binary file comparison or very large files, consider specialized tools. But for text comparison in Python, difflib is the standard library solution.
