The difflib module compares sequences and generates human-readable diffs. Whether you're building a code review tool or comparing configuration files, difflib has you covered.
Basic Comparison
Find differences between two texts:
import difflib
text1 = "Hello world"
text2 = "Hello there world"
matcher = difflib.SequenceMatcher(None, text1, text2)
print(matcher.ratio())  # about 0.79 (similarity ratio, from 0.0 to 1.0)
Unified Diff
Generate diffs like git diff:
import difflib
original = """line 1
line 2
line 3
line 4""".splitlines(keepends=True)
modified = """line 1
line 2 modified
line 3
new line
line 4""".splitlines(keepends=True)
diff = difflib.unified_diff(
    original, modified,
    fromfile='original.txt',
    tofile='modified.txt'
)
print(''.join(diff))
Output:
--- original.txt
+++ modified.txt
@@ -1,4 +1,5 @@
 line 1
-line 2
+line 2 modified
 line 3
+new line
 line 4
Context Diff
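Context diffs show the old and new version of each changed region in separate blocks and mark changed lines with !:
import difflib
# original and modified are the line lists from the Unified Diff example
diff = difflib.context_diff(
    original, modified,
    fromfile='original.txt',
    tofile='modified.txt'
)
print(''.join(diff))
HTML Diff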
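Generate a side-by-side HTML table of the differences:
import difflib
# original and modified are the line lists from the Unified Diff example
differ = difflib.HtmlDiff()
html = differ.make_file(
    original, modified,
    fromdesc='Original',
    todesc='Modified'
)
# make_file() declares utf-8 in the generated page by default, so write the file as utf-8
with open('diff.html', 'w', encoding='utf-8') as f:
    f.write(html)
Finding Close Matches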
Find similar strings in a list:
import difflib
words = ['apple', 'application', 'apply', 'banana', 'band']
matches = difflib.get_close_matches('app', words)
print(matches)  # ['apply', 'apple'] ('application' scores about 0.43, below the default cutoff of 0.6)
# Control cutoff and number of results (the defaults are n=3 and cutoff=0.6)
matches = difflib.get_close_matches('app', words, n=2, cutoff=0.6)
print(matches)  # ['apply', 'apple']
Spell Checking Pattern
import difflib
def suggest_corrections(word, dictionary):
    """Suggest spelling corrections."""
    matches = difflib.get_close_matches(word, dictionary, n=3, cutoff=0.6)
    if matches:
        return f"Did you mean: {', '.join(matches)}?"
    return "No suggestions found"
dictionary = ['python', 'programming', 'program', 'project', 'process']
print(suggest_corrections('progrm', dictionary))
# Did you mean: program, programming?
Line-by-Line Comparison
The Differ class shows detailed changes:
import difflib
text1 = ['one\n', 'two\n', 'three\n']
text2 = ['one\n', 'tree\n', 'three\n']
d = difflib.Differ()
diff = d.compare(text1, text2)
print(''.join(diff))
Output:
  one
- two
+ tree
  three
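The example above prints no ? lines because 'two' and 'tree' are too dissimilar for Differ to treat them as one edited line. As a rough sketch, using two made-up sentences that differ by a single word, Differ pairs the lines up and adds hint lines under each one:

```python
import difflib

# Hypothetical sample lines that differ only in the last word
before = ['the quick brown fox\n']
after = ['the quick brown dog\n']

delta = list(difflib.Differ().compare(before, after))

# Lines starting with '? ' are hints that point at the changed characters
hint_lines = [line for line in delta if line.startswith('? ')]

print(''.join(delta))
print(f"{len(hint_lines)} hint line(s)")
```

The hint lines are meant for human readers; code that processes the delta usually filters them out by prefix.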
Understanding Diff Markers
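(two spaces) - line is the same in both sequences
- (minus) - line only in the first sequence
+ (plus) - line only in the second sequence
? (question mark) - hint line that appears in neither sequence; it points at the characters that differ within a changed line
Each marker is followed by a space, so every output line starts with a two-character prefix.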
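Because every output line carries one of these two-character prefixes, a delta can be turned back into either input. A minimal sketch using difflib.restore, with the same text1 and text2 lists as the Differ example:

```python
import difflib

text1 = ['one\n', 'two\n', 'three\n']
text2 = ['one\n', 'tree\n', 'three\n']

delta = list(difflib.Differ().compare(text1, text2))

# restore() keeps the lines that belong to sequence 1 or 2 and strips the prefix
first = list(difflib.restore(delta, 1))
second = list(difflib.restore(delta, 2))

print(first == text1)   # True
print(second == text2)  # True
```

This only works on Differ-style output (Differ.compare or difflib.ndiff), not on unified or context diffs.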
SequenceMatcher Details
Get detailed matching information:
import difflib
s = difflib.SequenceMatcher(None, "abcd", "bcde")
# Matching blocks
for block in s.get_matching_blocks():
    print(block)
# Match(a=1, b=0, size=3) - "bcd" matches
# Match(a=4, b=4, size=0) - end marker
# Operations to transform a to b
for op, i1, i2, j1, j2 in s.get_opcodes():
    print(f"{op:7} a[{i1}:{i2}] --> b[{j1}:{j2}]")
# delete  a[0:1] --> b[0:0] - remove 'a'
# equal   a[1:4] --> b[0:3] - keep 'bcd'
# insert  a[4:4] --> b[3:4] - add 'e'
Ignoring Junk
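Pass a junk function to stop unimportant elements from anchoring matches. Junk is checked against the second sequence only, and it is not stripped from either input:
import difflib
# Treat spaces as junk
s = difflib.SequenceMatcher(
    lambda x: x == " ",  # junk function: returns True for elements to treat as junk
    "hello world",
    "helloworld"
)
print(s.ratio())  # about 0.95, the same as with no junk function: the space still counts toward the length
File Comparison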
Compare two files:
import difflib
def compare_files(file1, file2):
    with open(file1) as f1, open(file2) as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    diff = difflib.unified_diff(
        lines1, lines2,
        fromfile=file1,
        tofile=file2
    )
    return ''.join(diff)
print(compare_files('old_config.txt', 'new_config.txt'))
Config Change Detection
import difflib
def detect_config_changes(old_config, new_config):
    """Detect and report configuration changes."""
    old_lines = old_config.strip().split('\n')
    new_lines = new_config.strip().split('\n')
    d = difflib.Differ()
    changes = list(d.compare(old_lines, new_lines))
    # Keep only added and removed lines, dropping the two-character prefix
    added = [line[2:] for line in changes if line.startswith('+ ')]
    removed = [line[2:] for line in changes if line.startswith('- ')]
    return {'added': added, 'removed': removed}
old = "debug=false\nport=8080"
new = "debug=true\nport=8080\nhost=localhost"
changes = detect_config_changes(old, new)
print(f"Added: {changes['added']}")  # ['debug=true', 'host=localhost']
print(f"Removed: {changes['removed']}")  # ['debug=false']
Performance Note
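For very long sequences, SequenceMatcher can be slow: its worst-case running time is quadratic. Separately, when the second sequence has 200 or more elements, the autojunk heuristic treats any element that makes up more than about 1% of it as junk, which can produce surprising matches. Use autojunk=False to turn the heuristic off if you're getting unexpected results, keeping in mind that this can make the comparison slower:
# long_text1 and long_text2 stand for your own long strings
s = difflib.SequenceMatcher(None, long_text1, long_text2, autojunk=False)
When to Use difflib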
Use difflib when:
- Comparing configuration files
- Building diff viewers
- Spell checking / fuzzy matching
- Finding similar strings
- Generating patch files
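For the fuzzy matching cases in this list, get_close_matches returns only the strings, not their scores. When the scores matter, one possible sketch ranks candidates with SequenceMatcher directly. The name rank_by_similarity is a hypothetical helper for illustration, not part of difflib:

```python
import difflib

def rank_by_similarity(query, candidates):
    """Return (candidate, ratio) pairs, best match first (hypothetical helper)."""
    matcher = difflib.SequenceMatcher()
    # SequenceMatcher caches details about the second sequence,
    # so set the query once and swap only the first sequence.
    matcher.set_seq2(query)
    scored = []
    for candidate in candidates:
        matcher.set_seq1(candidate)
        scored.append((candidate, matcher.ratio()))
    # Highest ratio first; ties fall back to alphabetical order
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

words = ['apple', 'application', 'apply', 'banana', 'band']
ranked = rank_by_similarity('app', words)
for word, score in ranked:
    print(f"{word:12} {score:.2f}")
```

Unlike get_close_matches, this sketch applies no cutoff, so weak matches such as 'banana' still appear at the bottom of the ranking.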
For binary file comparison or very large files, consider specialized tools. But for text comparison in Python, difflib is the standard library solution.