Compare Lists with Fuzzy, Phonetic & Numeric Matching

Note: The maximum file size is 4,000,000 characters per file. Approximate matching is limited to files under 160,000 characters.

File A

File B

Comparison Options

Preprocessing Options

Trim Whitespace: Removes whitespace from the beginning and/or end of each line
Ignore Symbols: Symbols to remove from text before comparison (comma-separated list)

Substitutions

Replace specific text with alternatives before comparison

Exclusion List

Items listed here are excluded entirely from the matching process

Advanced Options

Numeric Tolerance: Applies only to purely numeric records with values below 1 million (up to 6 digits).
Approximate Matching: Finds matches with small differences (typos, extra characters, etc.). Limited to files under 160,000 characters.
Threshold: Maximum number of character changes allowed for an approximate match
Phonetic Matching: Matches words that sound similar but are spelled differently (e.g., "phone" and "fone")

Comparison Results

Frequently Asked Questions

How can I compare lists with different formats of the same data?

Use the "Substitutions" feature to standardize formats before comparison. For example, add substitution rules that remove the "+1-" prefix and the formatting characters "(", ")", "-", and spaces, so that both "(123) 456-7890" and "+1-123-456-7890" become "1234567890".
You can also enable Approximate Matching with a threshold of 1 or 2 to catch any small differences that remain.
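Substitution order matters when one rule's search text overlaps another's. The Python sketch below assumes the tool applies rules top to bottom as plain find-and-replace (the page doesn't say); the rule list and the `apply_substitutions` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical rule list: each pair is (text to find, replacement).
# "+1-" comes first; if "-" were removed earlier, "+1" would be left behind.
PHONE_RULES = [
    ("+1-", ""),
    ("(", ""),
    (")", ""),
    ("-", ""),
    (" ", ""),
]


def apply_substitutions(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply literal find-and-replace rules in the order given."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


for raw in ["(123) 456-7890", "+1-123-456-7890"]:
    print(f"{raw!r} -> {apply_substitutions(raw, PHONE_RULES)!r}")
```

If the "-" rule ran before the "+1-" rule, "+1-123-456-7890" would become "+11234567890", which differs from the other entries by two characters.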

How can I handle inconsistent spacing or formatting?

Combine multiple features:
- Use the "Trim Whitespace" option (start, end, or both)
- Add substitutions for common formatting issues:
  - "  " (double space) → " " (single space)
  - "--" → "-" (standardize dashes)
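If each substitution is applied in a single pass (an assumption; the page doesn't say), the double-space rule only partly collapses longer runs: three spaces become two. The Python sketch below mirrors trimming plus the two rules above and repeats each rule until nothing changes; `collapse` and `normalize_spacing` are illustrative names, not part of the tool.

```python
def collapse(text: str, pair: str, single: str) -> str:
    """Repeat a pairwise substitution until no pair is left, so runs of any length shrink."""
    while pair in text:
        text = text.replace(pair, single)
    return text


def normalize_spacing(line: str) -> str:
    line = line.strip()                # like "Trim Whitespace" on both ends
    line = collapse(line, "  ", " ")   # double space -> single space
    return collapse(line, "--", "-")   # standardize dashes


print(repr("a   b".replace("  ", " ")))                # one pass leaves 'a  b'
print(repr(normalize_spacing("  Acme   Corp --- East  ")))
```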

Can I exclude records from my comparison?

Yes! Use the "Exclusion List" feature to specify entire records you want to ignore completely during comparison.

How can I use the tool to identify data entry patterns or errors?

Upload the same file to both inputs, enable Approximate Matching with a threshold of 1 or 2, and check the "Approximate Matches" tab to find entries that differ by only a few characters. Review these pairs for patterns that point to data entry errors, such as consistent misspellings, extra spaces, or transposed characters.
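The "character changes" threshold reads like an edit-distance limit. As an assumption (the page doesn't name its algorithm), the Python sketch below uses plain Levenshtein distance, where each insertion, deletion, or substitution costs 1, and lists every pair of distinct entries in one list that fall within the threshold, which is roughly what a self-comparison surfaces. In plain Levenshtein a transposition such as "Jhon"/"John" costs 2.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions, and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def near_duplicates(entries: list[str], threshold: int = 2) -> list[tuple[str, str, int]]:
    """Pairs of distinct entries within `threshold` edits of each other."""
    pairs = []
    for a, b in combinations(sorted(set(entries)), 2):
        distance = levenshtein(a, b)
        if distance <= threshold:
            pairs.append((a, b, distance))
    return pairs


print(near_duplicates(["Jon", "John", "Joan", "Mary", "John"], threshold=1))
```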

How can I use the numeric tolerance feature?

Numeric tolerance works only on purely numeric entries. For mixed entries containing both numbers and text:
1. Use substitutions to strip the text around the number. For example, add the substitution "Price: $" → "" (empty text) to convert "Price: $1234" to "1234".
2. Set your desired numeric tolerance.
This turns mixed text/numeric entries into purely numeric ones that the tolerance feature can compare.
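Here is a Python sketch of the two steps above: strip known text, accept the entry only if what's left is purely numeric, then compare within the tolerance. What counts as "purely numeric" (optional sign, digits, optional decimal part) and the names `as_number` and `within_tolerance` are assumptions; the under-1-million check mirrors the note in Advanced Options. `Decimal` avoids binary floating-point surprises such as 12.35 - 12.34 not being exactly 0.01.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed definition of "purely numeric": optional sign, digits, optional decimals.
PURELY_NUMERIC = re.compile(r"[+-]?\d+(?:\.\d+)?")


def as_number(entry: str, strip: tuple[str, ...] = ()) -> Optional[Decimal]:
    """Remove each text in `strip`, then parse the rest if it is purely numeric."""
    for text in strip:
        entry = entry.replace(text, "")
    entry = entry.strip()
    if not PURELY_NUMERIC.fullmatch(entry):
        return None  # mixed text/number entries are never compared numerically
    value = Decimal(entry)
    return value if abs(value) < 1_000_000 else None  # mirrors the page's size note


def within_tolerance(a: str, b: str, tolerance: str, strip: tuple[str, ...] = ()) -> bool:
    x, y = as_number(a, strip), as_number(b, strip)
    if x is None or y is None:
        return False
    return abs(x - y) <= Decimal(tolerance)


print(within_tolerance("Price: $1234", "Price: $1235", "1", strip=("Price: $",)))  # True
print(within_tolerance("ID-12.34", "ID-12.35", "0.01"))                           # False
print(within_tolerance("ID-12.34", "ID-12.35", "0.01", strip=("ID-",)))           # True
```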

How can I compare financial data with currency symbols?

Add currency symbols ($, €, £, ¥) to the "Ignore Symbols" option and, if needed, set a small numeric tolerance to allow for slight variations.

How can I match financial data that includes both currency symbols and text?

For entries like "$5,000 Annual Fee" that contain both numeric and text elements:
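1. Use substitutions to strip the text (e.g., " Annual Fee" → "" (empty text))
2. Add $ to "Ignore Symbols"
3. If you want numeric tolerance to apply, also remove the thousands separator with a substitution such as "," → "" (empty text), so that "5,000" becomes "5000"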

When exactly does the numeric tolerance matching work?

Numeric tolerance applies only to records that are entirely numeric, with no text. For example, "12.34" and "12.35" can match with numeric tolerance, but "ID-12.34" and "ID-12.35" are not considered for numeric matching. To match them numerically, first add a substitution that replaces "ID-" with "" (empty text).

Can I match ranges of numeric values?

The tool doesn't support range matching directly, but you can approximate it with substitutions. For example, to treat prices in a range (like $10-15) as equal, create a substitution rule for each value in the range that maps it to a single label: "$10" → "price_tier_1", "$11" → "price_tier_1", etc.
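Typing one rule per value gets tedious, so a few lines of Python can generate the list to paste in. The `tier_rules` helper and the "$N" → label layout are illustrative assumptions, not a tool feature. One caution, if substitutions replace text anywhere inside a record (which the examples on this page suggest but don't confirm): the "$10" rule would also rewrite part of "$100", so check larger values before relying on the tiers.

```python
def tier_rules(start: int, stop: int, label: str) -> list[tuple[str, str]]:
    """One "$N" -> label rule for each whole-dollar value from start to stop, inclusive."""
    return [(f"${value}", label) for value in range(start, stop + 1)]


for old, new in tier_rules(10, 15, "price_tier_1"):
    print(f'"{old}" -> "{new}"')

# Pitfall with plain substring replacement: "$10" also matches inside "$100".
print("$100".replace("$10", "price_tier_1"))  # price_tier_10
```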

What's the best way to find names that sound similar but are spelled differently?

Use the "Phonetic Matching" option, which identifies entries that sound alike despite different spellings. This is excellent for finding matches like "Smith" and "Smyth" or "Catherine" and "Katherine" or "phone" and "fone".
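The page doesn't say which phonetic algorithm the tool uses. For illustration, here is the classic American Soundex in Python, one well-known way to build a "sounds like" key; treat it as a stand-in, not the tool's method. Soundex keeps the first letter, so it pairs "Smith" with "Smyth" but gives "Catherine" (C365) and "Katherine" (K365), or "phone" and "fone", different codes; algorithms such as Metaphone, which map "ph" to "f" and a hard "c" to "k", are designed for cases like those.

```python
# Soundex digit for each consonant; vowels, "y", "h", and "w" have no digit.
SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]
    for letter in letters
}


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, e.g. "Robert" -> "R163"."""
    letters = [ch for ch in word.lower() if "a" <= ch <= "z"]  # ignore non a-z characters
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "hw":  # "h"/"w" don't separate repeated digits; vowels do
            previous = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


for a, b in [("Smith", "Smyth"), ("Catherine", "Katherine"), ("phone", "fone")]:
    print(a, soundex(a), b, soundex(b), soundex(a) == soundex(b))
```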

What's the best way to handle text with accents and special characters?

You have several methods:
- Enable "Phonetic Matching" to match similar-sounding words regardless of accents
- Use "Approximate Matching" with a threshold of 1-2 to catch minor character differences
- Add specific substitutions for common accent variations (é → e, ñ → n)
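Instead of listing every accent rule by hand, you could pre-clean a file before uploading. This Python sketch uses the standard `unicodedata` module: NFKD normalization splits "é" into "e" plus a combining accent, which is then dropped. Letters without a decomposition, such as "ø" or "ß", pass through unchanged, so explicit substitutions are still needed for those. This is a preprocessing step you would run yourself, not something the tool is documented to do.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining accent marks, e.g. "é" -> "e", "ñ" -> "n"."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("José Muñoz, Crème Brûlée"))  # Jose Munoz, Creme Brulee
print(strip_accents("Søren Straße"))             # unchanged: Søren Straße
```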

I have product codes with minor variations I want to match. What's the best method?

For alphanumeric codes with slight differences (like "PRD-1234" and "PRD1234"), add "-" to the "Ignore Symbols" option.
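Conceptually, "Ignore Symbols" deletes each listed symbol before comparing. A minimal Python sketch, assuming the option is split on commas and each symbol is removed wherever it appears; `parse_symbols` and `remove_symbols` are hypothetical names. With this simple parser, a comma itself can't be listed as a symbol.

```python
def parse_symbols(option: str) -> list[str]:
    """Split a comma-separated option value such as "-, /, $" into symbols."""
    return [part.strip() for part in option.split(",") if part.strip()]


def remove_symbols(text: str, symbols: list[str]) -> str:
    """Delete every occurrence of each symbol."""
    for symbol in symbols:
        text = text.replace(symbol, "")
    return text


symbols = parse_symbols("-, /")
print(remove_symbols("PRD-1234", symbols) == remove_symbols("PRD1234", symbols))  # True
print(remove_symbols("PRD/12-34", symbols))  # PRD1234
```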

How can I match similar company names with different legal entities?

Use the substitutions feature to standardize company names:
- Add substitutions like "Ltd." → "Limited"
- "Corp." → "Corporation"
- "Inc." → "Incorporated"

How can I standardize different date formats before comparison?

Use substitutions to convert dates to a standard format:
- Add substitutions like "Jan" → "01", "Feb" → "02"
- Add date separators (such as "/" or "-") to "Ignore Symbols" so they are removed

Can I match specific columns from complex spreadsheets?

The tool compares data line by line, so prepare your data by extracting only the relevant column before uploading. For Excel files, copy the column you want to compare into a plain text file, one value per line.
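If the spreadsheet is saved as CSV first, a short Python script can write one column to a .txt file with one value per line. This sketch uses the standard `csv` module so quoted values containing commas, such as "Smith, John", stay intact; the header row, file paths, and the `extract_column` and `column_to_txt` helpers are assumptions for illustration.

```python
import csv
import io


def extract_column(csv_text: str, column: str) -> list[str]:
    """Return the values of one named column, using the first row as headers."""
    return [row[column] for row in csv.DictReader(io.StringIO(csv_text))]


def column_to_txt(csv_path: str, column: str, txt_path: str) -> None:
    """Write one value per line, ready to upload as File A or File B."""
    # "utf-8-sig" also strips the byte-order mark some spreadsheet exports add.
    with open(csv_path, newline="", encoding="utf-8-sig") as src:
        values = extract_column(src.read(), column)
    with open(txt_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(values) + "\n")


sample = 'name,email\n"Smith, John",john@example.com\nAnn Lee,ann@example.com\n'
print(extract_column(sample, "name"))  # ['Smith, John', 'Ann Lee']
```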

Can I create a custom "dictionary" of equivalent terms?

Yes! Use the "Substitutions" feature to build a custom dictionary. For example, add substitutions like "United States" → "USA", "Limited" → "Ltd", etc. The tool will substitute these terms before comparison.

How can I find entries that appear frequently in one list but rarely in another?

Check the "Top Frequent" tab in your results. This shows the most common entries in each list, helping you identify terms that appear disproportionately in one list versus the other.
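If you export both lists, you can also rank entries by how unevenly they are spread. This Python sketch uses `collections.Counter` and compares each entry's share of list A with its share of list B, since raw counts mislead when the lists differ in size; the scoring rule and the `frequency_skew` name are illustrative choices, not a description of how the "Top Frequent" tab is computed.

```python
from collections import Counter


def frequency_skew(list_a: list[str], list_b: list[str], top: int = 5) -> list[tuple[str, float]]:
    """Entries of A ranked by (share of A) minus (share of B), largest first."""
    count_a, count_b = Counter(list_a), Counter(list_b)
    size_a, size_b = max(len(list_a), 1), max(len(list_b), 1)  # avoid dividing by zero
    scores = {item: count_a[item] / size_a - count_b[item] / size_b for item in count_a}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top]


a = ["refund", "refund", "refund", "shipping"]
b = ["shipping", "shipping", "refund", "login"]
print(frequency_skew(a, b))  # [('refund', 0.5), ('shipping', -0.25)]
```

Swap the arguments to rank entries that are over-represented in list B.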

How can I identify and manage duplicate entries?

The tool provides two ways to do this:
- Check the "Duplicates" tab for each file separately
- Use the "Top Frequent" tab to identify commonly repeated values

Can I match records with transposed information?

For fields where the order of information might be swapped (like "John Smith" vs "Smith, John"), add a substitution rule that converts one variant into the other. For example, add "Smith, John" → "John Smith" to standardize the format.

Is there a way to identify near-duplicates within a single list?

Yes! Upload the same list as both File A and File B. The tool will identify entries that match with themselves (exact matches), while near-duplicates will appear in the Approximate, Phonetic, or Numeric tabs depending on the match type you choose.

What's the difference between the various export formats?

Each format has specific advantages:
- CSV: Universal compatibility and easy to open in any spreadsheet program
- XLSX: Maintains formatting and multiple tabs for different match types
- JSON: Ideal for developers or for importing results into other systems

How can I export just a specific type of match?

When viewing results, navigate to the specific tab with the match type you want (Exact, Approximate, Phonetic, etc.). Then click the export button in that tab to export only those specific matches.

How can I remove specific words or phrases from records instead of excluding whole records from my comparison?

Use the "Substitutions" feature to replace terms with an empty string. For example, add a substitution rule like "Confidential" → "" (empty) to remove this term from all records before comparison.

Can I use the tool to merge and deduplicate lists while preserving unique entries?

Yes! After comparison:
1. Export the "Exact Matches" to get entries that appear in both lists
2. Export "Unmatched A" to get entries unique to the first list
3. Export "Unmatched B" to get entries unique to the second list
Combining these three exports gives you a fully merged and deduplicated dataset.
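Concatenating the three exports and dropping any repeats yields the union of both lists. This Python sketch shows the idea on plain lists: `split_exact` is a hypothetical stand-in for the tool's exact-match step (it ignores preprocessing options), and `merge_exports` keeps the first occurrence of each entry in order, in case a list contained internal duplicates.

```python
def split_exact(list_a: list[str], list_b: list[str]) -> tuple[list[str], list[str], list[str]]:
    """Mimic three exports: entries in both lists, only in A, only in B."""
    set_a, set_b = set(list_a), set(list_b)
    both = [x for x in list_a if x in set_b]
    only_a = [x for x in list_a if x not in set_b]
    only_b = [x for x in list_b if x not in set_a]
    return both, only_a, only_b


def merge_exports(*exports: list[str]) -> list[str]:
    """Concatenate exports, keeping the first occurrence of each entry."""
    return list(dict.fromkeys(entry for export in exports for entry in export))


list_a = ["ann", "bob", "bob", "cy"]
list_b = ["bob", "dee"]
merged = merge_exports(*split_exact(list_a, list_b))
print(merged)  # ['bob', 'ann', 'cy', 'dee']
```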

Can I use the tool to deduplicate a single file while preserving the original data format?

Yes! Upload the same file as both File A and File B. The "Duplicates" tab will show you all entries that appear more than once in your file, along with their frequency counts. Export this list to identify and clean duplicates from your original file.

Is there a way to preserve the original row numbers or identifiers from my source data?

Before uploading, consider adding a unique identifier or row number to each entry (separated by a delimiter like a tab). After comparison, these identifiers will help you trace matches back to their original position in your source data.
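One caveat, assuming the tool compares whole lines: an identifier embedded in a line becomes part of the compared text, so "1<TAB>Ann Lee" and "7<TAB>Ann Lee" would no longer match exactly. An alternative is to keep the row numbers in a separate lookup and upload the untouched lines. The Python sketch below builds such a lookup; `row_index` is a hypothetical helper, and it only finds exported values that are identical to the source lines (after trimming, if you trim).

```python
from collections import defaultdict


def row_index(lines: list[str], trim: bool = True) -> dict[str, list[int]]:
    """Map each line's text to the 1-based row numbers where it appears."""
    index: dict[str, list[int]] = defaultdict(list)
    for row_number, line in enumerate(lines, start=1):
        index[line.strip() if trim else line].append(row_number)
    return dict(index)


source = ["Ann Lee", "Bob Ray ", "Ann Lee"]
lookup = row_index(source)
print(lookup["Ann Lee"])  # [1, 3]
print(lookup["Bob Ray"])  # [2]
```

For a real file, `open(path, encoding="utf-8").read().splitlines()` gives the list of lines to index.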

Can I use exclusions to filter out statistical noise from my comparisons?

Absolutely! Run an initial self-comparison (same file in both inputs) to identify very high-frequency terms using the "Top Frequent Items" tab. Add these common terms to your exclusion list, then run your actual comparison. This filters out "noise" terms that would otherwise dominate your results, making it easier to spot meaningful differences.

How can I clean up messy product or SKU data with the tool?

For product codes that might contain variations:
1. Upload your product list as both File A and File B
2. Enable "Approximate Matching" with threshold 1 or 2
3. Check results to identify likely duplicate or variant product codes

Can I use the tool for hierarchical matching of complex data?

Yes, with a multi-pass strategy:
1. First, run a comparison to find exact matches
2. Export these matches and remove them from your original files, or add them to the exclusion list
3. Run a second comparison with looser matching options (approximate, phonetic) on the remaining data
4. Continue with increasingly flexible matching options
This "waterfall" method gives you confidence in match quality while still finding less obvious matches.
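The waterfall idea can be sketched in Python as a series of passes from strictest to loosest, where each pass sees only entries that no earlier pass matched. The matchers here (exact, case-insensitive, dashes ignored) are simple stand-ins for the tool's options, and the greedy one-to-one pairing is an assumption of this sketch.

```python
from typing import Callable

Matcher = Callable[[str, str], bool]


def waterfall(list_a: list[str], list_b: list[str],
              passes: list[tuple[str, Matcher]]) -> dict[str, list]:
    """Run passes in order; each entry is matched at most once, by the strictest pass."""
    remaining_a, remaining_b = list(list_a), list(list_b)
    results: dict[str, list] = {}
    for name, is_match in passes:
        pairs = []
        for a in list(remaining_a):      # iterate over a copy while removing
            for b in remaining_b:
                if is_match(a, b):
                    pairs.append((a, b))
                    remaining_a.remove(a)
                    remaining_b.remove(b)
                    break                # stop scanning B right after the removal
        results[name] = pairs
    results["unmatched_a"], results["unmatched_b"] = remaining_a, remaining_b
    return results


PASSES: list[tuple[str, Matcher]] = [
    ("exact", lambda a, b: a == b),
    ("case_insensitive", lambda a, b: a.casefold() == b.casefold()),
    ("ignore_dashes", lambda a, b: a.casefold().replace("-", "") == b.casefold().replace("-", "")),
]

print(waterfall(["PRD-1234", "acme", "Zeta"], ["prd1234", "ACME", "Zeta", "Omega"], PASSES))
```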

What file types can I upload for comparison?

The comparison tool only supports plain text (.txt) files. Binary files or other non-text formats will not be accepted.

Do you store my uploaded files on your server?

No, we DO NOT store any of your uploaded data. All data is discarded once the comparison is complete. Your privacy and data security are important to us, so no information from your files is ever saved on our servers.

Got another question, custom comparison case, or any inquiry?

Contact me at admin at likegeeks dot com