
school_mapping.py #17

Open
wants to merge 1 commit into
base: main


Cybertechnnp

Description
This pull request adds a script (school_matching.py) to match schools from Source A to Source B using fuzzy matching of transliterated school names and district IDs.

Changes Made

  • Added school_matching.py which:
    • Loads school data from school_list_A.tsv and school_list_B.tsv.
    • Transliterates Devanagari text to Romanized text using the Velthuis method.
    • Matches schools based on transliterated names and district IDs using the RapidFuzz library.
    • Saves the matching results to school_mapping_results.csv.
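The matching step described above could be sketched roughly as follows. This is a minimal illustration, not the script from this PR: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for a RapidFuzz scorer so that it runs without extra dependencies, the record fields (`id`, `district_id`, `name`) are hypothetical, and the sample names are hand-romanized placeholders rather than output of the Velthuis transliteration.

```python
from difflib import SequenceMatcher

THRESHOLD = 70  # threshold stated in this PR, on a 0-100 scale


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score (difflib stand-in for a RapidFuzz scorer)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_schools(source_a, source_b, threshold=THRESHOLD):
    """For each school in A, pick the best-scoring school in B from the same district."""
    # Group Source B by district so only same-district candidates are compared.
    by_district = {}
    for school in source_b:
        by_district.setdefault(school["district_id"], []).append(school)

    results = []
    for a in source_a:
        best, best_score = None, 0.0
        for b in by_district.get(a["district_id"], []):
            score = similarity(a["name"], b["name"])
            if score > best_score:
                best, best_score = b, score
        if best is not None and best_score >= threshold:
            results.append((a["id"], best["id"], round(best_score, 1)))
        else:
            # No same-district candidate reached the threshold.
            results.append((a["id"], None, None))
    return results


# Hypothetical sample records (already romanized for illustration).
source_a = [
    {"id": "A1", "district_id": 1, "name": "shrii janata maavi"},
    {"id": "A2", "district_id": 3, "name": "saraswati praavi"},
]
source_b = [
    {"id": "B1", "district_id": 1, "name": "shri janata mavi"},
    {"id": "B2", "district_id": 2, "name": "shrii janata maavi"},
]

results = match_schools(source_a, source_b)
for row in results:
    print(row)
```

Grouping Source B by district first means an identical name in another district (like `B2` here) is never considered, which is the behaviour the description relies on.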

Assumptions

  • District mapping data is provided in jilla.tsv with Devanagari district names.
  • The fuzzy matching threshold is set to 70 on RapidFuzz's 0 to 100 similarity scale.
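As a rough sketch of the district mapping assumption, a tab-separated file can be read into a name-to-ID lookup with the standard `csv` module. The column names `district_id` and `district_name` and the two sample rows are assumptions for illustration; the actual layout of `jilla.tsv` is not shown in this PR.

```python
import csv
import io

# Hypothetical stand-in for the contents of jilla.tsv (column names are assumed).
SAMPLE_JILLA_TSV = "district_id\tdistrict_name\n1\tकाठमाडौं\n2\tललितपुर\n"


def load_district_ids(tsv_file):
    """Build a {Devanagari district name: district ID} lookup from a TSV file object."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return {row["district_name"].strip(): int(row["district_id"]) for row in reader}


district_ids = load_district_ids(io.StringIO(SAMPLE_JILLA_TSV))
print(district_ids)
```

With a real file, the same function could be called on `open("jilla.tsv", encoding="utf-8", newline="")`, assuming the columns match.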

This approach compares transliterated names only between schools in the same district, which is intended to reduce false matches between similarly named schools in different districts. Feedback and suggestions for improvement are welcome.

Contributor
This contribution was made by Bimal Bhandari.
