Ground Truth NLP Extraction
Impact: Clean data pipeline
Data Quality: Cleaned
Languages: Multi
Modules: Reusable

Overview
Developed an NLP pipeline to extract ground truth values from messy technician work records. The system cleans, translates, and processes text data to identify problem statements, work performed, and parts used.
The Challenge
Raw technician data was inconsistent, multilingual, and unstructured, making it impossible to use for training ML models or generating insights.
The Solution
Built an NLP pipeline that uses the Google Translate API for language normalization and TextBlob and NLTK for text processing, packaged as reusable object-oriented modules for similar use cases.
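The modular design described above could look roughly like the sketch below. It is an illustration only, not the project's actual code: the class names (TextStep, CleanStep, TranslateStep, Pipeline) are hypothetical, and the translation call is replaced by a tiny stand-in glossary function so the example runs without network access or API credentials. In a real setup, the injected callable would wrap the Google Translate API, and further steps would wrap TextBlob or NLTK.

```python
import re
from abc import ABC, abstractmethod
from typing import Callable, Iterable


class TextStep(ABC):
    """One reusable pipeline stage (hypothetical interface)."""

    @abstractmethod
    def apply(self, text: str) -> str:
        ...


class CleanStep(TextStep):
    """Drop stray symbols and collapse runs of whitespace."""

    def apply(self, text: str) -> str:
        # Keep word characters, whitespace, and a few punctuation marks.
        text = re.sub(r"[^\w\s.,;:/#-]", " ", text)
        return re.sub(r"\s+", " ", text).strip()


class TranslateStep(TextStep):
    """Normalize language through an injected translator callable."""

    def __init__(self, translate: Callable[[str], str]):
        # Assumption: the caller supplies a function that wraps the
        # real translation service.
        self.translate = translate

    def apply(self, text: str) -> str:
        return self.translate(text)


class Pipeline:
    """Run each step in order, feeding the output of one into the next."""

    def __init__(self, steps: Iterable[TextStep]):
        self.steps = list(steps)

    def run(self, text: str) -> str:
        for step in self.steps:
            text = step.apply(text)
        return text


def fake_translate(text: str) -> str:
    """Stand-in for a translation API call; uses a made-up glossary."""
    glossary = {"reemplazado": "replaced", "filtro": "filter"}
    return " ".join(glossary.get(word.lower(), word) for word in text.split())


pipeline = Pipeline([CleanStep(), TranslateStep(fake_translate)])
result = pipeline.run("  Reemplazado   filtro!!  ")
print(result)
```

Because each stage implements the same small interface, a new use case can reuse the Pipeline class and swap in different steps without touching the others.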
Key Results
Extracted usable rows from raw unstructured data
Identified part combinations and work sequences
Created reusable NLP modules for similar scenarios
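One simple way to surface part combinations, once part names have been extracted per work record, is to count how often pairs of parts appear together. The sketch below assumes each record is already a list of part names; the function name count_part_pairs and the sample records are made up for illustration and are not taken from the project data.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def count_part_pairs(records: Iterable[List[str]]) -> Counter:
    """Count how often each unordered pair of parts shares a record."""
    pair_counts: Counter = Counter()
    for parts in records:
        # Normalize case and whitespace, and ignore duplicates in a record.
        unique = sorted({p.strip().lower() for p in parts if p.strip()})
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical sample records, for illustration only.
records = [
    ["Filter", "Gasket"],
    ["gasket", "filter", "Belt"],
    ["Belt"],
]
counts = count_part_pairs(records)
print(counts.most_common(3))
```

Sorting the part names before pairing means ("filter", "gasket") and ("gasket", "filter") are counted as the same combination.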
Tech Stack
Google Translate API, TextBlob, NLTK, Python, Pandas, OOP Design Patterns