Innodata Inc.
Innodata solves your toughest data engineering challenges using artificial intelligence and human expertise.
Senior Language Data Scientist – Search Specialization
Location
New Jersey
Posted
11 days ago
Salary
Not specified
Postgraduate Degree, 5 yrs exp, English, Pandas, Python
Job Description
• Lead long-term projects with high complexity and ambiguity, from the first discussion with the client through to completion.
• Design and improve workflows to create data for AI/ML training and evaluation.
• Design and refine search data annotation frameworks, including relevance judging guidelines.
• Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers.
• Assess and optimize search-specific evaluation approaches, including A/B testing frameworks, ranking metrics, and human evaluation studies for search result quality.
• Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance.
• Work closely with client stakeholders to understand goals, gather requirements, propose solutions, and execute them.
• Contribute to establishing best practices and standards for generative AI development with customers and within the organization.
Job Requirements
- MA in (computational) linguistics, data science, computer science (AI / ML / NLU), quantitative social sciences, or a related scientific or quantitative field; PhD strongly preferred.
- Ability to collaborate directly with technical stakeholders including senior project managers, data engineers, and research scientists.
- Extensive experience working with search-specific language data (queries, documents, relevance judgments, intent labels) and designing human evaluation tasks, including multi-phase and complex workflows.
- Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.
- Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.
- Proficiency in Python to handle and transform large datasets (e.g., pre- and postprocessing data with pandas), perform quantitative analyses, and visualize data (e.g., with matplotlib or seaborn).
- Deep understanding of data pipelines to support ML and NLP workflows.
- Knowledge of efficient data collection, transformation, and storage.
- Knowledge of data structures, algorithms, and data engineering principles.
Benefits
- Health insurance
- 401(k) matching
- Flexible work hours
- Professional development opportunities