Innodata Inc.

Innodata solves your toughest data engineering challenges using artificial intelligence and human expertise.

Senior Language Data Scientist – Search Specialization

Full Time | Remote | Team 1,001-5,000 | H1B No Sponsor

Location

New Jersey

Posted

11 days ago

Salary

Not specified

Postgraduate Degree | 5 yrs exp | English | Pandas | Python

Job Description

  • Lead long-term projects with high complexity and ambiguity from first discussion with the client to completion.
  • Design/improve workflows to create data for AI/ML training and evaluation.
  • Design and refine search data annotation frameworks, including relevance judging guidelines.
  • Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers.
  • Assess and optimize search-specific evaluation approaches, including A/B testing frameworks, ranking metrics, and human evaluation studies for search result quality.
  • Quantitatively analyze large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance.
  • Work closely with client stakeholders on understanding goals, gathering requirements, proposing solutions, and executing them.
  • Contribute to establishing best practices and standards for generative AI development with customers and within the organization.

Job Requirements

  • MA in (computational) linguistics, data science, computer science (AI / ML / NLU), quantitative social sciences, or a related scientific / quantitative field; PhD strongly preferred.
  • Ability to collaborate directly with technical stakeholders including senior project managers, data engineers, and research scientists.
  • Extensive experience working with search-specific language data (queries, documents, relevance judgments, intent labels) and designing human evaluation tasks, including multi-phase and complex workflows.
  • Advanced knowledge of statistics, metrics (e.g. F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.
  • Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.
  • Proficiency in Python to handle / transform large datasets (e.g. pre- and postprocessing data with pandas), perform quantitative analyses, and visualize data (e.g. matplotlib, seaborn).
  • Deep understanding of data pipelines to support ML and NLP workflows.
  • Knowledge of efficient data collection, transformation, and storage.
  • Knowledge of data structures, algorithms, and data engineering principles.

Benefits

  • Health insurance
  • 401(k) matching
  • Flexible work hours
  • Professional development opportunities
