Data Scientist, India

Data Scientist

Country : India

City : New Delhi

Type : Full Time

Program (Division) : Country Programs – India


The Clinton Health Access Initiative, Inc. (CHAI) is a global health organization committed to saving lives and reducing the burden of disease in low- and middle-income countries, while strengthening the capabilities of governments and the private sector in those countries to create and sustain high-quality health systems that can succeed without our assistance. For more information, please visit:

CHAI, together with its India affiliate William J Clinton Foundation (WJCF), works in close partnership with and under the guidance of the Ministry of Health and Family Welfare (MoHFW) at the central and state levels on an array of high-priority initiatives aimed at improving health outcomes. Currently, CHAI/WJCF works across projects to expand access to quality care and treatment for HIV/AIDS, hepatitis C, tuberculosis, and cancer, and to expand access to immunization.


The world has not experienced a pandemic at the scale or pace of the current COVID-19 pandemic; the virus has ferociously attacked high- and low-income countries alike, and in so doing has exposed inequities in access to quality care within and across countries. Despite a proactive mitigation and stringent suppression strategy, the number of COVID-19 cases in India continues to rise. While the indirect impacts of COVID-19 on the availability and utilization of routine health services cannot yet be known, lessons from previous crises suggest that service disruption and displacement will have a grave impact on population health. To address these challenges, a robust response based on a strengthened health workforce, resilient supply chains and a rapid increase in diagnostic capacity is being spearheaded by the Ministry of Health and the Indian Council of Medical Research (ICMR).

Program Overview

WJCF, in partnership with its affiliate CHAI, is supporting the COVID-19 response in the areas of testing policy, strengthened data and delivery systems, and development of guidelines. The programme has been supporting the ICMR in scaling up testing, providing data-driven insights to support supply chain management including forecasting and inventory management, and helping build guidelines and Standard Operating Procedures (SOPs) for downstream distribution of commodities. Separately, the programme also supports selected states on localized strategies for testing scale-up and equitable access, stronger engagement with the private sector, improved treatment readiness, robust awareness campaigns and capacity-building initiatives, and analytics-based programme management.


We seek a highly motivated individual with demonstrated analytical abilities for the role of Data Scientist, based in Delhi. The Data Scientist will analyse programme data and support government stakeholders in scientific research. The Data Scientist will also be responsible for synthesizing data from multiple datasets and creating frameworks for analysing programme data that can inform strategic programmatic decisions, and for proposing additional research and programme management initiatives supported by data and data collection exercises.

A strong profile will include outstanding credentials, analytical ability, and communication skills. The candidate must be self-driven, entrepreneurial, adaptable, and have a high level of comfort with ambiguity. The candidate must be able to function independently and flexibly with a strong commitment to excellence. We place great value on qualities such as resourcefulness, responsibility, tenacity, independence, energy, and work ethic.


Support evaluation of data and data systems for research enhancement and research management; iteratively review data and data acquisition systems along with published and ongoing research on Covid-19 and recommend exploration of additional areas of research.

Product management of a large reporting portal (software engineering experience required); recommend and manage program and governance metrics, as well as additional features, reports and functionalities, to inform programmatic decision-making. Coordinate with the software development team to enhance or add features and manage change requests.

Proactively develop guidelines and policy for data management, audit mechanisms, best practices and data sharing, including relevant methodology for data sharing (HIEs, PII-removed data, REST APIs, etc., depending on context).

Develop and validate multiple hypotheses using relevant statistical tests to answer programmatic questions and test management frameworks, distributions and anomalies by performing statistical tests of significance (e.g. Chi-square tests, t-tests, correlation matrices, LDA, ANOVA/MANOVA), using templates that can be applied to multiple datasets with minimum rework.

Coordinate a team of software engineers and assist with UI best practices. Recommend UI development strategies for information sharing on the reporting portal by developing wireframes, functionalities and coordinating development and UAT releases.

Analyse reporting needs and requirements, assess current reporting in the context of strategic goals, and devise plans for delivering the most appropriate reporting solutions to users.

Support capacity building of key government functionaries on relevant competencies for sustained ownership and delivery.

Perform other responsibilities as requested by program leadership.


Master’s/Bachelor’s degree in engineering and/or related fields (courses with linear algebra, calculus, computer programming) with 5+ years’ work experience

Experience of integrating ML projects into software systems

Proficient in SQL:

Ability to write production-grade queries (including nested queries)

Understanding of stored procedures, partitioning of tables, AD security, and indexes

Proficient in R/Python:

Familiarity with control structures, loops, OOP principles



Understanding of notebooks

Familiarity with data structures and algorithms – Intermediate

Understanding of text operations including regular expressions, string processing

Proficient in machine learning techniques

Familiarity with Linear and Logistic Regressions

Basic experience with tree and tree ensemble learning techniques (Bagging and Boosting)

Familiarity with unsupervised learning techniques (kNN, cosine similarity, hierarchical clustering)

Proficient in NLP (Named Entity Extraction, POS Tagging, Cosine Similarity, sequence networks such as RNNs (Recurrent Neural Networks), BERT, Word2Vec)

Experience with distributed systems such as Hadoop / Spark (PySpark / SparkR)

Experience in open source BI tools such as Metabase and D3

Entrepreneurial mindset, including ability to work independently, self-motivate, and propose and implement new initiatives

Exceptional communication (written and verbal) skills and stakeholder management capabilities

Ability to think strategically, handle ambiguity, and problem solve in a fast-paced, limited-structure, multicultural environment

Ability to be effective in high-pressure situations, handle multiple tasks simultaneously, and set priorities

Ability to absorb and synthesize a broad range of information, especially clinical, scientific and technical manufacturing information

