2. Kaggle 2017 US Data Scientist Sample
The sample includes 4197 data scientists who identified the US as their country.
The factors examined focus on gender, machine learning knowledge, education, and job title.
Visual explorations and a series of machine learning models were used to explore how these factors
impact compensation levels.
Of these respondents, 1544 provided compensation data, allowing salaries to be compared across
demographic factors; this subsample was examined through machine learning models.
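The subsample selection described above can be sketched as a two-step filter. This is a minimal illustration, not the analysis code behind these slides: the field names `country` and `compensation` and the toy records are hypothetical stand-ins for the survey columns.

```python
# Hypothetical survey records; field names and values are illustrative only.
responses = [
    {"country": "United States", "compensation": 95000.0},
    {"country": "United States", "compensation": None},
    {"country": "India", "compensation": 20000.0},
    {"country": "United States", "compensation": 120000.0},
]


def us_sample(rows):
    """Keep respondents who identified the US as their country."""
    return [r for r in rows if r["country"] == "United States"]


def with_compensation(rows):
    """Keep respondents who reported a compensation amount."""
    return [r for r in rows if r["compensation"] is not None]


us_rows = us_sample(responses)          # analogous to the full US sample
paid_rows = with_compensation(us_rows)  # analogous to the compensation subsample
print(len(us_rows), len(paid_rows))
```

Keeping the two filters separate makes it easy to report both sample sizes, as the slide does.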
3. Demographics of US Data Scientists
US data scientists tend to be male and to have been in the field for less than 5
years, though some have been in the field for more than 10 years.
Very few data scientists in the US identify as LGBTQ, despite
increasing levels of openness about this identity.
4. Education of US Data Scientists
Most data scientists in the US (67%) have advanced education.
Common majors include math/stat, engineering, and computer science, though other
sciences are well-represented.
Many data scientists come from well-educated families, where parents have obtained
at least a Bachelor’s degree; 45% come from families where a parent holds a Master’s degree or higher.
5. Importance of Different Factors in Job Considerations
Diversity is not as important a consideration as language used, salary offered, impact
potential, and job industry.
6. Allocation of Time on Data Science Projects
A lot of time is spent on gathering data, and this is a potential bottleneck in data
science projects.
7. Education and Machine Learning Knowledge
Those who are able to innovate new algorithms place the highest relative value on education; they comprise
12% of the US data scientist population.
Those who know how to run code or tune parameters place the lowest relative value on education and comprise
19% of data scientists.
About 40% can explain an algorithm to someone without technical knowledge, a crucial skill in data
science positions.
8. Skill Disparity between Male and Female Data Scientists
Males are more likely than females to be able to innovate (13% vs. 9%). They
are also more likely to be able to make code faster or write it from scratch (31% vs. 23%).
Females are more likely to only have enough knowledge to tune parameters or
run a library (25% vs. 17%).
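Percentages like those above are within-group shares: the fraction of each gender that falls into each skill level. A minimal sketch of that tabulation follows; the field names `gender` and `skill` and the toy rows are hypothetical, not the survey data.

```python
from collections import Counter


def skill_shares(rows, group_key, skill_key):
    """Return {group: {skill: share}}; shares sum to 1 within each group."""
    counts = {}
    for row in rows:
        counts.setdefault(row[group_key], Counter())[row[skill_key]] += 1
    return {
        group: {skill: n / sum(c.values()) for skill, n in c.items()}
        for group, c in counts.items()
    }


# Toy rows for illustration only.
toy_rows = [
    {"gender": "male", "skill": "innovate"},
    {"gender": "male", "skill": "run a library"},
    {"gender": "male", "skill": "run a library"},
    {"gender": "male", "skill": "run a library"},
    {"gender": "female", "skill": "run a library"},
    {"gender": "female", "skill": "explain"},
]

shares = skill_shares(toy_rows, "gender", "skill")
print(shares["male"]["innovate"], shares["female"]["explain"])
```

Normalizing within each group matters here because the groups differ greatly in size; raw counts would mostly reflect that imbalance.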
9. Titles and Skills
Data scientist is the most common title (38%), but data scientists account for
only 29% of those who can innovate.
Researchers make up only 19% of titles but 40% of
those who can innovate.
Analysts make up 17% of titles but only 3% of those who can
innovate algorithms and only 9% of those who can explain the
algorithms to someone non-technical.
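One way to read these figures is as a representation ratio: a title's share of innovators divided by its share of all titles. The sketch below uses only the percentages reported on this slide; the function name `representation_ratio` is introduced here for illustration.

```python
# Percentages from the slide: (share of all titles, share of innovators).
title_shares = {
    "Data scientist": (38, 29),
    "Researcher": (19, 40),
    "Analyst": (17, 3),
}


def representation_ratio(share_of_titles, share_of_innovators):
    """A ratio above 1 means the title is over-represented among innovators."""
    return share_of_innovators / share_of_titles


ratios = {t: round(representation_ratio(*s), 2) for t, s in title_shares.items()}
print(ratios)
# {'Data scientist': 0.76, 'Researcher': 2.11, 'Analyst': 0.18}
```

Researchers appear among innovators at roughly twice their share of titles, while analysts appear at under a fifth of theirs.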
10. Education and Skills
Many more doctoral-level data scientists are able to innovate (24%) than bachelor-level (6%)
or master-level data scientists (9%).
Bachelor-level data scientists are more likely to only know how to run a library (16%)
than master-level (9%) or doctoral-level (5%) data scientists.
12. Compensation by Education and Gender
Finishing college is essential. A professional or doctoral degree is worth the time and
effort, as well.
13. Gender Compensation Disparities and Compensation by Fields of Study
Females earn considerably less than males and LGBTQ individuals.
Engineering majors earn the most, while humanities majors earn the least.
IT folks tend to earn less than those in the fields of
math/physics/engineering/computer science.
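Group comparisons like these can be sketched as a per-group summary of compensation. The slides do not state which summary statistic was used; the median is an assumption chosen here because it is robust to a few very high salaries. The field names and toy values are hypothetical.

```python
from statistics import median


def median_by_group(rows, group_key, value_key):
    """Return {group: median of value_key} over the given rows."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {g: median(values) for g, values in groups.items()}


# Toy rows for illustration only; not the survey's figures.
toy_pay = [
    {"major": "Engineering", "compensation": 110000},
    {"major": "Engineering", "compensation": 130000},
    {"major": "Humanities", "compensation": 70000},
    {"major": "Humanities", "compensation": 80000},
    {"major": "Humanities", "compensation": 90000},
]

medians = median_by_group(toy_pay, "major", "compensation")
print(medians)
```

The same helper works for gender or any other grouping field by changing `group_key`.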
14. Predictive Modeling of Compensation
Analyses were performed on the 1522 data scientists who provided
compensation information along with all predictors; 22
individuals were missing predictor information.
Several models were run to predict compensation using a
Tweedie distribution: random forest, conditional inference
trees, LASSO, extreme learning machines, evolved trees,
and MARS.
All models yielded similar performance (roughly 3-10% of variance
accounted for).
Age, tenure, and industry were the strongest predictors of
compensation.
Major, gender, education, and algorithm understanding
level also played a minor role in compensation.
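The two quantities mentioned on this slide can be illustrated with a small sketch. The slides do not say how "variance accounted for" was computed; the ordinary R-squared below is one common choice and is an assumption, as is the Tweedie power of 1.5. The deviance formula is the standard one for powers strictly between 1 and 2.

```python
def r_squared(y_true, y_pred):
    """Fraction of variance in y_true accounted for by y_pred."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def mean_tweedie_deviance(y_true, y_pred, power=1.5):
    """Mean Tweedie deviance for 1 < power < 2.

    Requires y_true >= 0 and y_pred > 0. The default power is an
    assumption for illustration, not a value taken from the slides.
    """
    p = power
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        total += 2.0 * (
            y ** (2 - p) / ((1 - p) * (2 - p))
            - y * mu ** (1 - p) / (1 - p)
            + mu ** (2 - p) / (2 - p)
        )
    return total / len(y_true)


# Toy values for illustration only.
actual = [1.0, 2.0, 3.0]
print(r_squared(actual, [2.0, 2.0, 2.0]))  # predicting the mean explains nothing
print(mean_tweedie_deviance(actual, [1.5, 2.0, 2.5]))
```

An R-squared in the 0.03 to 0.10 range, as reported here, means predictions are only slightly better than always predicting the mean.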
15. Conclusions
Skills vary widely according to education, gender, and role.
Different skills are associated with different pay, as well as with different valuations of education
as a path to data science.
Tenure, age, and industry play a large role in compensation, but these factors are difficult to
change for data scientists who are entering the field or studying at university.
Addressing the educational and gender disparities in skill level may be a way to even the
playing field by equipping new data scientists with the most valuable skills and knowledge
levels sought in the field.