Exploration of Nobel Prizes awarded between 1901 and 2016
This project is maintained by lav30
This image provides a high-level overview of the dataset, which has 969 observations with both numerical and categorical features. The pie charts show the distribution of individual features, and the stacked bar graph displays the prize share in each category.
Data exploration through visualization often reveals interesting insights into a dataset. This visual exploration project focuses on Nobel Prizes awarded over a period of more than a hundred years. Bar graphs, scatter plots, and histograms give a detailed and clear picture of the awardees' backgrounds. Several columns contain missing values, and these too can be visualized with Python modules. Further study of the dataset reveals the gender disparity between male and female laureates, and visualizations by age, gender, and category give a comprehensive view of the dataset.
The graphs below illustrate the number of missing values in each column.
This graph shows the number of missing values in each column (feature).
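The blank lines inside the matrix correspond to missing values.
The heatmap visualizes the nullity correlation between columns, that is, how strongly the presence or absence of a value in one column is related to the presence or absence of a value in another.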
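The following is a minimal pandas sketch of the numbers behind these three missing-value plots. It runs on a small hypothetical table, and the column names are assumed rather than taken from the project. The bar chart plots per-column counts, and the heatmap plots the correlation between the columns' missing/present indicators. The bar, matrix, and heatmap trio suggests the plots were drawn with the missingno library, though this is an assumption. If so, `msno.bar`, `msno.matrix`, and `msno.heatmap` would draw them from the same DataFrame.

```python
import pandas as pd

# Hypothetical toy rows standing in for the laureate data; the column
# names and values are illustrative assumptions, not real records.
df = pd.DataFrame({
    "Year": [1901, 1901, 1950, 1990, 2016],
    "Birth City": ["Paris", None, "New York, NY", None, "Tokyo"],
    "Death Date": ["1930-01-01", None, None, None, None],
    "Organization Name": ["Org A", None, "Org B", None, "Org C"],
})

# Per-column count of missing values (what a missing-value bar chart plots).
missing_counts = df.isna().sum()

# Nullity correlation: correlate the 0/1 "is missing" indicators.
# Columns that are never (or always) missing have zero variance, so their
# correlation is undefined and they are dropped first.
nullity = df.isna().astype(float)
varying = nullity.loc[:, nullity.std() > 0]
nullity_corr = varying.corr()

print(missing_counts)
print(nullity_corr.round(2))
```

Here `Birth City` and `Organization Name` are missing in exactly the same rows, so their nullity correlation is 1.0.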
The number of Nobel Prizes awarded has increased over the years. This increase is due to the introduction of the Economics Prize, first awarded in 1969, and to the growing number of laureates who share a prize each year.
This graph displays the number of prizes awarded in each 20-year period.
This graph displays the number of prizes awarded in each ten-year period, by category. Chemistry and Medicine have the most prize shares.
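Grouping award years into fixed 20-year periods amounts to a floor division on the year. The following stdlib sketch makes two assumptions: periods are aligned to 1900 (1900-1919, 1920-1939, ...), and there is one row per laureate, so a shared prize counts once per recipient. The years are hypothetical.

```python
from collections import Counter

WIDTH = 20      # period length in years
ORIGIN = 1900   # assumed alignment: 1900-1919, 1920-1939, ...

def period_start(year: int) -> int:
    """Return the first year of the WIDTH-year period that contains `year`."""
    return ORIGIN + (year - ORIGIN) // WIDTH * WIDTH

# Hypothetical award years, one entry per laureate (a prize shared by
# three people contributes three entries). Not real data.
years = [1901, 1901, 1915, 1920, 1938, 1969, 1969, 1999, 2001, 2016]

counts = Counter(period_start(y) for y in years)
for start in sorted(counts):
    print(f"{start}-{start + WIDTH - 1}: {counts[start]}")
```

Periods with no awards (1940-1959 in this toy data) are simply absent from the printed output.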
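A per-decade breakdown by category is a cross-tabulation of decade against category. The following is a minimal pandas sketch on hypothetical rows, assuming `Year` and `Category` columns with one row per laureate.

```python
import pandas as pd

# Hypothetical rows, one per laureate; column names are assumptions.
df = pd.DataFrame({
    "Year": [1901, 1905, 1911, 1962, 1965, 1969, 1969],
    "Category": ["Chemistry", "Medicine", "Chemistry",
                 "Medicine", "Medicine", "Economics", "Economics"],
})

# Map each year to the start of its decade (1901 -> 1900, 1969 -> 1960).
df["Decade"] = df["Year"] // 10 * 10

# Rows: decades; columns: categories; cells: number of laureates.
per_decade = pd.crosstab(df["Decade"], df["Category"])
print(per_decade)
```

With matplotlib installed, `per_decade.plot(kind="bar", stacked=True)` would draw a stacked bar chart from this table.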
This graph shows a kernel density estimate (KDE), which is used to visualize the distribution of observations of a particular feature. The solid line is the KDE, a smoothed estimate of the probability density of the ‘Age’ feature, and the histogram shows the observed distribution of ages. Together they approximate the underlying probability density function of laureates' ages.
This graph can be extended to show the age distribution within each category.
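The KDE can be made concrete with a formula. The estimate at a point x is the average of Gaussian kernels centred on each observation: f(x) = 1/(n*h) * sum over i of K((x - x_i) / h), where h is the bandwidth. The stdlib sketch below picks h with the normal-reference rule of thumb h = 1.06 * sigma * n^(-1/5). Plotting libraries such as seaborn compute a similar estimate, but their default bandwidth rule may differ. The ages are hypothetical.

```python
import math
from statistics import stdev

def gaussian_kde(samples, bandwidth=None):
    """Return a function that estimates the probability density of `samples`."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5).
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of normalized Gaussian bumps, one centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Hypothetical ages at award, for illustration only.
ages = [35, 48, 52, 55, 60, 61, 67, 72, 80]
density = gaussian_kde(ages)
for age in (30, 50, 60, 70, 90):
    print(age, round(density(age), 4))
```

The resulting density integrates to 1, which is why it can be drawn on the same density-scaled axes as a normalized histogram.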
Scatterplot with varying point sizes and hues. Each data point represents an awardee, positioned by age and year of award. The size of the point indicates the fraction of the prize the awardee received.
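Each point needs three values: the award year, the laureate's age, and a marker size proportional to the prize share. The following sketch shows that mapping. It assumes the share is stored as a string such as "1/2" and approximates age as award year minus birth year, on the assumption that the exact award date is not available. The records and `BASE_SIZE` are hypothetical.

```python
from datetime import date
from fractions import Fraction

BASE_SIZE = 200.0  # hypothetical marker area for a laureate with a full (1/1) share

# Hypothetical records; the field names and "1/2"-style share strings are
# assumptions about the dataset's format, not real laureates.
laureates = [
    {"year": 1903, "birth_date": "1870-05-01", "share": "1/4"},
    {"year": 1950, "birth_date": "1890-10-20", "share": "1/1"},
    {"year": 2010, "birth_date": "1960-02-15", "share": "1/2"},
]

def to_point(record):
    """Map a laureate record to (award year, approximate age, marker size)."""
    born = date.fromisoformat(record["birth_date"])
    # Approximate age: award year minus birth year (exact award date unknown).
    age = record["year"] - born.year
    share = Fraction(record["share"])
    return record["year"], age, float(share) * BASE_SIZE

points = [to_point(r) for r in laureates]
print(points)
```

With matplotlib, these tuples could feed `plt.scatter(x, y, s=sizes)`. There, `s` is the marker area in points squared, so a laureate with a quarter share gets a marker a quarter the area of a sole winner's.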
Scatterplot with varying point sizes and hues.
Note the disparity between the numbers of men and women awarded over the years.
This graph extends plot 5 above, displaying the probability distribution of the ‘Age’ feature for each category. The density curve, highlighted in white, closely follows the shape of the histogram, which shows the number of laureates in each age bin.
Number of awards in each category displayed on a logarithmic scale.
This plot is an extension of Figure 6 with the axes reversed. In both graphs, each point represents a Nobel Prize laureate, positioned by age and year of award.
These plots display the age at which individuals received the Nobel Prize over the years.
These charts display the 20 countries, cities, and organizations with the most Nobel Prizes.
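Each top-20 chart reduces to a frequency count over one column. The following is a minimal pandas sketch using a hypothetical `top_n` helper and hypothetical data. In the real dataset it would be applied to the birth-country, birth-city, and organization columns, whose exact names are not given here.

```python
import pandas as pd

def top_n(values: pd.Series, n: int = 20) -> pd.Series:
    """Return the n most frequent values and their counts, most frequent first.

    value_counts() skips missing entries by default, so rows with an unknown
    country, city, or organization are not counted.
    """
    return values.value_counts().head(n)

# Hypothetical birth countries, one per laureate; not real data.
birth_country = pd.Series(
    ["United States"] * 4 + ["United Kingdom"] * 2 + ["Germany", None]
)
print(top_n(birth_country, n=2))
```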
Based on the visual analyses above, the most favorable path to becoming a Nobel Prize laureate is to be a man from the United States, born in New York City, who studied or worked in the University of California system or at Harvard University.