1. Why do we care about data visualization?
We're pretty good at recording data.
Too good, probably.
- On an average day, Twitter generates about 500 million tweets. That's about 15 times the number of books in the Library of Congress.
- In August 2010, Eric Schmidt, former CEO of Google, said that between the beginning of time and 2003, humanity generated about 5 exabytes (5 x 10^18 bytes) of data.
- Now, we generate that much data roughly every 2 days.
These mountains of data add up. We record much more data than we can possibly analyze 1:1 in a lifetime.
Even when we aggregate data, it's not always clear what's going on.
And this is sort of a bummer, because
Data is only valuable when it's understood.
Even if you ignore the problems that truly big data presents for our understanding, we still have a very strong case for the importance of Data Visualization:
A: Some of our questions have conflicting answers.
- It is a foregone conclusion that someone selling you something is biased toward their product.
- The climate of most commonly-shared news sources online is one of commodified viewership. Your attention is what's being bought.
- In the name of persuasion, some sources are simply unreliable -- they require analysis.
- Data lets us explore, lets us fact-check. Visualization of this data lets us do that in an accessible way.
Data makes critical thinkers of us.
B: Sometimes we ask the wrong questions.
- We might ask "How many votes did the candidate receive?"
- We probably mean "How many votes did the candidate receive, relative to Other Candidate?"
- Or, if we're trend-watching, "relative to their total from the last election", etc.
Data answers the questions we didn't know we had.
C: Sometimes the answer to a simple question is unsatisfying.
D: Sometimes there are many distinct narratives to explore.
Data makes critical thinkers of us all
Anscombe's Quartet
4 Datasets with nearly-identical statistical properties
Plotted, the four datasets could not look more distinct from one another.
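The "nearly-identical statistical properties" claim can be checked directly. The sketch below uses the published values of Anscombe's quartet; the helper names (mean, correlation, summary) are chosen here for illustration. It computes the mean of x, the mean of y, and the Pearson correlation for each dataset.

```javascript
// Anscombe's quartet: the first three datasets share the same x values.
const x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5];
const quartet = [
  { x: x123, y: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68] },
  { x: x123, y: [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74] },
  { x: x123, y: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73] },
  { x: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    y: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89] }
];

// Arithmetic mean of an array of numbers.
const mean = a => a.reduce((sum, v) => sum + v, 0) / a.length;

// Pearson correlation coefficient between two equal-length arrays.
function correlation(xs, ys) {
  const mx = mean(xs), my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < xs.length; i++) {
    sxy += (xs[i] - mx) * (ys[i] - my);
    sxx += (xs[i] - mx) ** 2;
    syy += (ys[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// One summary row per dataset, rounded to two decimals for display.
const summary = quartet.map(d => ({
  meanX: mean(d.x).toFixed(2),
  meanY: mean(d.y).toFixed(2),
  r: correlation(d.x, d.y).toFixed(2)
}));
console.table(summary);
```

Every row should come out at roughly the same mean x (9), mean y (7.5), and correlation (about 0.82), even though the four scatter plots look nothing alike.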
Statistical Detection of ... ?
source
Statistical Detection of Election Fraud
source
Visualization lends meaning to raw data.
2. Different tools for different jobs
In terms of static data visualization, we've been working at this for a long time. From 1910:
Good data design adheres to the properties of the visual encodings it uses
- Certain types of visualizations are well-suited to particular data types:
- Bar charts are boring, but well-suited to allow us to quickly determine differences in quantity:
- Sometimes pie graphs are okay for this too, but be warned: at a glance, people don't see area, they see height.
- Treemaps show quantity, plus a sense of hierarchy
- Maps provide several ways to show physical location.
- Maps are particularly versatile because of our common understanding of their expected shape; differences from our expectations let us see a different kind of story:
- This cartogram shows projected population by 2300. It only works because it distorts the shape of the world we already know.
- Time-Series charts show how quantitative values change over time:
- Scatter plots show us the differences between things in two dimensions
- Network graphs show us the relationships between things - for example, this map of all of the followers of @CollideHalifax
3. Some common problems to avoid
So, when I see a Visualization, can I be sure that the Data Story I'm being told is correct?
- Not always! Be critical.
- The 3 kinds of lies, as popularized by Mark Twain:
"Lies, Damned Lies, and Statistics".
Avoid these common deceptions when designing data stories:
An aside: "The Map is not the Territory" -Alfred Korzybski
B: Cumulative Time-Series Charts
C: Circles in general.
We're meant to see area; we commonly just see height.
D: Pie charts when dealing with close values
(Bar charts are boring but we're really really good at reading them)
E: Bar charts with arbitrary axis points
F: Seeing correlations when they're coincidental
source
4. Notable Interactive Visualizations and what makes them Great
The State of the Union in Context
source
source2
Bar Charts, Interactive
Casting Shakespeare
source
Scatter plot/ Time series, Scrolling narrative.
BBC: Scotland Decides
source
Normalized bar chart, Map. Allows user to sort data.
NPR - When are people working?
source
Area Graph. Allows user to visually compare.
Musicians' Deaths by Age
source
Time Series. Allows user to filter based on category.
Bachelors Degrees by Demographics
source
Visualizing MBTA Data
source
Parallel Coordinates, Maps, Bar and Area. Allows user to explore relative and absolute time, and compare stations/trains.
NY Times: How the Recession Reshaped the Economy, in 255 Charts
source
Loose interaction; the act of scrolling shows comparisons between industries. Sparklines. At the bottom, Small Multiples show the fuller picture.
- The primary theme among the interactions presented here is that they allow users to complete a few common, basic actions:
- Sort: Representing the data in different orders can reveal trends otherwise unseen
- Filter: Only show the data that meets certain user-defined criteria
- Compare: Allow the user to take pieces of the whole and see how they relate.
- Why do these visuals feel so compelling, compared to traditional static media, or data in chart format?
- The general pathway to understanding:
- Unstructured Information
- Data
- Structured Information
- Knowledge
- Wisdom
- Visualization helps bridge the gap between Knowledge and Wisdom. Interactivity prompts exploration, and exploration leads to understanding.
- And smarter users = good for everyone.
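The three basic interactions listed above (sort, filter, compare) map onto very small operations on the underlying data. A minimal sketch, using made-up records that are not taken from any of the visualizations shown:

```javascript
// Hypothetical records, invented purely for illustration.
const rows = [
  { name: 'A', group: 'east', value: 30 },
  { name: 'B', group: 'west', value: 50 },
  { name: 'C', group: 'east', value: 20 }
];

// Sort: copy first so the original order is preserved, then order by value, largest first.
const sorted = rows.slice().sort((a, b) => b.value - a.value);

// Filter: keep only the rows that meet a user-defined criterion.
const eastOnly = rows.filter(r => r.group === 'east');

// Compare: express each row as a share of the whole.
const total = rows.reduce((sum, r) => sum + r.value, 0);
const shares = rows.map(r => ({ name: r.name, share: r.value / total }));

console.log(sorted.map(r => r.name), eastOnly.map(r => r.name), shares);
```

An interactive visualization mostly wires controls (a dropdown, a checkbox, a hover) to operations like these and redraws the result.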
<Intermission>
(When we come back: D3, Observable, and Gephi)
What's your preferred code editor?
- Sublime Text?
- Visual Studio?
- Eclipse?
- Vim?
- R Studio?
- What about Chrome?
A skill that data scientists often underdevelop is the ability to derive meaning from data on the fly.
- Sometimes you don't have your ideal tools available
- Even if you did, sometimes the person to whom you're telling a data story doesn't understand your raw data.
- We're going to use one of the most maligned programming languages, JavaScript, to build an on-the-fly data visualization toolkit
- Time to open some tabs!
- Head over to
- Copy the import code
- When you have that copied, head over to
- Open up your Chrome dev tools console - CMD + ALT + i
- (or right-click and select "Inspect Element", then hit ESC)
- Paste in your import code from tinyurl.com/acadia-import and hit enter
- Congratulations! You now have access to a very advanced data visualization toolkit.
Aside: making Chrome Dev Tools more comfortable
- You can clear your console with
cmd + k
- You can revisit previous history with
↑
- You can access a control panel with
cmd + shift + p
(try typing theme)
- Try declaring a simple array:
thing = [5, 10, 15];
Use console.table(thing) to see a manipulable interface:
console.table(thing);
You can do the same with an object, or an array of objects, or... mostly anything!
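For instance, with an array of objects (the rows below are made up for illustration), console.table shows one row per object and one column per key:

```javascript
// Hypothetical rows, invented for illustration.
const things = [
  { label: 'first', value: 5 },
  { label: 'second', value: 10 },
  { label: 'third', value: 15 }
];

console.table(things);            // one row per object, one column per key
console.table(things, ['value']); // optional second argument limits the columns shown
```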
Okay, back to it.
Here's what you've just added to this page:
- jQuery: document traversal and manipulation. “What’s the parent element of the first table?”
- Underscore.js: Data manipulation and parsing. “What’s the total of the values in this array?”
- D3.js: Data-as-a-document and visualization. “Plot the min and max values as colors on the red-to-green scale”
Let's try manipulating our data table.
A sample exercise: Use jQuery to find a given cell and change its background color
$('td').eq(150).css('background-color', 'yellow')
"Which year had the highest total imports?"
jQuery: Show me all the <td> elements in the last row:
$('tr:last-child td')
Okay! Underscore: what's the lowest value among them?
lowVal = _.min(
  $('tr:last-child td'), function(obj,iter){
    // Strip thousands separators and compare as numbers, not as strings
    return parseFloat($(obj).text().replace(/,/g,''))
  }
)
jQuery, highlight it:
$(lowVal).css('background-color','yellow')
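Cell text always comes back as a string, so values such as "1,200" need their commas stripped and need converting to numbers before they are compared. Otherwise they are compared character by character. A small sketch of the difference, using made-up cell contents and a hypothetical parseCell helper:

```javascript
// Hypothetical cell contents, as they might appear in a table.
const cells = ['1,200', '950', '10,400'];

// Strip thousands separators and convert the text to a number.
function parseCell(text) {
  return parseFloat(text.replace(/,/g, ''));
}

// Comparing as strings: the default sort orders text character by character.
const stripped = cells.map(t => t.replace(/,/g, ''));
const lexicographicMin = stripped.slice().sort()[0];

// Comparing as numbers: the answer we actually want.
const numericMin = Math.min(...cells.map(parseCell));

console.log(lexicographicMin, numericMin);
```

The string comparison picks "10400" as the smallest value because "1" sorts before "9"; the numeric comparison picks 950.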
Pre-made snippet for you:
Let's build on this. Let's highlight the import trends of each product from year to year.
Let's iterate (map) over each row (product) and:
- 1. figure out the min and max values
- 2. use D3 to paint the min values red, max values green, and scale the in-between accordingly
- 3. Iterate (map) over each cell in the row and paint its background according to where it sits in the d3 domain
$('tr').each(function(iter,row){
  var cells = $(row).children('td')
  // Parse each cell once: strip thousands separators and convert to a number
  var values = cells.map(function(i,cell){ return parseFloat($(cell).text().replace(/,/g,'')) }).get()
  if (!values.length) return // skip rows with no <td> cells, such as header rows
  // Red for the row's minimum, green for its maximum, scaled in between
  var color = d3.scale.linear()
    .domain([_.min(values), _.max(values)])
    .range(["#c33", "#0fc"]);
  cells.each(function(i,cell){
    $(cell).css('background-color', color(values[i]))
  })
})
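To see what the color scale is doing, here is a simplified stand-in written in plain JavaScript. It is not D3's implementation, only a sketch of the same idea: find where a value sits between min and max, then blend the two end colors by that fraction. The function names hexToRgb and linearColorScale are made up for this example.

```javascript
// Convert "#c33" or "#cc3333" into [r, g, b] numbers.
function hexToRgb(hex) {
  let h = hex.replace('#', '');
  if (h.length === 3) h = h.split('').map(c => c + c).join('');
  return [0, 2, 4].map(i => parseInt(h.slice(i, i + 2), 16));
}

// Return a function that maps a value in [min, max] to a blended color.
function linearColorScale(min, max, fromHex, toHex) {
  const from = hexToRgb(fromHex);
  const to = hexToRgb(toHex);
  return function (value) {
    // Fraction of the way from min to max (0 when the domain is a single point).
    const t = max === min ? 0 : (value - min) / (max - min);
    const rgb = from.map((c, i) => Math.round(c + (to[i] - c) * t));
    return 'rgb(' + rgb.join(', ') + ')';
  };
}

const color = linearColorScale(0, 100, '#c33', '#0fc');
console.log(color(0), color(50), color(100));
```

The minimum maps to the first color, the maximum to the second, and everything else lands proportionally in between.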
Pre-made snippet for you:
Other countries to experiment with:
Some observations:
- 2009: obvious downturn across almost all industries
- Looking for a common exception? ctrl+f "Pharma" in almost every major country
- Canada/Mexico: Food up, household goods down.
- China: almost everything up
- Iraq: the year before the US unilaterally pulled out of the war, almost all industries had an uptick
- Haiti: notable uptick in textiles (cotton, wool)
- Syria: downtick across the board
- Egypt: Fuel and Natural Gas dropped off the face of the earth
- Belize: Fruits are down, vegetables are up (Thanks, paleo diet!)
Other visualizations, other data types
Let's build a bar chart over columnar data
- Open up your Chrome dev tools console - CMD + ALT + i
- (or right-click and select "Inspect Element", then hit ESC)
- Paste in your import code from tinyurl.com/acadia-import and hit enter
Let's highlight the Batting Average column:
$('#teams_standard_batting td:nth-child(19)').css('background-color','yellow')
Now, use our iterative method from before to turn that column into a heatmap:
var column = $('#teams_standard_batting td:nth-child(19)')
// Read the column's values once, as plain numbers
var values = column.map(function(){ return parseFloat($(this).text()) }).get()
var max = _.max(values)
var min = _.min(values)
var color = d3.scale.linear()
  .domain([min, max])
  .range(["#c33", "#0fc"]);
// Paint each cell according to where its value sits between min and max
column.each(function(iter,cell){
  $(cell).css('background-color', color(values[iter]))
})
Pre-made snippet for you:
Let's make a bar chart!
- Since we already know the "max" value, we can assign that a max-width bar.
- Every other value will be its ratio of that max bar's width
var column = $('#teams_standard_batting td:nth-child(13)')
var values = column.map(function(){ return parseFloat($(this).text()) }).get()
var max = _.max(values)
var min = _.min(values)
var barwidth = 250 // width in pixels of the bar for the max value
// Build the color scale once, outside the loop
var color = d3.scale.linear()
  .domain([min, max])
  .range(["#c33", "#0fc"]);
column.each(function(iter,cell){
  var value = parseFloat($(cell).text())
  // Prepend an empty bar element, then size and color it by the cell's value
  $(cell).prepend('<div class="bar"></div>');
  $(cell).children('.bar')
    .css('width', value / max * barwidth)
    .css('background-color', color(value))
})
column.css('text-align', 'left')
$('.bar').css('height', '10px').css('float', 'left')
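The width arithmetic is the whole trick: the largest value gets the full bar, and every other value gets value / max of it. A minimal sketch of the same ratio outside the browser, drawing bars as text with made-up numbers:

```javascript
// Hypothetical values, invented for illustration.
const rows = [
  { label: 'A', value: 200 },
  { label: 'B', value: 150 },
  { label: 'C', value: 50 }
];

const barwidth = 20; // width, in characters, of the bar for the largest value
const max = Math.max(...rows.map(r => r.value));

// Each bar's width is its value's ratio to the max, times the full width.
const lines = rows.map(r => {
  const width = Math.round(r.value / max * barwidth);
  return r.label + ' ' + '#'.repeat(width) + ' ' + r.value;
});

console.log(lines.join('\n'));
```

Here A gets 20 characters, B gets 15, and C gets 5.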
Pre-made snippet for you:
7. Introduction to Network Science
Head over to ncase.me/crowds and follow along with the interactive
8. A practical introduction to Gephi
If you haven't already done so, head over to gephi.org and download a free copy
Gephi is a tool to visualize and explore network graph data.
- Lots of things can be networks! Any list of things ("nodes") with relationships to one another ("edges") qualifies: for example, people who follow one another on Twitter, or words adjacent to one another in a book.
- Let's grab a list of the characters in Game of Thrones who share a scene: tinyurl.com/acadia-got. Open the raw file for got-network.graphml and download it somewhere you can access it.
- Open Gephi and open got-network.graphml
- Options for Undirected, Directed, or Mixed graphs
- Number of Edges and Nodes listed
- Observe the 3 main views of Gephi: Overview, Data Laboratory, and Preview
- In Overview, let's open our Statistics panel and segment our nodes with Modularity
- Next, we'll change the appearance of our nodes and edges under the Appearance panel
- Finally, let's explore some of the built-in layouts with the Layouts panel; starting with Force Atlas 2, and Yifan Hu
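Underneath, a network file like the one loaded above boils down to a list of nodes and a list of edges. The sketch below is not Gephi's internals and does not use the Game of Thrones data; it takes a made-up undirected edge list, builds a neighbor lookup, and counts each node's connections (its degree), one of the simplest measures a tool like Gephi reports.

```javascript
// Hypothetical undirected edges, invented for illustration.
const edges = [['A', 'B'], ['A', 'C'], ['B', 'C'], ['C', 'D']];

// Map each node to the set of nodes it is connected to.
const neighbors = new Map();
function link(a, b) {
  if (!neighbors.has(a)) neighbors.set(a, new Set());
  neighbors.get(a).add(b);
}
for (const [source, target] of edges) {
  link(source, target);
  link(target, source); // undirected: record the edge in both directions
}

// Degree: how many distinct neighbors each node has.
const degree = {};
for (const [node, set] of neighbors) {
  degree[node] = set.size;
}

console.log(degree);
```

In this toy graph C has the highest degree (3) and D the lowest (1).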