Why life might be worth living, according to philosopher William James

If you ever find yourself in doubt of yourself and other things in your life, remember to remain cognizant in evaluating things. Questioning whether life is worth living is only a part of many larger questions that many people face at some point or another. Whether you find satisfying answers can be difficult, though. Turning to philosophy can provide answers, with some effort at least.

“Is Life Worth Living?” A bold title for the 1896 lecture of philosopher and psychologist William James. And what better way to begin such a lecture than with an 1881 self-help book of a similar title. James himself had been through the existential dilemma. He would ask, was life worth living?

The short answer is that it depends on the liver. Satisfied? If not, there is a more elaborate response. Philosopher John Kaag’s new book Sick Souls, Healthy Minds: How William James Can Save Your Life explores the Father of American Psychology’s personal journey in figuring out if life is worth living.

James would wake up each day “with a horrible dread at the pit of [his] stomach,” contemplating suicide in his early 20s and wondering “how other people could live, how I myself had ever lived, so unconscious of that pit of insecurity beneath the surface of life.” Through an arduous journey of figuring out what made life meaningful and worth living, the philosopher ends up conceding to “our usual refined optimisms and intellectual and moral consolations” and living as though life were worth living.

After Kaag witnessed a suicide in 2014, in which a person jumped from William James Hall at Harvard University, the philosopher began questioning why it had happened. Sick Souls, Healthy Minds aims to remedy those actions by offering James as a friend in those trying times of misery. Kaag shares his own difficult time at age 30, when he was researching William James at Harvard University while going through a divorce and dealing with the death of his alcoholic father. As in his previous book on Nietzsche, Kaag searches for practical wisdom by combining his autobiographical experience with the work of the famous philosopher. I still found myself believing that, though Kaag himself went through a tremendous amount of stress, his own story still pales in comparison to James’ style and work.

James’ research in studying philosophy and psychology alongside one another, his radical empiricism, pragmatism, and “anti-intellectualism” (to be clarified later), and his overall revolutionary role in the theory of emotion that still resounds to this day make his life and rumination on its meaning much more impactful. His own life, from going on a scientific journey through the Amazon to studying medicine and pondering life’s purpose, especially in light of On the Origin of Species, published in 1859, led him to think humans were merely animals in a deterministic world of cause-and-effect. Choice, like free will, was only an illusion. This led to his diary entries in 1870, in which he assumed free will was no illusion, and, out of his own free will, he would believe in free will. He wrote he would “accumulate grain on grain of willful [sic] choice like a very miser” through making habits. After reading French philosopher Charles Renouvier, he came to believe these thoughts and kept them close in everything he did.

Under James’ pragmatism, truth is not statically there to be perceived or discovered but is, in many cases, what we create in the stride of living. We can cross the abyss that Nietzsche warned about staring into by jumping across it. James would write about a type of “anti-intellectualism” against the idea that minds have “a world complete in itself” and need simply to find this world while having no power to re-determine its already-given character. These ideas gave the psychologist-philosopher a framework he would use to describe a type of “rich and active commerce” between minds and reality.

When new ideas join older ones, they “marry” one another, James described. You can form beliefs as hypotheses, and their values depend on how they relate to you. This hypothesis of life makes life valuable.

But Kaag also warns of the prideful dangers of pragmatism, even if his explanations are a bit indulgent. Kaag’s doubts crept up on him during his first wedding, but his mother suggested he continue with the wedding as planned. He realized he could determine the truth that his marriage would be a happy one, but he also couldn’t the same way he could. It seemed as though James’ free will wouldn’t have helped.

James’ other work reflects the groundbreaking discoveries in psychology and cognitive science he made while creating the Department of Experimental Psychology at Harvard. James believed emotions are “constituted by, and made up of, those bodily changes which we ordinarily call their expression or consequence.” Being sad is not the cause of crying, but is what it feels like to cry in this sort of “biofeedback” in which we figure out our own emotions. This means, according to James, that whistling a happy tune could keep you from feeling sad. The psychologist-philosopher mocked the cognitivist idea that emotions could simply be states of mind which cause us to have visceral reactions. Without the fiery passion of anger within your heart or heavy weight of mourning at a funeral, an emotion would only be “feelingless cognition.”

If, as Nietzsche said, every great philosophy is “a confession on the part of its author and a kind of involuntary and unconscious memoir,” then the emphasis should be on “involuntary and unconscious.” Maybe, in philosophizing, philosophers should let themselves feel what they feel.

Neurons that work together, explained

A theoretical physicist sitting at a computer with pen and paper may not seem like a likely candidate for understanding how the brain works, but, according to physicists who study statistics and algebra, they can figure out revolutionary theories about how the nervous system works. When I met Princeton theoretical physicist William Bialek in 2013 during my undergraduate years at Indiana University-Bloomington, I asked him about the “magic” of physics and how scientists can capture abstract ways of thinking and apply them to how neurons in the brain work. Bialek’s book “Spikes: Exploring the Neural Code,” one of my inspirations to step into neuroscience research, and his work alongside other researchers in physics and mathematics can answer key questions in neuroscience.

Pairwise Interactions

Often in neuroscience we are confronted with a small sample measurement of a few neurons from a large population. Although many have assumed, few have actually asked: What are we missing here? What does recording a few neurons really tell you about the entire network? Correlations between neurons dominate large networks. Using Ising models from statistical physics, Schneidman et al. 2006 looked at large networks and their ability to correct for errors in representing sensory data. They argue that correlations are due to pairwise, but not three-way, interactions between neurons, although some might argue that closer inspection reveals otherwise. Pairwise interactions describe how neurons form pairs among themselves to act together. Their pairwise maximum entropy approach can capture the activity of retinal ganglion cells (RGCs) effectively.

Using an elegant preparation of retina on a microelectrode array (MEA) viewing defined scenes/stimuli, the researchers showed that statistical physics models that assume pairwise correlations, but disregard any higher-order phenomena, perform very well in modeling the data. This indicates a certain redundancy exists in the neural code. The results are also replicated with cultured cortical neurons on an MEA. They noted a dominance of pairwise interactions. This would imply that learning rules depending on pairwise correlations could, on their own, create nearly optimal internal models describing how the retina computes codewords. The brain could, then, assess new events for their degree of surprise with reasonable accuracy. The central nervous system could learn the maximum entropy model from the data provided by the retina alone, but the conditionally independent model is not biologically realistic in this sense. Although the pairwise correlations are small and weak and the multi-neuron deviations from independence are large, the maximum entropy model consistent with the pairwise correlations captures almost all of the structure in the distribution of responses from the full population of neurons. The weak pairwise correlations imply strongly correlated states.

If you modeled the cells as independent from one another, the number of cells spiking together would follow an approximately Poisson distribution. The actual distribution is almost exponential, so the independent model doesn’t fit well. For example, the probability of K = 10 neurons spiking together is ~10^5 times larger than expected in the independent model. For this model, the predicted rates of specific response patterns across the population of neurons, the N-letter binary words (patterns of 0s and 1s), differ greatly from the experimental results. These discrepancies show the failure of independent coding. The difference between prediction and empirical observation is anti-correlated in clusters of spikes.

Instead, a group of neurons comes to a decision through pairwise correlations. With the pairwise model, the pattern rates are predicted with better than 10% accuracy. The scatter between predictions and observations is confined largely to rare events for which the measurement of rates is itself uncertain.

The Jensen–Shannon divergence measures similarity between two probability distributions. This metric can be used to measure the mutual information between a random variable and an associated mixture distribution, as the researchers did. In previous work, the researchers had applied the same principle to a joint distribution and the product of its two marginal distributions, measuring how reliably you can decide whether a given response comes from the joint distribution or the product distribution.
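A minimal Python sketch may make the Jensen–Shannon divergence concrete. The function names and the two example distributions below are illustrative assumptions, not code or data from the paper.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits.

    Terms where p == 0 contribute nothing, by the usual convention.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: the average KL divergence
    of p and q from their equal-weight mixture m."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical distributions over four response patterns.
p = [0.5, 0.3, 0.1, 0.1]
q = [0.25, 0.25, 0.25, 0.25]
d_js = js_divergence(p, q)
```

Measured in bits, the divergence is 0 for identical distributions and reaches 1 when the two distributions never overlap, which is why it reads naturally as "how reliably can you tell which distribution a sample came from."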

The fraction of full network correlation in 10-cell groups that is captured by the maximum entropy model of second order can be plotted as a function of the full network correlation, measured by the multi-information I_N. The ratio is larger when I_N itself is larger, so that the pairwise model is more effective in describing populations of cells with stronger correlations, and the ability of this model to capture ~90% of the multi-information holds independent of many details.

The Maximum Entropy Method


Maximum entropy estimate: constructive criterion for setting up probability distributions, on the basis of partial knowledge.

The most general description of the population activity of N neurons, which uses all possible correlation functions among cells, can be written using the maximum entropy principle as

$$\hat{p}(\sigma_1, \dots, \sigma_N) = \frac{1}{Z} \exp\left[ \sum_i h_i \sigma_i + \frac{1}{2} \sum_{i \neq j} J_{ij} \sigma_i \sigma_j + \dots \right]$$

for a probability p̂, Lagrange multipliers h_i and J_ij, the normalization constant Z, and variables σ_i representing the state (spiking or silent) of each cell. The trailing dots stand for triplet and higher-order terms, which the pairwise model leaves out. This method also uses Laplace’s principle of insufficient reason, which states that two events are to be assigned equal probabilities if there is no reason to think otherwise, and Jaynes’ principle of maximum entropy, the idea that distributions are determined so as to maximize the entropy (as a measure of uncertainty) in a way consistent with given measurements.

For N neurons, the maximum entropy distributions with Kth-order correlations (K = 1, 2, …, N) can account for the interactions. The entropy difference (multi-information) I_N = S_1 – S_N measures the total amount of correlation in the network, independent of whether it arises from pairwise, triplet or more-complex correlations. Each entropy value S_K decreases monotonically toward the true entropy S: S_1 ≥ S_2 ≥ … ≥ S_N. The contribution of the Kth-order correlation is I(K) = S_(K-1) – S_K and is never negative, because adding correlation constraints can only lower the entropy. The researchers found this held across organisms, network sizes and appropriate bin sizes.
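As a rough illustration of the multi-information I_N = S_1 – S_N, the sketch below estimates both entropies directly from a small array of binary spike words. The names `entropy_bits` and `multi_information` and the toy data are assumptions made for this example, and the estimate is a naive plug-in one; real recordings would need care with sampling bias.

```python
import numpy as np

def entropy_bits(probs):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]
    return float(-np.sum(probs * np.log2(probs)))

def multi_information(patterns):
    """Return (S_1, S_N, I_N) in bits for a (time bins x neurons) 0/1 array.

    S_N is the entropy of the empirical distribution over whole words.
    S_1 is the entropy of the independent model, i.e. the sum of the
    single-cell binary entropies.
    """
    patterns = np.asarray(patterns, dtype=int)
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    s_n = entropy_bits(counts / counts.sum())
    rates = patterns.mean(axis=0)
    s_1 = sum(entropy_bits([r, 1.0 - r]) for r in rates)
    return s_1, s_n, s_1 - s_n

# Toy data: two cells that always fire together carry 1 bit of correlation.
locked = [[0, 0], [1, 1], [0, 0], [1, 1]]
s_1, s_n, i_n = multi_information(locked)
```

For two perfectly locked cells the independent model predicts 2 bits of entropy while the true entropy is 1 bit, so the multi-information is 1 bit. Four equally likely words from two independent cells would give 0 instead.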

In a physical system, the maximum entropy distribution is the Boltzmann distribution, and the behavior of the system depends on the temperature, T. For the network of neurons, there is no real temperature, but the statistical mechanics of the Ising model predicts that when all pairs of elements interact, increasing the number of elements while fixing the typical strength of interactions is equivalent to lowering the temperature, T, in a physical system of fixed size, N. This mapping predicts that correlations will be even more important in larger groups of neurons.

The active neurons are those that send an action potential down the axon in any given time window, and the inactive ones are those that do not. Because the neural activity at any one time is modelled by independent bits, Hopfield suggested that a dynamical Ising model would provide a first approximation to a neural network which is capable of learning.

The researchers looked for the maximum entropy distribution consistent with experimental findings. Ising models with pairwise interactions are the least structured, or maximum-entropy, probability distributions that exactly reproduce measured pairwise correlations between spins. Schneidman and colleagues used such models to describe the correlated spiking activity of populations of neurons in the salamander retina subjected to naturalistic stimuli. They showed that for groups of N ≈ 10 neurons (which can be fully sampled during a typical experiment) these models with O(N^2) tunable parameters provide a good description of the full distribution over 2^N possible states.
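For small N, the pairwise (Ising) distribution can be written out by brute force over all 2^N states. The sketch below assumes the common convention that each cell's state is σ = +1 (spike) or σ = -1 (silent) and that the fields `h` and couplings `J` are already known; fitting them to data is a separate step that is not shown. The function name is hypothetical.

```python
import itertools
import numpy as np

def pairwise_maxent_distribution(h, J):
    """Enumerate P(sigma) proportional to
    exp(sum_i h_i s_i + 1/2 sum_{i != j} J_ij s_i s_j), with s_i in {-1, +1}.

    Returns (states, probs); only practical for small N because
    the number of states grows as 2**N.
    """
    h = np.asarray(h, dtype=float)
    J = np.asarray(J, dtype=float)
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    J_off = J - np.diag(np.diag(J))  # drop any self-coupling terms
    log_w = states @ h + 0.5 * np.einsum("si,ij,sj->s", states, J_off, states)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return states, w / w.sum()

# Two cells, no bias, one positive coupling: aligned states become more likely.
h = [0.0, 0.0]
J = [[0.0, 1.0], [1.0, 0.0]]
states, probs = pairwise_maxent_distribution(h, J)
pair_corr = float(np.sum(probs * states[:, 0] * states[:, 1]))
```

With N cells there are N fields and N(N - 1)/2 independent couplings, which is the O(N^2) parameter count mentioned above, set against 2^N possible states.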

They found the maximum entropy model of second order captures over 95% of the multi-information in experiments on cultured networks of cortical neurons. The implication is that learning rules based on pairwise correlations could be enough to generate nearly optimal internal models for the distribution of “codewords” in the retinal vocabulary and let the brain accurately evaluate new events for their degree of surprise.

Accounting for Cell Bias

The researchers noted they needed to account for the pairwise interactions and cell bias values. Because the interactions have different signs, the researchers showed that frustration would prevent the system from freezing into a single state in about 40% of all triplets. With enough minimum energy patterns, the system has a representational capacity, and the network can identify the whole pattern uniquely just as Hopfield models of associative memory do. The system would have a holographic or error-correcting property, so that an observer who has access only to a fraction of the neurons would nonetheless be able to reconstruct the activity of the whole population.

The pairwise correlation model also uncovers subtle biases in decision making. It tells you how neurons influence each other, on average. Pairwise maximum entropy models reveal that the code relies on strongly correlated network states and shows distributed error-correcting structure.

To figure out if the pairwise correlations are an effective description of the system, you need to determine if the reduction in entropy from the correlations captures all or most of the multi-information I_N. The researchers conclude that, even if the pairwise correlations are small and the multi-neuron deviations from independence are large, the maximum entropy model consistent with the pairwise correlations captures almost all of the structure in the distribution of responses from the full population of neurons. This means the weak pairwise correlations imply strongly correlated states.

Other Effects

Intrinsic bias dominates small groups of cells, but, in large groups, almost all of the ~N^2 pairs of cells are significantly interacting. This shifts the balance so that the typical values of the intrinsic bias are reduced while the effective field contributed by other cells increases. In the Ising model, if all pairs of cells interact significantly with one another, you can limit the typical size of interactions by showing how J_ij changes with increasing N. There were no signs of significant changes in J with growing N for the values the researchers tested.

Extrapolation

For weak correlations, you can solve the Ising model in perturbation theory to show that the multi-information I_N is the sum of mutual information terms between all pairs of cells, and I_N ~ N(N – 1). This is in agreement with the empirically estimated I_N up to N = 15, the largest value for which direct sampling of the data provides a good estimate. Monte Carlo simulations of the maximum entropy models suggest that this agreement extends up to the full population of N = 40 neurons in their experiment (G. Tkačik, E.S., R.S., M.J.B. and W.B., unpublished data). The potential for extrapolation to larger networks of neurons can be shown through the error-correction that comes about. The error-correction emerges when figuring out how N-cell activity can predict (N+1)-cell activity. Uncertainty decreases with the number of cells. In a 40-cell population, three cells with spiking probability have a near-perfect linear encoding of the number of spikes generated by other cells in the network. Through these methods of becoming more and more accurate and robust, they showed findings that are similar to how single pyramidal cell spiking correlates with more collective responses.

Challenges to the Model

The case of two correlated neurons has proven to be particularly challenging, because the Fokker–Planck equations are analytically tractable only in the linear regime of correlation strengths (r ≈ 0) and only for a limited set of current correlation functions. Some analytical results for the spike cross-correlation function have been obtained using advanced approximation techniques for the probability density and expressed as an infinite sum of implicit functions (Moreno-Bote and Parga, 2004, 2006). Similarly, the correlation coefficient of two weakly correlated leaky-integrate-and-fire neurons has been obtained for identical neurons in the limit of large time bins. 

Correlations between neurons can occur at various timescales. It’s possible, by integrating the cross-correlation function (xcorr in Matlab, correlate in numpy) between two neurons, to read off the timescale of the correlation (Bair, Zohary and Newsome 2001). This can help to distinguish correlations due to monosynaptic or disynaptic connections, which are visible at short timescales, from correlations due to slow drift in oscillations, up-down states, attention, etc., which occur at much longer timescales. Correlations depend on physical distance on the cortical map as well as tuning distance between two neurons (Smith and Kohn, 2008).
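The cross-correlation step mentioned above can be sketched with `numpy.correlate`. This is a simplified illustration on synthetic spike trains, not the procedure of Bair, Zohary and Newsome; the helper names, the bin counts and the 2-bin delay are all assumptions for the example.

```python
import numpy as np

def cross_correlogram(x, y, max_lag):
    """Cross-covariance of two binned spike trains for lags -max_lag..max_lag.

    With numpy's convention c[k] = sum_n x[n + k] * y[n], a peak at a
    negative lag k means y trails x by |k| bins.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    full = np.correlate(x, y, mode="full")
    zero = len(y) - 1  # index of zero lag in the "full" output
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, full[zero - max_lag: zero + max_lag + 1]

def integrated_area(lags, ccg):
    """Area under the correlogram within +/- tau bins, for tau = 0..max_lag.

    The tau at which this curve levels off gives a rough correlation timescale.
    """
    max_lag = int(lags.max())
    center = len(ccg) // 2
    return np.array([ccg[center - t: center + t + 1].sum()
                     for t in range(max_lag + 1)])

# Synthetic example: y is a copy of x delayed by 2 bins.
rng = np.random.default_rng(0)
x = (rng.random(2000) < 0.2).astype(float)
y = np.roll(x, 2)
lags, ccg = cross_correlogram(x, y, max_lag=10)
area = integrated_area(lags, ccg)
peak_lag = int(lags[np.argmax(ccg)])
```

A sharp peak that saturates the integrated area within a few bins points to a fast, connection-like correlation, while an area that keeps growing out to long lags points to slow shared drift.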

Ising-model decoding techniques can be applied to simulated neural ensemble responses from a mouse visual cortex model, with an improvement in decoder performance for a model with heterogeneous, as opposed to homogeneous, neural tuning and response properties. The results demonstrate the practicality of using the Ising model to read out, or decode, spatial patterns of activity comprised of many hundreds of neurons (Schaub et al. 2011).

Discussion

The research seems to reflect general trends of “the whole is greater than the sum of its parts” or even “less is more,” both concepts in science and philosophy that date back centuries. I even emailed Elad Schneidman a few days ago about this, and he responded, “I think that this idea must predate the ancient greeks ;-)”.

Their work applied the maximum entropy formalism of Schneidman et al. 2003 to ganglion cells. The way a group of neurons behaves differently from the sum (or combination) of its independent neurons gives the research leverage on systems-like problems of neurocomputation and emergent phenomena.

The work in deriving an Ising model (or using a maximum entropy method) from statistical mechanics shows the importance of a priori proof work in using equations and theories to deduce “what follows from what.” It’s a great example of using the principles and methods of abstraction that mathematicians and physicists use in solving problems in biology and neuroscience. In my own writing, I’ve described this sort of attention to abstract models and ideas as relevant to biology in a previous blogpost.

In this paper, the researchers very well theorized which shortcomings and limitations their model would have and addressed them appropriately by fitting their model to experimental work. As a result, their research testifies to the power of computational and theoretical research in both describing and explaining empirical phenomena. 

Recreating the Results

With the MaxEnt Toolbox, I used MATLAB to recreate the results, which can be found here: https://github.com/HussainAther/neuroscience/tree/master/maxent/schneidman.

Related Research

In that same year, Tkačik and other researchers would use the same recordings and use Monte-Carlo-based methods to construct the appropriate Ising model for the complete 40-neuron dataset. They showed that pairwise interactions still account for the observed higher-order correlations and argue why the effects of three-body interactions should be suppressed. 

They examined the thermodynamic properties of Ising models of various sizes derived from the data to suggest a statistical ensemble from which the observed networks could have been drawn and, consequently, to create synthetic networks of 120 neurons. They found that with increasing size the networks operate closer to a critical point and start exhibiting collective behaviors reminiscent of spin glasses. They examined more closely the appearance of multiple single-spin-flip stable states.

The method of using a maximum entropy model is equivalent to the method of Roudi et al. 2009, where they described a method of normalizing the Kullback–Leibler divergence D_KL(P, P̃), for P̃ a pairwise approximation to the distribution P. The quality of the pairwise model comes from normalizing this by the corresponding distance of the distribution P from an independent maximum entropy fit, D_KL(P, P_1), where P_1 is the highest entropy distribution consistent with the mean firing rates of the cells (equivalently, the product of single-cell marginal firing probabilities): Δ = 1 – D_KL(P, P̃)/D_KL(P, P_1), where Δ = 1 means the pairwise model perfectly captures the structure left out by the independent model, and Δ = 0 means the pairwise model doesn’t improve at all on the independent model.
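The quality measure Δ can be computed in a few lines once the three distributions are in hand. The sketch below uses hypothetical function names and a two-cell toy distribution; how the pairwise and independent fits are obtained is assumed to happen elsewhere.

```python
import numpy as np

def kl_bits(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def pairwise_fit_quality(p_true, p_pair, p_indep):
    """Delta = 1 - D_KL(P, P_pair) / D_KL(P, P_1)."""
    d_indep = kl_bits(p_true, p_indep)
    if d_indep == 0.0:
        # The independent model is already exact, so Delta is undefined.
        raise ValueError("independent model already matches the data")
    return 1.0 - kl_bits(p_true, p_pair) / d_indep

# Toy two-cell distribution over the words 00, 01, 10, 11.
p_true = np.array([0.4, 0.1, 0.1, 0.4])
p_indep = np.full(4, 0.25)   # product of the marginals (each cell fires half the time)
p_pair = p_true.copy()       # for two cells the pairwise model is exact
delta = pairwise_fit_quality(p_true, p_pair, p_indep)
```

Here Δ comes out as 1 because a pairwise model has nothing left to miss with only two cells. Passing the independent fit in place of the pairwise fit gives Δ = 0.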

In 2014, Tkačik and the researchers from Schneidman et al. 2006 published “Searching for Collective Behavior in a Large Network of Sensory Neurons” with K-pairwise models, more specialized variations of the pairwise models to estimate entropy, classify activity patterns, show that the neural codeword ensembles are extremely inhomogeneous, and demonstrate that the state of individual neurons is highly predictable from the rest of the population, which would allow for error correction. 

Barreiro et al. 2014 found that, over a broad range of stimuli, output spiking patterns are surprisingly well-captured by the pairwise model. They studied an analytically tractable simplification of the retinal ganglion cell model, and found that in the simplified model, bimodal input signals produce larger deviations from pairwise predictions than unimodal inputs. The characteristic light filtering properties of the upstream retinal ganglion cell circuitry would suppress bimodality in light stimuli, thus removing a powerful source of higher-order interactions. The researchers said this gave a novel explanation for the surprising empirical success of pairwise models.

Ostojic et al. 2009 studied how functional interactions would depend on biophysical parameters and network activity, and found that variations in the background noise changed the amplitude of the cross-correlation function as strongly as variations of synaptic strength. They also found that the postsynaptic neuron’s spiking regularity has a pronounced influence on cross-correlation function amplitude. This suggests an efficient and flexible mechanism for modulating functional interactions.

In 1995, Mainen & Sejnowski showed that single neurons have very reliable responses to current injections. Nevertheless, cortical neurons seem to have Poisson or supra-Poisson variability. It’s possible to find a bound on decodability using the Fisher information matrix (Sompolinsky & Seung 1993). Under the assumption of independent Poisson variability, it is possible to derive a simple scheme for ML decoding that can be implemented in neuronal populations (Jazayeri & Movshon 2006).
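Under independent Poisson variability, the log-likelihood of a stimulus is, up to a constant, a weighted sum of the spike counts with weights log f_i(θ), minus the summed tuning curves. The sketch below illustrates that idea on a grid of candidate stimuli. The Gaussian tuning curves, their parameters and all function names are assumptions for this example, not the setup of Jazayeri & Movshon.

```python
import numpy as np

def tuning_curves(theta, centers, peak=20.0, width=20.0, baseline=1.0):
    """Hypothetical Gaussian tuning: expected counts f_i(theta).

    Returns an array of shape (number of thetas, number of neurons).
    """
    theta = np.atleast_1d(theta).astype(float)[:, None]
    return baseline + peak * np.exp(-0.5 * ((theta - centers[None, :]) / width) ** 2)

def poisson_log_likelihood(counts, rates):
    """log L(theta) up to a theta-independent constant:
    sum_i n_i * log f_i(theta) - sum_i f_i(theta)."""
    return counts @ np.log(rates).T - rates.sum(axis=1)

centers = np.linspace(-90, 90, 19)       # preferred stimuli of 19 model neurons
grid = np.linspace(-90, 90, 181)         # candidate stimuli, 1 unit apart
rates_grid = tuning_curves(grid, centers)

# Simulate one trial at a known stimulus and decode it.
rng = np.random.default_rng(0)
true_theta = 30.0
counts = rng.poisson(tuning_curves(true_theta, centers)[0])
log_like = poisson_log_likelihood(counts, rates_grid)
theta_ml = float(grid[np.argmax(log_like)])
```

The `counts @ np.log(rates).T` term is the part a downstream population could compute as a simple weighted sum of its inputs. Once correlations between neurons matter, this independent-Poisson form is only an approximation.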

The accumulation of noise sources and various other mechanisms cause cortical neuronal populations to be correlated. This poses challenges for decoding. You can get a little more juice out of decoding algorithms by considering pairwise correlations (Pillow et al. 2008).

References

Bair, W. “Correlated firing in macaque visual area MT: time scales and relationship to behavior.” (2001). Journal of Neuroscience. 

Barreiro, et al. “When do microcircuits produce beyond-pairwise correlations?” (2014). Frontiers. 

Bialek, William and Ranganathan, Rama. “Rediscovering the power of pairwise interactions.” (2018). arXiv.

Hopfield, J.J. “Neural networks and physical systems with emergent collective computational abilities.” (1982). Proc. Natl Acad. Sci. USA 79, 2554–2558.

Jazayeri, M, Movshon, A. “Optimal representation of sensory information by neural populations.” (2006). Nature Neuroscience. 

Mainen, ZF, Sejnowski, TJ. “Reliability of spike timing in neocortical neurons.” (1995). Science.

Moreno-Bote, R., and Parga, N. “Role of synaptic filtering on the firing response of simple model neurons.” (2004). Phys. Rev. Lett. 92, 028102.

Moreno-Bote, R., and Parga, N. “Auto- and crosscorrelograms for the spike response of leaky integrate-and-fire neurons with slow synapses.” (2006). Phys. Rev. Lett. 96, 028101.

Ostojic, et al. “How Connectivity, Background Activity, and Synaptic Properties Shape the Cross-Correlation between Spike Trains.” (2009). The Journal of Neuroscience. 

Pillow, Jonathan, et al. “Spatio-temporal correlations and visual signalling in a complete neuronal population.” (2008). Nature

Roudi, et al. “Pairwise Maximum Entropy Models for Studying Large Biological Systems: When They Can Work and When They Can’t.” (2009). PLoS Computational Biology. 

Schaub, Michael and Schultz, Simon. “The Ising decoder: reading out the activity of large neural ensembles.” (2011). Journal of Computational Neuroscience

Schneidman et al. “Network Information and Connected Correlations.” (2003). Physical Review Letters. 

Schneidman et al. “Weak pairwise correlations imply strongly correlated network states in a neural population.” (2006). Nature

Seung, HS, Sompolinsky, H. “Simple models for reading neuronal population codes.” (1993). PNAS

Shlens, Jonathan, et al. “The structure of multi-neuron firing patterns in primate retina” (2006). The Journal of Neuroscience 26.32: 8254-8266.

Smith, Matthew, and Kohn, Adam. “Spatial and Temporal Scales of Neuronal Correlation in Primary Visual Cortex.” (2008). Journal of Neuroscience. 

Tkačik, Gašper et al.  “Ising models for networks of real neurons.” (2006). arXiv.org:q-bio.NC/0611072. 

Tkačik, Gašper et al. “Searching for Collective Behavior in a Large Network of Sensory Neurons.” (2014). PLoS Comput Biol. 

The Journalist’s Guide to Statistics

There are three kinds of lies: lies, damned lies, and statistics.

Mark Twain, “My Autobiography”

Journalists need a good understanding of numbers. Tapping into the power of data would let them create more meaningful and effective stories. But making sense of numbers can be difficult. Reporting on data is often not as straightforward or manageable as other types of journalism. Writers need to separate signal from noise.

What’s more, researchers and writers need to know the context of data to draw appropriate conclusions. You can know everything about how candidates in an election fare against one another through polls and surveys, but, until you know the causes behind why people would vote that way, you can’t say much about those statistics. I’ve written more here on the nature of causation in the context of scientific research. This guide provides a logical, reader-friendly approach to writers wanting to harness the power of statistics.

Table of contents:

  1. Know the numbers
  2. Study the source
  3. Remember the reader
  4. Present the product

1. Know the numbers

Too often, writers throw around numbers not knowing what they mean. Here is a run-down of statistics terms you should know as a journalist:

  • Bayesian statistics
    • If it rains, how does that affect which football team will win? This branch of statistics lets you figure out how likely something is to occur based on how it depends on other factors. This lets you account for factors like false positives (when a test detects something that doesn’t exist), such as medical screening flagging false cues as cancer. With Bayesian models, you can account for different sources of information in putting together these conditional probabilities.
    • Using Bayesian statistics to predict how likely future events are is “Bayesian inference.”
  • Beta distribution
    • Using a pre-defined distribution, you can estimate how well a baseball player will do at the beginning of a season even when you haven’t collected much data so far. Using her batting average of .270, you can create a beta distribution (shown above) with α = 81 and β = 219. The average is μ = .270 and the standard deviation is σ ≈ .026.
    • If you don’t know the exact probability something occurs, you can figure out which probability is most likely by selecting it from a beta distribution of probabilities. You can use α and β to calculate the mean μ and standard deviation σ with μ = α/(α + β) and σ = √(αβ / ((α + β)²(α + β + 1))).
    • You’ll also find binomial distributions which use the same probability for all trials instead of letting it change.
  • Chi-square test (χ² test)
      Suppose you wanted to find the relationship between being HIV positive and sexual preference. You survey 30 males and find the following data (in a contingency table):

                      Sexual preference
                      Male    Female    Both    Total
      HIV+              4        2        3        9
      Not HIV+          3       16        2       21
      Total             7       18        5       30

      Then, for each cell, you can multiply its row total by its column total and divide by the grand total. This gives you the expected values, the counts you would see if HIV status and sexual preference were unrelated, which differ from the observed ones as shown below:

                        Sexual preference
                        Males                 Females                 Both                  Total
      HIV+
        Observed (O)    4                     2                       3                     9
        Expected (E)    (9*7)/30 = 2.1        (9*18)/30 = 5.4         (9*5)/30 = 1.5
        (O-E)           1.9                   -3.4                    1.5
        (O-E)^2         3.61                  11.56                   2.25
      Not HIV+
        Observed (O)    3                     16                      2                     21
        Expected (E)    (21*7)/30 = 4.9       (21*18)/30 = 12.6       (21*5)/30 = 3.5
        (O-E)           -1.9                  3.4                     -1.5
        (O-E)^2         3.61                  11.56                   2.25
      Total                                                                                 30
    • If you have an expectation or prediction of what your results should look like, the chi-square test compares it to what you actually observe to tell you how well your predictions match what happens. This example is borrowed from David Stockburger at Missouri State.
    • Researchers calculate this by summing, over every cell in the table, the squared difference between observed and expected values divided by the expected value: χ² = Σ (observed − expected)² / expected.
    • Sometimes you’ll see the difference between observed and expected values referred to as the “residual.”
  • Confounding variable
    • If you want to test if texting leads to an increase in crashes, you would want to make sure that text messages, not weather or traffic, cause the crashes. These extra variables the study doesn’t account for are confounding variables.
  • Controlled experiment
    • If you give a drug to students to observe how it affects sleep, you should compare this group (the treatment group) to a control group, a set of students under the same conditions, but without the drug. This makes sure you can determine that it was the drug causing differences in sleep and not some other variable.
  • Correlation
    • This tells you how well two variables are related to one another. Two stocks that change in similar ways to one another over time may be correlated.
  • Fisher’s exact test
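                  Cured    Not Cured    Total
      Drug A      42       58           100
      Drug B      14       86           100
      Total       56       144          200
    • Similar to the chi-square test, this test uses a contingency table (shown above) to check whether an outcome depends on which group you're in. Rather than relying on an approximation, it calculates the exact probability of seeing a table at least as extreme as the one observed if there were no effect, which makes it useful for small samples.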
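      A minimal sketch of how that exact probability can be computed for the drug table above, using only Python's standard library. It assumes a one-sided question (is Drug A cured more often than chance would allow?) and holds the row and column totals fixed; the function name is made up for illustration.

```python
from math import comb

# Counts from the table above.
cured_a, not_cured_a = 42, 58   # Drug A
cured_b, not_cured_b = 14, 86   # Drug B

n_a = cured_a + not_cured_a          # patients on Drug A
n_b = cured_b + not_cured_b          # patients on Drug B
total_cured = cured_a + cured_b      # cured patients overall


def table_probability(k):
    """Probability that exactly k of the cured patients took Drug A,
    if the drug made no difference (hypergeometric distribution)."""
    return comb(n_a, k) * comb(n_b, total_cured - k) / comb(n_a + n_b, total_cured)


# One-sided p-value: chance of seeing at least as many cures on Drug A
# as were observed, with the row and column totals held fixed.
largest_k = min(n_a, total_cured)
p_value = sum(table_probability(k) for k in range(cured_a, largest_k + 1))

print(p_value)
```

      A very small value here suggests the gap between the two drugs is unlikely to be a fluke of who ended up in which group.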
  • Histogram
    • This tells you how data is distributed, like the normal distribution shown above. The height of each bar shows you how many data points fall in the bin along the x-axis or how likely a data point is to fall in that bin. For probabilities, the total area of the bars should equal 100 percent.
  • Margin of error
    • When you make a measurement, the margin of error (sometimes called “uncertainty”) tells you how much that measurement can change due to other factors. You’ll typically find this in a range of a confidence interval, such as “40 percent +/- 1 percent.”
    • If you’re polling a sample of people, the margin of error can tell you how closely the sample represents the entire population.
    • Writer Robert Niles defines this as “1 divided by the square root of the number of people in the sample.”
    • You can further break down error into bias and systematic error:
      • The same way standing on a weighing scale while wearing clothes makes you heavier, a bias creates an error based on how you measure something.
      • If, instead, the weighing scale itself isn’t calibrated properly, there’s a systematic error. This affects all results due to the nature of your measuring equipment itself.
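      To see how the rule of thumb quoted above behaves, here is a small Python sketch of "1 divided by the square root of the number of people in the sample." The function name is an assumption for illustration. The rule roughly matches a 95 percent confidence level for a poll result near 50 percent.

```python
from math import sqrt


def margin_of_error(sample_size):
    """Rule of thumb quoted above: 1 / sqrt(n)."""
    return 1 / sqrt(sample_size)


# Quadrupling the sample size only halves the margin of error.
for n in (100, 400, 1000):
    print(n, f"+/- {margin_of_error(n):.1%}")
```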
  • Mean
    • This is the average of a set of data points, generally written using μ. When dealing with statistics, keep your language precise to communicate the most effective message possible. If the average life expectancy in the U.S. is 79 years, know the standard deviation and sample size. You may not need to report those factors, but they’ll help you put your averages in context.
    • When journalists write about the “average citizen” or the “average voter,” in most cases, they’re not referring to the strict mathematical definition of an average (the sum of each data point divided by the number of data points). Rather, journalists tend to refer to the “average” as a common, representative individual in a population. Keep in mind that how well the statistical average represents this “average individual” depends on the standard deviation and sample size.
  • Median
    • If you listed your data points from highest to lowest, the value in the middle is the median. Because this doesn’t depend on how far spread out or varied the data points are, the median is, more or less, the “middle.” It doesn’t matter that the richest person in America makes four times as much as the middle-class. What matters is whether it’s greater or lesser than those in the middle.
    • In some cases, the median can give you a more accurate idea of the “average” person in a population when reporting. Make sure you understand where the median falls in the space between the highest and lowest data point. That can tell you more about how the numbers are distributed.
    • Paleontologist Stephen Jay Gould quoted Twain’s “damned lies” quote to argue that using the eight-month median survival time for peritoneal mesothelioma was misleading. Half of patients live longer than the median, and many, like Gould, who lived for two more decades, would live for years. Gould took an optimistic, positive view of statistics in general.
  • Mode
    • The mode is the number occurring most often. This simple and clean measurement can tell you who’s the most popular candidate in an election. You won’t see this much, but it’s helpful for comparing raw numbers against one another like sales figures.
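      A short sketch of how the mean, median and mode can tell different stories about the same data, using Python's built-in statistics module. The income figures are hypothetical and chosen so that one outlier drags the mean far away from the "middle" person.

```python
from statistics import mean, median, mode

# Hypothetical yearly incomes in thousands of dollars; the last is an outlier.
incomes = [30, 35, 40, 40, 1000]

print("mean:", mean(incomes))      # pulled upward by the outlier
print("median:", median(incomes))  # the middle value once sorted
print("mode:", mode(incomes))      # the most common value
```

      Here the mean is 229 while the median and mode are both 40, which is why the median often describes the "average" person better when the data is skewed.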
  • Multiplication rule
    • If there’s a 1/2 chance you’ll draw a red card from a deck and a 1/13 chance the card is a King, then the chance to draw a red King is 1/2*1/13 = 1/26. This holds for independent events.
    • Keep track of how one event may affect the other. If you draw a red card from a deck (with a 1/2 probability), the chance the next card is red is now 25/51 because you have one less red card in the deck.
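      The two card examples above can be checked with exact fractions. This is a small sketch using Python's fractions module; the variable names are illustrative.

```python
from fractions import Fraction

# Independent events: the card's color and its rank.
p_red = Fraction(26, 52)
p_king = Fraction(4, 52)
p_red_king = p_red * p_king
print(p_red_king)  # 1/26

# Dependent events: two red cards in a row without putting the first back.
p_first_red = Fraction(26, 52)
p_second_red_given_first = Fraction(25, 51)
p_two_reds = p_first_red * p_second_red_given_first
print(p_two_reds)  # 25/102
```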
  • Normal (or standard or Gaussian) distribution
    • Imagine taking a set of heights in a population and graphing the heights on the x-axis with how many times they occur on the y-axis. If the data is “normally” distributed, then most people should fall around an average height with fewer and fewer heights farther away from this average as shown in the graph above. In the normal distribution, you can define this distribution using the mean and standard deviation with  σ as the standard deviation and μ as the mean.
    • The normal distribution centers on the average and, with a greater standard deviation, it becomes more spread out in both directions. You most likely won’t report the normal distribution explicitly in a news story.
    • The standard deviation lets you compare the mean to the distribution. About 68 percent of people are within one standard deviation of the mean (in either direction), 95 percent are within two standard deviations and almost everyone is within three. The Z-score also tells you how many standard deviations a data point is from the mean.
    • If you wanted to test if a new psychiatric drug changed the frequency of mood swings, you might measure the number of mood swings in a population with the drug and a population without. If you found that the means of the two distributions are separated by a certain number of standard deviations, you can convert that to a p-value. The smaller the p-value, the less likely it is that you would see a difference this large from chance alone if the drug had no effect. The p-value is not the probability that the drug works, but a small one gives you evidence that the drug itself, not some random variation, is responsible.
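      As a hedged illustration of the rules of thumb above, Python's statistics.NormalDist can compute the share of a normal population within a given number of standard deviations and convert a Z-score into a two-sided p-value. The helper names are made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(z):
    """Share of a normal population within z standard deviations of the mean."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


def two_sided_p_value(z):
    """Chance of landing at least z standard deviations from the mean,
    in either direction, if the null hypothesis holds."""
    return 2 * (1 - standard_normal.cdf(abs(z)))


for z in (1, 2, 3):
    print(z, f"{share_within(z):.1%}")

print(round(two_sided_p_value(1.96), 3))
```

      A Z-score of about 1.96 corresponds to the familiar p-value of 0.05.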
  • Null hypothesis (H0)
    • To figure out if smoking truly causes cancer, scientists look for ways to show that “smoking doesn’t cause cancer” is false. This is a null hypothesis (H0), usually a statement that there is no effect or no relationship between the variables you’re studying. In the words of scientists, they look for ways to “reject the null hypothesis.”
    • The p-value tells you how likely you would be to see results at least as extreme as yours if the null hypothesis were true. A small p-value gives you grounds to reject the null hypothesis.
  • Quartile
    • Split the data into four equally sized groups. The lowest quarter is the lower quartile, the highest quarter is the upper quartile and everything in between is the middle quartile. The range of the middle quartile is the interquartile range.
  • Range
    • This is the highest value minus the smallest. Note that the range is a single number, not a range of numbers.
  • Regression
    • Regression models how one variable changes along with another. If smoking really does come with an increase in cancer, then you should see it if you make a graph of cancer prevalence vs. smoking like the graph above, usually with a line of best fit (shown in red). Given enough linear regressions, you can separate a scientific observation into the variables that explain it.
    • Keep in mind correlation does not imply causation. If you find that video game sales rise around similar times when violent crimes occur, you still need to show that one caused the other before drawing conclusions between the two. Otherwise things may be a coincidence or just a matter of randomness.
    • You’ll see an R value (how well one variable explains the other) or an R² value (how well the model fits the data). The ANOVA (Analysis of Variance) reports an R² and whether the result is “statistically significant.”
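      A minimal sketch of a least-squares line of best fit, written out by hand so each step is visible. The x and y values are made up for illustration and don't come from any study; the code computes the slope, intercept, R and R² for a single explanatory variable.

```python
from math import sqrt

# Hypothetical data points, invented for this example.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                     # change in y per unit of x
intercept = mean_y - slope * mean_x   # value of y when x is 0
r = sxy / sqrt(sxx * syy)             # correlation coefficient
r_squared = r ** 2                    # share of variance the line explains

print(round(slope, 2), round(intercept, 2), round(r_squared, 3))
```

      A high R² only says the line fits these points well. It doesn't show that x causes y.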
  • Standard deviation
    • The standard deviation is how widely values are spread apart or how much the data varies. This, along with mean, defines a normal distribution.
    • You can calculate the standard deviation of the population using the formula above, with x̄ as the average of the data points x, n as the number of data points and Σ as the sum of each value (xᵢ − x̄)². If you want the standard deviation of a specific sample, use n-1 instead of n in the denominator because you only know the mean of that sample, not the population.
    • The standard deviation squared gives you the variance. Sometimes researchers use “deviation” and “variance” interchangeably so keep in mind the difference.
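      To make the n versus n-1 distinction concrete, this sketch uses Python's statistics module on a small set of hypothetical data points. pstdev divides by n (population), stdev divides by n-1 (sample) and pvariance is the population standard deviation squared.

```python
from statistics import pstdev, pvariance, stdev

# Hypothetical data points with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(pstdev(data))     # population standard deviation (divides by n)
print(pvariance(data))  # population variance, the standard deviation squared
print(stdev(data))      # sample standard deviation (divides by n - 1)
```

      The sample version comes out slightly larger, reflecting the extra uncertainty of estimating the mean from the same data.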
  • Stochastic models
    • These are ways to predict future data like financial portfolios or weather forecasts that depend on randomness. Using distributions like the normal or beta distributions, you can simulate what future data will look like and form predictions.
  • Variable
    • Variables are anything that differs from person to person or sample to sample.
    • Categorical variables are ways of labeling people into groups (like biological sex or state of residence), continuous ones lie on a scale (like age or temperature), qualitative ones use adjectives (like colors) and random variables are what scientists measure as outcomes of experiments (like flipping a coin).
    2. Study the source

    The Society of Professional Journalists dictates you should remain accountable and transparent, seek the truth and report it and act independently. In the context of numbers, this means remaining open and honest in data analysis, scrutinizing findings and mathematical methods and doing so free from anything that may interfere with an investigation. After you know the definitions of statistics, you need to know where those numbers came from. This means not only knowing how data was collected, but appealing to statistics in a way that reflects the current principles of journalism.

    Writer William Davies argued the authority of statistics and the researchers who study them is declining. In a post-statistical society, journalists need to remain objective and skeptical of statistics while still appreciating them for what they are. It won’t be a battle between elite facts and populist feelings, but, rather, public rhetoric and the forces against it.

    Remember to keep numbers in the context of their original source or how they were measured. If someone asks where you got your information from or how a number was calculated, you should have an appropriate answer. If you’re reporting a p-value for a biomedical study, which variables were measured? How does the standard deviation affect the certainty of the results? Make sure that, for whatever claim or argument a scientist has put forward in a study, you can be responsible for however you report on it.

    As you become more statistically literate, you’ll naturally reevaluate how you reason. Becoming aware of common fallacies and pitfalls journalists fall into can make you more prepared to present accurate scientific findings. Be careful when you read a study suggesting that, because people are losing jobs, the economy must be doing poorly, or when a study finds no evidence of a link between fossil fuels and climate change and you’re tempted to treat that absence of evidence as evidence of absence. You can begin to see through arguments that a majority of people saying something is true makes it true and, instead, take a more empirical approach to forming an opinion.

    Much more sinister are those who prey on individuals without a strong statistical or mathematical literacy. Showing that the cost of attending college is a smaller percent of the national debt now than it was in the 1960s doesn’t show that today’s college students pay less for their education. As you study the context and nuances of scientific findings, you’ll become better prepared to separate signal from noise in these situations.

    If there’s a 20 percent chance of rain, does that mean it will rain 20 percent of the time? If a medical procedure has a false positive rate of 1 out of 10 trials, how does that change its effectiveness? It’s easy to appeal to the authority of statistics and science without investigating for yourself. Check what experiments were performed or the historical use of tests like Fisher’s exact test.

    This way, you’re acting as both a writer and a researcher. The key here is to avoid resorting to phrases like “studies show” or “survey says,” and, instead, ask yourself if you really know what the scientific studies purport. Many times scientists will refer to terms like “standard deviation” or “variance” interchangeably so make sure you know what’s being reported.

    3. Remember the reader

    Now that you have a deep understanding of what you’re reporting and what it means, you need to put it in a context that a general audience can understand.

    If you ask a drunkard what number is larger, 2/3 or 3/5, he won’t be able to tell you. But if you rephrase the question: what is better, 2 bottles of vodka for 3 people or 3 bottles of vodka for 5 people, he will tell you right away: 2 bottles for 3 people, of course.

    Edward Frenkel, “Love and Math: The Heart of Hidden Reality”

    In the quote above, how does the drunkard arrive at the correct answer? The statistics are presented differently. In the rephrased question, he has a more “tangible,” usable way of understanding how the proportions of vodka would arise from the distribution among people.

    How well do you understand what you write? Try answering this question to find out.

    Imagine you conduct a breast cancer screening using mammography in a certain region. You know the following information about the women in this region: The probability that a woman has breast cancer is 1 percent (known as “prevalence”). If a woman has breast cancer, the probability that she tests positive is 90 percent (“sensitivity”). If a woman does not have breast cancer, the probability that she nevertheless tests positive is 9 percent (false-positive rate). A woman tests positive. She wants to know from you whether that means that she has breast cancer for sure, or what the chances are. What is the best answer?

      A. The probability that she has breast cancer is about 81 percent.
      B. Out of 10 women with a positive mammogram, about 9 have breast cancer.
      C. Out of 10 women with a positive mammogram, about 1 has breast cancer.
      D. The probability that she has breast cancer is about 1 percent.

    When German psychologist Gerd Gigerenzer posed the question to about 1000 gynecologists, about 21 percent chose the correct answer, C. While that is a little worse than random guessing, I must admit that, on my first attempt, I failed to answer this question correctly, as well. Through his research, Gigerenzer has crafted a theory of understanding statistics that would help us in situations like this.

    Similar to Frenkel’s example of the fractions of vodka, psychologists like Daniel Kahneman and Gerd Gigerenzer have shown that asking statistics questions in different ways can influence the ways we understand them. For example, when the information preceding the question is framed differently (as shown below), 87 percent of gynecologists answered correctly.

    Assume you conduct breast cancer screening using mammography in a certain region. You know the following information about the women in this region:

    • Ten out of every 1,000 women have breast cancer
    • Of these 10 women with breast cancer, 9 test positive
    • Of the 990 women without cancer, about 89 nevertheless test positive

    In both examples (of breast cancer screening and of bottles of vodka), when we change from “conditional probabilities” to “natural frequencies,” we suddenly understand statistics much better. Like Gigerenzer, I believe we can teach the appropriate way to interpret statistics, and, with the effect it has on our health and society, we have a moral imperative to do so.
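    To check that both framings of the screening question lead to the same answer, here is a short Python sketch. The first half applies Bayes’ theorem to the conditional probabilities; the second half simply counts women using the natural frequencies. The variable names are illustrative.

```python
# Conditional probabilities from the first version of the question.
prevalence = 0.01            # P(cancer)
sensitivity = 0.90           # P(positive | cancer)
false_positive_rate = 0.09   # P(positive | no cancer)

# Bayes' theorem: P(cancer | positive).
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_cancer_given_positive = sensitivity * prevalence / p_positive

# Natural frequencies from the second version of the question.
true_positives = 9      # of the 10 women with cancer
false_positives = 89    # of the 990 women without cancer
frequency_answer = true_positives / (true_positives + false_positives)

print(round(p_cancer_given_positive, 3))
print(round(frequency_answer, 3))
```

    Both come out to roughly 9 percent, or about 1 in 10 women with a positive mammogram, which is answer C.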

    You can use a confusion matrix like the one above to keep track of the accuracy metrics of an experiment when presenting information to colleagues.

    This isn’t a simple case of deliberately communicating false information or lying about the statistics we use. While there may be agendas and conflicts of interest between professionals (including scientists), we simply don’t understand how to interpret statistics. And, in the field of medicine, this can have disastrous results. We make poor decisions when judging how long a patient may live, how prevalent cancer is among smokers and what the harms and benefits of screening for breast cancer are.

    4. Present the product

    Many ways of visualizing, illustrating or explaining statistics exist no matter the medium. Looking across FiveThirtyEight, The Guardian‘s Data section or other data journalism publications, you can find effective ways of communicating complicated concepts either to the audience of your publication or to colleagues. Use figures and graphs to explain take-home messages and conclusions from your reporting. Make sure they’re easy to read and follow.

    Python and R offer ways of visualizing statistical findings, with R providing much more extensive libraries for statistics than Python. My work in creating interactive network graphs, word clouds and even periodic tables shows some examples. To produce a confusion matrix like the one shown below, you can use this code.

    Compare this confusion matrix to the Null hypothesis table above. Though it might be too complicated for someone reading a newspaper, you can use it to present findings to other researchers.
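    The linked code isn’t reproduced here, but as a rough idea of what goes into a confusion matrix, the Python sketch below tallies the four cells from two lists of true and predicted labels. It reuses the 1,000 women from the mammography example earlier; the function name and dictionary keys are assumptions for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives from two lists of booleans."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[(True, True)],
        "false_negative": counts[(True, False)],
        "false_positive": counts[(False, True)],
        "true_negative": counts[(False, False)],
    }


# The 1,000 women from the screening example: 10 with cancer (9 test
# positive) and 990 without (89 test positive).
actual = [True] * 10 + [False] * 990
predicted = [True] * 9 + [False] * 1 + [True] * 89 + [False] * 901

matrix = confusion_matrix(actual, predicted)
for label, count in matrix.items():
    print(label, count)
```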

    It’s a good idea to value openness and transparency with your code and work in creating visualizations. This gives other researchers and writers ways to check and re-examine what you’ve done. The chart below shows how much the University of California Santa Cruz Science Communication class of 2020 used Slack during their fall quarter (with its code here). Interactive graphs give the reader a better sense of data and let you communicate more information as effectively as possible.

    Make sure to perform statistical tests to confirm results from research when you report. In the movie “Rosencrantz and Guildenstern Are Dead,” the two protagonists flip a coin heads 92 times in a row. The chance of this happening is about 1 in 5 octillion. In a more realistic setting, the Dallas Cowboys have won 6 out of 8 coin tosses in the history of Super Bowls. In R, you can use a binomial distribution to find the probability of winning exactly 6 of 8 fair coin tosses, which returns the value 0.109375.

    probability <- .5 # Set the odds of getting heads to .5. 
    wins <- 6 # number of winning coin flips
    totalFlips <- 8 # total coin flips
    dbinom(wins, totalFlips, probability)
    # [1] 0.109375

    In the code, the comments are written with a # in front of them explaining what each line does. These comments are notes programmers write to explain things without affecting the code.

    With enough coin tosses you can make a graph of how these probabilities depend on the number of heads and flips. Each flip, with only two outcomes (heads or tails), is a Bernoulli trial, and the number of heads across many flips follows a binomial distribution.
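    For readers working in Python rather than R, the same coin-toss calculation can be sketched with the standard library. The function name binomial_pmf is made up here; it mirrors what dbinom computes in the R code above.

```python
from math import comb


def binomial_pmf(successes, trials, probability):
    """Chance of exactly `successes` heads in `trials` flips."""
    return (comb(trials, successes)
            * probability ** successes
            * (1 - probability) ** (trials - successes))


# Same calculation as dbinom(6, 8, .5) in the R code above.
print(binomial_pmf(6, 8, 0.5))  # 0.109375

# The chance of each possible number of heads in 8 flips.
for heads in range(9):
    print(heads, binomial_pmf(heads, 8, 0.5))

# 92 heads in a row: one chance in 2**92, roughly 5 octillion.
print(2 ** 92)
```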

    How likely is it the coin is fair? (Code found here.)

    Not all visuals are created equal. Statisticians William Cleveland and Robert McGill found that people can tell differences between length and angles much more easily than shapes and colors. This means, where appropriate, you should use charts and plots that rely on lines and slopes and avoid pie charts.

    No matter the code or plot you make, taking an independent investigative approach to statistics can let you harness the power of data in your stories. Becoming more savvy with numbers and calculations can let you present more accurate, verified findings. Though you can’t just drop statistics without context or understanding of how they came about, newsrooms and other publications can take a more empirical approach to presenting scientific research for what it is. Whether it’s journalists themselves or a hired analyst creating statistical models of disease prevalence, they should adhere to the established standards of journalism.

    Life expectancy: visualized. (Code found here.)

    Journalism emphasizes quick, easy-to-understand conclusions and messages. While some projects can require more complicated workflows such as Bayesian models, bootstrapping or exploratory data analysis, sometimes all that matters is whether an experiment worked or didn’t. In many cases, you simply don’t have the time or capacity to explain what a p-value or regression test is. Still, becoming statistically literate and understanding the mathematics behind calculations involved in research can make you all the more prepared in presenting stories. Being able to tell the difference between causation and correlation can save you from drawing false conclusions and make your arguments more justified on the basis of statistics. It can give you the power to check the work of others and move journalism into the domain of peer-reviewed, egalitarian work. In writing this guide, I hope to do so as well.

    The Emergent Beauty of Mathematics

    The first 30 seconds of a Brownian tree (code can be found here).

    Like a flower in full bloom, nature reveals its patterns shaped by mathematics. Or as particles collide with one another, they create snowflake-like fractals through emergence. How do fish swarm in schools or consciousness come about from the brain? Simulations can provide answers.

    Through code, you can simulate how living cells or physical particles would interact with one another, using equations that govern how cells act when they meet one another or how they would grow and evolve into larger structures. The gif above uses diffusion-limited aggregation to create Brownian trees. These are the structures that emerge when particles move randomly with respect to one another. Particles in fluid (like dropping color dye into water) take these patterns when you look at them under a microscope. As the particles collide and form trees, they create shapes and patterns like water crystals on glass. These visuals can give you a way of appreciating how beautiful mathematics is. The way mathematical theory can borrow from nature and how biological communities of living organisms themselves depend on physical laws shows how an interdisciplinary approach can bridge different disciplines.

    After about 20 minutes, the branches of the Brownian tree take form.

    In the code, the particles are set to move with random velocities in two dimensions and, if they collide with the tree (a central particle at the beginning), they form parts of the tree. As the tree grows bigger over time, it takes the shapes of branches the same way neurons in the brain form trees that send signals between one another. These fractals, in their uniqueness, have a kind of mathematical beauty.
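    The linked code isn’t shown here, so the following is only a simplified sketch of diffusion-limited aggregation, not the simulation behind the gif. It assumes particles hop one cell at a time on a small square grid instead of moving with continuous velocities, and the function name, grid size and particle count are all arbitrary choices for illustration.

```python
import random


def grow_brownian_tree(size=31, particles=100, seed=0):
    """Grow a small Brownian tree on a square grid.

    Returns the set of (x, y) cells that belong to the tree.
    """
    rng = random.Random(seed)
    center = size // 2
    tree = {(center, center)}  # the tree starts as one central particle
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    while len(tree) < particles + 1:
        # Release a new particle at a random empty cell.
        x, y = rng.randrange(size), rng.randrange(size)
        if (x, y) in tree:
            continue
        while True:
            # Stick to the tree if any neighboring cell is part of it.
            if any((x + dx, y + dy) in tree for dx, dy in steps):
                tree.add((x, y))
                break
            # Otherwise take one random step, staying inside the grid.
            dx, dy = rng.choice(steps)
            x = min(max(x + dx, 0), size - 1)
            y = min(max(y + dy, 0), size - 1)
    return tree


GRID_SIZE = 31
tree = grow_brownian_tree(size=GRID_SIZE)

# Print the grid as text: "#" marks a cell that is part of the tree.
for y in range(GRID_SIZE):
    print("".join("#" if (x, y) in tree else "." for x in range(GRID_SIZE)))
```

    Even at this small scale, the printed grid tends to show thin branches reaching outward rather than a solid blob, because wandering particles usually hit the tips of the tree first.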

    Conway’s game of life represents another way something emerges from randomness.

    Flashing lights coming and going away like stars shining in the sky are more than just randomness. These models of cellular interactions are known as cellular automata. The gif above shows an example of Conway’s game of life, a simulation of how living cells interact with one another.

    These cells “live” and “die” according to four simple rules: (1) live cells with fewer than two live neighbors die, as if by underpopulation, (2) live cells with two or three live neighbors live on to the next generation, (3) live cells with more than three live neighbors die, as if by overpopulation and (4) dead cells with exactly three live neighbors become live cells, as if by reproduction.
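    The four rules above fit in a few lines of code. This is one possible sketch in Python that stores the live cells as a set of coordinates on an unbounded grid; the function name step and the example patterns are illustrative choices.

```python
from collections import Counter


def step(live_cells):
    """Advance Conway's game of life by one generation.

    `live_cells` is a set of (x, y) coordinates of live cells.
    """
    # Count how many live neighbors each cell has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for x, y in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Rules 1-3: a live cell survives only with two or three live neighbors.
    # Rule 4: a dead cell with exactly three live neighbors comes alive.
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }


# A "blinker": three cells in a row flip between horizontal and vertical.
blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(step(blinker)))           # [(1, 0), (1, 1), (1, 2)]
print(step(step(blinker)) == blinker)  # True
```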

    Conus textile shows a similar cellular automaton pattern on its shell.

    Through these rules, specific shapes emerge such as “gliders” or “knightships” you can further describe with rules and equations. You’ll find natural versions of cells obeying rules like the colorful patterns on a seashell. Complex structures emerging from more basic, fundamental sets of rules unite these observations. While the beauty of these structures becomes more and more apparent from the patterns between different disciplines, searching for these patterns in other contexts, such as human behavior, can be more difficult.

    Recent writing like Genesis: The Deep Origin of Societies by biologist E.O. Wilson takes on the debate over how altruism in humans evolved. While the shape of snowflakes can emerge from the interactions between water molecules, humans getting along with one another seems far more complicated and higher-level. Though you can find similar cooperation in ants and termites creating societies, how did science let this happen?

    Biologists have answered that organisms choose to mate with individuals and increase the survival chances of themselves and their offspring while passing on their genes. Though they’ve argued this for decades, Wilson offers a contrary point of view. In groups, selfish organisms defeat altruistic ones, but altruistic groups beat selfish groups overall. This group selection drives the emergence of altruism. Through these arguments, both sides have appealed to the mathematics of nature, showing its growing importance in recognizing the patterns of life.

    Wilson clarifies that data analysis and mathematical modeling should come second to the biology itself. Becoming experts on organisms themselves should be a priority. Regardless of what form it takes, the beauty is still there, even if it’s below the surface.

    How to Create Interactive Network Graphs (from Twitter or elsewhere) with Python

    In this post, a gentle introduction to different Python packages will let you create network graphs users can interact with. Taking a few steps into graph theory, you can apply these methods to anything from the severity of terrorist attacks to the prices of taxi cabs. In this tutorial, you can use information from Twitter to make graphs anyone can appreciate.

    The code for steps 1 and 2 can be found on GitHub here, and the code for the rest, here.

    Table of contents:

    1. Get Started
    2. Extract Tweets and Followers
    3. Process the Data
    4. Create the Graph
    5. Evaluate the Graph
    6. Plot the Map

    1. Get Started

    Make sure you’re familiar with using a command line interface such as Terminal and you can download the necessary Python packages (chart-studio, matplotlib, networkx, pandas, plotly and python-twitter). You can use Anaconda to download them. This tutorial will introduce parts of the script you can run from the command line to extract tweets and visualize them.

    If you don’t have a Twitter developer account, you’ll need to log in here and get one. Then create an app and find your keys and secret codes for the consumer and access tokens. This lets you extract information from Twitter.

    2. Extract Tweets and Followers

    To extract tweets, run the script below. In this example, the tweets of the UCSC Science Communication class of 2020 are analyzed, so their Twitter handles are listed in screennames. Replace the variables currently defined as None below with your own keys and secret codes. Keep these keys and codes safe and don’t share them with others. Set datadir to the output directory to store the data.

    The code begins with import statements to use the required packages including json and os, which should come installed with Python.

    import json
    import os
    import pickle
    import twitter 
    
    screennames = ["science_ari", "shussainather", "laragstreiff",
                   "scatter_cushion", "jessekathan", "jackjlee",
                   "erinmalsbury", "joetting13", "jonathanwosen",
                   "heysmartash"]
    
    CONSUMER_KEY = None
    CONSUMER_SECRET = None
    ACCESS_TOKEN_KEY = None
    ACCESS_TOKEN_SECRET = None
    
    datadir = "data/twitter"
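    One way to keep the keys out of the script itself is to read them from environment variables. The sketch below is an assumption about how you might do that, not part of the original script: the function name and the environment variable names are hypothetical, so use whatever names you exported in your shell.

```python
import os


def load_twitter_keys(environ=os.environ):
    """Read the four Twitter credentials from environment variables.

    The variable names are hypothetical; use whatever names you exported.
    """
    names = {
        "consumer_key": "TWITTER_CONSUMER_KEY",
        "consumer_secret": "TWITTER_CONSUMER_SECRET",
        "access_token_key": "TWITTER_ACCESS_TOKEN_KEY",
        "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
    }
    # Fail early with a clear message if anything is missing or empty.
    missing = [env for env in names.values() if not environ.get(env)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: environ[env] for arg, env in names.items()}
```

    The returned dictionary uses the same keyword names that twitter.Api takes in this tutorial, so you could pass it along as twitter.Api(**load_twitter_keys()).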

    Extract the information we need. This code goes through each screen name and accesses their tweet and follower information. It then saves the data of both of them to output JSON and pickle files.

    t = twitter.Api(consumer_key=CONSUMER_KEY,
                    consumer_secret=CONSUMER_SECRET,
                    access_token_key=ACCESS_TOKEN_KEY,
                    access_token_secret=ACCESS_TOKEN_SECRET)

    # Make sure the output directory exists.
    os.makedirs(datadir, exist_ok=True)

    for sn in screennames:
        # For each user, get the followers and tweets and save them
        # to output pickle and JSON files.
        fo = os.path.join(datadir, sn + ".followers.pickle")
        # Get the follower information.
        fof = t.GetFollowers(screen_name=sn)
        # Pickle files are binary, so open them in "wb" mode.
        with open(fo, "wb") as fofpickle:
            pickle.dump(fof, fofpickle, protocol=2)
        # User objects can't be written as JSON directly, so convert them
        # to dictionaries first.
        with open(fo.replace(".pickle", ".json"), "w") as fofjson:
            json.dump([user.AsDict() for user in fof], fofjson)
        # Get the user's timeline with the most recent tweets
        # (Twitter returns at most 200 per request).
        timeline = t.GetUserTimeline(screen_name=sn, count=200)
        tweets = [i.AsDict() for i in timeline]
        # Store the information in a JSON file.
        with open(os.path.join(datadir, sn + ".tweets.json"), "w") as tweetsjson:
            json.dump(tweets, tweetsjson)

    This should extract the followers and tweets and save them to pickle and JSON files in the datadir.

    3. Process the Data

    Now that you have an input JSON file of tweets, you can set it to the tweetsjson variable in the code below to read it as a pandas DataFrame.

    For the rest of the tutorial, start a new script for convenience.

    import json
    import matplotlib.pyplot as plt
    import networkx as nx
    import numpy as np
    import pandas as pd
    import re

    from plotly.offline import iplot, plot
    from operator import itemgetter

    Use pandas to import the JSON file as a pandas DataFrame.

    df = pd.read_json(tweetsjson)

    Set tfinal as the final DataFrame to make.

    tfinal = pd.DataFrame(columns = ["created_at", "id", "in_reply_to_screen_name", "in_reply_to_status_id", "in_reply_to_user_id", "retweeted_id", "retweeted_screen_name", "user_mentions_screen_name", "user_mentions_id", "text", "user_id", "screen_name", "followers_count"])

    Then, extract the columns you’re interested in and add them to tfinal.

    eqcol = ["created_at", "id", "text"]
    tfinal[eqcol] = df[eqcol]
    tfinal = filldf(tfinal)
    tfinal = tfinal.where((pd.notnull(tfinal)), None)

    Use the following functions to extract information from them. Each function extracts information from the input df DataFrame and adds it to the tfinal one. Define them before running the lines above, since filldf() calls each of them.

    First, get the basic information: screen name, user ID and how many followers.

    def getbasics(tfinal):
        """
        Get the basic information about the user.
        """
        tfinal["screen_name"] = df["user"].apply(lambda x: x["screen_name"])
        tfinal["user_id"] = df["user"].apply(lambda x: x["id"])
        tfinal["followers_count"] = df["user"].apply(lambda x: x["followers_count"])
        return tfinal

    Then, get information on which tweets have been retweeted.

    def getretweets(tfinal):
        """
        Get retweets.
        """
        # Inside the tag "retweeted_status" will find "user" and will get "screen name" and "id".
        # Tweets that aren't retweets have a missing value here instead of a dictionary.
        tfinal["retweeted_screen_name"] = df["retweeted_status"].apply(lambda x: x["user"]["screen_name"] if isinstance(x, dict) else np.nan)
        tfinal["retweeted_id"] = df["retweeted_status"].apply(lambda x: x["user"]["id_str"] if isinstance(x, dict) else np.nan)
        return tfinal

    Figure out which tweets are replies and to whom they are replying.

    def getinreply(tfinal):
        """
        Get reply info.
        """
        # Just copy the "in_reply" columns to the new DataFrame.
        tfinal["in_reply_to_screen_name"] = df["in_reply_to_screen_name"]
        tfinal["in_reply_to_status_id"] = df["in_reply_to_status_id"]
        tfinal["in_reply_to_user_id"] = df["in_reply_to_user_id"]
        return tfinal

    The following function runs each of these functions to get the information into tfinal.

    def filldf(tfinal):
        """
        Put it all together.
        """
        getbasics(tfinal)
        getretweets(tfinal)
        getinreply(tfinal)
        return tfinal

    You’ll use this getinteractions() function in the next step when creating the graph. This takes the actual information from the tfinal DataFrame and puts it into the format that a graph can use.

    def getinteractions(row):
        """
        Get the interactions between different users.
        """
        # From every row of the original DataFrame.
        # First we obtain the "user_id" and "screen_name".
        user = row["user_id"], row["screen_name"]
        # Be careful if there is no user id.
        if user[0] is None:
            return (None, None), []

    For the remainder of the function, get the information if it’s there.

        # The interactions are going to be a set of tuples.
        interactions = set()
    
        # Add all interactions. 
        # First, we add the interactions corresponding to replies adding 
        # the id and screen_name.
        interactions.add((row["in_reply_to_user_id"], 
        row["in_reply_to_screen_name"]))
        # After that, we add the interactions with retweets.
        interactions.add((row["retweeted_id"], 
        row["retweeted_screen_name"]))
        # And later, the interactions with user mentions.
        interactions.add((row["user_mentions_id"], 
        row["user_mentions_screen_name"]))
    
        # Discard if user id is in interactions.
        interactions.discard((row["user_id"], row["screen_name"]))
        # Discard all not existing values.
        interactions.discard((None, None))
        # Return user and interactions.
        return user, interactions
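
    To see what the function hands to the graph-building step, here is a minimal sketch that runs a condensed copy of it on one row. The name get_interactions and the sample_row values are hypothetical and only for illustration; a plain dict stands in for a DataFrame row, and missing values are None, as they are after the tfinal.where(...) call.

```python
def get_interactions(row):
    """Return the tweeting user and the set of users they interacted with."""
    user = row["user_id"], row["screen_name"]
    if user[0] is None:
        return (None, None), set()

    interactions = {
        (row["in_reply_to_user_id"], row["in_reply_to_screen_name"]),
        (row["retweeted_id"], row["retweeted_screen_name"]),
        (row["user_mentions_id"], row["user_mentions_screen_name"]),
    }
    interactions.discard(user)          # no self-loops
    interactions.discard((None, None))  # drop missing interactions
    return user, interactions


# Hypothetical row: user 1 retweets user 2; no reply and no mention.
sample_row = {
    "user_id": 1, "screen_name": "alice",
    "in_reply_to_user_id": None, "in_reply_to_screen_name": None,
    "retweeted_id": 2, "retweeted_screen_name": "bob",
    "user_mentions_id": None, "user_mentions_screen_name": None,
}

user, interactions = get_interactions(sample_row)
print(user)          # (1, 'alice')
print(interactions)  # {(2, 'bob')}
```

    Because the interactions are stored in a set, a user who is both retweeted and replied to in the same tweet appears only once.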

    4. Create the Graph

    Initialize the graph with networkx.

    graph = nx.Graph()

    Loop through the tfinal DataFrame and get the interaction information. Use the getinteractions function to get each user and interaction involved with each tweet.

    for index, tweet in tfinal.iterrows():
        user, interactions = getinteractions(tweet)
        user_id, user_name = user
        tweet_id = tweet["id"]
        for interaction in interactions:
            int_id, int_name = interaction
            graph.add_edge(user_id, int_id, tweet_id=tweet_id)
            # graph.nodes replaces graph.node, which newer networkx releases removed.
            graph.nodes[user_id]["name"] = user_name
            graph.nodes[int_id]["name"] = int_name
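
    As a small, self-contained sketch of the same loop, the example below builds a graph from three hypothetical users. The records list is an assumption made up for illustration; it has the same shape as the values that getinteractions returns, plus a tweet ID.

```python
import networkx as nx

# Hypothetical (user, interactions, tweet_id) records, shaped like the
# values produced for each tweet in the loop.
records = [
    ((1, "alice"), {(2, "bob")}, 101),
    ((2, "bob"), {(3, "carol"), (1, "alice")}, 102),
]

graph = nx.Graph()
for (user_id, user_name), interactions, tweet_id in records:
    for int_id, int_name in interactions:
        graph.add_edge(user_id, int_id, tweet_id=tweet_id)
        graph.nodes[user_id]["name"] = user_name
        graph.nodes[int_id]["name"] = int_name

print(graph.number_of_nodes(), graph.number_of_edges())  # 3 2
print(graph.nodes[3]["name"])                            # carol
print(graph.edges[1, 2]["tweet_id"])                     # 102
```

    Note that an undirected nx.Graph keeps a single edge per pair of users, so a later tweet between the same two users overwrites the stored tweet_id. If every interaction should be kept, nx.MultiGraph is one option.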

    5. Evaluate the Graph

    In the field of social network analysis (SNA), researchers use measurements of nodes and edges to tell what graphs are like. This lets you separate the signal from noise when looking at network graphs.

    First, look at the degrees and edges of the graph. The print statements should print out the information about these measurements.

    degrees = [val for (node, val) in graph.degree()]
    print("The maximum degree of the graph is " + str(np.max(degrees)))
    print("The minimum degree of the graph is " + str(np.min(degrees)))
    print("There are " + str(graph.number_of_nodes()) + " nodes and " + str(graph.number_of_edges()) + " edges present in the graph")
    print("The average degree of the nodes in the graph is " + str(np.mean(degrees)))

    Are all the nodes connected?

    if nx.is_connected(graph):
        print("The graph is connected")
    else:
        print("The graph is not connected")
        print("There are " + str(nx.number_connected_components(graph)) + " connected components in the graph.")

    Information about the largest subgraph can tell you what sort of tweets represent the majority.

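    # connected_component_subgraphs() was removed from newer networkx releases,
    # so take the largest set of connected nodes and build its subgraph.
    largestsubgraph = graph.subgraph(max(nx.connected_components(graph), key=len))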
    print("There are " + str(largestsubgraph.number_of_nodes()) + " nodes and " + str(largestsubgraph.number_of_edges()) + " edges present in the largest component of the graph.")
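
    To make "largest component" concrete, here is a hand-rolled sketch that finds connected components with a breadth-first search on a plain adjacency dictionary. The graph and the connected_components helper are hypothetical stand-ins written for illustration, not the networkx implementation.

```python
from collections import deque


def connected_components(adjacency):
    """Yield each connected component of an undirected graph as a set of nodes."""
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        yield component


# Hypothetical graph: a triangle a-b-c plus a separate pair d-e.
adjacency = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"e"}, "e": {"d"},
}

components = list(connected_components(adjacency))
largest = max(components, key=len)
print(len(components))  # 2
print(sorted(largest))  # ['a', 'b', 'c']
```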

    The clustering coefficient tells you how close together the nodes congregate using the density of the connections surrounding a node. If many nodes are connected in a small area, there will be a high clustering coefficient.

    print("The average clustering coefficient is " + str(nx.average_clustering(largestsubgraph)) + " in the largest subgraph")
    print("The transitivity of the largest subgraph is " + str(nx.transitivity(largestsubgraph)))
    print("The diameter of our graph is " + str(nx.diameter(largestsubgraph)))
    print("The average distance between any two nodes is " + str(nx.average_shortest_path_length(largestsubgraph)))
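
    The clustering coefficient can be computed by hand on a tiny graph. The sketch below uses the common definition (the share of a node's neighbor pairs that are themselves connected, with nodes that have fewer than two neighbors counted as 0); the graph and the local_clustering helper are hypothetical and written only for illustration.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))


# Hypothetical graph: triangle a-b-c with an extra node d hanging off c.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

average = sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)
print(round(local_clustering(adjacency, "c"), 3))  # 0.333
print(round(average, 3))                           # 0.583
```

    Nodes a and b sit in a closed triangle and score 1, while c scores lower because its neighbor d is not connected to a or b.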

    Centrality measures how important a node is within the network. Degree centrality counts how many direct, “one step,” connections each node has to other nodes in the network, and there are two other common ways to measure it. “Betweenness centrality” represents which nodes act as “bridges” between nodes in a network by finding the shortest paths and counting how many times each node falls on one. “Closeness centrality,” instead, scores each node based on the sum of the lengths of its shortest paths to all other nodes.

    graphcentrality = nx.degree_centrality(largestsubgraph)
    maxde = max(graphcentrality.items(), key=itemgetter(1))
    graphcloseness = nx.closeness_centrality(largestsubgraph)
    graphbetweenness = nx.betweenness_centrality(largestsubgraph, normalized=True, endpoints=False)
    maxclo = max(graphcloseness.items(), key=itemgetter(1))
    maxbet = max(graphbetweenness.items(), key=itemgetter(1))

    print("The node with ID " + str(maxde[0]) + " has a degree centrality of " + str(maxde[1]) + " which is the max of the graph.")
    print("The node with ID " + str(maxclo[0]) + " has a closeness centrality of " + str(maxclo[1]) + " which is the max of the graph.")
    print("The node with ID " + str(maxbet[0]) + " has a betweenness centrality of " + str(maxbet[1]) + " which is the max of the graph.")
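
    As a rough illustration of what degree and closeness centrality measure, here is a hand-rolled sketch on a four-node path. It assumes a connected, unweighted graph and the usual normalizations (degree divided by n - 1, and n - 1 divided by the sum of shortest-path lengths); the helper names and the graph are hypothetical.

```python
from collections import deque
from operator import itemgetter


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def degree_centrality(adjacency):
    n = len(adjacency)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


def closeness_centrality(adjacency):
    n = len(adjacency)
    result = {}
    for node in adjacency:
        total = sum(shortest_path_lengths(adjacency, node).values())
        result[node] = (n - 1) / total if total else 0.0
    return result


# Hypothetical path graph: a - b - c - d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency)
print(closeness["a"], closeness["b"])             # 0.5 0.75
print(max(closeness.items(), key=itemgetter(1)))  # ('b', 0.75)
```

    The two middle nodes tie on both measures here, and max() simply reports the first one it meets.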

    6. Plot the Map

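    Get the edges and store them in lists Xe and Ye in the x- and y-directions. This step reads the node positions in pos, which the layout step below computes, so run that step first.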

    Xe=[]
    Ye=[]
    for e in G.edges():
        Xe.extend([pos[e[0]][0], pos[e[1]][0], None])
        Ye.extend([pos[e[0]][1], pos[e[1]][1], None])
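
    The None entries matter: in a line trace they act as gaps, so each edge is drawn as its own segment. Here is a minimal sketch with hypothetical positions and edges. The Xn and Yn lists used by the node trace are not defined in the original code, so building them this way is an assumption.

```python
# Hypothetical node positions, in the {node: (x, y)} form a layout function returns.
pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0)}
edges = [(0, 1), (1, 2)]

Xe, Ye = [], []
for source, target in edges:
    # None ends the current line segment so separate edges are not joined.
    Xe.extend([pos[source][0], pos[target][0], None])
    Ye.extend([pos[source][1], pos[target][1], None])

# Assumed construction of the node coordinates, one entry per node.
Xn = [pos[node][0] for node in pos]
Yn = [pos[node][1] for node in pos]

print(Xe)  # [0.0, 1.0, None, 1.0, 1.0, None]
print(Ye)  # [0.0, 0.0, None, 0.0, 1.0, None]
```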

    Define the Plotly “trace” for nodes and edges. Plotly uses these traces as a way of storing the graph data right before it’s plotted.

    trace_nodes = dict(type="scatter",
                     x=Xn, 
                     y=Yn,
                     mode="markers",
                     marker=dict(size=28, color="rgb(0,240,0)"),
                     text=labels,
                     hoverinfo="text")
    
    trace_edges = dict(type="scatter",                  
                     mode="lines",                  
                     x=Xe,                  
                     y=Ye,                 
                     line=dict(width=1, color="rgb(25,25,25)"),
                     hoverinfo="none")

    Plot the graph with the Fruchterman-Reingold layout algorithm. This image shows an example of a graph plotted with this algorithm, designed to provide clear, explicit ways the nodes are connected.

    The force-directed Fruchterman-Reingold algorithm draws nodes in an understandable way.

    pos = nx.fruchterman_reingold_layout(G)

    Use the axis and layout variables to customize what appears on the graph. The axis options (showline=False, zeroline=False, showgrid=False, showticklabels=False and an empty title) hide the axis line, grid, tick labels and title of the graph. Then the fig variable creates the actual figure.

    axis = dict(showline=False,
                zeroline=False,
                showgrid=False,
                showticklabels=False,
                title="")

    layout = dict(title="My Graph",
                  font=dict(family="Balto"),
                  width=600,
                  height=600,
                  autosize=False,
                  showlegend=False,
                  xaxis=axis,
                  yaxis=axis,
                  margin=dict(l=40, r=40, b=85, t=100, pad=0),
                  hovermode="closest",
                  plot_bgcolor="#EFECEA")  # Set background color.


    fig = dict(data=[trace_edges, trace_nodes], layout=layout)

    Annotate with the information you want others to see on each node. Use the labels variable to list (with the same length as pos) what should appear as an annotation.

    labels = range(len(pos))

    def make_annotations(pos, anno_text, font_size=14, font_color="rgb(10,10,10)"):
        L = len(pos)
        if len(anno_text) != L:
            raise ValueError("The lists pos and text must have the same len")
        annotations = []
        for k in range(L):
            annotations.append(dict(text=anno_text[k],
                                    x=pos[k][0],
                                    y=pos[k][1] + 0.075,  # this additional value is chosen by trial and error
                                    xref="x1", yref="y1",
                                    font=dict(color=font_color, size=font_size),
                                    showarrow=False))
        return annotations

    fig["layout"].update(annotations=make_annotations(pos, labels))

    Finally, plot.

    iplot(fig)

    An example graph

    Make a word cloud in a single line of Python

    Moby-Dick, visualized

    This is a concise way to make a word cloud using Python. It can teach you basics of coding while creating a nice graphic.

    It’s actually four lines of code, but making the word cloud only takes one line, the final one.

    import nltk
    from wordcloud import WordCloud
    nltk.download("stopwords")
    WordCloud(background_color="white", max_words=5000, contour_width=3, contour_color="steelblue").generate_from_text(" ".join([r for r in open("mobydick.txt", "r").read().split() if r not in set(nltk.corpus.stopwords.words("english"))])).to_file("wordcloud.png")

    Just tell me what to do now!

    The first two lines import the required packages, which you must first download with these links: nltk and wordcloud. The third line downloads the stop words (common words like “the”, “a” and “in”) that you don’t want in your word cloud.

    The fourth line is complicated. Calling the WordCloud() constructor, you can specify the background color, contour color and other options (found here). generate_from_text() takes a string of words to put in the word cloud.

    The " ".join() creates this string of words separated by spaces from a list of words. The list comprehension in the square brackets [] creates this list of each word from the input file (in this case, mobydick.txt), with the r variable letting you use each word one at a time.

    The input file is open(), read() and split() into its words under the condition (using if) they aren’t in nltk.corpus.stopwords.words("english"). Finally, to_file() saves the image as wordcloud.png.
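
    The split, filter and join steps are easier to follow when unpacked. This is a minimal sketch using a short made-up sentence and a tiny stop-word set as stand-ins; the one-liner reads mobydick.txt and uses the nltk stop-word list instead.

```python
from collections import Counter

# Hypothetical stand-ins for the input file and the nltk stop-word list.
text = "Call me Ishmael and call me the whale hunter of the sea"
stop_words = {"me", "and", "the", "of"}

# Same steps as the one-liner, unpacked: split, filter, join.
words = text.split()
kept = [word for word in words if word not in stop_words]
cleaned = " ".join(kept)

print(cleaned)  # Call Ishmael call whale hunter sea
print(Counter(word.lower() for word in kept).most_common(1))  # [('call', 2)]
```

    Two things stand out here. The filter is case-sensitive, so a capitalized word such as "The" would pass through a lowercase stop-word list. Building the stop-word set once, as above, also avoids rebuilding it for every word in the text.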

    How to use this code

    In the code, change "mobydick.txt" to the name of your text file (keep the quotation marks). Save the code in a file makewordcloud.py in the text file’s directory, and use a command line interface (such as Terminal) to navigate to the directory.

    Run your script using python makewordcloud.py, and check out your wordcloud.png!

    Global Mapping of Critical Minerals

    The periodic table below illustrates the global abundance of critical minerals in the Earth’s crust in parts per million (ppm). Hover over each element to view! Lanthanides and actinides are omitted due to lack of available data.

    Data is obtained from the USGS handbook “Critical Mineral Resources of the United States— Economic and Environmental Geology and Prospects for Future Supply.” The code used is found here.

    Bokeh Plot

    Because these minerals tend to concentrate in specific countries, like niobium in Brazil or antimony in China, and remain central to many areas of society such as national defense or engineering, governments like that of the US have listed these minerals as “critical.”

    The abundance across different continents is shown in the map above.

    You can find gallium, the most abundant of the critical minerals, in place of aluminum and zinc, elements smaller than gallium. Processing bauxite ore or sphalerite ore of zinc (from sediment-hosted, Mississippi Valley-type and volcanogenic massive sulfide deposits) yields gallium. The US meets its gallium needs through primary, recycled and refined forms of the element.

    Germanium and indium have uses in electronics, flat-panel display screens, light-emitting diodes (LEDs) and solar power arrays. China, Belgium, Canada, Japan and South Korea are the main producers of indium, while germanium production can be more complicated. In many cases, countries import primary germanium from other ones, such as Canada importing from the US or Finland from the Democratic Republic of the Congo, to recover it.

    Rechargeable battery cathodes and jet aircraft turbine engines make use of cobalt. While the element is the central atom in vitamin B12, excess and overexposure can cause lung and heart dysfunction and dermatitis.

    As one of only three countries that process beryllium into products, the US doesn’t put much time or money into exploring for new deposits within its own borders because a single producer dominates the domestic beryllium market. Beryllium finds uses in magnetic resonance imaging (MRI) and medical lasers.

    A Deep Learning Overview with Python

    This course offers a quick introduction to deep learning and two of its major network types, convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The purpose is to give an intuitive sense of how to implement deep learning approaches for various tasks. To use this IPython notebook, run the Python code for each cell in a separate file. The content below each cell of this notebook is the output from running that cell.

    Simple perceptron

    In [1]:
    import numpy as np
    
    # sigmoid function
    def sigmoid(x,deriv=False):
        if(deriv==True):
            return x*(1-x)
        return 1/(1+np.exp(-x))
        
    # input dataset
    X = np.array([[0,0,1],
                  [0,1,1],
                  [1,0,1],
                  [1,1,1]])
        
    # output dataset            
    y = np.array([[0,0,1,1]]).T
    
    # seed random numbers to make calculation
    # deterministic (just a good practice)
    np.random.seed(1)
    
    # initialize weights randomly with mean 0
    syn0 = 2*np.random.random((3,1)) - 1
    
    for j in range(100000):
    
        # forward propagation
        l0 = X
        l1 = sigmoid(np.dot(l0,syn0))
    
        # how much did we miss?
        l1_error = y - l1
        if (j% 10000) == 0:
            print("Error:" + str(np.mean(np.abs(l1_error))))
    
        # multiply how much we missed by the 
        # slope of the sigmoid at the values in l1
        l1_delta = l1_error * sigmoid(l1,True)
    
        # update weights
        syn0 += np.dot(l0.T,l1_delta)
    
    print()
    print("Prediction after Training:")
    print(l1)
    
    Error:0.517208275438
    Error:0.00795484506673
    Error:0.0055978239634
    Error:0.00456086918013
    Error:0.00394482243339
    Error:0.00352530883742
    Error:0.00321610234673
    Error:0.00297605968522
    Error:0.00278274003022
    Error:0.0026227273927
    
    Prediction after Training:
    [[ 0.00301758]
     [ 0.00246109]
     [ 0.99799161]
     [ 0.99753723]]
    

    What is the loss function here? How is it calculated?

    Any idea how it would perform on non-linearly separable data? How could we test it?

    Multilayer perceptron

    Let’s use the fact that the sigmoid is differentiable (while the step function we saw in the slides is not). This allows us to add more layers (hence more modelling power).

    In [2]:
    import numpy as np
    
    def sigmoid(x,deriv=False):
        if(deriv==True):
            return x*(1-x)
    
        return 1/(1+np.exp(-x))
        
    X = np.array([[0,0,1],
                  [0,1,1],
                  [1,0,1],
                  [1,1,1]])
                    
    y = np.array([[0],
                  [1],
                  [1],
                  [0]])
    
    np.random.seed(1)
    
    # randomly initialize our weights with mean 0
    syn0 = 2*np.random.random((3,4)) - 1
    syn1 = 2*np.random.random((4,1)) - 1
    
    for j in range(100000):
    
        # Feed forward through layers 0, 1, and 2
        l0 = X
        l1 = sigmoid(np.dot(l0,syn0))
        l2 = sigmoid(np.dot(l1,syn1))
    
        # how much did we miss the target value?
        l2_error = y - l2
        
        if (j% 10000) == 0:
            print("Error:" + str(np.mean(np.abs(l2_error))))
            
        # in what direction is the target value?
        # were we really sure? if so, don't change too much.
        l2_delta = l2_error*sigmoid(l2,deriv=True)
    
        # how much did each l1 value contribute to the l2 error (according to the weights)?
        l1_error = l2_delta.dot(syn1.T)
        
        # in what direction is the target l1?
        # were we really sure? if so, don't change too much.
        l1_delta = l1_error * sigmoid(l1,deriv=True)
    
        syn1 += l1.T.dot(l2_delta)
        syn0 += l0.T.dot(l1_delta)
        
    print()
    print(l2)
    
    Error:0.496410031903
    Error:0.00858452565325
    Error:0.00578945986251
    Error:0.00462917677677
    Error:0.00395876528027
    Error:0.00351012256786
    Error:0.00318350238587
    Error:0.00293230634228
    Error:0.00273150641821
    Error:0.00256631724004
    
    [[ 0.00199094]
     [ 0.99751458]
     [ 0.99771098]
     [ 0.00294418]]
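
    Both perceptron cells call sigmoid(l1, True) on a layer's output rather than on its input. That works because the slope of the sigmoid can be written in terms of its own output, s * (1 - s). The sketch below checks this numerically with a finite difference; the helper names are hypothetical and the check is only an illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_slope_from_output(s):
    """Derivative of the sigmoid, written in terms of its output s = sigmoid(x)."""
    return s * (1.0 - s)


x = 0.5
s = sigmoid(x)

# Central finite difference as an independent check of the slope at x.
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(abs(sigmoid_slope_from_output(s) - numeric) < 1e-6)  # True
```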
    

    Setting up the environment

    We have done toy examples for feedforward networks. Things quickly become complicated, so let’s go deeper by relying on high-level frameworks: TensorFlow and Keras. Most technicalities are thus avoided so that you can directly play with networks.

    In [ ]:
    !conda install tensorflow keras
    
    In [3]:
    import tensorflow as tf
    import keras
    
    /Users/syedather/.local/lib/python3.6/site-packages/matplotlib/__init__.py:1067: UserWarning: Duplicate key in file "/Users/syedather/.matplotlib/matplotlibrc", line #2
      (fname, cnt))
    Using TensorFlow backend.
    
    In [4]:
    hello = tf.constant('Hello, TensorFlow!')
    sess = tf.Session()
    print(sess.run(hello))
    
    b'Hello, TensorFlow!'
    

    CNNs

    We are going to use the MNIST dataset for our first task. The code below loads the dataset and shows one training example and its label.

    In [5]:
    from __future__ import print_function
    import keras
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten
    from keras.layers import Conv2D, MaxPooling2D
    from keras import backend as K
    from pylab import *
    
    # the data, split between train and test sets
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    
    print("The first training instance is labeled as: "+str(y_train[0]))
    
    The first training instance is labeled as: 5
    
    In [6]:
    figure(1)
    imshow(x_train[0], interpolation='nearest')
    
    Out[6]:
    <matplotlib.image.AxesImage at 0x1259b2320>

    Now study the following code. What is the network we use? How many layers? What hyper parameters?

    In [7]:
    # Setup some hyper parameters
    batch_size = 128
    num_classes = 10
    epochs = 15
    
    # input image dimensions
    img_rows, img_cols = 28, 28
    
    # This is some technicality regarding Keras' dataset
    if K.image_data_format() == 'channels_first':
        x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
        x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
        input_shape = (1, img_rows, img_cols)
    else:
        x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
        x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
        input_shape = (img_rows, img_cols, 1)
    
    # We convert the matrices to floats as we will use real numbers
    x_train = x_train.astype('float32')[:1000]
    x_test = x_test.astype('float32')[:200]
    x_train /= 255
    x_test /= 255
    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')
    
    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(y_train, num_classes)[:1000]
    y_test = keras.utils.to_categorical(y_test, num_classes)[:200]
    
    
    # Build network
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    # model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adam(),
                  metrics=['accuracy'])
    
    # Train
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=1,
              validation_data=(x_test, y_test))
    
    # Evaluate on test data
    score = model.evaluate(x_test, y_test, verbose=0)
    print()
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])
    
    # Evaluate on training data
    score = model.evaluate(x_train, y_train, verbose=0)
    print()
    print('Train loss:', score[0])
    print('Train accuracy:', score[1])
    
    x_train shape: (1000, 28, 28, 1)
    1000 train samples
    200 test samples
    Train on 1000 samples, validate on 200 samples
    Epoch 1/15
    1000/1000 [==============================] - 4s 4ms/step - loss: 1.7244 - acc: 0.5660 - val_loss: 0.9116 - val_acc: 0.7900
    Epoch 2/15
    1000/1000 [==============================] - 4s 4ms/step - loss: 0.5967 - acc: 0.8320 - val_loss: 0.5148 - val_acc: 0.8100
    Epoch 3/15
    1000/1000 [==============================] - 3s 3ms/step - loss: 0.4394 - acc: 0.8670 - val_loss: 0.3056 - val_acc: 0.8600
    Epoch 4/15
    1000/1000 [==============================] - 3s 3ms/step - loss: 0.3296 - acc: 0.9050 - val_loss: 0.3263 - val_acc: 0.9000
    Epoch 5/15
    1000/1000 [==============================] - 3s 3ms/step - loss: 0.2205 - acc: 0.9360 - val_loss: 0.2092 - val_acc: 0.9200
    Epoch 6/15
    1000/1000 [==============================] - 3s 3ms/step - loss: 0.1684 - acc: 0.9560 - val_loss: 0.1870 - val_acc: 0.9450
    Epoch 7/15
    1000/1000 [==============================] - 3s 3ms/step - loss: 0.1325 - acc: 0.9690 - val_loss: 0.1597 - val_acc: 0.9350
    Epoch 8/15
    1000/1000 [==============================] - 3s 3ms/step - loss: 0.0990 - acc: 0.9740 - val_loss: 0.1617 - val_acc: 0.9400
    Epoch 9/15
    1000/1000 [==============================] - 3s 3ms/step - loss: 0.0636 - acc: 0.9840 - val_loss: 0.1434 - val_acc: 0.9450
    Epoch 10/15
    1000/1000 [==============================] - 3s 3ms/step - loss: 0.0393 - acc: 0.9960 - val_loss: 0.1545 - val_acc: 0.9400
    Epoch 11/15
    1000/1000 [==============================] - 3s 3ms/step - loss: 0.0267 - acc: 0.9950 - val_loss: 0.1444 - val_acc: 0.9400
    Epoch 12/15
    1000/1000 [==============================] - 4s 4ms/step - loss: 0.0158 - acc: 1.0000 - val_loss: 0.1642 - val_acc: 0.9350
    Epoch 13/15
    1000/1000 [==============================] - 3s 3ms/step - loss: 0.0090 - acc: 1.0000 - val_loss: 0.1475 - val_acc: 0.9450
    Epoch 14/15
    1000/1000 [==============================] - 4s 4ms/step - loss: 0.0057 - acc: 1.0000 - val_loss: 0.1556 - val_acc: 0.9350
    Epoch 15/15
    1000/1000 [==============================] - 4s 4ms/step - loss: 0.0041 - acc: 1.0000 - val_loss: 0.1651 - val_acc: 0.9350
    
    Test loss: 0.165074422359
    Test accuracy: 0.935
    
    Train loss: 0.00311407446489
    Train accuracy: 1.0
    

    Is there anything wrong here?

    How do you think a linear classifier performs?

    In [8]:
    # Setup some hyper parameters
    batch_size = 128
    num_classes = 10
    epochs = 15
    
    # input image dimensions
    img_rows, img_cols = 28, 28
    
    # This is some technicality regarding Keras' dataset
    if K.image_data_format() == 'channels_first':
        x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
        x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
        input_shape = (1, img_rows, img_cols)
    else:
        x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
        x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
        input_shape = (img_rows, img_cols, 1)
    
    # We convert the matrices to floats as we will use real numbers
    x_train = x_train.astype('float32')[:1000]
    x_test = x_test.astype('float32')[:200]
    x_train /= 255
    x_test /= 255
    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')
    
    # convert class vectors to binary class matrices
    y_train = keras.utils.to_categorical(y_train, num_classes)[:1000]
    y_test = keras.utils.to_categorical(y_test, num_classes)[:200]
    
    
    # Build network
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=input_shape))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adam(),
                  metrics=['accuracy'])
    
    # Train
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=1,
              validation_data=(x_test, y_test))
    
    # Evaluate on test data
    score = model.evaluate(x_test, y_test, verbose=0)
    print()
    print('Test loss:', score[0])
    print('Test accuracy:', score[1])
    
    # Evaluate on training data
    score = model.evaluate(x_train, y_train, verbose=0)
    print()
    print('Train loss:', score[0])
    print('Train accuracy:', score[1])
    
    x_train shape: (1000, 28, 28, 1)
    1000 train samples
    200 test samples
    
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-8-a1470fe28059> in <module>()
         53           epochs=epochs,
         54           verbose=1,
    ---> 55           validation_data=(x_test, y_test))
         56 
         57 # Evaluate on test data
    
    ~/anaconda3/lib/python3.6/site-packages/keras/models.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
        961                               initial_epoch=initial_epoch,
        962                               steps_per_epoch=steps_per_epoch,
    --> 963                               validation_steps=validation_steps)
        964 
        965     def evaluate(self, x=None, y=None,
    
    ~/anaconda3/lib/python3.6/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
       1628             sample_weight=sample_weight,
       1629             class_weight=class_weight,
    -> 1630             batch_size=batch_size)
       1631         # Prepare validation data.
       1632         do_validation = False
    
    ~/anaconda3/lib/python3.6/site-packages/keras/engine/training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
       1478                                     output_shapes,
       1479                                     check_batch_axis=False,
    -> 1480                                     exception_prefix='target')
       1481         sample_weights = _standardize_sample_weights(sample_weight,
       1482                                                      self._feed_output_names)
    
    ~/anaconda3/lib/python3.6/site-packages/keras/engine/training.py in _standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
        111                         ': expected ' + names[i] + ' to have ' +
        112                         str(len(shape)) + ' dimensions, but got array '
    --> 113                         'with shape ' + str(data_shape))
        114                 if not check_batch_axis:
        115                     data_shape = data_shape[1:]
    
    ValueError: Error when checking target: expected dense_4 to have 2 dimensions, but got array with shape (1000, 10, 10)
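
    The shape (1000, 10, 10) in this error is what you would expect if the labels were one-hot encoded twice: cell In [8] repeats the preprocessing of In [7] on arrays that In [7] had already converted. The sketch below reproduces the effect with a small NumPy stand-in for the encoding step (the one_hot helper is hypothetical, not the Keras function). Reloading the data with mnist.load_data() before the preprocessing, or skipping the preprocessing in the second cell, should avoid it.

```python
import numpy as np


def one_hot(labels, num_classes):
    """NumPy stand-in for one-hot encoding; appends a class axis to the input."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes)[labels]


labels = np.array([5, 0, 4])  # hypothetical integer class labels
once = one_hot(labels, 10)    # encoding applied once
twice = one_hot(once, 10)     # encoding applied again to the encoded labels

print(once.shape)   # (3, 10)
print(twice.shape)  # (3, 10, 10)
```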

    Let’s use this model to predict a value for the first training instance we visualized.

    In [ ]:
    print(model.predict(np.expand_dims(x_train[0], axis=0)))
    

    Is the model correct here? What is the output of the network?
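
    Because the last layer uses a softmax activation, the network returns one probability per digit class rather than a single digit. Here is a minimal sketch of how such an output is formed and read; the raw scores are made up for illustration and are not taken from the trained model.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract the max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the ten digit classes; class 5 has the largest one.
scores = [0.1, -1.2, 0.3, 0.0, -0.5, 4.0, 0.2, -0.7, 0.4, -0.1]
probabilities = softmax(scores)

# The predicted digit is the index of the largest probability.
predicted_digit = max(range(len(probabilities)), key=probabilities.__getitem__)
print(predicted_digit)                     # 5
print(abs(sum(probabilities) - 1) < 1e-9)  # True
```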

    RNNs

    We will now switch to RNNs. These require more resources, so we can’t do the fanciest applications during the workshop. We will do some sentiment classification of movie reviews.

    In [9]:
    from __future__ import print_function
    import numpy as np
    import keras
    from keras.preprocessing import sequence
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional
    from keras.datasets import imdb
    
    # Number of considered words, based on frequencies
    max_features = 20000
    # cut texts after this number of words
    maxlen = 100
    batch_size = 32
    
    print('Loading data...')
    (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=max_features, index_from=3)
    
    # This is just for pretty printing the sentences...
    word_to_id = keras.datasets.imdb.get_word_index()
    word_to_id = {k:(v+3) for k,v in word_to_id.items()}
    word_to_id["<PAD>"] = 0
    word_to_id["<START>"] = 1
    word_to_id["<UNK>"] = 2
    id_to_word = {value:key for key,value in word_to_id.items()}
    
    print("Here's the input for the first training instance:")
    print(' '.join(id_to_word[id] for id in x_train[0] ))
    
    Loading data...
    Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
    17465344/17464789 [==============================] - 2s 0us/step
    Downloading data from https://s3.amazonaws.com/text-datasets/imdb_word_index.json
    1646592/1641221 [==============================] - 0s 0us/step
    Here's the input for the first training instance:
    <START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out of the praising list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all
    

    What do you think about this text? Is it a positive or negative review?

    In [10]:
    print("Here are the dataset shapes")
    print(len(x_train), 'train sequences')
    print(len(x_test), 'test sequences')
    
    print("And the input for the first instance is represented as:")
    print(x_train[0])
    
    Here are the dataset shapes
    25000 train sequences
    25000 test sequences
    And the input for the first instance is represented as:
    [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
    

    What do these numbers represent? Is there any limitation you can imagine coming from this?
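
    One way to explore these questions is to map the integers back to words. The sketch below assumes the default Keras IMDB encoding, where 0 is padding, 1 marks the start of a review, 2 stands for an unknown word, and real word ranks are shifted by 3. The toy vocabulary is hypothetical; with the real dataset, the mapping would presumably come from `imdb.get_word_index()`.

```python
# Reserved ids under the assumed Keras IMDB defaults.
PAD, START, UNK = 0, 1, 2
INDEX_FROM = 3  # real word ranks are shifted by this offset


def decode_review(encoded, word_index):
    """Turn a list of integer ids back into a readable string."""
    # Invert the word -> rank mapping, applying the offset.
    id_to_word = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    id_to_word.update({PAD: "<PAD>", START: "<START>", UNK: "<UNK>"})
    # Any id outside the vocabulary is shown as unknown.
    return " ".join(id_to_word.get(i, "<UNK>") for i in encoded)


# Toy vocabulary (hypothetical ranks, not the real IMDB ones).
toy_word_index = {"the": 1, "film": 2, "was": 3, "brilliant": 4}
print(decode_review([1, 4, 5, 6, 2, 7], toy_word_index))
```

    Every word outside the vocabulary collapses to the same `<UNK>` id, which hints at one limitation of this representation: rare words, including names, are lost.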

    In [11]:
    print('Pad sequences (samples x time)')
    x_train = sequence.pad_sequences(x_train, maxlen=maxlen)[:5000]
    x_test = sequence.pad_sequences(x_test, maxlen=maxlen)[:5000]
    print('x_train shape:', x_train.shape)
    print('x_test shape:', x_test.shape)
    y_train = np.array(y_train)[:5000]
    y_test = np.array(y_test)[:5000]
    
    model = Sequential()
    model.add(Embedding(max_features, 128, input_length=maxlen))
    model.add(Bidirectional(LSTM(64)))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    
    model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])
    
    print('Train...')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=4,
              validation_data=(x_test, y_test))
    
    Pad sequences (samples x time)
    x_train shape: (5000, 100)
    x_test shape: (5000, 100)
    Train...
    Train on 5000 samples, validate on 5000 samples
    Epoch 1/4
    5000/5000 [==============================] - 54s 11ms/step - loss: 0.6032 - acc: 0.6570 - val_loss: 0.4283 - val_acc: 0.8056
    Epoch 2/4
    5000/5000 [==============================] - 54s 11ms/step - loss: 0.2761 - acc: 0.8918 - val_loss: 0.4403 - val_acc: 0.7948
    Epoch 3/4
    5000/5000 [==============================] - 61s 12ms/step - loss: 0.1101 - acc: 0.9670 - val_loss: 0.6366 - val_acc: 0.8026
    Epoch 4/4
    5000/5000 [==============================] - 56s 11ms/step - loss: 0.0478 - acc: 0.9868 - val_loss: 0.6637 - val_acc: 0.7954
    
    Out[11]:
    <keras.callbacks.History at 0x1392d76d8>
    In [12]:
    print("The neural net predicts that the first instance sentiment is:")
    print(model.predict(np.expand_dims(x_train[0], axis=0)))
    
    The neural net predicts that the first instance sentiment is:
    [[ 0.99445081]]
    

    Remarks? Comments?

    How do the training scores compare to the test scores? How can we improve this? What are the current limitations?
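
    As a starting point for that comparison, the sketch below takes the numbers from the training log above and computes two things: the epoch with the lowest validation loss, and the gap between training and validation accuracy at each epoch. The helper names are hypothetical and the code is plain Python, independent of Keras.

```python
# Metrics copied from the four epochs logged above.
train_acc = [0.6570, 0.8918, 0.9670, 0.9868]
val_acc = [0.8056, 0.7948, 0.8026, 0.7954]
val_loss = [0.4283, 0.4403, 0.6366, 0.6637]


def best_epoch(losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(losses)), key=losses.__getitem__) + 1


def generalization_gap(train, val):
    """Per-epoch difference between training and validation accuracy."""
    return [round(t - v, 4) for t, v in zip(train, val)]


print("best epoch:", best_epoch(val_loss))
print("gaps:", generalization_gap(train_acc, val_acc))
```

    The validation loss is lowest after the first epoch while the training accuracy keeps climbing, which is the usual signature of overfitting. Stopping early (Keras provides an `EarlyStopping` callback for this), stronger regularization, or training on more than 5000 samples are options worth trying.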

    This RNN use case takes more time to train, but it is definitely more impressive. We will model language by training on a novel. For each sequence of text in the novel, the objective is to predict what comes next; the code below does this one character at a time. This can be done on any text, and we don’t need annotated data: the text itself is enough.
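
    To make the training setup concrete, here is a minimal sketch of how a text can be cut into overlapping input windows, each paired with the character that follows it. It mirrors the `maxlen` and `step` idea used in the notebook cell further down, but on a toy string; `make_windows` is a hypothetical helper name.

```python
def make_windows(text, maxlen, step):
    """Cut text into overlapping inputs and their next-character targets."""
    inputs, targets = [], []
    for i in range(0, len(text) - maxlen, step):
        inputs.append(text[i:i + maxlen])   # the context the model sees
        targets.append(text[i + maxlen])    # the character it must predict
    return inputs, targets


inputs, targets = make_windows("hello world", maxlen=4, step=3)
for context, nxt in zip(inputs, targets):
    print(repr(context), "->", repr(nxt))
```

    No labels are needed: every position in the text supplies its own target.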

    Have a look at the following piece of code and try to understand what it does. Then, run it and see the network generating text! At first, the output is not meaningful, but it becomes so over time. This is the magic I was referring to.

    Beware: this will take a long time to run on a CPU. A GPU is recommended, but you can still try to run it for a while to see the predictions evolve. On my laptop, an epoch takes 6 minutes, so the full training of 60 epochs takes 6 hours. About 20 epochs are required for the generated text to be somewhat meaningful.

    Note, however, that although this seems long, training actual deep learning models for concrete tasks takes days, even on multiple GPUs. This is mostly because of the data size and the much deeper networks.

    In [ ]:
    from __future__ import print_function
    from keras.callbacks import LambdaCallback
    from keras.models import Sequential
    from keras.layers import Dense, Activation
    from keras.layers import LSTM
    from keras.optimizers import RMSprop
    from keras.utils.data_utils import get_file
    import numpy as np
    import random
    import sys
    import io
    
    # We load a text from Nietzsche
    path = get_file('nietzsche.txt', origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
    with io.open(path, encoding='utf-8') as f:
        text = f.read().lower()
    print('corpus length:', len(text))
    
    # We create dictionaries of character > index and the other way around
    chars = sorted(list(set(text)))
    print('total chars:', len(chars))
    char_indices = dict((c, i) for i, c in enumerate(chars))
    indices_char = dict((i, c) for i, c in enumerate(chars))
    
    # cut the text in semi-redundant sequences of maxlen characters
    maxlen = 40
    step = 3
    sentences = []
    next_chars = []
    for i in range(0, len(text) - maxlen, step):
        sentences.append(text[i: i + maxlen])
        next_chars.append(text[i + maxlen])
    print('nb sequences:', len(sentences))
    
    print('Vectorization...')
    x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
    y = np.zeros((len(sentences), len(chars)), dtype=bool)
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = 1
        y[i, char_indices[next_chars[i]]] = 1
    
    
    # build the model: a single LSTM
    print('Build model...')
    model = Sequential()
    model.add(LSTM(128, input_shape=(maxlen, len(chars))))
    model.add(Dense(len(chars)))
    model.add(Activation('softmax'))
    
    optimizer = RMSprop(lr=0.01)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
    
    
    def sample(preds, temperature=1.0):
        # helper function to sample an index from a probability array
        preds = np.asarray(preds).astype('float64')
        preds = np.log(preds) / temperature
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)
        probas = np.random.multinomial(1, preds, 1)
        return np.argmax(probas)
    
    
    def on_epoch_end(epoch, logs):
        # Function invoked at end of each epoch. Prints generated text.
        print()
        print('----- Generating text after Epoch: %d' % epoch)
    
        start_index = random.randint(0, len(text) - maxlen - 1)
        for diversity in [0.2, 0.5, 1.0, 1.2]:
            print('----- diversity:', diversity)
    
            generated = ''
            sentence = text[start_index: start_index + maxlen]
            generated += sentence
            print('----- Generating with seed: "' + sentence + '"')
            sys.stdout.write(generated)
    
            for i in range(400):
                x_pred = np.zeros((1, maxlen, len(chars)))
                for t, char in enumerate(sentence):
                    x_pred[0, t, char_indices[char]] = 1.
    
                preds = model.predict(x_pred, verbose=0)[0]
                next_index = sample(preds, diversity)
                next_char = indices_char[next_index]
    
                generated += next_char
                sentence = sentence[1:] + next_char
    
                sys.stdout.write(next_char)
                sys.stdout.flush()
            print()
    
    print_callback = LambdaCallback(on_epoch_end=on_epoch_end)
    
    model.fit(x, y,
              batch_size=128,
              epochs=60,
              callbacks=[print_callback])
    
    Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
    606208/600901 [==============================] - 0s 0us/step
    corpus length: 600893
    total chars: 57
    nb sequences: 200285
    Vectorization...
    Build model...
    Epoch 1/60
    200285/200285 [==============================] - 281s 1ms/step - loss: 1.9553
    
    ----- Generating text after Epoch: 0
    ----- diversity: 0.2
    ----- Generating with seed: "to
    agree with many people. "good" is no "
    to
    agree with many people. "good" is no and it is the the of the same the of the sention of the strenge of the most the self-our of the inderent that the sensive indeed the one of the constitute of the most of the semple of the desire of the sensive of the most of the semple of the sempathy of the one of the into the every to a soul of the some of the persent the free of the semple of the most of the sention of the of the spiritual the 
    ----- diversity: 0.5
    ----- Generating with seed: "to
    agree with many people. "good" is no "
    to
    agree with many people. "good" is no may a suptimes and also orage mankind the one of indeed of one streng the possible the sensition and the inderenation of a sul the in a sould be the orting a solitiarity of religions in a man of such and a scient, in every of and the self-to and of a revilued it is the most in the indeed, and it is assual that the ord of the of the distiture in its all the manter of the soul permans the decours of
    ----- diversity: 1.0
    ----- Generating with seed: "to
    agree with many people. "good" is no "
    to
    agree with many people. "good" is no causest and hew the fown of every groktulr
    destined a the art it noteriness of one it all and
    and cothinded of that rendercaterfroe to doe," in the pational the is the onl yutre
    allor upitsoon,--one
    viburan mused a "master in the that niver if
    a pridicle quesiles of
    the shoold enss nowxing to
    feef ma.t--wute disequerly that then her rewadd finale the eeblive alse rusurefver" a selovery catte he re
    ----- diversity: 1.2
    ----- Generating with seed: "to
    agree with many people. "good" is no "
    to
    agree with many people. "good" is no likeurenes, it is novamentstisuser'stone, indos paces. fund, wethel feel the
    que let doee new eveny that is that the catel. thotgy is
    within ceoks of theregeritades) and itwas brutmes ageteron
    clyrelogilabl freephi; its. by an? andaver happ
    one of his absuman artificss? itself old a
    ooker himsood and bus hray
    fined in smuch is sudtirers of rerarder from and
    afutty
    mest utfered with to "bewnook one
    Epoch 2/60
     81664/200285 [===========>..................] - ETA: 2:37 - loss: 1.6395
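
    The diversity values in the output above act as a sampling temperature: the `sample` helper divides the log-probabilities by it before renormalizing and drawing a character. The sketch below reproduces only that rescaling step in plain Python, on a hypothetical three-character distribution, to show why low values give repetitive text and high values give noisier text.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability distribution as sample() does before drawing."""
    logits = [math.log(p) / temperature for p in probs]
    exps = [math.exp(value) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.6, 0.3, 0.1]  # hypothetical next-character probabilities
for t in (0.2, 1.0, 1.2):
    print(t, [round(p, 3) for p in apply_temperature(probs, t)])
```

    At 0.2 almost all the probability mass moves to the most likely character, at 1.0 the distribution is unchanged, and at 1.2 it is flattened so that unlikely characters are drawn more often.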

    Web Scraping with Python Made Easy

    Imagine you run a business selling shoes online and want to monitor how your competitors price their products. You could spend hours a day clicking through page after page, or you could write a script for a web bot, an automated piece of software that keeps track of a site’s updates. That’s where web scraping comes in.

    Scraping websites lets you extract information from hundreds or thousands of webpages at once. You can search websites like Indeed for job opportunities or Twitter for tweets. In this gentle introduction to web scraping, we’ll go over the basic code to scrape websites such that anyone, regardless of background, can extract and analyze these kinds of results.

    Getting Started

    Using my GitHub repository on web scraping, you can install the software and run the scripts as instructed. Click on the src directory on the repository page to see the README.md file that explains each script and how to run them.

    Examining the Site

    You can use a sitemap file to locate where websites upload content without crawling every single web page. Here’s a sample one. You can also find out how large a site is and how much information you can actually extract from it. You can search a site using Google’s Advanced Search to figure out how many pages you may need to scrape. This will come in handy when creating a web scraper that may need to pause for updates or act in a different manner after reaching a certain number of pages.
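
    As a rough illustration of reading a sitemap, the sketch below parses sitemap XML with the standard library and returns the listed URLs. The sample content and the `sitemap_urls` name are hypothetical, and this is not necessarily how the repository scripts do it; a real sitemap would be downloaded first.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/shoes/1</loc></url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


print(sitemap_urls(SAMPLE_SITEMAP))
```

    Counting the returned URLs also gives a quick estimate of how many pages a crawl would need to visit.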

    You can also run the identify.py script in the src directory to figure out more information about how each site was built. This should give info about the frameworks, programming languages, and servers used in building each website as well as the registered owner for the domain. This also uses robotparser to check for restrictions.

    Many websites have a robots.txt file with crawling restrictions. Make sure you check out this file for a website for more information about how to crawl a website or any rules that you should follow. The sample protocol can be found here.
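
    A minimal sketch of such a check, using Python's standard `urllib.robotparser`, is shown below. The robots.txt content and the crawler names are made up for illustration; in practice you would point the parser at the site's real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler is banned, everyone else
# is only kept out of /trap.
SAMPLE_ROBOTS = """\
User-agent: BadCrawler
Disallow: /

User-agent: *
Disallow: /trap
"""


def build_parser(robots_text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


rp = build_parser(SAMPLE_ROBOTS)
print(rp.can_fetch("BadCrawler", "https://example.com/"))
print(rp.can_fetch("GoodCrawler", "https://example.com/"))
print(rp.can_fetch("GoodCrawler", "https://example.com/trap"))
```

    Calling `can_fetch` before every download is a simple way to keep a crawler within the rules a site publishes.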

    Crawling a Site

    There are three general approaches to crawling a site: crawling a sitemap, iterating through an ID for each webpage, and following webpage links. download.py shows how to download a webpage by crawling a sitemap, results.py shows you how to scrape those results while iterating through webpage IDs, and indeedScrape.py uses the webpage links for crawling. download.py also contains information on inserting delays, returning a list of links from HTML, and supporting proxies that can let you reach websites that block your requests.
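
    To illustrate the link-following approach, here is a small sketch that returns a list of absolute links from an HTML string using only the standard library. `LinkCollector` and `extract_links` are hypothetical names, and the repository's download.py may well do this differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


SAMPLE_HTML = '<a href="/jobs?page=2">Next</a> <a href="https://example.com/about">About</a>'
print(extract_links(SAMPLE_HTML, "https://example.com/jobs"))
```

    A crawler would push each returned link onto a queue, skip the ones it has already seen, and pause between downloads.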

    Scraping the Data

    In the file compare.py, you can compare the efficiency of the three web scraping methods.

    You can use regular expressions (known as regex or regexp) to perform neat tricks with text for getting information from websites. The script regex.py shows how this is done.
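
    As a small example of the idea, the sketch below pulls prices out of a made-up HTML table with a regular expression. The markup, the pattern, and the function name are all assumptions for illustration; regex.py in the repository may look different.

```python
import re

# Hypothetical product listing.
SAMPLE_HTML = """
<tr><td class="name">Runner X</td><td class="price">$59.99</td></tr>
<tr><td class="name">Trail Pro</td><td class="price">$84.50</td></tr>
"""

# Capture the digits inside each price cell.
PRICE_PATTERN = re.compile(r'<td class="price">\$(\d+\.\d{2})</td>')


def extract_prices(html):
    """Return every price found in the HTML as a float."""
    return [float(match) for match in PRICE_PATTERN.findall(html)]


print(extract_prices(SAMPLE_HTML))
```

    The pattern is fast but brittle: a small change in the page's markup, such as an extra attribute on the cell, would make it miss every price.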

    You can also use the browser extension Firebug Lite to get information from a webpage. In Chrome, you can click View >> Developer >> View Source to get the source behind a webpage.

    Beautiful Soup, one of the required packages to run indeedScrape.py, parses a webpage and provides a convenient interface to navigate the content, as shown in bs4test.py. Lxml also does this in lxmltest.py. A comparison of these three scraping methods is in the following table.

    Scraping method   Performance   Ease of use   Ease of install
    Regex             Fast          Hard          Easy
    Beautiful Soup    Slow          Easy          Easy
    lxml              Fast          Easy          Hard

    The callback.py script lets you scrape data and save it to an output .csv file.
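
    The saving step can be sketched with the standard `csv` module as follows. The row contents and the `write_rows` helper are hypothetical, and the example writes to an in-memory buffer so that it runs without touching the disk; passing an open file instead would produce the .csv file.

```python
import csv
import io


def write_rows(rows, fieldnames, out):
    """Write scraped records (dicts) as CSV to any file-like object."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_rows(
    [{"name": "Runner X", "price": 59.99}],
    ["name", "price"],
    buffer,
)
print(buffer.getvalue())
```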

    Caching Downloads

    Caching crawled webpages lets you store them in a manageable format while only having to download them once. In download.py, there’s a Python class Downloader that shows how to cache URLs after downloading their webpages. cache.py has a Python class that maps a URL to a filename when caching.

    Depending on which operating system you’re using, the file system restricts which characters a cached filename may contain and how long it can be.

    Operating system   File system   Invalid filename characters   Max filename length
    Linux              Ext3/Ext4     /, \0                         255 bytes
    OS X               HFS Plus      :, \0                         255 UTF-16 code units
    Windows            NTFS          \, /, ?, :, *, >, <, |        255 characters
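
    One way to respect these limits is sketched below: map a URL to a path, replace any character outside a conservative safe set, and truncate each path segment to 255 characters. This is an illustrative assumption about how such a mapping could work, not necessarily what cache.py does, and `url_to_path` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit

MAX_SEGMENT = 255  # limit per filename taken from the table above


def url_to_path(url):
    """Map a URL to a relative file path that is safe on common file systems."""
    parts = urlsplit(url)
    path = parts.path
    if not path:
        path = "/index.html"
    elif path.endswith("/"):
        path += "index.html"  # directories need a filename
    filename = parts.netloc + path
    if parts.query:
        filename += "?" + parts.query
    # Replace everything outside a conservative safe set with "_".
    filename = re.sub(r"[^/0-9a-zA-Z\-.,;_ ]", "_", filename)
    # Truncate each path segment to the maximum length.
    return "/".join(segment[:MAX_SEGMENT] for segment in filename.split("/"))


print(url_to_path("https://example.com/shoes/?page=2"))
```

    Truncation can make two long URLs collide on the same path, which is one reason to consider hashing the URL instead.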

    Though cache.py is easy to use, you can take the hash of the URL itself to use as the filename to ensure your files directly map to the URLs of the saved cache. Using MongoDB, you can build on top of the current file system cache and avoid the file system limitations. This method is found in mongocache.py using pymongo, a Python wrapper for MongoDB.
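
    The hashing idea can be sketched in a few lines with the standard `hashlib` module. The choice of SHA-256 and the `.html` suffix are assumptions made for this example; the point is that the resulting name has a fixed length and contains only hexadecimal characters, so none of the file system restrictions apply.

```python
import hashlib


def url_to_hashed_filename(url):
    """Derive a fixed-length, file-system-safe name from a URL."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return digest + ".html"


name = url_to_hashed_filename("https://example.com/shoes/?page=2")
print(name, len(name))
```

    The trade-off is that the filename no longer tells you which page it holds, so the original URL has to be stored alongside the cached content.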

    Test out the other scripts such as alexacb.py for downloading information on the top sites by Alexa ranking. mongoqueue.py has functionality for queueing the MongoDB inquiries that can be imported to other scripts.

    You can work with dynamic webpages using the code from browserrender.py. The majority of leading websites use JavaScript for functionality, meaning you can’t view all their content in barebones HTML.

    An introduction to philosophy

    Table of contents

    Ethics

    Classical ethics

    • Aristotle “Nicomachean Ethics” “On Virtues and Vices”

    Christian and Medieval ethics

    • Thomas Aquinas “Summa Theologica”

    • Saint Bonaventure “Commentary on the Sentences”

    • Duns Scotus “Philosophical Writings”

    • William of Ockham “Sum of Logic”

    Modern ethics

    • G. E. M. Anscombe “Modern Moral Philosophy”

    • David Gauthier “Morals by Agreement”

    • Alan Gewirth “Reason and Morality”

    • Allan Gibbard “Thinking How to Live”

    • Susan Hurley “Natural Reasons”

    • Christine Korsgaard “The Sources of Normativity”

    • John McDowell “Values and Secondary Qualities”

    • Alasdair MacIntyre “After Virtue”

    • J. L. Mackie “Ethics: Inventing Right and Wrong”

    • G. E. Moore “Principia Ethica”

    • Martha Nussbaum “The Fragility of Goodness”

    • Derek Parfit “Reasons and Persons”

    • Derek Parfit “On What Matters”

    • Peter Railton “Facts, Values, and Norms”

    • W. D. Ross “The Right and the Good”

    • Thomas M. Scanlon “What We Owe to Each Other”

    • Samuel Scheffler “The Rejection of Consequentialism”

    • Peter Singer “Practical Ethics”

    • Michael A. Smith “The Moral Problem”

    • Bernard Williams “Ethics and the Limits of Philosophy”

    Postmodern ethics

    • Zygmunt Bauman “Postmodern Ethics”

    • Terry Eagleton “The Illusions of Postmodernism”

    Bioethics

    • Don Marquis “Why Abortion is Immoral”

    • Paul Ramsey “The Patient as a Person” “Fabricated Man”

    • Judith Jarvis Thomson “A Defense of Abortion”

    Meta-ethics (Metaethics)

    • P. F. Strawson “Freedom and Resentment”

    Epistemology

    • Laurence Bonjour “The Structure of Empirical Knowledge”

    • Luc Bovens “Bayesian Epistemology”

    • Stanley Cavell “The Claim of Reason: Wittgenstein, Skepticism, Morality, and Tragedy”

    • Roderick Chisholm “Theory of Knowledge”

    • Keith DeRose “The Case for Contextualism”

    • René Descartes “Discourse on the Method”, “Meditations on First Philosophy”

    • Edmund Gettier “Is Justified True Belief Knowledge?”

    • Alvin Goldman “Epistemology and Cognition” “What is Justified Belief?”

    • Susan Haack “Evidence and Enquiry”

    • Hilary Kornblith “Knowledge and its Place in Nature”

    • Jonathan Kvanvig “The Value of Knowledge and the Pursuit of Understanding”

    • David K. Lewis “Elusive Knowledge”

    • G. E. Moore “A Defence of Common Sense”

    • Willard van Orman Quine “Epistemology Naturalized”

    • Richard Rorty “Philosophy and the Mirror of Nature”

    • Bertrand Russell “The Problems of Philosophy”

    • Jason Stanley “Knowledge and Practical Interest”

    • Stephen Stich “The Fragmentation of Reason”

    • Peter Unger “Ignorance: A Case for Scepticism”

    • Timothy Williamson “Knowledge and its Limits”

    Logic

    • Donald Davidson “Truth and Meaning”

    • Gottlob Frege “Begriffsschrift”

    • Kurt Gödel “On Formally Undecidable Propositions of Principia Mathematica and Related Systems”

    • Saul Kripke “Semantical Considerations on Modal Logic”

    • Charles Sanders Peirce “How to Make Our Ideas Clear”

    • Alfred Tarski “The Concept of Truth”

    Aesthetics

    • Theodor Adorno “Aesthetic Theory”

    • R.G. Collingwood “The Principles of Art”

    • Arthur C. Danto “After the End of Art”

    • Nelson Goodman “Languages of Art: An Approach to a Theory of Symbols”

    • George Santayana “The Sense of Beauty”

    Metaphysics

    • Aristotle “Metaphysics”

    • D.M. Armstrong “Universals and Scientific Realism”

    • A. J. Ayer “Language, Truth, and Logic”

    • Rudolf Carnap “Empiricism, Semantics, and Ontology”

    • David Chalmers “Constructing the World”

    • John Dewey “Experience and Nature”

    • William James “Pragmatism”

    • Immanuel Kant “Groundwork of the Metaphysics of Morals”

    • James Ladyman, Don Ross, David Spurrett, John Collier “Every Thing Must Go: Metaphysics Naturalized”

    • John McDowell “Mind and World”

    • David Kellogg Lewis “On the Plurality of Worlds”

    • Stephen Mumford “Dispositions”

    • Derek Parfit “Reasons and Persons”

    • Willard Van Orman Quine “Two Dogmas of Empiricism” “On What There Is”

    • Theodore Sider “Writing the Book of the World”

    • Alfred North Whitehead “Process and Reality”

    • Timothy Williamson “Modal Logic as Metaphysics”

    • Ludwig Wittgenstein “Tractatus Logico-Philosophicus” (a.k.a. The Tractatus)

    Philosophy of the mind

    • D. M. Armstrong “A Materialist Theory of the Mind”

    • Peter Carruthers “The Architecture of the Mind”

    • David Chalmers “Philosophy of Mind: Classical and Contemporary Readings” “The Character of Consciousness” “The Conscious Mind: In Search of a Fundamental Theory”

    • Paul Churchland “Matter and Consciousness: A Contemporary Introduction to the Philosophy of Mind”

    • Andy Clark “Supersizing the Mind: Embodiment, Action, and Cognitive Extension”

    • Daniel Dennett “Consciousness Explained”

    • Jaegwon Kim “Philosophy of Mind”

    • Ruth Millikan “Varieties of Meaning”

    • Gilbert Ryle “The Concept of Mind”

    History of philosophy

    Western civilization

    • Bertrand Russell “A History of Western Philosophy”

    Classical philosophy

    • Marcus Aurelius “Meditations”

    • Plato “Symposium” “Parmenides” “Phaedrus”

    Christian and Medieval

    • Augustine of Hippo “Confessions” “The City of God”

    • Anselm of Canterbury “Proslogion”

    Early modern

    • Sir Francis Bacon “Novum Organum”

    • Jeremy Bentham “An Introduction to the Principles of Morals and Legislation”

    • Henri Bergson “Time and Free Will” “Matter and Memory”

    • George Berkeley “Treatise Concerning the Principles of Human Knowledge”

    • Auguste Comte “Course of Positive Philosophy”

    • René Descartes “Principles of Philosophy” “Passions of the Soul”

    • Desiderius Erasmus “The Praise of Folly”

    • Johann Gottlieb Fichte “Foundations of the Science of Knowledge”

    • Hugo Grotius “De iure belli ac pacis”

    • Georg Wilhelm Friedrich Hegel “Phenomenology of Spirit” “Science of Logic” “The Philosophy of Right” “The Philosophy of History”

    • Thomas Hobbes “Leviathan”

    • David Hume “A Treatise of Human Nature” “Four Dissertations” “Essays, Moral, Political, and Literary” “An Enquiry Concerning Human Understanding” “An Enquiry Concerning the Principles of Morals”

    • Immanuel Kant “Critique of Pure Reason” “Critique of Practical Reason” “Critique of Judgment”

    • Søren Kierkegaard “Either/Or” “Fear and Trembling” “The Concept of Anxiety”

    • Gottfried Leibniz “Discourse on Metaphysics” “New Essays Concerning Human Understanding” “Théodicée” “Monadology”

    • John Locke “Two Treatises of Government” “An Essay Concerning Human Understanding”

    • Niccolò Machiavelli “The Prince”

    • Karl Marx “The Communist Manifesto” “Das Kapital”

    • John Stuart Mill “On Liberty” “Utilitarianism”

    • John Stuart Mill and Harriet Taylor Mill “The Subjection of Women”

    • Michel de Montaigne “Essays”

    • Friedrich Nietzsche “Thus Spoke Zarathustra” “Beyond Good and Evil” “On the Genealogy of Morals”

    • Blaise Pascal “Pensées”

    • Jean-Jacques Rousseau “Discourse on the Arts and Sciences” “Emile: or, On Education” “The Social Contract”

    • Arthur Schopenhauer “The World as Will and Representation”

    • Henry Sidgwick “The Methods of Ethics”

    • Adam Smith “The Theory of Moral Sentiments” “The Wealth of Nations”

    • Herbert Spencer “System of Synthetic Philosophy”

    • Baruch Spinoza “Ethics” “Tractatus Theologico-Politicus”

    • Max Stirner “The Ego and Its Own”

    • Mary Wollstonecraft “A Vindication of the Rights of Woman”

    Contemporary

    Phenomenology and existentialism
    • Simone de Beauvoir “The Second Sex”

    • Albert Camus “Myth of Sisyphus”

    • Martin Heidegger “Being and Time”

    • Edmund Husserl “Logical Investigations” “Cartesian Meditations” “Ideas Pertaining to a Pure Phenomenology and to a Phenomenological Philosophy”

    • Maurice Merleau-Ponty “Phenomenology of Perception”

    • Jean-Paul Sartre “Being and Nothingness” “Critique of Dialectical Reason”

    Hermeneutics and deconstruction
    • Jacques Derrida “Of Grammatology”

    • Hans-Georg Gadamer “Truth and Method”

    • Paul Ricœur “Freud and Philosophy: An Essay on Interpretation”

    Structuralism and post-structuralism
    • Michel Foucault “The Order of Things”

    • Gilles Deleuze “Difference and Repetition”

    • Gilles Deleuze and Felix Guattari “Capitalism and Schizophrenia”

    • Luce Irigaray “Speculum of the Other Woman”

    • Michel Foucault “Discipline and Punish”

    Critical theory and Marxism
    • Theodor Adorno “Negative Dialectics”

    • Louis Althusser “Reading Capital”

    • Alain Badiou “Being and Event”

    • Jürgen Habermas “Theory of Communicative Action”

    • Max Horkheimer and Theodor Adorno “Dialectic of Enlightenment”

    • Georg Lukacs “History and Class Consciousness”

    • Herbert Marcuse “Reason and Revolution” “Eros and Civilization”

    Eastern civilization

    Chinese philosophy

    • “The Record of Linji”

    • Han Fei “Han Feizi”

    • Kongzi “Analects” “Five Classics”

    • Laozi “Dao De Jing”

    • Mengzi “Mengzi”

    • Sunzi “Art of War”

    • Zhou Dunyi “The Taiji Tushuo”

    • Zhu Xi “Four Books” “Reflections on Things at Hand”

    Indian philosophy

    • “The Upanishads”

    • “The Bhagavad Gita” (“The Song of God”)

    • Aksapada Gautama “Nyaya Sutras”

    • Isvarakrsna “Sankhya Karika”

    • Kanada “Vaisheshika Sutra”

    • Patañjali “Yoga Sutras”

    • Swami Swatamarama “Hatha Yoga Pradipika”

    • Vyasa “Brahma Sutras”

    • Thiruvalluvar “Tirukkural”

    Islamic philosophy

    • Al-Ghazali “The Incoherence of the Philosophers”

    Japanese philosophy

    • Hakuin Ekaku “Wild Ivy”

    • Honen “One-Sheet Document”

    • Kukai “Attaining Enlightenment in this Very Existence”

    • Zeami Motokiyo “Style and Flower”

    • Miyamoto Musashi “The Book of Five Rings”

    • Shinran “Kyogyoshinsho”

    • Dogen Zenji “Shōbōgenzō”

    Philosophy of other disciplines

    Education

    • John Dewey “Democracy and Education”

    • Terry Eagleton “The Slow Death of the University”

    • Paulo Freire “Pedagogy of the Oppressed”

    • Martha Nussbaum “Not for Profit: Why Democracy Needs the Humanities”

    • B.F. Skinner “Walden Two”

    • Charles Weingartner and Neil Postman “Teaching as a Subversive Activity”

    Religion

    • William Lane Craig “The Kalam Cosmological Argument”

    • J. L. Mackie “The Miracle of Theism”

    • Dewi Zephaniah Phillips “Religion Without Explanation”

    • Alvin Plantinga “God and Other Minds” “Is Belief in God Properly Basic”

    • William Rowe “The Evidential Argument from Evil: A Second Look”

    • J. L. Schellenberg “Divine Hiddenness and Human Reason”

    • Richard Swinburne “The Existence of God”

    Science

    • Paul Feyerabend “Against Method: Outline of an Anarchistic Theory of Knowledge”

    • Bas C. van Fraassen “The Scientific Image”

    • Nelson Goodman “Fact, Fiction, and Forecast”

    • Thomas Samuel Kuhn “The Structure of Scientific Revolutions”

    • Larry Laudan “The Demise of the Demarcation Problem”

    • David K. Lewis “How to Define Theoretical Terms”

    • Karl Pearson “The Grammar of Science”

    • Karl Popper “The Logic of Scientific Discovery”

    • Hans Reichenbach “The Rise of Scientific Philosophy”

    Mathematics

    • Alfred North Whitehead and Bertrand Russell “Principia Mathematica”

    • Paul Benacerraf “What Numbers Could not Be” “Mathematical Truth”

    • Paul Benacerraf and Hilary Putnam “Philosophy of Mathematics: Selected Readings”

    • George Boolos “Logic, Logic and Logic”

    • Hartry Field “Science without Numbers: The Defence of Nominalism”

    • Imre Lakatos “Proofs and Refutations”

    • Penelope Maddy “Second Philosophy”

    Physics

    • Aristotle “Physics”

    • Michel Bitbol “Mécanique quantique : Une introduction philosophique” “Schrödinger’s Philosophy of Quantum Mechanics”

    • Chris Isham and Jeremy Butterfield “On the Emergence of Time in Quantum Gravity”

    • Tim Lewens “The Meaning of Science: An Introduction to the Philosophy of Science”

    Computer science

    • Scott Aaronson “Why Philosophers Should Care About Computational Complexity”

    • Judea Pearl “Causality”

    • Ray Turner “The Philosophy of Computer Science” “Computational Artefacts-Towards a Philosophy of Computer Science”

    Neuroscience

    • John Bickle “Revisionary Physicalism” “Psychoneural Reduction of the Genuinely Cognitive: Some Accomplished Facts” “Psychoneural Reduction: The New Wave” “Philosophy and Neuroscience: A Ruthlessly Reductive Account”

    • Patricia Churchland “Brain-Wise: Studies in Neurophilosophy” “Neurophilosophy: Toward a Unified Science of the Mind-Brain”

    • Carl Craver “Explaining the Brain: Mechanisms and the Mosaic Unity of Neuroscience”

    • Georg Northoff “Philosophy of the Brain: The brain problem”

    • Henrik Walter “Neurophilosophy of Free Will: From Libertarian Illusions to a Concept of Natural Autonomy”

    Chemistry

    • Jaap van Brakel “Philosophy of Chemistry”

    Biology

    • Daniel C. Dennett “Darwin’s Dangerous Idea”

    • Ruth Garrett Millikan “Language, Thought, and Other Biological Categories”

    • Erwin Schrödinger “What is Life? The Physical Aspect of the Living Cell”

    • Elliott Sober “The Nature of Selection”

    Sociology

    • B. F. Skinner “Science and Human Behavior”

    Psychology

    • Donald Davidson “The Very Idea of a Conceptual Scheme”

    • William James “The Principles of Psychology”

    Economics

    • Kenneth Arrow “Social Choice and Individual Values”

    • Ludwig von Mises “The Ultimate Foundation of Economic Science”

    • Elizabeth S. Anderson “Value in Ethics and Economics”

    Arts and Humanities

    • Bernard Williams “Philosophy as a Humanistic Discipline”

    Art

    • Clive Bell “Art”

    • George Dickie “Art and the Aesthetic”

    Music

    • Roger Scruton “Music as an Art”

    Literature

    • Aristotle “Poetics”

    Language

    • J. L. Austin “A Plea for Excuses” “How To Do Things With Words”

    • Robert Brandom “Making it Explicit”

    • Stanley Cavell “Must We Mean What We Say?”

    • David Chalmers “Two Dimensional Semantics”

    • Cora Diamond “What Nonsense Might Be”

    • Michael Dummett “Frege: Philosophy of Language”

    • Gottlob Frege “On Sense and Reference”

    • H. P. Grice “Logic and Conversation”

    • Saul Kripke “Naming and Necessity”

    • David K. Lewis “General Semantics”

    • Willard Van Orman Quine “Word and Object”

    • Bertrand Russell “On Denoting”

    • John Searle “Speech Acts”

    • Ludwig Wittgenstein “Philosophical Investigations”

    History

    • R.G. Collingwood “The Idea of History”

    • Karl Löwith “Meaning in History: The Theological Implications of the Philosophy of History”

    Medicine

    • Mario Bunge “Medical Philosophy: Conceptual Issues in Medicine”

    • R. Paul Thompson and Ross E. G. Upshur “Philosophy of Medicine”

    Law

    • Ronald Dworkin “Law’s Empire”

    • John Finnis “Natural Law and Natural Rights”

    • Lon L. Fuller “The Morality of Law”

    • H.L.A. Hart “The Concept of Law”

    Politics

    • Aristotle “Politics”

    • Isaiah Berlin “Two Concepts of Liberty”

    • Robert Nozick “Anarchy, State, and Utopia”

    • Plato “Republic”

    • Karl Popper “The Open Society and Its Enemies”

    • John Rawls “A Theory of Justice”

    • Michael Sandel “Liberalism and the Limits of Justice”