A Deep Learning Overview with Python

This course offers a quick introduction to deep learning and two of its major network types: convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The goal is to give an intuitive sense of how to implement deep learning approaches for various tasks. To use this IPython notebook, run the Python code of each cell (or save each cell's code in a separate file and run it). The text below each cell is the output of running that cell.

Simple perceptron

In [1]:
import numpy as np

# sigmoid function; with deriv=True, x must already be a sigmoid
# output s, and the derivative s*(1-s) is returned
def sigmoid(x,deriv=False):
    if(deriv==True):
        return x*(1-x)
    return 1/(1+np.exp(-x))
    
# input dataset
X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])
    
# output dataset            
y = np.array([[0,0,1,1]]).T

# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)

# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1

for j in range(100000):

    # forward propagation
    l0 = X
    l1 = sigmoid(np.dot(l0,syn0))

    # how much did we miss?
    l1_error = y - l1
    if (j% 10000) == 0:
        print("Error:" + str(np.mean(np.abs(l1_error))))

    # multiply how much we missed by the 
    # slope of the sigmoid at the values in l1
    l1_delta = l1_error * sigmoid(l1,True)

    # update weights
    syn0 += np.dot(l0.T,l1_delta)

print()
print("Prediction after Training:")
print(l1)
Error:0.517208275438
Error:0.00795484506673
Error:0.0055978239634
Error:0.00456086918013
Error:0.00394482243339
Error:0.00352530883742
Error:0.00321610234673
Error:0.00297605968522
Error:0.00278274003022
Error:0.0026227273927

Prediction after Training:
[[ 0.00301758]
 [ 0.00246109]
 [ 0.99799161]
 [ 0.99753723]]

What is the loss function here? How is it calculated?
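One way to answer this: the printed "Error" is only the mean absolute error, used to monitor progress. The weight update itself is a gradient-descent step (with an implicit learning rate of 1) on half the sum of squared errors, L = 1/2 * sum((y - l1)^2). The sketch below checks this numerically by comparing the cell's update with a finite-difference gradient of that loss; the helper names `half_sse` and `num_grad` are our own.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Same toy data as the cell above
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0, 0, 1, 1]], dtype=float).T

def half_sse(w):
    """Half the sum of squared errors of a single sigmoid layer."""
    l1 = sigmoid(X @ w)
    return 0.5 * np.sum((y - l1) ** 2)

np.random.seed(1)
w = 2 * np.random.random((3, 1)) - 1

# The update used in the cell: X^T ((y - l1) * l1 * (1 - l1))
l1 = sigmoid(X @ w)
update = X.T @ ((y - l1) * l1 * (1 - l1))

# Central finite differences of the loss with respect to each weight
eps = 1e-6
num_grad = np.zeros_like(w)
for i in range(w.shape[0]):
    step = np.zeros_like(w)
    step[i] = eps
    num_grad[i] = (half_sse(w + step) - half_sse(w - step)) / (2 * eps)

# The update is the negative gradient of the squared-error loss
print(np.allclose(update, -num_grad, atol=1e-6))  # True
```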

Any idea how it would perform on non-linearly separable data? How could we test it?
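A quick way to test it is to keep the single-layer network unchanged and swap in XOR targets, which are not linearly separable. The sketch below reuses the cell's update rule with fewer iterations. Because no single line (plane) separates the two classes, at least one of the four rounded predictions must stay wrong, whatever weights training ends up with.

```python
import numpy as np

def sigmoid(x, deriv=False):
    # With deriv=True, x is assumed to already be a sigmoid output
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
# XOR of the first two columns: not linearly separable
y = np.array([[0, 1, 1, 0]]).T

np.random.seed(1)
syn0 = 2 * np.random.random((3, 1)) - 1

for j in range(10000):
    l1 = sigmoid(np.dot(X, syn0))
    l1_delta = (y - l1) * sigmoid(l1, deriv=True)
    syn0 += np.dot(X.T, l1_delta)

print("Mean absolute error:", np.mean(np.abs(y - l1)))
print("Rounded predictions:", np.round(l1).ravel(), "targets:", y.ravel())
```

The multilayer perceptron in the next cell is trained on exactly these XOR targets.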

Multilayer perceptron

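Let’s use the fact that the sigmoid is differentiable (while the step function we saw in the slides is not). Because the error gradient can flow through a differentiable activation, we can backpropagate it through hidden layers, which lets us add more layers (hence more modelling power).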
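To see why differentiability matters, compare finite-difference slopes of the two activations. The step function's slope is zero everywhere except at 0 (where it is undefined), so a gradient-based update like the ones in these cells would never move the weights; the sigmoid's slope s(1 - s) is positive everywhere. A small sketch (the helper `numeric_slope` is our own):

```python
import numpy as np

def step(x):
    return (x > 0).astype(float)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def numeric_slope(f, x, eps=1e-4):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

xs = np.array([-3.0, -1.0, -0.5, 0.5, 1.0, 3.0])
step_slopes = numeric_slope(step, xs)
sigmoid_slopes = numeric_slope(sigmoid, xs)

# Analytical sigmoid derivative, written in terms of the output s = sigmoid(x)
s = sigmoid(xs)
analytic = s * (1 - s)

print("step slopes:   ", step_slopes)      # all zeros
print("sigmoid slopes:", sigmoid_slopes)   # all positive
print(np.allclose(sigmoid_slopes, analytic, atol=1e-8))  # True
```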

In [2]:
import numpy as np

def sigmoid(x,deriv=False):
    if(deriv==True):
        return x*(1-x)

    return 1/(1+np.exp(-x))

X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])

# XOR of the first two columns: not linearly separable
y = np.array([[0],
              [1],
              [1],
              [0]])

np.random.seed(1)

# randomly initialize our weights with mean 0
syn0 = 2*np.random.random((3,4)) - 1
syn1 = 2*np.random.random((4,1)) - 1

for j in range(100000):

    # Feed forward through layers 0, 1, and 2
    l0 = X
    l1 = sigmoid(np.dot(l0,syn0))
    l2 = sigmoid(np.dot(l1,syn1))

    # how much did we miss the target value?
    l2_error = y - l2
    
    if (j% 10000) == 0:
        print("Error:" + str(np.mean(np.abs(l2_error))))
        
    # in what direction is the target value?
    # were we really sure? if so, don't change too much.
    l2_delta = l2_error*sigmoid(l2,deriv=True)

    # how much did each l1 value contribute to the l2 error (according to the weights)?
    l1_error = l2_delta.dot(syn1.T)
    
    # in what direction is the target l1?
    # were we really sure? if so, don't change too much.
    l1_delta = l1_error * sigmoid(l1,deriv=True)

    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)
    
print()
print(l2)
Error:0.496410031903
Error:0.00858452565325
Error:0.00578945986251
Error:0.00462917677677
Error:0.00395876528027
Error:0.00351012256786
Error:0.00318350238587
Error:0.00293230634228
Error:0.00273150641821
Error:0.00256631724004

[[ 0.00199094]
 [ 0.99751458]
 [ 0.99771098]
 [ 0.00294418]]

Setting up the environment

We have built toy feedforward networks by hand. Things quickly become complicated, so let’s go further by relying on high-level frameworks: TensorFlow and Keras. They hide most of the technicalities so that you can play with networks directly.

In [ ]:
!conda install tensorflow keras
In [3]:
import tensorflow as tf
import keras
Using TensorFlow backend.
In [4]:
hello = tf.constant('Hello, TensorFlow!')
# tf.Session is the TensorFlow 1.x API; with TensorFlow 2.x, use print(hello.numpy())
sess = tf.Session()
print(sess.run(hello))
b'Hello, TensorFlow!'

CNNs

We are going to use the MNIST dataset for our first task. The code below loads the dataset and shows one training example and its label.

In [5]:
from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
from pylab import *

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print("The first training instance is labeled as: "+str(y_train[0]))
The first training instance is labeled as: 5
In [6]:
figure(1)
imshow(x_train[0], interpolation='nearest')
Out[6]:
<matplotlib.image.AxesImage at 0x1259b2320>

Now study the following code. What network do we use? How many layers does it have? What are its hyperparameters?
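To answer these questions, it helps to trace shapes through the network built in the next cell. The sketch below does the bookkeeping by hand in plain Python, assuming Keras defaults for Conv2D and MaxPooling2D ('valid' padding, stride 1 for the convolutions, stride equal to the pool size for pooling) and channels-last images; the helper functions are our own, and `model.summary()` in Keras reports the same information.

```python
def conv2d(shape, filters, kernel):
    """Output shape and parameter count of a 'valid', stride-1 Conv2D layer."""
    h, w, c = shape
    out = (h - kernel + 1, w - kernel + 1, filters)
    params = filters * (kernel * kernel * c + 1)  # weights + one bias per filter
    return out, params

def max_pool(shape, pool):
    h, w, c = shape
    return (h // pool, w // pool, c), 0  # pooling has no trainable weights

def dense(units_in, units_out):
    return units_out, units_in * units_out + units_out

shape = (28, 28, 1)
report = []

shape, params = conv2d(shape, filters=32, kernel=3)
report.append(("Conv2D 32", shape, params))
shape, params = conv2d(shape, filters=64, kernel=3)
report.append(("Conv2D 64", shape, params))
shape, params = max_pool(shape, pool=2)
report.append(("MaxPooling2D", shape, params))
flat = shape[0] * shape[1] * shape[2]
report.append(("Flatten", (flat,), 0))
units, params = dense(flat, 128)
report.append(("Dense 128", (units,), params))
units, params = dense(units, 10)
report.append(("Dense 10", (units,), params))

for name, out_shape, n in report:
    print(f"{name:<14} output {str(out_shape):<14} params {n}")
total = sum(n for _, _, n in report)
print("Total parameters:", total)  # 1199882
```

Most of the parameters sit in the first Dense layer, which connects all 9,216 flattened features to 128 units.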

In [7]:
# Set up some hyperparameters
batch_size = 128
num_classes = 10
epochs = 15

# input image dimensions
img_rows, img_cols = 28, 28

# Depending on its configuration, Keras expects images as channels-first
# (1, 28, 28) or channels-last (28, 28, 1); reshape accordingly
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# Convert to floats (we will use real numbers) and keep only the first 1000
# training and 200 test images so that training stays fast
x_train = x_train.astype('float32')[:1000]
x_test = x_test.astype('float32')[:200]
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)[:1000]
y_test = keras.utils.to_categorical(y_test, num_classes)[:200]


# Build network
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

# Train
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

# Evaluate on test data
score = model.evaluate(x_test, y_test, verbose=0)
print()
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# Evaluate on training data
score = model.evaluate(x_train, y_train, verbose=0)
print()
print('Train loss:', score[0])
print('Train accuracy:', score[1])
x_train shape: (1000, 28, 28, 1)
1000 train samples
200 test samples
Train on 1000 samples, validate on 200 samples
Epoch 1/15
1000/1000 [==============================] - 4s 4ms/step - loss: 1.7244 - acc: 0.5660 - val_loss: 0.9116 - val_acc: 0.7900
Epoch 2/15
1000/1000 [==============================] - 4s 4ms/step - loss: 0.5967 - acc: 0.8320 - val_loss: 0.5148 - val_acc: 0.8100
Epoch 3/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.4394 - acc: 0.8670 - val_loss: 0.3056 - val_acc: 0.8600
Epoch 4/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.3296 - acc: 0.9050 - val_loss: 0.3263 - val_acc: 0.9000
Epoch 5/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.2205 - acc: 0.9360 - val_loss: 0.2092 - val_acc: 0.9200
Epoch 6/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.1684 - acc: 0.9560 - val_loss: 0.1870 - val_acc: 0.9450
Epoch 7/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.1325 - acc: 0.9690 - val_loss: 0.1597 - val_acc: 0.9350
Epoch 8/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.0990 - acc: 0.9740 - val_loss: 0.1617 - val_acc: 0.9400
Epoch 9/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.0636 - acc: 0.9840 - val_loss: 0.1434 - val_acc: 0.9450
Epoch 10/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.0393 - acc: 0.9960 - val_loss: 0.1545 - val_acc: 0.9400
Epoch 11/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.0267 - acc: 0.9950 - val_loss: 0.1444 - val_acc: 0.9400
Epoch 12/15
1000/1000 [==============================] - 4s 4ms/step - loss: 0.0158 - acc: 1.0000 - val_loss: 0.1642 - val_acc: 0.9350
Epoch 13/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.0090 - acc: 1.0000 - val_loss: 0.1475 - val_acc: 0.9450
Epoch 14/15
1000/1000 [==============================] - 4s 4ms/step - loss: 0.0057 - acc: 1.0000 - val_loss: 0.1556 - val_acc: 0.9350
Epoch 15/15
1000/1000 [==============================] - 4s 4ms/step - loss: 0.0041 - acc: 1.0000 - val_loss: 0.1651 - val_acc: 0.9350

Test loss: 0.165074422359
Test accuracy: 0.935

Train loss: 0.00311407446489
Train accuracy: 1.0

Is there anything wrong here?

How do you think a linear classifier performs?

In [8]:
# Set up some hyperparameters
batch_size = 128
num_classes = 10
epochs = 15

# x_train, x_test, y_train, y_test and input_shape were already reshaped,
# scaled to [0, 1], subsampled and one-hot encoded in the previous cell.
# Repeating that preprocessing here would one-hot encode the labels twice
# (shape (1000, 10, 10)), and model.fit would raise a ValueError.
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')


# Build the same network, now with dropout layers for regularization
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

# Train
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

# Evaluate on test data
score = model.evaluate(x_test, y_test, verbose=0)
print()
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# Evaluate on training data
score = model.evaluate(x_train, y_train, verbose=0)
print()
print('Train loss:', score[0])
print('Train accuracy:', score[1])
x_train shape: (1000, 28, 28, 1)
1000 train samples
200 test samples

Let’s use this model to predict a value for the first training instance we visualized.

In [ ]:
print(model.predict(np.expand_dims(x_train[0], axis=0)))

Is the model correct here? What is the output of the network?
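The network's last layer is a softmax over the 10 digit classes, so `model.predict` returns one row of 10 probabilities per image, and the predicted digit is the index of the largest one. The sketch below illustrates this with made-up logits (hypothetical numbers, not output of the model above):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

# Hypothetical pre-softmax scores for one image (classes 0..9)
logits = np.array([[0.1, -1.2, 0.3, 2.0, -0.5, 4.5, 0.0, -2.0, 1.1, 0.2]])
probs = softmax(logits)

print(probs.round(3))
print("sum of probabilities:", probs.sum())
print("predicted digit:", np.argmax(probs, axis=1)[0])  # 5
```

With the trained model, the equivalent check would be `np.argmax(model.predict(np.expand_dims(x_train[0], axis=0)), axis=1)`, compared with `np.argmax(y_train[0])`, the one-hot label of the first training image (a 5).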

RNNs

We will now switch to RNNs. These require more resources, so we can’t do the fanciest applications during the workshop. We will do some sentiment classification of movie reviews.

In [9]:
from __future__ import print_function
import numpy as np
import keras
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional
from keras.datasets import imdb

# Number of considered words, based on frequencies
max_features = 20000
# keep only the last maxlen words of each review (pad_sequences truncates from the start by default)
maxlen = 100
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=max_features, index_from=3)

# This is just for pretty printing the sentences...
word_to_id = keras.datasets.imdb.get_word_index()
word_to_id = {k:(v+3) for k,v in word_to_id.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2
id_to_word = {value:key for key,value in word_to_id.items()}

print("Here's the input for the first training instance:")
print(' '.join(id_to_word[i] for i in x_train[0]))
Loading data...
Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
17465344/17464789 [==============================] - 2s 0us/step
Downloading data from https://s3.amazonaws.com/text-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 0s 0us/step
Here's the input for the first training instance:
<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out of the praising list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all

What do you think about this text? Is it a positive or negative review?

In [10]:
print("Here are the dataset shapes")
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print("And the input for the first instance is represented as:")
print(x_train[0])
Here are the dataset shapes
25000 train sequences
25000 test sequences
And the input for the first instance is represented as:
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]

What do these numbers represent? Is there any limitation you can imagine coming from this?
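Each number is the index of a word in a vocabulary ranked by frequency, shifted so that 0 means padding, 1 marks the start of a review and 2 stands for any word outside the `max_features`-word vocabulary; "the" is 4 here. One limitation follows directly: rare words, including many names, all collapse into the same unknown token, and after padding only the last `maxlen` = 100 words of each review are kept. The sketch below mimics these conventions on a toy corpus. It is a simplified re-implementation for illustration, not Keras's own code; `pad` reproduces the `pad_sequences` defaults (`padding='pre'`, `truncating='pre'`).

```python
from collections import Counter
import numpy as np

PAD, START, OOV, INDEX_FROM = 0, 1, 2, 3

def build_index(texts):
    """Rank words by frequency: the most frequent word gets rank 1."""
    counts = Counter(word for text in texts for word in text.split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def encode(text, index, num_words):
    """Map words to ids offset by INDEX_FROM; ids >= num_words become OOV."""
    ids = [START]
    for word in text.split():
        i = index[word] + INDEX_FROM
        ids.append(i if i < num_words else OOV)
    return ids

def pad(seqs, maxlen):
    """Pre-pad with zeros and keep the last maxlen ids of each sequence."""
    out = np.full((len(seqs), maxlen), PAD, dtype=np.int32)
    for row, seq in enumerate(seqs):
        tail = seq[-maxlen:]
        out[row, maxlen - len(tail):] = tail
    return out

texts = ["the film was great",
         "the plot was thin but the cast was great",
         "awful film"]
index = build_index(texts)
encoded = [encode(t, index, num_words=8) for t in texts]
print(encoded)
print(pad(encoded, maxlen=6))
```

In the toy output, "plot", "thin", "but", "cast" and "awful" all become 2, and the second review loses its first four tokens to truncation.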

In [11]:
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)[:5000]
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)[:5000]
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
y_train = np.array(y_train)[:5000]
y_test = np.array(y_test)[:5000]

model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(Bidirectional(LSTM(64)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=4,
          validation_data=(x_test, y_test))
Pad sequences (samples x time)
x_train shape: (5000, 100)
x_test shape: (5000, 100)
Train...
Train on 5000 samples, validate on 5000 samples
Epoch 1/4
5000/5000 [==============================] - 54s 11ms/step - loss: 0.6032 - acc: 0.6570 - val_loss: 0.4283 - val_acc: 0.8056
Epoch 2/4
5000/5000 [==============================] - 54s 11ms/step - loss: 0.2761 - acc: 0.8918 - val_loss: 0.4403 - val_acc: 0.7948
Epoch 3/4
5000/5000 [==============================] - 61s 12ms/step - loss: 0.1101 - acc: 0.9670 - val_loss: 0.6366 - val_acc: 0.8026
Epoch 4/4
5000/5000 [==============================] - 56s 11ms/step - loss: 0.0478 - acc: 0.9868 - val_loss: 0.6637 - val_acc: 0.7954
Out[11]:
<keras.callbacks.History at 0x1392d76d8>
In [12]:
print("The neural net predicts that the first instance sentiment is:")
print(model.predict(np.expand_dims(x_train[0], axis=0)))
The neural net predicts that the first instance sentiment is:
[[ 0.99445081]]

Remarks? Comments?

How do the training scores compare to the test scores? How can we improve this? What are the current limitations?
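The log above shows training accuracy climbing to 0.9868 while validation loss is lowest after the first epoch (0.4283) and then rises: the model overfits the 5,000 training reviews it was given (out of 25,000 available). Besides more data and stronger regularization, a common remedy is early stopping, which Keras provides as the `keras.callbacks.EarlyStopping` callback (e.g. with `monitor='val_loss'` and a `patience` value). The plain-Python sketch below only illustrates that logic on the validation losses recorded above; it is not Keras's implementation.

```python
def early_stopping(val_losses, patience=1):
    """Return (best_epoch, stopped_epoch), both 1-based.

    Training stops once val_loss has failed to improve for `patience`
    consecutive epochs; the weights from best_epoch are the ones worth keeping.
    """
    best_epoch, best_loss, wait = 0, float("inf"), 0
    stopped_epoch = len(val_losses)
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss, wait = epoch, loss, 0
        else:
            wait += 1
            if wait >= patience:
                stopped_epoch = epoch
                break
    return best_epoch, stopped_epoch

# Validation losses from the four epochs logged above
val_losses = [0.4283, 0.4403, 0.6366, 0.6637]
print(early_stopping(val_losses, patience=1))  # (1, 2)
print(early_stopping(val_losses, patience=2))  # (1, 3)
```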

This RNN use case takes more time to train, but it is definitely more impressive. We will model language by training on a text by Nietzsche. The model works at the character level: for each sequence of 40 characters in the text, the objective is to predict the character that follows. This can be done on any text, and we don’t need annotated data: the text itself is enough.

Have a look at the following piece of code and try to understand what it does. Then run it and watch the network generate text! At first the output is not meaningful, but it becomes more so over time. This is the magic I was referring to.

Beware: this takes a long time to run on a CPU. A GPU is recommended, but you can still run it for a while to watch the predictions evolve. On my laptop, an epoch takes about 6 minutes, so the full 60-epoch training takes about 6 hours. About 20 epochs are required for the generated text to be somewhat meaningful.

Note, however, that although this seems long, training real deep learning models for concrete tasks can take days, even on multiple GPUs, mostly because of the larger datasets and much deeper networks.

In [ ]:
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop
from keras.utils.data_utils import get_file
import numpy as np
import random
import sys
import io

# We load a text from Nietzsche
path = get_file('nietzsche.txt', origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
with io.open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))

# We create dictionaries mapping character -> index and index -> character
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
# One-hot encoding; the np.bool alias was removed in NumPy 1.24, so use bool
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1


# build the model: a single LSTM
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)


def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


def on_epoch_end(epoch, logs):
    # Function invoked at end of each epoch. Prints generated text.
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=60,
          callbacks=[print_callback])
Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
606208/600901 [==============================] - 0s 0us/step
corpus length: 600893
total chars: 57
nb sequences: 200285
Vectorization...
Build model...
Epoch 1/60
200285/200285 [==============================] - 281s 1ms/step - loss: 1.9553

----- Generating text after Epoch: 0
----- diversity: 0.2
----- Generating with seed: "to
agree with many people. "good" is no "
to
agree with many people. "good" is no and it is the the of the same the of the sention of the strenge of the most the self-our of the inderent that the sensive indeed the one of the constitute of the most of the semple of the desire of the sensive of the most of the semple of the sempathy of the one of the into the every to a soul of the some of the persent the free of the semple of the most of the sention of the of the spiritual the 
----- diversity: 0.5
----- Generating with seed: "to
agree with many people. "good" is no "
to
agree with many people. "good" is no may a suptimes and also orage mankind the one of indeed of one streng the possible the sensition and the inderenation of a sul the in a sould be the orting a solitiarity of religions in a man of such and a scient, in every of and the self-to and of a revilued it is the most in the indeed, and it is assual that the ord of the of the distiture in its all the manter of the soul permans the decours of
----- diversity: 1.0
----- Generating with seed: "to
agree with many people. "good" is no "
to
agree with many people. "good" is no causest and hew the fown of every groktulr
destined a the art it noteriness of one it all and
and cothinded of that rendercaterfroe to doe," in the pational the is the onl yutre
allor upitsoon,--one
viburan mused a "master in the that niver if
a pridicle quesiles of
the shoold enss nowxing to
feef ma.t--wute disequerly that then her rewadd finale the eeblive alse rusurefver" a selovery catte he re
----- diversity: 1.2
----- Generating with seed: "to
agree with many people. "good" is no "
to
agree with many people. "good" is no likeurenes, it is novamentstisuser'stone, indos paces. fund, wethel feel the
que let doee new eveny that is that the catel. thotgy is
within ceoks of theregeritades) and itwas brutmes ageteron
clyrelogilabl freephi; its. by an? andaver happ
one of his absuman artificss? itself old a
ooker himsood and bus hray
fined in smuch is sudtirers of rerarder from and
afutty
mest utfered with to "bewnook one
Epoch 2/60
 81664/200285 [===========>..................] - ETA: 2:37 - loss: 1.6395
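The generated samples above differ mainly in the `diversity` value passed to `sample()`, which acts as a softmax temperature: the log-probabilities are divided by it before renormalizing, so values below 1 sharpen the distribution (safer, more repetitive text) and values above 1 flatten it (more surprising, more misspelled text). This sketch isolates that reweighting step from `sample()` and applies it to a made-up three-character distribution:

```python
import numpy as np

def reweight(preds, temperature=1.0):
    """The reweighting step of sample() above, without the random draw."""
    preds = np.asarray(preds, dtype="float64")
    logits = np.log(preds) / temperature
    exp_preds = np.exp(logits)
    return exp_preds / np.sum(exp_preds)

preds = np.array([0.6, 0.3, 0.1])  # hypothetical next-character probabilities
for t in [0.2, 0.5, 1.0, 1.2]:
    print(t, reweight(preds, t).round(3))
```

At a temperature of 1.0 the distribution is unchanged, which is why the 1.0 samples reflect what the model has actually learned.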

Web Scraping with Python Made Easy

Imagine you run a business selling shoes online and want to monitor how your competitors price their products. You could spend hours a day clicking through page after page, or you could write a script for a web bot, an automated piece of software that keeps track of a site’s updates. That’s where web scraping comes in.

Scraping websites lets you extract information from hundreds or thousands of webpages at once. You can search websites like Indeed for job opportunities or Twitter for tweets. In this gentle introduction to web scraping, we’ll go over the basic code to scrape websites such that anyone, regardless of background, can extract and analyze these kinds of results.

Getting Started

Using my GitHub repository on web scraping, you can install the software and run the scripts as instructed. Click on the src directory on the repository page to see the README.md file that explains each script and how to run them.

Examining the Site

You can use a sitemap file to locate where websites upload content without crawling every single web page. Here’s a sample one. You can also find out how large a site is and how much information you can actually extract from it. You can search a site using Google’s Advanced Search to figure out how many pages you may need to scrape. This will come in handy when creating a web scraper that may need to pause for updates or behave differently after reaching a certain number of pages.
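Sitemaps follow the XML format of the sitemaps.org protocol, with one `<loc>` element per URL, so extracting the page list takes only a few lines. A minimal sketch with Python's standard library, parsing an inline sample sitemap with made-up example.com URLs rather than a real download:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return the <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/shoes/running</loc></url>
  <url><loc>https://example.com/shoes/hiking</loc></url>
</urlset>"""

print(sitemap_urls(sample))
```

For a live site you would download the XML first, for example with `urllib.request.urlopen`, and check robots.txt before crawling the listed pages.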

You can also run the identify.py script in the src directory to find out more about how each site was built. It should give information about the frameworks, programming languages, and servers used to build each website, as well as the domain’s registered owner. It also uses robotparser to check for crawling restrictions.

Many websites have a robots.txt file with crawling restrictions. Check this file before crawling a website to learn which pages you may crawl and any other rules you should follow. The sample protocol can be found here.
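In Python 3, the robotparser module lives at `urllib.robotparser`. The sketch below parses an inline, made-up robots.txt instead of fetching one (on a live site you would call `set_url()` and `read()` on the parser); the bot names and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt; on a real site use rp.set_url(".../robots.txt") and rp.read()
robots_txt = """\
User-agent: BadCrawler
Disallow: /

User-agent: *
Crawl-delay: 5
Disallow: /checkout/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

for agent in ["BadCrawler", "ShoePriceBot"]:
    for url in ["https://example.com/shoes/", "https://example.com/checkout/cart"]:
        print(agent, url, rp.can_fetch(agent, url))

# Seconds the site asks crawlers to wait between requests
print("Crawl delay for ShoePriceBot:", rp.crawl_delay("ShoePriceBot"))
```

BadCrawler is blocked everywhere, while any other agent may fetch the shoe pages but not the checkout pages, and should wait 5 seconds between requests.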

Crawling a Site

There are three general approaches to crawling a site: crawling a sitemap, iterating through an ID for each webpage, and following webpage links. download.py shows how to download webpages by crawling a sitemap, results.py shows how to scrape results while iterating through webpage IDs, and indeedScrape.py crawls by following webpage links. download.py also shows how to insert delays between requests, return a list of links from HTML, and support proxies, which let you access websites that block your requests.
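Link-following crawlers need to pull the links out of each downloaded page and resolve relative ones. download.py's own helper isn't reproduced here; as an illustration, this sketch uses the standard library's `html.parser` and `urllib.parse.urljoin`, with made-up HTML and URLs. The class and function names are our own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Make the link absolute and drop any #fragment
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.links:
                    self.links.append(absolute)

def get_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links

page = """
<a href="/shoes/running">Running</a>
<a href="hiking?page=2">Hiking</a>
<a href="https://other.example.org/sale#top">Sale</a>
<a>No link</a>
"""
print(get_links(page, "https://example.com/shoes/"))
```

A crawler would typically keep a queue of these links plus a set of already-visited URLs, and pause between requests to the same domain.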

Scraping the Data

In the file compare.py, you can compare the efficiency of the three web scraping methods.

You can use regular expressions (known as regex or regexp) to perform neat tricks with text for getting information from websites. The script regex.py shows how this is done.
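As a small illustration of the idea (not the contents of regex.py), here is a regular expression that pulls product names and prices out of a made-up HTML fragment. Regexes like this are fast but brittle: a small change in the page's markup, such as an extra attribute, breaks the pattern.

```python
import re

html = """
<div class="product"><h2>Trail Runner</h2><span class="price">$89.99</span></div>
<div class="product"><h2>City Sneaker</h2><span class="price">$64.50</span></div>
"""

# Non-greedy group for the name, digits and dots for the price
pattern = re.compile(r'<h2>(.*?)</h2><span class="price">\$([\d.]+)</span>')

for name, price in pattern.findall(html):
    print(name, float(price))
```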

You can also use the browser extension Firebug Lite to get information from a webpage. In Chrome, you can click View >> Developer >> View Source to get the source behind a webpage.

Beautiful Soup, one of the required packages for running indeedScrape.py, parses a webpage and provides a convenient interface for navigating its content, as shown in bs4test.py. lxml does the same in lxmltest.py. The following table compares these three scraping methods.

Scraping method   Performance   Ease of use   Ease of install
Regex             Fast          Hard          Easy
Beautiful Soup    Slow          Easy          Easy
lxml              Fast          Easy          Hard

The callback.py script lets you scrape data and save it to an output .csv file.
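The idea behind a scrape callback is that the crawler calls a function on every downloaded page, and that function decides what to extract and where to store it. A minimal sketch of a callback that appends rows to a CSV file, assuming a hypothetical `(url, html)` calling convention and a toy regex extractor; callback.py itself may be organized differently.

```python
import csv
import io
import re

class CsvCallback:
    """Hypothetical scrape callback: extract fields from a page, write a CSV row."""

    fields = ("url", "name", "price")

    def __init__(self, stream):
        self.writer = csv.writer(stream)
        self.writer.writerow(self.fields)

    def __call__(self, url, html):
        match = re.search(r"<h2>(.*?)</h2>.*?\$([\d.]+)", html, re.S)
        if match:  # skip pages without a product
            self.writer.writerow([url, match.group(1), match.group(2)])

# An in-memory buffer here; pass open("results.csv", "w", newline="") to write a file
buffer = io.StringIO()
callback = CsvCallback(buffer)
callback("https://example.com/p/1", "<h2>Trail Runner</h2><span>$89.99</span>")
callback("https://example.com/about", "<p>About us</p>")
print(buffer.getvalue())
```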

Caching Downloads

Caching crawled webpages lets you store them in a manageable format while downloading each one only once. In download.py, there’s a Python class Downloader that shows how to cache URLs after downloading their webpages. cache.py has a Python class that maps a URL to a filename when caching.
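The caching idea can be sketched as a downloader that consults a cache before fetching. In this sketch the cache is a plain dict and the fetch function is injected, so nothing is actually downloaded; the class and its parameters are illustrative, not the Downloader class from download.py.

```python
class CachingDownloader:
    """Download each URL at most once, keeping results in a cache mapping."""

    def __init__(self, fetch, cache=None):
        self.fetch = fetch  # function: url -> html
        self.cache = {} if cache is None else cache
        self.num_downloads = 0

    def __call__(self, url):
        if url in self.cache:
            return self.cache[url]  # served from cache, no network access
        html = self.fetch(url)
        self.num_downloads += 1
        self.cache[url] = html
        return html

def fake_fetch(url):
    # Stand-in for a real HTTP request, e.g. urllib.request.urlopen(url).read()
    return "<html>" + url + "</html>"

download = CachingDownloader(fake_fetch)
for url in ["https://example.com/a", "https://example.com/b", "https://example.com/a"]:
    download(url)
print(download.num_downloads)  # 2
```

Because the cache is only used through `in`, lookup and assignment, the dict could be replaced by a disk-backed object that maps each URL to a file.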

Depending on which operating system and file system you’re using, there are limits on which characters a cache filename can contain and how long it can be.

Operating system   File system   Invalid filename characters   Max filename length
Linux              Ext3/Ext4     /, \0                         255 bytes
OS X               HFS Plus      :, \0                         255 UTF-16 code units
Windows            NTFS          \, /, ?, :, *, ", >, <, |     255 characters

Though cache.py is easy to use, you can instead use a hash of the URL as the filename, which avoids invalid characters and length limits while still mapping each file directly to its URL. Using MongoDB, you can build on top of the current file-system cache and avoid the file system limitations altogether. This method is found in mongocache.py, which uses pymongo, the Python driver for MongoDB.
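Both approaches can be sketched with the standard library. The first function turns a URL into a flat, readable filename by replacing characters that are invalid on any of the file systems in the table and trimming the result to 255 characters. This is a simplification: ext4's limit is in bytes, so non-ASCII names could still be too long, and truncation can make two URLs collide. The second uses a SHA-256 hash of the URL, which always gives a fixed-length, filesystem-safe name. Both function names are our own.

```python
import hashlib
import re
from urllib.parse import urlsplit

MAX_LENGTH = 255

def url_to_path(url):
    """Readable filename: invalid characters replaced, length capped."""
    parts = urlsplit(url)
    path = parts.path
    if not path or path.endswith("/"):
        path += "index.html"
    name = parts.netloc + path
    if parts.query:
        name += "?" + parts.query
    # Replace characters that are invalid on NTFS, HFS Plus or ext4
    name = re.sub(r'[\\/:*?"<>|\x00]', "_", name)
    return name[:MAX_LENGTH]

def url_to_hashed_path(url):
    """Fixed-length filename derived from a SHA-256 digest of the URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"

url = "https://example.com/shoes/?color=red&size=10"
print(url_to_path(url))
print(url_to_hashed_path(url))
```

The hashed name is unreadable but never collides in practice, which is why it pairs well with a lookup store such as MongoDB.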

Test out the other scripts, such as alexacb.py, which downloads information on the top sites by Alexa ranking. mongoqueue.py provides a MongoDB-backed queue that can be imported into other scripts.

You can work with dynamic webpages using the code from browserrender.py. Most leading websites use JavaScript for functionality, meaning you can’t view all of their content in barebones HTML.

A Comparison of Copper in the U.S.

As one of humanity’s oldest metals, copper comes in many forms, and people have used it for thousands of years. The ancient Romans called the metal mined on Cyprus “cyprium,” the source of the Latin “cuprum” and, eventually, the English word “copper.”

Copper is produced and consumed in many forms, from the windings of electric motors to the coating of pennies. Thanks to its high thermal and electrical conductivity, it is frequently used in telecommunication technologies and as a building material.

Copper production includes mining, smelting, refining, and electrowinning. Through smelting and electrolytic refining, engineers and scientists transform mined ores into copper cathodes, thin sheets of pure copper used as raw material for high-quality copper products.

Publicly available data from the U.S. Geological Survey show how the copper market has changed with society’s needs over recent decades.

The four major types of copper are mined copper, secondary copper, refined copper and refined electrowon copper. Secondary copper comes from recycled and scrap materials such as tubes, sheets, cables, radiators and castings, as well as from residues like dust or slag. 

Smelting and electrolytic refining turn both mined copper and concentrates of low-grade ores into copper cathodes. Acid leaching of oxidized ores, followed by electrowinning, produces additional copper.

Copper’s chemical and physical properties, especially its electrical and thermal conductivity, make it useful in many applications. Its high ductility and malleability give it key roles in industrial coil winding, power generation and transmission, and telecommunication technologies.

The different methods of processing copper remained largely constant between 1990 and 2010. The data come from “U.S. Mineral Dependence—Statistical Compilation of U.S. and World Mineral Production, Consumption, and Trade, 1990–2010” by James J. Barry, Grecia R. Matos and W. David Menzie. The rise in refined copper reflects growing market demand for it, according to a report in Mining.com. Oxide and sulfide ores generally contain between 0.5% and 2.0% copper, so processing begins by concentrating the ore to remove gangue (the worthless surrounding rock) and other materials.


Differences between reported and apparent processed copper consumption in the U.S. have decreased from 2005 to 2009. Copper consumption itself has dropped.

The various types of copper produced by the U.S. have remained constant over the time period. 

Mined copper has remained the dominant type of copper produced around the world, though refined copper came close to or equaled it from 1996 to 2001. Refined electrowon copper also steadily surpassed secondary copper over the period.

The epistemology and metaphysics of causality

The epistemology of causality

There are two epistemic approaches to causal theory. Under a hypothetico-deductive account, we hypothesize causal relationships and deduce predictions from them. We then test these hypotheses by comparing their predictions with empirical observations and other knowledge about what actually happens. Alternatively, we may take an inductive approach: from a large number of appropriate, justified observations (such as a data set), we induce causal relationships directly.

Hypothetico-Deductive discovery

The testing phase of this account of discovery and causality uses the views on the nature of causality to determine whether we support or refute the hypothesis. We search for physical processes underlying the causal relationships of the hypothesis. We can use statistics and probability to determine which consequences of hypotheses are verified, like comparing our data to a distribution such as a Gaussian or Dirichlet one. We can further probe these consequences on a probabilistic level and show that changing hypothesized causes can predict, determine, or guarantee effects.

Philosopher Karl Popper advocated this approach, in which causal explanations of events rest on natural laws, universal statements about the world. Combined with initial conditions, which are single-case statements, these laws let us deduce outcomes and predict events. From the initial conditions, we can determine effects such as whether a physical system will approach thermodynamic equilibrium or how a population might evolve under the influence of predators or external forces. Popper described hypothesizing laws, deducing their consequences, and rejecting laws that aren’t supported as a cyclical process. This is the covering-law account of causal explanation.

Inductive learning

Philosopher Francis Bacon promoted the inductive account of scientific learning and reasoning. From a very large number of observations of some phenomenon, backed by experimental, empirical evidence where appropriate, we can compile a table of positive instances (in which the phenomenon occurs), negative instances (in which it doesn’t), and partial instances (in which it occurs to a certain degree). These tables capture several dimensions of a phenomenon that characterize its causal relationships from both a priori and a posteriori perspectives.

Inductivist artificial intelligence (AI) approaches share the feature that causal relationships can be determined from statistical relationships. They assume the Causal Markov Condition holds of physical causality and physical probability: each variable is probabilistically independent of its non-effects, conditional on its direct causes. This condition largely determines the features of the model and the events or phenomena it predicts, so a causal net must take it as an assumption or premise. For structural equation models (SEM), the Causal Markov Condition follows from representing each variable as a function of its direct causes and an associated error variable, provided the error variables are probabilistically independent of one another. We then find the class of causal models, or a single best causal model, whose probabilistic independences implied by the Causal Markov Condition are consistent with the independences we can infer from the data. We might also make further assumptions of minimality (no submodel of the causal model also satisfies the Causal Markov Condition), faithfulness (all independences in the data are implied via the Causal Markov Condition), and linearity (all variables are linear functions of their direct causes and uncorrelated error variables). We may also require causal sufficiency (all common causes of measured variables are measured) and context generality (every individual or node in the model shares the causal relations of the population). These features let us describe models and methods of scientific reasoning as causal in nature, and from there we may apply appropriate causal models such as Bayesian, frequentist, or similar methods of prediction. We may even draw causal diagrams of model elements under various conditions, such as those given by independences or constraints on variables.
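A small simulation shows what the Causal Markov Condition predicts for a linear SEM. The sketch assumes a chain X -> Y -> Z with Gaussian errors and arbitrary coefficients (0.8 and 0.5). Under those linear-Gaussian assumptions, a near-zero partial correlation is a reasonable check of conditional independence; for nonlinear models it would not be enough.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Structural equations for the chain X -> Y -> Z: each variable is a linear
# function of its direct cause plus its own independent Gaussian error.
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
z = 0.5 * y + rng.normal(size=n)

def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing each on `given`."""
    design = np.column_stack([np.ones_like(given), given])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]

marginal = np.corrcoef(x, z)[0, 1]   # X and Z are dependent (theory: 0.4 / sqrt(1.41))
partial = partial_corr(x, z, y)      # near 0: Z is independent of X given its cause Y
print(marginal, partial)
```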

Because the variables in the graph or model are intercorrelated, we can’t change the value of one variable without affecting how it relates to other variables, though under some conditions we can construct models with autonomous nodes or variables. How these features and claims of inductivist AI interact with one another remains debated, as do the assumptions, justification, and methods of reasoning behind these models.

Metaphysics of causality

We can still pose questions about the mathematization of causality, despite the research and methods that have dominated work on probability and its consequences. We can ask what causality is, and how views on its nature relate to the axioms and definitions that have remained stable in the theories of probability and statistics.

We can distinguish three approaches to causality. The first is that causality is only a heuristic and has no role in scientific reasoning and discourse, as philosopher Bertrand Russell argued: science depends on functional relationships, not causal laws. The second is that causality is a fundamental feature of the world, a universal principle, so we should treat it as a scientific primitive. This position grew out of conflict with philosophical analyses that appealed to the asymmetry of time (that it moves in one direction) to explain the asymmetry of causation (that causes precede their effects, and never the reverse), which raises the question of how to interpret time in terms of causality. The third is that we can reduce causal relations to other concepts that don’t involve causal notions. Many philosophers hold this position, which divides into four main variants.

The first division concerns whether causality is a relation between single-case or repeatable variables, depending on the interpretation of causality in question. We interpret causality as mental if it is a feature of an agent’s epistemic state and physical if it is a feature of the external world. We interpret it as subjective if two agents with the same relevant knowledge can disagree about a causal relationship with both positions correct, as though it were a matter of arbitrary choice; otherwise, we interpret it as objective. The subjective-objective distinction raises the questions of how two conflicting positions could both be correct and what role subjectivity plays in them.

The second division is the mechanistic account of causality: physical processes link cause and effect, and causal statements give information about these processes. Philosophers Wesley Salmon and Phil Dowe advocate this position, arguing that causal processes transmit or possess a conserved physical quantity. We might describe the relation between energy and mass (E = mc²) as a causal relation from a start (cause) to a finish (effect). One may object that such relations in science have no specific direction, being symmetric, and so are not causal. The mechanistic account does, however, relate single cases linked by physical processes, even if we can induce causal regularities or laws from these connections, and it does so objectively: if two people disagree on the causal connections, one or both are wrong.

This approach is difficult to apply. The physics of these quantities isn’t determined by the causal relations themselves. While the conservation of these physical quantities may suggest causal links to physicists, it isn’t relevant in fields that emerge from physics, such as chemistry or engineering. This would suggest the epistemology of causal concepts is irrelevant to their metaphysics, in which case knowing a causal relationship would have little to do with the causal connection itself.

The third division is probabilistic causality, which treats causal connections as probabilistic relationships between variables. We can debate which probabilistic relationships among variables determine or create causal relationships. One candidate is the Principle of the Common Cause: if two variables are probabilistically dependent, then either one causes the other or they are effects of common causes that render them independent of one another once those causes are conditioned on. Philosopher Hans Reichenbach applied this principle to give a probabilistic analysis of the direction of time. More recent philosophers use the Causal Markov Condition as a necessary condition for causality, along with other, less central conditions. We normally apply probabilistic causality to repeatable variables so that probability can handle them, but critics argue the Principle of the Common Cause and the Causal Markov Condition have counterexamples showing they don’t hold under all conditions.
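The screening-off clause of the Principle of the Common Cause can be illustrated with simulated data. The sketch below assumes a linear-Gaussian fork (common cause C with arbitrary effects on A and B, which don’t influence each other) and removes C’s linear influence with a straight-line fit; that removal step is our illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# A fork: common cause C drives both A and B; neither affects the other.
c = rng.normal(size=n)
a = 1.0 * c + rng.normal(size=n)
b = -0.7 * c + rng.normal(size=n)

def residual(v, given):
    """What is left of v after removing a straight-line fit on `given`."""
    slope, intercept = np.polyfit(given, v, 1)
    return v - (slope * given + intercept)

marginal = np.corrcoef(a, b)[0, 1]                            # clearly negative
screened = np.corrcoef(residual(a, c), residual(b, c))[0, 1]  # near 0 given C
print(marginal, screened)
```

A and B are correlated without either causing the other; once the common cause is accounted for, the dependence disappears, as the principle requires.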

Finally, the fourth division is the counterfactual account, advocated by philosopher David Lewis. It reduces causal relations to subjunctive conditionals: an effect depends causally on a cause if and only if (1) if the cause were to occur, the effect would occur (or its chance of occurring would rise significantly), and (2) if the cause didn’t occur, the effect wouldn’t occur. Causation is then the transitive closure of causal dependence (a cause either raises the probability of a direct effect or, if it’s preventative, makes the effect less likely, as long as the effect’s other direct causes are held fixed). Causal relationships are what goes on in possible worlds similar to our own. Lewis introduced counterfactual theory to account for causal relationships between single-case events, relationships that are mind-independent and objective. Critics may still press this account by arguing that we have no physical contact with these possible worlds, or that there’s no objective way to determine which worlds are closest to our own and so which ones to analyze in determining causality. The counterfactualist may respond that the relevant worlds are those closest to our own in which the cause-and-effect relationship occurs.

Contextual Emergence

What is contextual emergence?

The patterns that emerge in Conway’s Game of Life depend on its underlying rules.

Contextual emergence is a specific kind of relationship between different domains of scientific description of particular phenomena. Although these domains are not ordered strictly hierarchically, one often speaks of lower and higher levels of description. From the lower levels (L), which are more fundamental in a certain sense, more complex phenomena emerge at higher levels (H). For example, strings of DNA in an individual’s genome may correspond to different transcripts at the level of the transcriptome, and chaotic behavior may emerge from certain differential equations subject to certain constraints. This complexity depends on the conditions of the context, hence contextual emergence.
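Conway’s Game of Life makes the gap between rules and emergent patterns easy to see: the rules fit in a few lines. In this sketch (a set of live cells on an unbounded grid, an implementation choice of ours), a “blinker” oscillates with period 2 and a “glider” reappears one cell down and one cell right every four steps, although neither object is mentioned anywhere in the rules.

```python
from collections import Counter

def life_step(live):
    """Apply Conway's rules once; `live` is a set of (row, col) cells."""
    # For every cell next to a live cell, count its live neighbours.
    counts = Counter((r + dr, c + dc)
                     for r, c in live
                     for dr in (-1, 0, 1)
                     for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))
    # Birth with exactly 3 neighbours; survival with 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

blinker = {(1, 0), (1, 1), (1, 2)}           # horizontal bar
print(sorted(life_step(blinker)))            # [(0, 1), (1, 1), (2, 1)]

glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
g = glider
for _ in range(4):
    g = life_step(g)
print(g == {(r + 1, c + 1) for r, c in glider})  # True
```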

Contextual emergence involves well-defined relationships between different levels of complexity. A two-step procedure gives a systematic, formal way to move between them: first, an individual description (Li) at the lower level yields a statistical description (Ls) at the same level; second, the statistical description yields an individual description at the higher level (Hi). We iterate this process (Li -> Ls -> Hi) through sets of connected descriptions to reveal what emerges at higher levels.

During this process, we identify equivalence classes of individual states that are indistinguishable with respect to a certain property of the entire system. Different statistical states in Ls can be realized by individual states in Li. Each individual state carries limited information, but together they define probability distributions that represent the statistical states Ls. This could be how spike signals from neural circuits encode higher-level functions in the brain.
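A toy version of this construction: take four binary units (think of four neurons that either spike or stay silent, purely as an illustration), treat each on/off configuration as an individual state in Li, and group configurations by the system-level property “number of units on.” The assumption that all individual states are equally likely is ours, made only so the statistical state Ls has a simple form.

```python
from collections import Counter
from itertools import product

n_units = 4
# Individual states (Li): every on/off configuration of the four units.
micro_states = list(product((0, 1), repeat=n_units))

# Property of the whole system: how many units are on. Configurations with
# the same count are indistinguishable with respect to it and form one class.
classes = Counter(sum(s) for s in micro_states)

# Statistical state (Ls): probability of each class, assuming all
# individual states are equally likely.
distribution = {k: v / len(micro_states) for k, v in sorted(classes.items())}
print(distribution)  # {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
```

Sixteen individual states collapse into five classes; the middle class is the most probable simply because the most configurations realize it.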

A property dualist position would also recognize three features of this emergence. The emergent property at the higher level Hi must have real instances, remain co-occurrent with some property or complex feature recognized in the lower level, and this property can’t be reduced to any property postulated by or definable within the lower level.

Then we assign individual states at the higher level H to coextensional statistical states at level L. This is a top-down constraint: it needs information about the higher-level description to choose a context that sets the framework for the set of observable properties at level H created from L. We can implement stability criteria at level L such that the appropriate context emerges at level H. Stability refers to the ability of the system’s features to remain valid even under small changes. Examples include equilibrium states of gas systems and homeostatic relationships between units of biological mechanisms such as glycolysis. We may also define stable systems as those whose boundaries are maintained under the dynamics specified for them. For instance, we may confine ourselves to certain electrochemical properties that emerge from membrane dynamics in synaptic networks. This keeps the emergent properties well-defined within the contextual topology of L and tells us which properties of L are relevant to the contextual emergence of H.

This interplay between upward and downward strategies lets the system remain self-consistent. Moving from a higher context to a lower one requires the stability conditions to lead to lower-level partitions of the system while moving to a higher context means the statistics of lower-level states extend to higher-level individual states we can observe.

Philosopher Aristotle argued that emergent structures arise when their constituents interact in an interdependent manner, but others argue that emergence may occur even if the parts act independently of one another or even autonomously. In either case, to echo Gestalt theory, the whole is greater than the sum of its parts.

Point mechanics to statistical mechanics to thermodynamics

We can even demonstrate relationships between different fields of science through contextual emergence. The path from classical point mechanics, involving forces due to gravitation and electromagnetism, through statistical mechanics to thermodynamics illustrates this. From point mechanics to statistical mechanics, particles or other individual units (Li) form ensembles whose distributions can be studied with statistics. We can describe many-particle systems with statistical ensemble descriptions (Ls), such as momentum or energy distributions like the Maxwell-Boltzmann distribution for N particles. From there, we can find the mean kinetic energy, Gibbs free energy, entropy, and other statistical quantities.

We can observe expectation values of momenta distributions of particle ensembles to calculate temperature of the system as a higher-level function (Hi) on the assumption the system is in equilibrium. The zeroth law of thermodynamics does not come from statistical mechanics, but from thermodynamics. Other features such as irreversibility and adiabatic nature emerge as well. We can characterize this thermal equilibrium (Hi) using Kubo-Martin-Schwinger (KMS) states, defined by the condition that characterizes the structural stability of a KMS state against local perturbations or changes. This leads to stationarity, ergodicity, and mixing using the zeroth law of thermodynamics to define the system as stable. We can also use the second law of thermodynamics to express the stability in maximization of entropy for thermal equilibrium states.
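The Li -> Ls -> Hi chain for temperature can be sketched numerically. The sketch assumes an ideal monatomic gas in equilibrium, samples individual velocities from the Maxwell-Boltzmann distribution (each Cartesian component Gaussian with variance k_B T / m), and recovers the temperature from the relation <E_kin> = (3/2) k_B T. Argon’s atomic mass and T = 300 K are arbitrary choices.

```python
import numpy as np

k_B = 1.380649e-23        # Boltzmann constant, J/K
m = 6.6335e-26            # mass of an argon atom, kg (illustrative choice)
T_true = 300.0            # K
n = 200_000               # number of particles

rng = np.random.default_rng(42)
# Individual states (Li): one velocity vector per particle, drawn from the
# Maxwell-Boltzmann distribution.
v = rng.normal(0.0, np.sqrt(k_B * T_true / m), size=(n, 3))

# Statistical state (Ls): the ensemble mean kinetic energy.
mean_ke = np.mean(0.5 * m * np.sum(v**2, axis=1))

# Higher-level individual state (Hi): temperature, from <E_kin> = (3/2) k_B T.
T_est = 2.0 * mean_ke / (3.0 * k_B)
print(T_est)
```

No single particle has a temperature; it is defined only for the equilibrium ensemble, which is the sense in which temperature emerges at the higher level.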

The first step of the contextual emergence process (Li -> Ls) gives statistical states from individual states, and the second (Ls -> Hi) gives individual thermal states from statistical mechanical states. Other examples include the emergence of geometric optics from electrodynamics, features of electrical engineering from electrodynamics, chirality from quantum mechanics, and diffusion or friction of a quantum particle in a thermal medium. Neuroscientists have even applied contextual emergence to derive cognitive states from their neural correlates.

Hodgkin-Huxley equations

The Hodgkin-Huxley equations, which describe the generation and propagation of action potentials, form a system of four nonlinear ordinary differential equations: an electric conductance equation for transmembrane currents and three master equations for the opening kinetics of sodium and potassium channels. Lower-level stochastic phenomena (Markov processes governing the transition probabilities between channel states) lead to a higher-level description of ion channel function as a deterministic dynamical system. At a still lower level, we can treat ion channels as macromolecular quantum objects governed by the many-particle Schrödinger equation. The Schrödinger equation describes a highly entangled state of electrons and atomic nuclei as a whole; at the molecular level, the Born-Oppenheimer approximation separates the electronic and nuclear wave functions and so yields the structure of an ion channel’s closed or open pore. We can then treat the electronic quantum dynamics within a constrained, rigid nuclear frame that has a classical spatial structure. This stochastic spatial structure gives rise to the Hodgkin-Huxley equations as a contextually emergent phenomenon.
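The four Hodgkin-Huxley equations can be integrated in a few dozen lines. This sketch uses the classic squid-axon parameters in the common modern convention (resting potential near -65 mV) and a simple forward-Euler step; the injected current of 10 µA/cm², the time step, and the 0 mV spike threshold are illustrative choices. The gating variables m, h, and n are fractions of open gates, the deterministic averages of many stochastic channels.

```python
import math

# Classic squid-axon parameters, modern convention (rest near -65 mV).
# Units: mV, ms, uA/cm^2 (current), mS/cm^2 (conductance), uF/cm^2 (capacitance).
C_m = 1.0
g_Na, g_K, g_L = 120.0, 36.0, 0.3
E_Na, E_K, E_L = 50.0, -77.0, -54.387

def vtrap(x, y):
    """x / (1 - exp(-x / y)), using its limit y when x is (almost) 0."""
    if abs(x / y) < 1e-6:
        return y
    return x / (1.0 - math.exp(-x / y))

def rates(V):
    """Opening (alpha) and closing (beta) rates, in 1/ms, for gates m, h, n."""
    a_m = 0.1 * vtrap(V + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(V + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(V + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(V + 35.0) / 10.0))
    a_n = 0.01 * vtrap(V + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(V + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n

def simulate(I_ext=10.0, t_max=100.0, dt=0.01):
    """Forward-Euler integration; returns the membrane potential at each step."""
    V = -65.0
    a_m, b_m, a_h, b_h, a_n, b_n = rates(V)
    # Start every gate at its steady state for the resting potential.
    m, h, n = a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n)
    trace = []
    for _ in range(int(round(t_max / dt))):
        a_m, b_m, a_h, b_h, a_n, b_n = rates(V)
        # Equation 1: membrane current balance.
        I_ion = (g_Na * m**3 * h * (V - E_Na)
                 + g_K * n**4 * (V - E_K)
                 + g_L * (V - E_L))
        dV = (I_ext - I_ion) / C_m
        # Equations 2-4: first-order kinetics of the gating variables.
        dm = a_m * (1.0 - m) - b_m * m
        dh = a_h * (1.0 - h) - b_h * h
        dn = a_n * (1.0 - n) - b_n * n
        V, m, h, n = V + dt * dV, m + dt * dm, h + dt * dh, n + dt * dn
        trace.append(V)
    return trace

trace = simulate()
# Count upward crossings of 0 mV as spikes.
spikes = sum(1 for a, b in zip(trace, trace[1:]) if a < 0.0 <= b)
print(spikes)
```

With no injected current the membrane stays at rest; with 10 µA/cm² it fires repeatedly.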

Mental states emerging from neuroscience

To realize mental states from neural states, we specify the L level as neuron states of neural assemblies in the brain with respect to H, a class of mental states that reflects the situation under study. We may use experimental protocols that include a task for subjects to define mental states while recording brain states. We may use individual neuron properties Li to find Ls such that statistical states have equivalence classes of those individual states. The differences must be irrelevant with respect to the higher level H. Philosopher David Chalmers said a neural correlate of a conscious mental state can be multiply realized by “minimally sufficient neural subsystems correlated with states of consciousness” in “What is a neural correlate of consciousness?”

We can look at phenomenal families, sets of mutually exclusive phenomenal mental states that jointly partition a space of mental states. Creature consciousness can give us refined levels of phenomenal states of background consciousness (awake, dreaming, etc.), wake consciousness (perceptual, cognitive, affective, etc.), perceptual consciousness (visual, auditory, tactile, etc.), and visual consciousness (color, form, location, etc.). Within one of these contexts, we choose a stability criterion at Ls for the complicated neurodynamics to find robust, proper statistical states.

L-dynamics and H-dynamics mesh with one another if coarse graining and time evolution commute. The meshes, parts of the state space differentiated by complexes of cells between the two levels, follow from the higher-level stability criteria. Coarse graining means fine details of the system are smoothed over, increasing the entropy of the description, so that we can make predictions about the system as a whole.
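For Markov chains, “coarse graining and time evolution commute” has a precise counterpart: a partition of the states is (strongly) lumpable when every state in a block has the same total probability of moving into each block. The sketch below uses a made-up four-state chain and a two-block partition and checks the commutation directly. Reading meshing as lumpability is our illustration, not a claim from the source.

```python
import numpy as np

# Fine-grained (L-level) Markov chain on four states; each row sums to 1.
P = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.6, 0.2, 0.0, 0.2],
              [0.1, 0.1, 0.4, 0.4],
              [0.2, 0.0, 0.3, 0.5]])

# Coarse graining into two blocks: {0, 1} -> A and {2, 3} -> B.
U = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]], dtype=float)

def coarse_matrix(P, U):
    """Coarse (H-level) matrix: average each block's rows, then lump the columns."""
    V = (U / U.sum(axis=0)).T          # uniform weights within each block
    return V @ P @ U

def meshes(P, U, p):
    """True if coarse graining and one step of time evolution commute for p."""
    Q = coarse_matrix(P, U)
    return np.allclose((p @ P) @ U, (p @ U) @ Q)

p = np.array([0.7, 0.3, 0.0, 0.0])
print(meshes(P, U, p))       # True: states in a block send equal mass to A and to B

# Break the condition: state 1 now sends only 0.4 of its probability into A.
P_bad = P.copy()
P_bad[1] = [0.2, 0.2, 0.3, 0.3]
print(meshes(P_bad, U, p))   # False
```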

Contextual emergence could help artificial intelligence approach its potential while accounting for the inherent, intrinsic differences between science and philosophy. We may model the mind as a contextually emergent phenomenon of the neurophysiology of the brain. As we learn about the world, we can account for emergent phenomena when addressing issues in science and philosophy, and AI would benefit from these methods of understanding. AI could avoid the issues of reductionism by using higher-level emergent behavior, like that resulting from neural networks in the human brain. Backpropagation in neural networks uses feedback to adjust the weights of individual neurons so as to minimize the gap between a model’s outputs and the reality it represents. Just as a human can differentiate between a drawing of a lion and a photograph of a lion using the emergent features of visual images that together make up a lion, intelligent machines could embrace contextual emergence to view the world with inquisitive wonder and curiosity. Instead of showing a computer hundreds of thousands of images of a lion to teach it to identify a lion, we might let it recognize a lion in another context, such as the lines of a piece of artwork, through the emergent properties of the drawing itself.

Emergence in AI can account for emotional reactions and instincts by evolving using stochastic emergent phenomena the same way human intelligence has evolved. We may address the role emotions and biases play in decision-making and intelligence, as described by psychologists Daniel Kahneman, Amos Tversky, and Gerd Gigerenzer.

We can represent proper cells with basins of attraction and chaotic attractors with coarse-grained generating partitions. These partitions of the system lead to Markov chains with a rigorous theoretical constraint for the proper definition of stable mental states. The mathematical techniques come from ergodic theory and symbolic dynamics.

The emergence of mental states from electroencephalogram (EEG) dynamics shows that EEG data from subjects with sporadic epileptic seizures can correlate with the mental states of the seizures themselves. A 20-channel EEG recording gives a 20-dimensional state space, which we reduce to fewer dimensions by restricting it to the leading principal components. We lay a homogeneous grid of cells over this space, a fine-grained auxiliary partition, to set up a Markov transition matrix that reflects the EEG dynamics. The eigenvalues of this matrix characterize the time scales of the dynamics and can be ordered by size. The eigenvectors span a space in which the measured principal component states form a simplex. The three leading eigenvalues give a neural state representation on a 2-simplex with three vertices, a triangle. We can further classify neural states by the distance of clusters of neural data from the vertices of the simplex. In the principal component state space, the clusters appear as non-intersecting convex sets corresponding to mental states. We may also use recurrence structure analysis to partition the state space into recurrent clusters identified from overlaps in the recurrence plot of the dynamical system. We then identify the metastable states and the transitions between them using a Markov chain with one distinguished transient state and other states representing the metastable states of the dynamics.
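The first steps of this pipeline can be sketched end to end on synthetic data. Everything about the data is an assumption: instead of real EEG, a hidden three-state Markov chain (staying put with probability 0.98) selects one of three mean patterns across 20 channels, with Gaussian noise added. The sketch reduces the data to two principal components with an SVD, lays an 8 x 8 grid over them, estimates the transition matrix between cells, and orders its eigenvalues by magnitude; the simplex and recurrence steps are not shown.

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples, n_channels, n_bins = 5000, 20, 8

# Hypothetical stand-in for a 20-channel recording with three metastable states.
means = np.zeros((3, n_channels))
means[1, :10] = 4.0
means[2, 10:] = 4.0
hidden = np.zeros(n_samples, dtype=int)
for t in range(1, n_samples):
    if rng.random() < 0.98:
        hidden[t] = hidden[t - 1]
    else:
        hidden[t] = rng.choice([s for s in range(3) if s != hidden[t - 1]])
data = means[hidden] + rng.normal(size=(n_samples, n_channels))

# 1. Reduce the 20-dimensional state space to 2 principal components.
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pcs = centered @ vt[:2].T

# 2. Homogeneous grid of cells over the reduced space.
lo, hi = pcs.min(axis=0), pcs.max(axis=0)
idx = np.clip(((pcs - lo) / (hi - lo) * n_bins).astype(int), 0, n_bins - 1)
_, states = np.unique(idx[:, 0] * n_bins + idx[:, 1], return_inverse=True)
k = states.max() + 1                      # number of visited cells

# 3. Transition counts between consecutive cells; the trajectory is closed
#    into a loop (a simplification) so every visited cell has a successor.
counts = np.zeros((k, k))
np.add.at(counts, (states, np.roll(states, -1)), 1)
T = counts / counts.sum(axis=1, keepdims=True)

# 4. Eigenvalues ordered by magnitude and implied time scales -1 / ln|lambda|.
eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
timescales = -1.0 / np.log(eigvals[1:3])
print(eigvals[:5])
print(timescales)
```

Three eigenvalues lie close to 1, one per metastable state, and the gap after them separates slow switching from fast within-state noise.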

Intentionality

Philosopher Daniel Dennett describes the intentional stance as a strategy for predicting the behavior of a system too complex to be treated as either a physical or a designed system. We predict how intentional systems behave by ascribing beliefs and desires to their internal states. From thermostats to chess computers, we can characterize such systems through necessary and sufficient conditions. The system’s dynamics have to be non-trivial, which excludes linear systems with periodic oscillations or damped relaxations. We construct an intentional hierarchy from the general case of nonlinear nonequilibrium dissipative systems to more specific intentional systems. Being a physical system is necessary for being a nonlinear dissipative nonequilibrium system, and being a nonlinear dissipative nonequilibrium system is necessary for being an intentional system. Being an intentional system is necessary for being a true believer, according to Dennett. Sufficient conditions in the intentional hierarchy implement contextual stability conditions.

The transition from equilibrium thermodynamics to fluid dynamics represents phenomenal laws of fluid dynamics (like the Navier-Stokes equation) emerging from statistical mechanics under the assumption of local equilibrium. Sufficient boundary conditions give rise to self-organization, such as through “magnetic snakes.” We give a rationality constraint for optimal dissipation of pumped energy, and true believers emerge contextually as intentional systems under mutual adoption of the intentional stance.

Representational thought concerns aboutness, and the intentional approach concerns the contentfulness or meaningfulness of representational states. We may create a network theory of meaning that emerges from the semantics of a system. Philosopher Karl Popper argued against reductionism on the grounds that there’s a world of abstract, nonphysical objects we interact with when we reason, discover proofs, speculate about consequences, use language, and think about mathematics and philosophy. Popper held that this autonomous reality (known as World 3, with World 1 being the physical world and World 2 being mental events and processes) can’t be reduced to dispositions to verbal behavior or wiring in the brain. Popper implies it’s more understandable how nonphysical states interact with such intelligibilia than how neural states might.

Symbolic grounding

The symbolic grounding problem is the problem of assigning meaning to symbols on purely syntactic grounds. Cognitivists such as philosophers Jerry Fodor and Zenon Pylyshyn have described this problem. It also bears on the question of how conscious mental states can be characterized by their neural correlates. The relation between analog and digital systems, in which syntactic digital symbols relate to the analog behavior of the system they describe, needs to be examined further through dynamical automata. These are piecewise linear, time-discrete maps over a two-dimensional state space that can be interpreted as symbolic computers through a rectangular partition of the unit square. A single point trajectory is not fully interpretable as symbolic computation. We need higher-level macrostates, built from ensembles of state space points (or probability distributions of points), that evolve under the dynamics.

Peter beim Graben showed that only uniform probability distributions with rectangular support exhibit stable dynamics that can be interpreted as computation. The huge space of possible probability distributions can thus be contextually restricted to a subclass of uniform distributions to create meaningfully grounded symbolic processes. Symbolic grounding is contextually emergent.
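Dynamical automata are more elaborate, but the baker’s map, a standard piecewise linear map of the unit square, shows the basic idea as a simpler stand-in. Splitting the square into two rectangles (x < 1/2 and x ≥ 1/2) turns each point trajectory into a symbol sequence, and the map acts on that sequence as a shift: the symbols are the binary digits of x.

```python
def baker(x, y):
    """Baker's map on the unit square: stretch horizontally, cut, and stack."""
    s = 1 if x >= 0.5 else 0
    return 2.0 * x - s, (y + s) / 2.0

def symbolic_trajectory(x, y, steps):
    """One symbol per step from the partition x < 1/2 -> 0, x >= 1/2 -> 1."""
    symbols = []
    for _ in range(steps):
        symbols.append(1 if x >= 0.5 else 0)
        x, y = baker(x, y)
    return symbols

print(symbolic_trajectory(0.625, 0.3, 4))  # [1, 0, 1, 0]; 0.625 is 0.101 in binary
```

With floating-point numbers, x runs out of binary digits after a few dozen steps and the symbols become all 0, a small reminder that a single point trajectory needs unlimited precision to carry an unending symbol sequence.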

Mental causation

Describing the mind as causally relevant in a physical world introduces the problem of mental causation: the question of how mental phenomena can be causally efficacious, which is highly significant for psychology and cognitive neuroscience. It means creating a notion of agency that includes the causal efficacy of mental states. This causal efficacy seems inconsistent with the vertical (interlevel, synchronic) determination of mental states by their neural correlates. Philosopher Jaegwon Kim’s supervenience argument (also known as the exclusion argument) holds that mental states are either causally inefficacious or threaten to overdetermine neural states: either mental events play no horizontally determining causal role at all, or they cause the neural bases of their relevant horizontal mental effects. With contextual emergence across different levels of complexity, the conflict between horizontal and vertical determination of mental events isn’t an issue. We can define proper mental states from the dynamics of an underlying neural system through statistical neural states on proper partitions, with individual mental states.

This construction implies that the mental dynamics and the neural dynamics, related to each other by a so-called intertwiner, are topologically equivalent. Instead of some mutually exclusive duality of the mental and the neural, we have a monistic view in which they are two aspects of one and the same whole, related to one another in a significant way. We can describe this as dual-aspect monism, using symmetry breaking, which is conceptually the opposite of generalization. When symmetries are restored, we observe the similarities they bring and generate equivalence classes of increasing size that can describe contextually emergent phenomena. Given properly defined mental states, the neural dynamics gives rise to a mental dynamics that is independent of those neurodynamical details that are irrelevant to a proper construction of mental states. Mental states can be causally and horizontally related to other mental states, and they neither cause their vertical neural determiners nor cause the horizontal effects of those neural determiners. This resolves the problem of mental causation in a deflationary manner. Vertical and horizontal determination don’t compete against one another; they work cooperatively.

Mental causation is a horizontal relation between previous and future mental states, with its efficacy given by the vertical relation (the downward relation by which higher-level mental constraints bear on neural states). On one dual-aspect view, psychophysically neutral elementary entities are composed into sets, and depending on how these sets are composed, they acquire mental or physical properties. On the alternative view followed here, the psychophysically neutral domain does not have elementary entities waiting to be composed, but, rather, one overarching whole to be decomposed into its parts. The emergence of the mental and the material from a psychophysically neutral whole is a form of contextual emergence that requires both a technical explanation and a metaphysical one.

The technical framework refers to the contextual emergence of multiplicity from unity. The “primordial” decomposition of an undivided whole generates different domains that give rise to differentiations such as the mind-matter distinction. The psychophysically neutral reality is the trivial, completely symmetric partition in which nothing is distinguished from anything else. We can decompose it into more and more refined partitions in which symmetries are broken and equivalence classes become smaller and smaller. In this way, phenomenal families of mental states emerge.
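The progressive refinement of partitions described above can be sketched in a few lines of Python. This is a toy illustration and is not part of the contextual-emergence formalism itself. The “states” are all combinations of three hypothetical binary features, and breaking a symmetry is modeled, by assumption, as splitting every equivalence class according to one feature.

```python
from itertools import product

# Hypothetical "states": every combination of three binary features.
states = list(product([0, 1], repeat=3))

def refine(partition, feature):
    """Break one symmetry: split every equivalence class by the value of a feature."""
    refined = []
    for block in partition:
        for value in (0, 1):
            sub = [s for s in block if s[feature] == value]
            if sub:
                refined.append(sub)
    return refined

# The trivial partition: one class in which nothing is distinguished.
partition = [states]
sizes = [len(partition)]
for feature in range(3):
    partition = refine(partition, feature)
    sizes.append(len(partition))

print(sizes)                                # [1, 2, 4, 8]
print(max(len(block) for block in partition))  # 1: every class is now a single state
```

Each refinement doubles the number of classes and halves their size, which mirrors the text’s “equivalence classes become smaller and smaller”. Running the loop backwards, by merging classes, corresponds to restoring symmetries.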

On a metaphysical level, the undivided whole underlying the epistemic domains of the mental and the physical is an ontic dimension (one of factual existence rather than of knowledge). It is reminiscent of philosopher Plato’s abstract perfect ideas and philosopher Immanuel Kant’s things-in-themselves, which are empirically inaccessible in principle and mute about specifics. On this view, mind-matter correlations emerge as a direct and immediate consequence of the ontic, undivided whole, which can’t be further divided without introducing more distinctions. Many describe determinism as a feature of ontic descriptions of states and observables, while stochasticity is a feature of epistemic descriptions.

Mathematical models of classical point mechanics are the most common examples of deterministic descriptions, and three of their properties are important. (1) Differential dynamics: the system’s evolution obeys a differential equation in a space of ontic states. (2) Unique evolution: given initial and boundary conditions, the system follows a unique trajectory. (3) Value determinateness: any state can be described with arbitrarily small error. These three features define a hierarchy for the contextual emergence of deterministic descriptions. Property (1) is a necessary condition for determinism. Property (2) can be proven under the sufficient condition that the trajectories generated by a vector field obeying (1) pass through points whose distance is stable under small perturbations. We assume (2) for almost every initial condition as a necessary condition of determinism, which defines a phase flow with weak causality. To prove (3), we need strong causality as a sufficient condition. The deterministic dynamics of Kolmogorov flows implement microscopic chaos as a stability condition. It’s also possible for a continuous stochastic process that satisfies the Markov property to give rise to a deterministic “mean-field equation.”
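The last point, a Markov process whose ensemble mean obeys a deterministic equation, can be illustrated with an Ornstein-Uhlenbeck process. This process is chosen here only as a convenient example, because the source does not name a specific one. The sketch below simulates many noisy trajectories with the Euler-Maruyama method, using arbitrary parameter values. Individual trajectories remain random, but their mean tracks the solution of the mean-field equation dm/dt = -theta * m.

```python
import numpy as np

# Assumed parameters for an Ornstein-Uhlenbeck process dX = -theta * X dt + sigma dW.
theta, sigma = 1.0, 0.5
x0 = 1.0
dt, steps = 0.01, 200          # integrate up to t = 2.0
n_paths = 20_000

rng = np.random.default_rng(0)
x = np.full(n_paths, x0)
for _ in range(steps):
    # Euler-Maruyama step: the next value depends only on the current one (Markov).
    x = x - theta * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

t = steps * dt
ensemble_mean = x.mean()
mean_field = x0 * np.exp(-theta * t)   # solution of dm/dt = -theta * m with m(0) = x0

print(f"ensemble mean {ensemble_mean:.3f}, mean-field prediction {mean_field:.3f}")
print(f"spread of individual paths {x.std():.3f}")
```

The noise does not disappear; it only averages out. The deterministic description therefore applies to the ensemble mean, not to any single path.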

Different descriptive levels can correlate with different degrees of granularity. Lower-level descriptions address systems in terms of micro-properties, while higher-level descriptions use more global macro-properties. Philosopher Bas van Fraassen noted explanatory relativity: explanations are not only relationships between theories and facts, but three-place relations between theories, facts, and contexts. Contexts determine the relevance of an explanation, and the resulting relevance criteria matter for reproducibility in science, especially in interdisciplinary fields such as bioinformatics or computational neuroscience. This gives a framework for discussing contextual emergence alongside theories and facts as they relate to explanations. We consider how descriptive levels, with their associated granularities, transform into one another by the interlevel relation of contextual emergence. This gives a formally sound and empirically applicable procedure for constructing level-specific criteria for relevant observables across disciplines.

Reductionism and ontology

It may seem appealing to reduce every system to its fundamental components and conclude that every empirical phenomenon in science or other disciplines is only applied mathematics. But this misses the features of the whole that emerge in the contexts of higher levels and cannot be reduced. Consciousness, with its neural and mental correlates, provides one example, but there are many others, such as the emergence of transcriptome interactions from the way a genome structures itself. These properties come about only at the higher levels and therefore involve phenomena that are not completely reducible to mathematics. Biologist Peter Corning argued in “The Re-Emergence of ‘Emergence’: A Venerable Concept in Search of a Theory” that whole systems produce unique combined effects that may depend on context and on the interactions between the system and its environment.

Contextual emergence was originally conceived as a relation between levels of description, not levels of nature: it addresses questions of epistemology rather than ontology. In agreement with Esfeld, who has argued that ontology needs to regain more significance in science, it would be desirable to know how ontological considerations might be added to the picture that contextual emergence provides.

Differing degrees of granularity raise the question of whether finer-grained descriptions capture the fundamental nature of systems better than coarser-grained ones. The majority of scientists and philosophers of science answer yes: they hold that there is one fundamental ontology, that of elementary particle physics, to which all other descriptive levels can be reduced. This reductive premise has provoked critical assessments and alternative proposals. Philosopher Willard Van Orman Quine introduced ontological relativity: if there is one ontology that fulfills a given descriptive theory, there is more than one. Philosopher Hilary Putnam developed a related kind of ontological relativity, first called internal realism and later referred to as pragmatic realism.

We may apply Quine’s ideas to concrete scientific descriptions, their relationships with one another, and their referents. A descriptive framework can be ontic or epistemic depending on which other framework it relates to. An engineer may consider wires of an electrical circuit to be ontic, but a solid-state physicist may consider them epistemic. We can use the relevance criteria to distinguish between context-specific descriptions and avoid pitfalls of reductionism. We create a subtle and more flexible framework while still restricting ourselves to the premises and limits of the contextually emergent model.

Strong and weak emergence

Weak emergence involves emergent properties that computer simulations can reproduce, in which the interacting cells of the system retain their independence. Other emergent properties, irreducible to the system’s constituent parts, are strong. Both kinds supervene on the underlying system and involve novel properties as the system grows, but the distinction introduces a scale dependence into observable phenomena.
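Conway’s Game of Life is a common illustration of weak emergence. It is used here as an assumed example and is not mentioned in the source. Each cell follows a purely local rule, yet a “glider” that travels across the grid only becomes visible when the simulation is run. The sketch below uses an arbitrary 10 x 10 wrap-around grid and checks that after four steps the glider reappears one cell down and one cell to the right.

```python
import numpy as np

def life_step(grid):
    """Apply Conway's rules once on a wrap-around (toroidal) grid of 0s and 1s."""
    # Count the eight neighbours of every cell by summing shifted copies of the grid.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell lives next step if it has 3 neighbours, or has 2 and is already alive.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)

grid = np.zeros((10, 10), dtype=int)
for y, x in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:   # a glider
    grid[y, x] = 1

start = grid.copy()
for _ in range(4):
    grid = life_step(grid)

# The same five-cell shape, shifted diagonally: motion no single rule mentions.
print(np.array_equal(grid, np.roll(np.roll(start, 1, axis=0), 1, axis=1)))  # True
```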

A Computational Theory of Mind

Brains are like computers only in a specific, abstract sense. By examining the brain-computer analogy closely, we can draw lessons for philosophy, neuroscience, artificial intelligence, and other research areas. It’s harmful in many ways to treat the nervous system as hardware and cognition as software when we don’t understand the limitations of that metaphor. A theory of anatomical connections in vertebrate nervous systems may give us a basic description of what happens at each stage, but it doesn’t tell us how a given input relates to a certain output. Analogies of this kind can instead obscure the description of the brain by using unnecessary comparisons to explain phenomena that are better explained by describing them directly and precisely.

The output of a computer depends on its program, its input, and the functional stages that lead to the output. We can use this analogy to theorize about artificial and biological computers, such as artificial neural networks in computer science and mathematics or the brains of different organisms. These computers connect the theory of computation with statistical mechanics and thermodynamics, and we can apply ideas from information theory, entropy dynamics, and constraint problems to them.

Classicalism vs connectionism

The computational theory of mind is the leading contemporary version of the representational theory of mind, and it tries to explain all psychological states and processes in terms of mental representations. Philosopher Stephen Stich argued that cognitive psychology doesn’t and shouldn’t taxonomize mental states by their semantic properties, because those semantic properties are determined by extrinsic properties of a mental state. Stich proposed a Syntactic Theory of Mind, arguing that the semantic properties of mental states play no explanatory role in psychology. The Syntactic Theory of Mind uses computational theories of psychological states that are concerned only with the formal properties of the objects those states relate to. More generally, the computational theory of mind treats mental processes as computations over semantically evaluable objects.

Computational theory of mind proponents disagree on how personal-level representations (thoughts) and processes (inferences) are realized in the brain. Classical Architecture proponents (classicists) such as Turing, Fodor, Pylyshyn, Newell, and Simon believe mental representations are symbolic structures with semantically evaluable constituents, and mental processes are rule-governed manipulations of those structures that are sensitive to their constituent structure. Connectionist Architecture proponents (connectionists) such as McCulloch, Pitts, Rumelhart, and McClelland believe mental representations are realized by activation patterns in simple processors (nodes), and mental processes consist of the spreading activation of these patterns. The nodes typically aren’t semantically evaluable themselves. One may argue that localist theories, in which individual nodes do stand for particular items, are neither definitive nor representative of the connectionist program.
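Spreading activation can be sketched as the repeated multiplication of an activation vector by a weight matrix. The four nodes, the weights, and the clipping of activations to [0, 1] are hypothetical choices made for illustration. For readability, each node here is localist. In distributed connectionist models, meaning would instead be carried by the pattern of activation across many nodes.

```python
import numpy as np

# Hypothetical network: W[i, j] is the connection strength from node j to node i.
# Node 0 feeds node 1, node 1 feeds node 2, and node 3 is unconnected.
W = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
external = np.array([1.0, 0.0, 0.0, 0.0])   # stimulus clamped on node 0

activation = np.zeros(4)
for _ in range(5):
    # Each node's new activation is its external input plus the weighted input
    # from the nodes projecting to it, clipped to the range [0, 1].
    activation = np.clip(external + W @ activation, 0.0, 1.0)

print(activation)   # activation spreads 0 -> 1 -> 2 and weakens with distance
```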

Classicists look for mental properties similar to those of language. Fodor’s Language of Thought Hypothesis (LOTH) holds that thoughts are composed of mental symbols, realized in the brain, that combine like the words of a language. In the LOTH, the potential infinity of complex representational mental states is generated from primitive representational states by recursive formation rules. This combinatorial structure accounts for the productivity and systematicity of the system of mental representations. We explain the properties of thought using the content of representational units and their combinability into contentful complexes. The semantics of language and thought is compositional.
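Compositionality and recursive formation rules can be made concrete with a tiny evaluator. The primitive symbols (RAIN, COLD), their truth values, and the operators are invented for illustration. The point is only that the value of a complex expression is computed from the values of its parts, so the same few rules cover expressions of any depth.

```python
# Hypothetical primitive symbols and the (made-up) facts they stand for.
LEXICON = {"RAIN": True, "COLD": False}

def meaning(expr):
    """Compute an expression's truth value from the values of its parts."""
    if isinstance(expr, str):            # primitive representation
        return LEXICON[expr]
    op, *args = expr                      # complex representation built by a rule
    if op == "NOT":
        return not meaning(args[0])
    if op == "AND":
        return meaning(args[0]) and meaning(args[1])
    if op == "OR":
        return meaning(args[0]) or meaning(args[1])
    raise ValueError(f"unknown operator: {op}")

print(meaning(("AND", "RAIN", ("NOT", "COLD"))))  # True
print(meaning(("NOT", ("OR", "RAIN", "COLD"))))   # False
```

Systematicity follows from the same design. Any system that can evaluate ("AND", "RAIN", "COLD") can also evaluate ("AND", "COLD", "RAIN"), because both expressions use the same parts and the same rule.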

Connectionists look to the architecture of the brain: networks of interconnected neurons. On their view, this architecture can’t carry out classical serial computations. Instead, it performs parallel computations whose representations lack semantic compositionality and aren’t semantically evaluable in the way classicists claim. Representation is distributed, not local (unless it’s computationally basic). Connectionists argue that information processing in these networks resembles human cognitive functioning: connectionist networks trained by exposure to objects learn to distinguish them. Some argue that connectionism implies there are no propositional attitudes. Others argue, on the other hand, that LOTH-style representation is necessary in general and can’t be realized by the general features of connectionist architectures.

Stich believed mental processes are computational, but that these computations aren’t sequences of mental representations. Other philosophers accept mental representation but deny that the computational theory of mind gives the correct account of mental states and processes. Philosopher Tim Van Gelder doesn’t believe psychological processes are computational. Instead, cognition is carried out by dynamical systems whose quantifiable states belong to a complex system comprising the nervous system, the body, and the environment. Cognitive processes aren’t rule-governed sequences of discrete symbolic states. Instead, they’re the continuous evolution of the total state of a dynamical system, driven by the mutually determining states of the system’s components. Representation in such a system is information-theoretic and is carried by state variables or parameters.

Philosopher Steven Horst wrote that computational models are useful in scientific psychology, but they don’t give us a philosophical understanding of the intentionality of commonsense mental states. The computational theory of mind tries to reduce the intentionality of mental states to the intentionality of mental symbols, but the relevant notion of symbolic content is itself bound up with the notions of convention and intention. Horst argued that the account is therefore circular: the computational theory of mind relies on the very properties it is supposed to reduce, so those properties would themselves need to be reduced.

Intentionality

If the intentionality of propositional attitudes were a physical property, we could build a computer whose states have genuine intentionality. But, the argument goes, no computer model that simulates human propositional attitudes will have genuine intentional states. Therefore, the intentionality of propositional attitudes isn’t a physical property.

We may consider the network theory of meaning (also called the holistic or conceptual-role theory), on which the meaning of an expression is determined by the role it plays in a person’s internal representational economy, that is, by how it relates to sensory input and behavioral output. Meaning is relational: an expression’s meaning is a function of its inferential and computational role in a person’s internal system. A robot that behaves like a human still raises the question of whether the thoughts it generates mean what our own thoughts mean. On this theory, denying meaning to the internal states of such a robot would apply an arbitrary double standard with no useful purpose. The nature of the robot’s internal machinery doesn’t change the fact that it believes, wants, and understands things. Its intentional states depend on the complexity of its internal informational network of states.

We need a better theory of representation in organisms altogether, much as we have theoretical definitions of what molecules, proteins, and neutrons are. We can also study the mind as it relates to the computer by distinguishing between understanding its design and understanding its function. Just as we can think, feel, and argue without knowing exactly the neuroscience of our brains, we can use a computer for, more or less, what it is designed to do without knowing exactly how it works. We must still know some basics, such as pressing a button to turn on a computer or taking care of our bodies to take care of our brains. Likewise, we must account for intentionality by understanding why intentions work, rather than simply knowing that we have intentions and following them as blind dogma.

Levels of organization

The brain-computer analogy raises the problem of how the complexity we know the brain has relates to the levels of organization of a computer. The semantic, syntactic, and mechanistic levels raise questions about the level of the algorithm and the structural implementation of those features. Neurobiological theory complicates how we specify such an organizational description. Membrane, cell, synapse, cell assembly, circuit, and behavior can be argued to be levels, but each of them can itself be partitioned further. We can also define levels by research method. In the study of learning and memory, for example, a cellular approach has shown that habituation involves changes in presynaptic neurotransmitter release. Which level is functional and which is structural is also difficult to determine.

Mental state semantics

According to the computational theory of mind, the mind operates on symbols and uses symbolic representations to represent mental states. The meanings of these symbols are their semantics, and the relationships between them are their syntax. We may argue that more complicated mental states are built from these basic symbolic “words” of the language of thought. The hypothesis that a language of thought is encoded in our brains is not obvious, nor is it universally accepted. There are many competing hypotheses and theories about how the logical form of propositions relates to the structural form of the mental states that correspond to them. Suppose we take an intentional stance toward the mind, treating the object whose behavior we want to predict as a rational agent with beliefs, desires, and similar mental states that exhibit intentionality. We can then uncover objective, real patterns in the world, and this is an empirical claim that can withstand the skepticism associated with it. Philosopher Daniel Dennett argued that any object or system whose behavior we predict with this strategy is a believer. A true believer, Dennett argued, is an intentional system whose behavior we can reliably predict with the intentional stance. Our brains have somehow handled the combinatorial explosion that accompanies their own complex nature, such that billions of cells work in networks with one another, and the only representational system on which we can model this is human language. We haven’t imagined any plausible alternatives in as much detail as our own language.

Causality

A calculator’s representations and its rules for manipulating them can explain its behavior, much as we describe how and why people do what they do. Philosopher Zenon Pylyshyn said we explain why a machine does something by interpreting its symbols as referring to things in a domain. Psychological theory cross-classifies the categories of neurophysiological theory, so neurophysiological generalizations would miss important relations that are describable only at the level at which representations are referred to. Psychological categories would map only onto an indefinite mix of neurobiological categories.

Connectionism (Parallel distributed processing)

As philosopher Paul Churchland has argued, we may use connectionism, or parallel distributed processing (PDP), to figure out the computational operations of nervous systems. Computer models of parallel distributed systems can then generate the appropriate higher-level phenomena (cognitive science, psychology, etc.) from basic processes (neuroscience, physics, etc.).

Tensor network theory

Neuroscientists first developed tensor network theory for the cerebellum because it has a limited number of neuron types, each physiologically distinct and connected in a specific way: the Purkinje cell, the output cell of the cerebellar cortex, receives input from two different cell systems. Wiring diagrams of cerebellar neurons describe how these connections accept input and produce output in parallel. There is a trade-off between the detail needed to understand individual cells and an understanding of how the array as a whole processes information. Through tensor network theory, we attempt to use principles from mathematics, physics, and computer science to understand how such arrays may model the nervous system. We can create a schematic neuron to learn more about the patterns of neurons arranged in mathematical arrays. The model may be limited by the assumptions of causal theory and by epistemic concerns about the phenomena we attempt to describe. Even so, it’s a useful heuristic for seeing something we wouldn’t otherwise see in single-cell data. We may use concepts from linear algebra and statistics to represent activity as vectors in a coordinate system. A tensor then governs the transformation of ensemble activity from input to output in the corresponding reference frame. The spiking frequency of each input neuron defines a point on one axis of the coordinate system, and the output is a vector in the space of the output neurons. A tensor, as a generalized mathematical object, transforms vectors into other vectors. We can therefore treat the basic functional problem of sensorimotor control as going from one coordinate system to another.
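Pellionisz and Llinás framed such transformations in terms of covariant and contravariant vector components related by a metric tensor. The sketch below illustrates that geometric idea in two dimensions, using made-up non-orthogonal axes and an arbitrary vector. It is a toy example and not a model of cerebellar circuitry. Covariant components are projections onto the axes. The inverse of the metric tensor converts them into contravariant components, and these reconstruct the vector.

```python
import numpy as np

# Hypothetical non-orthogonal unit axes of a 2-D frame (rows: axis 1, axis 2).
axes = np.array([[1.0, 0.0],
                 [np.cos(np.pi / 3), np.sin(np.pi / 3)]])

movement = np.array([0.5, 1.0])          # an arbitrary physical vector

# Covariant components: orthogonal projections onto each axis (a "sensory" reading).
covariant = axes @ movement

# Metric tensor g_ij = e_i . e_j and its inverse g^ij.
g = axes @ axes.T
g_inv = np.linalg.inv(g)

# Contravariant components: how much to move along each axis (a "motor" command).
contravariant = g_inv @ covariant

reconstructed = contravariant @ axes     # sum of contravariant components times axes
print(np.allclose(reconstructed, movement))          # True
print(np.allclose(covariant @ axes, movement))       # False: projections alone misfire
```

The second check shows why a transformation is needed at all. On non-orthogonal axes, using the sensory projections directly as motor commands produces the wrong movement.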

Suppose we first determine what the mind-brain does and then how it might implement various functions, working top-down across the levels of science. The theorizing is then highly constrained, yet very well informed, by data from the level of implementation. With tensor network theory, however, we wouldn’t label these processes as top-down. Rather, they run from lower-level fundamental processes to higher-level descriptions.

A tensor transformation does what sensorimotor control requires: it transforms vectors in sensory space into vectors in motor space. We may deform one phase space to obtain an object in another, treating representations as positions in phase space and computations as coordinate transformations between phase spaces. The Pellionisz-Llinás approach takes sensorimotor problems, as constrained by realistic creatures, and reduces them at bottom to the problem of making coordinate transformations between phase spaces. In tensor network theory, we look for functional relationships between connected cell assemblies and investigate them for properties relevant to phase spaces. This is much as a computer or artificially intelligent machine searches for solutions among sentential criteria. Such an AI would require this knowledge to determine what to do.

Tensor network theory still needs to unify results across cognitive science, psychology, and neuroscience. Only then can we construct a universal, common set of rules with coherent explanations that we can experimentally test and verify. Consider attempts to describe the vestibulo-ocular reflex. This reflex stabilizes the visual image on the retina by moving the eyes to compensate for head movements detected by the semicircular canals of the vestibular system, which it links to the muscles of each eyeball. The system needs to determine how the eye muscles contract so that the eyes move to counter the head movements. The corresponding tensor approach imagines the system converting a head-movement vector into a vector that describes muscle positions. According to the Pellionisz-Llinás hypothesis, the transformation from vestibular to oculomotor coordinates takes a premotor vector into a motor vector. The vestibular organ, we can show, has a set of preferred positions that we can call eigenpositions.
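Eigenvectors make the idea of a “preferred” direction concrete: an input that lies along an eigenvector is only rescaled by the transformation, not rotated. The 2 x 2 matrix below is invented for illustration and is not a measured vestibulo-ocular transformation.

```python
import numpy as np

# Hypothetical symmetric transformation from a 2-D head-movement vector
# to a 2-D premotor vector (numbers invented for illustration).
T = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is for symmetric matrices: eigenvalues in ascending order,
# with the matching unit eigenvectors as the columns of `eigenvectors`.
eigenvalues, eigenvectors = np.linalg.eigh(T)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # Along an eigenvector, T only scales the input by lam.
    print(round(lam, 6), v, np.allclose(T @ v, lam * v))
```

Any other input direction is rotated toward the eigenvector with the larger eigenvalue. This is one way to read “preferred” positions in a transformation.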

We further pose Churchland’s phase-space sandwich hypothesis. It describes the spatial organization of layered maps such that the corresponding neurons may perform any transformation from a two-dimensional space to a two-dimensional space. The maps representing phase spaces aren’t literally stacked upon one another. They may remain spatially distant from each other. Given the topology of the cortical area, we still have to ask whether tensor network theory can account for neuroplasticity. Covariant proprioceptive vectors can give feedback about motor performance, which can in turn provide information for adjusting the transformations of the cerebellar matrix. The matrix would then settle into a state whose eigenvectors are identical to those of the “correct” coordinate transformation. Climbing fibers of the cerebellum may provide a pathway for reverberative feedback that modifies the transformational properties of the cerebellar network. A similar idea appears in AI systems that use relaxation algorithms.
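The feedback-driven tuning of a transformation matrix can be pictured with the delta rule (least-mean-squares learning). This rule is swapped in here as an assumption, because the source does not say which learning rule the cerebellum would use. The “correct” transformation, the learning rate, and the random inputs are all hypothetical. Each step compares the desired output with the actual output and nudges the matrix to reduce the error.

```python
import numpy as np

rng = np.random.default_rng(1)
T_correct = np.array([[1.0, 0.5],
                      [-0.3, 0.8]])   # hypothetical "correct" coordinate transformation
W = np.zeros((2, 2))                   # the network's current transformation
learning_rate = 0.05

for _ in range(2000):
    x = rng.standard_normal(2)               # an input vector
    error = T_correct @ x - W @ x             # feedback: desired output minus actual output
    W += learning_rate * np.outer(error, x)   # delta-rule correction proportional to the error

print(np.round(W, 3))   # close to T_correct after training
```

Once W matches the correct transformation, it also has that transformation’s eigenvectors. In this toy setting, that is the sense in which feedback could make the matrix “correct”.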

Mental states

If we determine how behavior related to cognition and complexity emerges from the basic neurophysiological theories that govern sensorimotor control, we can determine the nature and dynamics of cognition. We may construct representations at abstract levels of organization that correspond to cognitive activity, in the way sentential representations are manipulated according to logical rules. Phase spaces may represent certain features that humans recognize, such as the eyes of faces or the shapes of animals, and we may describe phase spaces as occupied by these sensory stimuli. Using the cone photoreceptors responsible for color vision, we can demonstrate the computational problem of how to represent a unique color with a triplet of reflectance values.

Parallel models

Sequential models can be powerful, but AI researchers have shown that they are ineffective at simulating fundamental cognitive processes in areas such as pattern recognition and knowledge storage and retrieval. The differences between human brains and digital computers only compound these issues. Humans and computers store memory in very different ways, and human neurons are connected very differently from artificial ones.

The Hinton-Sejnowski visual recognition system uses a network of two sets of binary units. One set detects input from external stimuli, and the other consists of nondetecting units connected to the detectors. These networks evaluate hypotheses according to which units fire and which don’t. The network performs a cooperative search in which these assemblies vote for various outcomes, and the outcome with the most votes wins. The relationships between hypotheses depend on synaptic weights, with unit states governed by probability functions and distributions. The networks also perform relaxation through annealing, a process borrowed from metallurgy in which a material is cooled slowly so that its molecules can settle into different arrangements. During this process, the crystalline structure reaches a global energy minimum, and adding noise to the network plays the role of thermal fluctuations. These fluctuations let the system break out of shallow local minima. The Metropolis-Hastings algorithm lets locally improbable hypotheses be accepted occasionally, so that they may win over other hypotheses.
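Hinton and Sejnowski’s network is usually formalized as a Boltzmann machine. The sketch below anneals a tiny four-unit network with invented weights and biases, using single-unit flips. With symmetric single-flip proposals, the Metropolis-Hastings rule reduces to the plain Metropolis rule used here. The cooling schedule is an arbitrary assumption. Recording the best state seen is a practical convenience rather than part of the original model. The run starts in a local minimum and relies on early, high-temperature noise to escape it. A brute-force search over all 16 states confirms the global minimum.

```python
import itertools
import math
import random

# Hypothetical symmetric weights and biases for four binary "hypothesis" units.
W = [[0, 2, -1, 0],
     [2, 0, 0, -1],
     [-1, 0, 0, 2],
     [0, -1, 2, 0]]
b = [-0.5, -0.5, -0.5, 0.2]

def energy(s):
    """Boltzmann-machine-style energy: lower means more mutually consistent hypotheses."""
    pair = sum(W[i][j] * s[i] * s[j] for i in range(4) for j in range(i + 1, 4))
    return -pair - sum(bi * si for bi, si in zip(b, s))

def anneal(start, t_start=3.0, t_end=0.05, steps=5000, seed=0):
    rng = random.Random(seed)
    s = list(start)
    e = energy(s)
    best, best_e = tuple(s), e
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / steps)   # geometric cooling schedule
        i = rng.randrange(len(s))
        s[i] = 1 - s[i]                                   # propose flipping one unit
        new_e = energy(s)
        delta = new_e - e
        # Metropolis rule: always accept downhill moves; accept uphill moves with
        # probability exp(-delta / t), a source of noise that shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e = new_e
            if e < best_e:
                best, best_e = tuple(s), e
        else:
            s[i] = 1 - s[i]                               # reject: undo the flip
    return best, best_e

start = (1, 1, 0, 0)   # a local minimum: every single flip raises the energy
best, best_e = anneal(start)
exact = min(itertools.product([0, 1], repeat=4), key=energy)   # brute force over 16 states
print(start, energy(start))
print(best, best_e, exact)
```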

To make the model reflect empirical data in neuroscience, we must show that it accounts for processing along various neurobiological pathways. Computer vision models need to account for the contours of perception as well as emergent phenomena, such as how a property of an image emerges from various structures of the image working together dynamically and systemically. Connectionists could refine their brain-computer models by drawing on evolution, in the same way that sensorimotor mechanisms evolved to arrive at simultaneous solutions in visual recognition.

We distinguish between different levels of description of computational processes. These levels are related by reduction to varying degrees, and we can make varying commitments to reductionism between them. The theory of symbolic computational functionalism in the computational theory of mind (known as computationalism) holds that minds manipulate discrete, well-defined symbols, modeled on discrete, well-defined logical structures and computer languages. Under this theory, a human mind may be a deterministic finite-state automaton, and the theory is independent of implementation. Different beings with different physical structures may therefore have similar or even identical mental states. Philosopher Patricia Churchland and neuroscientist Terrence Sejnowski have countered that implementation matters, especially because lower theoretical levels (such as neuroscientific phenomena) are significant for higher ones. Opponents may also argue that the representations of computationalism tell us nothing more than non-representational descriptions do. Using representation may amount to an unnecessary model or analogy that only steers us away from the precise, defined meaning of the world.
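A deterministic finite-state automaton is simply a table that gives exactly one next state for each current state and input symbol. The toy automaton below is an invented example that accepts binary strings containing an even number of 1s. It also shows what implementation independence means, since the same table could be realized by any physical mechanism.

```python
# Transition table: (current state, input symbol) -> the single next state.
TRANSITIONS = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
START, ACCEPTING = "even", {"even"}

def accepts(string):
    """Run the automaton over the input and report whether it ends in an accepting state."""
    state = START
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]   # deterministic: exactly one successor
    return state in ACCEPTING

print(accepts("1001"))  # True  (two 1s)
print(accepts("111"))   # False (three 1s)
```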

The computationalist may respond that she doesn’t want to build a physiologically accurate model of the human mind, but wants to find the features of intelligence for any agent. In AI, one might want to solve a problem in a computational space that doesn’t represent human features. She may also respond that representational theories can account for features of representation, such as the similarity between representations and their objects and how accurate they are. This, she may argue, makes representational theories more effective, valid, and justified than non-representational ones.

We may account for the intentional nature of basic emotions even if they have a physiological component, such as changes in facial expression or other bodily mechanisms. Weak content cognitivism, the view that emotions are, or are caused by, propositional attitudes, may challenge this relationship of emotions to bodily responses. But a relationship between emotions and beliefs doesn’t mean all emotions are caused by propositional attitudes such as beliefs. A computational theory of mind should account for emotional effects and similar affects that influence perception and judgement. But changes in emotion don’t seem discrete in the way the states of a logical system are, as we described with the Hinton-Sejnowski theory or with tensor network theory. Emotions form a continuous gradient that doesn’t seem to arise from the kind of combinatorial engine the computationalist posits. We would also need a semantic activation model that adheres to the principles of symbolic computational functionalism.

The connectionist model describes the effects of some emotions, but doesn’t model emotion itself. Allowing semantic activation models to give emotions a cognitive position would mean that emotions are, in some sense, the same as similar cognitive categories such as “visual stimuli” or “beliefs.” To capture the other features of emotion, though, semantic activation models would need to describe implementation-dependent details of the model itself.

The computationalist position also has trouble modeling affects, such as basic emotions, that operate independently of cognition yet still play a role in rational human behavior. The computationalist may be inclined to treat emotions as external, or even unnecessary, to their models. Computationalists also can’t account for the effects of basic emotions on perception and categorization using their current models. These emotions may be more fundamental to the perceptions and categories we form, given their distinctive influence on intellectual perception.

Neural circuitry

We may model the brain as a computer by treating the excitation/inhibition ratio of neural circuits as a key property for cognitive function in cortical circuits. Research on how synaptic parameters shape circuit function in memory and decision-making can give us parameter spaces in which NMDA receptor (NMDAR) conductance is reduced. These reductions apply at synapses from excitatory pyramidal neurons onto inhibitory interneurons or onto other excitatory pyramidal neurons. We may analyze dopamine neuron activity using a bifurcation diagram. In mathematics, bifurcation diagrams are generally used to study how a dynamical system’s behavior changes as a parameter varies or under similar perturbations. We may use Ohm’s law, together with the capacitive current across the membrane, to relate current, potential, capacitance, and resistance in membrane channel dynamics. The ionic currents of the dopamine neuron can be described with Hodgkin-Huxley models. We can use these fundamentals to build circuit models of neuronal activity and use population firing rates to calculate dopamine efflux in the nucleus accumbens.
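The Ohm’s-law part of this picture can be sketched as a passive membrane modeled as an RC circuit. The leak current is (V - E_L) / R, and the capacitive current is C dV/dt. The parameter values below are illustrative assumptions, not dopamine-neuron measurements. The voltage-dependent conductances of a full Hodgkin-Huxley model are left out. With a constant injected current, the voltage relaxes toward E_L + I R with time constant R C.

```python
# Passive membrane as an RC circuit (illustrative values, not dopamine-neuron data).
R = 10e6       # membrane resistance in ohms (10 MOhm)
C = 1e-9       # membrane capacitance in farads (1 nF), so R * C = 10 ms
E_L = -0.065   # leak reversal potential in volts (-65 mV)
I_inj = 2e-9   # constant injected current in amperes (2 nA)

dt = 1e-5      # time step in seconds (0.01 ms)
V = E_L
for _ in range(10_000):               # 100 ms, i.e. ten time constants
    I_leak = (V - E_L) / R            # Ohm's law for the leak channels
    dV = (I_inj - I_leak) / C * dt    # capacitive current: C dV/dt = I_inj - I_leak
    V += dV

print(f"{V * 1e3:.2f} mV")            # approaches E_L + I_inj * R = -45.00 mV
```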

Functional connectivity

Functional connectivity (FC) is the statistical correlation of neural activity between two different regions. We find evidence for it at the micro-circuit level, where anatomical and neurophysiological research techniques reveal the relationship between structure and function. We can integrate information across brain networks using large-scale brain connectivity at finer temporal and spatial resolution. If we introduce spatiotemporal models of resting-state networks, we can analyze their time-frequency structure using wavelet analysis, sliding windows, and similar methods of describing temporal correlations between the networks.
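A sliding-window correlation, one of the methods mentioned above, can be sketched with NumPy. The two “regional” time series are synthetic. Both follow a shared signal during the first half of the recording, but only region A does so in the second half. The window length of 100 samples is an arbitrary choice. A single static correlation averages over this change, while the windowed estimates reveal it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 600
shared = rng.standard_normal(n)                      # common signal driving both regions
region_a = shared + 0.5 * rng.standard_normal(n)
# Region B follows the shared signal only during the first half of the recording.
region_b = np.where(np.arange(n) < n // 2, shared, 0.0) + 0.5 * rng.standard_normal(n)

static_fc = np.corrcoef(region_a, region_b)[0, 1]    # one number for the whole recording

window = 100
dynamic_fc = np.array([
    np.corrcoef(region_a[i:i + window], region_b[i:i + window])[0, 1]
    for i in range(0, n - window + 1, window)        # non-overlapping windows
])

print(round(static_fc, 2))
print(np.round(dynamic_fc, 2))   # high in the first three windows, near zero after
```

Overlapping windows (a step smaller than the window length) would give a smoother time course at the cost of correlated estimates.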

FC is similar to functionalism in that we’re defining our representations in terms of their functions. Functionalism holds that qualitative states (e.g., pain) are functional states of a system, related to inputs, outputs, and other internal states. For this reason, cognitive models of the mind have used FC in their explanations. Yet even a neuroscientific system that realizes the same set of functional states as a person would still face the problems of liberalism and chauvinism, the philosopher Ned Block argued. Liberalism is the problem a theory of mentality faces when it attributes mentality to systems that don’t have it; behaviorism is an example, Block believed. A behavioral disposition may be necessary for the possession of a certain mental state, but it isn’t sufficient. Functional connectivity in neuroscience must address this objection to functionalism. Chauvinism is the opposite problem: a theory withholds mentality from systems that seem to possess it. Block argued that type physicalism falls to chauvinism because it holds that mental state types are identical to physical state types.

Consider the mental state of pain. Sitting on a tack causes it, and it in turn causes behaviors such as loud cries and other mental states such as anger. Analytic functionalism gives these functional definitions in terms of causal roles, treating them as analytic, a priori truths about other mental states and their propositional attitudes. On this view, the identities are necessary and not subject to empirical observation. Psychofunctionalism, on the other hand, uses empirical observation (in an a posteriori manner) and experimentation to determine mental state terms and concepts, which it treats as contingent on those observations.

Structural connectivity

Structural connectivity (SC) refers to the long-range anatomical connections among brain areas through white-matter fiber projections. Fiber tracking, which relies on the bounded diffusion of water molecules, lets us create non-invasive connectivity maps. In the past, scientists tracked neural fibers using diffusion tensor imaging (DTI), but more recent studies have drawn on advances in graph theory to research the topological features of brain connectivity.
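This sketch links diffusion to fiber direction, assuming a single diffusion tensor per voxel (the basic DTI model). The eigenvector with the largest eigenvalue points along the direction of fastest diffusion, and deterministic tractography follows it from voxel to voxel. Fractional anisotropy (FA) summarizes how directional the diffusion is. The function names, tensor values, and step size are illustrative assumptions.

```python
import numpy as np

def tensor_summary(D):
    """Return the principal diffusion direction and fractional anisotropy (FA)."""
    evals, evecs = np.linalg.eigh(D)      # symmetric tensor: eigenvalues in ascending order
    principal_dir = evecs[:, -1]          # eigenvector of the largest eigenvalue
    mean_diffusivity = evals.mean()
    fa = np.sqrt(1.5 * np.sum((evals - mean_diffusivity) ** 2) / np.sum(evals ** 2))
    return principal_dir, fa

def track_step(position, D, prev_dir, step_size=0.5):
    """One deterministic tractography step along the principal direction (hypothetical helper)."""
    direction, _ = tensor_summary(D)
    if np.dot(direction, prev_dir) < 0:   # eigenvectors have arbitrary sign; keep heading consistent
        direction = -direction
    return position + step_size * direction, direction

# Illustrative tensors in mm^2/s (assumed values, not measured data).
fiber_like = np.diag([1.7e-3, 0.3e-3, 0.3e-3])   # diffusion much faster along x
isotropic = np.diag([0.8e-3, 0.8e-3, 0.8e-3])    # no preferred direction

for name, D in [("fiber-like", fiber_like), ("isotropic", isotropic)]:
    direction, fa = tensor_summary(D)
    print(f"{name}: principal direction {np.round(direction, 3)}, FA = {fa:.3f}")

pos, heading = track_step(np.zeros(3), fiber_like, prev_dir=np.array([1.0, 0.0, 0.0]))
print("next point on the streamline:", pos)
```

FA is near 0.8 for the fiber-like tensor and 0 for the isotropic one. In the isotropic case the "principal direction" is arbitrary, which is why tractography stops in low-FA regions.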

We can characterize the relationship between FC and SC as the former concerning connections between areas and the latter concerning the physical characteristics of the fibers. Effective connectivity (EC) characterizes the interactions between visual processing regions (as in a psychophysiological interaction analysis) using structural equation modeling (SEM), which minimizes the difference between predicted and observed dependent variables. EC also refers to a broader definition of SC that captures the features that shape connectivity, like synaptic strengths, neurotransmitter concentrations, and neural excitability. We can infer EC and the topology of these networks through both model-driven and data-driven approaches. The former generate signals under modeling assumptions, and the latter use statistics, information-theoretic measures, or phase relationships to extract EC. Using binary graphs, path length measures, clustering coefficients, and other ideas from graph theory alongside results from diffusion-based tractography, we can show the resting-state networks in various regions of the brain. Scientists have introduced the Network-Based Statistic for comparing whole-brain connectivity between groups at the level of interconnected sets of connections rather than single connections.
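This sketch computes two of the binary-graph measures named above, assuming an undirected, unweighted adjacency matrix. The local clustering coefficient is the fraction of a node’s neighbor pairs that are themselves connected. The characteristic path length is the mean shortest-path distance between node pairs. The function names and the toy five-node graph are hypothetical, and libraries such as NetworkX provide tested implementations.

```python
import numpy as np
from collections import deque

def clustering_coefficients(A):
    """Local clustering coefficient of every node in an undirected binary graph."""
    A = np.asarray(A, dtype=int)
    degree = A.sum(axis=1)
    # Diagonal of A^3 counts closed 3-step walks: each triangle through node i appears twice.
    triangles = np.diag(A @ A @ A) / 2
    possible_pairs = degree * (degree - 1) / 2
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(possible_pairs > 0, triangles / possible_pairs, 0.0)

def shortest_path_lengths(A):
    """All-pairs shortest path lengths, counted in edges, via breadth-first search."""
    A = np.asarray(A)
    n = len(A)
    dist = np.full((n, n), np.inf)
    for source in range(n):
        dist[source, source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(A[u]):
                if np.isinf(dist[source, v]):       # first visit is the shortest path
                    dist[source, v] = dist[source, u] + 1
                    queue.append(v)
    return dist

def characteristic_path_length(A):
    """Mean shortest path length over all ordered pairs of distinct nodes."""
    dist = shortest_path_lengths(A)
    off_diagonal = ~np.eye(len(dist), dtype=bool)
    return dist[off_diagonal].mean()

# Toy graph (hypothetical): triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
A = np.zeros((5, 5), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1

print("clustering:", np.round(clustering_coefficients(A), 3))
print("characteristic path length:", characteristic_path_length(A))
```

In this toy graph, nodes 0 and 1 sit in a closed triangle (clustering 1). Node 2 bridges the triangle to the tail (clustering 1/3), and the characteristic path length is 34/20 = 1.7 edges. A disconnected graph would produce infinite distances, so analyses of such graphs restrict the average to connected pairs or use global efficiency instead.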

We can relate the covariance between populations of neural activity to the Jacobian of the system of equations describing the neural activity in each node: given an input covariance matrix, we can compute the covariance between neural populations. A Kuramoto network model has used the global graph metrics of schizophrenia patients to account for their neurophysiological impairment, relating resting-state network activity to topological properties in schizophrenia. We may drive the model’s dynamics either with noise-driven spontaneous activity or with complex interactions between phase oscillators (with coupling, delays, and noise), but these two approaches contradict one another. The former implies that temporal correlations in spontaneous activity emerge from uncorrelated noise propagating through connections. The latter relies on complex interactions of oscillatory activity across brain regions. We may use a supercritical Hopf bifurcation to reconcile the two, using synchronized networks and their corresponding temporal variations. In this model, the Kolmogorov-Smirnov distance between the empirical and simulated distributions of FC dynamics is optimal at the critical point and sensitive to deviations from it.
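This sketch models the phase-oscillator ingredient as a noisy Kuramoto network, dθ_i = [ω_i + K Σ_j C_ij sin(θ_j - θ_i)] dt + σ dW_i, integrated with the Euler-Maruyama method. It assumes a hypothetical all-to-all coupling matrix in place of an empirical connectome and omits conduction delays. It does not implement the Hopf model or the Kolmogorov-Smirnov fit. The function name and parameter values are illustrative.

```python
import numpy as np

def simulate_kuramoto(C, omega, K=1.0, noise=0.0, dt=0.01, n_steps=5000, seed=0):
    """Euler-Maruyama integration of a noisy Kuramoto network (no delays).

    Returns the final phases and the order parameter r(t) at every step.
    """
    rng = np.random.default_rng(seed)
    n = len(omega)
    theta = rng.uniform(0, 2 * np.pi, n)      # random initial phases
    order = np.empty(n_steps)
    for step in range(n_steps):
        # phase_diff[i, j] = theta_j - theta_i
        phase_diff = theta[None, :] - theta[:, None]
        coupling = K * np.sum(C * np.sin(phase_diff), axis=1)
        theta = (theta + dt * (omega + coupling)
                 + noise * np.sqrt(dt) * rng.standard_normal(n))
        # Order parameter r in [0, 1]: 1 means all oscillators share one phase
        order[step] = np.abs(np.mean(np.exp(1j * theta)))
    return theta, order

n = 10
C = (np.ones((n, n)) - np.eye(n)) / n          # hypothetical all-to-all coupling
rng = np.random.default_rng(1)
omega = 1.0 + 0.1 * rng.standard_normal(n)     # natural frequencies (rad per time unit)

for K in (0.0, 2.0):
    _, r = simulate_kuramoto(C, omega, K=K, noise=0.1)
    print(f"K={K}: mean order parameter over last 1000 steps = {r[-1000:].mean():.2f}")
```

The order parameter summarizes synchrony. Without coupling, the oscillators drift at their own frequencies. Strong coupling pulls them into a common rhythm despite the added noise.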

Reinforcement learning

Reinforcement learning is emerging as a dominant computational paradigm for modeling psychological and neural aspects of affectively charged decision-making tasks. The Markovian assumption lets us use decision-making models that describe how nervous tissue carries out perceptual inference. Under this assumption, Markov models treat the next state of a process as depending only on its current state, independent of the states that came before it. Hopfield networks, alongside the work of Hinton and Sejnowski, would let computational models use rules such as the Bush-Mosteller rule (learning from trial-by-trial differences between predictions and outcomes) or the Sutton-Barto approach (Monte Carlo methods and temporal-difference learning in artificial neural networks). We can introduce the temporal-difference error so that the agent in the system chooses actions that maximize its expected future reward. The diffuse ascending systems of the nervous system could use temporal-difference learning as a general way for biological systems to learn the value of states. We can use a modified form of Hebbian learning in which errors in predicting the future drive bidirectional synaptic change. These Hebbian synapses could then store predictions of the future in a way that accounts for the activity of dopamine neurons.
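This minimal sketch shows the temporal-difference error, δ = r + γV(s′) - V(s), on a toy task assumed for illustration: a chain of five states visited in order, with a single reward at the end. Early in learning, the error is large at the moment of reward. As the values are learned, that error shrinks toward zero, the pattern the text compares to dopamine neuron responses. The function name, task, and parameters (α, γ) are hypothetical choices.

```python
import numpy as np

def td0_chain(n_states=5, reward=1.0, alpha=0.1, gamma=0.9, n_episodes=500):
    """TD(0) value learning on a deterministic chain of states.

    The agent always moves one state to the right; only the transition out of
    the final state delivers a reward, after which the episode ends.
    """
    V = np.zeros(n_states)
    reward_errors = []   # TD error at the rewarded transition, one per episode
    for _ in range(n_episodes):
        for s in range(n_states):
            terminal = s == n_states - 1
            r = reward if terminal else 0.0
            v_next = 0.0 if terminal else V[s + 1]
            delta = r + gamma * v_next - V[s]   # temporal-difference (prediction) error
            V[s] += alpha * delta               # move the estimate toward the TD target
            if terminal:
                reward_errors.append(delta)
    return V, np.array(reward_errors)

V, reward_errors = td0_chain()
print("learned values:", np.round(V, 3))
print("gamma ** steps:", np.round(0.9 ** np.arange(4, -1, -1), 3))
print("TD error at reward, first vs last episode:", reward_errors[0], reward_errors[-1])
```

The learned values approach γ raised to the number of steps remaining before the reward. Value therefore propagates backward from the reward to the states that predict it.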

Optimizing procedures

We may use optimization methods from mathematics, physics, and computer science in neuroscience. If we assume artificial neural networks are similar to biological ones, we may use error minimization as an optimization procedure. By studying how parameters and weights are adjusted, we may analyze the computations of a neural system and how it generates ideas from the organization of a network. We may use backpropagation to create models that have the capacities of a biological neural network, and speculate on how networks function in a computational theory of mind. Several considerations support this approach. The nervous system has too many parameters to be entirely controlled by genetics. Neurodevelopment involves massive synaptogenesis that grows through optimization processes. Some parameters provide feedback that adapts behavior to circumstances. Natural selection optimizes nervous systems in such a way that we may regard their selective pressures as error-minimizing.

The neural circuit for visually tracking moving objects involves many unknown parameters and specific weights. We can construct a network by fixing the known parameters and training it on input-output data to determine the unknown ones. How far such inferences hold depends on the degree of similarity between artificial and biological networks. We may use models to generate hypotheses because the evolution of the nervous system may be described with a cost function, and artificial models use backpropagation to search through the space of possibilities.
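This sketch fixes the known parameters and fits the rest by gradient descent on a mean-squared-error cost. For brevity, it assumes a single linear layer with synthetic, noise-free data, so plain gradient descent stands in for full backpropagation through a multilayer network. The function name, the weights, and which weights count as “known” are all hypothetical.

```python
import numpy as np

def fit_unknown_weights(X, y, w_init, known_mask, lr=0.1, n_iter=500):
    """Gradient descent on 0.5 * mean squared error, updating only unknown weights."""
    w = w_init.astype(float).copy()
    for _ in range(n_iter):
        error = X @ w - y                          # prediction error on every sample
        grad = X.T @ error / len(y)                # gradient of the cost w.r.t. the weights
        w -= lr * np.where(known_mask, 0.0, grad)  # known weights stay fixed
    return w

rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.2, 2.0, 0.3])       # hypothetical "true" circuit weights
known = np.array([True, False, True, False])   # assume weights 0 and 2 were measured

X = rng.standard_normal((200, 4))              # synthetic inputs
y = X @ true_w                                 # synthetic outputs of the circuit

w_init = np.where(known, true_w, 0.0)          # plug in known values, start unknowns at 0
w_fit = fit_unknown_weights(X, y, w_init, known)
print("recovered weights:", np.round(w_fit, 3))
```

Here the data are noise-free and the unknown weights are identifiable, so gradient descent recovers them. With noisy recordings or many interacting parameters, as in real circuits, the fit is only as trustworthy as the model’s resemblance to the biology.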

Conclusion

Adapting 18th-century German philosopher Immanuel Kant’s dictum that thoughts without content are empty and intuitions without concepts are blind, we might say that studying concepts of the mind without empirical science is empty and studying science without philosophy is blind. Understanding how the brain works means moving from computer simulations to synthetic brains. We see how models interact with the actual world (whether they simulate the world or directly use it), determine which real-world parameters are relevant to our models, and extend models to cover all levels of organization. We wrestle with reduction, causation, and other phenomena through both science and philosophy.

Don’t Read this Book if you want Solutions in Life

Cartoonist Randall Munroe shares satirical advice about the world for anyone curious. The creator of the popular webcomic xkcd has come up with solutions to life’s problems. Results may vary.

Credit: Randall Munroe

Let’s say you wanted to find alternative methods to power your house. Given that the average American household spends about $1,000 per year on electricity, you turn to nature for answers. An electric generator driven by the movement of tectonic plates would offer a simple source of natural electricity. If you live on a fault line, you can figure out the force the ground exerts over a distance, then multiply this force by the distance to get the energy. You decide to build a pair of giant pistons connected to the Earth’s crust, as shown above. As the pistons compress a reservoir of fluid between them, the pressure builds up to drive a turbine.

After giving this advice in his book, “How To: Absurd Scientific Advice for Common Real-World Problems,” cartoonist Randall Munroe admits the system would be “ridiculous and technically infeasible for a lot of reasons,” including cost and size. Yet these explanations make science enjoyable and entertaining no matter what your background is. Munroe’s book explores silly solutions to the most mundane problems in life, such as boiling the Kansas River using teakettles so you can cross it, using butterflies to transport data or using liquid nitrogen to create snow when you want to ski. He uses scientific evidence and reasoning to back up his solutions but remains playful in explaining them, no matter how absurd they are. Setting things on fire to generate power and charge your phone can be a lifehack. Or just arson.

A Gift to be Simple

Einstein is often paraphrased as saying, “everything should be made as simple as possible, but no simpler.” Regardless of how close this aphorism was to what Einstein actually said, simplicity is important in conveying information efficiently. Too much simplification can lead to poor representations of the universe. Munroe understands this and runs with it. His writing on simple solutions to life’s problems is friendly, lighthearted and approachable for all audiences. Much like his previous book, “Thing Explainer: Complicated Stuff in Simple Words,” he knows he can engage a broad audience through the simplicity of science. 

Though he seeks to entertain, Munroe remains careful to distinguish seriousness from satire. He treats the reader like an intelligent being capable of understanding these tones and styles of writing. Even when he uses equations to calculate speed, force and other physical quantities, he presents them in a bite-sized, descriptive way that’s easy to digest and follow. You can read the entire book in a single sitting because the explanations flow so naturally in each chapter. Reading the book bit by bit, though, may make you more curious about the world around you as you study Munroe’s explanations closely.

Laughing at Life

Munroe’s book is entertaining in an absurd, surreal way. He treats humans like specimens under a microscope, with enough sarcasm, wit and dry humor to keep you laughing throughout. His humor is largely cultural, a satire of the rest of society that remains self-aware. Making fun of the universe is how you understand it better.

Still, some may find the humor isn’t meant for them. Munroe’s style of explaining can come across as pretentious and condescending. Readers may feel that explaining simple things they already understand only serves to show how smart Munroe is, as though he were an authoritative voice over all scientific knowledge. Others may find the book’s content short and thin even with thirty chapters, and some of the explanations may seem undeveloped. But the book’s personable, tongue-in-cheek nature keeps it free of presumptuous claims about the reader’s intelligence.

The irony that an engineer would write such a treasonous attack on normalcy and established methods of scientific reasoning may put a smug smirk on your face. But the book’s value goes beyond a few chuckles. Munroe’s humor instills curiosity about the world and wonder at how bizarre it can be.

As MythBusters co-host Adam Savage put it, “I reject your reality and substitute my own.” Much of science and engineering comes down to complicated, elaborate interpretations and explanations. You can make friends by physically running into them or jump off a mountain if you want to jump really high. Everything is up to interpretation. Munroe’s wit will help you better understand the craziness of the universe itself. Pick it up and give it a read for the sake of mad science itself. Then dispose of it by shooting it towards the sun.

Raising the Alarm: Rhetoric on Climate Change

Shock! We realize the severe need to protect the rights of individuals displaced by rising sea levels, storms, wildfires, floods and everything else brought on by climate change.

Journalist David Wallace-Wells elucidates the assumptions, contexts, themes and other underlying features behind arguments on the future of Earth in his book “The Uninhabitable Earth: Life After Warming.” As though we were on a highway to Hell, the American journalist says that to avoid the doomsday scenarios of climate change, which span economic and political crises, we need a carbon tax, a way to fight dirty energy, innovative agricultural techniques and overall funding for green energy and for capturing waste carbon dioxide.

A Friendly Warning

Reading Wallace-Wells feels like meeting him for coffee; his writing is accessible and understandable. It lets the reader feel at ease and understood despite the near-alarmist content of the book. Even though much of the book covers material that has already been written about, it sets itself apart through its frank, direct and almost detached, objective perspective, the look Wallace-Wells takes as a journalist. As Aristotle wrote in Rhetoric, Book III, “For it is not enough to know what we ought to say; we must also say it as we ought.” Wallace-Wells provides a stunning re-contextualization of future research, conversations and other features of existence in light of climate change. The reader will feel empowered in her future ways of analyzing climate change rhetoric. The book leaves the reader armed with the ability to formulate and analyze arguments on the nature of moral responsibility and the power to make a difference in the world.

The book also serves as an equalizer between contrary points of view on the issues of climate change. Wallace-Wells’ writing encompasses so many perspectives that it provides an accurate, multidimensional moral landscape of the issues of climate change. This makes the political message more powerful and persuasive in turning heads and changing minds. As Wallace-Wells says, we tend to be complacent even though we’re scared about the future of the Earth. Through comparisons and analogies, he forms predictions of how our actions affect the planet: by 2050, there will be more plastic than fish in the oceans. Even the everyday examples of our actions, such as a flight from London to New York destroying three square meters of Arctic ice, will leave you thinking twice about your role and responsibility in these global issues.

“Oh, the Humanities!”

Wallace-Wells explores many possibilities and options as he formulates his arguments. He draws comparisons from literature, history, philosophy and other disciplines in addition to science-backed conclusions. Through this, Wallace-Wells avoids the pitfalls of reductionism that would come with relying on science alone. Rather than treating climate change as simply a mathematics problem with an optimal solution we must adopt, his approach is much more speculative. In addressing crime, poverty, disease and economic collapse, he humanizes climate refugees and everyone else who shares our planet. He writes in a way that reminds us of the fundamental ideals, values and principles we must protect. The reader may find herself in awe at how the dystopian futures found in works of “climate fiction” (or “cli-fi”) make the truth appear stranger than fiction.

Digging deeper into the language of climate change, Wallace-Wells identifies terminology like “climatic regime” used in discussions of alleviating the effects of climate change. He uses terms including “climate fatalism” and “ecocide” to characterize the debates surrounding these issues. “Human futilitarianism” describes the psychoanalytic nature of climate despair, as writers Sam Kriss and Ellie Mae O’Hagan have said:

The problem, it turns out, is not an overabundance of humans but a death of humanity. Climate change and the Anthropocene are a triumph of an undead species, a mindless shuffle towards extinction, but this is only a lopsided imitation of what we really are. This is why political depression is important: zombies don’t feel sad, and they certainly don’t feel helpless; they just are. Political depression is, at root, the experience of a creature that is being prevented from being itself; for all its crushingness, for all its feebleness, it’s a cry of protest. Yes, political depressives feel as if they don’t know how to be humans; buried in the despair and self-doubt is an important realization. If humanity is the capacity to act meaningfully within our surroundings, then we are not really, or not yet, human.

Either way, the planet won’t grow colder or the planet won’t grow older.