Data mining with Machine Learning for the social sciences

Introduction, Challenges, the right & the wrong, Misunderstanding

Priv.-Doz. Dr. Stefan Bosse
University of Koblenz-Landau, Fac. Computer Science
University of Bremen, Dept. Mathematics & Informatics
18.5.2018

Content

Introduction to Artificial Intelligence


Artificial Intelligence

In the social sciences, large volumes of data must be handled.
But big does not mean helpful or important!
Data is often noisy and uncertain!

  • One major task in data science is the derivation of fundamental mapping functions:

F(Input Data): Input Data → Output Data

F(Sensor Data): Sensor Data → Knowledge

  • Such a function F performs Feature Extraction

  • But often there are no or only partial numerical/mathematical models that can implement F!

Artificial Intelligence

  • Artificial Intelligence methods can help to derive such fundamental mapping functions, or at least an approximation of them: a Hypothesis

  • The input data is characterized commonly by a high dimensionality consisting of a vector of variables

[x1,x2,..,xn],

  • whereby the output data (information) has a much lower dimensionality (data reduction!) consisting of the variable vector

[y1,y2,..,ym]

  • This means:

\[F: \mathbb{R}^N \rightarrow \mathbb{R}^M \quad \text{with} \quad M \ll N\]

  • Data reduction includes the pre-selection of suitable data variables (those with high information entropy) → Feature Selection
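
The pre-selection of high-entropy variables can be sketched as follows. This is a minimal illustration, not a definitive method: the function names `shannon_entropy` and `select_features` and the toy data are assumptions introduced here, and ranking variables by their own Shannon entropy is only one simple reading of "high information entropy".

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_features(rows, k):
    """Return the (sorted) indices of the k columns with the highest entropy."""
    n_columns = len(rows[0])
    entropies = [shannon_entropy([row[j] for row in rows]) for j in range(n_columns)]
    ranked = sorted(range(n_columns), key=lambda j: entropies[j], reverse=True)
    return sorted(ranked[:k])

# Hypothetical toy data: 4 samples, 3 variables.
# Column 0 is constant and therefore carries no information.
rows = [
    (0, "a", 1),
    (0, "b", 1),
    (0, "a", 2),
    (0, "b", 3),
]
selected = select_features(rows, 2)  # keeps columns 1 and 2, drops column 0
```

The constant column has zero entropy and is discarded, which reduces the input dimension from N = 3 to 2 before any learning takes place.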

Artificial Intelligence

Fig. 1. A typical Artificial Intelligence system [3]

Machine Learning - Technical Sciences

  • Often there is no known functional relation between two variables x and y.
    In technical applications, x can be a camera image with 1 million pixels and y a digit from the set {0,1,2,..,9} that represents a handwritten character. Generally:

f(x): x → y

  • Machine Learning (ML) can be used to derive such a relation from experimental/empirical training data!

  • Besides the derivation of such functional relations, predicting what will happen next or in the future is an important task of Machine Learning

Machine Learning - The Functional Approach

  • Machine learning means deriving a hypothesis of a simple input-output function from training data provided by humans (statistical data!)

Fig. 2. A hypothesis of an input-output model function derived from training data
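
To make the idea of a hypothesis concrete, the sketch below derives a simple input-output function from a handful of training pairs. The slides do not name a specific learning method; ordinary least-squares fitting of a straight line is assumed here purely for illustration, and the names `fit_line` and `hypothesis` as well as the training values are made up.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b to the training pairs (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical training data that happens to follow y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)

def hypothesis(x):
    """The learned input-output function h(x), an approximation of the unknown f(x)."""
    return a * x + b

prediction = hypothesis(4.0)  # prediction for an input not in the training data
```

The learned function is only a hypothesis: it reproduces the training data and is then used to predict outputs for inputs that were never observed.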

Machine Learning - Medicine

Diagnosis of appendicitis from medical and personal data

Input Data x

Patient Details [weight,age,sex,pain left, pain right, temperature, ..]

Output Data y

Diagnosis Label {Appendicitis, Dyspepsia, Unknown, .. }

Decision Learner

Returns one of the labels matching a new input vector x (the test object)
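
A decision learner of this kind can be sketched with a 1-nearest-neighbour rule. This is an assumption chosen for brevity, not the method used in the slides; the function `nearest_label`, the reduced feature vector [pain left, pain right, temperature], and all numbers are hypothetical and have no clinical meaning.

```python
import math

def nearest_label(training, x):
    """Return the label of the training vector closest to x (1-nearest neighbour)."""
    _, best_label = min(training, key=lambda pair: math.dist(pair[0], x))
    return best_label

# Hypothetical training data: ([pain left, pain right, temperature], label).
training = [
    ((0, 1, 38.5), "Appendicitis"),
    ((0, 1, 39.0), "Appendicitis"),
    ((1, 0, 36.8), "Dyspepsia"),
    ((0, 0, 36.9), "Dyspepsia"),
]

# The test object: a new patient vector x that is not part of the training data.
label = nearest_label(training, (0, 1, 38.7))
```

The learner always returns exactly one label, even when the test object is far away from every training example.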

Machine Learning - Medicine

  • Decision classifiers only return one (good or bad) matching label

  • No information about matching probability

Probabilistic Learner (Bayes' Theorem)
Feature: a probability forecast that estimates the conditional probability of the best-matching label (or of all labels) given an observed object x
\[P(y|x) = \frac{P(x|y)P(y)}{P(x)}
\]
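
As a worked example of the formula above, the following sketch computes P(y|x) for every label, with P(x) obtained by summing P(x|y)P(y) over all labels. The function name `posterior` and the prior and likelihood values are invented for illustration only.

```python
def posterior(prior, likelihood):
    """Apply Bayes' theorem: P(y|x) = P(x|y) P(y) / P(x) for every label y."""
    # P(x) via the law of total probability over all labels.
    evidence = sum(likelihood[y] * prior[y] for y in prior)
    return {y: likelihood[y] * prior[y] / evidence for y in prior}

# Hypothetical numbers: P(y) and P(x|y) for one observed object x.
prior = {"Appendicitis": 0.2, "Dyspepsia": 0.8}
likelihood = {"Appendicitis": 0.9, "Dyspepsia": 0.1}

post = posterior(prior, likelihood)
best = max(post, key=post.get)  # label with the highest posterior probability
```

With these numbers P(x) = 0.9 · 0.2 + 0.1 · 0.8 = 0.26, so P(Appendicitis|x) = 0.18 / 0.26 ≈ 0.69 and P(Dyspepsia|x) ≈ 0.31. Unlike the decision learner, the result states how strongly each label is supported.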