HIS-Seminar-ML
27.08.2013
In the upcoming winter semester I will teach machine learning in my seminar “Elective Subjects V Current Topics in HIS”. The seminar title is “Big Data, Machine Learning and SVMs”, as we will cover support vector machines (SVMs) as well. The seminar deals with the fascinating area of learning from data, i.e. how to make machines learn from existing data in order to be able to make predictions. This subject is an interesting mix of mathematics (old-school statistics and new research based on ideas such as the Vapnik-Chervonenkis (VC) dimension) and heuristics based on practical engineering experience, i.e. algorithms that “just work”. Machine learning, a branch of artificial intelligence that is sometimes also called statistical learning theory or computational learning theory, is important in many fields, including bioinformatics, medical diagnosis, computer vision, game playing, speech recognition, search engines, investment banking and many more; yes, the (in)famous quants on Wall Street use it, too.
Machine learning is also related to Big Data, a (marketing) term for collections of data sets that are too large and complex to process in the traditional way and hence call for artificial intelligence techniques. Recently, machine learning has also attracted attention because of the NSA (and related) scandals.
The seminar is partially based on the excellent introductory book “Learning from Data: A Short Course” by Yaser S. Abu-Mostafa, Malik Magdon-Ismail and Hsuan-Tien Lin, AMLbook, 2012, ISBN: 1-60049-006-9. You will find it in our library, and one copy will be available as a “Semesterapparat” (course reserve) for everybody to look into and photocopy from. There are excellent video lectures available, too.
Talks cover the following subjects (not necessarily in this order or at this granularity):
- The Learning Problem (Setup and Feasibility)
- Theory of Generalization, including VC dimension
- The Generalization Bound
- Bias and Variance
- The Linear Model(s), Perceptron
- Logistic Regression
- Non-Linear Transformation
- Theory of Support Vector Machines, including connection to VC theory, kernels and implementation aspects
- Overfitting
- Learning Principles and Practical Advice
Depending on the audience we can illustrate some of the ideas and algorithms with practical implementations based on the R language. Relevant packages include class, kknn, and e1071. Perhaps also ElemStatLearn.
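To give a flavour of what such an illustration might look like, here is a minimal sketch of the perceptron learning algorithm from the topic list. It is written in plain Python rather than R purely to keep it self-contained; the function names `predict` and `perceptron_train` and the toy data are made up for this example and are not taken from the book or from any package.

```python
import random


def predict(w, x):
    """Classify a 2D point x with weights w = [bias, w1, w2] as +1 or -1."""
    s = w[0] + w[1] * x[0] + w[2] * x[1]
    return 1 if s > 0 else -1


def perceptron_train(points, labels, max_iters=1000, seed=0):
    """Perceptron learning algorithm for 2D points with labels +1 / -1.

    Repeatedly picks one misclassified point and moves the weights
    towards classifying it correctly. Stops when no point is
    misclassified or after max_iters updates.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]  # bias, weight for x[0], weight for x[1]
    for _ in range(max_iters):
        misclassified = [
            (x, y) for x, y in zip(points, labels) if predict(w, x) != y
        ]
        if not misclassified:
            break  # every training point is classified correctly
        x, y = rng.choice(misclassified)
        # Update rule: w <- w + y * (1, x1, x2)
        w[0] += y
        w[1] += y * x[0]
        w[2] += y * x[1]
    return w


# Hypothetical, linearly separable toy data (invented for this sketch).
POINTS = [(2, 3), (3, 3), (3, 4), (-1, -2), (-2, -1), (-3, -3)]
LABELS = [1, 1, 1, -1, -1, -1]

weights = perceptron_train(POINTS, LABELS)
predictions = [predict(weights, x) for x in POINTS]
print(weights, predictions)
```

On linearly separable data like this the loop terminates with all training points classified correctly; on non-separable data it simply stops after `max_iters` updates, which is one of the limitations we will discuss in the seminar.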
Please note that although this is a course for the third semester in HIS, everybody with an interest in the subject and no fear of elegant mathematics (you will get help!) is invited to participate. Those of you who want to receive credits for their participation (a talk and a written elaboration are required) have to register in the HIS-System; please ask the PA for details. Should you need more information, do not hesitate to contact me.
Update: We can and will use WLAN fingerprinting data from my research project MoCa at FH-Frankfurt to illustrate ML algorithms. This will make the seminar both more practical and more interesting. I am also open to using any other (real) data as an example application.