Abstract
Data Science is a technical discipline that combines statistical concepts with computer algorithms and computation to process and model large volumes of data derived from observed phenomena (economic, industrial, commercial, financial, managerial, social, etc.). In Business Intelligence, Data Science has become an indispensable decision-support tool for company managers because it allows them to exploit and add value to the company's internal and external information assets. In recent years, Python has rapidly become one of the programming languages most used by data scientists to exploit the growing potential of Big Data. Its current popularity is largely explained by the many possibilities offered by its powerful libraries, including those for numerical analysis and scientific computing (numpy, scipy, pandas), data visualization (matplotlib), and machine learning (scikit-learn). Written with a pedagogical approach, this manuscript revisits the concepts essential for mastering Data Science with Python. The work is organized into seven chapters. The first chapter presents the basics of programming in Python. The second chapter covers strings and regular expressions; its aim is to familiarize the reader with processing and using string values, which are the variable values commonly found in unstructured databases. The third chapter presents methods for file management and text processing; it builds on the previous chapter by covering the methods commonly used to process unstructured data, which generally take the form of text files. The fourth chapter presents methods for processing and organizing data originally stored as data tables.
The fifth chapter presents classical statistical analysis methods (descriptive analyses, statistical tests, linear and logistic regression, ...). The sixth chapter presents data visualization methods (histograms, bar charts, pie charts, box plots, scatter plots, trend curves, 3D graphs, ...). Finally, the seventh chapter presents data mining and machine learning methods, including dimensionality reduction methods (Principal Component Analysis, Factor Analysis, Multiple Correspondence Analysis) as well as classification methods (Hierarchical Classification, K-Means Clustering, Support Vector Machine, Random Forest).
Suggested Citation
Keita, Moussa, 2017.
"Data Science sous Python: Algorithme, Statistique, DataViz, DataMining et Machine-Learning [Data Science with Python: Algorithm, Statistics, DataViz, DataMining and Machine-Learning],"
MPRA Paper
76653, University Library of Munich, Germany.
Handle:
RePEc:pra:mprapa:76653