Tales of Science & Data
Databases and distributed frameworks
Some sparse notes on the tools used to productionise data work, and on the databases used to store data.
Contents
Apache Hadoop
Apache Spark
Elasticsearch