## Learning With Large Datasets

Before investing effort in collecting more data, draw a learning curve (training error and cross-validation error as a function of training-set size) to determine whether more data is actually likely to help.
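A minimal sketch of this check, where `train_model` and `compute_error` are hypothetical callables you supply for your own model:

```python
# Sketch: compute training and validation error as the training-set
# size m grows. If both errors converge to a similar value, more data
# is unlikely to help (high bias); a persistent gap suggests more data
# may help (high variance).
import numpy as np

def learning_curve(X_train, y_train, X_val, y_val, train_model, compute_error):
    train_errors, val_errors = [], []
    sizes = range(1, len(y_train) + 1)
    for m in sizes:
        model = train_model(X_train[:m], y_train[:m])   # fit on first m examples
        train_errors.append(compute_error(model, X_train[:m], y_train[:m]))
        val_errors.append(compute_error(model, X_val, y_val))
    return list(sizes), train_errors, val_errors
```

Plotting `train_errors` and `val_errors` against `sizes` gives the learning curve.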

## Stochastic Gradient Descent

**Cost function in SGD:** \(cost(\theta, (x^{(i)}, y^{(i)})) = \frac {1}{2}\big(h_{\theta}(x^{(i)})-y^{(i)}\big)^2\)

- Randomly shuffle the data set.
- Take a small gradient descent step using just one single training example at a time.
- Individual steps may head in a bad direction, but on average the parameters move toward the global minimum (though not on every step).
- Rather than converging, SGD ends up wandering around continuously in some region close to the global minimum.
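The steps above can be sketched for linear regression as follows; the function name, fixed learning rate, and epoch count are illustrative assumptions:

```python
# Minimal sketch of stochastic gradient descent for linear regression.
import numpy as np

def sgd(X, y, alpha=0.01, epochs=50, seed=0):
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        order = rng.permutation(m)          # 1. randomly shuffle the data set
        for i in order:                     # 2. one example per update
            error = X[i] @ theta - y[i]     # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]   # step on this single example's gradient
    return theta
```

With an intercept column in `X`, repeated passes drive `theta` close to (but wandering around) the least-squares solution.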

## Mini-Batch Gradient Descent

In **batch gradient descent** we use **all m examples** in each iteration.

In **stochastic gradient descent** we use a **single example** in each iteration.

**Mini-batch gradient descent** is somewhere **in between**: each iteration uses a small batch of **b examples** (e.g. b = 10).
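A sketch in the same linear-regression setting, with illustrative names and defaults; each update averages the gradient over `b` examples (b = 1 recovers SGD, b = m recovers batch gradient descent):

```python
# Minimal sketch of mini-batch gradient descent for linear regression.
import numpy as np

def minibatch_gd(X, y, alpha=0.1, b=10, epochs=50, seed=0):
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        order = rng.permutation(m)
        for start in range(0, m, b):
            idx = order[start:start + b]     # next mini-batch of up to b examples
            grad = X[idx].T @ (X[idx] @ theta - y[idx]) / len(idx)
            theta -= alpha * grad            # step on the averaged batch gradient
    return theta
```

The batched matrix products also let a vectorized library do more work per update than one-example-at-a-time SGD.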

## Stochastic Gradient Descent Convergence

During learning, we can **compute the cost on each example just before updating \(\theta\), and average it over the last 1000 examples or so**. Plotting these averages lets us check that stochastic gradient descent is converging, and also helps us tune the learning rate \(\alpha\). If we want the parameters to actually converge rather than wander near the minimum, we can slowly decrease \(\alpha\) over time:

\(\alpha = \frac {const1}{iterationNumber + const2}\)
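A sketch combining both ideas, again for linear regression; `const1`, `const2`, and the window size are tuning parameters you choose:

```python
# SGD with a decaying learning rate and convergence monitoring:
# record cost(theta, (x_i, y_i)) before each update, and average
# over the last `window` examples.
import numpy as np

def sgd_with_monitoring(X, y, const1=1.0, const2=50.0, window=1000, seed=0):
    m, n = X.shape
    theta = np.zeros(n)
    recent_costs, averages = [], []
    it = 0
    for i in np.random.default_rng(seed).permutation(m):   # single pass, shuffled
        alpha = const1 / (it + const2)                     # slowly decreasing alpha
        error = X[i] @ theta - y[i]
        recent_costs.append(0.5 * error ** 2)              # cost before the update
        theta -= alpha * error * X[i]
        it += 1
        if it % window == 0:
            averages.append(float(np.mean(recent_costs)))  # avg over last `window`
            recent_costs = []
    return theta, averages
```

If the plotted `averages` trend downward and level off, the algorithm is converging; if they diverge, \(\alpha\) (here `const1`) is too large.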

## Online Learning

The online learning setting allows us to model problems where we have **a continuous flood or a continuous stream of data coming in** and we would like an algorithm to learn from that.

We learn from each example as it arrives **and then throw that example away**; we never store a fixed training set.

If you really have a continuous stream of data, then an online learning algorithm can be very effective.

If you have **a changing pool of users**, or if the things you're trying to **predict are slowly changing** (for example, user tastes drift over time), an online learning algorithm can slowly adapt the learned hypothesis to whatever the latest user behavior looks like.
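A sketch of the online setting, assuming a logistic-regression-style update; the stream, feature layout, and function names are illustrative assumptions:

```python
# Online learning: take one gradient step per example from a stream,
# then discard the example.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(theta, x, y, alpha=0.1):
    # one logistic-regression gradient step on the single (x, y) just seen
    return theta - alpha * (sigmoid(x @ theta) - y) * x

def learn_from_stream(stream, n_features, alpha=0.1):
    theta = np.zeros(n_features)
    for x, y in stream:                      # continuous stream of (features, label)
        theta = online_update(theta, x, y, alpha)
        # (x, y) is now discarded; nothing is stored
    return theta
```

Because each example is used once and dropped, the parameters naturally track a slowly drifting data distribution.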

## Map Reduce and Data Parallelism

The MapReduce idea is to split the training set into different subsets, let many different machines compute over their subsets in parallel, and then combine the results on a central server.

- a single multi-core machine (parallelize across cores)
- multiple machines (parallelize across a cluster)
- vectorized numerical linear algebra libraries (which may parallelize automatically)
- frameworks like Hadoop
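A toy sketch of one batch gradient descent step under this split, simulating the machines with array slices; the names are illustrative, not a real Hadoop job:

```python
# MapReduce-style batch gradient step for linear regression:
# each "machine" computes a partial gradient sum over its subset (map),
# and a central server adds the partial sums (reduce).
import numpy as np

def partial_gradient(theta, X_part, y_part):
    # map step: gradient sum over this machine's subset of examples
    return X_part.T @ (X_part @ theta - y_part)

def mapreduce_gradient_step(theta, X, y, alpha, n_machines=4):
    splits = np.array_split(np.arange(len(y)), n_machines)
    partials = [partial_gradient(theta, X[idx], y[idx]) for idx in splits]  # map
    total = sum(partials)                                                   # reduce
    return theta - alpha * total / len(y)
```

Because the gradient is a sum over examples, the split-and-recombine step produces exactly the same update as computing it on one machine, just roughly `n_machines` times faster.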