### Kinds of learning:

Based on the information available: Supervised learning, Reinforcement learning, Unsupervised learning

Based on the role of the learner: Passive learning, **Active learning**

Scenarios:

- Membership Query Synthesis: the learner may request labels for any unlabeled instance in the input space
- Stream-Based Selective Sampling: each unlabeled instance is first drawn from the actual distribution, one at a time, and the learner then decides whether or not to request its label
- Pool-based Sampling: instances are queried in a greedy fashion, according to an informativeness measure used to evaluate all instances in the pool
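A minimal pool-based sampling sketch, with the pool and the model's probabilities simulated at random (all names and numbers here are illustrative, not from the notes); the informativeness measure is "least confidence", i.e. how close the predicted class-1 probability is to 0.5:

```python
import numpy as np

# Hypothetical pool of unlabeled instances and a model's class-1
# probability for each one (both randomly simulated for illustration).
rng = np.random.default_rng(0)
pool = rng.normal(size=(100, 5))   # 100 unlabeled instances, 5 features
proba = rng.uniform(size=100)      # model's P(y=1 | x) for each instance

# Pool-based sampling: evaluate an informativeness measure on every
# instance in the pool and greedily query the most informative one.
# Here informativeness = closeness of P(y=1 | x) to 0.5.
informativeness = -np.abs(proba - 0.5)
query_idx = int(np.argmax(informativeness))   # instance to send to the oracle
```

In a real loop, the oracle labels `pool[query_idx]`, the instance moves from the pool to the labeled set, and the model is retrained before the next query.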

Query strategy:

- Uncertainty sampling: label those points for which the current model is least certain as to what the correct output should be
- Query by committee: a variety of models are trained on the current labeled data, and vote on the output for unlabeled data; label those points for which the “committee” disagrees the most
- Expected model change: label those points that would most change the current model; the learner queries the instance x which, if labeled and added to the labeled set L, would produce the new training gradient of largest magnitude
- Expected error reduction: label those points that would most reduce the model’s generalization error
- Variance reduction: label those points that would minimize output variance, which is one of the components of error
- Balance exploration and exploitation: the choice of examples to label is treated as a dilemma between exploration and exploitation over the data space representation. This strategy manages the compromise by modelling the active learning problem as a contextual bandit problem. For example, Bouneffouf et al. propose a sequential algorithm named Active Thompson Sampling (ATS) which, in each round, assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for that point's label.
- Exponentiated gradient exploration for active learning: a sequential algorithm named exponentiated gradient (EG)-active that can improve any active learning algorithm by an optimal random exploration.
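A sketch of query-by-committee using vote entropy as the disagreement measure; the committee's predictions are simulated at random here (sizes and names are illustrative):

```python
import numpy as np

# Query-by-committee: several models trained on the current labeled data
# vote on each unlabeled instance; query where the committee disagrees most.
rng = np.random.default_rng(1)
n_pool, n_members, n_classes = 50, 5, 3
# Simulated committee votes: each row is one member's predicted class
# for every instance in the pool (illustrative stand-in for real models).
votes = rng.integers(0, n_classes, size=(n_members, n_pool))

def vote_entropy(column):
    """Entropy of the vote distribution for one instance."""
    freqs = np.bincount(column, minlength=n_classes) / n_members
    nz = freqs[freqs > 0]
    return -np.sum(nz * np.log(nz))

entropies = np.array([vote_entropy(votes[:, i]) for i in range(n_pool)])
query_idx = int(np.argmax(entropies))   # maximal disagreement -> query this one
```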

Bayesian models for active learning: ask about inputs for which the uncertainty in the value of the function is very high

Active Feature Acquisition and Classification: impute missing feature values

Reinforcement learning+active learning: exploration-exploitation

Multi-task learning + active learning: alternating selection, rank combination; Bayesian approach (the mutual information among labels)

Multi-task Learning + deep learning: optimizing more than one loss function

Why it works (improves generalization; tasks help each other): implicit data augmentation, attention focusing, eavesdropping, representation bias, regularization

Block-sparse regularization (all models share a small set of features)
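The block-sparse (ℓ2,1) penalty can be applied through its proximal operator, which soft-thresholds whole rows of the task weight matrix so that all tasks keep or drop a feature together. A minimal sketch (function name and numbers are illustrative):

```python
import numpy as np

# Block-sparse (l2,1) regularization sketch: W has one row per feature
# and one column per task. The penalty sums the l2 norms of the rows,
# zeroing entire rows so all tasks share a small set of features.
def group_soft_threshold(W, lam):
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(row_norms, 1e-12))
    return scale * W   # rows with norm <= lam become exactly zero

W = np.array([[0.1, -0.1],    # weak feature: dropped for all tasks
              [2.0,  1.5]])   # strong shared feature: kept, shrunk
W_sparse = group_soft_threshold(W, lam=0.5)
```

Row 0 has norm ≈ 0.14 < 0.5, so it is zeroed for both tasks; row 1 is merely shrunk, illustrating how the penalty forces feature sharing.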

Learning task relationships: clustering; pulling task models toward a mean model (Bayesian: placing a prior on the model parameters to encourage similar parameters across tasks)

hard parameter sharing, soft parameter sharing
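Hard parameter sharing, the most common of the two, can be sketched as a single shared representation layer feeding separate task heads (shapes, names, and random weights below are purely illustrative):

```python
import numpy as np

# Hard parameter sharing sketch: both tasks use the same hidden-layer
# weights; only the output heads are task-specific.
rng = np.random.default_rng(2)
W_shared = rng.normal(size=(8, 4))   # shared representation layer
W_task_a = rng.normal(size=(4, 1))   # task-A head (e.g. regression)
W_task_b = rng.normal(size=(4, 3))   # task-B head (e.g. 3-way classification)

def forward(x, W_head):
    h = np.tanh(x @ W_shared)        # shared features for every task
    return h @ W_head                # task-specific output

x = rng.normal(size=(5, 8))          # a batch of 5 inputs
out_a = forward(x, W_task_a)
out_b = forward(x, W_task_b)
```

Training would sum both task losses and backpropagate through `W_shared`, which is where the regularization effect of multi-task learning comes from; soft sharing would instead give each task its own trunk and penalize the distance between the trunks' parameters.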

Deep relationship networks (still rely on a pre-defined structure for sharing); finding better task hierarchies: learning which layers and subspaces should be shared, as well as at which layers the network has learned the best representations of the input sequences.

Auxiliary task

Limited amount of data: we try to store the knowledge gained in solving the source task in the source domain and apply it to our problem of interest, the target task in the target domain

Learning from simulation, using pre-trained CNN features, learning the underlying structure of images (generative model), learning domain-invariant representations (stacked denoising autoencoder)

### Reinforcement Learning:

Explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment

Exploration-exploitation trade-off (how to select action):

- ε-greedy action selection
- Softmax action selection
- Optimistic initialization
- Confidence Intervals
- UCB (upper confidence bounds)
- Thompson sampling (compute a posterior distribution over the reward function given the observed data)
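A minimal ε-greedy sketch for a k-armed bandit, the simplest of the selection rules above (the value estimates and ε below are illustrative):

```python
import numpy as np

# epsilon-greedy action selection: with probability epsilon pick a
# random action (explore), otherwise pick the current best (exploit).
rng = np.random.default_rng(3)
q_estimates = np.array([0.1, 0.5, 0.3])   # current action-value estimates
epsilon = 0.1

def select_action():
    if rng.random() < epsilon:                      # explore
        return int(rng.integers(len(q_estimates)))
    return int(np.argmax(q_estimates))              # exploit

actions = [select_action() for _ in range(1000)]
# The greedy action (index 1) is chosen roughly 90% of the time.
```

In a full bandit loop, `q_estimates` would be updated incrementally from the observed rewards after each pull.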

Value function: the expected return when the agent starts from state s and picks actions according to policy π

Bellman equations

Iterative policy evaluation

Policy iteration
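Iterative policy evaluation can be sketched in a few lines on a toy 2-state MDP (transition probabilities and rewards below are made up for illustration): repeatedly apply the Bellman expectation backup V(s) ← R(s) + γ·Σ_s' P(s'|s)·V(s') for a fixed policy until the values stop changing.

```python
import numpy as np

# Toy 2-state MDP under a fixed policy (illustrative numbers).
P = np.array([[0.9, 0.1],    # P(s' | s) under the policy
              [0.2, 0.8]])
R = np.array([1.0, 0.0])     # expected immediate reward in each state
gamma = 0.9                  # discount factor

# Iterative policy evaluation: sweep the Bellman expectation backup
# until the value function converges to the policy's fixed point.
V = np.zeros(2)
for _ in range(1000):
    V_new = R + gamma * (P @ V)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
```

Policy iteration alternates this evaluation step with greedy policy improvement with respect to the resulting V.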