Google ML Jam Course 1: How google does Machine learning?
Module 1&2
- Model: math function
- Supervised Learning: adjust model to match output and label
input, label –> model –> output
- ML have two stages: training and inference
- Both are important for ML be useful.
- Inference takes inputs that model has not seen.
- replace heuristics with model
- starts with heuristics but use ML when have data
Use case questions:
- If the use case was an ML problem….
- What is being predicted?
- What data is needed?
- Now imagine the ML problem is a question of software:
- What is the API for the problem during prediction?
- Inputs & Outputs
- Who will use this service? How are they doing it today?
- What is the API for the problem during prediction?
- Cast it in the framework of a data problem. What are some key actions to collect, analyze, predict, and react to the data/predictions
- What data are we collecting?
- What data are we analyzing?
- What data are we predicting?
- What data are we reacting to?
ML is scaling beyond hand-written rules. Simple model + more data > Complex model + less data
- Don’t jump into ML! Start with manual data analysis to know your data!
Training-serving skew
- Batch data in training v.s. streaming data in prediction
- The pipeline should handle both to get the same input for the model.
Fail fast and iterate!
ML business (Module 3)
- Defining KPI’s
- Collecting data
- Building infra
- Optimization
-
Integration
- Effort Allocation
- Top1: Collecting data: to gather more good-quality data to serve this problem.
- Top2: Building infra: make sure to serve smoothly to users.
- Top3: Integration
- What is the application for?
- Is your ML is improving the real world?
ML value comes along the way.
The generic feed-back loop
st=>start: inputs
e=>end: done
process=>operation: Process
output=>operation: Output
ig=>operation: Insight generation (Big Data)
tune=>operation: Tuning (ML)
cond=>condition: Good enough?
st->process->output->ig->tune->cond
cond(yes)->e
cond(no)->process
- Process:
- Individual: prototyping, trying new ideas.
- Delegation: Gently increase people involved, formalize the process.
- Digitization: Automate repetitive parts. Need IT + ML to both succeed.
- Insight generation (Analytics):
- Measure data and reflect the product to tune it.
- Redefine success.
- Tuning: Automated feedback that is faster than many humans.
Inclusive ML: fairness of ML (Module 4)
- Interactive bias: user interactions may inclined to interact with one over the other.
- Latent bias: More male scientists than female ones during history
-
Selection bias: do you sure data represent everyone?
- Confusion matrix: False Positive v.s. False negative
- Select the appropriate metric according to the problem (Avoid FP or avoid FN is more important?)
- Equality of oppurtunity: equal True positive rate for each groups.
- A data visualization tool: Facets
Python notebooks in the Cloud
- Serverless service:
- Jupyter notebooks: DataLab.
- BigQuery: aggregate big dataset using SQL.
- Google pre-trained APIs:
- vision
- intelligence (for image/video)
- speech
- translation
- natural-language
Thoughts
- It is cool that google shared its procedure of transforming to a AI-first company by going through the five steps.