ML and AI collection
MapReduce, Spark and Kafka relationship
MapReduce: my interpretation
Kafka uses two different constraints to determine how long messages are retained.
- Age: the following configuration parameter in the server.properties file.
  # The minimum age of a log file to be eligible for deletion due to age. 168 hr = 7 days
  log.retention.hours=168
- Size: the disk space available for storage.
Kafka should behave like a queue (time-based retention) rather than a ring buffer (bounded by the size of the disk space).
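The two constraints above can be sketched as a small simulation. This is a simplified model under stated assumptions, not Kafka's actual cleanup logic: the Segment class and the segments_to_delete function are hypothetical names, and the size limit is modeled on the log.retention.bytes broker setting (where -1 means no limit). The exact rule Kafka applies at segment boundaries may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical, simplified stand-in for one Kafka log segment."""
    name: str
    age_hours: float  # time since the segment was last written
    size_bytes: int


def segments_to_delete(segments: List[Segment],
                       retention_hours: float = 168,
                       retention_bytes: int = -1) -> List[str]:
    """Return names of segments eligible for deletion.

    `segments` must be ordered oldest first.
    """
    doomed = []
    kept = []
    # Age rule (queue-like): anything older than the retention period goes.
    for seg in segments:
        if seg.age_hours > retention_hours:
            doomed.append(seg.name)
        else:
            kept.append(seg)
    # Size rule (ring-like): drop the oldest survivors until the log fits.
    # A negative limit disables this rule.
    if retention_bytes >= 0:
        total = sum(seg.size_bytes for seg in kept)
        for seg in kept:
            if total <= retention_bytes:
                break
            doomed.append(seg.name)
            total -= seg.size_bytes
    return doomed


if __name__ == "__main__":
    log = [Segment("seg-0", 200, 100), Segment("seg-1", 100, 100), Segment("seg-2", 10, 100)]
    print(segments_to_delete(log))                       # age rule only
    print(segments_to_delete(log, retention_bytes=150))  # age rule plus size rule
```

With only the age rule active, data disappears on a predictable schedule, which is the queue-like behavior preferred above. Once the size rule is active, how long a message survives depends on how fast new data arrives.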
https://anaconda.cloud/tutorials (demo video)
- Anaconda: Download Individual Edition
- Data science requirement: different packages need different versions of their dependencies
- More than 200 pre-installed packages covering analysis, visualization, and modeling. Anaconda packages are ready to use and easy to upgrade.
- Environments > list of installed packages
- Conda package manager: automatically installs dependencies (conda-forge) with correctly matched versions
- Keep multiple versions installed (as separate Python installations)
- Anaconda manages 7,500+ data science and ML packages
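As a rough counterpart to the installed-packages list mentioned above, the following sketch lists the Python distributions visible to the current interpreter using only the standard library. It is an approximation under an assumption: conda also tracks non-Python packages, which this does not see. The function name installed_packages is illustrative.

```python
from importlib import metadata


def installed_packages() -> dict:
    """Return {name: version} for distributions visible to this interpreter."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with missing or broken metadata
            packages[name] = dist.version
    return packages


if __name__ == "__main__":
    # Print one "name==version" line per package, sorted case-insensitively.
    for name, version in sorted(installed_packages().items(), key=lambda kv: kv[0].lower()):
        print(f"{name}=={version}")
```

Running the script from two different environments shows how each one keeps its own set of package versions.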
Jupyter Notebook & JupyterHub
- Set up JupyterHub on CentOS 7 (32 GB RAM, i5 CPU, 1 TB SSD): internal 11.7
- Set up JupyterHub on Windows 10
- Set up JupyterHub on Raspberry Pi 4B (32-bit)
- Just replace x * tf.sigmoid(x) with
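The expression x * tf.sigmoid(x) in the note above multiplies an input by its logistic sigmoid, an activation function commonly called Swish (or SiLU). Below is a minimal plain-Python sketch of the same computation for a single float. It does not use TensorFlow, and the function names are illustrative only.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def swish(x: float) -> float:
    """Compute x * sigmoid(x), the scalar analogue of x * tf.sigmoid(x)."""
    return x * sigmoid(x)


if __name__ == "__main__":
    for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(value, swish(value))
```

For large positive inputs the result approaches x itself, and for large negative inputs it approaches zero, so the function behaves like a smooth version of ReLU.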
Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San Francisco-based artificial intelligence research laboratory. GPT-3's full version has a capacity of 175 billion machine learning parameters. GPT-3, which was introduced in May 2020 and was in beta testing as of July 2020, is part of a trend in natural language processing (NLP) systems toward pre-trained language representations.
- YouTube: An Interview with GPT-3 (16-minute video)
- 21 OpenAI GPT-3 Demos and Examples to Convince You that AI Threat is Real, or is it?
- OpenAI GPT-3 Pricing Revealed – Bad News for Hobbyists
OpenAI GPT-3 Pricing Tiers
Tokens are the smaller units that a text is broken down into, usually words or characters.
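To make the definition concrete, here is a minimal sketch that splits one sentence into word-level and character-level tokens. It is an illustration only: the GPT-3 tokenizer uses byte-pair-encoded subword units, so the token counts used for billing will differ from both of these. The function names are hypothetical.

```python
from typing import List


def word_tokens(text: str) -> List[str]:
    """Split on whitespace; punctuation stays attached to its word."""
    return text.split()


def char_tokens(text: str) -> List[str]:
    """Treat every character, including spaces, as one token."""
    return list(text)


if __name__ == "__main__":
    sample = "GPT-3 produces human-like text."
    print(len(word_tokens(sample)), word_tokens(sample))
    print(len(char_tokens(sample)))
```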
Comment: Too expensive for hobbyists, but it makes sense, since the service has big potential for running a real business. If it were too cheap or free, someone would take advantage of the hobbyist offer.
- Explore: Free Tier
  - 100K tokens or 3 months free trial, whichever comes first.
- Create: $100/month
  - 2 million tokens, plus 8 cents for every extra 1,000 tokens.
- Build: $400/month
  - 10 million tokens, plus 6 cents for every extra 1,000 tokens.
- Scale: Custom Pricing
  - Contact OpenAI for pricing details
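The arithmetic behind the two paid tiers above can be sketched as follows. The figures are the ones listed in these notes. The Tier class and monthly_cost function are hypothetical names, and the sketch assumes that overage is charged proportionally per token rather than rounded up to whole blocks of 1,000, which the notes do not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_fee: float     # USD per month
    included_tokens: int   # tokens covered by the monthly fee
    overage_per_1k: float  # USD for every extra 1,000 tokens


# Values taken from the tier list above.
CREATE = Tier("Create", 100.0, 2_000_000, 0.08)
BUILD = Tier("Build", 400.0, 10_000_000, 0.06)


def monthly_cost(tier: Tier, tokens_used: int) -> float:
    """Monthly fee plus overage (assumed to be pro-rated per token)."""
    extra = max(0, tokens_used - tier.included_tokens)
    return tier.monthly_fee + extra / 1000 * tier.overage_per_1k


if __name__ == "__main__":
    for used in (1_000_000, 3_000_000, 5_750_000, 12_000_000):
        print(used, monthly_cost(CREATE, used), monthly_cost(BUILD, used))
```

Under these assumptions, the Create tier with overage reaches the Build tier's $400 fee at 5.75 million tokens per month, so Build is the cheaper choice above that volume.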
Reference & Resource
- 3Blue1Brown web page
- YouTube 3Blue1Brown channel
- Top 5 Machine Learning Libraries
- TensorFlow - from Google, Nov 2015
- Theano - from the University of Montreal
- PyTorch - from Facebook
- Scikit-learn - built on NumPy, SciPy, and matplotlib.
- Keras - on top of TensorFlow, MS Cognitive Toolkit, Theano, or PlaidML
- Apple Create ML
Bloomberg ML course
- 30 videos (~1 hour each) and 7 homework assignments
Copied Yan's CMEE4K setup to our JupyterHub (Ubuntu 20.10, 4 GB RAM, Core 2 CPU: internal 11.185)
- JupyterHub interface https://www.cmee4k.com:8443/user/simon/tree
- JupyterLab interface https://www.cmee4k.com:8443/user/simon/lab