ML and AI collection

From CMEE4K

Topics

MapReduce, Spark and Kafka relationship

MapReduce: my interpretation
Kafka has two different constraints that determine the message retention period:

  • A time limit, set in the server.properties file by the following configuration parameter:
# The minimum age of a log file to be eligible for deletion due to age.  168 hr = 7 days
log.retention.hours=168
  • The size of the available storage disk space.

Kafka should behave like a queue (time-based retention) rather than a ring buffer (bounded by the size of the disk space).
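The difference between the two constraints can be sketched with a toy model. This is a simplified illustration, not Kafka's implementation: the `Segment` class and both helper functions are hypothetical, timestamps are plain hours, and real Kafka also never deletes the active (currently written) segment. Time-based retention behaves like a queue (data leave only when they age out); size-based retention behaves like a ring (the oldest data are dropped to make room).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for one Kafka log segment (simplified model)."""
    base_offset: int               # first message offset in the segment
    newest_timestamp_hours: float  # time of the newest message in the segment
    size_bytes: int


def apply_time_retention(segments: List[Segment], now_hours: float,
                         retention_hours: float = 168) -> List[Segment]:
    """Queue-like: keep only segments whose newest message is within retention_hours.

    168 hours mirrors the log.retention.hours=168 default shown above.
    """
    return [s for s in segments
            if now_hours - s.newest_timestamp_hours <= retention_hours]


def apply_size_retention(segments: List[Segment], max_bytes: int) -> List[Segment]:
    """Ring-like: drop the oldest segments until the total size fits max_bytes."""
    kept = sorted(segments, key=lambda s: s.base_offset)  # oldest first
    total = sum(s.size_bytes for s in kept)
    while kept and total > max_bytes:
        total -= kept.pop(0).size_bytes  # evict the oldest segment
    return kept


if __name__ == "__main__":
    log = [Segment(0, 0.0, 500), Segment(100, 100.0, 500), Segment(200, 200.0, 500)]
    print([s.base_offset for s in apply_time_retention(log, now_hours=250.0)])
    print([s.base_offset for s in apply_size_retention(log, max_bytes=1000)])
```

In this toy run both policies happen to drop the segment at offset 0, but for different reasons: under time retention it is older than 168 hours, while under size retention it is simply the oldest data when space runs out, regardless of age.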

MapReduce MyUnderstanding.png, Spark Kafka.pdf

Anaconda

https://anaconda.cloud/tutorials (demo video)

  • Anaconda: Download Individual Edition
  • Data science requirement: packages differ in the versions they need
  • Comes with 200+ pre-installed packages covering analysis, visualization, and modeling. Anaconda packages are ready to use and easy to upgrade.
  • Command:
    • anaconda-navigator
      • The Environments tab > list of installed packages
      • Conda package manager: automatically installs dependencies (conda-forge) with correctly matched versions
      • Keep multiple versions installed (as separate Python installations)
      • Anaconda manages 7,500+ data science and ML packages
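A rough Python counterpart to the "list of installed packages" view is a small helper built on the standard library's `importlib.metadata`. This is a sketch under assumptions: the helper names are hypothetical, and it only sees Python distributions visible to the current interpreter, so non-Python conda packages (for example, C libraries) will not appear; `conda list` shows the full environment.

```python
from importlib import metadata
from typing import List, Optional, Tuple


def installed_packages() -> List[Tuple[str, str]]:
    """Return sorted (name, version) pairs for Python distributions in this environment."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            pairs.add((name, dist.version))
    return sorted(pairs, key=lambda p: p[0].lower())


def version_or_none(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name, version in installed_packages()[:10]:
        print(f"{name}=={version}")
    print("numpy:", version_or_none("numpy"))
```

Checking versions this way is one quick way to spot the package version differences mentioned above before switching environments.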



Jupyter Notebook & JupyterHub

Installation



Google Swish

Swish: Booting ReLU from the Activation Function Throne

  • Just replace x * tf.sigmoid(x) with tf.nn.swish(x)
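Numerically, Swish is just x · sigmoid(x), which is what the one-line replacement above computes. A minimal pure-Python sketch, assuming scalar inputs; the optional `beta` follows the generalized form x · sigmoid(βx) from the Swish paper, and `beta=1` gives the plain form.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) <= 1
    return z / (1.0 + z)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish(x) = x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def relu(x: float) -> float:
    """ReLU for comparison: max(0, x)."""
    return max(0.0, x)


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):+.4f}")
```

Unlike ReLU, Swish is smooth and lets small negative values through (for example, swish(-1) is about -0.27), while for large positive x it approaches x, just like ReLU.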



GPT-3

Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San Francisco-based artificial intelligence research laboratory.[2] GPT-3's full version has 175 billion machine learning parameters. GPT-3, which was introduced in May 2020 and was in beta testing as of July 2020, is part of a trend in natural language processing (NLP) systems toward pre-trained language representations.

OpenAI GPT-3 Pricing Tiers

Tokens are the smaller units that a text is broken down into, usually words or pieces of words.
Comment: The pricing is too high for hobbyists, but it makes sense given GPT-3's large potential for running real businesses. If it were too cheap or free, someone would take advantage of the hobbyist offer.

  1. Explore: Free Tier
    • 100K tokens or a 3-month free trial, whichever comes first.
  2. Create: $100/month
    • 2 million tokens, plus 8 cents for every extra 1,000 tokens.
  3. Build: $400/month
    • 10 million tokens, plus 6 cents for every extra 1,000 tokens.
  4. Scale: Custom Pricing
    • Contact OpenAI for pricing details
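The two paid tiers can be compared with a small cost estimate. This sketch takes its figures from the tier list above and assumes overage is prorated per token; OpenAI's actual billing granularity may differ, and the `Tier` class and `monthly_cost` function are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_fee: float      # USD per month
    included_tokens: int    # tokens covered by the monthly fee
    overage_per_1k: float   # USD per extra 1,000 tokens


# Figures from the pricing list above.
CREATE = Tier("Create", 100.0, 2_000_000, 0.08)
BUILD = Tier("Build", 400.0, 10_000_000, 0.06)


def monthly_cost(tier: Tier, tokens_used: int) -> float:
    """Estimated monthly bill in USD, assuming overage is prorated per token."""
    extra = max(0, tokens_used - tier.included_tokens)
    return tier.monthly_fee + extra / 1000 * tier.overage_per_1k


if __name__ == "__main__":
    for tokens in (1_000_000, 3_000_000, 5_750_000, 8_000_000):
        print(f"{tokens:>10,} tokens: "
              f"Create ${monthly_cost(CREATE, tokens):.2f}, "
              f"Build ${monthly_cost(BUILD, tokens):.2f}")
```

Under these assumptions, Create costs $180 at 3 million tokens (1,000 extra blocks × $0.08), and both paid tiers cost $400 at 5.75 million tokens; above that usage, Build is the cheaper tier.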



Reference & Resource

Bloomberg ML course

https://bloomberg.github.io/foml/#home

  • 30 videos (~1 hour each) and 7 homework assignments

Yan Cheung

CMEE4K

Copied Yan's CMEE4K setup to our JupyterHub (Ubuntu 20.10, 4 GB RAM, Core 2 CPU; internal: 11.185)

Hands-On ML

Hands-On ML Book.png