But PaLM + RLHF isn’t pre-trained. That is to say, the system hasn’t been trained on the example data from the web that it needs to actually work. Downloading PaLM + RLHF won’t magically install a ChatGPT-like experience; that would require compiling gigabytes of text from which the model can learn and finding hardware powerful enough to handle the training workload.
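To give a rough sense of what "hardware powerful enough" means, here is a back-of-envelope estimate. It is a sketch under loud assumptions. It uses the common rule of thumb that training costs about 6 FLOPs per parameter per token. It borrows PaLM's published size (540 billion parameters) and training-set size (about 780 billion tokens). It also assumes hypothetical accelerators with a 312 TFLOP/s peak (roughly an NVIDIA A100's dense BF16 figure) running at 50% utilization. None of these figures describes PaLM + RLHF itself.

```python
# Back-of-envelope training compute, using the common approximation
# that training costs about 6 FLOPs per parameter per token (6 * N * D).


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs as 6 * N * D."""
    return 6.0 * n_params * n_tokens


def gpu_days(total_flops: float, peak_flops_per_s: float, utilization: float) -> float:
    """Convert a FLOP budget into single-accelerator days at a given utilization."""
    seconds = total_flops / (peak_flops_per_s * utilization)
    return seconds / 86_400


# Assumed inputs (not from the article): a PaLM-sized model (540B parameters)
# trained on ~780B tokens, on hypothetical accelerators with a 312 TFLOP/s
# peak running at 50% utilization.
flops = training_flops(540e9, 780e9)
days = gpu_days(flops, peak_flops_per_s=312e12, utilization=0.5)

print(f"{flops:.3e} FLOPs")             # 2.527e+24 FLOPs
print(f"{days:,.0f} accelerator-days")  # 187,500 accelerator-days
```

Spread across a thousand such accelerators, that is still on the order of half a year of continuous training, which is why the pre-training step is out of reach for most individuals.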
Like ChatGPT, PaLM + RLHF is essentially a statistical tool to predict words. When fed an enormous number of examples from training data — e.g., posts from Reddit, news articles and e-books — PaLM + RLHF learns how likely words are to occur based on patterns like the semantic context of surrounding text.
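A toy illustration of "learning how likely words are" from examples is a bigram model, which counts which word tends to follow each word in a small corpus. This is a deliberately simplified stand-in, not how PaLM works. PaLM is a large Transformer that conditions on far longer context, but its output is the same kind of object: a probability distribution over the next token. The corpus below is made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each word follows each preceding word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts: dict[str, Counter], prev: str) -> dict[str, float]:
    """Turn raw follow-counts into a probability distribution over the next word."""
    following = counts.get(prev.lower(), Counter())
    total = sum(following.values())
    return {word: n / total for word, n in following.items()} if total else {}


# Hypothetical training data.
corpus = [
    "the model predicts the next word",
    "the model learns from text",
    "the next word depends on context",
]
counts = train_bigram(corpus)
print(next_word_probs(counts, "the"))  # {'model': 0.5, 'next': 0.5}
```

With more (and more varied) training text, these estimates shift toward whatever patterns dominate the data, which is the same reason a large model's behavior depends so heavily on what it was trained on.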
ChatGPT and PaLM + RLHF share a special sauce in Reinforcement Learning from Human Feedback (RLHF), a technique that aims to better align language models with what users want them to accomplish. RLHF involves taking a language model (in PaLM + RLHF’s case, PaLM) and fine-tuning it on a dataset that includes prompts (e.g., “Explain machine learning to a six-year-old”) paired with what human volunteers expect the model to say (e.g., “Machine learning is a form of AI…”). The same prompts are then fed to the fine-tuned model, which generates several responses, and the volunteers rank all the responses from best to worst. Finally, the rankings are used to train a “reward model” that scores the original model’s responses and sorts them in order of preference, filtering for the top answers to a given prompt. Those scores then serve as the reward signal for a reinforcement learning step that further fine-tunes the language model, which is where the technique gets its name.
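The reward-model step can be sketched in a few lines. This is a minimal sketch under stated assumptions, not PaLM + RLHF's actual code. It expands each best-to-worst ranking into (preferred, rejected) pairs and fits a score with the pairwise loss -log(sigmoid(r_better - r_worse)), the formulation popularized by InstructGPT-style RLHF. The article does not say which loss PaLM + RLHF uses. To keep it self-contained, each response is reduced to two hypothetical hand-assigned feature scores, and the "reward model" is a linear function of them. In a real system it is a neural network that reads the prompt and the response text. Only the reward-model step is shown; the reinforcement learning fine-tuning that follows is omitted.

```python
import math
from itertools import combinations

Features = list[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ranking_to_pairs(ranked: list[Features]) -> list[tuple[Features, Features]]:
    """Expand one best-to-worst ranking into (preferred, rejected) pairs."""
    return [(ranked[i], ranked[j]) for i, j in combinations(range(len(ranked)), 2)]


def reward(weights: Features, features: Features) -> float:
    """Toy linear reward model: a weighted sum of a response's features."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(rankings: list[list[Features]], n_features: int,
                       lr: float = 0.1, epochs: int = 200) -> Features:
    """Fit weights by gradient descent on -log(sigmoid(r_better - r_worse))."""
    weights = [0.0] * n_features
    pairs = [pair for ranked in rankings for pair in ranking_to_pairs(ranked)]
    for _ in range(epochs):
        for better, worse in pairs:
            margin = reward(weights, better) - reward(weights, worse)
            # d/dmargin of -log(sigmoid(margin)) is -(1 - sigmoid(margin)),
            # so stepping downhill pushes the preferred response's score up.
            step = lr * (1.0 - sigmoid(margin))
            for k in range(n_features):
                weights[k] += step * (better[k] - worse[k])
    return weights


# Hypothetical data: each response is two made-up feature scores
# (say, helpfulness and accuracy), listed best to worst per prompt.
rankings = [
    [[0.9, 0.8], [0.6, 0.7], [0.2, 0.1]],  # prompt 1
    [[0.8, 0.9], [0.5, 0.4], [0.1, 0.3]],  # prompt 2
]
w = train_reward_model(rankings, n_features=2)

# Use the trained reward model to sort new candidate responses.
candidates = {"A": [0.3, 0.2], "B": [0.9, 0.9], "C": [0.5, 0.6]}
best_first = sorted(candidates, key=lambda name: reward(w, candidates[name]), reverse=True)
print(best_first)  # ['B', 'C', 'A']
```

Training on pairs rather than absolute scores is a common design choice: volunteers find it easier to say which of two answers is better than to rate each one on a fixed scale, and one ranking of k responses yields k(k-1)/2 comparisons.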
That’s all to say that PaLM + RLHF isn’t going to replace ChatGPT today, unless a well-funded venture (or person) goes to the trouble of training it and making it publicly available.
“There’s now an open source alternative to ChatGPT, but good luck running it” by Kyle Wiggers, originally published on TechCrunch