AI has changed the way we work, whether we’re developers or not. Understanding how LLMs work behind the scenes is a good skill to have, and it also helps you work with AI chat prompts in much more precise and useful ways.
What Are LLMs, Fine-tuning & RAG?
While the models behind popular tools like ChatGPT and Claude are proprietary, many capable models, including DeepSeek's, are available as open-weight downloads on huggingface.co. We can use one of these as a base model and supply it with specific training data to fine-tune the model to be good at specific tasks.
Different models are good for different things, much like the way we work. I’d expect a plumber to be great at fixing a leaking water main, but I wouldn’t expect them to be great at handling my taxes. The same goes for models, although the training data each uses isn’t quite as drastically different at this point. In the future, I do think highly specialized models will be the norm.
If you’ve ever had an LLM lie to you confidently, you’ve experienced an AI concept known as a hallucination. AI hallucinations happen when an LLM makes up information that sounds real but isn’t true. The model doesn’t actually know the answer, so it guesses and tries to sound confident about its guess, much like a person inventing a believable story when they don’t know the real facts. It isn’t lying on purpose; it simply can’t tell the difference between what’s real and what it’s making up.
If you’ve experienced this, then you’ve found a great case for fine-tuning your own model. As a programmer, it’s common to run into hallucinations when you’re asking highly specific questions. It happened to me this week while generating some fish shell scripts: Claude just didn’t have the specialized knowledge to answer properly.
In circumstances like this, it would be great to fill that gap yourself. If you could just give the AI the missing information, you know it could answer. That’s exactly what fine-tuning provides: you adjust the model’s parameters through additional training on specific data to make it better at particular tasks or domains. This process takes a pre-trained model and teaches it to respond in ways that are more aligned with your specific needs, whether that’s adopting a certain writing style, following particular instructions, or becoming more knowledgeable about specialized topics.
RAG, or retrieval-augmented generation, is another way to provide data to an LLM, but it’s different from fine-tuning. RAG retrieves and injects relevant information at inference time rather than permanently modifying the model ahead of time with training. With RAG, you search external databases or documents for relevant context and feed that information to the model alongside your prompt, while fine-tuning actually changes how the model behaves by updating its internal parameters through additional training.
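To make the contrast concrete, here’s a minimal Python sketch of the RAG pattern. The search_documents function and its contents are hypothetical stand-ins for a real retrieval layer (keyword search, a vector database, etc.); the point is that retrieved text is injected into the prompt at inference time, while the model’s weights stay untouched:

# Minimal RAG sketch: retrieve relevant text, then inject it into the prompt.

def search_documents(query: str) -> list[str]:
    # Hypothetical retrieval step; in practice this would query an index.
    corpus = {
        "jsonl": "JSONL is a text format with one JSON object per line.",
        "mlx": "MLX is Apple's machine-learning framework for Apple Silicon.",
    }
    return [text for keyword, text in corpus.items() if keyword in query.lower()]

def build_rag_prompt(question: str) -> str:
    # The model sees retrieved context alongside the question; no training involved.
    context = "\n".join(search_documents(question))
    return f"Use the following context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"

print(build_rag_prompt("What is JSONL?"))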
Why Create a Custom Writing Assistant?
There are many circumstances where fine-tuning your own custom model makes a lot of sense, and creating a ghostwriter with your own voice is a simple and effective way to learn how to do it.
Let’s explore a real-world use case to understand how all this works. I’ve been writing blog posts at Kevinleary.net since 2009, and I continue to write today (that’s obvious). I use AI to assist with research for writing and to help correct grammar and spelling, but generating full posts with tools like ChatGPT or Claude never captures my voice. I can tweak and adjust, but I always end up rewriting a lot myself.
But what if I used all of my blog posts as data to fine-tune an LLM base model, like Llama 3.1 8B? Doing this will provide me with a specialized LLM trained on more than 15 years of my own writing. The likelihood it will be able to match my own voice and write full blog posts for me is substantially higher. So let’s do it.
Formatting Data for AI Fine-tuning
The first step in the process is to get our data and convert it to a format that can be used to fine-tune our model. For WordPress, a small PHP export snippet is a fast, effective way to do this. We’ll use it to query my posts and convert them to JSONL, which is really a fancy way of saying JSON with one record per line.
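As a quick illustration, a JSONL file with two records is just two complete JSON objects, one per line (the contents here are made up):

{"messages": [{"role": "user", "content": "write a blog post entitled \"Post One\""}, {"role": "assistant", "content": "# Post One\n\nBody text"}]}
{"messages": [{"role": "user", "content": "write a blog post entitled \"Post Two\""}, {"role": "assistant", "content": "# Post Two\n\nBody text"}]}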
To export the posts, I’ve used the following PHP snippet (a temporary addition to the theme’s functions.php works), which lets me download two JSONL files for fine-tuning. I ran this on my localhost after syncing my live site down so it included all articles; I recommend you do the same.
// Export published posts as JSONL for fine-tuning.
// Visit ?export=train or ?export=valid to download each file.
if (isset($_GET['export']) && in_array($_GET['export'], ['train', 'valid'])) {
    global $wpdb;

    // Pull every published post, plus an optional lead-in stored in post meta.
    $posts = $wpdb->get_results("
        SELECT
            p.post_title,
            p.post_date,
            p.post_content_filtered,
            pm.meta_value AS lead_in
        FROM {$wpdb->posts} p
        LEFT JOIN {$wpdb->postmeta} pm ON p.ID = pm.post_id AND pm.meta_key = 'extras_lead-in'
        WHERE p.post_status = 'publish'
            AND p.post_type = 'post'
            AND p.post_content_filtered != ''
        ORDER BY p.post_date DESC
    ");

    // 80/20 split between training and validation data.
    $total       = count($posts);
    $train_count = (int) floor($total * 0.8);

    if ($_GET['export'] === 'train') {
        $filtered_posts = array_slice($posts, 0, $train_count);
        $filename       = 'train.jsonl';
    } else {
        $filtered_posts = array_slice($posts, $train_count);
        $filename       = 'valid.jsonl';
    }

    header('Content-Type: application/json');
    header("Content-Disposition: attachment; filename=\"{$filename}\"");

    // One JSON-encoded chat exchange per line.
    foreach ($filtered_posts as $post) {
        $date            = date('F j, Y', strtotime($post->post_date));
        $title_with_lead = $post->post_title . (!empty($post->lead_in) ? ' - ' . $post->lead_in : '');

        $jsonl = json_encode([
            'messages' => [
                [
                    'role'    => 'user',
                    'content' => "write a blog post in markdown for kevinleary.net entitled \"{$title_with_lead}\" as if it were originally published on {$date}",
                ],
                [
                    'role'    => 'assistant',
                    'content' => $post->post_content_filtered,
                ],
            ],
        ], JSON_UNESCAPED_UNICODE);

        echo $jsonl . "\n";
    }

    exit;
}
Once this is in place, we can generate the two files we need for fine-tuning by visiting the site with one of the following query parameters appended to any URL:
- ?export=train
- ?export=valid
Each will prompt you to save a .jsonl file to your system; together, these become our training and validation data for fine-tuning. In this case, the script formats my WordPress posts, which are written in markdown, into the following JSON structure:
{
  "messages": [
    {
      "role": "user",
      "content": "write a blog post in markdown for kevinleary.net entitled '{TITLE} - {TEASER / LEAD-IN}' as if it were originally published on August 10, 2025"
    },
    {
      "role": "assistant",
      "content": "{MARKDOWN CONTENT}"
    }
  ]
}
We’ll use this as our training data in the fine-tuning process.
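Before kicking off a training run, it’s worth confirming that every line in both files parses as valid JSON, since one malformed record can derail things. Here’s a quick check using only the Python standard library (file names assumed to match the export above):

import json

# Confirm each line of the exported files is a valid JSON record
# with the "messages" key the fine-tuning format expects.
for filename in ("train.jsonl", "valid.jsonl"):
    with open(filename, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate a trailing blank line
            record = json.loads(line)  # raises ValueError on malformed JSON
            assert "messages" in record, f"{filename}:{number} is missing 'messages'"
    print(f"{filename} OK")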
Setting Up the Training Environment
To train our AI model, we’ll use MLX, a machine-learning framework built specifically for Apple Silicon. Its tooling runs on Python, so we’ll need to install that if you don’t already have it; the official installer from python.org is the best way I’ve found.
Once you’ve installed Python, install MLX with pip:
pip install mlx-lm
Next, we download a local copy of the model we’ll be training, then feed our JSONL files to it to create our custom model. I’ve saved mine into a ~/AI/ directory on my Mac.
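If you’d rather trigger the download explicitly, loading the model once through mlx-lm’s Python API fetches the weights from Hugging Face and caches them locally. A minimal sketch (exact caching behavior depends on your mlx-lm version):

from mlx_lm import load

# The first call downloads the 4-bit Llama 3.1 8B weights from Hugging Face;
# later calls reuse the local cache.
model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")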
Running the Fine-tuning Process
With the JSONL files in place, we can start the fine-tuning process with a single mlx_lm.lora command, run from the directory containing train.jsonl and valid.jsonl (that’s what --data . points to):
mlx_lm.lora --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --data . --train --batch-size 1 --num-layers 16 --iters 600 --learning-rate 1e-5 --save-every 100
When the training process completes, we’re left with an ./adapters directory containing adapter files. These are the files we can use to create a CLI or UI to work with our model.
Using the Custom LLM
Once we have our completed adapters as output from the training process, we can use the model from the command line:
mlx_lm.generate --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --adapter-path ./adapters --prompt "Write a blog post for kevinleary.net about JavaScript performance"
That’s a mouthful, so it’s best to simplify it with a custom CLI function. I use the fish shell, so I’ve created a helper function to make it easier to use:
Fish CLI Function
function kevinlearynet-writer
    # Generate text with the fine-tuned adapters applied to the base model.
    # "$argv" joins all arguments into a single prompt string.
    mlx_lm.generate \
        --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit \
        --adapter-path /Users/kevinleary/AI/adapters/ \
        --prompt "$argv" \
        --max-tokens 4000 \
        --temp 0.8 \
        --top-p 0.9 \
        --min-p 0.05
end
Bash Function
If you use Bash as your shell, you can add this to your .bash_profile and run the same command to generate content.
function kevinlearynet-writer() {
    # Same generator for Bash: "$*" joins all arguments into one prompt string.
    mlx_lm.generate \
        --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit \
        --adapter-path /Users/kevinleary/AI/adapters/ \
        --prompt "$*" \
        --max-tokens 4000 \
        --temp 0.8 \
        --top-p 0.9 \
        --min-p 0.05
}
Now I can use my custom AI model in the Terminal whenever I want:
kevinlearynet-writer "Write a blog post for kevinleary.net about JavaScript performance"
Conclusion
You should now have a pretty good idea of how to create a custom LLM by fine-tuning an open-source model from huggingface.co. In this example, we created a writing assistant that can match my voice and writing style by training on blog posts from Kevinleary.net. If you’re a business that has invested in content marketing and has a fair amount of content written, you can do the same. If you’re using WordPress, you can use the exact same method for pulling the data out, and the same process works with any CMS using different extraction methods.
This is only one example of how a custom LLM can improve your workflow over out-of-the-box tools like ChatGPT and Claude, but the general process applies to many different applications. You provide data, fine-tune a base model with training, and you get a more specialized LLM that does what you want with more accuracy and less hallucination.
Creating a custom fine-tuned LLM provides a superior approach to working with AI for specialized situations such as debugging code in your company’s specific framework, generating marketing copy that matches your brand voice, or providing technical support answers using your product’s exact API documentation and error messages.
If you’re looking for a good AI consultant to help with training custom models, give me a shout.