Photo by Annie Spratt on Unsplash
I work for GitHub, and so I've been a happy Copilot user since it was first created. It's been amazing to watch the pace of improvements in all of the foundation models that we use, and all of the models out there. (For programming, I love o1-mini, and I'm sure I'll love o3-mini, too.)
I'm also an F# programmer. My version control system, Grace, is written almost entirely in F#.
That means that when I'm asking LLMs to generate code for me, I'm usually asking them for F#. In fact, my custom instructions for Copilot and ChatGPT include this:
For questions about programming, assume the programming language is F#, unless told otherwise.
Code samples should be provided in F#, unless told otherwise.
And here's the thing: none of the models are very good at generating modern, idiomatic F#.
This led me to two big questions:
Well, "duh", right?
Popular languages, by definition, have more lines of code written for LLMs to train on. They have more examples of more ways to do stuff, so the LLMs have more of an opportunity to build a rich, internal model of the language that they can use when generating code for us.
Which languages am I talking about? I don't know, <waves hands> something like:
You know... popular ones.
Same thing for popular vs. not-as-popular frameworks, like React, Angular, Rails, etc.
The models have gotten particularly good at creating new projects in these languages (and frameworks) with nothing more than a natural-language requirements list.
They're also pretty good at refactoring and suggesting edits in those languages.
If you're using one of those languages, you're one of the lucky ones in these still-early days of AI coding assistants.
So... I write a lot of F#. F# is not one of the big languages - even though it should be because it's fucking awesome - and so the Generative AI results I get when asking for F# are... <sigh>...
They're just not good. I mean, sometimes, for easy things, they're fine and I can use them and go. But for anything of any complexity, I need to specify a lot of detail to get good code, and sometimes, even then, I don't get usable, correct F#.
I don't spend any time in Julia, Crystal, OCaml, or Zig, or any of the other dozens of languages that fall into the "smaller, less popular" category, so I can't confirm they all get the same lower-fidelity treatment that F# does. But it can't be good, today, to have fewer examples of code in your favorite language available for training the models.
The custom instructions I shared above aren't the whole section I have about F#. What I actually have is:
For questions about programming, assume the programming language is F#, unless told otherwise.
Code samples should be provided in F#, unless told otherwise.
All F# code uses the task { } computation expression to handle asynchronous operations.
All F# code should be written in a functional style, using immutability and pure functions where possible.
All F# code should use the new syntax when referring to arrays, lists, and sequences: use myArray[0] instead of myArray.[0].
Why do I have all of that? Because, without going into details about F#, to get even close to good output, I'm forced to explicitly say to the model: "Use modern, correct syntax, not syntax from older language versions and older examples."
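To make that concrete, here's a minimal sketch of the kind of syntax drift those instructions are meant to prevent (the values are hypothetical; the syntax difference is real, introduced in F# 6):

```fsharp
let numbers = [| 1; 2; 3; 4 |]

// Older style, pre-F# 6 — what the models often generate:
let firstOld = numbers.[0]

// Modern style, F# 6 and later — what I actually want:
let first = numbers[0]

// Same for slicing:
let middle = numbers[1..2]
```

Both compile today, but only one is what I want to see in my codebase, and the training data is full of the other one.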
And this isn't just a problem for small languages. I haven't written any C++ lately, but I follow developments in the language. I imagine that a lot of the code examples you'd get from LLMs right now will show out-of-date, less-safe, C++98-style code instead of the much safer, more declarative C++23.
(Seriously, check out the videos from CppCon. Anything with Herb Sutter, Andrei Alexandrescu, or Chandler Carruth is highly recommended.)
Here's an example I dealt with recently.
In C#, or any object-oriented-ish language, doing early exits from a method is normal. For instance:
public async Task<int> DoSomethingWithStorageAsync(string writeToStorage, string readFromStorage, string clearState)
{
    if (String.IsNullOrEmpty(writeToStorage))
    {
        return -1;
    }

    // Keep doing whatever the method is supposed to do...
    // ...
    // ...

    // Cool, we're done. Everything worked.
    return 0;
}
Getting out early with return -1 is how you avoid using an else and indenting everything after it, and, for better or worse, it's standard style these days.
Doing that in F#, though, is not allowed. In F#, the if expression is exactly that: an expression. It returns a value. (There is no if statement in F#, because functional programming.)
You just can't use that early-exit construct without raising an exception, and exceptions as control flow aren't good functional practice.
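Here's a small sketch of what that means in practice (the function names and messages are made up for illustration). Because if is an expression, both branches have to produce a value of the same type; the idiomatic way to express "bail out early" is to return a value like Result instead:

```fsharp
// `if` is an expression: both branches must yield the same type.
let statusCode (writeToStorage: string) =
    if System.String.IsNullOrEmpty writeToStorage then
        -1  // this is the value of the expression, not an early `return`
    else
        // ... do the real work ...
        0

// The idiomatic functional version: encode failure in the return type.
let doSomething (writeToStorage: string) =
    if System.String.IsNullOrEmpty writeToStorage then
        Error "writeToStorage must not be empty"
    else
        Ok ()
```

No early exit, no exception; the "failure" path is just one of the values the expression can produce.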
I was asking the models to write something I'd describe as a medium-complexity function. I should start by saying: I expect that the GitHub REST API, and, specifically, the GitHub .NET SDK, are well-enough known to the LLMs that they shouldn't have been a problem. The prompt starts:
I am developing an F# command-line application that retrieves issues and pull requests from a specific GitHub repository using the GitHub API, and stores this data in a SQLite database...
and goes on with some detail, as I worked with GPT-4o first to craft the prompt before submitting it to o1-mini. And then to Claude 3.5 Sonnet. And then to Google Gemini 2.0 Experimental. And even locally on LM Studio running Phi 3.5 and Llama 3.1.
None of them generated useful, working code. To get any of it working, I had to do way-too-much syntax fixing, and then way-too-much debugging.
There were problems with understanding how to use the F# Task Computation Expression - i.e. how F# does modern .NET async/await.
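For reference, here's a minimal sketch (not my actual code, and using plain HttpClient rather than the GitHub SDK) of what correct use of the task computation expression looks like:

```fsharp
open System.Net.Http

// The task { } computation expression is how modern F# wraps .NET Tasks.
// let! awaits a Task, the way C#'s await does.
let fetchBodyLength (url: string) =
    task {
        use client = new HttpClient()
        let! response = client.GetAsync(url)
        let! body = response.Content.ReadAsStringAsync()
        return body.Length
    }
```

It maps closely onto C#'s async/await, which makes it all the more frustrating that the models kept mangling it.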
There were problems with understanding the output of the GitHub API and how to process it.
But, most worryingly... there were early exits.
These are obvious syntax errors, they're structurally wrong, they make no sense in F#, and I cannot imagine there are many examples of them in open-source F# code.
I guess the models are doing something like "F# is .NET; .NET is C#; C# patterns therefore work in F#...".
Even after I updated the prompt to include "do not use early exits", and then even pasted its own bad output back with "do not use early exits that look like this", I couldn't get proper code for it.
Well, that was two hours of my life I can't get back.
I went to bed, and woke up with an idea: what if I just generate it in C#, and then have the LLM translate it to F#? I know they're better at code generation in C# than F#, and translating between languages is something they all do pretty well already.
So, I did. And, well, it kind-of worked. Still took more hand-holding than I'd like, but, I got it done.
But, that's not the right solution for this problem. I'll get object-oriented-ish F# out of that, not idiomatic, functional code that's beautiful.
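What I want back is something like this: the C# method from earlier, rewritten as idiomatic F#, with Result instead of a -1 sentinel and task { } for the async plumbing. (This is a sketch; doTheActualWork is a hypothetical placeholder for the real logic.)

```fsharp
open System.Threading.Tasks

// Idiomatic F# version of the earlier C# method: failure is encoded
// in the return type, not signaled with a magic -1 and an early return.
let doSomethingWithStorage (writeToStorage: string) : Task<Result<unit, string>> =
    task {
        if System.String.IsNullOrEmpty writeToStorage then
            return Error "writeToStorage must not be empty"
        else
            // Keep doing whatever the function is supposed to do...
            // let! result = doTheActualWork writeToStorage
            return Ok ()
    }
```

Translating from C# gets you the shape of this, but not the Result-based style; that's the part the models would need to learn from good F# examples.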
I was thinking it might be good to use some pipeline of LLMs + compilers + thoughtful prompting to generate entire repositories of code that demonstrate the latest syntax, and that are published only after they successfully compile and are post-processed by a style-checker that validates that older syntax isn't present. With some human-in-the-loop, of course.
I wish there were a standard mechanism that the LLMs could recognize during training where we tell them "here are examples of correct, modern, up-to-date code in this language; when generating new code, translate older syntax into this new syntax", or something like that.
But even if there is... is a repository full of LLM-generated code just spam? I think it doesn't have to be, but I see the point.
For over 20 years, we've all been aware of how important it is to do Search Engine Optimization (SEO) if you want your site to appear near the top of the search rankings.
Maybe, with LLMs, we need to do Language Model Optimization (LMO).
I would suggest that, if you want your programming language to be generated well by the models, you'll need to take ownership of the LMO for your language.
This presents an entirely new workstream for language owners to handle. Hopefully, the LLMs can help us with it.
One idea is to work together to identify some set of tokens to tell the models: "this is the good stuff", and maybe also to give each language one owner who is allowed to create such repositories, so that the models have a trusted source during training.
I'm happy to hear other ideas... seriously, I have no idea how this would work, but I think we need it, and I think the programming business, in particular, deserves some special treatment during training.
If we don't do something to help the smaller languages, as code generation quality continues to improve for the more popular languages, the gap in quality between large and small languages will just continue to grow. Soon, the upside of trying a new language will be massively outweighed by that quality gap, and we'll become locked into the popular languages in an ever-tightening spiral of more popular = more quality.
Try talking your manager into letting the team learn that cool, new language when you can't use Copilot to get great results with it.
Anyway, I really love F#. I know people who love other, small languages as well. There's a good reason to have a healthy ecosystem of languages and ideas floating around, influencing each other. It's a Good Thing.
I hope Generative AI gets as good with idiomatic F# / Julia / Scheme / whatever as it already is with Java and TypeScript. I think it might need some help getting there.