Designing a programming language - Philosophically

While designing a programming language, I came to the realisation that it should be approached not only from a technical perspective, but also from a philosophical one. At a high level, you are specifying how and what ideas can be expressed; moreover, you are specifying what ideas cannot be expressed.

What happens when we give the user too many words to express their ideas?

Yes, it will make the language feature-rich, but will the user actually use them? I often refer to the 80/20 rule: you only need to know 20% of a language to be able to comfortably speak with a random person. For example, in English I seldom use the word onomatopoeia. Actually, I seldom use the word seldom. So if the goal is to make a language simple, it is not enough to keep adding things and see what sticks. You need to be selective about what you do not add; expressivity is not necessarily a good thing.

Furthermore, notice that when you google the definition of a word, you get the explanation in quite simple terms. You rarely google one word and then have to google another word from the explanation.

The 20%

I refer to the 20% as the building blocks for a language. The 20% does not need to be atomic either. What I mean by this is that the 20% can include higher-level constructs that are themselves made from simpler things. For example, in English we use the word breakfast a lot, and it is composed of “break” and “fast”. Even though breakfast is in the top 20%, it can be decomposed further.

So maybe it is better to call the 20% the foundation rather than the building blocks.

Domain Specific Language (DSL)

So my goal was to put into the language what I believed the top 20% to be, and I hope this explanation will help you to argue against anything put into the language that does not seem to be foundational.

Talking points

I tend to use examples from the English language, but with a bit of brain juice, we can translate them over to programming languages. It’s also interesting to notice the dichotomy between natural languages and programming languages when I mention the 20%. In linguistics, unless you are a native speaker (or maybe you can use Google), it is hard to tell what the 20% is. In the DSL, the 20% should be available to you as soon as the programming language is downloaded. It is a mixture of the language features and the standard library. Anything else that is added by external libraries is the 80%.

Going back to the previous example, you can think of the standard library as “breakfast” and the language features as “break” and “fast”.
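
As a rough sketch of this layering (Python is used purely for illustration, and every name here is hypothetical rather than part of any real DSL): two small primitives stand in for the language features, and a convenience function built only from them stands in for the standard library.

```python
# Hypothetical primitives, standing in for "break" and "fast":
# the small set of features built into the language itself.
def add(a: int, b: int) -> int:
    return a + b


def repeat(step, start: int, times: int) -> int:
    """Apply `step` to `start` the given number of times."""
    result = start
    for _ in range(times):
        result = step(result)
    return result


# Hypothetical standard-library function, standing in for "breakfast":
# used all the time, yet composed entirely from the primitives above.
def multiply(a: int, b: int) -> int:
    """Multiply a by a non-negative b using only add and repeat."""
    return repeat(lambda total: add(total, a), 0, b)
```

A user reaches for `multiply` every day without thinking about it, but anyone who is curious can still break it down into `add` and `repeat`.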

What about the ideas that cannot be expressed in the language?

It’s always a tough decision to dictate what cannot be in a language, especially since the criteria I gave above can be quite subjective. I am quite confident that the question above should be re-worded to: “What about the ideas that cannot be succinctly expressed in the language?” In linguistics, you often hear a speaker of another language mention a word that is not directly translatable to English, and even when you try to translate it, the result is not exactly the definition, but it’s close enough. In other words, we don’t have a succinct way to express this idea or emotion, maybe due to cultural or societal differences, or maybe because each language has building blocks that are polar opposites. This complex foreign word is made up of simpler building blocks, which we cannot understand or map to our own language’s building blocks.

But you know, I think we still get along quite well without it, and if it were not for someone introducing this foreign word, we might never have known about it. I would further argue that, given a week or two, you will not use this foreign word again, because the language you speak usually has enough for you to express your ideas.

So in a roundabout way, I’m trying to say that if enough people ask for a shortcut to do something, then the language should have a way to express it, but for corner cases it would not be worth it. If our building blocks are good, we should be able to approximate these corner cases, though maybe not in the most efficient way, and I think that is fine for corner cases. It would be a failure on the language’s part if an idea was being used a lot and there was no succinct way to express it using the building blocks we have. The foundation can also be added to over time, since the ideas we want to express are not static. For example, in linguistics, “selfie” was added to the Oxford dictionary. Likewise in cryptography, zero-knowledge cryptosystems have become really popular, so the terms and idioms that relate to them are used far more in the crypto sphere than in previous decades.

Things that make me know that the language is working

I think when complaints start with “This language is easy to use but”, “This language is boring”, or “This language won’t let me do …”, then the language is working. For a working language, the last complaint is most likely because you are trying to do something which is not allowed, or is not safe. The language should try to avoid features whose meaning depends on the context. I think this may be the most common source of errors. If you want to do something, there should be no ambiguity; you need to be explicit about what you want to do.

In linguistics, you may want the opposite effect, because sometimes authors want to deploy a sentence which has layers of meaning, allowing them to express multiple ideas in a short sentence. If you’ve studied English or poetry in class, you will have come across many different interpretations of the same sentence. This is not desirable for a DSL or a programming language, even more so when the domain of the DSL is cryptography.

Of course, this is just my philosophical treatment of the subject, and you are free to disregard all or parts of it. Languages should, however, be built with some philosophical backing or mission, or else you will simply add anything and everything into the language.

Backwards compatibility

Let’s go back to the selfie example, where we added a new word. Before selfie, you could always say something like “take a picture of your head while smiling with a camera”.

If I go to my grandparents and say selfie, they might be perplexed. But if I say “take a picture of your head while smiling with a camera”, they will understand. So adding this new word gives us three options:

  • When the younger generation speaks among themselves, they use the word selfie, and when they speak to the older generation, they spell out the meaning instead.
  • Tell grandpa to learn the word.
  • Never speak to grandpa about selfies.

Translating to programming languages, I guess the severity depends on what the language is used for and the stage that the language is at. We need to keep this notion of backwards compatibility, so that programs which were written with older syntax/words are always supported. This becomes more of a problem the older the language gets, because you are adding more things over time.

How should this DSL solve this?

I don’t know. Kind of; it depends. If the feature in use was not secure, it should be slowly deprecated and responsible disclosure should be practiced. This is even more important for cryptography than it is for general programming languages. I do expect that there is always a bit of pushback with these decisions. If you could remove all profanity from your language, would you? If you answered yes, do you think that your decision would be unanimous?
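
One way slow deprecation can look in practice, sketched in Python rather than in the DSL itself: the old feature keeps working so existing programs do not break, but every use emits a warning that points to the replacement. The function names `legacy_digest` and `secure_digest` are made up for this example; `warnings.warn`, `DeprecationWarning` and `hashlib` are from Python's standard library.

```python
import hashlib
import warnings


def secure_digest(data: bytes) -> str:
    """Hypothetical replacement: hex-encoded SHA-256 of the input."""
    return hashlib.sha256(data).hexdigest()


def legacy_digest(data: bytes) -> str:
    """Hypothetical old feature, kept so existing programs still run."""
    # Warn on every call instead of failing, so users can migrate
    # at their own pace before the feature is finally removed.
    warnings.warn(
        "legacy_digest is no longer considered secure; use secure_digest",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return hashlib.sha1(data).hexdigest()
```

Grandpa can keep saying “take a picture of your head while smiling with a camera” for now, but he is reminded each time that there is a newer word.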

Dealing with ambiguity

One problem with backwards compatibility is ambiguity. Once you write code, you probably do not want to touch it again for every update, and that is something we should be able to guarantee. The only way we can guarantee this is if the language is not ambiguous in any case. If we have ambiguity, then a newer compiler might resolve it, and if your interpretation does not agree with the compiler’s new interpretation, you now have a bug in your program. Hence ambiguity is the enemy. You should read crypto code and be yawning your head off due to how simple it is.
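
A real-world instance of this, offered as an illustration rather than anything specific to this DSL: in Python 2, `7 / 2` on integers evaluated to `3`, while Python 3 changed `/` to mean true division, so the same expression evaluates to `3.5`. Code that relied on the old reading changed behaviour without being touched. The sketch below (Python 3, with hypothetical function names) shows the explicit spelling that leaves no room for reinterpretation.

```python
def split_evenly(total_items: int, groups: int) -> int:
    """Whole items per group; `//` always means floor division."""
    return total_items // groups


def average_per_group(total_items: int, groups: int) -> float:
    """Fractional average; `/` always means true division in Python 3."""
    return total_items / groups
```

With two distinct operators, the reader never has to guess from the operand types which kind of division was intended.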

Personally, I don’t think performance is the number one resource that we should be accounting for. I think developer time is so much more valuable, so the language should stop developers from stealing future developer time. When you write something that is ambiguous or really creative, you may save 20 lines of code and gain a lot of performance, but the people who read your code will now spend more time trying to work out what you were doing, researching language features which they do not regularly use in order to understand what is happening. This is a time sink, and sometimes this future developer is ourselves. You look at code a week later and it’s almost like another language. When you compile it, even the compiler is sweating bullets trying to decipher what you were doing.