• 10 Posts
  • 96 Comments
Joined 2 years ago
Cake day: June 10th, 2023





  • Yes, exactly. This is information that’s encoded by tone, and it is accounted for in the 7 bits per syllable (or per pause where there is no syllable, as with a period). It was more of an example to show that if what you’re conveying is assumed to always be speech, the encoding you use can be much more efficient.

    On that note, one thing I forgot to mention is that speech assumes that what will be said is pretty much always valid. For example, sure, ASCII has a lot more information density at 8 bits per character, as you point out, but in reality that’s because it’s capable of encoding things like “hsuuia75hs”. If you tried communicating this to someone over speech, you’d find that your average speed drops dramatically from the normal 7 bits/syllable, whereas ASCII encodes that string at the same constant rate as the text of my comment. That’s one of the trade-offs.


  • You’ve stumbled upon the dark arts of information theory.

    Sure, conveying “sandwich” in ASCII or UTF-8 takes 64 bits (8 characters at 8 bits each), but that’s in an encoding that is inefficient by default.

    For starters, ASCII has a lot of unprintable characters that we don’t normally use to write words. Even if we never use them, they take up room in the encoding, because every character we do write has to carry enough bits to distinguish it from them.

    Second, writing and speaking are two different things. If you think about it, asking a question isn’t actually a separate (“?”) character. In speech, asking a question is just a modification of the tone and word order of a sentence. While, as literate people, we might think of sentences as written, the truth is that speech doesn’t have such a thing as question marks. The same is true of all punctuation marks. Therefore, a normal spoken English sentence also encodes information about its tone, including tones we don’t really know how to specify in text, and all of that is information.

    This is the linguistic equivalent of Kolmogorov complexity, which explores the absolute lowest amount of data required to represent something. In effect, that requires devising the most efficient possible data encoding scheme.
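
As a rough sketch of the arithmetic in the comment above: the snippet below counts the bits “sandwich” takes at one byte per character, counts how many ASCII code points are printable, and estimates the cost if only the 26 lowercase letters could ever occur. The 26-letter alphabet and the variable names are assumptions made for illustration. This is a fixed-length code, not a true measure of the information in English text.

```python
import math
import string

word = "sandwich"

# Stored as one byte per character (ASCII, or UTF-8 for plain English letters).
bits_as_bytes = len(word.encode("utf-8")) * 8

# ASCII defines 128 code points; only space (32) through tilde (126) are printable.
printable_ascii = sum(1 for code in range(128) if 32 <= code <= 126)

# Assumption: only lowercase letters can occur, so a fixed-length code
# needs log2(26) bits per letter.
bits_per_letter = math.log2(len(string.ascii_lowercase))
bits_as_letters = len(word) * bits_per_letter

print(bits_as_bytes)               # 64
print(printable_ascii)             # 95
print(round(bits_as_letters, 1))   # 37.6
```

Even this crude restriction cuts the cost by roughly 40%, before taking into account that most letter sequences are not valid words.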















  • Okay so basically this is saving bytes on a technicality but also good programming language design (for this specific purpose).

    The first aspect is that since you’re scored on bytes, it’s not really to your advantage to use a language that uses ASCII (or UTF-8) for its tokens, because a large part of that character set is unprintable characters like DEL or BEL. So people have designed specially crafted golfing languages that use all 256 possible byte values as characters, in order to pack as many features as possible into as few bytes as possible.

    The good design part of it is that if you really think hard about it, there are really not that many things you expect a programming language to do. It turns out that 256 different commands is about the sweet spot, so each character that’s available in the 1-byte code page is mapped to one command. The languages are also designed to make as many things as possible implicit. Both choices come at the cost of readability. Remember, all that matters here is getting the lowest score, not code maintainability or anything else.

    This leads to languages like Japt (which is a terse form of JavaScript, I’m pretty sure), Pyth (same for Python), or Vyxal (my personal favorite, used to be Python-based but is now bespoke) that look like this but absolutely own at getting a task done in as few bytes as possible.
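
To illustrate the byte-scoring point, here is a minimal sketch comparing the same four symbols scored as UTF-8 and scored in a single-byte code page. It uses Python’s built-in `cp437` codec purely as a stand-in for a golfing language’s custom code page. That choice, the made-up “program”, and the variable names are assumptions, not how any of the languages above actually work.

```python
# Hypothetical four-character "program"; the symbols are arbitrary, not real commands.
program = "Σ≥÷π"

# Scored as UTF-8, each non-ASCII symbol costs two or three bytes.
utf8_score = len(program.encode("utf-8"))

# Scored in a single-byte code page, every symbol costs exactly one byte.
# cp437 is only a stand-in here; golfing languages define their own code pages.
sbcs_score = len(program.encode("cp437"))

print(utf8_score)  # 9
print(sbcs_score)  # 4
```

The saving comes entirely from the encoding: the program text is identical, but the single-byte code page assigns each of its 256 byte values to one visible symbol.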