Friday, October 7, 2016

More About Ruby And Japanese

My "Ruby is 和" blog post is about 8,500 words long. One of the things I cut, to get it down to that compact size, was a section about similarities between Japanese and Ruby, as languages. I also cut it because I spotted a problem with the argument.

First, the part I cut:

One way to understand 和 is as a harmonious blend of disparate elements. This is an accurate description of the Japanese writing system. Written Japanese employs kanji (for example, 和), which are Chinese letters that carry the same meaning they originally held in classical Chinese when Japan adopted them, plus new meanings that Japan has added in the subsequent centuries. Written Japanese further contains our entire alphabet, which, as it was originally the Roman alphabet, is called romaji in Japan. Written Japanese also has two syllabaries, which are like alphabets except each letter represents a complete syllable. The syllabaries are called kana. While kanji and romaji are both character sets in use outside of Japan, neither of the kana syllabaries are used by any other language. However, even with the kana, there's a magpie element in effect: both syllabaries evolved from streamlined, abbreviated ways of drawing kanji.

Each of these four different character sets has different uses in written Japanese, and despite the enormous complexity, it all kind of balances out. Different character sets have different purposes. For instance, one of the syllabaries, the katakana, is primarily used for loanwords — like English, Japanese has a lot of loanwords — but it also functions as an equivalent to italics. Romaji, meanwhile, allow you to type Japanese characters using Western keyboards. Kanji have to be individually memorized, which is a lot of work for Japanese schoolchildren, but kanji have some advantages over written English, which may spell two different pronounciations the same way — e.g., "read" in the present tense vs. "read" in the past tense — or introduce subtle and totally arbitrary distinctions in spelling, like "write" vs. "right." This is also just something that kids have to memorize in school.

Now, for comparison, here's a Python fan on Reddit, complaining about Ruby:

I started working on ruby with a friend and was amazed by the multiplying way they have of making methods. For instance, ruby has but it also has a .each. There are two different notations for ranges (inclusive and exclusive), and myriad unexpected methods for other things: unless (instead if not), upto (instead of the aforementioned range), etc.

It all seemed philosophically messy to me, like they just let whomever patch in whatever tool seemed most useful at the time, and did not imply the same rigid philosophical framework that the python community has adopted.

This could be a fan of English complaining about written Japanese, although English is perhaps a bad example, given that English has more corner cases than most languages. But it still has only one alphabet! And written Japanese definitely lacks unity by comparison.

I won't pick another example, as that could be a huge discussion. You get the idea. Ruby's cheerful willingness to accomodate multiple ways of doing the same thing has an obvious parallel in written Japanese. Fans of Sapir-Whorf will be thrilled to recognize this continuity, where a characteristic of the Japanese language reappears in a programming language created by a native Japanese speaker.

However, Matz himself has said that Ruby was not deliberately modelled on Japanese.

Japanese and Ruby? I try not to think too much about Japanese culture. The method chain looks like Japanese, but it’s just a coincident. Having said that, the support of M17N is heavily influenced by the use case of Japanese people. Otherwise, I wouldn’t spend too much time on such a hard problem.

This is from a tweet in Japanese. The translation comes from Makoto Inoue, a Japanese programmer based in London. He did a very interesting presentation about this, and wrote a blog post which summarizes it. He points out that Japanese sentence structure — subject first, verb last — works well with the Subject.verb syntax favored by object-oriented languages, and raises the same point I've made here about Ruby's versatility:

Doesn’t this “There are many ways to achieve one thing” concept [seem] familiar with Ruby’s philosophy?

Now for the problem with the argument, i.e., the second reason I didn't include this in my "Ruby is 和" post.

Here's a sentence in Japanese:


It's pronounced "shitsumon ga arimasu." You could translate it as "I have a question," and that's what it usually means in my extremely limited experience. I personally say this in emails to my Japanese teacher all the time. But that's not a literal translation.

This sentence has only three words. しつもん ("shitsumon") means a question, or some number of questions, since, like Cantonese, Japanese does not differentiate between the singular or the plural. が ("ga") is a modifying particle like the modifying particles in Attic Greek. It basically means "the word I just said is the subject of this sentence." Since it comes after しつもん, it means the sentence is about a question or some number of questions. The verb, あります ("arimasu"), literally means "exists," or "will exist." Japanese does not differentiate between the present and the future, at least not in the way that English does.
If the languages I'm using for comparison seem random, that's because they are. I'm not a linguist, I've just picked up a bunch of random knowledge here and there.

So the literal translation for しつもんがあります would be "a question, or some number of questions, exists, or will exist." It's up to the reader or listener to understand that, in context, this means roughly the same thing as "I have a question."

Because of all this ambiguity, idiomatic Japanese relies heavily on inference and implication. And this is the problem with the argument that Ruby mirrors Japanese in any meaningful way, in my opinion. Makoto Inoue has an interesting argument which relates this aspect of Japanese to Ruby's functional characteristics. But my concern is that you could imagine a possible programming language which emulated this aspect of Japanese, and Ruby differs from that imaginary language.

For example:

@user = User.find

Here, we have a bit of Ruby which finds a user and has that user register for something. We have to say @user twice. If Ruby were more like Japanese, it might implicitly capture the return value of each expression, and apply the next line to that return value — or resolve scope in the next line while favoring the return value — so that you would never have to say @user explicitly at all.

For example:


To my knowledge, no programming language does this. You can do stuff like that with method chaining:


And there was a with statement in Pascal and early versions of JavaScript that allowed something kind of remotely similar, but more explicit:

with(foo = 5) { console.log(foo); }

But it's not the same. I don't expect any programming language to ever really copy this aspect of Japanese, for the same reason that Google Translate basically sucks at Japanese, and I kind of expect that it always will: Japanese relies on context, i.e., information from outside the sentence, and writing code which can infer things from context, the way humans can, is not an easy thing to do. It's probably impossible to develop a programming language which properly emulates this aspect of Japanese without solving very hard problems in artificial intelligence and/or machine learning.

Update: I got a lot of responses to this about possible hacks to implement this, equivalents to with in other languages, and similar stuff. I plan to update with a new blog post semi-soonish but it's going to be after a slight delay, because some of the responses involve languages I'm not at all familiar with, so I want to investigate a little before I post the update.