
Modeling the evolution of programming languages

Posted September 4, 2014
There are hundreds of programming languages, but they are not just separate mechanisms for more of the same: each language leaves an imprint on technological evolution, and scientists have now learned how to trace it.

Image credit: Sergi Valverde, Ricard Solé.

Darwin’s notion of natural selection is undoubtedly one of the most important ideas in the history of science. Because it is so powerful, explaining a great deal of the complexity intrinsic to biology, many wonder whether the same or a similarly elegant idea could explain cultural change as well.

Darwin himself was interested in the relation between natural and human-driven change. Although many ideas from evolutionary biology have been applied in studying cultural change, there’s still a lot of debate as to whether natural and cultural phenomena evolve similarly.

One reason the debate persists is that it is not easy to compare the two and say definitively where and how they differ, because cultural evolution and technological innovation have not been modelled as extensively as biological change. This is mainly because cultural phenomena lack a “genome” that could serve as a measure of change. The exception is natural language, for which researchers in formal linguistics have devised numerous metrics accounting for grammatical, pragmatic, phonetic, orthographic and semantic changes.

Other cultural phenomena, such as technological innovation, are much more difficult to quantify and measure adequately.

That is why Sergi Valverde and Ricard Solé of the Santa Fe Institute built a graph-based evolutionary model of programming languages. The aim was to formulate a general theory of technological innovation, that is, to answer questions such as “what drives technological progress?” and “why do some ideas adapt while others die quickly?” The evolution of programming languages is a good indicator of innovation dynamics and thus serves as a good starting point.

Valverde and Solé sampled 347 programming languages spanning the period from 1952, the early days of programming, to 2010. Some of these languages are still in use today, many have died out, and the model offers some clues as to why.

The model itself is based on a simple idea: languages influence one another. The task was to trace these influences from the earliest programming languages to the latest and to measure the overall influence attributable to each one. A language's influence was measured by how it is connected both to the languages that came before it and to the later languages it influenced. The idea is simple, yet the results are consistent with what is already known about programming languages.
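To make the idea concrete, here is a minimal sketch in Python of an influence graph and one very crude influence score: the number of later languages reachable from a given language. This is not the measure used in the study, which is more elaborate. The handful of links below is a simplified, hand-picked illustration rather than the researchers' dataset, and the names `INFLUENCES`, `build_graph` and `descendants` are invented for this example.

```python
from collections import defaultdict

# Illustrative influence links (influencer -> influenced). A tiny, simplified
# sample for demonstration only, NOT the 347-language dataset from the study.
INFLUENCES = [
    ("Speedcoding", "Fortran"),
    ("Fortran", "ALGOL"),
    ("ALGOL", "C"),
    ("C", "C++"),
    ("IPL", "Lisp"),
    ("Lisp", "Scheme"),
    ("ALGOL", "Scheme"),
]


def build_graph(edges):
    """Map each language to the set of languages it directly influenced."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
        graph[target]  # touch the key so languages with no successors are nodes too
    return graph


def descendants(graph, language):
    """Return every language reachable from `language` along influence links."""
    seen = set()
    stack = [language]
    while stack:
        current = stack.pop()
        for child in graph[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


graph = build_graph(INFLUENCES)

# Crude influence score (an assumption of this sketch): count of descendants.
reach = {language: len(descendants(graph, language)) for language in graph}

for language, score in sorted(reach.items(), key=lambda item: -item[1]):
    print(f"{language}: {score}")
```

In this toy sample the oldest languages score highest simply because everything downstream counts toward them, which is one reason a realistic measure also has to weigh how a language relates to its predecessors.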

First of all, with languages represented as graph nodes and influences as edges, subgraphs readily form along lineages of influence. The largest subgraph comprises 197 so-called procedural languages. Their lineage starts with Speedcoding, the oldest language in the database and a direct ancestor of Fortran, a language still used today in scientific computing.

The next biggest subgraph is made up of declarative languages, whose origins are associated with the birth of artificial intelligence in the 1950s. The difference between the two is that procedural languages are more suitable for engineering, or perhaps more engineer-like, and are tied more closely to specific details of the hardware used. Declarative languages, whose lineage starts with IPL and takes off with Lisp, instead allow programmers to write code in terms of the scientific domain, with no regard for the inner workings of the machine itself.

Some historians of science associate procedural programming with the idea of a Turing machine, a conceptual computing device that laid the foundations for the digital computer, and declarative programming with lambda calculus, a formalism for computable functions. Both serve the same purpose of defining what is computable, but they differ in their level of conceptual abstraction.

Most modern programming languages are a mix of the two paradigms. Some time in the 1980s the mix formed a new, so-called object-oriented group that takes the best of both worlds, and thanks to this medley we have Python, Perl, Java, C++ and other major programming languages.

But besides adequately modeling the easily explainable clustering of programming languages, Valverde and Solé’s model offers some new insights as well. An important observation is that the links of influence clearly run from declarative to procedural (imperative) languages but not the other way around. In other words, the model suggests that the Lisp-like programming languages largely influenced the family stemming from Fortran, but were not clearly affected by the latter themselves.
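An asymmetry of this kind is easy to check once every language carries a family label: count the influence links that cross between families, separately for each direction. The sketch below assumes such labels exist. The language names, labels and links are hypothetical placeholders invented for illustration, not data from the study, and `cross_family_counts` is a made-up helper name.

```python
from collections import Counter

# Hypothetical family labels; placeholder names, not real languages.
FAMILY = {
    "decl_1": "declarative",
    "decl_2": "declarative",
    "proc_1": "procedural",
    "proc_2": "procedural",
    "proc_3": "procedural",
}

# Hypothetical influence links (influencer -> influenced), invented for the example.
LINKS = [
    ("decl_1", "decl_2"),
    ("proc_1", "proc_2"),
    ("decl_1", "proc_2"),
    ("decl_2", "proc_3"),
    ("proc_2", "proc_3"),
]


def cross_family_counts(links, family):
    """Count links whose endpoints belong to different families, per direction."""
    counts = Counter()
    for source, target in links:
        source_family = family[source]
        target_family = family[target]
        if source_family != target_family:
            counts[(source_family, target_family)] += 1
    return counts


counts = cross_family_counts(LINKS, FAMILY)
print("declarative -> procedural:", counts[("declarative", "procedural")])
print("procedural -> declarative:", counts[("procedural", "declarative")])
```

A strongly lopsided pair of counts on real data would correspond to the one-way flow of ideas described above.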

Moreover, some of the lineages go on from the very first decade of coding up until today. This suggests that some of the early ideas in computing were not useless embryos but rather innovations that were ahead of their time.

But the real merit of Valverde and Solé’s model is that the simple idea of influence trees can be extended to many other artifacts of technological and cultural innovation, well beyond programming languages, which serve only as an example of highly dynamic change in a seemingly homogeneous field of ideas.

Article: Sergi Valverde, Ricard Solé, “Punctuated Equilibrium in the Large Scale Evolution of Programming Languages”.
