Researchers at Facebook claim to have developed an artificially intelligent program that converts programming code from one high-level language to another.
Researchers are calling the program a ‘neural transcompiler.’ It uses unsupervised learning, meaning the program does not require a labelled, parallel training set; instead, it detects patterns in the data with minimal human supervision. The program can convert code between high-level languages such as C++, Java, and Python.
The conversion of code has proven to be an expensive and tedious, yet necessary task in the modern era of coding. More and more programs need to be converted to modern languages like C++ and Python to remain compatible with the software of today. Transcompilers can be very difficult to build because languages differ from one another in their syntax, and they also use different language-specific APIs, standard-library functions, and variable types.
Facebook’s newly developed system, TransCoder, uses an unsupervised learning approach. This approach bypasses the need to explicitly tell the program which construct in one language corresponds to which in another. TransCoder is initialised with cross-lingual language model pre-training, which maps equivalent instructions in two pieces of code onto one another. A process called denoising auto-encoding is also run; this trains the system to output valid sequences of code even when the input is corrupted by noise. Finally, a process called back-translation is used to generate parallel data for training purposes.
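To make the two unsupervised objectives more concrete, here is a minimal, illustrative sketch in Python. It uses toy stand-ins rather than a real neural model; the names `corrupt`, `back_translate`, and `dummy_py_to_cpp` are assumptions for illustration, not part of Facebook's actual system.

```python
import random

def corrupt(tokens, drop_prob=0.2):
    """Denoising auto-encoding: randomly drop tokens so the model must learn
    to reconstruct the original, valid sequence from a noisy input."""
    return [t for t in tokens if random.random() > drop_prob]

def back_translate(source_tokens, translate_src_to_tgt):
    """Back-translation: use the current (imperfect) source->target model to
    produce a synthetic target sequence, then pair it with the original source
    as training data for the target->source direction."""
    synthetic_target = translate_src_to_tgt(source_tokens)
    return synthetic_target, source_tokens  # (input, expected output) pair

# Example usage with placeholder data and a dummy "translator".
python_function = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]

noisy = corrupt(python_function)
print("Denoising pair:", noisy, "->", python_function)

dummy_py_to_cpp = lambda toks: ["int", "add", "(", "int", "a", ",", "int", "b", ")"]
print("Back-translation pair:", back_translate(python_function, dummy_py_to_cpp))
```

In a real system, the toy translator would be the model itself, so the synthetic parallel pairs improve as training progresses.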
After developing TransCoder, the researchers evaluated the program by gathering 852 parallel functions (a function is a block of code designed to perform a specific action) in C++, Java, and Python from the popular coding solutions website GeeksforGeeks. These functions were then fed into TransCoder and the accuracy of the results was measured.
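One way such an evaluation can work is to run the reference function and the translated function on the same inputs and check that their outputs match. The sketch below assumes this style of functional check; the example functions and test inputs are purely illustrative.

```python
def reference_max_of_two(a, b):      # original function, e.g. taken from GeeksforGeeks
    return a if a > b else b

def translated_max_of_two(a, b):     # hypothetical output of the transcompiler
    return max(a, b)

def computationally_equivalent(f, g, test_inputs):
    """Return True if both functions produce the same output on every test input."""
    return all(f(*args) == g(*args) for args in test_inputs)

tests = [(1, 2), (5, 3), (-4, -4)]
print(computationally_equivalent(reference_max_of_two, translated_max_of_two, tests))  # True
```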
Facebook has reported that while the results of the tests were not perfectly accurate, TransCoder demonstrated an understanding of the syntax of each language and correctly mapped many of their data structures and methods. It did, however, struggle to account for certain variable types during generation. Even so, it outperformed many manually built frameworks that rely on handwritten rewrite rules and expert knowledge.
“TransCoder can easily be generalized to any programming language, does not require any expert knowledge, and outperforms commercial solutions by a large margin,” said the authors of the TransCoder. “Our results suggest that a lot of mistakes made by the model could easily be fixed by adding simple constraints to the decoder to ensure that the generated functions are syntactically correct, or by using dedicated architectures.”