Developers
July 1, 2020

Facebook Develops Programming AI to Convert Code

Facebook’s latest innovation may significantly improve the process of converting applications from one programming language to another.

One of the biggest challenges companies face is updating applications and migrating them from older, legacy programming languages to modern options. In fact, older languages create hardship and expense for organizations around the world, as they struggle to maintain aging systems.

The state of New Jersey is one such example. New Jersey Governor Phil Murphy recently had to ask for volunteer programmers proficient in COBOL. Although COBOL is a 61-year-old programming language, many of the state’s systems run on older mainframes that rely on it. As the coronavirus pandemic has caused an unprecedented need for state services, COBOL programmers have been in high demand.

The Commonwealth Bank of Australia is another example. COBOL has long been a popular choice for the financial industry, but the bank wanted to upgrade and replace its legacy systems. Converting the platform from COBOL to Java ended up taking five years and costing some $750 million.

These are not isolated cases. In fact, a 2017 survey found that 35 percent of companies need workers with legacy programming skills. Unfortunately, as these older languages continue to lose popularity and the number of proficient programmers decreases, skilled developers can demand a premium, driving the cost of maintaining legacy systems up even more.

Facebook TransCoder AI

To help address this problem, researchers at Facebook developed TransCoder, a neural transcompiler: a system that uses a neural network to translate source code from one programming language to another.

Being able to automatically translate code from one language or environment to another has long been considered the holy grail of programming, especially in the context of artificial intelligence (AI). One of the big challenges with existing transcompiler methods is that they require expertise in both the source and target languages.

“Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one,” write Facebook’s researchers. “They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive.”

Another big issue is the labor-intensive, rule-based approach that existing systems use.

“Currently, the majority of transcompilation tools are rule-based; they essentially tokenize the input source code and convert it into an Abstract Syntax Tree (AST) on which they apply handcrafted rewrite rules,” the researchers continue. “Creating them requires a lot of time, and advanced knowledge in both the source and target languages. Moreover, translating from a dynamically-typed language (e.g. Python) to a statically-typed language (e.g. Java) requires to infer the variable types which is difficult (and not always possible) in itself.”
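
To make the rule-based approach concrete, here is a minimal sketch of the pipeline the researchers describe: parse the source into an AST, then apply handcrafted rewrite rules to emit target-language text. It uses Python's standard `ast` module to translate a tiny subset of Python expressions into Java-style expressions. The names `JAVA_BINOPS`, `to_java`, and `transpile_expression` are hypothetical and chosen for illustration; this is not how any particular commercial tool is implemented.

```python
import ast

# Hypothetical handcrafted rewrite rules: Python operator node -> Java operator text.
JAVA_BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/", ast.Mod: "%"}


def to_java(node: ast.AST) -> str:
    """Translate a small subset of Python expression nodes to Java source text."""
    if isinstance(node, ast.Expression):
        return to_java(node.body)
    if isinstance(node, ast.Constant):
        # Only plain numbers are covered; bool is excluded because it subclasses int.
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return repr(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp):
        left, right = to_java(node.left), to_java(node.right)
        if isinstance(node.op, ast.Pow):
            # Java has no power operator, so this rule rewrites to a library call.
            return f"Math.pow({left}, {right})"
        op = JAVA_BINOPS.get(type(node.op))
        if op is not None:
            # Always parenthesize so operator precedence survives the translation.
            return f"({left} {op} {right})"
    # Every construct without a rule needs a human to write one.
    raise NotImplementedError(f"no rewrite rule for {type(node).__name__}")


def transpile_expression(source: str) -> str:
    """Parse one Python expression and rewrite it as a Java expression."""
    tree = ast.parse(source, mode="eval")
    return to_java(tree)


if __name__ == "__main__":
    print(transpile_expression("a + b * 2"))
    print(transpile_expression("x ** 2"))
```

Even this toy shows both problems the quote raises. Each unsupported construct (function calls, floor division, strings) needs another hand-written rule, and the rule for `/` is only correct if the operands are floating-point values in Java: Python's `/` always performs true division, while Java's `/` on two `int` operands truncates. Choosing the right output therefore requires inferring variable types, which the rule alone cannot do.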

Enter Machine Learning

The researchers took a cue from neural machine translation (NMT), where machine learning and neural network models have made significant advances in live translation software. In some cases, NMT models have reached the point where even professional translators use them to some degree. Using a neural model allows TransCoder to convert code from one language to another without supervision, that is, without training on pairs of matching source and target code.

“Although never provided with parallel data, the model manages to translate functions with a high accuracy, and to properly align functions from the standard library across the three languages, outperforming rule-based and commercial baselines by a significant margin,” the researchers write. “Our approach is simple, does not require any expertise in the source or target languages, and can easily be extended to most programming languages. Although not perfect, the model could help to reduce the amount of work and the level of expertise required to successfully translate a codebase.”

Overall, the project was a success and showed what can be done when NMT techniques and AI are applied to even more complicated tasks.

“In this paper, we show that approaches of unsupervised machine translation can be applied to source code to create a transcompiler in a fully unsupervised way,” conclude the researchers. “TransCoder can easily be generalized to any programming language, does not require any expert knowledge, and outperforms commercial solutions by a large margin. Our results suggest that a lot of mistakes made by the model could easily be fixed by adding simple constraints to the decoder to ensure that the generated functions are syntactically correct, or by using dedicated architectures. Leveraging the compiler output or other approaches such as iterative error correction could also boost the performance.”
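
The idea of using syntax checks to catch model mistakes can be illustrated with a short sketch. The code below is not the researchers' method and does not constrain a decoder during generation; it is a simpler, post-hoc stand-in that assumes a model has already produced several ranked candidate translations into Python and keeps the first one that parses. The function names `is_valid_python` and `first_valid_candidate` are hypothetical.

```python
import ast
from typing import Iterable, Optional


def is_valid_python(source: str) -> bool:
    """Return True if the text parses as Python source code."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as null bytes on some Python versions.
        return False
    return True


def first_valid_candidate(candidates: Iterable[str]) -> Optional[str]:
    """Return the highest-ranked candidate that is syntactically valid.

    `candidates` is assumed to be ordered from most to least likely,
    as a beam search over a translation model might produce.
    """
    for candidate in candidates:
        if is_valid_python(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Made-up candidates for illustration; the first is missing a colon.
    hypotheses = [
        "def add_one(x) return x + 1",
        "def add_one(x):\n    return x + 1",
    ]
    print(first_valid_candidate(hypotheses))
```

A parse check like this only rules out syntactically invalid output. It says nothing about whether the translated function behaves like the original, which would still need to be verified, for example by running tests.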

The Future of Programming

While the use of AI has been steadily expanding across industries, programming has been largely immune—until now. With Facebook’s TransCoder AI, AI and machine learning take center stage, demonstrating what is possible.

It’s a safe bet that programming may never be the same again.

Tags: Programming Languages, AI, TransCoder, Facebook
Matt Milano
Technical Writer
Matt is a tech journalist and writer with a background in web and software development.
