
Evaluating the ChatGPT Code Interpreter

Plugins such as OpenAI's ChatGPT Code Interpreter have emerged as powerful tools that let developers explore, troubleshoot, and gain deeper insight into their code. In this article, we will examine the capabilities and constraints of OpenAI's new Code Interpreter plugin. In a subsequent post, we will introduce an alternative open-source approach to dynamic code execution using a Docker container running locally.

Enable the Plugin

To enable the Code Interpreter plugin, you will need a ChatGPT Plus account, which costs $20/month. Keep in mind that OpenAI currently limits GPT-4 chats to 25 messages per 3-hour period.

The plugin is enabled by going into Settings and then Beta Features:

Ensure the Code Interpreter slider is enabled.

Getting Data into the Interpreter

The Code Interpreter plugin combines LLM-generated Python with a sandboxed Python execution environment. Code generated by the LLM can access information local to the sandbox, but it cannot connect to the Internet to download or access external data. We'll put this to the test in a minute. For the interpreter to operate on your data, you'll need to upload it first.

For this post, we used GPT-3.5-Turbo to generate a list of animals with their average height listed in meters:
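The generated list isn't reproduced in full here, but a few illustrative rows of such a file might look like the following (the column names and values are assumptions for this sketch, not the actual generated data):

```csv
animal,average_height_m
Giraffe,5.5
African Elephant,3.3
Horse,1.6
```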

After creating a local file and (correctly) naming it `animals.csv`, we create a new chat and select GPT-4 with Code Interpreter.

We use the + sign in the chat entry box to add the file:

Writing Code with Words

We now tell the interpreter what we want done with the file. In this case we want to graph the heights of the animals, with the names oriented vertically so they are readable. The first time we tried this, our prompt didn't ask to deduplicate the list of animals (there were duplicates in the file) and didn't ask to rotate the names vertically, which caused the labels at the bottom of the graph to overlap.

We then wrote an updated prompt, which ended up including a small amount of related code from a Google search for "using matplotlib to generate a bar chart with vertical labels". We pasted the updated request, plus the code, into the chat along with the uploaded CSV file:
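The code the interpreter ultimately ran isn't reproduced here, but a minimal sketch of the approach it took, deduplicating the rows and rotating the x-axis labels, might look like this (the column names `animal` and `average_height_m` are assumptions):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the uploaded CSV (column names are assumed for this sketch)
df = pd.read_csv("animals.csv")

# Drop duplicate animal entries so each bar appears only once
df = df.drop_duplicates(subset="animal")

# Bar chart of average height per animal
plt.figure(figsize=(12, 6))
plt.bar(df["animal"], df["average_height_m"])
plt.ylabel("Average height (m)")
plt.xticks(rotation=90)  # orient the animal names vertically so they don't overlap
plt.tight_layout()
plt.show()
```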

Just like in regular programming, sometimes we need "hints" from the Internet to complete our objective. Here the LLM was able to take what we said and integrate it into an approach that produces vertical labels for the graph. After a bit of waiting, we get the following result:

Trying to Break Things

At this point we start wondering whether we can do something more with the Python interpreter the system is running. So we make a request to fetch something over the Internet, in this case asking the system to fetch the top stories from Hacker News:
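The kind of code such a prompt produces would look roughly like the sketch below, which uses the public Hacker News Firebase API; inside the sandbox the HTTP requests would fail because there is no outbound network access. This is illustrative, not the code the model actually generated:

```python
import requests

# Fetch the IDs of the current top stories from the Hacker News API
# (this fails inside the sandbox, which has no network access)
top_ids = requests.get(
    "https://hacker-news.firebaseio.com/v0/topstories.json", timeout=10
).json()

# Fetch and print the title of each of the first five stories
for story_id in top_ids[:5]:
    story = requests.get(
        f"https://hacker-news.firebaseio.com/v0/item/{story_id}.json", timeout=10
    ).json()
    print(story["title"])
```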

The LLM tells us it is unable to run the code in the interpreter environment, presumably because it doesn't have access to the network. We then consider "encrypting" the request and putting it in an eval() statement. We start by asking it to write something that it then evals:
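As a rough sketch of that first experiment, the idea is simply to have the model build a string of Python and pass it to eval() at runtime; something as small as this illustrates the pattern (the expression used here is our own example):

```python
# Build a trivial expression as a string and evaluate it at runtime.
# eval() handles expressions only; exec() would be needed for statements.
expression = "sum(range(10))"
result = eval(expression)
print(result)  # 45
```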

That looks promising, so now we take it a step further and ask it to decode a block of Python that has been "encrypted" with base64 encoding to ping Google:
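The exact payload we submitted isn't reproduced here, but a base64-"encrypted" snippet along those lines could be constructed roughly like this (the specific ping command is an assumption for the sketch):

```python
import base64

# The "plaintext" payload: shell out to ping Google once
plaintext = (
    "import os\n"
    "status = os.system('ping -c 1 www.google.com')\n"
    "print('success' if status == 0 else 'failure')\n"
)

# Base64-encode it, producing the string to paste into the chat
encoded = base64.b64encode(plaintext.encode()).decode()
print(encoded)

# Decoding it (what we asked the interpreter to do) recovers the original code
decoded = base64.b64decode(encoded).decode()
print(decoded)
```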

We get the following response:

gpt-4> As expected, this code attempts to use the os.system() function to ping www.google.com, and then prints a success message if the ping is successful, or a failure message otherwise. As I mentioned earlier, executing this code in this environment is not possible due to security restrictions.

We might continue down this path and try to obfuscate the request further, but we're not doing pen testing here. We've seen enough to conclude that OpenAI has taken precautions against these types of approaches and is using GPT-4 to help mitigate attack vectors. There's no sense getting crazy with this and getting in trouble, at least for this blog post.

Wrapping Up

In conclusion, OpenAI's ChatGPT Code Interpreter plugin provides developers with a user-friendly and secure environment to interactively work with code. By integrating natural language prompts with code execution, the plugin enables developers to explore, troubleshoot, and gain insights into their code more effectively.

While the plugin has limitations, such as the inability to access the internet, it strikes a balance between functionality and security. As AI continues to advance, tools like the ChatGPT Code Interpreter showcase the potential of combining natural language processing and code execution to enhance developer workflows and drive innovation.
