How to convert XML to JSON in Python – Step by Step guide

Posted by Marta on May 8, 2021 Viewed 12789 times

In this article I will share how you can convert an XML to JSON in Python and the different python modules available to achieve this.

Card image cap

In this article, I will share how you can convert an XML to JSON in Python. Converting XML can be useful if you work on an API that returns data in JSON format and the data source is in XML format.

The data source could be an XML file. It could also be another API that returns XML, for example, a SOAP API. Or the data source could be an HTML page from which you want to extract some information. These are just some possible scenarios where you will need to deal with XML.

Besides converting XML to JSON, I will also share a few more libraries and useful operations when manipulating XML. Let’s get started.

Convert python XML to JSON

Python doesn’t include any built-in library to work with XML. Therefore you will need to install it using pip. The library I will be using to work with XML is xmltodict. Execute the following command from your terminal to install this module:

pip install xmltodict

You will also use the json library; however, this module is a built-in module, so you got all the libraries you need now. The code to convert XML to JSON is quite simple, just two lines.

Here is the XML I will use for this example:

<employees>
	<employee>
  		<name>Dave</name>
        <role>Sale Assistant</role>
        <age>34</age>
    </employee>
</employees>

Here is the code:

import xmltodict, json

obj = xmltodict.parse("""
<employees>
	<employee>
  		<name>Dave</name>
        <role>Sale Assistant</role>
        <age>34</age>
    </employee>
</employees>
""")
print(json.dumps(obj))

Output:

{"employees": {"employee": {"name": "Dave", "role": "Sale Assistant", "age": "34"}}}

The xmltodict.parse() method will convert the XML to a python object that can then be converted to JSON.

Let’s say that you need to access the employee name.

print(obj["employees"]["employee"]['name'])

Output:

Dave

XML File to JSON

What if the XML you would like to convert it into a file. Here is how you can convert it.

data.xml

<employees>
	<employee>
  		<name>Dave</name>
        <role>Sale Assistant</role>
        <age>34</age>
    </employee>
</employees>

The code:

with open('data.xml', 'r') as myfile:
    obj = xmltodict.parse(myfile.read())
print(json.dumps(obj))

Output:

{"employees": {"employee": {"name": "Dave", "role": "Sale Assistant", "age": "34"}}}

Convert XML to Python Object

Another library used to convert XML is untangle. Run the following command in your terminal to install the library:

pip install untangle

Here is the code to convert an XML to a python object using untangle:

import untangle
obj = untangle.parse("""
    <employees>
	    <employee>
  		    <name>Dave</name>
            <role>Sale Assistant</role>
            <age>34</age>
        </employee>
    </employees>
""")
#Access the name
print(obj.employees.employee.name.cdata)

Output:

Dave

The untangle.parse() method also supports reading from a file:

import untangle
obj = untangle.parse('data.xml')
print(obj.employees.employee.name.cdata)

Output:

Dave

A nice thing about the untangle library is that it provides slightly more convenient access to the data, and supports file access.

#xmltodict
print(obj["employees"]["employee"]['name'])
#untangle
print(obj.employees.employee.name.cdata)

The downside is that you can’t convert the python object to JSON using the json library.

Pandas XML to JSON

Another module that can convert XML to JSON is Pandas. Pandas is a library mainly used in data science for data cleaning. Pandas supports reading for .csv and .json out of the box. Just with a single line, you can convert a CSV file to a dataframe.

XML reading is not supported out of the box, though. If you like to read XML, you will need to install the library pandas-read-xml. You can install it by running the following command on your terminal:

pip install pandas-read-xml

Once you installed the module, use the following code to convert from XML to JSON:

import pandas_read_xml as pdx
df = pdx.read_xml("data.xml")
print(df.to_json())

This module needs other modules to work(idna, numpy, chardet, pytz, six, python-dateutil, pandas, certifi, urllib3, requests, pyarrow); therefore, it takes up a lot more memory. If you only want to convert an XML file to JSON, using xmltodict is a lot more efficient.

Beautifulsoup XML to Json

Beautifulsoup is a library whose primary purpose is parsing HTML. Besides HTML, Beautifulsoup can parse XML using a third-party parser called lxml. First, you will need to install both the beautifulsoup and the lxml module, running the following commands:

pip install beautifulsoup4
pip install lxml

Once you installed both modules, use the following code to load the XML and convert it to JSON.

from bs4 import BeautifulSoup
import json

#Load xml
xml_parser = BeautifulSoup(open('data.xml'), 'xml')

#Extract relevant information
name = xml_parser.find('name').contents[0]
age = xml_parser.find('age').contents[0]
role = xml_parser.find('role').contents[0]

employee = {
    'name':name,
    'age': age,
    'role': role
}
print(json.dumps(employee))

Output

{"name": "Dave", "age": "34", "role": "Sale Assistant"}

What does this code do? Initially, It loads the XML and extracts the information to convert to JSON. Lastly, please put this information into a python dictionary and convert it to JSON using the json built-in module.

Python XML to Dict

The xmltodict.parse() is all you need to convert an XML to a Python Dict. Just importing the library as we have seen in a previous section, and then parse the XML. Here is a code example:

import xmltodict, json

obj = xmltodict.parse("""
<employees>
	<employee>
  		<name>Dave</name>
        <role>Sale Assistant</role>
        <age>34</age>
    </employee>
</employees>
""")
print(tytpe(obj))

Output:

<class 'collections.OrderedDict'>

Python deserialize XML

Deserialising means that you are converting a byte stream into a python object in memory. An example is loading the content of an XML file into memory. As we have seen, you can do this using several modules: xmltodict, untangle, pandas, and beautifulsoup. If you are only deserializing XML, xmltodict is the lighter and most straightforward option. Because although all modules allow you to do the conversion just with a few lines of codes, pandas and beautilsoup provide a lot more functionality, and therefore the modules take up a lot more memory.

XML Pretty Print in Python

Some of the libraries used in the previous sections provide “prettyfy” functionality. This capability is convenient since it makes the XML a lot more readable and manageable.

The beautifulsoup provides pretty print functionality. Here is an example of how to use it:

xml_parser = BeautifulSoup(open('data.xml'), 'xml')
print(xml_parser.prettify())

Please note that you will need to install the beautifulsoup module and lxml for this code to work. The output of the above code will be:

<?xml version="1.0" encoding="utf-8"?>
<employees>
 <employee>
  <name>
   Dave
  </name>
  <role>
   Sale Assistant
  </role>
  <age>
   34
  </age>
 </employee>
</employees>

The xmltodict module also provides ‘pretty print’ functionality. Here is an example of how to use it:

obj = xmltodict.parse("""
<employees>
	<employee>
  		<name>Dave</name>
        <role>Sale Assistant</role>
        <age>34</age>
    </employee>
</employees>
""")

print(xmltodict.unparse(obj, pretty=True))

Output:

<?xml version="1.0" encoding="utf-8"?>
<employees>
	<employee>
		<name>Dave</name>
		<role>Sale Assistant</role>
		<age>34</age>
	</employee>
</employees>

Conclusion

To summarise, there are a few modules available in Python that allow you to convert XML to JSON. Which one you choose depends on what you are trying to achieve. If you are converting XML to JSON and don’t need to analyze or do further parsing, I will choose xmltodict, which is the lighter option. However, if you need to extract some specific information or clean up the data extracted from the XML, pandas and beautifulsoup, it is more appropriate since you get all in one.

I hope this article was useful and thank you for reading, and supporting this blog. Happy coding!

Recommended articles

Project-Based Programming Introduction

Steady pace book with lots of worked examples. Starting with the basics, and moving to projects, data visualisation, and web applications

100% Recommended book for Java Beginners

Unique lay-out and teaching programming style helping new concepts stick in your memory

90 Specific Ways to Write Better Python

Great guide for those who want to improve their skills when writing python code. Easy to understand. Many practical examples

Grow Your Java skills as a developer

Perfect Boook for anyone who has an alright knowledge of Java and wants to take it to the next level.

Write Code as a Professional Developer

Excellent read for anyone who already know how to program and want to learn Best Practices

Every Developer should read this

Perfect book for anyone transitioning into the mid/mid-senior developer level

Great preparation for interviews

Great book and probably the best way to practice for interview. Some really good information on how to perform an interview. Code Example in Java