Posted by Marta on February 2, 2023
In this article, I will share with you how you can convert XML to JSON in Python. Converting XML to JSON can be useful if you work on an API that returns data in JSON format while the data source is in XML format.
For instance, the data source could be an XML file. It could also be another API that returns XML, like a SOAP API. Or the data source could be an HTML page from which you want to extract some information. These are just some possible scenarios where you will need to deal with XML.
Besides converting XML to JSON, I will also share a few more libraries and some useful operations for manipulating XML. Let’s get started.
One of the possibilities is using xmltodict. This is a third-party library, meaning you will need to install it using pip. To install the library, run the following command from your terminal:
pip install xmltodict
You will also use the json library; however, it is a built-in module, so you now have all the libraries you need. The code to convert XML to JSON is quite simple, just two lines.
Here is the XML I will use for this example:
<employees>
  <employee>
    <name>Dave</name>
    <role>Sale Assistant</role>
    <age>34</age>
  </employee>
</employees>
Here is the code:
import json

import xmltodict

# Parse the XML string into a Python dictionary
obj = xmltodict.parse("""
<employees>
  <employee>
    <name>Dave</name>
    <role>Sale Assistant</role>
    <age>34</age>
  </employee>
</employees>
""")
# Serialize the dictionary to a JSON string
print(json.dumps(obj))
Output:
{"employees": {"employee": {"name": "Dave", "role": "Sale Assistant", "age": "34"}}}
The xmltodict.parse() method will convert the XML to a Python object that can then be converted to JSON.
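Notice that every value in the output above is a string, including the age. If you would rather get numbers in the JSON, one option is the postprocessor argument of xmltodict.parse(), which receives each key and value as they are parsed. Here is a minimal sketch; the numbers_to_int helper is a name I made up for this example, and it only handles plain non-negative integers:

```python
import json

import xmltodict

XML = """
<employees>
  <employee>
    <name>Dave</name>
    <role>Sale Assistant</role>
    <age>34</age>
  </employee>
</employees>
"""


def numbers_to_int(path, key, value):
    """Hypothetical postprocessor: turn digit-only strings into integers."""
    if isinstance(value, str) and value.isdigit():
        return key, int(value)
    # Leave nested dictionaries and non-numeric text untouched
    return key, value


obj = xmltodict.parse(XML, postprocessor=numbers_to_int)
# indent=2 makes the JSON easier to read
pretty_json = json.dumps(obj, indent=2)
print(pretty_json)
```

With this hook, the age should be serialized as a JSON number instead of a string.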
Let’s say that you need to access the employee name.
print(obj["employees"]["employee"]['name'])
Output:
Dave
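One thing to watch out for with this kind of access: when an element appears only once, xmltodict gives you a dictionary, but when it repeats, you get a list. The force_list argument of xmltodict.parse() makes the result a list in both cases. The sketch below assumes a second employee ("Ana"), which is sample data I added for illustration, and employee_names is a hypothetical helper name:

```python
import xmltodict

ONE_EMPLOYEE = "<employees><employee><name>Dave</name></employee></employees>"
TWO_EMPLOYEES = (
    "<employees>"
    "<employee><name>Dave</name></employee>"
    "<employee><name>Ana</name></employee>"  # made-up sample data
    "</employees>"
)


def employee_names(xml_text):
    """Return every employee name, whether there is one employee or many."""
    # force_list guarantees obj["employees"]["employee"] is always a list
    obj = xmltodict.parse(xml_text, force_list=("employee",))
    return [employee["name"] for employee in obj["employees"]["employee"]]


print(employee_names(ONE_EMPLOYEE))
print(employee_names(TWO_EMPLOYEES))
```

This way the code that reads the names does not need to check whether it received a dictionary or a list.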
What if the XML you would like to convert is in a file? Here is how you can convert it.
data.xml
<employees>
  <employee>
    <name>Dave</name>
    <role>Sale Assistant</role>
    <age>34</age>
  </employee>
</employees>
The code:
import json

import xmltodict

# Read the whole file and parse its content
with open('data.xml', 'r') as myfile:
    obj = xmltodict.parse(myfile.read())
print(json.dumps(obj))
Output:
{"employees": {"employee": {"name": "Dave", "role": "Sale Assistant", "age": "34"}}}
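If you also want to save the result, the same idea extends to writing a JSON file. The following is only a sketch: xml_file_to_json_file is a hypothetical helper name, and it passes the open file (in binary mode) straight to xmltodict.parse(), which accepts file-like objects as well as strings.

```python
import json

import xmltodict


def xml_file_to_json_file(xml_path, json_path):
    """Hypothetical helper: convert one XML file into one JSON file."""
    # Binary mode lets the XML parser honor the encoding declared in the file
    with open(xml_path, "rb") as xml_file:
        obj = xmltodict.parse(xml_file)
    with open(json_path, "w", encoding="utf-8") as json_file:
        json.dump(obj, json_file, indent=2)
    return obj
```

For the example above, you would call xml_file_to_json_file('data.xml', 'data.json'), where data.json is an output file name I picked for illustration.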
Another library used to convert XML is untangle. Run the following command in your terminal to install the library:
pip install untangle
Here is the code to convert an XML to a Python object using untangle:
import untangle

obj = untangle.parse("""
<employees>
  <employee>
    <name>Dave</name>
    <role>Sale Assistant</role>
    <age>34</age>
  </employee>
</employees>
""")
# Access the name
print(obj.employees.employee.name.cdata)
Output:
Dave
The untangle.parse() method also supports reading from a file:
import untangle

obj = untangle.parse('data.xml')
print(obj.employees.employee.name.cdata)
Output:
Dave
A nice thing about the untangle library is that it provides slightly more convenient access to the data, and it can read directly from a file path.
# xmltodict
print(obj["employees"]["employee"]['name'])
# untangle
print(obj.employees.employee.name.cdata)
The downside is that you can’t convert the Python object to JSON directly using the json library, because untangle returns its own element objects rather than dictionaries and lists.
Another module that can convert XML to JSON is Pandas. Pandas is a library mainly used in data science for data analysis and cleaning. Pandas supports reading .csv and .json files out of the box. With just a single line, you can convert a CSV file to a dataframe.
XML reading is not supported out of the box in older versions of pandas, though. If you would like to read XML there, you will need to install the library pandas-read-xml. You can install it by running the following command in your terminal:
pip install pandas-read-xml
Once you have installed the module, use the following code to convert from XML to JSON:
import pandas_read_xml as pdx

# Load the XML file into a dataframe, then serialize it to JSON
df = pdx.read_xml("data.xml")
print(df.to_json())
This module needs other modules to work (idna, numpy, chardet, pytz, six, python-dateutil, pandas, certifi, urllib3, requests, pyarrow); therefore, it has a much larger footprint. If you only want to convert an XML file to JSON, using xmltodict is a lot more efficient.
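Worth knowing: pandas 1.3 and later ship their own pandas.read_xml() function, so on a recent installation the extra package may not be needed. Here is a minimal sketch under that assumption. I pass parser="etree" so that the standard library parser is used instead of lxml, and I read from an in-memory string to keep the example self-contained.

```python
import io

import pandas as pd

XML = """<employees>
  <employee>
    <name>Dave</name>
    <role>Sale Assistant</role>
    <age>34</age>
  </employee>
</employees>"""

# Assumes pandas >= 1.3; each <employee> element becomes one dataframe row
df = pd.read_xml(io.StringIO(XML), parser="etree")

# orient="records" produces one JSON object per row
records_json = df.to_json(orient="records")
print(records_json)
```

Unlike the xmltodict result, the dataframe infers column types, so the age should come out as a number rather than a string.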
BeautifulSoup is a library whose primary purpose is parsing HTML. Besides HTML, BeautifulSoup can parse XML using a third-party parser called lxml. First, you will need to install both the beautifulsoup4 and the lxml modules by running the following commands:
pip install beautifulsoup4
pip install lxml
Once you have installed both modules, use the following code to load the XML and convert it to JSON.
import json

from bs4 import BeautifulSoup

# Load the XML (the 'xml' parser relies on lxml)
with open('data.xml') as xml_file:
    xml_parser = BeautifulSoup(xml_file, 'xml')

# Extract the relevant information
name = xml_parser.find('name').contents[0]
age = xml_parser.find('age').contents[0]
role = xml_parser.find('role').contents[0]

employee = {
    'name': name,
    'age': age,
    'role': role
}
print(json.dumps(employee))
Output:
{"name": "Dave", "age": "34", "role": "Sale Assistant"}
What does this code do? First, it loads the XML and extracts the information you want to convert to JSON. Then, it puts this information into a Python dictionary and converts it to JSON using the built-in json module.
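The same extract-then-build-a-dictionary pattern also works without any third-party package. As a point of comparison, here is a sketch using xml.etree.ElementTree from the standard library (a swapped-in parser, not BeautifulSoup). The second employee ("Ana") is made-up sample data, and employees_to_json is a hypothetical helper name.

```python
import json
import xml.etree.ElementTree as ET

XML = """<employees>
  <employee><name>Dave</name><role>Sale Assistant</role><age>34</age></employee>
  <employee><name>Ana</name><role>Manager</role><age>41</age></employee>
</employees>"""


def employees_to_json(xml_text):
    """Extract name, age and role of every <employee> and return JSON."""
    root = ET.fromstring(xml_text)
    employees = [
        {
            "name": employee.findtext("name"),
            "age": employee.findtext("age"),
            "role": employee.findtext("role"),
        }
        for employee in root.iter("employee")
    ]
    return json.dumps(employees)


print(employees_to_json(XML))
```

Because it loops over every employee element, this version keeps working when the file contains more than one employee.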
The xmltodict.parse() method is all you need to convert an XML to a Python dict. Just import the library, as we have seen in a previous section, and then parse the XML. Here is a code example:
import xmltodict

obj = xmltodict.parse("""
<employees>
  <employee>
    <name>Dave</name>
    <role>Sale Assistant</role>
    <age>34</age>
  </employee>
</employees>
""")
# Check the type of the parsed object
print(type(obj))
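Output (with older versions of xmltodict; recent versions running on Python 3.7 or later print <class 'dict'> instead):
<class 'collections.OrderedDict'>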
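Real XML often carries attributes as well as text. In the resulting dictionary, xmltodict stores attributes under keys prefixed with "@" and the element text under "#text". A small sketch, where the id and lang attributes are made-up sample data:

```python
import json

import xmltodict

# The id and lang attributes are hypothetical, added only for this example
XML = '<employees><employee id="7"><name lang="en">Dave</name></employee></employees>'

obj = xmltodict.parse(XML)
employee = obj["employees"]["employee"]

print(employee["@id"])            # attribute of <employee>
print(employee["name"]["#text"])  # text of <name>, which also has an attribute
print(json.dumps(obj))
```

Keep this in mind when reading values: an element with attributes becomes a nested dictionary instead of a plain string.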
Deserialising means that you are converting a byte stream into a Python object in memory. An example is loading the content of an XML file into memory. As we have seen, you can do this using several modules: xmltodict, untangle, pandas, and beautifulsoup. If you are only deserialising XML, xmltodict is the lightest and most straightforward option. Although all of these modules allow you to do the conversion with just a few lines of code, pandas and beautifulsoup provide a lot more functionality, and therefore they are much heavier.
Some of the libraries used in the previous sections provide “prettify” functionality. This capability is convenient since it makes the XML a lot more readable and manageable.
The beautifulsoup module provides pretty print functionality. Here is an example of how to use it:
from bs4 import BeautifulSoup

with open('data.xml') as xml_file:
    xml_parser = BeautifulSoup(xml_file, 'xml')
print(xml_parser.prettify())
Please note that you will need to install the beautifulsoup module and lxml for this code to work. The output of the above code will be:
<?xml version="1.0" encoding="utf-8"?>
<employees>
 <employee>
  <name>
   Dave
  </name>
  <role>
   Sale Assistant
  </role>
  <age>
   34
  </age>
 </employee>
</employees>
The xmltodict module also provides ‘pretty print’ functionality. Here is an example of how to use it:
import xmltodict

obj = xmltodict.parse("""
<employees>
  <employee>
    <name>Dave</name>
    <role>Sale Assistant</role>
    <age>34</age>
  </employee>
</employees>
""")
# Convert the dictionary back to indented XML
print(xmltodict.unparse(obj, pretty=True))
Output:
<?xml version="1.0" encoding="utf-8"?>
<employees>
	<employee>
		<name>Dave</name>
		<role>Sale Assistant</role>
		<age>34</age>
	</employee>
</employees>
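If you don’t want to install anything, the standard library can pretty print XML too. This is a sketch with xml.dom.minidom (not one of the libraries covered above); pretty_xml is a hypothetical helper name. It works best on compact XML, since input that is already indented tends to come out with extra blank lines.

```python
from xml.dom import minidom

COMPACT_XML = (
    "<employees><employee>"
    "<name>Dave</name><role>Sale Assistant</role><age>34</age>"
    "</employee></employees>"
)


def pretty_xml(xml_text, indent="  "):
    """Return an indented version of a compact XML string."""
    return minidom.parseString(xml_text).toprettyxml(indent=indent)


print(pretty_xml(COMPACT_XML))
```

The indent argument controls how many spaces each nesting level gets.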
To summarise, there are a few modules available in Python that allow you to convert XML to JSON. Which one you choose depends on what you are trying to achieve. If you are converting XML to JSON and don’t need to analyse or do further parsing, I would choose xmltodict, which is the lightest option. However, if you need to extract some specific information or clean up the data extracted from the XML, pandas and beautifulsoup are more appropriate since you get everything in one package.
I hope this article was useful. Thank you for reading and supporting this blog. Happy coding!