Posted by Marta on March 22, 2023 Viewed 46342 times
In this article, you will learn how you can convert an XML to CSV in Python. I will show you how to convert from XML to CSV in different ways using different libraries.
XML is a format that is slowly fading out and being replaced by JSON, however it is still present and used mainly in old platforms.
Converting XML to CSV generally involves the following steps:
In this tutorial, we will work with a file containing employees, where each employee has a name, a role and age. See xml file below:
<employees> <employee> <name>Dave</name> <role>Sale Assistant</role> <age>34</age> </employee> <employee> <name>Robert</name> <role>Product Expert</role> <age>38</age> </employee> <employee> <name>Jane</name> <role>Marketing Manager</role> <age>31</age> </employee> </employees>
Let see the details to code this task using three different Python modules. Let’s get started.
Watch this tutorial on Youtube
Firstly we will see an approach for converting XML to CSV which doesn’t require you to install any extra libraries. Python includes a module to manipulate XML called xml.etree.ElementTree
. This module will allow you to easily parse and create XML files. See official documentation here.
The code below will import the xml data, extract the employees details( name, role and age) and create a row in the CSV file for each employee. To do so we will use three essential methods:
ElementTree.parse("<REPLACE BY XML FILE PATH">)
: to import the xml data into your program. .findall("<REPLACE_BY_XML_TAG>")
: it will find all elements with a given tag that are direct children of the current element..find("<REPLACE_BY_XML_TAG>")
: This method will find the first child element with the a given tag.from xml.etree import ElementTree import csv # PARSE XML xml = ElementTree.parse("data.xml") # CREATE CSV FILE csvfile = open("data.csv",'w',encoding='utf-8') csvfile_writer = csv.writer(csvfile) # ADD THE HEADER TO CSV FILE csvfile_writer.writerow(["name","role","age"]) # FOR EACH EMPLOYEE for employee in xml.findall("employee"): if(employee): # EXTRACT EMPLOYEE DETAILS name = employee.find("name") role = employee.find("role") age = employee.find("age") csv_line = [name.text, role.text, age.text] # ADD A NEW ROW TO CSV FILE csvfile_writer.writerow(csv_line) csvfile.close()
The output is a file called data.csv
, generated in the same folder as your python program, containing the following:
name,role,age Dave,Sale Assistant,34 Robert,Product Expert,38 Jane,Marketing Manager,31
Converting an XML file to a Python object is an alternative approach to working with XML. It enables you to circumvent the XML processing, therefore you would need to search elements throughout the file. By utilizing the xmltodict
module to parse the XML, you can achieve the same outcome as the previous code demonstrated, using xml.etree instead.
The xmltodict
is a third-party module, meaning you will need to install it separately. To do so, run the following command from your terminal:
>> pip install xmltodict
The module xmltodict
offers a .parse()
which will read the xml and then convert it to an ordered dictionary so the data is easily accesible.
import xmltodict, csv # PARSE XML FILE with open("data.xml") as xmlfile: xml = xmltodict.parse(xmlfile.read()) # CREATE CSV FILE csvfile = open("data2.csv",'w',encoding='utf-8') csvfile_writer = csv.writer(csvfile) # ADD HEADER csvfile_writer.writerow(["name","role","age"]) #FOR EACH EMPLOYEE for employee in xml["employees"]["employee"]: # EXTRACT EMPLOYEE DETAILS csv_line = [employee["name"], employee["role"],employee["age"]] # ADD A NEW ROW TO CSV FILE csvfile_writer.writerow(csv_line)
The output of the above code snippet is a file called data2.csv
containing the employees information.
Another useful thing to know about the xmltodict
module is that it allows you to convert a dictionary to xml.
Lastly you can convert XML to CSV using a Python module called untangle. Similarly to the xmltodict
, the untangle
module parses an XMl document and make the data easily accessible.
untangle
is a third-party module as well, therefore you will need to install it separetaly usign pip as follows:
>> pip install untangle
Below there is a code snippet that will import the content of an XML file, create a CSV file, and then extract the data in the XML file and write in the CSV, using the untangle
module.
import untangle, csv # PARSE XML FILE xml = untangle.parse("data.xml") # CREATE CSV FILE csvfile = open("data3.csv",'w',encoding='utf-8') csvfile_writer = csv.writer(csvfile) # ADD HEADER csvfile_writer.writerow(["name","role","age"]) # FOR EACH EMPLOYEE for employee in xml.employees.employee: # EXTRACT EMPLOYEE DETAILS csv_line = [employee.name.cdata, employee.role.cdata, employee.age.cdata] # ADD NEW ROW TO CSV FILE csvfile_writer.writerow(csv_line)
Since the approaches to convert xml and csv are quite similar, which one should you choose? Choosing the fastest is always a good idea.
To compare these three modules, I have created an XML file with 600 employees and get the running time in milliseconds for each module. See the results below:
xml.tree | xmltodict | untangle | |
1st time | 8.52783203125 | 20.31787109375 | 243.1640625 |
2nd time | 4.681884765625 | 18.230224609375 | 79.925048828125 |
3rd time | 5.751953125 | 18.85791015625 | 108.099853515625 |
4th time | 8.52001953125 | 19.294921875 | 94.8759765625 |
5th time | 6.883056640625 | 18.343017578125 | 95.56103515625 |
6th time | 6.980224609375 | 20.738037109375 | 76.714111328125 |
As you can see the xml.tree
module is at least twice faster than the xmltodict
module, and much faster than the untangle
module.
To summarise, there are a few modules available in Python that allow you to convert XML to CSV. Which one you choose depends on what you are trying to achieve. If you are only converting XML to CSV and I will recommend to use xml.etree
built-in python module which provides the best performance. However, if you need to also create an XML file, then the xmltodict
is a really handy tool.
I hope this article was useful and thank you for reading, and supporting this blog. Happy coding!
Steady pace book with lots of worked examples. Starting with the basics, and moving to projects, data visualisation, and web applications
Unique lay-out and teaching programming style helping new concepts stick in your memory
Great guide for those who want to improve their skills when writing python code. Easy to understand. Many practical examples
Perfect Boook for anyone who has an alright knowledge of Java and wants to take it to the next level.
Excellent read for anyone who already know how to program and want to learn Best Practices
Perfect book for anyone transitioning into the mid/mid-senior developer level
Great book and probably the best way to practice for interview. Some really good information on how to perform an interview. Code Example in Java