How to convert XML to CSV in Python – Step by Step guide

Posted by Marta on October 9, 2021 Viewed 1593 times

Learn to convert XML to CSV in Python using three different Python modules: xml.etree, xmltodict and untangle. Find out which one is more efficient.

Card image cap

In this article, you will learn how you can convert an XML to CSV in Python. I will show you how to convert from XML to CSV in different ways using different libraries.

XML is a format that is slowly fading out and being replaced by JSON, however it is still present and used mainly in old platforms.

Converting XML to CSV generally involves the following steps:

  • Parse the XML file
  • Create a new CSV
  • Add a header to the CSV file, with the fields that should be included
  • For each xml element, extract the relevant fields and add new row to the CSV file.

In this tutorial, we will work with a file containing employees, where each employee has a name, a role and age. See xml file below:

<employees>
    <employee>
        <name>Dave</name>
        <role>Sale Assistant</role>
        <age>34</age>
    </employee>
    <employee>
        <name>Robert</name>
        <role>Product Expert</role>
        <age>38</age>
    </employee>
    <employee>
        <name>Jane</name>
        <role>Marketing Manager</role>
        <age>31</age>
    </employee>
</employees>

Let see the details to code this task using three different Python modules. Let’s get started.

Convert XML to CSV in plain Python

Firstly we will see an approach for converting XML to CSV which doesn’t require you to install any extra libraries. Python includes a module to manipulate XML called xml.etree.ElementTree. This module will allow you to easily parse and create XML files. See official documentation here.

The code below will import the xml data, extract the employees details( name, role and age) and create a row in the CSV file for each employee. To do so we will use three essential methods:

  • ElementTree.parse("<REPLACE BY XML FILE PATH">): to import the xml data into your program.
  • .findall("<REPLACE_BY_XML_TAG>"): it will find all elements with a given tag that are direct children of the current element.
  • .find("<REPLACE_BY_XML_TAG>"): This method will find the first child element with the a given tag.
from xml.etree import ElementTree
import csv

# PARSE XML
xml = ElementTree.parse("data.xml")

# CREATE CSV FILE
csvfile = open("data.csv",'w',encoding='utf-8')
csvfile_writer = csv.writer(csvfile)

# ADD THE HEADER TO CSV FILE
csvfile_writer.writerow(["name","role","age"])

# FOR EACH EMPLOYEE
for employee in xml.findall("employee"):
    
    if(employee):
       
       # EXTRACT EMPLOYEE DETAILS  
      name = employee.find("name")
      role = employee.find("role")
      age = employee.find("age")
      csv_line = [name.text, role.text, age.text]

      # ADD A NEW ROW TO CSV FILE
      csvfile_writer.writerow(csv_line)
      

The output is a file called data.csv, generated in the same folder as your python program, containing the following:

name,role,age
Dave,Sale Assistant,34
Robert,Product Expert,38
Jane,Marketing Manager,31

XML to CSV using Python xmltodict module

Another way of working with XML is converting the XML file to a Python object. This will allow you to avoid all xml processing to search for element through the XML file. The code below will do exactly the same as the code above, but using the xmltodict module to parse the xml, instead of xml.etree.

The xmltodict is a third-party module, meaning you will need to install it separately. To do so, run the following command from your terminal:

>> pip install xmltodict

The module xmltodict offers a .parse() which will read the xml and then convert it to an ordered dictionary so the data is easily accesible.

import xmltodict, csv

# PARSE XML FILE
with open("data.xml") as xmlfile:
    xml = xmltodict.parse(xmlfile.read())

# CREATE CSV FILE
csvfile = open("data2.csv",'w',encoding='utf-8')
csvfile_writer = csv.writer(csvfile)

# ADD HEADER
csvfile_writer.writerow(["name","role","age"])

#FOR EACH EMPLOYEE
for employee in xml["employees"]["employee"]:

    # EXTRACT EMPLOYEE DETAILS
    csv_line = [employee["name"], employee["role"],employee["age"]]

    # ADD A NEW ROW TO CSV FILE
    csvfile_writer.writerow(csv_line)

The output of the above code snippet is a file called data2.csv containing the employees information.

Another useful thing to know about the xmltodict module is that it allows you to convert a dictionary to xml.

XML to CSV using Python untangle module

Lastly you can convert XML to CSV using a Python module called untangle. Similarly to the xmltodict, the untangle module parses an XMl document and make the data easily accessible.

untangle is a third-party module as well, therefore you will need to install it separetaly usign pip as follows:

>> pip install untangle

Below there is a code snippet that will import the content of an XML file, create a CSV file, and then extract the data in the XML file and write in the CSV, using the untangle module.

import untangle, csv

# PARSE XML FILE
xml = untangle.parse("data.xml")

# CREATE CSV FILE
csvfile = open("data3.csv",'w',encoding='utf-8')
csvfile_writer = csv.writer(csvfile)

# ADD HEADER
csvfile_writer.writerow(["name","role","age"])

# FOR EACH EMPLOYEE
for employee in xml.employees.employee:

    # EXTRACT EMPLOYEE DETAILS
    csv_line = [employee.name.cdata, employee.role.cdata, employee.age.cdata]

    # ADD NEW ROW TO CSV FILE
    csvfile_writer.writerow(csv_line)

Which one is more efficient?

Since the approaches to convert xml and csv are quite similar, which one should you choose? Choosing the fastest is always a good idea.

To compare these three modules, I have created an XML file with 600 employees and get the running time in milliseconds for each module. See the results below:

xml.treexmltodictuntangle
1st time8.5278320312520.31787109375243.1640625
2nd time4.68188476562518.23022460937579.925048828125
3rd time5.75195312518.85791015625108.099853515625
4th time8.5200195312519.29492187594.8759765625
5th time6.88305664062518.34301757812595.56103515625
6th time6.98022460937520.73803710937576.714111328125

As you can see the xml.tree module is at least twice faster than the xmltodict module, and much faster than the untangle module.

Conclusion

To summarise, there are a few modules available in Python that allow you to convert XML to CSV. Which one you choose depends on what you are trying to achieve. If you are only converting XML to CSV and I will recommend to use xml.etree built-in python module which provides the best performance. However, if you need to also create an XML file, then the xmltodict is a really handy tool.

I hope this article was useful and thank you for reading, and supporting this blog. Happy coding!

Project-Based Programming Introduction

Steady pace book with lots of worked examples. Starting with the basics, and moving to projects, data visualisation, and web applications

100% Recommended book for Java Beginners

Unique lay-out and teaching programming style helping new concepts stick in your memory

90 Specific Ways to Write Better Python

Great guide for those who want to improve their skills when writing python code. Easy to understand. Many practical examples

Grow Your Java skills as a developer

Perfect Boook for anyone who has an alright knowledge of Java and wants to take it to the next level.

Write Code as a Professional Developer

Excellent read for anyone who already know how to program and want to learn Best Practices

Every Developer should read this

Perfect book for anyone transitioning into the mid/mid-senior developer level

Great preparation for interviews

Great book and probably the best way to practice for interview. Some really good information on how to perform an interview. Code Example in Java