How to parse large XML files with PHP


The simplest way to parse an XML file is simplexml_load_file, which converts the XML into an object. The problem with simplexml_load_file is that it parses the whole file into memory, which is undesirable when dealing with large XML documents.

For more about the memory consumption of simplexml_load_file, see “Get the real amount of memory allocated by the PHP – including the resource types”.

XMLReader provides a memory-efficient way to read an XML file. It is a streaming pull parser, which means it is very low-level and fetches the next fragment of the document only when told to do so. This makes XMLReader very memory efficient, but not very programmer friendly. Fortunately, XMLReader and SimpleXML can be combined.


Test file: feed_big.xml.gz, a large XML document with around 40,000 nodes and an uncompressed size on disk of 109 MB. The XML is very simple: it consists of many <prod>…</prod> nodes.

For example:


Example 1: simplexml_load_file
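
A minimal sketch of this approach, assuming a tiny generated feed in place of feed_big.xml.gz. The <name> child element and the variable names are assumptions for illustration; the real feed's fields may differ.

```php
<?php
// Hypothetical stand-in for the large feed: three <prod> nodes in a temp file.
$path = tempnam(sys_get_temp_dir(), 'feed');
$xmlString = '<feed>';
for ($i = 1; $i <= 3; $i++) {
    $xmlString .= "<prod><name>Product $i</name></prod>";
}
$xmlString .= '</feed>';
file_put_contents($path, $xmlString);

// simplexml_load_file() parses the entire document into memory at once.
$feed = simplexml_load_file($path);
if ($feed === false) {
    throw new RuntimeException("Could not parse $path");
}

// Every product is already in memory by the time the loop starts.
$names = [];
foreach ($feed->prod as $prod) {
    $names[] = (string) $prod->name;
}

unlink($path);
```

With three nodes this is harmless; the cost only shows when the same call has to hold a document with tens of thousands of nodes.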


Example 2: XMLReader and SimpleXMLElement

The right way to process a large XML file is to use XMLReader together with SimpleXMLElement (to make the programmer's life a little easier):


Opens the XML document. Since the document is gzipped, the ‘compress.zlib://’ compression wrapper is used:
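
A minimal sketch of this step, assuming the zlib extension is available. A small gzipped temp file stands in for feed_big.xml.gz, and the <name> element is hypothetical.

```php
<?php
// Hypothetical stand-in for feed_big.xml.gz: a small gzipped feed.
$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($path, gzencode('<feed><prod><name>A</name></prod></feed>'));

$xml = new XMLReader();
// The compress.zlib:// wrapper decompresses the stream while it is read,
// so the uncompressed document never has to exist on disk.
if (!$xml->open('compress.zlib://' . $path)) {
    throw new RuntimeException("Could not open $path");
}

$xml->read();            // advance to the first node, the root element
$rootName = $xml->name;

$xml->close();
unlink($path);
```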

Skips all the nodes until the first product is reached:
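
The skip loop can be sketched as follows, using the form quoted in the comments below. The inline document, its <title> element and the <name> child are assumptions for illustration.

```php
<?php
$xml = new XMLReader();
// Inline sample: one non-product node precedes the products.
$xml->XML('<feed><title>Demo</title><prod><name>A</name></prod><prod><name>B</name></prod></feed>');

// Advance node by node until the current node is named "prod"
// or read() returns false at the end of the document.
while ($xml->read() && $xml->name !== 'prod') {
    // intentionally empty
}

// Distinguish "found a product" from "ran out of document".
$atFirstProduct = $xml->nodeType === XMLReader::ELEMENT && $xml->name === 'prod';
$firstName = $atFirstProduct
    ? (string) (new SimpleXMLElement($xml->readOuterXml()))->name
    : null;

$xml->close();
```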

When the above while loop finishes, XMLReader has either reached the first product or reached the end of the file. If the first product was reached, the document stream cursor is at the first product node in the XML document, and we enter the while loop below.

XMLReader::readOuterXML() returns the contents of the current node, including its own markup, as a string, so only one node at a time is parsed. When we are finished with the node, it is destroyed with unset so that PHP can free its memory.

XMLReader::next() will jump to the next product node.
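
A minimal sketch of the processing loop described above. Passing the element name to next() is an assumption about the original listing; it lets the reader skip any text or whitespace nodes that sit between products. The <name> child is hypothetical.

```php
<?php
$xml = new XMLReader();
$xml->XML('<feed><prod><name>A</name></prod><prod><name>B</name></prod><prod><name>C</name></prod></feed>');

// Position the cursor on the first <prod>.
while ($xml->read() && $xml->name !== 'prod') {
    // intentionally empty
}

$names = [];
while ($xml->name === 'prod') {
    // Only the current product is turned into a SimpleXMLElement.
    $element = new SimpleXMLElement($xml->readOuterXml());
    $names[] = (string) $element->name;

    // Drop the element before moving on so its memory can be reclaimed.
    unset($element);

    // Jump past the current subtree to the next <prod>; once there is
    // none left, the cursor leaves the products and the loop ends.
    $xml->next('prod');
}

$xml->close();
```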

Finally, close the input that XMLReader is parsing:
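
One way to make sure close() always runs is to wrap the steps in a generator with a finally block. This is a sketch, not the article's own listing: iterateProducts() is a hypothetical helper name, and the <name> child and the temp file are assumptions.

```php
<?php
/**
 * Yields each <prod> element of a feed as a SimpleXMLElement.
 * Hypothetical helper for illustration only.
 */
function iterateProducts(string $uri): Generator
{
    $xml = new XMLReader();
    if (!$xml->open($uri)) {
        throw new RuntimeException("Could not open $uri");
    }
    try {
        while ($xml->read() && $xml->name !== 'prod') {
            // skip everything before the first product
        }
        while ($xml->name === 'prod') {
            yield new SimpleXMLElement($xml->readOuterXml());
            $xml->next('prod');
        }
    } finally {
        // Runs when the loop ends, an exception is thrown,
        // or the generator is destroyed early.
        $xml->close();
    }
}

// Usage with a small gzipped stand-in for feed_big.xml.gz.
$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($path, gzencode('<feed><prod><name>A</name></prod><prod><name>B</name></prod></feed>'));

$names = [];
foreach (iterateProducts('compress.zlib://' . $path) as $prod) {
    $names[] = (string) $prod->name;
}
unlink($path);
```

The calling code sees an ordinary foreach over SimpleXMLElement objects, while only one product is held in memory per iteration.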


method memory (kb)
custom memory_get_process_usage()
XMLReader and SimpleXMLElement

XMLReader and SimpleXMLElement used 30 times less memory, and memory consumption does not depend on the size of the XML document (the number of nodes we want to process).



  1. I used only simplexml_load_file on a very large XML file, and processing took about 90 minutes.
    With your solution it takes only 90 seconds 🙂

  2. I am facing an issue because my XML is much larger (500 MB). I have to check two XML tags: one is “gs-local-feed”, and under it there is also a “school” tag. When I use
    while($xml->read() && $xml->name != 'gs-local-feed'){;}
    while($xml->name == 'gs-local-feed'){
    $element = new SimpleXMLElement($xml->readOuterXML());

    it only takes the first “school” value, but I need all the values of the “school” tag under “gs-local-feed”.
    When I use
    while($xml->read() && $xml->name != 'school'){;}
    while($xml->name == 'school'){
    $element = new SimpleXMLElement($xml->readOuterXML());

    it only gives me the values of “school” from the first “gs-local-feed” tag and then stops.
    I hope that explains the issue. I need your help.

    • Hi Wasid,
      I think I understand the problem, but I don’t see what is wrong with the first loop. The $element should contain all the content of gs-local-feed (including the schools).
      Did you try posting the question on Stack Overflow, including a small sample of the XML and a simple (but complete) test PHP script that demonstrates the problem?
      Also, is your XML valid (check it with one of the many online validators)?

  3. Thank you, this is a neat solution. It gets “the best of both worlds” of XMLReader and SimpleXMLElement 🙂

  4. It does not seem like the script is available; there are no links to download it.

    • Hi, the complete example is included in the page (below the label test02.php); you can copy and paste it into your editor.
      There is only a link to download the example XML file.

    • Hi, I am not sure what you mean. The article example is only about reading a large XML file.

  5. Absolutely great piece of article. Thanks! Keep up the good work! 🙂

  6. Thank you I finally understood XML parsing with your examples!

  7. This is great!

    Would it be any different to have just


    instead of


    while($xml->read() && $xml->name != ‘prod’)

  8. What about comparing two big XML files and outputting the difference?
    Is it better to use both XMLReader and SimpleXMLElement?

    • Since they are big, it is always better to use XMLReader, because it is more memory efficient.
      Let’s assume that the XML files can have the same content but not necessarily in the same order.
      In that case you might build an in-memory associative array with the id and an md5 of the other fields of each item.
      Then just read the other XML and compare the items by id and md5 value.

      If this array is too big to be kept in memory, save it temporarily in the db, or directly to the filesystem.

  9. Very nice… your articles are very educational and extensively documented. Would love to read more… but it’s probably taking you a whole lot of time!


    • Hi, thank you very much for the feedback. Yes, it does take some time, which is why I have a few more articles half finished that I haven’t had time to publish.
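
The comparison idea from the reply to comment 8 (an id and an md5 of the other fields per item) might be sketched as below. The <id> and <name> children and both function names are assumptions for illustration. For brevity the sketch builds a map for each feed; streaming the second feed against the first map would use less memory.

```php
<?php
/**
 * Builds an id => md5 map for every <prod> in an XML string.
 * Hypothetical helper; assumes each product has a unique <id> child.
 */
function fingerprintProducts(string $xmlString): array
{
    $xml = new XMLReader();
    $xml->XML($xmlString);
    while ($xml->read() && $xml->name !== 'prod') {
        // skip to the first product
    }
    $map = [];
    while ($xml->name === 'prod') {
        $element = new SimpleXMLElement($xml->readOuterXml());
        $fields = [];
        foreach ($element->children() as $child) {
            if ($child->getName() !== 'id') {
                $fields[$child->getName()] = (string) $child;
            }
        }
        ksort($fields); // make the hash independent of field order
        $map[(string) $element->id] = md5(json_encode($fields));
        $xml->next('prod');
    }
    $xml->close();
    return $map;
}

/** Reports ids that were added, removed or changed between two feeds. */
function diffFeeds(string $old, string $new): array
{
    $before = fingerprintProducts($old);
    $after = fingerprintProducts($new);
    return [
        'added'   => array_keys(array_diff_key($after, $before)),
        'removed' => array_keys(array_diff_key($before, $after)),
        'changed' => array_keys(array_filter(
            array_intersect_key($after, $before),
            fn ($hash, $id) => $hash !== $before[$id],
            ARRAY_FILTER_USE_BOTH
        )),
    ];
}

$old = '<feed><prod><id>p1</id><name>A</name></prod><prod><id>p2</id><name>B</name></prod>'
     . '<prod><id>p3</id><name>C</name></prod></feed>';
$new = '<feed><prod><id>p3</id><name>C</name></prod><prod><id>p1</id><name>A2</name></prod>'
     . '<prod><id>p4</id><name>D</name></prod></feed>';
$diff = diffFeeds($old, $new);
```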
