PHP has two functions that report how much memory a script has allocated: memory_get_usage and memory_get_peak_usage.
int memory_get_usage ([ bool $real_usage = false ] )
Returns the current amount of memory in bytes. If $real_usage is false, only the memory actually in use is reported. If it is true, all memory allocated from the system is reported, including unused pages.
int memory_get_peak_usage ([ bool $real_usage = false ] )
Returns the maximum amount of memory in bytes, measured from the start of the process until the moment the function is called. If $real_usage is false, only the memory actually in use is reported. If it is true, all memory allocated from the system is reported, including unused pages.
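A minimal sketch of how the two modes differ. The 1 MB string is an arbitrary choice for illustration, and the exact numbers depend on the PHP version and build.

```php
<?php
// Snapshot before allocating anything sizeable.
$usedBefore = memory_get_usage();      // memory in use by the script
$realBefore = memory_get_usage(true);  // memory obtained from the system

// Allocate roughly 1 MB through PHP's own memory manager.
$buffer = str_repeat('x', 1024 * 1024);

$usedAfter = memory_get_usage();
$realAfter = memory_get_usage(true);
$peak      = memory_get_peak_usage();

printf("used: %d -> %d bytes\n", $usedBefore, $usedAfter);
printf("real: %d -> %d bytes\n", $realBefore, $realAfter);
printf("peak: %d bytes\n", $peak);
```

The "used" figure should grow by at least the size of the string, while the "real" figure moves in larger steps because PHP requests memory from the system in chunks.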
Problem
In most cases these two functions return useful data. But they do not report the memory used by PHP resources.
Quick reminder, list of PHP types:
- boolean
- integer
- float (floating-point number)
- string
- array
- object
- callable
- iterable
- resource
- NULL
A resource is a variable that holds a reference to an external resource. Memory for a resource is allocated outside the Zend Engine, in external libraries. So memory_get_usage and memory_get_peak_usage will not report that memory, although your PHP script is using it and it was in fact allocated from the system.
To be sure how efficient your script is, you need to be aware of which resources you are using and how much of a burden they put on the whole system.
For a list of functions that create or destroy resources, see php.net/manual/en/resource.php.
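To see what a resource looks like from the script's point of view, here is a minimal sketch using fopen, which is one of the functions that returns a resource. The php://memory stream is used only so the example needs no file on disk.

```php
<?php
// fopen() returns a value of type "resource".
$handle = fopen('php://memory', 'w+');

$isResource = is_resource($handle);        // true while the handle is open
$type       = gettype($handle);            // "resource"
$kind       = get_resource_type($handle);  // "stream"

var_dump($isResource);
echo $type, "\n";
echo $kind, "\n";

// Release the handle explicitly once it is no longer needed.
fclose($handle);
```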
A typical example of this kind of usage, and one that can consume huge amounts of memory, is XML parsing. A SimpleXMLElement, as returned by simplexml_load_file for example, behaves in practice like a resource: PHP reports it as an object, but its data lives in the external libxml2 library. PHP programmers work with a SimpleXMLElement as if it were an “ordinary” object, but it cannot be serialized, so you cannot store it in $_SESSION, and its node values must be cast to some other type. And of course its memory usage is not reported by the PHP memory_get_* functions.
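The two restrictions just mentioned can be seen in a short sketch. The XML snippet is made up for illustration, and serialize() stands in for what PHP does when it writes $_SESSION data.

```php
<?php
$xml = simplexml_load_string('<shop><prod><name>Grips</name></prod></shop>');

// Node values are SimpleXMLElement instances; cast them before storing.
$name = (string) $xml->prod->name;

// Serializing the element itself, which is what storing it in
// $_SESSION would require, throws an exception.
$error = null;
try {
    serialize($xml);
} catch (Exception $e) {
    $error = $e->getMessage();
}

echo $name, "\n";   // Grips
echo $error, "\n";
```

Storing the cast string (or an array of such strings) in the session works fine; only the element itself is rejected.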
Solution
If you are on a Unix-like OS, you can check the memory consumption of the process in the /proc virtual file system. On my Linux system it is in:

    /proc/<PID>/status
Example
example_01.php:

    <?php

    $status = file_get_contents('/proc/' . getmypid() . '/status');

    print $status . "\n";

Output:

    [damir@buffy memory]$ php -q example_01.php
    Name: php
    State: R (running)
    Tgid: 8795
    Ngid: 0
    Pid: 8795
    PPid: 4809
    TracerPid: 0
    Uid: 1000 1000 1000 1000
    Gid: 1000 1000 1000 1000
    FDSize: 256
    Groups: 10 48 1000 1001 1002
    VmPeak: 326408 kB
    VmSize: 326400 kB
    VmLck: 0 kB
    VmPin: 0 kB
    VmHWM: 13144 kB
    VmRSS: 13140 kB
    RssAnon: 6004 kB
    RssFile: 7136 kB
    RssShmem: 0 kB
    VmData: 5036 kB
    VmStk: 132 kB
    VmExe: 4028 kB
    VmLib: 25544 kB
    VmPTE: 428 kB
    VmSwap: 0 kB
    Threads: 1
    SigQ: 1/62712
    SigPnd: 0000000000000000
    ShdPnd: 0000000000000000
    SigBlk: 0000000000000000
    SigIgn: 0000000000001000
    SigCgt: 0000000184000000
    CapInh: 0000000000000000
    CapPrm: 0000000000000000
    CapEff: 0000000000000000
    CapBnd: 0000001fffffffff
    CapAmb: 0000000000000000
    Seccomp: 0
    Cpus_allowed: ff
    Cpus_allowed_list: 0-7
    Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
    Mems_allowed_list: 0
    voluntary_ctxt_switches: 0
    nonvoluntary_ctxt_switches: 4

From the output above we are interested in VmRSS, the resident set size: the part of the process's memory that currently resides in main memory (RAM). Memory that has been swapped out is reported separately as VmSwap; in the example above VmSwap is 0 kB.
Let's write a function that returns VmRSS + VmSwap:
example_02.php

    <?php

    print memory_get_process_usage() . "\n";

    /**
     * Returns memory usage from /proc/<PID>/status in kilobytes.
     *
     * @return int|bool sum of VmRSS and VmSwap in kB. On error returns false.
     */
    function memory_get_process_usage()
    {
        $status = file_get_contents('/proc/' . getmypid() . '/status');

        $matchArr = array();
        preg_match_all('~^(VmRSS|VmSwap):\s*([0-9]+).*$~im', $status, $matchArr);

        if (!isset($matchArr[2][0]) || !isset($matchArr[2][1])) {
            return false;
        }

        return intval($matchArr[2][0]) + intval($matchArr[2][1]);
    }

Output:

    [damir@buffy memory]$ php -q example_02.php
    13160

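The function above relies on VmRSS appearing before VmSwap in the file. A slightly more defensive variant could look fields up by name instead. This is only a sketch: parse_status_fields is a hypothetical helper, and the sample string reuses three values from the status output shown earlier so that the code runs on any OS.

```php
<?php
/**
 * Hypothetical helper: extracts the requested fields (in kB) from the
 * text of /proc/<PID>/status. Fields that are missing are left out.
 *
 * @param string   $status contents of the status file
 * @param string[] $fields field names such as 'VmRSS'
 * @return int[] map of field name => value in kB
 */
function parse_status_fields($status, array $fields)
{
    $result = array();
    foreach ($fields as $field) {
        $pattern = '~^' . preg_quote($field, '~') . ':\s*([0-9]+)\s*kB~m';
        if (preg_match($pattern, $status, $m)) {
            $result[$field] = (int) $m[1];
        }
    }
    return $result;
}

// Sample lines in the same format as /proc/<PID>/status.
$sample = "VmHWM:\t   13144 kB\nVmRSS:\t   13140 kB\nVmSwap:\t       0 kB\n";

$kb    = parse_status_fields($sample, array('VmRSS', 'VmSwap'));
$bytes = array_sum($kb) * 1024;  // the file reports kB, so convert to bytes

print_r($kb);
echo $bytes, " bytes\n";  // 13455360 bytes
```

On a live Linux system the sample string would be replaced with file_get_contents('/proc/' . getmypid() . '/status'). Asking for VmHWM as well gives the peak resident set size, which is the closest counterpart to memory_get_peak_usage.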
Excellent. Now it is time for a realistic test comparing PHP's memory_get_usage and memory_get_peak_usage with our memory_get_process_usage.
Test
Parse a small and then a large XML file, and analyze the memory consumption reported by the three functions mentioned above. We will use a fragment of an XML feed from one of the bicycle shop partners of my bicycle shopping directory, www.bicycle-discounts.com.
Small XML file: feed_sml.xml.gz. Around 13 items, uncompressed size on disk 29 kB.
Large XML file: feed_big.xml.gz. Around 40000 items, uncompressed size on disk 109 MB.
Note: decompress them (gunzip) before running the example.
test.php

    <?php

    if (empty($argv[1])) {
        die("Please specify xml file to parse.\n");
    }

    $xml = simplexml_load_file($argv[1]);

    if ($xml === false) {
        die('Unable to load and parse the xml file: ' . error_get_last()['message']);
    }

    foreach ($xml->datafeed->prod as $element) {
        //var_dump($element->text->name);

        $prod = array(
            'name'     => strval($element->text->name),
            'price'    => strval($element->price->buynow),
            'currency' => strval($element->price->attributes()->curr)
        );

        print_r($prod);
        echo "\n";
    }

    print "memory_get_usage() =" . memory_get_usage()/1024 . "kb\n";
    print "memory_get_usage(true) =" . memory_get_usage(true)/1024 . "kb\n";
    print "memory_get_peak_usage() =" . memory_get_peak_usage()/1024 . "kb\n";
    print "memory_get_peak_usage(true) =" . memory_get_peak_usage(true)/1024 . "kb\n";
    print "custom memory_get_process_usage() =" . memory_get_process_usage() . "kb\n";

    /**
     * Returns memory usage from /proc/<PID>/status in kilobytes.
     *
     * @return int|bool sum of VmRSS and VmSwap in kB. On error returns false.
     */
    function memory_get_process_usage()
    {
        $status = file_get_contents('/proc/' . getmypid() . '/status');

        $matchArr = array();
        preg_match_all('~^(VmRSS|VmSwap):\s*([0-9]+).*$~im', $status, $matchArr);

        if (!isset($matchArr[2][0]) || !isset($matchArr[2][1])) {
            return false;
        }

        return intval($matchArr[2][0]) + intval($matchArr[2][1]);
    }

Test 1:
Parse small xml file:

    [damir@buffy memory]$ php -q test.php feed_sml.xml
    ...
        [currency] => GBP
    )

    Array
    (
        [name] => ODI Ruffian Lock-On Bonus Pack Grips
        [price] => 16.49
        [currency] => GBP
    )

    memory_get_usage() = 352.2734375kb
    memory_get_usage(true) = 2048kb
    memory_get_peak_usage() = 391.5625kb
    memory_get_peak_usage(true) = 2048kb
    custom memory_get_process_usage() = 13688kb

Test 2:
Parse large xml file:

    [damir@buffy memory]$ php -q test.php feed_big.xml
    ...
        [currency] => GBP
    )

    Array
    (
        [name] => ODI Ruffian Lock-On Bonus Pack Grips
        [price] => 16.49
        [currency] => GBP
    )

    memory_get_usage() =352.2578125kb
    memory_get_usage(true) =2048kb
    memory_get_peak_usage() =391.546875kb
    memory_get_peak_usage(true) =2048kb
    custom memory_get_process_usage() =478368kb

Comparison
Function | Small XML (kB) | Big XML (kB) |
---|---|---|
memory_get_usage()/1024 | 352.25 | 352.25 |
memory_get_usage(true)/1024 | 2048 | 2048 |
memory_get_peak_usage()/1024 | 391.54 | 391.54 |
memory_get_peak_usage(true)/1024 | 2048 | 2048 |
memory_get_process_usage() | 13688 | 478096 |
As we can see, the memory usage reported by the PHP memory_get_* functions is the same for the small XML file and the big XML file. But there is a huge difference in the real memory consumption reported by our custom memory_get_process_usage, which relies on /proc/<PID>/status: 13688 kB vs 478096 kB!
This demonstrates that the PHP memory_get_* functions are not a good choice for analyzing the memory consumption of PHP scripts that use resources. Also, if we were to write a PHP script that processes big XML files, simplexml_* functions such as simplexml_load_file, or even worse simplexml_load_string, are not good choices, since they read the whole XML structure into memory instead of reading it bit by bit.
In the next post I will show you how to write a memory-efficient PHP script that processes big XML files without sacrificing the conveniences provided by the simplexml_* functions. Stay tuned. 🙂
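As a rough idea of what reading "bit by bit" can look like, here is a minimal sketch that assumes the XMLReader extension is available. It is one possible technique, not necessarily the one the next post uses. stream_product_names is a hypothetical helper, and the tiny temporary file only mimics the prod/text/name layout of the feed.

```php
<?php
/**
 * Hypothetical helper: streams <prod> elements one at a time and
 * returns their names. Only the current <prod> is expanded in memory.
 */
function stream_product_names($file)
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        return false;
    }

    // Advance to the first <prod> element.
    while ($reader->read() && $reader->name !== 'prod') {
        // keep reading
    }

    $names = array();
    while ($reader->name === 'prod') {
        // Expand just this one element into a SimpleXMLElement.
        $prod = new SimpleXMLElement($reader->readOuterXml());
        $names[] = (string) $prod->text->name;

        // Skip this subtree and move on to the next <prod>, if any.
        if (!$reader->next('prod')) {
            break;
        }
    }

    $reader->close();
    return $names;
}

// Tiny stand-in for the real feed, written to a temporary file.
$tmp = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents(
    $tmp,
    '<root><datafeed>'
    . '<prod><text><name>A</name></text></prod>'
    . '<prod><text><name>B</name></text></prod>'
    . '</datafeed></root>'
);

$names = stream_product_names($tmp);
unlink($tmp);

print_r($names);
```

Because each product is expanded and discarded in turn, the process footprint should stay roughly constant regardless of how many products the file contains.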