I wanted fresh WQX domain values from the STORET web service that are updated daily ( http://www.epa.gov/storet/wqx/wqx_getdomainvalueswebservice.html ) – which are provided in XML format. But, it’s not a friendly format for most software. Excel and Access “see it” but they don’t make useful tables out of them. It turns out that MySQL’s LOAD XML command (https://dev.mysql.com/doc/refman/5.5/en/load-xml.html) works for this, but you need to change the structure of the XML a bit. When the manual says that the XML must be one of the 3 formats – it’s not kidding and it’s not as flexible as the text makes it sound. So, the following instructions and horribly-brute force script will transform the EPA XML files into something that MySQL will understand and then load the data for you.
First I created this table to hold my data (you might need to add more columns).
CREATE TABLE `wqx21_domains` ( `DOM_ID` int(11) NOT NULL AUTO_INCREMENT, `WQXElementName` varchar(500) DEFAULT NULL, `UniqueIdentifier` varchar(500) DEFAULT NULL, `Code` varchar(500) DEFAULT NULL, `Type` varchar(500) DEFAULT NULL, `TribalCode` varchar(500) DEFAULT NULL, `LastChangeDate` varchar(500) DEFAULT NULL, `Description` varchar(500) DEFAULT NULL, `Name` varchar(500) DEFAULT NULL, `ContextCode` varchar(500) DEFAULT NULL, `QualifierType` varchar(500) DEFAULT NULL, `Rank` varchar(500) DEFAULT NULL, `ExternalID` varchar(500) DEFAULT NULL, `STORETID` varchar(500) DEFAULT NULL, `SRSID` varchar(500) DEFAULT NULL, `SampleFractionRequired` varchar(500) DEFAULT NULL, `PickList` varchar(500) DEFAULT NULL, `CASNumber` varchar(500) DEFAULT NULL, `CountyFIPSCode` varchar(500) DEFAULT NULL, `CountyName` varchar(500) DEFAULT NULL, `StateCode` varchar(500) DEFAULT NULL, `CREATED_DT` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`DOM_ID`) ) ENGINE=InnoDB AUTO_INCREMENT=52401 DEFAULT CHARSET=utf8
then run the following bash code from within a directory that looks like below.

in the last sed line below, note that you must reset the delimeter to | (or something besides / as you need to escape those chars in the values you need to replace. This is the easiest way I think.
UPDATE: the post following this one shows how to create the directory of .zip files this script needs with a one-liner using wget.
#!/bin/sh clear # optional stuff for checking output # echo "I see this many zip files: " # ls *.zip | wc -l # echo "and they are: " # ls *.zip echo "" echo "" echo "" echo "" echo "************* START ******************" pwd echo "" for zipper in *.zip do echo $zipper echo "" mkdir "$zipper"_folder echo "" unzip -u $zipper -d "$zipper"_folder echo "" echo "" cd "$zipper"_folder pwd # I'm going to just leave these around as the directories are easy to delete later. cat Results.xml | sed -e 's|WQXElementRowColumn|field|' > Results2.xml cat Results2.xml | sed -e 's/WQXElementRowColumn>/field>/' > Results3.xml cat Results3.xml | sed -e 's/colname="/name="/' > Results4.xml cat Results4.xml | sed -e 's/value="/">/' > Results5.xml cat Results5.xml | sed -e 's/" ">/">/' > Results6.xml cat Results6.xml | sed -e 's|"></field>|</field>|' > Results7.xml mysql -uuser -p'password' --local-infile -e "use wqx;LOAD XML LOCAL INFILE 'Results7.xml' INTO TABLE wqx21_domains ROWS IDENTIFIED BY '<WQXElementRow>';" # rm *.xml cd .. echo "" pwd echo " ----- DONE ------- "$zipper done mysqldump -uuser -p'password' wqx > wqx_lookup_dump.sql
and you will get a MySQL table with contents as follows
+---------------------------------+---------+ | WQXElementName | Records | +---------------------------------+---------+ | ActivityGroupType | 8 | | ActivityMedia | 16 | | ActivityMediaSubdivision | 64 | | ActivityType | 100 | | AddressType | 6 | | AnalyticalMethod | 6204 | | Assemblage | 15 | | BiologicalIntent | 6 | | CellForm | 5 | | CellShape | 10 | | Characteristic | 3766 | | CharacteristicPickListValue | 1747 | | ContainerColor | 7 | | ContainerType | 33 | | Country | 16 | | County | 3292 | | Detection/QuantitationLimitType | 12 | | ElectronicAddressType | 3 | | FrequencyClassDescriptor | 65 | | Habit | 9 | | HorizontalCollectionMethod | 38 | | HorizontalReferenceDatum | 16 | | MeasurementUnit | 335 | | MethodSpeciation | 15 | | MetricType | 21 | | MetricTypeContext | 3 | | MonitoringLocationType | 73 | | NetType | 3 | | Organization | 817 | | PhoneType | 10 | | ReferenceLocationType | 4 | | RelativeDepth | 16 | | ResultDetectionCondition | 5 | | ResultLabComment | 34 | | ResultMeasureQualifier | 53 | | ResultStatisticalBase | 28 | | ResultStatus | 17 | | ResultTemperatureBasis | 19 | | ResultTimeBasis | 106 | | ResultValueType | 5 | | ResultWeightBasis | 4 | | SampleCollectionEquipment | 177 | | SampleFraction | 24 | | SampleTissueAnatomy | 83 | | SamplingDesignType | 2 | | State | 68 | | Taxon | 65502 | | ThermalPreservative | 20 | | TimeZone | 46 | | ToxicityTestType | 4 | | Tribe | 1126 | | VerticalCollectionMethod | 28 | | VerticalReferenceDatum | 12 | | Voltinism | 10 | | WellFormationType | 6 | | WellType | 40 | +---------------------------------+---------+ 56 rows in set (0.92 sec)
Other resources:
http://stackoverflow.com/questions/8582837/load-xml-local-infile-with-inconsistent-column-names
http://blog.mclaughlinsoftware.com/2010/09/26/load-xml-local-infile/