
Extracting XML table data from a file with generic node names


bosoxbill


I would like to extract data from Associated Press files and put it into tabular form for my work at a newspaper. I understand how to extract data when the node names are descriptive, such as <booklist> or <author>, but the AP files contain only generic <table>, <th>, <tr>, and <td> nodes, and I can't seem to extract the data from them. The file format is NITF (News Industry Text Format) XML. I want to lay the data out in a table styled like the ones printed in a newspaper. Can anyone help me with this? Thank you!

Here is a sample file as it appears when downloaded from the AP wire:

<nitf xmlns="http://ap.org/schemas/03/2005/nitf"> <head> <meta name="ap-transref" content="s0225" /> <meta name="ap-origin" content="dss" /> <meta name="ap-selector" content="-----" /> <meta name="ap-category" content="s" /> <meta name="ap-format" content="at" /> <!-- Routing Type="Passcode" Expanded="true" Outed="false" --> <meta name="ap-routing" content="s,s1,sag" /> <meta name="ap-cycle" content="BC" /> <meta name="ap-xhl" content="BBO--Baseball Expanded Glance" /> <docdata> <doc-id regsrc="AP" /> <del-list> <from-src level-number="s0225" /> </del-list> <urgency ed-urg="3" /> <date.issue norm="2012812TZ" /> <du-key key="BC-BBO--Baseball Expanded Glance" /> <doc.rights owner="http://www.ap.org" agent="http://license.icopyright.net" type="none" /> <doc.copyright /> </docdata> </head> <body> <body.head> <hedline> <hl1>Baseball Expanded Standings</hl1> <byline>The Associated Press<byttl></byttl></byline> </hedline> <distributor>The Associated Press</distributor> </body.head> <body.content> <block> <table> <tr> <th>AMERICAN LEAGUE</th> </tr> <tr> <th>East Division</th> </tr> <tr> <th></th> <th>W</th> <th>L</th> <th>Pct</th> <th>GB</th> <th>WCGB</th> <th>L10</th> <th>Str</th> <th>Home</th> <th>Away</th> </tr> <tr> <td>New York</td> <td>67</td> <td>46</td> <td>.593</td> <td>—</td> <td>—</td> <td>7-3</td> <td>W-4</td> <td>34-22</td> <td>33-24</td> </tr> <tr> <td>Tampa Bay</td> <td>61</td> <td>52</td> <td>.540</td> <td>6</td> <td>—</td> 
<td>8-2</td> <td>W-5</td> <td>32-27</td> <td>29-25</td> </tr> <tr> <td>Baltimore</td> <td>61</td> <td>53</td> <td>.535</td> <td>6½</td> <td>½</td> <td>6-4</td> <td>L-1</td> <td>29-28</td> <td>32-25</td> </tr> <tr> <td>Boston</td> <td>56</td> <td>59</td> <td>.487</td> <td>12</td> <td>6</td> <td>3-7</td> <td>L-1</td> <td>29-34</td> <td>27-25</td> </tr> <tr> <td>Toronto</td> <td>53</td> <td>60</td> <td>.469</td> <td>14</td> <td>8</td> <td>2-8</td> <td>L-5</td> <td>28-25</td> <td>25-35</td> </tr> <tr> <th>Central Division</th> </tr> <tr> <th></th> <th>W</th> <th>L</th> <th>Pct</th> <th>GB</th> <th>WCGB</th> <th>L10</th> <th>Str</th> <th>Home</th> <th>Away</th> </tr> <tr> <td>Chicago</td> <td>61</td> <td>51</td> <td>.545</td> <td>—</td> <td>—</td> <td>6-4</td> <td>L-1</td> <td>31-26</td> <td>30-25</td> </tr> <tr> <td>Detroit</td> <td>61</td> <td>53</td> <td>.535</td> <td>1</td> <td>½</td> <td>7-3</td> <td>L-1</td> <td>33-23</td> <td>28-30</td> </tr> <tr> <td>Cleveland</td> <td>53</td> <td>61</td> <td>.465</td> <td>9</td> <td>8½</td> <td>3-7</td> <td>W-1</td> <td>30-28</td> <td>23-33</td> </tr> <tr> <td>Kansas City</td> <td>49</td> <td>64</td> <td>.434</td> <td>12½</td> <td>12</td> <td>6-4</td> <td>W-1</td> <td>21-32</td> <td>28-32</td> </tr> <tr> <td>Minnesota</td> <td>49</td> <td>64</td> <td>.434</td> <td>12½</td> <td>12</td> <td>5-5</td> <td>L-3</td> <td>23-34</td> <td>26-30</td> </tr> <tr> <th>West Division</th> </tr> <tr> <th></th> <th>W</th> <th>L</th> <th>Pct</th> <th>GB</th> <th>WCGB</th> <th>L10</th> <th>Str</th> <th>Home</th> <th>Away</th> </tr> <tr> <td>Texas</td> <td>66</td> <td>46</td> <td>.589</td> <td>—</td> <td>—</td> <td>7-3</td> <td>W-1</td> <td>35-22</td> <td>31-24</td> </tr> <tr> <td>Oakland</td> <td>61</td> <td>52</td> <td>.540</td> <td>5½</td> <td>—</td> <td>5-5</td> <td>W-1</td> <td>34-26</td> <td>27-26</td> </tr> <tr> <td>Los Angeles</td> <td>60</td> <td>54</td> <td>.526</td> <td>7</td> <td>1½</td> <td>3-7</td> <td>L-1</td> <td>31-23</td> 
<td>29-31</td> </tr> <tr> <td>Seattle</td> <td>52</td> <td>63</td> <td>.452</td> <td>15½</td> <td>10</td> <td>4-6</td> <td>W-1</td> <td>25-29</td> <td>27-34</td> </tr> </table> <p>___</p> <table> <tr> <th>NATIONAL LEAGUE</th> </tr> <tr> <th>East Division</th> </tr> <tr> <th></th> <th>W</th> <th>L</th> <th>Pct</th> <th>GB</th> <th>WCGB</th> <th>L10</th> <th>Str</th> <th>Home</th> <th>Away</th> </tr> <tr> <td>Washington</td> <td>71</td> <td>43</td> <td>.623</td> <td>—</td> <td>—</td> <td>9-1</td> <td>W-8</td> <td>32-22</td> <td>39-21</td> </tr> <tr> <td>Atlanta</td> <td>66</td> <td>47</td> <td>.584</td> <td>4½</td> <td>—</td> <td>7-3</td> <td>W-3</td> <td>32-26</td> <td>34-21</td> </tr> <tr> <td>New York</td> <td>54</td> <td>60</td> <td>.474</td> <td>17</td> <td>9½</td> <td>4-6</td> <td>L-2</td> <td>27-30</td> <td>27-30</td> </tr> <tr> <td>Miami</td> <td>52</td> <td>62</td> <td>.456</td> <td>19</td> <td>11½</td> <td>4-6</td> <td>W-1</td> <td>28-28</td> <td>24-34</td> </tr> <tr> <td>Philadelphia</td> <td>51</td> <td>62</td> <td>.451</td> <td>19½</td> <td>12</td> <td>5-5</td> <td>L-1</td> <td>25-33</td> <td>26-29</td> </tr> <tr> <th>Central Division</th> </tr> <tr> <th></th> <th>W</th> <th>L</th> <th>Pct</th> <th>GB</th> <th>WCGB</th> <th>L10</th> <th>Str</th> <th>Home</th> <th>Away</th> </tr> <tr> <td>Cincinnati</td> <td>68</td> <td>46</td> <td>.596</td> <td>—</td> <td>—</td> <td>5-5</td> <td>W-2</td> <td>36-20</td> <td>32-26</td> </tr> <tr> <td>Pittsburgh</td> <td>63</td> <td>50</td> <td>.558</td> <td>4½</td> <td>—</td> <td>4-6</td> <td>L-3</td> <td>35-20</td> <td>28-30</td> </tr>
<tr> <td>St. Louis</td> <td>62</td> <td>52</td> <td>.544</td> <td>6</td> <td>1½</td> <td>6-4</td> <td>W-1</td> <td>34-23</td> <td>28-29</td> </tr> <tr> <td>Milwaukee</td> <td>51</td> <td>61</td> <td>.455</td> <td>16</td> <td>11½</td> <td>5-5</td> <td>L-2</td> <td>33-26</td> <td>18-35</td> </tr> <tr> <td>Chicago</td> <td>44</td> <td>68</td> <td>.393</td> <td>23</td> <td>18½</td> <td>1-9</td> <td>L-2</td> <td>28-26</td> <td>16-42</td> </tr> <tr> <td>Houston</td> <td>38</td> <td>77</td> <td>.330</td> <td>30½</td> <td>26</td> <td>3-7</td> <td>W-2</td> <td>27-31</td> <td>11-46</td> </tr> <tr> <th>West Division</th> </tr> <tr> <th></th> <th>W</th> <th>L</th> <th>Pct</th> <th>GB</th> <th>WCGB</th> <th>L10</th> <th>Str</th> <th>Home</th> <th>Away</th> </tr> <tr> <td>San Francisco</td> <td>62</td> <td>52</td> <td>.544</td> <td>—</td> <td>—</td> <td>6-4</td> <td>W-1</td> <td>33-24</td> <td>29-28</td> </tr> <tr> <td>Los Angeles</td> <td>61</td> <td>53</td> <td>.535</td> <td>1</td> <td>2½</td> <td>5-5</td> <td>L-1</td> <td>33-25</td> <td>28-28</td> </tr> <tr> <td>Arizona</td> <td>57</td> <td>57</td> <td>.500</td> <td>5</td> <td>6½</td> <td>4-6</td> <td>L-2</td> <td>30-26</td> <td>27-31</td> </tr> <tr> <td>San Diego</td> <td>51</td> <td>64</td> <td>.443</td> <td>11½</td> <td>13</td> <td>7-3</td> <td>W-6</td> <td>27-30</td> <td>24-34</td> </tr> <tr> <td>Colorado</td> <td>41</td> <td>70</td> <td>.369</td> <td>19½</td> <td>21</td> <td>4-6</td> <td>L-1</td> <td>21-37</td> <td>20-33</td> </tr> </table> <p>___</p> <table> <tr> <th>AMERICAN LEAGUE</th> </tr> <tr> <th>Saturday's Games</th> </tr> </table> <p>N.Y. Yankees 5, Toronto 2</p> <p>Cleveland 5, Boston 2</p> <p>Kansas City 7, Baltimore 3</p> <p>Oakland 9, Chicago White Sox 7</p> <p>Tampa Bay 4, Minnesota 2</p> <p>Texas 2, Detroit 1</p> <p>Seattle 7, L.A. Angels 4</p> <table> <tr> <th>Sunday's Games</th> </tr> </table> <p>Boston at Cleveland, 1:05 p.m.</p>
<p>N.Y. Yankees at Toronto, 1:07 p.m.</p> <p>Kansas City at Baltimore, 1:35 p.m.</p> <p>Oakland at Chicago White Sox, 2:10 p.m.</p> <p>Tampa Bay at Minnesota, 2:10 p.m.</p> <p>Detroit at Texas, 3:05 p.m.</p> <p>Seattle at L.A. Angels, 3:35 p.m.</p> <table> <tr> <th>Monday's Games</th> </tr> </table> <p>Texas (Dempster 1-0) at N.Y. Yankees (Undecided), 7:05 p.m.</p> <p>Chicago White Sox (Peavy 9-8) at Toronto (Villanueva 6-2), 7:07 p.m.</p> <p>Detroit (A.Sanchez 1-2) at Minnesota (Deduno 3-0), 8:10 p.m.</p> <p>Cleveland (Masterson 8-10) at L.A. Angels (C.Wilson 9-8), 10:05 p.m.</p> <p>Tampa Bay (Cobb 6-8) at Seattle (Beavan 7-6), 10:10 p.m.</p> <table> <tr> <th>Tuesday's Games</th> </tr> </table> <p>Boston at Baltimore, 7:05 p.m.</p> <p>Texas at N.Y. Yankees, 7:05 p.m.</p> <p>Chicago White Sox at Toronto, 7:07 p.m.</p> <p>Detroit at Minnesota, 8:10 p.m.</p> <p>Oakland at Kansas City, 8:10 p.m.</p> <p>Cleveland at L.A. Angels, 10:05 p.m.</p> <p>Tampa Bay at Seattle, 10:10 p.m.</p> <p>___</p> <table> <tr> <th>NATIONAL LEAGUE</th> </tr> <tr> <th>Saturday's Games</th> </tr> </table> <p>Cincinnati 4, Chicago Cubs 2</p> <p>San Francisco 9, Colorado 3</p> <p>Houston 6, Milwaukee 5, 10 innings</p> <p>San Diego 5, Pittsburgh 0</p> <p>St. Louis 4, Philadelphia 1</p> <p>Atlanta 9, N.Y. Mets 3</p> <p>Miami 7, L.A. Dodgers 3</p> <p>Washington 6, Arizona 5</p> <table> <tr> <th>Sunday's Games</th> </tr> </table> <p>L.A. Dodgers at Miami, 1:10 p.m.</p> <p>San Diego at Pittsburgh, 1:35 p.m.</p> <p>St. Louis at Philadelphia, 1:35 p.m.</p> <p>Milwaukee at Houston, 2:05 p.m.</p> <p>Cincinnati at Chicago Cubs, 2:20 p.m.</p> <p>Colorado at San Francisco, 4:05 p.m.</p> <p>Washington at Arizona, 4:10 p.m.</p> <p>Atlanta at N.Y. Mets, 8:05 p.m.</p> <table> <tr> <th>Monday's Games</th> </tr> </table>
<p>L.A. Dodgers (Harang 7-7) at Pittsburgh (Karstens 4-2), 7:05 p.m.</p> <p>Philadelphia (Hamels 12-6) at Miami (Eovaldi 3-7), 7:10 p.m.</p> <p>San Diego (Stults 2-2) at Atlanta (Minor 6-8), 7:10 p.m.</p> <p>Houston (Galarraga 0-2) at Chicago Cubs (Samardzija 7-10), 8:05 p.m.</p> <p>Milwaukee (Fiers 6-4) at Colorado (Francis 3-4), 8:40 p.m.</p> <p>Washington (G.Gonzalez 14-6) at San Francisco (Vogelsong 10-5), 10:15 p.m.</p> <table> <tr> <th>Tuesday's Games</th> </tr> </table> <p>L.A. Dodgers at Pittsburgh, 7:05 p.m.</p> <p>N.Y. Mets at Cincinnati, 7:10 p.m.</p> <p>Philadelphia at Miami, 7:10 p.m.</p> <p>San Diego at Atlanta, 7:10 p.m.</p> <p>Houston at Chicago Cubs, 8:05 p.m.</p> <p>Arizona at St. Louis, 8:15 p.m.</p> <p>Milwaukee at Colorado, 8:40 p.m.</p> <p>Washington at San Francisco, 10:15 p.m.</p> <p /> </block> </body.content> <body.end /> </body></nitf>
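
One possible approach, sketched in Python with the standard-library xml.etree.ElementTree module (the post doesn't name a language, so Python is an assumption). Note that the <nitf> root declares a default namespace, http://ap.org/schemas/03/2005/nitf, so every element in the file is namespace-qualified: an unqualified query such as root.findall(".//table") matches nothing, which may be why generic lookups come back empty. The sketch maps that URI to a prefix and then classifies each <tr> by what it contains. The classification rules and the helper names (parse_standings, cell_texts) are my own assumptions drawn from this one sample, not anything AP documents.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <nitf> root of the AP sample. Every
# element lives in it, so lookups must use the "n:" prefix mapped here.
NITF_NS = {"n": "http://ap.org/schemas/03/2005/nitf"}

# Trimmed copy of the AP standings markup (two teams, five columns).
STANDINGS_SAMPLE = """<nitf xmlns="http://ap.org/schemas/03/2005/nitf">
<body><body.content><block>
<table>
<tr><th>AMERICAN LEAGUE</th></tr>
<tr><th>East Division</th></tr>
<tr><th></th><th>W</th><th>L</th><th>Pct</th><th>GB</th></tr>
<tr><td>New York</td><td>67</td><td>46</td><td>.593</td><td>—</td></tr>
<tr><td>Tampa Bay</td><td>61</td><td>52</td><td>.540</td><td>6</td></tr>
</table>
</block></body.content></body></nitf>"""


def cell_texts(tr, tag):
    """Stripped text of every <th> or <td> child of one <tr>."""
    return [(c.text or "").strip() for c in tr.findall(f"n:{tag}", NITF_NS)]


def parse_standings(root):
    """Return one dict per team row, tagged with its league and division.

    Heuristic (an assumption based on the sample, not an AP rule):
    a row with a single <th> is a heading (first = league, later = division),
    a row with several <th> cells holds column names, <td> rows are teams.
    """
    teams = []
    for table in root.iterfind(".//n:table", NITF_NS):
        league = division = None
        columns = []
        for tr in table.iterfind("n:tr", NITF_NS):
            headings = cell_texts(tr, "th")
            values = cell_texts(tr, "td")
            if len(headings) == 1:
                if league is None:
                    league = headings[0]
                else:
                    division = headings[0]
            elif len(headings) > 1:
                # The first header cell is empty; it labels the team column.
                columns = ["Team"] + headings[1:]
            elif values:
                row = dict(zip(columns, values))
                row["League"], row["Division"] = league, division
                teams.append(row)
    return teams


if __name__ == "__main__":
    root = ET.fromstring(STANDINGS_SAMPLE)
    # For a real wire file (hypothetical file name):
    # root = ET.parse("standings.xml").getroot()
    for team in parse_standings(root):
        print(team["League"], "|", team["Division"], "|",
              team["Team"], team["W"], team["L"], team["Pct"], team["GB"])
```

Run against the full wire file, the same loop should pick up all three divisions of each league, since every division heading in the sample is a lone <th> row inside its league's table. Tables that contain only headings (the "Saturday's Games" blocks) yield no team rows, so they drop out.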

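For the newspaper-style layout, one option is fixed-width agate text: team names padded on the right, numbers right-aligned so the columns line up in a monospaced font. The sketch below hard-codes three AL East rows from the sample as dicts, so it runs on its own. The COLUMNS layout, the widths, and the function names (pad, render_division) are illustrative assumptions, and it assumes the ½ and dash characters each take one column in your typeface.

```python
# Column layout for agate-style standings: (field, heading, width, align).
# The widths are illustrative guesses; tune them to your page template.
COLUMNS = [
    ("Team", "", 14, "left"),
    ("W", "W", 4, "right"),
    ("L", "L", 4, "right"),
    ("Pct", "Pct", 6, "right"),
    ("GB", "GB", 6, "right"),
]


def pad(text, width, align):
    """Left-align names, right-align numbers so columns line up."""
    return text.ljust(width) if align == "left" else text.rjust(width)


def render_division(title, teams):
    """Format one division as fixed-width lines for a monospaced layout."""
    lines = [title]
    lines.append("".join(pad(h, w, a) for _, h, w, a in COLUMNS))
    for team in teams:
        lines.append("".join(pad(team[f], w, a) for f, _, w, a in COLUMNS))
    return "\n".join(lines)


# Values copied from the AL East rows of the AP sample.
AL_EAST = [
    {"Team": "New York", "W": "67", "L": "46", "Pct": ".593", "GB": "—"},
    {"Team": "Tampa Bay", "W": "61", "L": "52", "Pct": ".540", "GB": "6"},
    {"Team": "Baltimore", "W": "61", "L": "53", "Pct": ".535", "GB": "6½"},
]

if __name__ == "__main__":
    print(render_division("East Division", AL_EAST))
```

The remaining feed columns (WCGB, L10, Str, Home, Away) can be added by extending COLUMNS with the same field names the feed uses in its header row.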
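The scores and schedules after the standings follow a different pattern: a heading-only <table> (no <td> cells) names the league and/or the day, and the <p> elements after it are the game lines. A sketch that walks the direct children of <block> in document order and files each line under the most recent headings, again in Python with ElementTree, this time using its {uri}tag (Clark notation) form instead of a prefix map. Hard-coding the two league names in LEAGUES and treating any other heading as a day are assumptions based on this sample; parse_games is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the namespace declared on the AP <nitf> root.
NITF = "{http://ap.org/schemas/03/2005/nitf}"

# Trimmed copy of the results/schedule part of the AP sample.
GAMES_SAMPLE = """<nitf xmlns="http://ap.org/schemas/03/2005/nitf">
<body><body.content><block>
<table><tr><th>AMERICAN LEAGUE</th></tr><tr><th>Saturday's Games</th></tr></table>
<p>N.Y. Yankees 5, Toronto 2</p>
<p>Cleveland 5, Boston 2</p>
<table><tr><th>Sunday's Games</th></tr></table>
<p>Boston at Cleveland, 1:05 p.m.</p>
<p>___</p>
<table><tr><th>NATIONAL LEAGUE</th></tr><tr><th>Saturday's Games</th></tr></table>
<p>Cincinnati 4, Chicago Cubs 2</p>
<p />
</block></body.content></body></nitf>"""

# Assumption: these are the only league headings the feed uses.
LEAGUES = {"AMERICAN LEAGUE", "NATIONAL LEAGUE"}


def parse_games(root):
    """Map (league, day heading) -> list of game lines, in feed order."""
    games = {}
    league = day = None
    block = root.find(f".//{NITF}block")
    for child in block:  # direct children, in document order
        if child.tag == NITF + "table" and child.find(f".//{NITF}td") is None:
            # Heading-only table: update the current league and/or day.
            for th in child.iter(NITF + "th"):
                text = (th.text or "").strip()
                if text in LEAGUES:
                    league = text
                elif text:
                    day = text
        elif child.tag == NITF + "p":
            text = (child.text or "").strip()
            # Skip "___" separators, empty <p />, and lines before a heading.
            if text and text != "___" and league and day:
                games.setdefault((league, day), []).append(text)
    return games


if __name__ == "__main__":
    for (league, day), lines in parse_games(ET.fromstring(GAMES_SAMPLE)).items():
        print(league, "-", day)
        for line in lines:
            print("  " + line)
```

Standings tables are skipped because they contain <td> cells, so on the full file only the heading-only tables change the current league and day.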