# VRP15

33 %
67 %
Education

Published on February 26, 2008

Author: Alohomora

Source: authorstream.com

A Reinforcement Learning Approach for Product Delivery by Multiple Vehicles:  A Reinforcement Learning Approach for Product Delivery by Multiple Vehicles Scott Proper Oregon State University Prasad Tadepalli Hong Tang Rasaratnam Logendran Vehicle Routing & Product Delivery:  Vehicle Routing & Product Delivery Contributions of our Research:  Contributions of our Research Multiple vehicle product delivery is a well-studied problem in operations research We have formulated this problem as an average reward reinforcement learning (RL) problem We have combined inventory control with vehicle routing We have scaled RL methods to work with large state spaces Markov Decision Processes:  Markov Decision Processes Action a Actions are stochastic: Pi,j(a) Actions have costs or rewards: ri(a) Move Unload Unload Average Reward Reinforcement Learning:  Average Reward Reinforcement Learning Goal: Maximize average reward/time step Minimize stockout penalty + movement penalty Policy: states → actions Value function: states → real values expected long-term reward from a state, relative to other states, when following the optimal policy H-Learning:  H-Learning The value function satisfies the Bellman equation: The optimal action a* maximizes the immediate reward + expected value of the next state H-Learning is a real-time algorithm for solving the value function H-Learning: an example 1:  H-Learning: an example 1 -.1, 1/1 0, 9/9 0, 0/9 A E D C B H-Learning: an example 2:  H-Learning: an example 2 Stockout penalty: -20 A E D C B -.1, 1/1 0, 9/10 -20, 1/10 H-Learning: an example 3:  H-Learning: an example 3 A E D C B -.1, 1/1 0, 9/10 -20, 1/10 H-Learning: an example 4:  H-Learning: an example 4 Move penalty: -.1 A E D C B -.1, 2/2 0, 9/10 -20, 1/10 On-line Product Delivery:  On-line Product Delivery Deliver 1 product 9 truck actions: 4 levels of unload 4 move directions wait P(Inventory decrease | shop) Stockout penalty: -20 Movement penalty: -.1 5 Shops Depot The problem of state-space explosion:  The problem of state-space explosion The loads of trucks and shop inventories are discretized into 5 levels States grow exponentially in shops and trucks 10 locations, 5 shops, 2 trucks = (102)(55)(52) = 7,812,500 states 5 trucks = 976,592,500,000 states Table-based methods take too much time and space Piecewise Linear Function Approximation:  Piecewise Linear Function Approximation We use a different linear function for each possible 5-tuple of locations l1,…, l5 of trucks Each function is linear in truck loads and shop inventories Every function represents 10 million states million-fold reduction of learnable parameters Piecewise linear function approximation vs. table-based:  Piecewise linear function approximation vs. table-based Storing and using the action models:  Storing and using the action models Problem: exponential time to determine the expected value of the next state: - Each shop’s consumption is independent - Value function is piecewise linear ? ? ? ? Ignoring Truck Identity:  Ignoring Truck Identity m = number of locations (10) k = number of trucks (2-5) 5 trucks: 105 functions Learnable parameters: 1.1 million 2002 functions Learnable parameters: 22,022 mk The problem of action-space explosion:  The problem of action-space explosion Every action a is a vector of individual “truck actions” a = (a1, a2,…,an) Actions grow exponentially in the number of trucks 9 “truck actions” For 2 trucks: 92 = 81 total actions For 5 trucks: 95 = 59,049 total actions Hill Climbing Search:  Hill Climbing Search We initialize the vector of truck actions a to all “wait” actions We use hill climbing to reach a local optimum Randomly perturb a truck action, repeat This results in an order-of-magnitude improvement in search time Hill climbing vs. exhaustive search for 4 and 5 trucks:  Hill climbing vs. exhaustive search for 4 and 5 trucks Conclusion:  Conclusion Average-reward RL and Piecewise linear function approximation are promising approaches for real-time product delivery Hill climbing shows great potential for speeding up search in domains with a large action space Problems of scaling are surmountable Future Work:  Future Work Scaling! More trucks, more locations, more shops, more depots, and more items Allowing trucks to move with non-uniform speeds (event-based model needed) Real-valued shop inventory and truck load levels

 User name: Comment:

## Related presentations

#### Dissertation Writing Services in UK Provided by Th...

September 24, 2017

#### Cultural Festivals

September 24, 2017

#### How to Make a Good Dissertation Writing Plan

September 24, 2017

#### Driving School in Edmonton

September 24, 2017

#### Career Kaushal : UniSkills & CGC Newsletter Septem...

September 24, 2017

#### 101_The Mongols(Week 6 TAP)

September 24, 2017

## Related pages

### VRP15-1LW > Plug-ins > Vizia RF + ® > Security ...

Welcome to the VRP15-1LW Section of the Leviton electrical and electronic products Web site. This product has the following description: Vizia RF + 15A ...

### Leviton VRP15-1LW - 15A Appliance Z-Wave Plug-in Module ...

The Leviton VRP15-1LW is a Z-Wave plug-in module that is designed to let you remotely turn your plug-in appliances on and off via Z-Wave. The Leviton VRP15 ...

### Leviton VRPA1-1LW Vizia RF Plus Scene Capable Plug-in ...

The Vizia RF + Scene Capable Plug-in Appliance Module (VRP15-1LW) is ideal for any residential setting that needs remote ON/OFF switching of appliances ...

### VRP Langnau Massnahmenliste Verkehrsrichtplan 2015 Langnau ...

VRP Langnau Massnahmenliste Verkehrsrichtplan 2015 Langnau i.E.: Massnahmenliste Bearbeitungsstand: 12.02.2015 Vision VRP15:

### VRP15-1LW - Platt Electric Supply

VRP15-1LW Color: White UPC Code: 07847739035 Country of Origin: China Item Description Vizia RF + 15A Scene Capable Plug-In Appliance Module. White.

### Veronica Pitocchi (@vrp15) • Instagram photos and videos

Two roads diverged in a wood, and I— I took the one less traveled by, And that has made all the difference.

### Vrp15 Long Tools Ratchet Drive Mac 15 - autoplierslo.com

Review (mpn:vrp15 for sale) Tools Long Mac Drive 15 Vrp15 Ratchet 1 2 . I do not ship internationally or to canada. This item is in good used condition.

### VRPA1-1LW > Plug-ins > Vizia RF + ® > Security ...

VRPA1-1LW. Additional Views; UPC Code: 07847765518 Country of Origin: China ... The Vizia RF + Scene Capable Plug-in Appliance Module (VRP15-1LW) ...