Saturday, 16 April 2011

42 - Well, well, well...

So, it used to be February 2010, now it's April 2011. Over a year since my last blog post, and in that time a fair amount has changed. We have a new Prime Minister in the form of David Cameron, who you may remember I called a "tosser" back in January. I still stick to that, but mainly because he's currently in the process of fucking up the country and most of its treasured institutions, rather than just playing politics with the sexual assault of two pre-adolescent boys.

The Lib Dems are also in Government - who would have guessed that?! - as part of a coalition with the Tories. Yes, David Cameron, despite running against the most unpopular Prime Minister since, well, John Major, and flying high in the polls for about two years, somehow managed not to get an overall majority. And so to Nick Clegg he ran, whose arms (if not his legs) were wide open and waiting for the inevitable embrace. And so the Lib Dems are gallantly aiding the Tories in screwing up the nation, merrily taking a gamble (in Nick's own words) with our economy. So, bang goes higher education, lower education, an NHS which actually works properly, all the money for our armed forces, the transport systems, etc. etc. I voted Lib Dem, ostensibly to keep the Tories out of my constituency, which is a Lib Dem seat. Fat lot of good that did. Still, at least there's going to be a vote on AV, which isn't quite proportional representation but is better than FPTP. The vote will probably be lost, as most of the newspapers have got their tongues firmly lodged inside Cameron's rectum, but at least a vote will be had. Should've happened about 10 years ago, mind, but then Labour rather messed that one up, didn't they? Wouldn't even have needed a referendum. I don't really dislike the Lib Dems all that much, but they are ultimately allowing the Tories to wreak the kind of destruction that made them spunk in their trousers sixteen years ago.

Labour have a new leader, in the form of Ed Miliband - a man nobody expected to head up the party. He seems alright to me, though time will tell; out of all the candidates, he's the one that impressed me the most. I reckon he's got the ability to make the Labour party one that I feel I can vote for again. And if he's got any nous he'll do away with all the anti-civil-liberties gobshite that New Labour decided was so important. At the moment, however, they're still cutting their teeth. And whilst it's fairly easy to be in opposition when you've got such an immediately unpopular government, they have been fairly anonymous thus far. Perhaps that's for the best, though.

But, ignoring the imminent and ruthless destruction of our country, most importantly: I've finished my PhD! Woop! Its completion was a very long, drawn-out process, as I'm sure many of those who have done one are all too aware: the simulations were finished at the start of December, after a very long, drawn-out three months in the autumn of 2009. The writing-up took place during that period, with my wife playing schoolmistress and demanding chapters be written by certain deadlines. It helped, too: by the end of December I had submitted my final chapters to my supervisors and had a full draft thesis. Then the fun began.

The first stage, after the Christmas holidays ended, was to get some soft-bound theses printed so they could be sent off for examination. This was rather tricky as they were being printed in Loughborough and I live about 135 miles from there. As it was, my supervisor managed to get across to the printing place to OK the first print (they wouldn't go ahead unless it was OK'd), and things progressed well. Forms were filled in, theses were submitted, examiners were found, and three months later in April 2010, a viva happened. I passed. This was good. Celebrations occurred. Required alterations were sent out, alterations were made, final copy was signed off, printing of final copies was scheduled.

Then, more interesting stuff happened. The intended graduation date was in July 2010, and in order to graduate at that time, I needed to submit about a month before, in mid-June. I hurried along the corrections that I needed to make, and then emailed them off to the printing place in mid-May, giving me a month. It turned out that the pdf I'd created had errors in the contents table (grrr Microsoft) and so the first print was, not to put too fine a point on it, bollocks. However, I did not discover this for a while, as my supervisor was very, very busy and unable to get down to the printing place for ages after I'd sent it and they'd printed it. Time ticked by. She finally got down there to check it, and discovered the contents error. Arseholes. I redid it, sent it back up, made sure it worked this time, and then tried to get her to check it again. She was still very busy and couldn't commit to doing it. Thankfully, I'd changed jobs back in February and was regularly going up to Nottingham University. I made an excuse to drop in on Loughborough on the way back (they're on the same train line) and check the second first print myself. This time, there were no problems, and the whole process of printing and hard-binding could be completed. Time was still ticking by, and so by the time I was able to finally go up to Loughborough to pay for and fully submit the final, sexy, hard-bound theses, it was about two days before the deadline. Argh.

Note, though, that the deadline was, like the vitality of Schrödinger's cat, somewhat nebulous and uncertain. It could have been one of several dates, depending on who you spoke to at the University. I took the earliest one, naturally, and worked to that. I managed to blag a day off work to go up to Loughborough to pay for and submit my theses. This had required some painstaking effort - checking that my supervisor would be in to sign the relevant forms, that I had all the other forms I needed and that the Research Student Office was open to receive them. All was in alignment and so off I trotted in my car to the University.

Now then, it is an important point to note that the day before I was due to go up for the big submission, I had made the fatal mistake of trying to use my card to get some money out of a cash machine. The Barclays cash point in the western end of Euston station, just outside Paul, is the psychotic fucker which gleefully took my card off me, then refused to give me either cash or card back. So, I ring up Barclays and get that card cancelled. Not my personal account card, no no, checked that with the phone person. Not personal account, but joint account. Definitely joint account card, yes? Yes. Good. Not personal account card. Need personal account card for paying for theses. Joint account card swallowed. Cancel joint account. Not personal account. Need personal account card. But that's ok, because the joint account card had been cancelled, and not the personal account card.

One day later, the beautiful hot June summer sun saw a very sweaty and flustered Sam swearing really rather profusely in front of a bemused and possibly quite frightened printing service assistant as his personal account card wouldn't work properly, because it had been cancelled. My life was slowly melting before my eyes: if I couldn't pay for the theses, I couldn't submit that day, would be unlikely to get more time off the next day to go back up with a different card, to pay for them and try to submit, and even if I did, there was no saying that my supervisor would be around to sign them off, the deadline would be missed, and I'd have to graduate in December instead.

Luckily, however, for reasons I won't go into, I had my wife's credit card on me. A quick and panicky phone call later saw her give me the PIN, and the theses were paid for. They got signed and submitted, and a happy Sam graduated that July.

Monday, 1 February 2010

41 - Multiples of (B-1) where B is the number base being used always ultimately sum to (B-1) when summed in the number base B

After I finally got to the end of my seemingly interminable series of posts on my PhD, I thought I'd follow it up straight away with something, in the same sort of way that one might eat a chocolate straight after swallowing back down some vomit. However, I didn't actually post this at the time, but thankfully it's about maths. Well, numbers, really, and how they work, and so it's fairly timeless.

I like numbers. This is apparent to anyone who knows me well. One of my favourite numbers is the number 9. 9 is a good number, and it has lots of properties which make it lovely. For instance, any multiple of nine has digits which sum to a multiple of nine. Take the number 27602742108312: its digits add up to 45, whose digits in turn add up to 9, so 27602742108312 is a multiple of 9.
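
If you'd rather not do the adding by hand, it's a one-minute job to check in code. A quick Python sketch (the function names are mine, purely for illustration):

    def digit_sum(n):
        """Add up the decimal digits of n."""
        return sum(int(d) for d in str(n))

    def repeated_digit_sum(n):
        """Keep summing the digits until a single digit remains."""
        while n > 9:
            n = digit_sum(n)
        return n

    n = 27602742108312
    print(digit_sum(n))           # 45
    print(repeated_digit_sum(n))  # 9, so n should be a multiple of 9
    print(n % 9 == 0)             # True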

Some years ago, this property intrigued me. I was on a train and fairly bored, so I had a think about it. Why is this true?

If you write out a list of the multiples of 9, you get the following:

9
18
27
36
45
54
63
72
81
...

Remember your primary school maths lessons? Remember how, when you were being taught long addition, they spoke about "units, tens, hundreds, thousands"? Well, that's what is being used here. Every time you add nine to the previous number, the units digit (as long as it is greater than 0) is reduced by 1, while 1 is added to the tens digit. The two changes balance out, and the digit sum stays at 9.

But it's not quite that simple. The reason why we count in units, tens, hundreds, and so on is because we use the decimal number system. That is, we count in base 10. It's possible to count in other bases - if you've ever used a tally counting system, that's using base 1. If you understand binary, that's base 2. Hexadecimal is base 16, and so on. Whatever base you're in (let's call it B), the digits in the numbers you use are arranged in a very specific format:

... B⁵ B⁴ B³ B² B¹ B⁰

So, in base 10, we count in:

... 10⁵ 10⁴ 10³ 10² 10¹ 10⁰

Which works out as:

... 100000 10000 1000 100 10 1

So, if you want to represent the number "35673" in decimal, you're saying you have three ten thousands, five thousands, six hundreds, seven tens and three units. If you are counting in binary, these numbers are:

... 32 16 8 4 2 1

So, if you have the binary number "101101", you're saying you have one thirty-two, no sixteens, one eight, one four, no twos and one one, which is the same as 45 in decimal.

If you are counting in ternary (base 3), your columns are:

... 243 81 27 9 3 1

And thus the number "201212" represents two times 243, no eighty-ones, one twenty-seven, two nines, one three and two ones. 486+0+27+18+3+2=536 in decimal.
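
In code, the column system boils down to "shift everything left one column, then add the new digit". A little Python sketch (my own function name, assuming the digits are supplied most-significant first):

    def to_decimal(digits, base):
        """Convert a list of digits (most significant first) in the given base to decimal."""
        value = 0
        for d in digits:
            value = value * base + d   # shift left one column, then add the next digit
        return value

    print(to_decimal([1, 0, 1, 1, 0, 1], 2))   # 45, the binary example above
    print(to_decimal([2, 0, 1, 2, 1, 2], 3))   # 536, the ternary example above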

Whenever we count on our hands, we use each finger to represent a one - we are counting in base 1. However, if we count in base two on our hands, we can get a much larger range of numbers:

Hold your hands in front of you, palms facing you. Put down all your fingers into fists. Imagine that this represents zero. Whenever you raise a finger, that puts a '1' into the number that that finger represents, while a finger being down represents a '0'. (Note that this assumes complete independence of finger movement, which isn't quite true for the ring and little fingers, but it is good enough for a demonstration). Raise your right-hand thumb. This number is therefore 0000000001 in binary, or 1 in decimal. Put your thumb down, and raise your right index finger. This is 0000000010, or 2 in decimal. Raise your thumb again. This is 0000000011, or 3 in decimal. Put both these down and raise your middle finger. Apologise to whoever is now looking at you in a very offended way, and tell them that this is 0000000100, or 4 in decimal. If you keep counting in this way, by the time you get to raise the thumb on your left hand, you've counted all the way up to 512. Raise all your fingers, and this represents 1023. On two hands which you previously thought could only count up to ten.
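
If you'd rather not offend passers-by, Python will do the finger-counting for you. A sketch of the same idea (the finger numbering is my own convention: 0 is the right thumb, 9 the left thumb):

    def fingers_to_number(raised):
        """raised is the set of finger indices (0-9) currently up; finger i is worth 2**i."""
        return sum(2 ** i for i in raised)

    print(fingers_to_number({0}))             # right thumb only: 1
    print(fingers_to_number({1}))             # right index only: 2
    print(fingers_to_number({0, 1}))          # thumb and index: 3
    print(fingers_to_number({9}))             # left thumb only: 512
    print(fingers_to_number(set(range(10))))  # all ten up: 1023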

But all this is a minor distraction. Hopefully, you're now familiar with number bases. Let's say that we have a number base A, where A merely represents any number (it's the B of the title - the letter doesn't matter). If A is 2, we are counting in binary. If A is 10, we are in decimal, and so on. If we take the number A-1, any multiple of A-1 will have digits which sum to a multiple of A-1 (and, if you keep summing, ultimately to A-1 itself). To show this in senary (number base 6), we count as follows, with senary on the left-hand side and decimal on the right:

0 - 0
1 - 1
2 - 2
3 - 3
4 - 4
5 - 5
10 - 6
11 - 7
12 - 8
13 - 9
14 - 10
15 - 11
20 - 12
21 - 13
22 - 14
23 - 15
24 - 16
25 - 17
30 - 18
31 - 19
32 - 20
33 - 21
34 - 22
35 - 23
40 - 24
41 - 25
42 - 26
...

We are counting in base 6, and so I am saying that any multiple of five (6-1, for those not keeping up) will have digits which sum to a multiple of 5 - these are the entries 5, 14, 23, 32 and 41 in the list above. The multiples of 5 in senary are 5, 14, 23, 32, 41, 50, 55, 104, 113, 122, 131, 140, 145, and so on. The digits of each of these sum to a multiple of five. However, look at that last number: 145. Its digits add up to 10, which is clearly 5 x 2, but keep summing in decimal and you get 1 + 0 = 1, not 5. The trick is that we need to do the adding in senary, not decimal: 1 + 4 + 5 equals 14 in senary, and those digits add up to 5. This property holds in all bases.
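
It's also easy to check by brute force. A Python sketch (the helper names are mine): write the number out in base B, sum the digits, and repeat until a single digit remains:

    def digits_in_base(n, base):
        """Return the digits of n in the given base, most significant first."""
        if n == 0:
            return [0]
        ds = []
        while n > 0:
            ds.append(n % base)
            n //= base
        return ds[::-1]

    def repeated_digit_sum(n, base):
        """Sum the digits of n in the given base until a single digit remains."""
        while n >= base:
            n = sum(digits_in_base(n, base))
        return n

    # Every positive multiple of (base - 1) should reduce to exactly (base - 1):
    for base in (6, 10, 16):
        assert all(repeated_digit_sum(k * (base - 1), base) == base - 1
                   for k in range(1, 1000))
    print("Holds for every multiple tested.")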

Again, in hexadecimal, the multiples of 15 (represented as F) are: F, 1E, 2D, 3C, 4B, ..., F0, FF, 10E, and so on. For the number FF, the digits in decimal add to 30, which is, again, a multiple of F. In hexadecimal, F + F = 1E, and 1 + E add to F.

I'm now going to generalise this a bit. It gets a bit technical, so bear with me.

Whenever we count in base A, we set up our columns so that, as above, we have

... A⁵ A⁴ A³ A² A 1

We set all the values to zero, and begin incrementing the units column by one. When we get to the number "A", we set the units column to zero and the "A" column to 1. Thus, any number which is smaller than A lies only within the units column. Likewise, any number which is smaller than A² lies purely within the "A" and units columns, and so on. Similarly, if a number is greater than (A-1), the number must lie in more than just the units column. This is a very important property. Using it, a table can be constructed showing the digits of every multiple of (A-1) in base A, and their sum:

N     | N x (A-1)           | A³ | A²    | A     | 1       | Digit Sum
------+---------------------+----+-------+-------+---------+----------
0     | 0                   | 0  | 0     | 0     | 0       | 0
1     | (A-1)               | 0  | 0     | 0     | (A-1)   | (A-1)
2     | A+(A-1)-1           | 0  | 0     | 1     | (A-1)-1 | (A-1)
3     | 2A+(A-1)-2          | 0  | 0     | 2     | (A-1)-2 | (A-1)
...   |                     |    |       |       |         |
(A-2) | (A-3)A+2            | 0  | 0     | (A-3) | 2       | (A-1)
(A-1) | (A-2)A+1            | 0  | 0     | (A-2) | 1       | (A-1)
A     | (A-1)A              | 0  | 0     | (A-1) | 0       | (A-1)
A+1   | (A-1)A+(A-1)        | 0  | 0     | (A-1) | (A-1)   | 2(A-1)
A+2   | A²+(A-1)-1          | 0  | 1     | 0     | (A-1)-1 | (A-1)
A+3   | A²+A+(A-1)-2        | 0  | 1     | 1     | (A-1)-2 | (A-1)
...   |                     |    |       |       |         |
A²-1  | (A-2)A²+(A-1)A+1    | 0  | (A-2) | (A-1) | 1       | 2(A-1)
A²    | (A-1)A²             | 0  | (A-1) | 0     | 0       | (A-1)
A²+1  | (A-1)A²+(A-1)       | 0  | (A-1) | 0     | (A-1)   | 2(A-1)
A²+2  | (A-1)A²+A+(A-1)-1   | 0  | (A-1) | 1     | (A-1)-1 | 2(A-1)
...   |                     |    |       |       |         |
From this table we can see that, at least up to a certain point, all of the multiples of (A-1) have digits which sum to a multiple of (A-1). I believe the pattern extrapolates to every multiple of (A-1), in any base. Now, recall that I, at the time, was considering all of this on a train. Rather like Fermat, I came up with a terrifically brilliant explanation for why this is so. But I didn't write it down and now I can't remember it. This latter point I put down to old age. Bear in mind also that I've forgotten much of the mathematics I used in my degree, which is something I intend to remedy at a later date.
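
For what it's worth, here is a sketch of one possible explanation - not necessarily the terrifically brilliant one that got lost on the train. Since A leaves a remainder of 1 when divided by (A-1), so does every power of A. In LaTeX notation:

    A \equiv 1 \pmod{A-1} \implies A^k \equiv 1 \pmod{A-1} \text{ for all } k \ge 0,
    \quad \text{so} \quad N = \sum_k d_k A^k \equiv \sum_k d_k \pmod{A-1}.

That is, a number and the sum of its base-A digits leave the same remainder when divided by (A-1). So if N is a multiple of (A-1), its digit sum is too, and since each round of summing gives a smaller positive multiple of (A-1), repeating it has to end up at (A-1) itself.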

Still, the concept is a very interesting one, and it's something that I've not come across elsewhere. If one of my (mumble) readers wants to point me in the direction of an interesting explanation for this from someone else, please do. I've yet to find one.

Friday, 22 January 2010

40 - David Cameron: Tosser.

So. Imagine the scene. It is September 2009, in Edlington, Yorkshire. Two boys, aged nine and eleven, head out on their bikes, with their dog, and go to the shops and then to a skateboard park. They are approached by two brothers, aged 10 and 11, who ask to use their bikes. They then ask the boys if they want to see a dead fox. The brothers grab the boys and drag them through a barbed wire fence, threatening to kill them. Their money and phone are stolen from them. Shards of glass from a broken beer bottle are held against their necks. The boys are stamped upon, have bricks and stones thrown at their heads. Pieces of a ceramic sink are dropped on the head of the older boy. A metal ring is used by the elder brother to attempt to strangle one of the boys. The victims are forced to strip and perform sex acts on each other. As people are heard approaching the area, the two brothers pull a plastic sheet over the boys and set fire to it, giving them both burns. Part of the attack is filmed on the phone of one of the boys, in order to humiliate them. They are regularly threatened with death. An old clothes line is pulled round the nine year old's neck, while one of the brothers asks if he has died yet. After the attackers feel their arms beginning to ache, they leave to go and meet their father. The older boy tells the younger to leave him there to die (at eleven years old).

The nine year old wanders into town. He is seen by an elderly couple - the husband laughs at first because he thinks the child's red colour is a bad paint job. They quickly realise it is blood and rush out to help. The child is wearing no shoes, no socks, is wet from the waist down and covered in blood. They manage to learn from the boy that his friend is still out in the wooded ravine, close to death and blinded; his face cannot be seen through the blood. The husband rings his son, who goes out to search for him. The boy is found and the man and a policeman wait with him while an air ambulance arrives. The 40 year old man who waited is, afterwards, so distressed by the sight of the boy that he cannot speak to anyone - he just cries and cries. He still cannot walk past the scene of the horrific incident, it distresses him so much. The policeman is similarly reduced to tears by the sight, and says that in his 22 years of service, it is the most distressing thing he has had to deal with. The consultant doctor says that if the elder boy had been found much later, he would almost certainly not have survived.

The brothers are caught, found guilty of grievous bodily harm and sentenced to indefinite detention of at least five years, with three years on the sex offenders' register. They show no remorse or emotion in interview or during the trial, and claim that they did it because they were bored. A child protection expert tells the presiding judge that the younger brother is a very high risk to the community, and has the potential to become a "seriously disturbed psychopathic offender" unless his treatment is appropriate. She says he has shown hardly any empathy for his victims. News reports claim that they had access to their father's pornographic DVDs and horror films. Their home is filled with violence - their father once threatened to slash their mother's face. From the age of nine, the older brother smoked cannabis and drank cider. He has been expelled from school and has a habit of headbutting or hitting teachers. He has a number of convictions, and his younger brother has previously been reprimanded for assault.

This is the case of a truly gruesome and horrific attack. It is difficult to believe that something like this could be done to children by other children. But these sorts of attacks, while not frequent, remain in the public consciousness for a long time, gaining notoriety. Think of some previous cases: Jamie Bulger, the Soham murders, Baby P, Dunblane, the Moors Murders, Fanny Adams. These cases span decades, and the governments of their day. It is not unreasonable to think that no matter what the investment in social care, these cases would not cease - no society has yet found a solution to them. The level of pain and grief that the victims and parents have to go through is immense, so you would be forgiven for thinking that any politician seeking to make any kind of political capital from these extreme cases is acting so distastefully that it's difficult to think of any word to describe them other than "cunt".

Extreme cases do not define the social care that a country offers. They are outliers, anomalies - they cannot in any way describe the nature of the country in which they occur. There is a saying - "hard cases make bad law" - referring to the temptation to legislate because something truly horrific has happened. Responding to this kind of event with legislation, or even policy change, is a bad idea, because such events describe no wider problem. That's not to say that systems shouldn't be in place to attempt to prevent these kinds of abhorrent acts, but any reaction should be considered only in response to the broader issues.

Enter David Cameron, the leader of the Tories. In Gillingham today, he referred to this case as being indicative of "Broken Britain", saying it is symptomatic of wider social issues, that people must ask wider questions about social breakdown. He said,
I think when things like this happen it is right to stand back, reflect and ask ourselves some deep questions about what is going wrong in our society.
The BBC reports, "Mr Cameron denied that his frequent references to a "broken Britain" was an over-statement and "terrible crimes" such as those which had happened in Doncaster could not be ignored." He then went on to accuse Labour of covering up the report into the incident. The Treasury Minister Liam Byrne pretty much hits the nail on the head when he says,
What Mr Cameron appears to be trying to do is seizing on one absolutely horrific crime and almost tarring the people of Doncaster, if not the people of Britain, with the same kind of standards and I think that people will recoil from that.
Think about what David Cameron is saying. He's saying that an horrific physical and sexual assault by two children on two other children is symptomatic of "Broken Britain". He doesn't say it outright, but he implies that Broken Britain is Labour's fault. He claims, meanwhile, that these extreme cases are not down to any single Government, despite everything else he says implying the total opposite.

David Cameron is trying to turn this event so that it helps him become Prime Minister. He is a vile, horrible man who is feeding off the pain and suffering of two little boys and their families, in order to gain political ground.

And thus, I have no hesitation in saying that David Cameron is a tosser of the highest magnitude. The day that he gets elected as Prime Minister is the day that this country welcomes with open arms a Government which will, without doubt, manage to be even more bilious, suspicious, selfish, arrogant and useless than the present one.

Wednesday, 25 November 2009

39 - Petri Nets

So here it is, Merry Christmas. I wish I could confirm that everybody is having fun, but seeing as this has been a series of blogs about various aspects of a very dull topic, I’ll be amazed if anybody is still reading it. Never mind, though, because this is the final post of this series! Woo!

Thus far, I have explained the importance of measuring various reliability characteristics for a given item. I have briefly explained how this is done, both qualitatively and quantitatively. I have given an explanation of how my work stems from this – with relevance to working out the probability of successfully completing a mission or series of missions. This final section explains how, in my PhD, all the various concepts that I need to model are modelled.

As you can imagine, it is very difficult to model factors such as bringing a redundant system online to cover the failure of its main counterpart, or the prediction of future component failures. To do this, then, a tool needs to be employed which at least has the capability of modelling these things. Fault tree methods and Markov methods, for instance, can't really manage it, due to the limitations of the models they produce. Luckily, I was introduced to Petri nets.

A Petri net uses several components to represent things:

  • Places – these are shown graphically as circles, and are used to store values, known as tokens.
  • Tokens – these are the values that are stored within places. They can move around through the switching of transitions. A place holds a whole number of tokens, and that number can, in principle, be unbounded.
  • Transitions – these allow tokens to be transferred, created or destroyed. They may or may not have a time delay attached. They switch according to a strict set of logical rules.
  • Arcs – these connect places to transitions, and vice versa. Places only ever connect to transitions (and vice versa), and there can be any number of arcs between a given place and transition. If there is more than one, however, they are grouped together into a single arc, with a weighting (or multiplicity) attached to indicate how many.
Now, this may seem perfectly simple. It may seem really quite complex, but thankfully, I am here to help you see how these simple components can end up producing some very interesting things. The diagram below shows a simple Petri net before and after a time t. If it helps, think of t as being 10 seconds, so at the 10-second point, the diagram changes from the first net to the second.

[Diagram: Jeff - a simple Petri net, shown before and after its transition switches at time t]

The diagram, which we can call Jeff, shows a number of places (the circles), with arcs (the arrows) leading to or from a transition (the rectangle). Notice that in Jeff, some of the arcs have a weighting greater than one, shown by a small slash with a number next to it. Each of the places has tokens (the small dots) in them.

The diagram demonstrates the mechanism by which the dynamic capability of Petri nets is achieved: transition switching. A transition will usually have places which input to it (in Jeff, there are three of these), and those which take outputs from it. The switching process works as follows:
  1. Enabling: A transition is enabled when there is a token which can travel down each arc into the transition. For instance, in Jeff, the left-hand net shows the three input places as having two, one and five tokens respectively. The arcs from these places to the transition have weightings of two, one and four respectively. Thus, the arcs have all got tokens which can travel down them, and so the transition is enabled.
  2. Time-delay: Once the transition is enabled, a time-delay may exist which must expire. This delay can be a set value of, say, 10 seconds or an hour. Alternatively, it can be randomly sampled from a given distribution of times.
  3. Switching: Once the delay, if it exists, has expired, the switching takes place. This removes the arc-number of tokens from each of the input places, and deposits an arc-number of tokens in each of the output places.
And that’s it. Beyond that, there are some more complications, such as inhibitor arcs, but that’s pretty much it.
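
To make the switching rules concrete, here is a minimal Python sketch of an untimed Petri net - my own throwaway class, not the code from my PhD program, and it skips the time-delay step entirely. The numbers are Jeff's: three input places holding 2, 1 and 5 tokens, arc weightings of 2, 1 and 4, and one output place receiving 4 tokens (the place names foreshadow the biscuit example below):

    class Transition:
        """A Petri net transition with weighted input and output arcs."""
        def __init__(self, inputs, outputs):
            self.inputs = inputs     # {place name: arc weighting}
            self.outputs = outputs   # {place name: arc weighting}

        def enabled(self, marking):
            # Enabled when every input place holds at least its arc weighting of tokens.
            return all(marking[p] >= w for p, w in self.inputs.items())

        def switch(self, marking):
            # Remove the arc-number of tokens from each input place...
            for p, w in self.inputs.items():
                marking[p] -= w
            # ...and deposit the arc-number of tokens in each output place.
            for p, w in self.outputs.items():
                marking[p] += w

    marking = {"flour": 2, "egg": 1, "sugar": 5, "biscuits": 0}
    t = Transition(inputs={"flour": 2, "egg": 1, "sugar": 4}, outputs={"biscuits": 4})
    if t.enabled(marking):
        t.switch(marking)
    print(marking)   # {'flour': 0, 'egg': 0, 'sugar': 1, 'biscuits': 4}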

So I have explained the mechanism by which Petri nets work, but have not really mentioned what they’re actually for. This may be a problem as my wife is always complaining that I’m useless at explaining things to the layman. But I’ll give it a go.

Consider Jeff again. It could really represent anything you want, but imagine instead that it’s a model of how to make a particular type of biscuit. You need 2 oz of flour, 1 egg and 4 oz of sugar. But you put in 2 ounces of flour, one egg, and five ounces of sugar (it’s not a healthy biscuit, and it may not work in reality) – represented in the input places. You wait a while to cook it in the oven, then afterwards you are left with four biscuits, and an ounce of sugar, because you used too much of that, and the biscuit complained at you.

As another, more relevant, example, consider the case of a component. A component works at the start of the model. It then fails at some time. After another length of time it is repaired. After some more time, it fails again, and so on. What we have here is two different states – working and failed – and two different ways of switching between them. If we have two places, one for each state, we can use a single token to represent which state the component is currently in. If we have two transitions, these can allow the switching between the states, effectively modelling the processes of “failure” and “repair”. This PN will look something like that in the diagram below, which I’ve called Albert.

[Diagram: Albert - a two-place Petri net with "working" and "failed" places, linked by "failure" and "repair" transitions]

In Albert, the top transition is enabled by the single token. It will wait for a certain length of time (the time it takes the component to fail) before switching, representing the component as “failed”. Once this is true, the bottom transition is enabled, which again waits for a length of time (the repair time) before switching the component back to “working” again.

This simple example is at the small end of a whole world of modelling possibilities: using this system we can easily model the progress of a mission from phase to phase. We can create Petri net representations of fault trees. We can make these fault trees cause phase failure, and thus mission and MFOP failure. We can activate or deactivate certain components, allowing for redundant systems to be modelled. And so on. The possibilities are endless. No, really, the possibilities really are endless: PNs as they are usually used in everyday life (haha) - that is, extended with things like inhibitor arcs - are Turing complete.

Using PNs, I have managed to create a modelling method for everything considered in my PhD: prognostic systems, sensors, a fleet of aircraft performing MFOPs which contain multiple phased missions, mission abandonment, phase insertion, redundant systems, and so on. These are all packaged together in a rather nifty computer program, which takes inputs on things such as mission data, component failure rates, phase failure logic, enabler data and so on, and creates all these lovely PNs.

What my program then does is to generate random times for component failure, and see what happens when these failures occur – do they cause phase failure? Do they put the aircraft out of action, or just force the mission to be abandoned?

If you build up enough simulations on this sort of thing, you get a very good idea of how well the overall platforms perform, and thus where the major problems are.
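
For a flavour of what that looks like, here is a heavily stripped-down sketch - nothing like the real program, with invented distributions and a toy failure criterion - which runs Albert's working/failed cycle over a fixed mission length, many times:

    import random

    def mission_fails(mission_length, mean_time_to_fail, repair_time):
        """One simulated run of Albert: the component alternates between working
        and failed; call the mission a failure if the component is in its failed
        state when the mission clock runs out (a toy criterion)."""
        t, working = 0.0, True
        while True:
            # Time spent in the current state: sampled for failures, fixed for repairs.
            duration = random.expovariate(1.0 / mean_time_to_fail) if working else repair_time
            if t + duration >= mission_length:
                return not working   # the state the component was in at the end
            t += duration
            working = not working

    runs = 100_000
    failures = sum(mission_fails(100.0, 80.0, 15.0) for _ in range(runs))
    print(f"Estimated probability of ending the mission failed: {failures / runs:.3f}")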

And that’s my PhD. It’s taken over five years, it’s made me cry and want to flagellate myself, and it’s nearly over. And when it is, you and I can laugh and drink and eat and forget all about it, pretend that it never happened (other than you having to call me “Doctor”), and get on with our lives in happy ignorance of reality.

Tuesday, 3 November 2009

38 - Maintenance-Free Operating Periods

There's just two more of these posts, then you can consider yourselves educated and can talk to me about my doctorate without me having to start the conversation with the words "Right, well, you know military aircraft, yeah? They have to fly lots of missions, yeah?..."

So. You know military aircraft? They have to fly lots of missions. Missions missions missions. All day long. A mission here, a mission there, a mission everywhere. But, as we also know, things can go wrong in missions. Evil Muslim Terrorists can fire Russian Rockets from their Russian Rocket Launchers and destroy planes. Idiot Americans can accidentally Bomb Aylesbury. Wings can fall off. Luxury cars can fall out of the back of the aircraft, landing bonnet-first in a swamp where an Indian man holding a goat on a piece of string stands looking puzzled.

So, some time ago, around 1995-6, the Ministry of Defence (MoD) posited the creation of a new way of measuring the effectiveness of military aircraft, with respect to reliability. This was called a "Maintenance-Free Operating Period", or MFOP for short. The idea was that it's much more useful to the RAF to be able to send out an aircraft to complete lots of missions back-to-back, without the need for any emergency maintenance, and with a high degree of confidence that this will actually work. Once this period (called the MFOP) is finished, the platform undergoes lots of maintenance all at the same time, with parts swapped in and out, inspections made, damage repaired, and so on. This second period is called a Maintenance Recovery Period, or MRP. After that, the plane goes off again to destroy whatever Innocent Civilians have taken the Government's fancy this week.
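
To put a number on "a high degree of confidence", the usual back-of-envelope calculation - very much a sketch, with invented figures, and real platforms have thousands of components with far less convenient behaviour - assumes exponentially distributed failures, so the probability of getting through an MFOP of length T with no failures at all is exp(-λT):

    import math

    # Hypothetical numbers, purely for illustration.
    failure_rate = 1.0 / 500.0   # one failure every 500 flying hours, on average
    mfop_length = 150.0          # hours in the Maintenance-Free Operating Period

    # With exponential failures, P(no failures in time T) = exp(-lambda * T).
    confidence = math.exp(-failure_rate * mfop_length)
    print(f"P(completing the MFOP failure-free) = {confidence:.2%}")   # about 74%

A 74% chance of a failure-free period is nowhere near "a high degree of confidence", which is why the improvements below get wheeled out.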

Before 2004, when I started the PhD, the little research that existed had investigated this concept, and decided that several potential "improvements" to a platform could be made in order to reach the desired MFOPs and confidence levels. These are:
  1. Improving the inherent reliability characteristics of the components in the platform - understand each of their typical failure distributions, parameters, causes of failure and how these can be minimised, and so on.
  2. Put in place systems or components which are usually switched off. These can be used as back-ups, to take over from important systems which may fail. (This is known as redundancy.)
  3. For electronic components, make use of a relatively new concept called reconfigurability - the ability of avionics to sense a failure of one of their modules, and adapt their configuration to take account of this, and continue operations as normal.
  4. Design platforms and plan missions and repairs such that finding where failures have occurred in systems (diagnostics) is easy, quick and cheap.
  5. Use systems which can predict the future failure of components and the effect these are likely to have on upcoming missions (prognostics).
All fairly boring stuff, I'm afraid. There do exist, as with Phased Missions, one or two very simple mathematical models, but these fail to cut very deep into the heart of the issues. So my PhD has to, in addition to considering phased-mission modelling, model the performance of MFOPs. In a fleet of aircraft.

There's not really a great deal to say about this subject, other than that it's unlikely that I'll be publishing a thesis set to light the reliability world ablaze with amazing discoveries. It's an idea, but one which probably, ultimately, will not work, because people like things the way they are.

The final post will be about Petri nets. That one will have lots of pretty pictures.

Monday, 2 November 2009

37 - Phased Missions

It should, hopefully, be clear to you now that my work involves estimating the probability of systems failing. So far, this has not been too difficult: break things down, put numbers in, get things and numbers out.

Things can rapidly get more complicated, however. Sometimes, systems go through different periods, where certain sub-systems are activated or deactivated at certain times. An example of this would be an aeroplane – the wheels will be up (stowed away) or down (in use) depending on whether the plane is in flight or not. A failure of the landing gear during flight wouldn’t be an issue at that time – it’s only when the plane is coming into land that panic would set in, and, no doubt, some big black dude attempts to get the muthaf***in' snakes off this muthaf***in' plane. So to speak.

Imagine, then, that the plane is performing a mission. This particular aircraft is of the military variety, and it’s flying off to bomb some innocent Iraqi civilians. The different stages of the Innocent Iraqi Civilian Bombing Mission could be:

1. Taxi to runway
2. Take-off
3. Ascent
4. Transit Flight to Innocent Iraqi Town
5. Descent to Bombing Height
6. Bombing of Innocent Iraqi Civilians
7. Ascent to Transit Height
8. Transit Flight Back to Base
9. Descent
10. Landing
11. Taxi to Hangar
12. Dressing-gown, Whisky, Cigar, Long-Haired Cat, Estimated Death Count, Tirade About Dirty Arabs, Job Well Done.

In each of the stages above, the aeroplane will have different systems in use. These different stages are known as "phases". Because of the differing systems, the ways in which aeroplane failure can be expressed will change from phase to phase. Also, the stresses on the various sub-systems will change, possibly affecting component failure rates. As such, to get an accurate picture of the probability of the aeroplane getting through the Innocent Iraqi Civilian Bombing Mission without being shot down by Evil Muslim Terrorists With Russian Rocket Launchers, one must consider each of these stages, or phases, separately.

And so off we trot, putting together fault trees for each phase of the system failure event "Plane In Innocent Iraqi Civilian Bombing Mission Shot Down By Evil Muslim Terrorists With Russian Rocket Launchers", (or PIIICBMSDBEMTWRRL for short). How, then, do we come up with a figure for the success of the overall mission?

Well, one factor is that if any one phase fails, the entire mission fails. But, just to complicate matters, one has to consider whether or not the plane's failure is one where Evil Muslim Terrorists Shot Down The Aircraft And Then Stole All Our Technological Secrets And Killed The Crew, or whether We Forgot Which Country We Were In And Accidentally Destroyed Aylesbury. One is a catastrophic failure, where the platform is lost, the other is a mission failure, where the objectives have not been completed but further missions are possible. The two levels of failure are quite distinct, and may require completely different phase fault trees for each one.
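
Ignoring everything that makes phased missions genuinely hard - components carrying their state from phase to phase, shared failure causes, repairs - the crudest possible sketch treats each phase as an independent hurdle and multiplies (the per-phase numbers are invented for illustration):

    # Hypothetical failure probabilities for the twelve phases listed above
    # (the bombing run is riskier than taxiing; the whisky phase is assumed safe).
    phase_failure_probs = [0.0001, 0.001, 0.0005, 0.002, 0.001, 0.01,
                           0.001, 0.002, 0.0005, 0.001, 0.0001, 0.0]

    p_success = 1.0
    for p in phase_failure_probs:
        p_success *= 1.0 - p   # the mission survives only if every phase does

    print(f"P(mission success) = {p_success:.4f}")

The real methods are harder than this precisely because the phases are not independent: the same components appear in several phase fault trees, carrying their condition from one phase into the next.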

Other interesting factors include inserting phases into the middle of missions, such as when a mid-air refuelling is needed. Hilariously, this very situation occurred with the Nimrod aircraft some time ago. And a catastrophic failure occurred, everyone died, and much hand-wringing began. Or we may need to alter our strategy midflight, because We Accidentally Bombed Aylesbury and so May As Well Bomb Milton Keynes While We're At It. Or the Accidental Bombing of Aylesbury means we have to Abandon Mission and get back to base before anyone realises what's happened. Or the weather got in the way of our Innocent Iraqi Civilian Bombing campaign, and so We Had To Bomb The Afghanis Instead, something which only happens every Thursday.

Considering all these factors can often be very tricky. My PhD is partly to do with evaluating the situations where inserted phases have occurred, or missions have been abandoned, or a probabilistic event (like the weather) forces a change of tack a certain proportion of the time. For very simple phased missions, mathematical methods to solve them have existed since the mid-seventies. One of the standard methods assumes a non-repairable system (which can be a very unworldly assumption indeed) and converts all those pretty phase fault trees into a giant behemoth of a mission fault tree. This can then be solved in the usual way. However, this method is a bit big and takes a long time (which engineers hate), so some other methods have been devised to sort through the nitty-gritty and try to get accurate answers more quickly. I won't mention them here, as I'm only giving a brief overview of the problem, but further reading can always take place in the relevant papers. Leave a comment if you care.

The penultimate blog in the series is coming next. This is about the other half of what I have to investigate - Maintenance-Free Operating Periods. Yummy.

Friday, 16 October 2009

36 - Anyway, back to the point

What was I talking about? Oh yes, my PhD. Might as well continue with that.

I believe we were at the point where I had explained how to create a fault tree, and was going to explain how to quantitatively analyse it. Because you really want to know this.

So let's assume that you do. This is very easy in one regard and very complex in another.

In a fault tree, if two or more events are connected by an AND gate, the probability of the outcome is simply the product of the probabilities of the events (assuming the events are independent). Easy. If, however, the events are connected by an OR gate, the probability can be much more difficult to work out.

Consider the case where you have two dice. What is the probability of getting a six on one OR a six on the other?

Many people would look at this situation and think that the answer is the sum of the probabilities (1/6 + 1/6 = 2/6 = 1/3). This is the intuitive answer, but it is, however, wrong. The reason, which took me a while to get my head around, comes down to the fact that simply adding the probabilities counts the case where both dice show a six twice: if you get a six on one die, you don't care what comes up on the other, even if it is a six. That inherent double-counting of the special case of two sixes needs to be eliminated from the probability.

I will try to show this in the form of a table, if I can quickly google and learn some HTML tabling skills in the next few minutes.


        1   2   3   4   5   6
    1   -   -   -   -   -   Y
    2   -   -   -   -   -   Y
    3   -   -   -   -   -   Y
    4   -   -   -   -   -   Y
    5   -   -   -   -   -   Y
    6   Y   Y   Y   Y   Y   Y

(Rows are the first die, columns the second; a "Y" marks an outcome containing at least one six.)


You see? It's magic what you can find on the internet.

Looking at the above table, there are only eleven cases of getting a six on one die or the other.

Thus, when working out the probability of A OR B (usually written as A + B), this is the sum of the two probabilities minus their product. In the above example, this is 1/6 + 1/6 - 1/36 = 11/36.

If we extend this to be about three dice, the probability becomes much more interesting. There are two ways to work this out. Thinking about the way we solved the above problem, the table needs extending into three dimensions. However, this is slightly impossible on a 2D screen, so I won't bother. Imagine the above table in two states - where the third die has any of the numbers 1 to 5, or where the third die has the number 6. In the first case, the table shown above will apply. The probability of this equals the probability of the third die having the numbers 1 to 5 (5/6) multiplied by the probability of either of the other two dice having a six (11/36, as worked out above). This equals 55/216.

The other case is simply a 6 x 6 table filled with "Y"s. The probability of this is that of the third die having a 6: 1/6. Thus, the total probability is

P(D1=6 OR D2=6 OR D3=6) = 55/216 + 1/6 = 91/216.

This can be worked out more easily by thinking about the general case. I'm not going to derive it, but simply state as fact that it's true. The main way of working out the probability of a number of events, Nc, connected by OR logic, is called the inclusion-exclusion expansion. This is quite tricky to show in HTML form, but I'll give it a go.

P(c_1 + c_2 + \dots + c_{N_c}) = \sum_{i=1}^{N_c} P(c_i) - \sum_{i<j} P(c_i)P(c_j) + \sum_{i<j<k} P(c_i)P(c_j)P(c_k) - \dots + (-1)^{N_c - 1} P(c_1)P(c_2) \cdots P(c_{N_c})

You see? Not easy.

To try to put this into as easy an explanation as possible: first, sum each of the events' probabilities. After this, sum the probabilities of each combination of two events and subtract this from the previous total. Then sum the probabilities of each combination of three events and add this to the total. Subtract the combinations of four events, add the combinations of five, and so on, until you either add or subtract the combination of all of the events occurring together, depending on whether the total number of events is odd or even.

To apply this to the dice example,

P(D1=6 + D2=6 + D3=6) = P(D1=6) + P(D2=6) + P(D3=6) - [P(D1=6).P(D2=6) + P(D1=6).P(D3=6) + P(D2=6).P(D3=6) ] + P(D1=6).P(D2=6).P(D3=6)

P = 1/6 + 1/6 + 1/6 - 3/36 + 1/216
P = 108/216 - 18/216 + 1/216 = 91/216

This is the same answer as above.
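
The expansion is also easy to mechanise. A Python sketch (my own function, and it assumes the events are independent), checked against the dice answers:

    from itertools import combinations
    from math import prod

    def prob_or(probs):
        """Inclusion-exclusion expansion for the OR of independent events."""
        total = 0.0
        for k in range(1, len(probs) + 1):
            sign = (-1) ** (k - 1)   # add odd-sized combinations, subtract even-sized
            total += sign * sum(prod(c) for c in combinations(probs, k))
        return total

    print(prob_or([1/6, 1/6]))        # 0.30555... = 11/36
    print(prob_or([1/6, 1/6, 1/6]))   # 0.42129... = 91/216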

When working with fault trees, this is the way to solve them. It sounds difficult, but in reality it's not too bad. Something I forgot to mention in the previous blog was that there are these things called cut sets. These describe combinations of component failures (or basic events) which, if they all occur, will definitely cause the system to fail. These can be minimised - stripped of any events not strictly needed for failure - to get the set of minimal cut sets. The probability of each of these is the product of the failure probabilities of the components within it. Just one inclusion-exclusion expansion over the cut sets is then needed for the whole fault tree to be solved. Voila.
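
Cut sets complicate the expansion slightly, because two cut sets can share components, so you can't just multiply cut set probabilities together in the cross terms; each term is instead the product over the union of the components involved. A sketch, with a made-up two-cut-set system:

    from itertools import combinations
    from math import prod

    def system_failure_prob(cut_sets, comp_probs):
        """Inclusion-exclusion over minimal cut sets, assuming independent
        component failures. Each term multiplies over the UNION of the
        components involved, which handles shared components correctly."""
        total = 0.0
        for k in range(1, len(cut_sets) + 1):
            sign = (-1) ** (k - 1)
            for combo in combinations(cut_sets, k):
                union = set().union(*combo)
                total += sign * prod(comp_probs[c] for c in union)
        return total

    # Hypothetical system: fails if A and B both fail, or if C fails on its own.
    cut_sets = [{"A", "B"}, {"C"}]
    comp_probs = {"A": 0.1, "B": 0.2, "C": 0.05}
    print(system_failure_prob(cut_sets, comp_probs))   # 0.02 + 0.05 - 0.001 = 0.069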

I hope this has been an education for you, and that it is understandable. Right, I need a poo. Ta-ra.

P.S. Next up - Phased Missions. Fun.